Cache repeated context to reduce costs and latency. Pay 75% less for cached tokens.
1. Send a prompt with your context (system prompt, documents, etc.)
2. We cache the processed context for future requests
3. Identical prefixes hit the cache, saving tokens and time
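Conceptually, the cache lookup works like longest-prefix matching over the request: everything up to the first point where two requests differ can be reused. Here is a minimal sketch of that idea; the whitespace "tokenizer" and `common_prefix_len` helper are illustrative assumptions, since real matching happens server-side on processed tokens:

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading tokens two requests share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Two requests sharing a long system prompt but ending with different questions
req1 = "You are a legal assistant. Contract text ... What are the termination clauses?".split()
req2 = "You are a legal assistant. Contract text ... Are there liability limitations?".split()

shared = common_prefix_len(req1, req2)
print(shared)  # 8 -- tokens before the requests diverge
```

Only the shared prefix is cacheable, which is why the ordering advice below (static content first) matters.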
Prompt caching is enabled by default. When you send requests with identical prefixes, the cached portion is billed at a 75% discount.
```python
# First request - full price for all tokens
response1 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": """You are an expert legal assistant.
Here is the full text of the contract: [10,000 words...]"""},
        {"role": "user", "content": "What are the termination clauses?"}
    ]
)

# Second request - same prefix is cached at a 75% discount!
response2 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": """You are an expert legal assistant.
Here is the full text of the contract: [10,000 words...]"""},
        {"role": "user", "content": "Are there any liability limitations?"}
    ]
)

# Check usage
print(response2.usage.prompt_tokens_details.cached_tokens)
# 12000 (these were charged at the 75% discount)
```
Check usage.prompt_tokens_details.cached_tokens in the response to see how many tokens were served from cache.
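To monitor how well caching is working, you can derive a hit rate from those usage fields. A minimal sketch, assuming the usage shape shown above (the `Usage` dataclasses here stand in for the SDK's real response objects):

```python
from dataclasses import dataclass

@dataclass
class PromptTokensDetails:
    cached_tokens: int

@dataclass
class Usage:
    prompt_tokens: int
    prompt_tokens_details: PromptTokensDetails

def cache_hit_rate(usage: Usage) -> float:
    """Fraction of prompt tokens that were served from cache."""
    if usage.prompt_tokens == 0:
        return 0.0
    return usage.prompt_tokens_details.cached_tokens / usage.prompt_tokens

# e.g. 12,000 of 12,150 prompt tokens came from cache
usage = Usage(prompt_tokens=12_150,
              prompt_tokens_details=PromptTokensDetails(cached_tokens=12000))
print(f"{cache_hit_rate(usage):.1%}")  # 98.8%
```

A consistently low hit rate usually means something dynamic (a timestamp, a request ID) is sneaking into the front of your prompt.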
- Include large documents in the system prompt and ask multiple questions. Save 70%+ on follow-up questions.
- Cache repository context for multiple code generation requests. Save 60%+ on token costs.
- Conversation history is automatically cached between turns. Save 50%+ per conversation.
- Cache example-heavy prompts used across many requests. Save 80%+ on examples.

| Token Type | Price per 1M tokens | Discount |
|---|---|---|
| Regular Input Tokens | $2.50 | - |
| Cached Input Tokens | $0.625 | 75% off |
| Output Tokens | $10.00 | - |
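Using the prices above, the input cost of a request splits into a regular part and a cached part. A quick sketch (the token counts match the hypothetical contract example earlier):

```python
REGULAR_PER_M = 2.50   # $ per 1M regular input tokens
CACHED_PER_M = 0.625   # $ per 1M cached input tokens (75% off)

def input_cost(prompt_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of a request's input, splitting cached vs regular tokens."""
    regular = prompt_tokens - cached_tokens
    return regular * REGULAR_PER_M / 1e6 + cached_tokens * CACHED_PER_M / 1e6

# Second request from the example: 12,150 prompt tokens, 12,000 of them cached
with_cache = input_cost(12_150, 12_000)
without_cache = input_cost(12_150, 0)
print(f"${with_cache:.4f} with cache vs ${without_cache:.4f} without")
```

For prompts dominated by a large static prefix, the input cost approaches a quarter of the uncached price.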
Put static content (system prompts, documents, examples) at the beginning. Put dynamic content (user query) at the end. This maximizes the cacheable prefix.
```python
# ✅ Good: Static content first (cacheable)
messages = [
    {"role": "system", "content": "[Long static instructions...]"},
    {"role": "user", "content": "[Examples...]"},
    {"role": "assistant", "content": "[Example responses...]"},
    {"role": "user", "content": user_query}  # Dynamic at end
]

# ❌ Bad: Dynamic content breaks cache prefix
messages = [
    {"role": "system", "content": f"Today is {date}. You are..."},  # Dynamic!
    ...
]
```