Cache repeated context for faster responses and significant cost savings. Pay once to cache, reuse at a fraction of the cost.
Static context → stored for reuse → skip reprocessing.
When you send a request, we check if the prompt prefix matches a recent request. If it does, we skip reprocessing those tokens — you get faster responses and pay less.
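Conceptually, the lookup behaves like a cache keyed on the prompt prefix. This is a simplified sketch of that idea, not the actual server implementation:

```python
import hashlib

# Toy model of prompt-prefix caching: an identical prefix is processed
# once, then matched and skipped on later requests.
_cache = {}

def process_prompt(prefix: str, user_message: str) -> bool:
    """Return True on a cache hit for the prefix, False on a miss."""
    key = hashlib.sha256(prefix.encode()).hexdigest()
    if key in _cache:
        return True       # prefix tokens already processed; skip them
    _cache[key] = True    # first request: pay full processing cost, then store
    return False

system_prompt = "You are a helpful assistant for Acme Corp. ..."
first = process_prompt(system_prompt, "What's your return policy?")   # miss
second = process_prompt(system_prompt, "How do I track my order?")    # hit
```

The real system matches at the token level and expires entries over time, but the cost profile is the same: the first request pays full price, repeats pay only for the unmatched suffix.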
Caching is automatic for prompts over 1,024 tokens. Structure your prompts with static content first:
```python
from openai import OpenAI  # assuming an OpenAI-compatible client

client = OpenAI()

# Long system prompt gets cached automatically
system_prompt = """You are a helpful assistant for Acme Corp.
Company background: [... 5,000 tokens of context ...]
Product catalog: [... 3,000 tokens ...]
Support policies: [... 2,000 tokens ...]
"""

# First request: full processing, creates the cache
response1 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's your return policy?"},
    ],
)

# Second request: cache hit! Faster and cheaper
response2 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": system_prompt},  # Same prefix
        {"role": "user", "content": "How do I track my order?"},
    ],
)
```
Cache system prompts and company context across user sessions.
Cache embedded documents for repeated questions about the same content.
Cache codebase context for faster code completion and review.
Process multiple items with the same instructions efficiently.
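For batch workloads, keep the shared instructions in a byte-identical prefix so every request after the first hits the cache. A sketch of the message-building step (the ticket texts and classification task are illustrative):

```python
# Batching many items behind one shared, cacheable prefix.
# The shared instructions come first and are identical across requests,
# so every request after the first reuses the cached prefix.

shared_instructions = (
    "Classify each support ticket as billing, technical, or other."
)

def build_messages(ticket: str) -> list:
    # Static content first, per-item content last
    return [
        {"role": "system", "content": shared_instructions},
        {"role": "user", "content": ticket},
    ]

tickets = ["I was charged twice.", "The app crashes on login.", "Love it!"]
requests = [build_messages(t) for t in tickets]

# Every request shares the same first message, so caching applies
# to all but the first call:
all_same_prefix = all(r[0] == requests[0][0] for r in requests)
```

Each entry in `requests` can then be passed as `messages=` to `client.chat.completions.create(...)`; only the first call pays to process the shared instructions.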
Take advantage of automatic caching in your applications today.