Prompt Caching

Cache repeated context for faster responses and significant cost savings. Pay once to cache, reuse at a fraction of the cost.

50% cost reduction on cached input tokens
~85% faster time to first token
5-minute cache TTL

How It Works

📝 System Prompt (static context) → 💾 Cache (stored for reuse) → ⚡ Fast Response (skip processing)

When you send a request, we check whether the prompt prefix matches a recent request. If it does, we skip reprocessing those tokens: you get a faster response and pay less.
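Conceptually, the server-side lookup works like a TTL-bound map keyed on the static prefix. The sketch below is an illustration of that idea, not the provider's actual implementation; the names (`lookup`, `store`, `_prefix_key`) are hypothetical, and only the 5-minute TTL comes from the stats above.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 5 * 60  # 5-minute TTL, per the stats above
_cache: dict[str, tuple[float, object]] = {}


def _prefix_key(prompt_prefix: str) -> str:
    # Hash the static prefix so lookups stay O(1) regardless of prompt size.
    return hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()


def lookup(prompt_prefix: str):
    """Return the cached processed prefix if it is still fresh, else None."""
    entry = _cache.get(_prefix_key(prompt_prefix))
    if entry is None:
        return None
    stored_at, processed = entry
    if time.monotonic() - stored_at > CACHE_TTL_SECONDS:
        # Expired: the next request reprocesses and re-creates the cache.
        del _cache[_prefix_key(prompt_prefix)]
        return None
    return processed


def store(prompt_prefix: str, processed) -> None:
    _cache[_prefix_key(prompt_prefix)] = (time.monotonic(), processed)
```

A cache miss (or an expired entry) falls through to full processing; a hit skips straight to generation.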

Comparison

❌ Without Caching (slower)

Input cost (10K tokens): $0.05
Time to first token: ~500 ms
10 requests/day: $0.50/day

✓ With Caching (faster)

Input cost (10K tokens): $0.025
Time to first token: ~75 ms
10 requests/day: $0.25/day
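The headline numbers follow directly from the table. A quick check of the arithmetic (using the table's illustrative prices, not official rates):

```python
# Per-request figures from the comparison table above
uncached_cost = 0.05   # $ per 10K-token request, no cache
cached_cost = 0.025    # $ per 10K-token request, cache hit
requests_per_day = 10

savings = 1 - cached_cost / uncached_cost
print(f"Cost reduction per hit: {savings:.0%}")  # 50%

daily_uncached = uncached_cost * requests_per_day
daily_cached = cached_cost * requests_per_day
print(f"Daily: ${daily_uncached:.2f} -> ${daily_cached:.2f}")  # $0.50 -> $0.25

ttft_speedup = 1 - 75 / 500
print(f"TTFT improvement: {ttft_speedup:.0%}")  # 85%
```

Note the daily figure assumes every request hits the cache; in practice the first request after each TTL expiry pays the full price.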

Usage

Caching is automatic for prompts over 1,024 tokens. Structure your prompts with static content first:

Python - Automatic Caching

```python
# Assumes an initialized OpenAI-compatible client, e.g.:
#   from openai import OpenAI
#   client = OpenAI()

# Long system prompt gets cached automatically (must exceed 1,024 tokens)
system_prompt = """You are a helpful assistant for Acme Corp.
Company background: [... 5,000 tokens of context ...]
Product catalog: [... 3,000 tokens ...]
Support policies: [... 2,000 tokens ...]
"""

# First request: full processing, creates the cache
response1 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's your return policy?"},
    ],
)

# Second request: cache hit, faster and cheaper
response2 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": system_prompt},  # same prefix
        {"role": "user", "content": "How do I track my order?"},
    ],
)
```

Best Use Cases

🤖 Chatbots

Cache system prompts and company context across user sessions.

📚 Document QA

Cache embedded documents for repeated questions about the same content.

💻 Code Assistants

Cache codebase context for faster code completion and review.

🔄 Batch Processing

Process multiple items with the same instructions efficiently.
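For the batch-processing case, keeping the shared instructions identical across every request means only the first request pays full price; the rest hit the cache. A minimal sketch, assuming an OpenAI-compatible `client` as in the sample above (`classify_batch` and `shared_instructions` are hypothetical names; for caching to actually apply, the shared prefix must exceed the 1,024-token minimum):

```python
shared_instructions = (
    "Classify each support ticket as billing, bug, or other."
    # ... in practice, enough shared context to exceed 1,024 tokens ...
)


def classify_batch(client, tickets: list[str], model: str = "mythic-4") -> list[str]:
    """Process many items with one shared system prompt so later requests hit the cache."""
    results = []
    for ticket in tickets:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                # Identical prefix on every iteration -> cached after request 1
                {"role": "system", "content": shared_instructions},
                {"role": "user", "content": ticket},
            ],
        )
        results.append(resp.choices[0].message.content)
    return results
```

Anything that varies per item belongs after the shared prefix, never before it, or the prefix match breaks.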

Start Saving

Enable automatic caching in your applications today.

Learn More →