Context Caching

Cache repeated context to reduce costs and latency. Pay 75% less for cached tokens.

75% token cost savings
~80% latency reduction
Automatic: no code changes

How It Works

1. First Request: send a prompt with your context (system prompt, documents, etc.).

2. Automatic Caching: we cache the processed context for future requests.

3. Subsequent Requests: identical prefixes hit the cache, saving tokens and time.

Automatic Prompt Caching

Prompt caching is enabled by default. When you send requests with identical prefixes, the cached portion of the prompt is billed at a 75% discount.

Python
# First request - full price for all tokens
response1 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": """You are an expert legal assistant.
        Here is the full text of the contract: [10,000 words...]"""},
        {"role": "user", "content": "What are the termination clauses?"}
    ]
)

# Second request - same prefix is cached at 75% discount!
response2 = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {"role": "system", "content": """You are an expert legal assistant.
        Here is the full text of the contract: [10,000 words...]"""},
        {"role": "user", "content": "Are there any liability limitations?"}
    ]
)

# Check usage
print(response2.usage.prompt_tokens_details.cached_tokens)
# 12000  (these were charged at 75% discount)

💡 Cache Hits

Check usage.prompt_tokens_details.cached_tokens in the response to see how many tokens were served from cache.
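A small helper can turn that field into a hit-rate metric. This is a sketch assuming an OpenAI-compatible response object; the getattr guards are there because prompt_tokens_details may be absent on responses with no cache hit:

```python
def cached_fraction(response):
    """Return the fraction of prompt tokens served from cache (0.0 to 1.0)."""
    usage = response.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) or 0
    return cached / usage.prompt_tokens if usage.prompt_tokens else 0.0
```

For the legal-assistant example above (12K of 15K prompt tokens cached), this returns 0.8.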

Before & After

Without Caching

10 requests × 15K tokens each = 150K full-price tokens

With Caching

10 requests × (3K new + 12K cached at 25% price) ≈ 60K effective tokens

Best Use Cases

📚 Document Q&A

Include large documents in system prompt, ask multiple questions.

Save 70%+ on follow-up questions

💻 Code Assistants

Cache repository context for multiple code generation requests.

Save 60%+ on token costs

🤖 Multi-turn Chat

Conversation history is automatically cached between turns.

Save 50%+ per conversation

📋 Few-shot Prompts

Cache example-heavy prompts used across many requests.

Save 80%+ on examples
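The chat and few-shot cases share one mechanic: keep the shared prefix byte-identical and only ever append to it. A minimal sketch of that pattern (the classifier prompt and examples here are placeholders, not part of any real API):

```python
# Static prefix reused verbatim on every request -- this is the cacheable part.
FEW_SHOT_PREFIX = [
    {"role": "system", "content": "You classify support tickets as bug, feature, or question."},
    {"role": "user", "content": "App crashes on login."},
    {"role": "assistant", "content": "bug"},
    {"role": "user", "content": "Please add dark mode."},
    {"role": "assistant", "content": "feature"},
]

def build_messages(history, new_query):
    """Append-only: prefix + prior turns + new query, never reordered or edited."""
    return FEW_SHOT_PREFIX + history + [{"role": "user", "content": new_query}]
```

Because earlier turns are appended rather than rewritten, each new turn extends the cached prefix instead of invalidating it.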

Pricing

| Token Type           | Price per 1M tokens | Discount |
|----------------------|---------------------|----------|
| Regular Input Tokens | $2.50               | -        |
| Cached Input Tokens  | $0.625              | 75% off  |
| Output Tokens        | $10.00              | -        |
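Plugging the table into a per-request cost function makes the savings concrete. A sketch using the prices above:

```python
# Prices from the table above, in dollars per 1M tokens.
PRICE_INPUT = 2.50
PRICE_CACHED = 0.625   # 75% off regular input
PRICE_OUTPUT = 10.00

def request_cost(new_input, cached_input, output):
    """Dollar cost of one request, given token counts by type."""
    return (new_input * PRICE_INPUT
            + cached_input * PRICE_CACHED
            + output * PRICE_OUTPUT) / 1_000_000
```

For the document-Q&A scenario (15K input, of which 12K is cached on follow-ups), input cost drops from $0.0375 to $0.015 per request, a 60% saving on input tokens.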


Optimizing for Cache Hits

🎯 Structure Your Prompts

Put static content (system prompts, documents, examples) at the beginning. Put dynamic content (user query) at the end. This maximizes the cacheable prefix.

Optimal Prompt Structure
# ✅ Good: Static content first (cacheable)
messages = [
    {"role": "system", "content": "[Long static instructions...]"},
    {"role": "user", "content": "[Examples...]"},
    {"role": "assistant", "content": "[Example responses...]"},
    {"role": "user", "content": user_query}  # Dynamic at end
]

# ❌ Bad: Dynamic content breaks cache prefix
messages = [
    {"role": "system", "content": f"Today is {date}. You are..."},  # Dynamic!
    ...
]
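One way to fix the bad example above is to move the per-request value out of the system prompt entirely. A sketch (the instruction text is a placeholder) that keeps the system message byte-identical across requests by carrying today's date in the user message instead:

```python
from datetime import date

# Byte-identical on every request, so the prefix stays cacheable.
STATIC_SYSTEM = "You are a helpful assistant. [Long static instructions...]"

def build_prompt(user_query):
    """Keep dynamic values (like the date) in the final user message."""
    today = date.today().isoformat()
    return [
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": f"(Today is {today}.) {user_query}"},
    ]
```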