🎯 Predicted Outputs

Speed up completions by supplying the expected output in advance. When you already know what most of the response will look like, give the model a head start.

  • 5x faster for code edits
  • 80% fewer output tokens
  • 60% cost reduction

How It Works

When you provide a prediction, the model uses speculative decoding to verify and extend your prediction rather than generating every token from scratch. Matching tokens are nearly free.
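The verification idea can be sketched in a few lines. This is a deliberately simplified, prefix-only model (real speculative decoding operates on tokenizer tokens and can re-sync after a mismatch), using word-level "tokens" for readability:

```python
def matched_prefix(predicted, actual):
    """Count how many leading tokens of the prediction the model accepts.

    Accepted tokens are verified cheaply; everything after the first
    mismatch falls back to normal, token-by-token generation.
    """
    count = 0
    for p, a in zip(predicted, actual):
        if p != a:
            break
        count += 1
    return count


predicted = ["def", "add", "(", "a", ",", "b", ")", ":"]
actual    = ["def", "add", "(", "x", ",", "y", ")", ":"]

# The first three tokens match, so only the rest costs full generation.
print(matched_prefix(predicted, actual))  # 3
```

The closer your prediction is to the real output, the longer the accepted prefix and the larger the speedup.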

Token Generation Comparison
  • Without prediction: 500 tokens generated at normal cost
  • With prediction: 400 tokens predicted (fast/cheap) + 100 new tokens generated at normal cost
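The arithmetic behind these numbers is straightforward. The sketch below assumes predicted tokens bill at a steep discount; the 0.1 rate is illustrative, not a published price:

```python
FULL_RATE = 1.0        # relative cost of a normally generated token
PREDICTED_RATE = 0.1   # assumed discount for accepted predicted tokens (illustrative)

without_prediction = 500 * FULL_RATE
with_prediction = 400 * PREDICTED_RATE + 100 * FULL_RATE

print(without_prediction)                     # 500.0
print(with_prediction)                        # 140.0
print(with_prediction / without_prediction)   # 0.28
```

Under these assumptions the predicted run costs 28% of the baseline; the exact savings depend on your provider's rate for accepted prediction tokens and on how much of the prediction actually matches.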

Example: Code Refactoring

The most common use case is code edits where most of the file stays the same:

Python
from mythicdot import MythicDot

client = MythicDot()

# Original code that needs a small change
original_code = """
def calculate_total(items):
    total = 0
    for item in items:
        total += item.price
    return total
"""

response = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {
            "role": "user",
            "content": f"Add tax calculation (8%) to this function:\n{original_code}"
        }
    ],
    prediction={
        "type": "content",
        "content": original_code  # Predict most of output matches input
    }
)

# Model only generates the diff - 5x faster!
print(response.choices[0].message.content)

Best Use Cases

🔧

Code Refactoring

Small edits to large files - variable renames, function modifications, style fixes

📝

Document Editing

Grammar fixes, formatting changes, or small updates to existing text

🔄

Template Filling

Structured outputs where the template is known but values change

🌐

Translation Updates

Updating translations where most content stays the same
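For template filling, the fixed skeleton itself makes a strong prediction, since only the slotted values need fresh generation. A minimal sketch of building such a prediction payload (the invoice template and its bracketed placeholders are hypothetical, and the `prediction` dict is passed to `client.chat.completions.create` exactly as in the refactoring example above):

```python
# Hypothetical invoice template; only the bracketed values change per request.
template = """{
  "customer": "[NAME]",
  "total": [TOTAL],
  "currency": "USD",
  "status": "paid"
}"""

# Most tokens of the final output match this skeleton verbatim,
# so they are verified cheaply rather than generated from scratch.
prediction = {"type": "content", "content": template}

print(prediction["type"])  # content
```

Only `[NAME]` and `[TOTAL]` cost normal generation; the rest of the JSON structure rides on the prediction.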

💡 Tips for Best Results
  • Predict at least 50% of the expected output for significant speedups
  • Works best when the prediction is close to the actual output
  • Use for editing tasks, not creative generation
  • Combine with streaming for the best user experience
  • Mismatched predictions are handled gracefully - no errors, just normal generation

Speed Up Your Edits

Give the model a head start on predictable outputs.

Latency Guide → Streaming →