Speed up completions by providing expected output. When you know what most of the response will look like, give the model a head start.
When you provide a prediction, the model uses speculative decoding to verify and extend your prediction rather than generating every token from scratch. Matching tokens are nearly free.
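To build intuition for why matching tokens are cheap, here is a toy sketch (not the real decoder) of the verification step: the model checks the predicted tokens against what it would have generated, accepts the matching prefix almost for free, and only generates from the first mismatch onward. The function name and token lists are illustrative, not part of any API.

```python
def accept_prediction(generated, predicted):
    """Return how many leading predicted tokens match the model's output."""
    accepted = 0
    for g, p in zip(generated, predicted):
        if g != p:
            break
        accepted += 1
    return accepted

# The model would produce a signature with an extra `tax` parameter;
# the prediction is the unchanged original signature.
model_output = ["def", "f", "(", "x", ",", "tax", ")", ":"]
prediction   = ["def", "f", "(", "x", ")", ":"]

# The first four tokens are verified cheaply; generation resumes at token 5.
print(accept_prediction(model_output, prediction))  # 4
```

The longer the matching prefix, the fewer tokens the model has to generate from scratch.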
The most common use case is code edits where most of the file stays the same:
```python
from mythicdot import MythicDot

client = MythicDot()

# Original code that needs a small change
original_code = """
def calculate_total(items):
    total = 0
    for item in items:
        total += item.price
    return total
"""

response = client.chat.completions.create(
    model="mythic-4",
    messages=[
        {
            "role": "user",
            "content": f"Add tax calculation (8%) to this function:\n{original_code}"
        }
    ],
    prediction={
        "type": "content",
        "content": original_code  # Predict that most of the output matches the input
    }
)

# The model only generates the changed tokens - often several times faster
print(response.choices[0].message.content)
```
Predicted outputs also work well for:

- Small edits to large files: variable renames, function modifications, style fixes
- Grammar fixes, formatting changes, or small updates to existing text
- Structured outputs where the template is known but values change
- Updating translations where most content stays the same
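For the structured-output case, the serialized template itself makes a good prediction: the keys and layout are known ahead of time, and only the values change. A minimal sketch of building that prediction payload, assuming a hypothetical extraction task with the JSON shape below:

```python
import json

# Hypothetical template whose keys are fixed; only the values will change.
template = {
    "name": "",
    "email": "",
    "subscribed": False,
}

# Serialize the template exactly as the model is expected to format its output.
predicted_content = json.dumps(template, indent=2)

# This dict is passed as the `prediction` argument, alongside the messages,
# in the same request shape as the example above.
prediction = {"type": "content", "content": predicted_content}

print(prediction["type"])  # content
```

Every key name, brace, and quote that survives into the final output is verified rather than generated, so the cost scales with the values, not the template.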
Give the model a head start on predictable outputs.