Streaming

Get responses in real time as they're generated. Reduce perceived latency and build responsive user experiences.

Typical performance: ~50ms time to first token · 85 tok/s token rate · real-time response display.

Why Use Streaming?

⚡

Lower Latency

See output immediately instead of waiting for the full response

💬

Better UX

Users can read as content appears, like a chat conversation

🚫

Early Cancellation

Stop generation early if output isn't what you need

Quick Start

Enable streaming by setting stream: true in your request:

# Stream responses with Python
from mythicdot import MythicDot

client = MythicDot()

stream = client.chat.completions.create(
    model="mythic-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

# Print each chunk as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

💡 Pro Tip

Use flush=True in Python to ensure output appears immediately without buffering.
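The early-cancellation benefit mentioned above is just a break out of the chunk loop: once you stop iterating, no further output is consumed. A minimal sketch, using a plain generator as a stand-in for the iterator that client.chat.completions.create returns with stream=True:

```python
def fake_stream():
    # Stand-in for the streamed chunk iterator; in real code each item
    # would be a chunk with .choices[0].delta.content.
    for piece in ["Quantum ", "computing ", "uses ", "qubits ", "..."]:
        yield piece

collected = []
for delta in fake_stream():
    collected.append(delta)
    # Stop early once we have enough output.
    if len(collected) >= 3:
        break

print("".join(collected))  # partial output only
```

With the real SDK the condition would inspect the accumulated text (or a user action such as a "Stop" button) rather than a chunk count.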

Event Format

Streaming uses Server-Sent Events (SSE). Each event contains a chunk of the response:

Stream Events
data: {"id":"...","choices":[{"delta":{"content":"Hello"}}]}

Content chunks arrive as delta objects with partial content.

data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}]}

Final chunk includes finish_reason indicating why generation stopped.

data: [DONE]

Special marker indicating the stream has ended.

Raw SSE Response
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
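If you consume the raw SSE stream directly instead of using the SDK, each event is a line beginning with "data: ", and the stream ends at the [DONE] marker. A minimal parser sketch over events shaped like the samples above (field names follow the raw response shown; the helper name is illustrative):

```python
import json

def parse_sse_lines(lines):
    """Yield content deltas from raw SSE 'data:' lines until [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream marker, not JSON
        event = json.loads(payload)
        delta = event["choices"][0]["delta"]
        # Role-only and final chunks have no content; skip them.
        if delta.get("content"):
            yield delta["content"]

raw = [
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(parse_sse_lines(raw)))  # Hello!
```

Note that [DONE] must be special-cased before JSON decoding, since it is a sentinel string rather than a JSON object.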

Async Streaming

Use async streaming in async web applications so chunk handling doesn't block the event loop:

Python (async)
import asyncio
from mythicdot import AsyncMythicDot

async def main():
    client = AsyncMythicDot()
    
    stream = await client.chat.completions.create(
        model="mythic-4",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True
    )
    
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())
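A common pattern is to display chunks as they arrive while also accumulating them, so the complete response is available afterwards for logging or conversation history. A sketch with an async generator standing in for the SDK's stream object:

```python
import asyncio

async def fake_stream():
    # Stand-in for the async chunk iterator; in real code each item
    # would be a chunk with .choices[0].delta.content.
    for piece in ["Once ", "upon ", "a ", "time."]:
        yield piece

async def main():
    parts = []
    async for delta in fake_stream():
        parts.append(delta)   # in a real app, also render incrementally
    return "".join(parts)     # full text for logging or history

print(asyncio.run(main()))
```

The same accumulate-while-displaying approach works unchanged in the synchronous loop shown in Quick Start.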

Streaming vs Non-Streaming

Feature             | Streaming                 | Non-Streaming
Time to first token | ~50ms                     | Full response time
Progressive display | Yes                       | No
Early cancellation  | Yes                       | No
Complexity          | More complex              | Simple
Usage tracking      | Available on final chunk  | In response

Stream Options

Request token usage statistics alongside a streamed response:

Python
# Include usage stats in the stream
stream = client.chat.completions.create(
    model="mythic-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

# Usage is included only in the final chunk; earlier chunks have no usage
for chunk in stream:
    if chunk.usage:
        print(f"Tokens: {chunk.usage.total_tokens}")