Get responses in real-time as they're generated. Reduce latency and build responsive user experiences.
- See output immediately instead of waiting for the full response
- Users can read as content appears, like a chat conversation
- Stop generation early if the output isn't what you need
Enable streaming by setting `stream=True` in your request:
```python
# Stream responses with Python
from mythicdot import MythicDot

client = MythicDot()

stream = client.chat.completions.create(
    model="mythic-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

# Print each chunk as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Use `flush=True` in Python to ensure output appears immediately without buffering.
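Early cancellation, mentioned above, is just a matter of breaking out of the loop. A minimal sketch of the idea, using simulated chunk objects as stand-ins for the SDK's stream (the `make_chunk` helper is hypothetical, not part of the SDK):

```python
from types import SimpleNamespace

def make_chunk(text):
    """Build a minimal stand-in with the same shape as a streaming chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# Simulated stream; with the real SDK this would be the iterator
# returned by client.chat.completions.create(..., stream=True).
stream = (make_chunk(t) for t in ["Once", " upon", " a", " time", "..."])

collected = []
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        collected.append(content)
    # Stop early once we have enough text, e.g. for a preview.
    if sum(len(p) for p in collected) >= 10:
        break

preview = "".join(collected)
print(preview)
```

Breaking out of the loop stops consuming the stream; no further tokens are read from it.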
Streaming uses Server-Sent Events (SSE). Each event contains a chunk of the response:
- **Content chunks** arrive as `delta` objects with partial content.
- The **final chunk** includes `finish_reason`, indicating why generation stopped.
- **`data: [DONE]`** is a special marker indicating the stream has ended.
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
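The SDK handles this parsing for you, but the wire format above is simple enough to process by hand. A minimal sketch, assuming each `data:` line carries one JSON chunk (as in the example), that accumulates delta content until the `[DONE]` marker:

```python
import json

# Raw SSE lines as they would arrive over the wire (abbreviated from above)
sse_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

parts = []
finish_reason = None
for line in sse_lines:
    payload = line.removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream marker, not JSON
        break
    chunk = json.loads(payload)
    choice = chunk["choices"][0]
    parts.append(choice["delta"].get("content", ""))
    finish_reason = choice["finish_reason"]

text = "".join(parts)
print(text, finish_reason)  # the reassembled message and why it stopped
```

Note that `[DONE]` must be checked before JSON parsing, since it is a bare sentinel rather than a JSON object.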
Use async streaming in async web applications so a long-running response doesn't block the event loop:
```python
import asyncio

from mythicdot import AsyncMythicDot

async def main():
    client = AsyncMythicDot()
    stream = await client.chat.completions.create(
        model="mythic-4",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```
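When streaming for display, you often still need the complete text at the end (for logging, caching, or saving to a database). A minimal sketch of accumulating while iterating, using an async generator as a stand-in for the SDK's stream:

```python
import asyncio

async def fake_stream(pieces):
    """Async generator standing in for the SDK's streaming iterator."""
    for piece in pieces:
        await asyncio.sleep(0)  # yield control, as real network reads would
        yield piece

async def collect(stream):
    # Accumulate chunks while iterating; a real app would also
    # display or forward each piece as it arrives.
    parts = []
    async for content in stream:
        parts.append(content)
    return "".join(parts)

full = asyncio.run(collect(fake_stream(["Tell", " me", " a", " story"])))
print(full)
```

Appending to a list and joining once at the end avoids the quadratic cost of repeated string concatenation on long responses.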
| Feature | Streaming | Non-Streaming |
|---|---|---|
| Time to first token | ~50 ms | Full response time |
| Progressive display | ✓ Yes | ✗ No |
| Early cancellation | ✓ Yes | ✗ No |
| Complexity | More complex | Simple |
| Usage tracking | Available on final chunk | In response |
Get token usage statistics with streaming:
```python
# Include usage stats in the stream
stream = client.chat.completions.create(
    model="mythic-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

# Usage is included in the final chunk
for chunk in stream:
    if chunk.usage:
        print(f"Tokens: {chunk.usage.total_tokens}")
```