Streaming

Get responses in real time as they're generated. Reduce perceived latency and build responsive user experiences.

Typical performance: ~50ms time to first token · 85 tok/s token rate · real-time response display.

Why Use Streaming?

⚡

Lower Latency

See output immediately instead of waiting for the full response

💬

Better UX

Users can read as content appears, like a chat conversation

🚫

Early Cancellation

Stop generation early if output isn't what you need

Quick Start

Enable streaming by setting stream: true in your request:

# Stream responses with Python
from mythicdot import MythicDot

client = MythicDot()

stream = client.chat.completions.create(
    model="mythic-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

# Print each chunk as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

💡 Pro Tip

Use flush=True in Python to ensure output appears immediately without buffering.
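The early-cancellation benefit mentioned above is just a break out of the chunk loop: once you stop iterating, no further output is consumed. A minimal sketch, using a plain generator as a stand-in for the iterator that client.chat.completions.create returns with stream=True:

```python
def fake_stream():
    # Stand-in for the streamed chunk iterator; in real code each item
    # would be a chunk with .choices[0].delta.content.
    for piece in ["Quantum ", "computing ", "uses ", "qubits ", "..."]:
        yield piece

collected = []
for delta in fake_stream():
    collected.append(delta)
    # Stop early once we have enough output.
    if len(collected) >= 3:
        break

print("".join(collected))  # partial output only
```

With the real SDK the condition would inspect the accumulated text (or a user action such as a "Stop" button) rather than a chunk count.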

Event Format

Streaming uses Server-Sent Events (SSE). Each event contains a chunk of the response:

Stream Events
data: {"id":"...","choices":[{"delta":{"content":"Hello"}}]}

Content chunks arrive as delta objects with partial content.

data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}]}

Final chunk includes finish_reason indicating why generation stopped.

data: [DONE]

Special marker indicating the stream has ended.

Raw SSE Response
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"mythic-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
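If you consume the raw SSE stream directly instead of using the SDK, each event is a line beginning with "data: ", and the stream ends at the [DONE] marker. A minimal parser sketch over events shaped like the samples above (field names follow the raw response shown; the helper name is illustrative):

```python
import json

def parse_sse_lines(lines):
    """Yield content deltas from raw SSE 'data:' lines until [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream marker, not JSON
        event = json.loads(payload)
        delta = event["choices"][0]["delta"]
        # Role-only and final chunks have no content; skip them.
        if delta.get("content"):
            yield delta["content"]

raw = [
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(parse_sse_lines(raw)))  # Hello!
```

Note that [DONE] must be special-cased before JSON decoding, since it is a sentinel string rather than a JSON object.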

Async Streaming

Use async streaming in async web applications so chunk handling doesn't block the event loop:

Python (async)
import asyncio
from mythicdot import AsyncMythicDot

async def main():
    client = AsyncMythicDot()
    
    stream = await client.chat.completions.create(
        model="mythic-4",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True
    )
    
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())
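A common pattern is to display chunks as they arrive while also accumulating them, so the complete response is available afterwards for logging or conversation history. A sketch with an async generator standing in for the SDK's stream object:

```python
import asyncio

async def fake_stream():
    # Stand-in for the async chunk iterator; in real code each item
    # would be a chunk with .choices[0].delta.content.
    for piece in ["Once ", "upon ", "a ", "time."]:
        yield piece

async def main():
    parts = []
    async for delta in fake_stream():
        parts.append(delta)   # in a real app, also render incrementally
    return "".join(parts)     # full text for logging or history

print(asyncio.run(main()))
```

The same accumulate-while-displaying approach works unchanged in the synchronous loop shown in Quick Start.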

Streaming vs Non-Streaming

Feature             | Streaming                 | Non-Streaming
Time to first token | ~50ms                     | Full response time
Progressive display | Yes                       | No
Early cancellation  | Yes                       | No
Complexity          | More complex              | Simple
Usage tracking      | Available on final chunk  | In response

Stream Options

Request token usage statistics alongside a streamed response:

Python
# Include usage stats in the stream
stream = client.chat.completions.create(
    model="mythic-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

# Usage is included only in the final chunk; earlier chunks have no usage
for chunk in stream:
    if chunk.usage:
        print(f"Tokens: {chunk.usage.total_tokens}")