🎙️ Voice + Text

Realtime API

Build voice assistants and live chat with sub-second latency. Natural conversations powered by WebSockets.

"What's the weather like today?"

~320ms response time

~300ms average latency

Full-duplex streaming

Voice options

100+ languages

Use Cases

📞

Voice Assistants

Build Siri-like experiences with natural, responsive voice interaction.

🎧

Customer Support

24/7 voice support agents that handle calls naturally.

🌍

Live Translation

Real-time speech translation for multilingual conversations.

🎮

Game NPCs

Voice-enabled game characters with dynamic dialogue.

📚

Language Learning

Interactive tutors for pronunciation and conversation practice.

♿

Accessibility

Voice interfaces for users who can't use traditional inputs.

Architecture

👤 User Audio → 🔌 WebSocket ⟷ 🤖 Realtime API → 🔊 AI Audio

Quick Start

JavaScript - Browser

// Connect to Realtime API (API_KEY must be defined by your application)
const ws = new WebSocket("wss://api.mythicdot.ai/v1/realtime", [
    "realtime",
    `mythicdot-api-key.${API_KEY}`
]);

ws.onopen = () => {
    // Configure session
    ws.send(JSON.stringify({
        type: "session.update",
        session: {
            modalities: ["text", "audio"],
            voice: "nova",
            instructions: "You are a helpful voice assistant."
        }
    }));
};

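// Send audio from the microphone. Call this only once the socket is open:
// ws.send() throws an InvalidStateError while the socket is still connecting.
function sendAudioChunk(base64AudioChunk) {
    if (ws.readyState !== WebSocket.OPEN) return;
    ws.send(JSON.stringify({
        type: "input_audio_buffer.append",
        audio: base64AudioChunk
    }));
}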

// Receive audio responses
ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === "response.audio.delta") {
        playAudio(data.delta);  // Stream to speakers
    }
};
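
The Quick Start sends a `base64AudioChunk` but does not show where it comes from. The sketch below is one way to produce it from raw microphone samples. It assumes the API accepts base64-encoded 16-bit little-endian PCM, which this page does not state, so check the audio format in the API reference first. The helper names (`floatTo16BitPCM`, `bytesToBase64`, `buildAppendEvent`) are hypothetical and not part of the API.

```javascript
// Assumption: input audio is base64-encoded 16-bit little-endian PCM.
// The page does not specify the format; adjust if the API expects another.

// Convert Web Audio style float samples (-1..1) to 16-bit PCM bytes.
function floatTo16BitPCM(samples) {
    const buffer = new ArrayBuffer(samples.length * 2);
    const view = new DataView(buffer);
    for (let i = 0; i < samples.length; i++) {
        // Clamp to [-1, 1], then scale to the signed 16-bit range
        const s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    }
    return new Uint8Array(buffer);
}

// Encode raw bytes as base64 using btoa (browsers and Node 16+).
function bytesToBase64(bytes) {
    let binary = "";
    for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
}

// Build the JSON message for one chunk of microphone audio.
function buildAppendEvent(samples) {
    return JSON.stringify({
        type: "input_audio_buffer.append",
        audio: bytesToBase64(floatTo16BitPCM(samples))
    });
}
```

Once the socket is open, each captured chunk could be sent with `ws.send(buildAppendEvent(samples))`, where `samples` is a `Float32Array` from your audio capture code.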
                

Available Voices

✨

Nova

Warm, friendly

🎯

Alloy

Neutral, clear

🌊

Echo

Deep, resonant

⚡

Fable

Expressive, dynamic

🔮

Onyx

Rich, authoritative

🌸

Shimmer

Bright, cheerful

Event Types

session.created

Connection established, session ready

input_audio_buffer.speech_started

User started speaking (VAD detected)

input_audio_buffer.speech_stopped

User stopped speaking

response.created

AI response generation started

response.audio.delta

Audio chunk ready to play

response.text.delta

Transcript text chunk

response.done

Response complete

error

Error occurred during session
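
With this many event types, a single `if` inside `ws.onmessage` gets crowded quickly. A minimal sketch of a lookup-table router follows. `createEventRouter` is a hypothetical helper, not part of the API, and the handler bodies are up to your application. The Quick Start reads `data.delta` for audio events. Using the same field for `response.text.delta` is an assumption.

```javascript
// Hypothetical helper: route incoming messages to handlers keyed by event type.
function createEventRouter(handlers) {
    return function route(rawMessage) {
        let event;
        try {
            event = JSON.parse(rawMessage);
        } catch (err) {
            return false; // Not valid JSON, ignore
        }
        if (event === null || typeof event !== "object") return false;
        // Only call handlers that were explicitly registered
        if (!Object.prototype.hasOwnProperty.call(handlers, event.type)) {
            return false;
        }
        handlers[event.type](event);
        return true; // A handler ran
    };
}
```

A router built as `const route = createEventRouter({ "response.done": onDone, "error": onError })` can then be attached with `ws.onmessage = (e) => route(e.data)`. Events without a registered handler are ignored and return `false`.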

Best Practices

💡 Tips for Low Latency

Use a region close to your users.

Enable Voice Activity Detection (VAD) to automatically detect when users stop speaking.

Buffer audio in 20-100ms chunks for optimal streaming.

Use the response.cancel event if the user interrupts.
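
Two of these tips translate directly into code: sizing audio chunks and cancelling on interruption. The sketch below assumes 16-bit mono PCM by default and that a `response.cancel` message needs no fields beyond `type`. Neither is confirmed on this page. The function names are hypothetical.

```javascript
// Bytes in one chunk of the given duration.
// Assumption: 16-bit (2-byte) mono PCM unless told otherwise.
function bytesPerChunk(sampleRateHz, chunkMs, bytesPerSample = 2, channels = 1) {
    return Math.round(sampleRateHz * chunkMs / 1000) * bytesPerSample * channels;
}

// Split a Uint8Array of PCM bytes into fixed-size chunks (last may be shorter).
function splitIntoChunks(bytes, chunkBytes) {
    if (!(chunkBytes > 0)) throw new RangeError("chunkBytes must be positive");
    const chunks = [];
    for (let offset = 0; offset < bytes.length; offset += chunkBytes) {
        chunks.push(bytes.subarray(offset, offset + chunkBytes));
    }
    return chunks;
}

// Track whether a response is in progress and cancel it if the user
// starts speaking. `send` is whatever sends a string, e.g. (m) => ws.send(m).
function createInterruptionTracker(send) {
    let responseActive = false;
    return function onEvent(event) {
        if (event.type === "response.created") {
            responseActive = true;
        } else if (event.type === "response.done") {
            responseActive = false;
        } else if (event.type === "input_audio_buffer.speech_started" && responseActive) {
            // Assumption: response.cancel takes no additional fields
            send(JSON.stringify({ type: "response.cancel" }));
            responseActive = false;
        }
    };
}
```

For example, at an assumed 24 kHz sample rate a 40 ms chunk is 960 samples, so `bytesPerChunk(24000, 40)` gives 1920 bytes. The tracker only sends a cancel when speech starts while a response is active, so ordinary turn-taking does not trigger it.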