🎙️ Voice + Text

Realtime API

Build voice assistants and live chat with sub-second latency. Natural conversations powered by WebSockets.

~300ms average latency
Full-duplex streaming
6 voice options
100+ languages

Use Cases

📞

Voice Assistants

Build Siri-like experiences with natural, responsive voice interaction.

🎧

Customer Support

24/7 voice support agents that handle calls naturally.

🌍

Live Translation

Real-time speech translation for multilingual conversations.

🎮

Game NPCs

Voice-enabled game characters with dynamic dialogue.

📚

Language Learning

Interactive tutors for pronunciation and conversation practice.

Accessibility

Voice interfaces for users who can't use traditional inputs.

Architecture

👤 User Audio → 🔌 WebSocket → 🤖 Realtime API → 🔊 AI Audio

Quick Start

JavaScript - Browser
// Connect to Realtime API
const ws = new WebSocket("wss://api.mythicdot.ai/v1/realtime", [
    "realtime",
    `mythicdot-api-key.${API_KEY}`
]);

ws.onopen = () => {
    // Configure session
    ws.send(JSON.stringify({
        type: "session.update",
        session: {
            modalities: ["text", "audio"],
            voice: "nova",
            instructions: "You are a helpful voice assistant."
        }
    }));
};

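// Send one base64-encoded audio chunk from the microphone.
// Call this only after the socket has opened; send() throws while the
// socket is still connecting, so chunks that arrive early are skipped.
function sendAudioChunk(base64AudioChunk) {
    if (ws.readyState !== WebSocket.OPEN) return;
    ws.send(JSON.stringify({
        type: "input_audio_buffer.append",
        audio: base64AudioChunk
    }));
}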

// Receive audio responses
ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === "response.audio.delta") {
        playAudio(data.delta);  // Stream to speakers
    }
};
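
The Quick Start passes a base64 audio chunk to input_audio_buffer.append but does not show how to produce one. The sketch below is one possible way to do it, assuming the API accepts 16-bit little-endian PCM (the page does not state the expected audio format, so check it before relying on this). The helper name floatToPcm16Base64 is hypothetical and not part of the API.

```javascript
// Convert Float32 samples in [-1, 1] (as produced by the Web Audio API)
// to base64-encoded 16-bit little-endian PCM.
// ASSUMPTION: the Realtime API accepts PCM16; the source does not say.
function floatToPcm16Base64(samples) {
    const view = new DataView(new ArrayBuffer(samples.length * 2));
    for (let i = 0; i < samples.length; i++) {
        // Clamp, then scale to the signed 16-bit range.
        const s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    }
    // Build a binary string so btoa can encode the raw bytes.
    const bytes = new Uint8Array(view.buffer);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
}
```

The returned string can be used as the audio field of an input_audio_buffer.append message.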

Available Voices

Nova - Warm, friendly
🎯 Alloy - Neutral, clear
🌊 Echo - Deep, resonant
Fable - Expressive, dynamic
🔮 Onyx - Rich, authoritative
🌸 Shimmer - Bright, cheerful

Event Types

session.created - Connection established, session ready
input_audio_buffer.speech_started - User started speaking (VAD detected)
input_audio_buffer.speech_stopped - User stopped speaking
response.created - AI response generation started
response.audio.delta - Audio chunk ready to play
response.text.delta - Transcript text chunk
response.done - Response complete
error - Error occurred during session
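
With this many event types, a single if statement in onmessage gets crowded quickly. The sketch below shows one way to route events by their type field. createEventRouter is a hypothetical helper, not part of the API, and it assumes every message is a JSON object with a type string, as in the Quick Start.

```javascript
// Build a function that parses a raw message and calls the handler
// registered for its event type. Returns true if a handler ran.
function createEventRouter(handlers) {
    return function route(rawMessage) {
        const event = JSON.parse(rawMessage);
        // Own-property check so a type like "constructor" is not
        // resolved through the prototype chain.
        if (Object.prototype.hasOwnProperty.call(handlers, event.type)) {
            handlers[event.type](event);
            return true;
        }
        return false;
    };
}

// Example wiring (handler bodies are placeholders):
const route = createEventRouter({
    "session.created": () => console.log("session ready"),
    "response.done": () => console.log("response complete"),
    "error": (event) => console.error("session error", event)
});
```

In the browser this would be attached as ws.onmessage = (event) => route(event.data).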

Best Practices

💡 Tips for Low Latency

Use a region close to your users.
Enable Voice Activity Detection (VAD) to detect automatically when users stop speaking.
Buffer audio in 20-100 ms chunks for optimal streaming.
Send a response.cancel event if the user interrupts, so the assistant stops generating audio.
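
Two of these tips can be sketched in code: splitting captured audio into fixed-duration chunks, and cancelling an in-flight response when the user starts speaking. Both helpers are hypothetical. The sample rate is whatever your capture pipeline uses, and the message shape { type: "response.cancel" } is an assumption modelled on the other client messages shown on this page.

```javascript
// Split captured samples into chunks of chunkMs milliseconds each.
// The final chunk may be shorter than the others.
function chunkAudio(samples, sampleRate, chunkMs) {
    const size = Math.round(sampleRate * chunkMs / 1000);
    if (size <= 0) throw new RangeError("chunkMs is too small for this sample rate");
    const chunks = [];
    for (let start = 0; start < samples.length; start += size) {
        chunks.push(samples.subarray(start, start + size));
    }
    return chunks;
}

// Track whether a response is in flight, and cancel it when the server
// reports that the user has started speaking.
// ASSUMPTION: cancellation is sent as { type: "response.cancel" }.
function createInterruptHandler(send) {
    let responseActive = false;
    return function onEvent(event) {
        if (event.type === "response.created") {
            responseActive = true;
        } else if (event.type === "response.done") {
            responseActive = false;
        } else if (event.type === "input_audio_buffer.speech_started" && responseActive) {
            send(JSON.stringify({ type: "response.cancel" }));
            responseActive = false;
        }
    };
}
```

Passing send as a parameter keeps the handler independent of the socket, for example createInterruptHandler((msg) => ws.send(msg)).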