Speech API

Convert speech to text with industry-leading accuracy, and generate lifelike speech from text. Build voice-enabled applications with ease.

"Hello, and welcome to MythicDot AI. Our speech API allows you to transcribe audio in real-time with incredible accuracy..."

Two Powerful APIs

🎤 Speech to Text

Transcribe audio files or real-time audio streams with state-of-the-art accuracy. Supports 50+ languages with automatic language detection.

Real-time streaming transcription
Speaker diarization
Punctuation and formatting
Word-level timestamps
Custom vocabulary
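
Speaker diarization labels each stretch of audio with the speaker who produced it. As a rough sketch of how that output could be turned into readable dialogue, the snippet below merges consecutive segments from the same speaker into a single turn. The `Segment` shape (`speaker`, `start`, `end`, `text`) is an assumption made for illustration; this page does not describe the actual response format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape; the real API response may differ.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Merge adjacent segments spoken by the same speaker into one turn."""
    turns: list[Segment] = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Extend the previous turn instead of starting a new one.
            turns[-1] = Segment(last.speaker, last.start, seg.end, f"{last.text} {seg.text}")
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


segments = [
    Segment("A", 0.0, 1.2, "Hello,"),
    Segment("A", 1.2, 2.5, "and welcome."),
    Segment("B", 2.6, 3.4, "Thanks!"),
]
for turn in merge_turns(segments):
    print(f"[{turn.start:.1f}s - {turn.end:.1f}s] Speaker {turn.speaker}: {turn.text}")
```

Merging by adjacency keeps the original order of the conversation, which usually reads better than grouping everything one speaker said.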

🔊 Text to Speech

Generate natural, lifelike speech from text. Choose from 6 voices with different styles and personalities.

6 distinct voices
Adjustable speed and pitch
SSML support
Multiple output formats (MP3, WAV, FLAC)
Real-time streaming
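
SSML (Speech Synthesis Markup Language) is a W3C XML format for controlling pauses, speaking rate, and similar properties. The sketch below shows one way to assemble a small SSML string safely in Python. The helper name `build_ssml` is hypothetical, and which SSML elements this API accepts is not stated on this page, so treat `<break>` and `<prosody>` here as assumptions to check against the documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences: list[str], pause_ms: int = 300, rate: str = "medium") -> str:
    """Join sentences with a pause between them and wrap them in <speak>."""
    # escape() protects characters such as & and < that would break the XML.
    escaped = [escape(sentence) for sentence in sentences]
    # quoteattr() returns the value already wrapped in quotes.
    pause = f"<break time={quoteattr(f'{pause_ms}ms')}/>"
    body = pause.join(escaped)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(build_ssml(["Tom & Jerry say hello.", "Welcome aboard."], pause_ms=500, rate="slow"))
```

Escaping the text before wrapping it matters because user-supplied input often contains `&` or `<`, which would otherwise produce invalid markup.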

Choose Your Voice

Six unique voices designed for different use cases.

👨 Nova: Warm, professional
👩 Alloy: Friendly, conversational
👨‍💼 Echo: Deep, authoritative
👩‍🔬 Fable: Expressive, storytelling
👨‍💻 Onyx: Clear, technical
👩‍🎤 Shimmer: Bright, energetic

50+ Languages Supported

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Polish, Swedish, Danish, Norwegian, Finnish, and more than 30 others

Quick Start

Python - Speech to Text

from mythicdot import MythicDot

client = MythicDot()

# Transcribe an audio file
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="mythic-whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

print(transcript.text)

# Access word-level timestamps
for word in transcript.words:
    print(f"{word.word} [{word.start}s - {word.end}s]")

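# Real-time streaming transcription
# `async for` is only valid inside a coroutine, so the loop is wrapped in one.
# microphone_stream is a placeholder for your own audio source.
import asyncio

async def stream_transcription(microphone_stream):
    async for chunk in client.audio.transcriptions.stream(
        model="mythic-whisper-1",
        audio_stream=microphone_stream
    ):
        print(chunk.text, end="", flush=True)

# Run it with: asyncio.run(stream_transcription(microphone_stream))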
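
Word-level timestamps are a natural fit for captions. As a minimal sketch, the snippet below groups words into fixed-size cues and renders them in the SRT subtitle format. The `Word` dataclass only mirrors the `word`, `start`, and `end` fields used in the loop above; it stands in for whatever object the SDK returns, and the helper names `srt_time` and `words_to_srt` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Stand-in for the SDK's word object; mirrors the fields used above.
    word: str
    start: float  # seconds
    end: float    # seconds


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(w.word for w in group)
        number = i // max_words + 1
        # Each cue spans from its first word's start to its last word's end.
        cues.append(f"{number}\n{srt_time(group[0].start)} --> {srt_time(group[-1].end)}\n{text}\n")
    return "\n".join(cues)


sample = [Word("Hello", 0.0, 0.4), Word("and", 0.5, 0.6), Word("welcome", 0.7, 1.2)]
print(words_to_srt(sample, max_words=2))
```

Fixed-size grouping is the simplest policy; splitting on punctuation or long pauses would give more natural captions at the cost of a little more logic.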
                

Simple Pricing

🎤 Speech to Text

$0.006 per minute of audio

All 50+ languages
Word timestamps
Speaker diarization
Custom vocabulary

🔊 Text to Speech

$0.015 per 1,000 characters

All 6 voices
SSML support
Streaming output
MP3, WAV, FLAC
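
To see how the two rates combine, here is a small estimator using the prices listed above. It is only a sketch: it assumes pro-rata billing with no rounding and ignores any free tier, neither of which this page specifies in detail. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

STT_RATE_PER_MINUTE = Decimal("0.006")    # USD, from the pricing above
TTS_RATE_PER_1K_CHARS = Decimal("0.015")  # USD, from the pricing above


def estimate_cost(audio_minutes: float, tts_characters: int) -> Decimal:
    """Estimate a bill in USD, assuming pro-rata billing and no free tier."""
    stt = STT_RATE_PER_MINUTE * Decimal(str(audio_minutes))
    tts = TTS_RATE_PER_1K_CHARS * Decimal(tts_characters) / Decimal(1000)
    return stt + tts


# 500 minutes of transcription plus 200,000 characters of synthesized speech
print(estimate_cost(500, 200_000))
```

`Decimal` is used instead of `float` so that small per-unit prices add up without binary rounding error.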

Give Your App a Voice

Start building voice-enabled experiences today. Free tier includes 60 minutes of transcription per month.
