Speech API

Convert speech to text with industry-leading accuracy, and generate lifelike speech from text. Build voice-enabled applications with ease.

"Hello, and welcome to MythicDot AI. Our speech API allows you to transcribe audio in real-time with incredible accuracy..."

Two Powerful APIs

🎤

Speech to Text

Transcribe audio files or real-time audio streams with state-of-the-art accuracy. Supports 50+ languages with automatic detection.

  • Real-time streaming transcription
  • Speaker diarization
  • Punctuation and formatting
  • Word-level timestamps
  • Custom vocabulary

🔊

Text to Speech

Generate natural, lifelike speech from text. Choose from 6 voices with different styles and personalities.

  • 6 distinct voices
  • Adjustable speed and pitch
  • SSML support
  • Multiple output formats
  • Real-time streaming
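SSML support is listed above, but this page does not say which SSML elements the service accepts. The sketch below assumes the standard W3C SSML 1.1 elements speak, prosody, and break are understood. The helper build_ssml is hypothetical and not part of the MythicDot SDK. It escapes user text so characters like & or < cannot break the markup, then applies speed and pitch settings.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium",
               pause_ms: int = 0) -> str:
    """Wrap plain text in a minimal SSML document.

    Uses standard W3C SSML elements (<speak>, <prosody>, <break>). Which
    elements and attribute values MythicDot accepts is an assumption.
    """
    body = escape(text)  # escape &, <, > so user text cannot break the XML
    if pause_ms > 0:
        body += f'<break time="{int(pause_ms)}ms"/>'  # trailing pause
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{body}</prosody>"
        "</speak>"
    )


print(build_ssml("Welcome to MythicDot & friends", rate="slow", pause_ms=300))
```

Building SSML with escaping helpers, rather than pasting raw text into a template, keeps arbitrary user input from producing malformed requests.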

Choose Your Voice

Six unique voices designed for different use cases.

  • 👨 Nova: Warm, professional
  • 👩 Alloy: Friendly, conversational
  • 👨‍💼 Echo: Deep, authoritative
  • 👩‍🔬 Fable: Expressive, storytelling
  • 👨‍💻 Onyx: Clear, technical
  • 👩‍🎤 Shimmer: Bright, energetic
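Because the voice set is fixed at six, a small client-side lookup can catch typos before any request is sent and help pick a voice by style. This is a sketch only: the names and styles come from this page, but passing the voice as a lowercase string to the text-to-speech endpoint is an assumption, and resolve_voice and voices_matching are hypothetical helpers.

```python
# Voice names and styles as listed on this page. Passing the name as a
# lowercase string to the TTS endpoint is an assumption, not documented here.
VOICES = {
    "nova": "Warm, professional",
    "alloy": "Friendly, conversational",
    "echo": "Deep, authoritative",
    "fable": "Expressive, storytelling",
    "onyx": "Clear, technical",
    "shimmer": "Bright, energetic",
}


def resolve_voice(name: str) -> str:
    """Normalize a voice name and fail fast on typos before any API call."""
    key = name.strip().lower()
    if key not in VOICES:
        raise ValueError(
            f"Unknown voice {name!r}; choose one of: {', '.join(sorted(VOICES))}"
        )
    return key


def voices_matching(keyword: str) -> list[str]:
    """Return voices whose style description mentions the keyword."""
    kw = keyword.lower()
    return [v for v, style in VOICES.items() if kw in style.lower()]


print(resolve_voice("  Nova "))       # nova
print(voices_matching("technical"))   # ['onyx']
```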

50+ Languages Supported

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Polish, Swedish, Danish, Norwegian, Finnish, and 30 more

Quick Start

Python - Speech to Text
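from mythicdot import MythicDot

client = MythicDot()

# Transcribe an audio file
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="mythic-whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )

print(transcript.text)

# Access word-level timestamps
for word in transcript.words:
    print(f"{word.word} [{word.start}s - {word.end}s]")

# Real-time streaming transcription.
# `async for` is only valid inside a coroutine, so the loop is wrapped in one.
# `microphone_stream` is a placeholder for your own async audio source.
async def transcribe_live(microphone_stream):
    async for chunk in client.audio.transcriptions.stream(
        model="mythic-whisper-1",
        audio_stream=microphone_stream,
    ):
        print(chunk.text, end="", flush=True)
    print()  # finish the line once the stream closes

# From synchronous code: asyncio.run(transcribe_live(your_audio_source))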
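Word-level timestamps are enough to generate subtitles. The sketch below groups words into SRT caption cues, starting a new cue after a fixed number of words or after a pause. The Word dataclass is a hypothetical stand-in that mirrors the word, start, and end fields used in the quick start; with the real SDK you would pass transcript.words directly, assuming those objects expose the same attributes. The max_words and max_gap thresholds are illustrative defaults, not API settings.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Stand-in mirroring the fields used above (word, start, end in seconds);
    # not the SDK's own type.
    word: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words: int = 7, max_gap: float = 1.0) -> str:
    """Group words into numbered SRT cues.

    A new cue starts when the current one reaches max_words or when the
    silence before the next word exceeds max_gap seconds.
    """
    cues, current = [], []
    for w in words:
        gap = w.start - current[-1].end if current else 0.0
        if current and (len(current) >= max_words or gap > max_gap):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.word for w in cue)
        span = f"{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)


sample = [Word("Hello,", 0.0, 0.4), Word("and", 0.45, 0.6), Word("welcome", 0.65, 1.1),
          Word("to", 2.5, 2.6), Word("MythicDot", 2.65, 3.2)]
print(words_to_srt(sample))
```

In the sample, the 1.4-second pause after "welcome" exceeds max_gap, so the output contains two cues.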

Simple Pricing

🎤 Speech to Text

$0.006 per minute of audio
  • All 50+ languages
  • Word timestamps
  • Speaker diarization
  • Custom vocabulary

🔊 Text to Speech

$0.015 per 1,000 characters
  • All 6 voices
  • SSML support
  • Streaming output
  • MP3, WAV, FLAC
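The two rates combine into a simple monthly estimate. This is a rough sketch: the rates are the ones listed above, but the page does not say how partial minutes or partial 1,000-character blocks are billed, so pro-rata billing is assumed. The free_minutes parameter lets you subtract an allowance such as the free tier's 60 transcription minutes, assuming it is simply deducted from billable time. estimate_cost is a hypothetical helper, not an SDK function.

```python
# Published rates from this page (USD).
STT_PER_MINUTE = 0.006
TTS_PER_1K_CHARS = 0.015


def estimate_cost(audio_minutes: float = 0.0, tts_characters: int = 0,
                  free_minutes: float = 0.0) -> float:
    """Rough monthly estimate in USD.

    Assumptions (not stated on this page): audio is billed pro rata rather
    than rounded up to whole minutes, characters are billed pro rata per
    1,000, and free_minutes is subtracted from billable transcription time.
    """
    billable_minutes = max(0.0, audio_minutes - free_minutes)
    stt = billable_minutes * STT_PER_MINUTE
    tts = tts_characters / 1000 * TTS_PER_1K_CHARS
    return round(stt + tts, 4)


# 10 hours of meetings plus a 50,000-character narration script:
# 600 min * $0.006 = $3.60 and 50 * $0.015 = $0.75, so $4.35 in total.
print(estimate_cost(audio_minutes=600, tts_characters=50_000))
```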

Give Your App a Voice

Start building voice-enabled experiences today. Free tier includes 60 minutes of transcription per month.
