One model that understands text, images, audio, and video. Build applications that see, hear, and reason across all modalities.
Natural language understanding and generation
Visual understanding and image generation
Speech recognition and synthesis
Video understanding and analysis
Unified Multi-Modal
from mythicdot import MythicDot

client = MythicDot()

# Analyze an image with text
response = client.chat.completions.create(
    model="mythic-4-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/photo.jpg"}
            }
        ]
    }]
)

print(response.choices[0].message.content)
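The user message above carries a list of typed content parts: a text part plus an image_url part. A minimal sketch of assembling that payload programmatically (the build_image_message helper is hypothetical, not part of the MythicDot SDK; only the part format comes from the example above):

```python
def build_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a user message mixing one text part with one image_url part per URL.

    Hypothetical helper -- mirrors the content format in the example above.
    """
    parts = [{"type": "text", "text": prompt}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": parts}

msg = build_image_message("What's in this image?",
                          ["https://example.com/photo.jpg"])
```

The resulting dict can be passed directly in the messages list of the chat completion call.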
Analyze product images, extract details, and generate descriptions automatically.
Transcribe and analyze customer calls, extract sentiment and action items.
Extract data from charts, graphs, and infographics in documents.
Analyze video content, generate summaries, and extract key moments.
Generate audio descriptions of images for visually impaired users.
Search products or content using images instead of text queries.
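Image-based search like this typically embeds the query image and every catalog item into a shared vector space, then ranks items by cosine similarity. A minimal sketch with hand-made toy vectors (real embeddings would come from a vision model; the three-dimensional vectors and product names here are illustrative only):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, catalog):
    # Rank catalog items by similarity to the query image's embedding.
    return sorted(catalog, key=lambda item: cosine(query_vec, item["vec"]),
                  reverse=True)

# Toy embeddings standing in for vision-model output.
catalog = [
    {"name": "red sneaker", "vec": [0.9, 0.1, 0.0]},
    {"name": "blue boot",   "vec": [0.1, 0.9, 0.2]},
    {"name": "red sandal",  "vec": [0.6, 0.4, 0.2]},
]
query = [0.85, 0.15, 0.05]  # embedding of the user's photo (illustrative)
results = search(query, catalog)  # most similar item first
```

The same ranking step works unchanged whether the vectors come from images, text, or audio, which is what makes cross-modal search possible with a unified model.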
Explore our APIs for each modality.