Vision API

Analyze and understand images with state-of-the-art multimodal AI. Extract text, detect objects, describe scenes, and answer questions about visual content.


🏙️

city skyline skyscrapers sunset

A modern city skyline at sunset with tall glass skyscrapers reflecting the orange and pink hues of the evening sky.

What Can Vision Do?

📝

Text Extraction (OCR)

Extract text from images, documents, receipts, signs, and handwritten notes with high accuracy.

🎯

Object Detection

Identify and locate objects within images. Get bounding boxes and confidence scores.

💬

Image Q&A

Ask natural language questions about images and get detailed, contextual answers.

🏷️

Classification

Sort images into custom categories for content moderation, product tagging, and more.

📊

Chart Understanding

Extract data and insights from charts, graphs, and data visualizations.

📄

Document Analysis

Process invoices, forms, IDs, and structured documents. Extract key fields automatically.

Industry Applications

🛒

E-Commerce

Product auto-tagging & visual search

🏥

Healthcare

Medical image analysis

🏭

Manufacturing

Quality control inspection

🏦

Finance

Document processing & ID verification

🚗

Automotive

Damage assessment

🏠

Real Estate

Property image analysis

🛡️

Security

Content moderation

♿

Accessibility

Alt-text generation

Simple Integration

Python

from mythicdot import MythicDot

client = MythicDot()

# Analyze an image from URL
response = client.chat.completions.create(
    model="mythic-4-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

# Or send a local file as a base64-encoded data URL
import base64

with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="mythic-4-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
        ]
    }]
)

print(response.choices[0].message.content)

Key Features

✓

Multiple Image Support

Analyze up to 10 images in a single request

✓

High Resolution Mode

Process images up to 4096x4096 pixels

✓

URL or Base64

Send images by URL or as base64-encoded data

✓

All Image Formats

PNG, JPEG, GIF, WebP, and more

✓

Streaming Support

Stream responses for real-time UX

✓

JSON Mode

Structured output for easy parsing
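
The multi-image and URL-or-base64 features can be combined in one small helper. The sketch below is illustrative, not part of the SDK: `to_image_part` and `build_content` are hypothetical names, the content-part shape is copied from the integration sample above, and the 10-image cap comes from the feature list. It assumes local files should be sent as data URLs with a MIME type guessed from the file extension.

```python
import base64
import mimetypes

MAX_IMAGES_PER_REQUEST = 10  # limit stated in the feature list


def to_image_part(source: str) -> dict:
    """Return an image_url content part for a URL or a local file path."""
    if source.startswith(("http://", "https://", "data:")):
        url = source  # already a URL or data URL, pass through unchanged
    else:
        # Assumption: guess the MIME type from the extension, default to JPEG.
        mime, _ = mimetypes.guess_type(source)
        with open(source, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        url = f"data:{mime or 'image/jpeg'};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}


def build_content(prompt: str, sources: list[str]) -> list[dict]:
    """Build a message content list: one text part, then one part per image."""
    if not sources:
        raise ValueError("at least one image is required")
    if len(sources) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"{len(sources)} images given, limit is {MAX_IMAGES_PER_REQUEST}"
        )
    return [{"type": "text", "text": prompt}] + [to_image_part(s) for s in sources]
```

The returned list would be passed as the `content` of a user message, for example `messages=[{"role": "user", "content": build_content("Compare these photos", ["a.jpg", "https://example.com/b.png"])}]`.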

Simple Pricing

Resolution	Cost per Image	Best For
Low (512px)	$0.00065	Quick classification
Standard (1024px)	$0.00255	General analysis
High (2048px)	$0.00765	Detailed extraction
Ultra (4096px)	$0.01530	Document OCR
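
To budget a workload, the table can be turned into a small estimator. This is a rough sketch under two assumptions the page does not confirm: that each tier's pixel figure refers to the image's longest side, and that you would pick the smallest tier that covers the image. `pick_tier` and `estimate_cost` are hypothetical helpers; the prices are copied from the table above.

```python
from decimal import Decimal

# (tier name, pixel size, cost per image in USD), copied from the pricing table.
TIERS = [
    ("low", 512, Decimal("0.00065")),
    ("standard", 1024, Decimal("0.00255")),
    ("high", 2048, Decimal("0.00765")),
    ("ultra", 4096, Decimal("0.01530")),
]


def pick_tier(width: int, height: int) -> str:
    """Return the smallest tier whose size covers the image's longest side."""
    longest = max(width, height)
    for name, max_side, _ in TIERS:
        if longest <= max_side:
            return name
    raise ValueError("image exceeds the 4096x4096 maximum")


def estimate_cost(counts: dict[str, int]) -> Decimal:
    """Total cost in USD for a mapping of tier name to number of images."""
    prices = {name: cost for name, _, cost in TIERS}
    return sum((prices[tier] * n for tier, n in counts.items()), Decimal("0"))
```

For example, `estimate_cost({"standard": 10_000})` gives 25.50 USD, and an 800x600 image maps to the standard tier under the longest-side assumption. `Decimal` is used instead of floats so that sub-cent prices add up exactly.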