Rate Limits

Understand usage tiers, request limits, and how to optimize your API usage for the best performance.

Key Terms

  • TPM: Tokens Per Minute
  • RPM: Requests Per Minute
  • RPD: Requests Per Day

Usage Tiers

Rate limits are determined by your usage tier, which is based on your spending history and account age. You can view your current tier in the dashboard.

Tier    | Qualification        | Usage Limit     | RPM    | TPM
Free    | New accounts         | $100 / month    | 20     | 40,000
Tier 1  | $5 paid              | $100 / month    | 500    | 200,000
Tier 2  | $50 paid + 7 days    | $500 / month    | 2,000  | 1,000,000
Tier 3  | $100 paid + 14 days  | $2,000 / month  | 5,000  | 5,000,000
Tier 4  | $500 paid + 30 days  | $50,000 / month | 10,000 | 30,000,000
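As a worked example of how the two Tier 1 limits interact (assuming an average request size of 1,000 tokens, which is an illustrative figure, not one specified above), the token budget can be the binding constraint well before the request budget:

```python
# Estimate which Tier 1 limit binds first, assuming an average
# request size of 1,000 tokens (prompt + completion combined).
TIER1_RPM = 500
TIER1_TPM = 200_000
avg_tokens_per_request = 1_000

# Requests per minute the token budget allows at this request size
token_bound_rpm = TIER1_TPM // avg_tokens_per_request

effective_rpm = min(TIER1_RPM, token_bound_rpm)
print(effective_rpm)  # 200: TPM, not RPM, is the bottleneck here
```

At larger request sizes the effective request rate drops further, so plan capacity around tokens, not just request counts.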

💡 Need Higher Limits?

Enterprise customers can request custom rate limits. Contact our sales team to discuss your requirements.

Model-Specific Limits

Different models have different rate limits based on their compute requirements.

🚀 Mythic-4

  • Max context: 128K tokens
  • Max output: 16K tokens
  • RPM (Tier 1): 500
  • TPM (Tier 1): 200K

⚡ Mythic-4-mini

  • Max context: 128K tokens
  • Max output: 16K tokens
  • RPM (Tier 1): 2,000
  • TPM (Tier 1): 500K

🔮 Mythic-4-vision

  • Max context: 128K tokens
  • Max images: 10 per request
  • RPM (Tier 1): 300
  • TPM (Tier 1): 150K

📊 Mythic-embed-3

  • Max input: 8,191 tokens
  • Batch size: 2,048 inputs
  • RPM (Tier 1): 3,000
  • TPM (Tier 1): 1M

Rate Limit Headers

Every API response includes headers to help you track your usage:

  • x-ratelimit-limit-requests: Maximum requests allowed in the current window
  • x-ratelimit-limit-tokens: Maximum tokens allowed in the current window
  • x-ratelimit-remaining-requests: Requests remaining in the current window
  • x-ratelimit-remaining-tokens: Tokens remaining in the current window
  • x-ratelimit-reset-requests: Time until the request limit resets (seconds)
  • x-ratelimit-reset-tokens: Time until the token limit resets (seconds)
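A small helper for reading these headers makes it easy to throttle proactively instead of waiting for a 429. The header names are the ones listed above; how you access raw response headers depends on your HTTP client, so the helper below just takes a plain dict, and the sample values are illustrative:

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit state from a response's headers."""
    return {
        "requests_remaining": int(headers["x-ratelimit-remaining-requests"]),
        "tokens_remaining": int(headers["x-ratelimit-remaining-tokens"]),
        "requests_reset_s": float(headers["x-ratelimit-reset-requests"]),
        "tokens_reset_s": float(headers["x-ratelimit-reset-tokens"]),
    }

# Illustrative values as they might appear on a Tier 1 response
sample = {
    "x-ratelimit-remaining-requests": "499",
    "x-ratelimit-remaining-tokens": "198500",
    "x-ratelimit-reset-requests": "0.12",
    "x-ratelimit-reset-tokens": "6.5",
}

state = parse_rate_limit_headers(sample)
if state["requests_remaining"] < 10:
    # Close to the request limit: pause until the window resets
    print(f"Backing off for {state['requests_reset_s']}s")
```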

Handling Rate Limits

When you exceed a rate limit, the API returns a 429 (Too Many Requests) status code. Implement exponential backoff:

Python - Retry with Backoff
import time
from mythicdot import MythicDot, RateLimitError

client = MythicDot()

def make_request_with_retry(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="mythic-4",
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            
            # Exponential backoff: 1s, 2s, 4s, 8s, 16s
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

# Or use the built-in retry (recommended)
client = MythicDot(max_retries=5)

Best Practices

⚡ Use Batching

Group multiple requests together using batch endpoints to reduce RPM usage and save 50% on costs.
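The client-side half of batching is splitting your inputs to fit the per-request limit; the batch endpoint's exact request shape is not shown here. A minimal sketch, using Mythic-embed-3's 2,048-inputs-per-request limit from the table above:

```python
def chunk(items: list, size: int) -> list:
    """Split a list of inputs into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 5,000 embedding inputs fit Mythic-embed-3's 2,048-input limit
# in three requests instead of 5,000 individual calls.
batches = chunk([f"doc-{i}" for i in range(5000)], 2048)
print(len(batches))  # 3
```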

📉 Implement Backoff

Use exponential backoff with jitter to handle 429 errors gracefully without hammering the API.
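The earlier retry loop can be extended with "full jitter": instead of every client sleeping exactly 1s, 2s, 4s, ..., each picks a random delay inside that exponential envelope, so clients that got rate limited at the same moment don't all retry at the same moment. A minimal sketch:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random delay between 0 and the capped
    exponential bound base * 2**attempt."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Each attempt's delay falls inside the envelope: 0-1s, 0-2s, 0-4s, ...
delays = [backoff_with_jitter(a) for a in range(5)]
```

In the retry example above, this would replace `time.sleep(wait_time)` with `time.sleep(backoff_with_jitter(attempt))`.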

🔄 Cache Responses

Cache responses for identical requests to avoid unnecessary API calls and reduce token usage.
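A minimal in-memory sketch of this idea, keyed on a hash of the request payload. The `cached_completion` helper and its use of `resp.choices[0].message.content` mirror the SDK shape from the retry example but are assumptions, not documented API:

```python
import hashlib
import json

_cache: dict = {}  # request-key -> response text

def cache_key(model: str, messages: list) -> str:
    """Deterministic key for an identical (model, messages) request."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(client, model: str, messages: list) -> str:
    """Return a cached answer for a repeated request; call the API otherwise."""
    key = cache_key(model, messages)
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```

For production use you would bound the cache's size and expire stale entries, but the principle is the same: an identical request should not cost a second API call.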

📊 Monitor Usage

Track rate limit headers and monitor your dashboard to stay within limits and plan scaling.