Rate Limits

Understand usage tiers, request limits, and how to optimize your API usage for the best performance.

Key Terms

  • TPM: Tokens Per Minute
  • RPM: Requests Per Minute
  • RPD: Requests Per Day

Usage Tiers

Rate limits are determined by your usage tier, which is based on your spending history and account age. You can view your current tier in the dashboard.

Tier    | Qualification        | Usage Limit     | RPM    | TPM
Free    | New accounts         | $100 / month    | 20     | 40,000
Tier 1  | $5 paid              | $100 / month    | 500    | 200,000
Tier 2  | $50 paid + 7 days    | $500 / month    | 2,000  | 1,000,000
Tier 3  | $100 paid + 14 days  | $2,000 / month  | 5,000  | 5,000,000
Tier 4  | $500 paid + 30 days  | $50,000 / month | 10,000 | 30,000,000
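As a worked example of how the two Tier 1 limits interact (assuming an average request size of 1,000 tokens, which is an illustrative figure, not one specified above), the token budget can be the binding constraint well before the request budget:

```python
# Estimate which Tier 1 limit binds first, assuming an average
# request size of 1,000 tokens (prompt + completion combined).
TIER1_RPM = 500
TIER1_TPM = 200_000
avg_tokens_per_request = 1_000

# Requests per minute the token budget allows at this request size
token_bound_rpm = TIER1_TPM // avg_tokens_per_request

effective_rpm = min(TIER1_RPM, token_bound_rpm)
print(effective_rpm)  # 200: TPM, not RPM, is the bottleneck here
```

At larger request sizes the effective request rate drops further, so plan capacity around tokens, not just request counts.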

💡 Need Higher Limits?

Enterprise customers can request custom rate limits. Contact our sales team to discuss your requirements.

Model-Specific Limits

Different models have different rate limits based on their compute requirements.

🚀 Mythic-4

  • Max context: 128K tokens
  • Max output: 16K tokens
  • RPM (Tier 1): 500
  • TPM (Tier 1): 200K

⚡ Mythic-4-mini

  • Max context: 128K tokens
  • Max output: 16K tokens
  • RPM (Tier 1): 2,000
  • TPM (Tier 1): 500K

🔮 Mythic-4-vision

  • Max context: 128K tokens
  • Max images: 10 per request
  • RPM (Tier 1): 300
  • TPM (Tier 1): 150K

📊 Mythic-embed-3

  • Max input: 8,191 tokens
  • Batch size: 2,048 inputs
  • RPM (Tier 1): 3,000
  • TPM (Tier 1): 1M

Rate Limit Headers

Every API response includes headers to help you track your usage:

  • x-ratelimit-limit-requests: Maximum requests allowed in the current window
  • x-ratelimit-limit-tokens: Maximum tokens allowed in the current window
  • x-ratelimit-remaining-requests: Requests remaining in the current window
  • x-ratelimit-remaining-tokens: Tokens remaining in the current window
  • x-ratelimit-reset-requests: Time until the request limit resets (seconds)
  • x-ratelimit-reset-tokens: Time until the token limit resets (seconds)
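A small helper for reading these headers makes it easy to throttle proactively instead of waiting for a 429. The header names are the ones listed above; how you access raw response headers depends on your HTTP client, so the helper below just takes a plain dict, and the sample values are illustrative:

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit state from a response's headers."""
    return {
        "requests_remaining": int(headers["x-ratelimit-remaining-requests"]),
        "tokens_remaining": int(headers["x-ratelimit-remaining-tokens"]),
        "requests_reset_s": float(headers["x-ratelimit-reset-requests"]),
        "tokens_reset_s": float(headers["x-ratelimit-reset-tokens"]),
    }

# Illustrative values as they might appear on a Tier 1 response
sample = {
    "x-ratelimit-remaining-requests": "499",
    "x-ratelimit-remaining-tokens": "198500",
    "x-ratelimit-reset-requests": "0.12",
    "x-ratelimit-reset-tokens": "6.5",
}

state = parse_rate_limit_headers(sample)
if state["requests_remaining"] < 10:
    # Close to the request limit: pause until the window resets
    print(f"Backing off for {state['requests_reset_s']}s")
```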

Handling Rate Limits

When you exceed a rate limit, the API returns a 429 (Too Many Requests) status code. Implement exponential backoff:

Python - Retry with Backoff
import time
from mythicdot import MythicDot, RateLimitError

client = MythicDot()

def make_request_with_retry(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="mythic-4",
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            
            # Exponential backoff: 1s, 2s, 4s, 8s, 16s
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

# Or use the built-in retry (recommended)
client = MythicDot(max_retries=5)

Best Practices

⚡ Use Batching

Group multiple requests together using batch endpoints to reduce RPM usage and save 50% on costs.
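The client-side half of batching is splitting your inputs to fit the per-request limit; the batch endpoint's exact request shape is not shown here. A minimal sketch, using Mythic-embed-3's 2,048-inputs-per-request limit from the table above:

```python
def chunk(items: list, size: int) -> list:
    """Split a list of inputs into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 5,000 embedding inputs fit Mythic-embed-3's 2,048-input limit
# in three requests instead of 5,000 individual calls.
batches = chunk([f"doc-{i}" for i in range(5000)], 2048)
print(len(batches))  # 3
```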

📉 Implement Backoff

Use exponential backoff with jitter to handle 429 errors gracefully without hammering the API.
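The earlier retry loop can be extended with "full jitter": instead of every client sleeping exactly 1s, 2s, 4s, ..., each picks a random delay inside that exponential envelope, so clients that got rate limited at the same moment don't all retry at the same moment. A minimal sketch:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random delay between 0 and the capped
    exponential bound base * 2**attempt."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Each attempt's delay falls inside the envelope: 0-1s, 0-2s, 0-4s, ...
delays = [backoff_with_jitter(a) for a in range(5)]
```

In the retry example above, this would replace `time.sleep(wait_time)` with `time.sleep(backoff_with_jitter(attempt))`.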

🔄 Cache Responses

Cache responses for identical requests to avoid unnecessary API calls and reduce token usage.
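A minimal in-memory sketch of this idea, keyed on a hash of the request payload. The `cached_completion` helper and its use of `resp.choices[0].message.content` mirror the SDK shape from the retry example but are assumptions, not documented API:

```python
import hashlib
import json

_cache: dict = {}  # request-key -> response text

def cache_key(model: str, messages: list) -> str:
    """Deterministic key for an identical (model, messages) request."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(client, model: str, messages: list) -> str:
    """Return a cached answer for a repeated request; call the API otherwise."""
    key = cache_key(model, messages)
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```

For production use you would bound the cache's size and expire stale entries, but the principle is the same: an identical request should not cost a second API call.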

📊 Monitor Usage

Track rate limit headers and monitor your dashboard to stay within limits and plan scaling.