Best Practices

Expert recommendations for building reliable, secure, and efficient AI applications in production.

🔒 Security

🔑 Secure API Key Management

Essential
  • Never embed API keys in client-side code or version control
  • Use environment variables or secure vaults (AWS Secrets Manager, Azure Key Vault)
  • Rotate keys periodically and immediately after any potential exposure
  • Use separate keys for development, staging, and production
import os

# Good: load the key from the environment
api_key = os.environ.get("MYTHIC_API_KEY")

# Bad: hardcoded key
api_key = "sk-abc123..."  # Never do this!
🛡️ Input Validation & Sanitization

Essential
  • Validate and sanitize all user inputs before sending to the API
  • Set maximum input length limits to prevent abuse
  • Use content moderation for user-generated prompts
  • Implement rate limiting on your application layer

⚠️ Prompt Injection Warning

Never directly concatenate user input into system prompts. Always validate and sanitize inputs, and consider using structured formats like JSON for user data.
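As a sketch of the points above — length limits, sanitization, and structured user data — the limit and field name here are illustrative, not part of the API:

```python
import json

MAX_INPUT_CHARS = 4000  # illustrative cap; tune for your application

def prepare_user_input(raw: str) -> str:
    """Validate user input and wrap it in JSON so it cannot be
    mistaken for instructions when placed near the system prompt."""
    if not raw or not raw.strip():
        raise ValueError("Empty input")
    if len(raw) > MAX_INPUT_CHARS:
        raise ValueError(f"Input exceeds {MAX_INPUT_CHARS} characters")
    # Drop control characters that are sometimes used to smuggle formatting
    cleaned = "".join(ch for ch in raw if ch.isprintable() or ch in "\n\t")
    # Pass user data as a JSON field, never concatenated into the prompt text
    return json.dumps({"user_input": cleaned})
```

Content moderation and application-layer rate limiting would sit in front of this check in a full pipeline.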

⚡ Reliability

🔄 Implement Retry Logic

Essential

Handle transient errors gracefully with exponential backoff:

import time
import random

def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            wait = (2 ** attempt) + random.random()  # exponential backoff + jitter
            time.sleep(wait)
📊 Monitor & Log Everything

Recommended
  • Log all API requests with request IDs for debugging
  • Track latency, error rates, and token usage
  • Set up alerts for anomalies (latency spikes, error rate increases)
  • Use structured logging for easier analysis
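One way to combine these points — request IDs, latency, token counts, and structured output — is a single JSON log line per call; the field names here are illustrative:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("mythic-client")

def log_api_call(model: str, latency_ms: float, tokens: int, status: str) -> str:
    """Emit one structured log line per API call; returns the request ID
    so it can be attached to downstream records for debugging."""
    request_id = str(uuid.uuid4())
    logger.info(json.dumps({
        "request_id": request_id,
        "model": model,
        "latency_ms": round(latency_ms, 1),
        "tokens": tokens,
        "status": status,
        "ts": time.time(),
    }))
    return request_id
```

Because each line is valid JSON, log aggregators can filter and alert on `latency_ms` or `status` without custom parsing.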

🚀 Performance

🌊 Use Streaming for Long Responses

Recommended

Streaming reduces time-to-first-token and improves perceived performance:

  • Enable streaming for user-facing applications
  • Process tokens as they arrive for real-time display
  • Handle stream interruptions gracefully
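The SDK's actual streaming interface may differ; this sketch shows only the consumption pattern — process tokens as they arrive and keep partial output if the stream breaks — using a plain generator in place of a real stream:

```python
def consume_stream(stream, on_token, partial=""):
    """Append tokens as they arrive; return the text accumulated so far
    even if the stream is interrupted, so the caller can display or resume."""
    try:
        for token in stream:
            partial += token
            on_token(token)   # e.g. flush each token to the UI immediately
    except ConnectionError:
        pass                  # keep what we have; caller decides whether to retry
    return partial

# Usage with a stand-in stream:
chunks = iter(["Hel", "lo, ", "world"])
text = consume_stream(chunks, on_token=lambda t: None)
```

Accumulating into `partial` means an interrupted response is never lost outright; the caller can show it and offer a retry.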

Batch Requests When Possible

Advanced
  • Use the Batch API for non-time-sensitive workloads
  • Group related embeddings requests
  • Process multiple items in parallel (respecting rate limits)
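Parallel processing with a concurrency cap can be sketched with the standard library; `MAX_IN_FLIGHT` is an illustrative value to align with your rate limits, and `call` stands in for whatever API wrapper you use:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # illustrative cap; match your account's rate limits

def process_batch(items, call):
    """Run `call` over items in parallel, never exceeding MAX_IN_FLIGHT
    concurrent requests; results come back in input order."""
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        return list(pool.map(call, items))
```

For truly non-urgent work, the Batch API avoids holding connections open at all; this pattern suits interactive workloads that still need throughput.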

💰 Cost Optimization

📦 Use Context Caching

Recommended
  • Cache system prompts and common context (75% token savings)
  • Structure prompts with static content at the beginning
  • Use appropriate TTL for your use case
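Cache hits depend on a byte-for-byte stable prefix, so the structuring advice above amounts to: static content first, variable content last. A minimal sketch, assuming a chat-style messages format (the role/content field names are an assumption, not confirmed Mythic API shapes):

```python
SYSTEM_PROMPT = "You are a support assistant for Acme Co."  # static, cacheable
POLICY_DOC = "Refund policy: items may be returned within 30 days."  # static, cacheable

def build_messages(user_question: str) -> list[dict]:
    """Keep the static system prompt and reference docs first so the
    provider can reuse a cached prefix; only the last message varies."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + POLICY_DOC},
        {"role": "user", "content": user_question},
    ]
```

Anything that changes per request (timestamps, user names) belongs after the cached prefix, or it will invalidate the cache on every call.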
🎯 Choose the Right Model

Essential
  • Use mythic-4-mini for simple tasks (10x cheaper)
  • Reserve mythic-4 for complex reasoning
  • Use mythic-embed-3-small when full precision isn't needed
  • Consider fine-tuning for specialized, high-volume use cases
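These selection rules can live in one routing function so model choice is centralized and auditable; the heuristic below is purely illustrative:

```python
def pick_model(kind: str, complex_reasoning: bool = False) -> str:
    """Illustrative routing: default to the cheaper model and escalate
    only when a task genuinely needs deeper reasoning."""
    if kind == "embedding":
        return "mythic-embed-3-small"   # when full precision isn't needed
    return "mythic-4" if complex_reasoning else "mythic-4-mini"
```

Centralizing the choice also makes it easy to log which model served each request and compare cost against quality.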

📝 Prompt Engineering

✍️ Write Clear, Specific Prompts

Essential
  • Be explicit about the desired output format
  • Provide examples (few-shot prompting) for complex tasks
  • Break complex tasks into steps
  • Use JSON mode for structured outputs
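The first three points combine naturally in how the messages are assembled: an explicit format instruction, a few-shot example, then the new input. A sketch, assuming a chat-style messages format (the classification task and field names are illustrative):

```python
FEW_SHOT = [  # one worked example showing the exact output format we want
    {"role": "user", "content": "Sentiment: 'Great product!'"},
    {"role": "assistant", "content": '{"sentiment": "positive"}'},
]

def sentiment_messages(text: str) -> list[dict]:
    """Explicit format instruction + few-shot example + the new input."""
    system = ('Classify sentiment. Reply only with JSON: '
              '{"sentiment": "positive" | "negative" | "neutral"}')
    return [{"role": "system", "content": system},
            *FEW_SHOT,
            {"role": "user", "content": f"Sentiment: {text!r}"}]
```

Pairing a prompt like this with JSON mode (where available) makes the output directly parseable instead of best-effort.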

🚀 Production Readiness Checklist

☐ API keys stored securely (not in code)
☐ Retry logic with exponential backoff
☐ Error handling for all API responses
☐ Logging and monitoring configured
☐ Rate limiting on application layer
☐ Input validation and sanitization
☐ Content moderation enabled
☐ Streaming implemented for UX