Understand how text is converted to tokens, manage context windows, and optimize for cost efficiency.
Tokens are the basic units that our models process. They can be whole words, parts of words, or individual characters. On average, 1 token ≈ 4 characters of English text, or about 0.75 words.
The context window is the maximum number of tokens a model can process in a single request, including both input and output.
| Model | Context Window | Max Output |
|---|---|---|
| mythic-4 | 128,000 tokens | 16,384 tokens |
| mythic-4-mini | 128,000 tokens | 16,384 tokens |
| mythic-3.5-turbo | 16,000 tokens | 4,096 tokens |
| mythic-embed | 8,192 tokens | N/A |
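Because input and output share the context window, the room left for a response is capped by both the model's max output and whatever the prompt hasn't already used. A minimal sketch, using the mythic-4 limits from the table above (the helper name `available_output_tokens` is illustrative, not part of any API):

```python
CONTEXT_WINDOW = 128_000  # mythic-4 context window (from the table above)
MAX_OUTPUT = 16_384       # mythic-4 max output (from the table above)

def available_output_tokens(prompt_tokens,
                            context_window=CONTEXT_WINDOW,
                            max_output=MAX_OUTPUT):
    """Output is limited by both the model's max output and the
    space remaining in the context window after the prompt."""
    return max(0, min(max_output, context_window - prompt_tokens))

print(available_output_tokens(120_000))  # 8000 — the window, not max output, is the limit
print(available_output_tokens(50_000))   # 16384 — capped by max output
```

Note that with a 120,000-token prompt, only 8,000 tokens of output fit even though the model can emit up to 16,384.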
```python
import tiktoken

# Load the encoding for our models
encoding = tiktoken.get_encoding("cl100k_base")

# Count tokens in a string
text = "Hello, how are you doing today?"
tokens = encoding.encode(text)
print(f"Token count: {len(tokens)}")  # Output: 8

# Decode tokens back to text
decoded = encoding.decode(tokens)
print(decoded)  # "Hello, how are you doing today?"

# Count tokens for chat messages
def count_chat_tokens(messages):
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # overhead per message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    num_tokens += 2  # priming
    return num_tokens
```
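When a tokenizer library isn't available, the rule of thumb above (≈4 characters per token in English) gives a quick estimate. A minimal sketch, with a hypothetical `estimate_tokens` helper not tied to any API:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters-per-token heuristic.
    Use a real tokenizer (e.g. tiktoken) when accuracy matters."""
    return max(1, len(text) // 4)

print(estimate_tokens("Hello, how are you doing today?"))  # 7 (actual count: 8)
```

The heuristic undercounts here by one token; treat it as a budgeting estimate, not an exact count.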
When your content exceeds the context window, consider these strategies:
- **Chunking:** Split long documents into overlapping chunks, process each chunk separately, then combine the results.
- **Summarization:** Summarize earlier content and include the summary instead of the full text.
- **Retrieval:** Use embeddings to retrieve only the most relevant chunks for each query.
- **External memory:** Store key facts outside the conversation and inject them as needed for each request.
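The chunking strategy above can be sketched in a few lines. This is a minimal illustration, assuming the input is already tokenized to a list; the `chunk_size` and `overlap` values are arbitrary examples:

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=100):
    """Split a token list into chunks of up to chunk_size tokens,
    with each chunk repeating the last `overlap` tokens of the
    previous one so context isn't lost at the boundaries."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk already covers the tail
    return chunks

chunks = chunk_tokens(list(range(2500)))
print(len(chunks))  # 3 chunks: tokens 0-999, 900-1899, 1800-2499
```

Each chunk can then be processed in its own request, and the per-chunk results merged (for example, by concatenating or summarizing them).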