Tokenization

Understand how text is converted to tokens, manage context windows, and optimize for cost efficiency.

What Are Tokens?

Tokens are the basic units that our models process. They can be words, parts of words, or characters. On average, 1 token β‰ˆ 4 characters in English, or about 0.75 words.

πŸ“ ~4 characters per token (average)
πŸ“– ~750 words per 1,000 tokens
πŸ“„ ~2 pages per 1,000 tokens
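
These averages give a quick way to estimate token counts before reaching for a real tokenizer. A minimal sketch of the ~4-characters-per-token rule of thumb (`estimate_tokens` is an illustrative helper, not part of any SDK):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token average for English."""
    return max(1, round(len(text) / 4))

# 31 characters / 4 β‰ˆ 8 tokens, which matches the exact count shown later
print(estimate_tokens("Hello, how are you doing today?"))  # 8
```

For billing or hard limits, count exactly with a tokenizer as shown in Counting Tokens below; the estimate drifts for code, numbers, and non-English text.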

Tokenizer Example

Hello | , | Myth | ic | Dot | . | AI | ! | How | are | you | today | ?

13 tokens Β· 38 characters Β· 0.34 tokens per character

Note how common words map to single tokens, while the unfamiliar name "MythicDot.AI" is split into sub-word pieces.

Context Window Limits

The context window is the maximum number of tokens a model can process in a single request, including both input and output.

Model             Context Window    Max Output
mythic-4          128,000 tokens    16,384 tokens
mythic-4-mini     128,000 tokens    16,384 tokens
mythic-3.5-turbo  16,000 tokens     4,096 tokens
mythic-embed      8,192 tokens      N/A
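
Because input and output share the same window, it helps to check the budget before sending a request. A minimal sketch (the limits mirror the table above; `fits_in_context` is an illustrative helper, not part of any SDK):

```python
# Context window limits from the table above
CONTEXT_LIMITS = {
    "mythic-4": 128_000,
    "mythic-4-mini": 128_000,
    "mythic-3.5-turbo": 16_000,
}

def fits_in_context(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """Input plus requested output must fit inside the model's context window."""
    return input_tokens + max_output_tokens <= CONTEXT_LIMITS[model]

print(fits_in_context("mythic-3.5-turbo", 14_000, 4_096))  # False: 18,096 > 16,000
print(fits_in_context("mythic-4", 100_000, 16_384))        # True: 116,384 <= 128,000
```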

Token Examples

πŸ”€ Common Words

"the" "and" "for" "are"
1 token each

πŸ”’ Numbers

"2024" β†’ 1 token
"123456789" β†’ 3 tokens
Varies by length

🌐 Non-English

"こんにけは" ("hello") β†’ 5 tokens
"ΠŸΡ€ΠΈΠ²Π΅Ρ‚" ("hi") β†’ 6 tokens
~3x more than English

πŸ’» Code

def hello():
    return "hi"

9 tokens

Counting Tokens

Python - tiktoken
import tiktoken

# Load the encoding for our models
encoding = tiktoken.get_encoding("cl100k_base")

# Count tokens in a string
text = "Hello, how are you doing today?"
tokens = encoding.encode(text)
print(f"Token count: {len(tokens)}")  # Output: 8

# Decode tokens back to text
decoded = encoding.decode(tokens)
print(decoded)  # "Hello, how are you doing today?"

# Approximate token count for chat messages
# (the 4-token per-message overhead and 2-token priming are
# common estimates; exact values vary by model and chat format)
def count_chat_tokens(messages):
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # fixed overhead per message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    num_tokens += 2  # priming
    return num_tokens

Optimization Tips

πŸ’‘ Reduce Token Usage

  • Be concise: Remove unnecessary words and filler text
  • Use abbreviations: When context is clear, use shorter forms
  • Summarize context: Instead of full documents, include summaries
  • Truncate history: Keep only recent conversation turns
  • Use caching: Leverage context caching for repeated prefixes

❌ Inefficient

Long system prompt      2,000 tokens
Full chat history       5,000 tokens
Verbose user query        500 tokens
Total                   7,500 tokens

βœ… Optimized

Concise system prompt     500 tokens
Last 5 turns only       1,500 tokens
Clear user query          100 tokens
Total                   2,100 tokens
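
The "last 5 turns only" trimming above can be sketched as a function that keeps the system prompt and drops older turns (the role/content message shape is the common chat convention; adjust field names to your SDK):

```python
def truncate_history(messages, keep_turns=5):
    """Keep the system prompt (if any) plus the most recent `keep_turns` messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_turns:]

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"question {i}"} for i in range(10)
]
trimmed = truncate_history(history)
print(len(trimmed))  # 6: system prompt + last 5 turns
```

A token-aware variant would pop old turns until `count_chat_tokens(messages)` fits the budget, rather than keeping a fixed number of turns.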

Handling Long Content

When your content exceeds the context window, consider these strategies:

βœ‚οΈ Chunking

Split long documents into overlapping chunks, process separately, then combine results.
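
A minimal character-based version of this chunking strategy (token-based chunking with tiktoken works the same way; the sizes here are illustrative):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of `chunk_size` characters,
    with `overlap` characters shared between neighboring chunks."""
    chunks = []
    step = chunk_size - overlap  # must be positive
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("x" * 2500, chunk_size=1000, overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The overlap keeps sentences that straddle a boundary visible to both chunks, at the cost of processing some text twice.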

πŸ“Š Summarization

Create summaries of earlier content and include those instead of full text.

πŸ” RAG

Use embeddings to retrieve only the most relevant chunks for each query.
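
The retrieval step can be sketched with plain cosine similarity over precomputed embedding vectors (the tiny vectors below are stand-ins; real embeddings would come from an embedding model such as mythic-embed):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.0], vecs))  # [0, 1]
```

Only the retrieved chunks are placed in the prompt, so the token cost per query stays flat no matter how large the corpus grows.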

πŸ’Ύ Memory

Store key facts externally and inject them as needed for each request.
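
A minimal external memory along these lines: store key facts in a dict and render them into the system prompt for each request (a sketch, not a production memory system):

```python
memory = {}

def remember(key, fact):
    """Store a fact outside the conversation history."""
    memory[key] = fact

def build_system_prompt(base="You are a helpful assistant."):
    """Inject stored facts into the system prompt for the next request."""
    if not memory:
        return base
    facts = "\n".join(f"- {k}: {v}" for k, v in memory.items())
    return f"{base}\nKnown facts:\n{facts}"

remember("user_name", "Ada")
print(build_system_prompt())
```

This keeps long-lived facts out of the chat history entirely, so they cost a few tokens per request instead of an ever-growing transcript.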