
Same Prompt, Different Token Count: The Hidden Cost of Switching Providers

GPT-4o uses BPE with 200K vocab. Gemini uses SentencePiece with 262K. Claude uses its own BPE variant. The gaps reach 15–25% on code-heavy prompts and they compound at scale.

James Hollingsworth (Contributor) · Published · 5 min · ~694 words

You benchmarked GPT-4o. The cost looked right. You switched to Claude for the longer context window. Your bill went up 20%.

You didn't change your prompts. You didn't change your usage patterns. The only thing that changed was the tokenizer, and tokenizers disagree, sometimes substantially, on how many tokens the same text is worth.

Three Tokenizers, Three Different Answers

The major frontier models use different tokenization schemes:

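OpenAI (GPT-4o, GPT-4 Turbo) uses Byte Pair Encoding (BPE), implemented via the tiktoken library. GPT-4o uses the o200k_base encoding with a vocabulary of roughly 200,000 tokens; GPT-4 Turbo uses the older cl100k_base encoding with roughly 100,000. BPE builds its vocabulary by iteratively merging the most frequent byte pairs in the training corpus.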
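To make the merge loop concrete, here is a toy byte-level BPE trainer. It is a minimal sketch, not tiktoken's implementation: it skips the regex pre-tokenization and special tokens that production tokenizers use, and the function names (`merge_pair`, `train_bpe`, `encode`) and the two tiny training corpora are made up for illustration.

```python
from collections import Counter


def merge_pair(tokens: list[bytes], left: bytes, right: bytes) -> list[bytes]:
    """Replace each adjacent (left, right) pair with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(corpus: str, num_merges: int) -> list[tuple[bytes, bytes]]:
    """Learn up to num_merges merge rules, most frequent pair first."""
    tokens = [bytes([b]) for b in corpus.encode("utf-8")]  # start from single bytes
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges gain nothing
        merges.append((left, right))
        tokens = merge_pair(tokens, left, right)
    return merges


def encode(text: str, merges: list[tuple[bytes, bytes]]) -> list[bytes]:
    """Tokenize text by replaying the learned merges in order."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    for left, right in merges:
        tokens = merge_pair(tokens, left, right)
    return tokens


# Two toy "providers": same algorithm, different (hypothetical) training corpora.
prose_merges = train_bpe("the cat sat on the mat and the dog sat on the log " * 20, 50)
code_merges = train_bpe("def get_user_id(user):\n    return user['user_id']\n" * 20, 50)

sample = "def get_user_name(user):\n    return user['user_name']\n"
print(len(encode(sample, prose_merges)), len(encode(sample, code_merges)))
```

The two printed counts should come out different even though the algorithm is identical, because the merge rules were learned from different text. That is the same mechanism, at miniature scale, that separates real provider tokenizers.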

Google (Gemini 1.5 Pro, Gemini 2.0) uses SentencePiece with a 262,144-token vocabulary. SentencePiece operates directly on raw text without pre-tokenization, using either a unigram language model or BPE internally, and its vocabulary is built differently from tiktoken's.

Anthropic (Claude 3.5, Claude 4) uses a BPE variant with its own vocabulary, sized and trained on Anthropic's corpus mix.

These aren't minor implementation differences. The vocabulary sizes, merge rules, and training corpora produce meaningfully different token counts for the same input text.

Where the Gaps Are Largest

Tokenizer disagreement is not uniform across content types. English prose from typical web text is where all three tokenizers were most heavily optimized. Gaps are smallest here, typically 2–5%.

The gaps grow significantly in three areas:

Code and technical content. Code is full of character sequences that rarely appear in natural language: indentation, brackets, underscores, camelCase identifiers, and language-specific syntax. A Python function with heavy use of list comprehensions and f-strings can tokenize very differently across providers. Gaps of 15–25% on code-heavy prompts are common.

Non-English text. Vocabularies trained primarily on English text handle other languages less efficiently. As an illustration, a short Japanese paragraph might use 80 tokens under one tokenizer and 140 under another, 75% more. For multilingual applications, tokenizer choice can nearly double your effective cost.

Structured data formats. JSON, XML, YAML, and Markdown all contain repeated structural characters that tokenizers handle differently. A large JSON payload with many short string values will produce very different token counts across providers.

Checking Token Counts Before You Commit

Each provider exposes tokenizer access:

```python
# OpenAI / tiktoken
import tiktoken

your_text = "def add(a, b):\n    return a + b"  # replace with your actual prompt

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode(your_text)
print(len(tokens))
```

```python
# Anthropic / count_tokens API method
import anthropic

your_text = "def add(a, b):\n    return a + b"  # replace with your actual prompt

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.count_tokens(
    model="claude-opus-4-7-20251101",
    messages=[{"role": "user", "content": your_text}],
)
print(response.input_tokens)
```

For Google's Gemini, the `countTokens` method in the Gemini API provides equivalent functionality.

The important discipline: benchmark your actual prompts through each provider's tokenizer before committing to a cost model. Generic rules of thumb (such as the common shortcut of "1 token ≈ 4 characters") obscure provider-specific variation that can swing your budget by 20% or more on technical workloads.
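One way to structure that benchmark is a small harness that takes a token-counting function per provider and reports the spread for each prompt. This is a sketch under assumptions: `compare_counts`, `relative_gap`, and `chars_div_4` are hypothetical helper names, and the two counters in the usage lines are stand-ins, not real tokenizers. In practice each counter would wrap a provider's tokenizer or token-counting endpoint.

```python
from typing import Callable

TokenCounter = Callable[[str], int]


def chars_div_4(text: str) -> int:
    """The generic '1 token ≈ 4 characters' shortcut, kept for comparison."""
    return max(1, len(text) // 4)


def compare_counts(
    prompts: dict[str, str], counters: dict[str, TokenCounter]
) -> dict[str, dict[str, int]]:
    """Run every prompt through every counter: {prompt_name: {provider: count}}."""
    return {
        name: {provider: count(text) for provider, count in counters.items()}
        for name, text in prompts.items()
    }


def relative_gap(counts: dict[str, int]) -> float:
    """Spread between the highest and lowest count, as a fraction of the lowest."""
    low, high = min(counts.values()), max(counts.values())
    return (high - low) / low


# Stand-in counters for illustration only; swap in real provider counters.
counters = {
    "whitespace_words": lambda text: len(text.split()),
    "heuristic": chars_div_4,
}
prompts = {"greeting": "Summarize the following pull request in two sentences."}

for name, counts in compare_counts(prompts, counters).items():
    print(name, counts, f"gap={relative_gap(counts):.0%}")
```

Passing counters in as plain functions keeps the harness independent of any SDK, so the same loop works whether a count comes from a local library or a network call.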

The Compression Lens

Tokenizer differences matter most when you're managing context at scale: long system prompts, large codebase injections, multi-turn agent sessions. This is where compression earns its value.

A 15× compression of a long codebase context doesn't just reduce tokens proportionally. It also shrinks the volume of text on which tokenizer differences compound. A 100,000-token JSON payload reduced to 6,700 tokens through semantic compression doesn't just cost roughly 15× less to send; it also takes 93,300 tokens out of the range where provider-specific tokenization can inflate the count.
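The arithmetic behind that example can be checked in a few lines. The sketch below uses the article's 100,000 and 6,700 token figures. The 20% provider gap is an assumed illustrative value, and the function name `compression_summary` is hypothetical. It also assumes the relative gap between providers stays the same after compression.

```python
def compression_summary(
    original_tokens: int, compressed_tokens: int, provider_gap: float
) -> dict[str, float]:
    """Summarize what compression does to token volume and to the provider gap.

    provider_gap is the fractional difference between two tokenizers
    (0.20 means one provider counts 20% more tokens than the other).
    Assumption: the same fractional gap applies before and after compression.
    """
    return {
        "compression_ratio": original_tokens / compressed_tokens,
        "tokens_removed": original_tokens - compressed_tokens,
        "gap_tokens_before": original_tokens * provider_gap,
        "gap_tokens_after": compressed_tokens * provider_gap,
    }


summary = compression_summary(100_000, 6_700, provider_gap=0.20)
for key, value in summary.items():
    print(f"{key}: {value:,.1f}")
```

Under these assumptions the ratio works out to about 14.9×, 93,300 tokens are removed, and the absolute difference between the two providers falls from about 20,000 tokens to about 1,340 per request.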

For teams running on multiple providers simultaneously (common for cost arbitrage or reliability), compression normalizes the token footprint before it reaches the provider's tokenizer. The compressed representation uses fewer total tokens on every provider, so the absolute gap between providers shrinks with it. The relative gap may shrink as well, since compressed text carries more information per character.

Practical Recommendations

If you're switching providers or running a cost comparison:

  • Run your top 10 most common prompts through each provider's tokenizer before finalizing cost models. Don't use character counts or word counts as proxies.
  • Weight by content type. If 60% of your token spend is on code-related prompts, the tokenizer gap will be at the high end. If it's primarily English prose, it will be at the low end.
  • Re-check after prompt changes. Prompt engineering changes that reduce token count under one tokenizer may not reduce it proportionally under another.
  • Use compression on high-volume context injections. System prompts, codebase context, and long document injections are where tokenizer-specific overhead compounds most aggressively. Compressing these before they reach the API reduces both the base cost and the provider-specific variance.
  • The tokenizer is part of the pricing mechanism. It deserves the same scrutiny as the per-token rate.
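The content-type weighting recommended above comes down to a weighted average. Here is a minimal sketch, assuming you already know your spend share per content type and have measured a gap for each. The function name `blended_gap` is hypothetical, and the example gaps are simply the midpoints of the 15–25% and 2–5% ranges quoted earlier, not measurements.

```python
def blended_gap(spend_share: dict[str, float], gap_by_type: dict[str, float]) -> float:
    """Weight each content type's tokenizer gap by its share of token spend."""
    total = sum(spend_share.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"spend shares must sum to 1.0, got {total}")
    return sum(share * gap_by_type[kind] for kind, share in spend_share.items())


# Illustrative inputs: 60% of spend on code, 40% on English prose.
spend_share = {"code": 0.60, "prose": 0.40}
gap_by_type = {"code": 0.20, "prose": 0.035}  # assumed midpoints, not measured values

print(f"expected overall gap: {blended_gap(spend_share, gap_by_type):.1%}")
```

With these inputs the blended gap is 0.6 × 0.20 + 0.4 × 0.035 = 13.4%. Replace the assumed gaps with numbers measured on your own prompts before using the result in a cost model.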


    Cite this

    Researchers, analysts, or journalists referencing this post can use either format below — both are copyable.

    BibTeX
    @misc{tokenizer-differences-provider-cost-reality-2026,
      title  = {Same Prompt, Different Token Count: The Hidden Cost of Switching Providers},
      author = {James Hollingsworth},
      year   = {2026},
      month  = {May},
      url    = {https://www.gotcontext.ai/blog/tokenizer-differences-provider-cost-reality},
      note   = {gotcontext.ai engineering blog.},
    }
    APA
    James Hollingsworth. (2026, May 8). Same Prompt, Different Token Count: The Hidden Cost of Switching Providers. gotcontext.ai. Retrieved from https://www.gotcontext.ai/blog/tokenizer-differences-provider-cost-reality.
