AI Model Compression Benchmark

Name: AI Model Compression Benchmark — Cost & Quality at 5×
Creator: gotcontext.ai
Published: 2026-05-09
License: https://creativecommons.org/licenses/by/4.0/

Cost and quality metrics for frontier LLMs evaluated at 5× context compression. How much do you save when you compress before sending?

v0 data: Pricing is hand-verified against each provider's pricing page where one exists. Two rows (Llama 4 Maverick, Mistral Large 3) have no canonical pricing page and are sourced from an aggregator instead; see “Pricing sources” below for the full list. Pricing was last verified 2026-05-09. Quality scores (HumanEval, RULER, LLM-judge) have not been collected yet and appear as “—” in the table until live benchmark runs complete.

Model	Provider	Input $/1M	Output $/1M	Context	HumanEval@5x	RULER@5x	LLM-Judge@5x	$/Quality
DeepSeek V4-Flash	DeepSeek	$0.14	$0.28	128K	—	—	—	—
GPT-5.4 nano	OpenAI	$0.20	$1.25	128K	—	—	—	—
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50	1M	—	—	—	—
Llama 4 Maverick	Meta	$0.27	$0.85	524K	—	—	—	—
Gemini 3 Flash	Google	$0.50	$3.00	1M	—	—	—	—
Mistral Large 3	Mistral	$0.50	$1.50	128K	—	—	—	—
GPT-5.4 mini	OpenAI	$0.75	$4.50	128K	—	—	—	—
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K	—	—	—	—
Gemini 3.1 Pro	Google	$1.25	$10.00	1M	—	—	—	—
Grok 4.3	xAI	$1.25	$2.50	131K	—	—	—	—
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	200K	—	—	—	—
GPT-5.5	OpenAI	$5.00	$30.00	128K	—	—	—	—
Claude Opus 4.7	Anthropic	$15.00	$75.00	1M	—	—	—	—

Methodology

Compression ratio

Quality scores will be measured at exactly 5× compression using the gotcontext production compressor once live benchmark runs complete. Costs already reflect post-compression token counts.
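The page does not spell out the arithmetic behind a 5× ratio, so the following is only a minimal sketch. It assumes that "5× compression" means the input token count is divided by five and that only input tokens are compressed. The function names are hypothetical and are not part of any gotcontext API.

```python
import math


def compressed_tokens(original_tokens: int, ratio: float = 5.0) -> int:
    """Token count after compression, rounded up to a whole token.

    Assumption: an N-times ratio divides the token count by N.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(original_tokens / ratio)


def input_saving_fraction(ratio: float = 5.0) -> float:
    """Fraction of input tokens (and so of input spend) removed by compression."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    prompt = 1_000_000  # tokens before compression
    print(compressed_tokens(prompt))          # 200000
    print(f"{input_saving_fraction():.0%}")   # 80%
```

Under these assumptions the saving fraction depends only on the ratio, not on the model, so a 5× ratio removes 80% of input tokens for every row in the table.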

Quality metrics

HumanEval pass@1, RULER long-context subset, and LLM-as-judge (0-100) measure how much quality survives compression. Higher is better.
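For orientation, HumanEval results are commonly reported with the unbiased pass@k estimator, which for k = 1 reduces to the fraction of generated samples that pass the unit tests. The sketch below implements that standard formula. Whether the gotcontext harness will use exactly this estimator, and how many samples per problem it will draw, is an assumption here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate for a single problem is simply c / n, so a problem with 3 passing samples out of 10 contributes 0.3 to the average.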

Pricing

Input and output prices are in USD per 1M tokens. They are verified against each provider's official pricing page where one exists; the remaining rows (Llama 4 Maverick, Mistral Large 3) are sourced from an aggregator. Every row has a source, listed under “Pricing sources” below.
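As an illustration of the per-1M-token unit, the sketch below converts a pair of table prices into the cost of a single request. The function name and the example token counts are hypothetical; the prices are taken from the Claude Haiku 4.5 row.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request in USD, given prices quoted per 1M tokens."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Claude Haiku 4.5: $1.00 input, $5.00 output per 1M tokens.
    cost = request_cost_usd(200_000, 1_000, 1.00, 5.00)
    print(f"${cost:.3f}")  # $0.205
```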

Pricing sources

Anthropic — docs.anthropic.com/en/docs/about-claude/pricing
DeepSeek — api-docs.deepseek.com/quick_start/pricing
Google — ai.google.dev/gemini-api/docs/pricing
Meta (Llama 4 Maverick) — tokencost.app/models/llama-4-maverick
Mistral — tokencost.app/models/mistral-large-3
OpenAI — platform.openai.com/docs/pricing
xAI — docs.x.ai/models
