Community benchmark repository · open submissions

AI inference benchmarks: your build vs the world.

Community-submitted results across GPUs, cloud instances, and quantized models. Ranked by throughput, cost-efficiency, and power draw.

Hardware (identical for all 13 entries): RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2

| # | Model | Tokens / sec | Status | Submitted by |
|---|-------|--------------|--------|--------------|
| 1 | unsloth/Qwen3.5-4B-MTP-GGUF :: Qwen3.5-4B-UD-Q4_K_XL.gguf | 93.9 | Unverified | |
| 2 | lmstudio-community/gpt-oss-20b-GGUF :: gpt-oss-20b-MXFP4.gguf | 45 | Unverified | |
| 3 | unsloth/Qwen3.6-27B-MTP-GGUF :: Qwen3.6-27B-UD-Q4_K_XL.gguf | 37 | Unverified | |
| 4 | lmstudio-community/gemma-4-26B-A4B-it-GGUF :: gemma-4-26B-A4B-it-Q4_K_M.gguf | 35 | Unverified | |
| 5 | unsloth/gemma-4-26B-A4B-it-GGUF :: gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf | 26.8 | Unverified | |
| 6 | unsloth/granite-4.1-30b-GGUF :: granite-4.1-30b-UD-Q4_K_XL.gguf | 26.4 | Unverified | |
| 7 | unsloth/gemma-4-31B-it-GGUF :: gemma-4-31B-it-Q3_K_M.gguf | 17.5 | Unverified | |
| 8 | bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF :: nvidia_Nemotron-Cascade-2-30B-A3B-Q4_K_M.gguf | 10.5 | Unverified | |
| 9 | unsloth/Qwen3.6-35B-A3B-MTP-GGUF :: Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf | 9.6 | Unverified | |
| 10 | unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF :: NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-UD-Q4_K_XL.gguf | 6.3 | Unverified | |
| 11 | unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF :: Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf | 5.7 | Unverified | |
| 12 | ggml-org/gpt-oss-120b-GGUF :: gpt-oss-120b-mxfp4-00001-of-00003.gguf | 5.3 | Unverified | |
| 13 | bartowski/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF :: Qwen_Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf | 4.5 | Unverified | |
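
The leaderboard orders entries by tokens per second, fastest first. As a rough illustration only, the sketch below reproduces that ordering for three of the listed results. The `Result` class and `rank_by_throughput` function are hypothetical names chosen for this example, not part of the site or any submission tooling, and the site's actual ranking logic (including how cost-efficiency and power draw are factored in) is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard entry (hypothetical structure, for illustration)."""
    model: str
    tokens_per_sec: float
    status: str = "Unverified"


def rank_by_throughput(results):
    """Return (rank, result) pairs, fastest first; ranks start at 1."""
    ordered = sorted(results, key=lambda r: r.tokens_per_sec, reverse=True)
    return list(enumerate(ordered, start=1))


# Three entries taken from the table, deliberately out of order.
results = [
    Result("gpt-oss-20b-MXFP4.gguf", 45.0),
    Result("Qwen3.6-27B-UD-Q4_K_XL.gguf", 37.0),
    Result("Qwen3.5-4B-UD-Q4_K_XL.gguf", 93.9),
]

for rank, r in rank_by_throughput(results):
    print(f"{rank:>2}  {r.model:<32} {r.tokens_per_sec:>6.1f} tok/s  {r.status}")
```

Sorting on a single throughput key is the simplest reading of "ranked by throughput"; a real leaderboard would likely need tie-breaking and separate views for the other metrics.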

Looking for the gotcontext compression benchmark (gotcontext vs Headroom, April 2026)? See the compression leaderboard

13 benchmarks · open submissions