Community benchmark repository · open submissions
AI inference benchmarks: your build vs the world.
Community-submitted results across GPUs, cloud instances, and quantized models. Ranked by throughput, cost-efficiency, and power draw.
Top results
| # | Model · Hardware | Ctx | Batch | Tokens / sec | TTFT | Status | Submitted by |
|---|---|---|---|---|---|---|---|
| 1 | unsloth/Qwen3.5-4B-MTP-GGUF :: Qwen3.5-4B-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 93.9 | — | Unverified | anonymous |
| 2 | lmstudio-community/gpt-oss-20b-GGUF :: gpt-oss-20b-MXFP4.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 45 | — | Unverified | anonymous |
| 3 | unsloth/Qwen3.6-27B-MTP-GGUF :: Qwen3.6-27B-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 37 | — | Unverified | anonymous |
| 4 | lmstudio-community/gemma-4-26B-A4B-it-GGUF :: gemma-4-26B-A4B-it-Q4_K_M.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 35 | — | Unverified | anonymous |
| 5 | unsloth/gemma-4-26B-A4B-it-GGUF :: gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 26.8 | — | Unverified | anonymous |
| 6 | unsloth/granite-4.1-30b-GGUF :: granite-4.1-30b-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 26.4 | — | Unverified | anonymous |
| 7 | unsloth/gemma-4-31B-it-GGUF :: gemma-4-31B-it-Q3_K_M.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 17.5 | — | Unverified | anonymous |
| 8 | bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF :: nvidia_Nemotron-Cascade-2-30B-A3B-Q4_K_M.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 10.5 | — | Unverified | anonymous |
| 9 | unsloth/Qwen3.6-35B-A3B-MTP-GGUF :: Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 9.6 | — | Unverified | anonymous |
| 10 | unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF :: NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 6.3 | — | Unverified | anonymous |
| 11 | unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF :: Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 5.7 | — | Unverified | anonymous |
| 12 | ggml-org/gpt-oss-120b-GGUF :: gpt-oss-120b-mxfp4-00001-of-00003.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 5.3 | — | Unverified | anonymous |
| 13 | bartowski/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF :: Qwen_Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf · RTX 4070 12GB (Ada sm_89) + RTX 5070 12GB (Blackwell sm_120), layer-split via llama.cpp on Ryzen 5800XT Zen 3, 128 GB DDR4, Docker Desktop WSL2 | — | — | 4.5 | — | Unverified | anonymous |
Looking for the gotcontext compression benchmark (gotcontext vs Headroom, April 2026)? See the compression leaderboard.
13 benchmarks · open submissions