Tooling
Qwen3.6 27B achieves 45+ tokens/sec on consumer GPUs with llama.cpp
Alibaba's Qwen3.6 27B model, running on two AMD RX 9070 XTs via llama.cpp, delivers 45–51 tokens per second with speculative decoding, suggesting that dense open models remain viable for local agentic workloads.
Source: r/localllama
Alibaba's Qwen3.6 27B model, deployed on consumer-grade AMD GPUs via llama.cpp, is delivering throughput and reasoning quality that challenges the assumption that dense open models require cloud infrastructure for practical use.
A developer running the model on two RX 9070 XTs (PCIe 5.0 x8/x8, powe...
Method & sources
- Source type: Primary publication (lab/vendor blog), plus our analysis and implications
- Source link: r/localllama
- Published: UTC
- Byline: the gotcontext.ai team (editorial standards)
- Corrections: corrections@gotcontext.ai