Tooling
Qwen3.6 27B achieves 46 tokens/sec on dual RX 9070 XTs with llama.cpp
A developer running Qwen3.6-27B at Q5_K_XL quantization on two AMD GPUs achieved 46–47 tokens/second inference speed with 83% speculative decoding acceptance rates, demonstrating viable local deployment for agentic
1 min read
Sourcer/localllama
A developer deployed Qwen3.6-27B on two AMD RX 9070 XT GPUs using llama.cpp and reported sustained inference throughput of 46–47 tokens per second with strong agentic performance on production debugging tasks.
The setup achieved concrete performance metrics: prompt evaluation at 2.31–2.51 ms per to...
Sign in to read the full analysis
Free — just an email. Get full analysis on LLM unit economics, plus the weekly Cost-of-Inference column.
Method & sources
- Source type
- Primary publication (lab/vendor blog) — our analysis + implication
- Source link
- r/localllama
- Published
- UTC
- Byline
- By the gotcontext.ai team (editorial standards)
- Correction?
- corrections@gotcontext.ai