Qwen3.6-27B reaches 46 tokens/sec on dual RX 9070 XTs with llama.cpp

A developer deployed Qwen3.6-27B on two AMD RX 9070 XT GPUs using llama.cpp and reported sustained inference throughput of 46–47 tokens per second, along with strong agentic performance on production debugging tasks.
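To put the reported throughput in latency terms, the short sketch below converts tokens per second into average milliseconds per generated token and estimates the wall-clock time for a response. The function names and the 1,000-token response length are illustrative assumptions, not figures from the original report. The arithmetic also assumes throughput stays constant over the whole response.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert throughput (tokens/sec) to average latency per token (ms)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def seconds_for_response(n_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time for n_tokens at a constant throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    # 46 and 47 tok/s are the reported bounds; 1,000 tokens is a hypothetical length.
    for tps in (46.0, 47.0):
        latency = ms_per_token(tps)
        duration = seconds_for_response(1000, tps)
        print(f"{tps:.0f} tok/s = {latency:.1f} ms/token, "
              f"about {duration:.1f} s per 1,000 tokens")
```

At the reported 46–47 tokens per second, this works out to roughly 21.3–21.7 ms per generated token.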

The setup achieved concrete performance metrics: prompt evaluation at 2.31–2.51 ms per to...

Method & sources

Source type: Primary publication (lab/vendor blog) — our analysis + implication
Source link: r/localllama
Published: 2026-05-29 00:15:38 UTC
Byline: By the gotcontext.ai team (editorial standards)
Correction?: corrections@gotcontext.ai
