HalBench ranks 29 open models on hallucination resistance; Qwen 3.6 outperforms larger models
A new benchmark tested 29 open-source language models on sycophancy and hallucination, finding that Qwen 3.6 achieves a 36.6% pushback rate against false premises, outperforming larger models and some frontier systems.
A custom benchmark called HalBench has evaluated 29 open-source language models on their ability to resist sycophancy and hallucination by measuring how often they push back against false premises rather than playing along. The benchmark expanded from its initial version testing four frontier models...
- Source type: Primary publication (lab/vendor blog); our analysis + implication
- Source link: r/localllama
- Published: UTC
- Byline: the gotcontext.ai team (editorial standards)
- Corrections: corrections@gotcontext.ai