HalBench ranks 29 open models on hallucination resistance; Qwen 3.6 outperforms larger models

A new benchmark tested 29 open-source language models on sycophancy and hallucination and found that Qwen 3.6 pushes back against false premises 36.6% of the time, outperforming larger models and some frontier systems.

A custom benchmark called HalBench has evaluated 29 open-source language models on their ability to resist sycophancy and hallucination by measuring how often they push back against false premises rather than playing along. The benchmark expanded from its initial version testing four frontier models...
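A pushback score like the 36.6% above can be read as a simple ratio: false-premise prompts where the model objected, divided by all false-premise prompts scored. The sketch below assumes that definition and a per-prompt boolean judgement; the `Judgement` type, `pushback_rate` function and sample labels are hypothetical illustrations, not HalBench's actual scoring code or data.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical per-prompt verdict; not HalBench's real schema."""
    prompt_id: str
    pushed_back: bool  # True if the model rejected or corrected the false premise


def pushback_rate(judgements: list[Judgement]) -> float:
    """Fraction of false-premise prompts where the model pushed back (assumed metric)."""
    if not judgements:
        raise ValueError("no judgements to score")
    # Booleans sum as 0/1, so this counts the pushback responses.
    return sum(j.pushed_back for j in judgements) / len(judgements)


# Hypothetical toy labels, not HalBench results.
sample = [
    Judgement("q1", True),
    Judgement("q2", False),
    Judgement("q3", False),
    Judgement("q4", True),
    Judgement("q5", False),
]
print(f"{pushback_rate(sample):.1%}")  # 40.0%
```

Under this reading, a 36.6% score means the model challenged roughly 366 of every 1,000 false premises and went along with the rest.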

Method & sources
Source type: Primary publication (lab/vendor blog); our analysis + implication
Source link: r/localllama
Published: UTC
Byline: By the gotcontext.ai team (editorial standards)
Corrections: corrections@gotcontext.ai