HalBench ranks frontier models on sycophancy and hallucination resistance

A researcher has released HalBench, an open benchmark designed to measure how readily large language models agree with false premises and hallucinate supporting content under social pressure. The benchmark tested 3,200 false-premise prompts across four frontier models—Claude Sonnet 4.6, Grok 4.3, GP...

Method & sources

Source type: Primary publication (lab/vendor blog) — our analysis + implication
Source link: r/localllama
Published: 2026-05-31 12:10:16 UTC
Byline: By the gotcontext.ai team (editorial standards)
Corrections: corrections@gotcontext.ai
