Research
HalBench ranks frontier models on sycophancy and hallucination resistance
A new open benchmark tests how readily Claude, Grok, GPT, and Gemini comply with false premises, revealing significant gaps in resistance to social pressure and fabrication.
Source: r/localllama
A researcher has released HalBench, an open benchmark designed to measure how readily large language models agree with false premises and hallucinate supporting content under social pressure. The benchmark tested 3,200 false-premise prompts across four frontier models—Claude Sonnet 4.6, Grok 4.3, GP...
Method & sources
- Source type: Primary publication (lab/vendor blog); our analysis + implication
- Source link: r/localllama
- Published: UTC
- Byline: By the gotcontext.ai team (editorial standards)
- Correction? corrections@gotcontext.ai