Industry News
Anthropic Makes Claude Safeguards Visible After Researcher Backlash
Anthropic reversed a policy that silently degraded Claude's performance on frontier AI research requests, making safeguards transparent and returning refusal reasons to users.
Source: Simon Willison
Anthropic has reversed a safeguard policy for Claude Fable 5 that secretly limited the model's effectiveness on requests related to frontier large language model development. The company announced it will make these safeguards visible to users, with flagged requests now visibly falling back to Claud...
Method & sources
- Source type: Primary publication (lab/vendor blog); our analysis + implication
- Source link: Simon Willison
- Published: UTC
- Byline: By the gotcontext.ai team (editorial standards)
- Correction? corrections@gotcontext.ai