Anthropic Opus 4.6 withstands 6,000 hacking attempts in public challenge
Fernando Irarrávaval's public challenge to leak secrets from an AI assistant running Anthropic's Opus 4.6 ended with zero successful exploits after 6,000 attempts, suggesting frontier models are becoming harder to compromise.
Fernando Irarrávaval ran a public challenge on hackmyclaw.com inviting 2,000 people to attempt prompt injection attacks against an AI assistant instance. After 6,000 attempts across the challenge period, consuming $500 in token spend and triggering a Google account suspension from the volume of inbo...
- Source type: Primary publication (lab/vendor blog), with our analysis and implication
- Source link: Simon Willison
- Published: UTC
- Byline: By the gotcontext.ai team (editorial standards)
- Corrections: corrections@gotcontext.ai
Related
- Verifier quality determines agent loop success, not model capability (Research)
- Nesbitt's hypothetical incident exposes multi-agent security loop risks (Research)
- AI image generators struggle with anatomically coherent children's illustrations (Research)
- AI models show measurable political bias across major benchmarks (Research)