Research
Browser Agent Benchmarks Reveal Hidden Retry Costs Beyond Token Pricing
Open-weight models outperform Gemini 2.5 Flash on WebVoyager tasks when retry overhead is factored in. MiniMax M2.5 costs 2.3x less per successful task despite higher per-token pricing.
1 min read
Sourcer/localllama
A benchmark of 720 browser agent tasks across four models shows that token-per-dollar comparisons mask a critical cost driver: retry overhead from parse failures and dropped calls. One model incurred a 22.9% Agent Execution Tax—wasted inference cycles—that inflated its effective task cost by more th...
Sign in to read the full analysis
Free account. Full analysis on LLM unit economics, plus the weekly Cost-of-Inference column.
Method & sources
- Source type
- Primary publication (lab/vendor blog) — our analysis + implication
- Source link
- r/localllama
- Published
- UTC
- Updated
- UTC
- Byline
- By the gotcontext.ai team (editorial standards)
- Correction?
- corrections@gotcontext.ai