ServiceNow releases EVA-Bench 2.0 with 213 tool-use scenarios across 3 domains

ServiceNow AI published EVA-Bench Data 2.0, a benchmark covering 121 tools and 213 real-world scenarios for evaluating agent performance across e-commerce, software support, and travel domains.

ServiceNow AI released EVA-Bench Data 2.0, expanding its agent evaluation framework to include 121 tools and 213 distinct scenarios spanning three industry domains: e-commerce, software support, and travel. The benchmark measures how well AI agents handle tool selection, sequencing, and execution in...

Method & sources
Source type: Primary publication (lab/vendor blog), with our analysis and implication
Source link: Hugging Face Blog
Published: UTC
Byline: By the gotcontext.ai team (editorial standards)
Corrections: corrections@gotcontext.ai