A smaller context made the agent more accurate
A Microsoft team ran a 50-task MCP tool-use benchmark four ways. Pruning the agent's context and adding a short summary cut tokens by 62% and raised task completion from 71% to 91.6%.
If you run an LLM agent against real tools, you have watched its context window fill with tool output it will never read again. The reflex is to keep all of it, in case the model needs it later. A team at Microsoft tested that reflex on a 50-task expense-itemization benchmark built on Model Context Protocol tools, and the result runs the other way: keeping the full history finished 71.0% of the tasks, while pruning the context to the last five tool calls and adding a short summary finished 91.6%, on 62% fewer tokens (Lodha et al., 2026).
We build a compression API, so we have a stake in this. The numbers here are not ours, though, and they stand on their own.
The setup: an expense agent drowning in tool output
The benchmark is narrow and concrete, which is what makes it useful. The agent itemizes hotel expenses inside Microsoft Dynamics 365 Finance and Operations, calling MCP tools to read and write records. Each tool response carries a lot of structured data the agent mostly does not need on the next turn. Across 50 tasks and five independent runs, the authors compared four ways of handling that growing pile of context.
The weakest setup gave the agent no running model of the user at all. It completed 8.0% of the itemizations. That is the floor: an agent with tools but no memory of what it has already done.
Full history: 1.48 million tokens to finish 71%
Giving the agent its complete conversation history is the obvious fix, and it helps. Completion climbs to 71.0%. The cost is steep. That configuration burned 1,480,996 tokens and 14.56 hours per benchmark run (Lodha et al., 2026).
Two things happen at once inside that number. The agent pays for every stale tool response it re-sends on every turn, and it also has to read them. The second cost is the one people forget. A long context is not free attention. It is a pile the model searches every time, and the noise in it competes with the signal.
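The re-sending cost can be pictured with a small sketch. The workload here (40 turns, 2,000 tokens per tool response) is invented for illustration and is not taken from the study. The function name `cumulative_input_tokens` is hypothetical, and the model ignores system prompts, tool definitions, and any summary.

```python
def cumulative_input_tokens(response_sizes, window=None):
    """Total tool-response tokens sent to the model across all turns.

    response_sizes: token count of the tool response produced on each turn.
    window: if set, only the most recent `window` responses stay in context.
    """
    total = 0
    for turn in range(1, len(response_sizes) + 1):
        visible = response_sizes[:turn]      # everything produced so far
        if window is not None:
            visible = visible[-window:]      # keep only the recent tail
        total += sum(visible)                # re-sent in full on this turn
    return total


# Hypothetical workload (assumption): 40 turns, 2,000 tokens per response.
sizes = [2000] * 40
full = cumulative_input_tokens(sizes)
windowed = cumulative_input_tokens(sizes, window=5)
print(full, windowed)  # 1640000 380000
```

With full history the total grows roughly with the square of the number of turns, because every old response is paid for again on each later turn. With a fixed window it grows linearly.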
Pruning plus a summary: 62% fewer tokens, 91.6% done
The configuration that won kept only the last five tool call and response pairs and replaced the older history with an automated summary. It finished 91.6% of the itemizations, with 99.64% of the dollar amounts correct, using 553,374 tokens and 5.79 hours (Lodha et al., 2026).
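A minimal sketch of that shape, under stated assumptions: the names `ToolExchange`, `summarize`, and `prune_context` are hypothetical, and the summarizer is a placeholder that only lists earlier call names. The study used an automated summary whose details are not reproduced here, so this is an illustration of the idea, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class ToolExchange:
    call: str       # the tool call the agent issued
    response: str   # the tool's (often large) response


def summarize(exchanges):
    # Placeholder (assumption): a real system would likely call an LLM here.
    return "Earlier steps: " + "; ".join(e.call for e in exchanges)


def prune_context(history, keep_last=5, summarizer=summarize):
    """Split history into a summary of older exchanges and a verbatim recent tail."""
    split = max(len(history) - keep_last, 0)
    older, recent = history[:split], history[split:]
    summary = summarizer(older) if older else ""
    return summary, recent


# Eight exchanges: three get summarized, five stay verbatim.
history = [ToolExchange(f"call_{i}", f"response_{i}") for i in range(8)]
summary, recent = prune_context(history)
print(summary)                    # Earlier steps: call_0; call_1; call_2
print([e.call for e in recent])   # ['call_3', 'call_4', 'call_5', 'call_6', 'call_7']
```

The window size is a parameter here because five is the value reported for this benchmark, and another workflow may need a different one.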
Set those side by side. Full history: 71.0% done, 1,480,996 tokens. Prune and summarize: 91.6% done, 553,374 tokens. The compressed run cost 62.6% fewer tokens and finished 20.6 percentage points more of the work. The authors saw the same pattern with Claude Sonnet 4.5, so it is not a single-model artifact.
This is the line for anyone who still treats compression as a cost-versus-quality trade-off: here it was a cost win and a quality win in the same run.
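The comparison can be checked directly from the reported figures. The inputs below are the numbers quoted above from Lodha et al. (2026). The variable names are ours, and the hours reduction is derived here, not stated in this post's source figures.

```python
# Reported figures for the two configurations (Lodha et al., 2026).
full_tokens, pruned_tokens = 1_480_996, 553_374
full_hours, pruned_hours = 14.56, 5.79
full_done, pruned_done = 71.0, 91.6   # task completion, percent

token_reduction = 1 - pruned_tokens / full_tokens
hours_reduction = 1 - pruned_hours / full_hours
completion_gain = pruned_done - full_done   # percentage points

print(f"{token_reduction:.1%} fewer tokens")              # 62.6% fewer tokens
print(f"{hours_reduction:.1%} fewer hours")               # 60.2% fewer hours
print(f"{completion_gain:.1f} percentage points gained")  # 20.6 percentage points gained
```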
Why less context can help
The mechanism is not mysterious. As an agent's history grows, two failure modes set in. The first is stale state: an early tool response describes a record that a later write has already changed, and the model trusts the old copy. The second is lost signal: the fact that matters is buried under thousands of tokens of routine output, and the model's attention spreads too thin to find it.
This matches a result we wrote about in "Perfect retrieval isn't enough", where models lose accuracy as context grows even when the right document is present and perfectly retrievable. Pruning and summarizing attack both failure modes. Drop the stale records, and the model stops trusting outdated state. Compress the rest into a short summary, and the relevant facts move back into the part of the window the model actually reads.
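The stale-state failure mode can be made concrete with a small sketch. This is not what the study did (it kept a fixed recent window plus a summary). It is one hypothetical way to picture "drop the stale records", assuming each tool event is a dict with `op`, `record_id`, and `payload` keys. The function name `drop_stale_reads` and the event shape are assumptions made for this example.

```python
def drop_stale_reads(events):
    """Remove read results that a later write to the same record has invalidated.

    events: list of dicts with keys "op" ("read" or "write"),
            "record_id", and "payload" (hypothetical event shape).
    """
    # Index of the most recent write for each record.
    last_write = {}
    for i, event in enumerate(events):
        if event["op"] == "write":
            last_write[event["record_id"]] = i

    # Keep every write, and every read not followed by a write to its record.
    return [
        event
        for i, event in enumerate(events)
        if not (event["op"] == "read" and i < last_write.get(event["record_id"], -1))
    ]


events = [
    {"op": "read", "record_id": "A", "payload": "amount=100"},   # stale after the write
    {"op": "write", "record_id": "A", "payload": "amount=120"},
    {"op": "read", "record_id": "B", "payload": "amount=40"},    # never overwritten
    {"op": "read", "record_id": "A", "payload": "amount=120"},   # fresh read
]
fresh = drop_stale_reads(events)
print([(e["op"], e["record_id"]) for e in fresh])
# [('write', 'A'), ('read', 'B'), ('read', 'A')]
```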
This is the case for compression, measured
The expense study is one workflow, but it lands in a growing pile of evidence pointing the same way. A 2025 RAG study found that compressing retrieved documents to 3% of their original length improved Exact Match by 3.3 points over feeding the model the full documents (Cui et al., 2025). The first systems-level measurement of MCP agents found that the protocol's system prompts, tool definitions, and context histories inflate token usage sharply, which turns context management into a real cost lever instead of a tuning detail (Ding et al., 2025).
The shared finding is that the model does not want all of your context. It wants the part that bears on the current step, in a form short enough to read. That is the premise of what we build at gotcontext: an MCP gateway that compresses tool output, documents, and codebases before they reach the model, so the agent reads the signal instead of the pile. The Microsoft result is a clean outside measurement of why that helps.
What this does not prove
One benchmark, one domain, one primary model. The expense workflow has a property that flatters compression: most tool output is structured data the agent references once and then never again, which is exactly the case where pruning loses little. A workflow where the agent has to reason across the full history, a long legal document or a multi-file refactor, will not compress as cleanly, and aggressive summarization there can drop a fact the model needed. The honest version of the claim is narrow: for tool-heavy agent workflows where most context is reference data, a short recent window plus a summary beat keeping everything, on both cost and accuracy, in this study. That is a strong result, not a universal law, and we would rather you read it as the former.
Cite this
Researchers, analysts, or journalists referencing this post can use either format below.
@misc{smaller-context-made-the-agent-more-accurate-2026,
title = {A smaller context made the agent more accurate},
author = {James Hollingsworth},
year = {2026},
month = {June},
url = {https://www.gotcontext.ai/blog/smaller-context-made-the-agent-more-accurate},
note = {gotcontext.ai engineering blog.},
}

James Hollingsworth. (2026, June 13). A smaller context made the agent more accurate. gotcontext.ai. Retrieved from https://www.gotcontext.ai/blog/smaller-context-made-the-agent-more-accurate.