We Compressed a 3,448-Line MCP Gateway to a Searchable Symbol Map
Our MCP gateway is 3,448 lines of Python, roughly 38,000 tokens. We ran the directory it lives in
through our own compress_codebase endpoint and got back a 50-file symbol map, ranked from 1,408 callable symbols
discovered across 175 files. Here is the raw JSON output, what survived, what got stripped,
and the cost math.
The numbers
We ran two tools against the live api/app/ directory on 2026-05-20: compress_codebase for the codebase-wide digest and gc_blast_radius for a symbol-focused context pull on McpAuthMiddleware. The JSON below is verbatim output captured from both calls:
{
  "_generated_at": "2026-05-20T07:30:00Z",
  "_tools_used": [
    "mcp__gotcontext-prod__compress_codebase",
    "POST /v1/compress-code/structural (gc_blast_radius REST mirror)"
  ],
  "primary_file": {
    "path": "api/app/mcp_gateway.py",
    "lines": 3448,
    "chars": 152826,
    "approx_tokens_at_4ch_per_tok": 38206,
    "role": "MCP gateway — fronts 142 tools, handles gc_/JWT auth, plan-gating, per-project allowlists, FlyReplay session affinity, telemetry"
  },
  "codebase_wide_compress": {
    "tool": "compress_codebase",
    "directory": "api/app/",
    "files_found": 175,
    "symbols_found": 1408,
    "files_selected_at_max_50": 50,
    "tensor_grep_available": true,
    "headline": "1,408 callable symbols across 175 files compressed to 50-file AST-ranked subset in one MCP call"
  },
  "focused_compress": {
    "tool": "gc_blast_radius",
    "focus_symbol": "McpAuthMiddleware",
    "input_file": "api/app/mcp_gateway.py",
    "ranked_context_count": 1,
    "stats": {
      "files_in": 1,
      "files_ranked": 1,
      "symbols_in": 1,
      "degraded": false
    },
    "ranked_top1": {
      "path": "api/app/mcp_gateway.py",
      "score": 0.03278688524590164,
      "rank": 1,
      "contributing_signals": ["bm25", "graph_distance"]
    }
  }
}
Three numbers worth anchoring on before we go further:
- 38,206 tokens. The uncompressed size of mcp_gateway.py alone, at 4 chars/token. One file. The full api/app/ directory is larger still.
- 1,408 callable symbols across 175 files. The full callable surface of the API, discovered in a single pass.
- 50-file ranked subset returned in ~3 seconds, with tensor_grep_available: true in the JSON. The BM25 + graph-distance ranking ran against real AST data, not a fallback.
What compress_codebase does
The tool is not a text compressor. It does not run LLMLingua-style token-level pruning or embed the code into a vector and retrieve chunks. The approach is pre-tokenization and AST-aware:
- Parse the AST. For Python, ast.parse() produces a full syntax tree. The engine walks the tree to extract every function definition, class, import statement, and top-level constant. It never splits a function in the middle of its body. AST boundaries are preserved exactly.
- Rank by importance. Two signals feed into Reciprocal Rank Fusion (k=60): BM25 relevance against the focus symbol or query, and graph distance in the import/call graph. Symbols closer to the focus score higher. The ranked list identifies which 50 files are most relevant to what you are trying to understand.
- Strip bodies, keep signatures. Each function and method is replaced with its signature line, default arguments, return-type annotation, and docstring. The body becomes "...". Imports and module-level constants stay intact.
- Return the ranked digest. The MCP tool returns a ranked_context array: each entry is a path, a relevance score, a rank, and the contributing signals. You send the digest to the model, not the original files.
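The ranking step can be sketched in a few lines. This is a minimal illustration that assumes the textbook Reciprocal Rank Fusion formula (each signal contributes 1 / (k + rank)); it is not the engine's code. The function name `rrf_fuse` and every path other than api/app/mcp_gateway.py are hypothetical.

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranked_paths in rankings.values():
        for rank, path in enumerate(ranked_paths, start=1):
            # Each signal adds 1 / (k + rank); a file missing from a signal adds nothing.
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-signal rankings, best first. Only mcp_gateway.py is a real path
# from this post; the other three are made up for illustration.
rankings = {
    "bm25": ["api/app/mcp_gateway.py", "api/app/auth.py", "api/app/plans.py"],
    "graph_distance": ["api/app/mcp_gateway.py", "api/app/plans.py", "api/app/billing.py"],
}

fused = rrf_fuse(rankings)
for path, score in fused:
    print(f"{score:.6f}  {path}")
```

With k=60, a file ranked first by both signals scores 1/61 + 1/61 ≈ 0.0328, which is consistent with the 0.03278688524590164 reported for mcp_gateway.py in the JSON above.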
The difference from LLMLingua: LLMLingua operates post-tokenization, feeding the full text through a small language model to select which tokens to keep. That means it sees code as a token stream, not as a program. Our approach operates pre-tokenization against the AST, which means it can guarantee it never splits a function signature from its docstring or removes an import that the kept signatures depend on.
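To make the "strip bodies, keep signatures" idea concrete, here is a minimal sketch using only Python's standard ast module (Python 3.9+ for ast.unparse). It is an illustration of the technique, not the engine's implementation; `BodyStripper`, `structural_digest`, and the sample module are hypothetical names.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Replace every function body with `...`, keeping the docstring if present."""

    def _strip(self, node):
        kept = []
        if ast.get_docstring(node, clean=False) is not None:
            kept.append(node.body[0])  # the original docstring statement
        kept.append(ast.Expr(value=ast.Constant(value=Ellipsis)))
        node.body = kept
        return node

    # Sync and async definitions get the same treatment.
    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def structural_digest(source: str) -> str:
    """Return source with function bodies elided; imports and constants untouched."""
    tree = BodyStripper().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


# Hypothetical sample module, loosely shaped like an auth class.
sample = '''
import os

CACHE_TTL = 300


class Auth:
    """Resolve tokens."""

    async def resolve(self, token: str) -> tuple[str, str]:
        """Return (user_id, plan)."""
        cached = os.environ.get(token)  # stand-in for a cache lookup
        return (cached or "anonymous", "free")
'''

digest = structural_digest(sample)
print(digest)
```

Because the rewrite happens on whole AST nodes, a signature can never be separated from its docstring, and the import and the module-level constant pass through unchanged. Inline comments disappear as a side effect, since the parser never puts them in the tree.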
Before and after: McpAuthMiddleware
McpAuthMiddleware is the ASGI middleware class in api/app/mcp_gateway.py that resolves every inbound gc_ API key and Clerk JWT to a (user_id, plan) tuple. In source it runs about 80 lines with full body implementations, Redis cache hits, error handling, and logging calls.
After gc_blast_radius with "focus_symbol": "McpAuthMiddleware", the digest produces:
# api/app/mcp_gateway.py (structural digest)
class McpAuthMiddleware:
    """ASGI middleware: extracts gc_ API key or Clerk JWT,
    resolves (user_id, plan), attaches to request.state."""
    async def __call__(
        self,
        scope: Scope,
        receive: Receive,
        send: Send,
    ) -> None: ...
    async def _resolve_bearer(
        self,
        token: str,
    ) -> tuple[str, str]: ...
    async def _resolve_clerk_jwt(
        self,
        token: str,
    ) -> tuple[str, str]: ...
What a coding agent gets from this:
- The class exists and its role (from the docstring).
- The three methods the class defines: __call__, _resolve_bearer, _resolve_clerk_jwt.
- Exact type signatures. The agent knows _resolve_bearer returns tuple[str, str] without touching implementation details.
- Enough to answer: “where is the auth boundary?”, “what does this class expose?”, “which method do I modify to add a new key type?”
What the agent does not get: the Redis cache lookup inside _resolve_bearer, the Clerk JWKS URL, the exact error codes raised on bad tokens. For those, the agent needs the original file. The digest answers “what does this module expose?” — not “how does it work internally?”
Where this saves money
The token cost of reading code at scale is not a rounding error. Here is the math for a realistic Claude Code session pulling from our own codebase:
| Approach | Input tokens | Cost at $3/M (Sonnet) |
|---|---|---|
| Read 175 raw files directly | 167K–700K tokens | $0.50–$2.10/session |
| Use the 50-file AST digest instead | ~16K–35K tokens | $0.05–$0.11/session |
| Reduction | — | 10–20× |
The 10–20× range reflects the variance in how much of the raw source is boilerplate vs. dense logic. The 175-file figure and the ~38K-token gateway size come directly from the evidence JSON above, not projections.
The savings compound across sessions. If your Claude Code workflow reads the same codebase orientation at the start of every session (a common pattern when picking up a task mid-sprint), those read costs repeat daily. At even the low end of the range ($0.50/session) and 5 sessions per day, that is $2.50 per developer per day in input-token overhead before any actual work begins.
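The arithmetic behind the table and the daily figure is easy to reproduce. The sketch below uses the token ranges and the $3/M price quoted above; pairing the low ends and the high ends to get the reduction range is our reading of the table, and the variable names are illustrative.

```python
INPUT_PRICE_PER_MTOK = 3.00  # USD per million input tokens (Sonnet rate from the table)
SESSIONS_PER_DAY = 5         # per developer, as in the paragraph above


def input_cost(tokens: int, price_per_mtok: float = INPUT_PRICE_PER_MTOK) -> float:
    """Input-token cost in USD for one session."""
    return tokens / 1_000_000 * price_per_mtok


raw_tokens = (167_000, 700_000)    # reading 175 raw files (low, high)
digest_tokens = (16_000, 35_000)   # reading the 50-file digest (low, high)

for label, (low, high) in {"raw": raw_tokens, "digest": digest_tokens}.items():
    print(f"{label:>6}: ${input_cost(low):.3f} to ${input_cost(high):.3f} per session")

# Low end against low end, high end against high end.
reduction = (raw_tokens[0] / digest_tokens[0], raw_tokens[1] / digest_tokens[1])
print(f"reduction: {reduction[0]:.1f}x to {reduction[1]:.1f}x")

daily_overhead = SESSIONS_PER_DAY * input_cost(raw_tokens[0])
print(f"daily raw-read overhead at the low end: ${daily_overhead:.2f}")
```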
The output token premium compounds this further. Anthropic charges $15/M for Sonnet 4.6 output versus $3/M for input, a 5× multiplier. If reading 175 files triggers the model to summarize or reference them in its responses, output costs land on top of the input costs. The digest short-circuits both directions.
What we lost
Honest disclosure
The digest is not a replacement for the source. Here is what it cannot answer.
The structural digest omits three categories of content by design:
- Function bodies. The full implementation of every function and method (control flow, conditionals, Redis calls, error raises, and any logic that only makes sense in execution context) is replaced with "...". To debug a runtime error in a function body, you need the original file.
- Inline comments. Comments attached to individual lines of implementation are stripped along with the body. Module-level and function-level docstrings survive; notes inside the function do not.
- Tests. The api/tests/ directory is not included in a compress_codebase call targeting api/app/. You cannot determine what behaviors are locked by tests from the digest alone.
The digest is right for: “what does this module expose to MCP?”, “where does auth happen?”, “which files are most relevant to McpAuthMiddleware?”, “what is the callable surface of this API?”
The digest is wrong for: “why is this Redis call failing?”, “what does this migration do step by step?”, “how does the billing webhook update the user’s plan?”
For drift detection (confirming that a signature in the digest still matches the running code), you still need the originals. The digest is a snapshot taken at call time. Treat it like a generated index: accurate when fresh, stale when the source has changed since the last call.
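One way to check freshness, sketched under assumptions: fingerprint the signatures in the source at digest time, store that value next to the digest, and recompute it later. The gotcontext tools are not described as doing this; `signature_fingerprint` and the sample sources are hypothetical, and the code needs Python 3.9+ for ast.unparse.

```python
import ast
import hashlib


def signature_fingerprint(source: str) -> str:
    """Hash every function signature so body-only edits leave the value unchanged."""
    signatures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            returns = ast.unparse(node.returns) if node.returns else ""
            # Name, parameters (with annotations and defaults), and return annotation.
            signatures.append(f"{node.name}({ast.unparse(node.args)}) -> {returns}")
    joined = "\n".join(sorted(signatures))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical versions of one function: same signature, new body, then a new signature.
at_digest_time = "async def resolve(token: str) -> tuple[str, str]:\n    return ('u', 'free')\n"
body_changed = "async def resolve(token: str) -> tuple[str, str]:\n    raise ValueError(token)\n"
signature_changed = "async def resolve(token: str) -> dict[str, str]:\n    return {}\n"

stored = signature_fingerprint(at_digest_time)
print("body edit makes digest stale:     ", signature_fingerprint(body_changed) != stored)
print("signature edit makes digest stale:", signature_fingerprint(signature_changed) != stored)
```

A body-only edit leaves the fingerprint alone, which matches what the digest actually contains; a changed parameter or return type flips it and signals that the digest should be regenerated.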
Try it
Two ways in: MCP tool call from Claude Code, Cursor, or Gemini CLI with the gotcontext server configured, or direct REST call to the POST /v1/compress-code/structural endpoint. The REST endpoint is the same engine the MCP tool calls internally.
MCP (recommended for interactive work): Configure the server once via the setup guide, then call the tool from any MCP client:
# In any MCP client (Claude Code, Cursor, Gemini CLI)
# with the gotcontext MCP server configured:
# Tool: compress_codebase
# Arguments:
{
  "directory": "api/app/",
  "max_files": 50
}
# Returns: ranked 50-file symbol map in ~3 seconds
REST (for scripting or CI): Replace gc_YOUR_KEY with a key from the dashboard:
curl -s -X POST https://api.gotcontext.ai/v1/compress-code/structural \
  -H "Authorization: Bearer gc_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {"path": "api/app/mcp_gateway.py", "content": "<file contents>"}
    ],
    "focus_symbol": "McpAuthMiddleware",
    "top_k": 25
  }' | jq '.ranked_context[] | {path, score, rank, contributing_signals}'
The files array accepts 1–1,000 files, up to 512 KB each with a 5 MB total limit. For a full codebase pass, use the MCP compress_codebase tool directly. It handles file discovery and chunking internally.
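For scripting from Python rather than curl, a client-side sketch might look like the following. It builds the same request body as the curl call and checks the stated limits before sending. Treat the details as assumptions: we read KB and MB as 1,024-based and measure sizes in UTF-8 bytes, and `build_payload` and `build_request` are hypothetical helper names, not part of any SDK.

```python
import json
import urllib.request

ENDPOINT = "https://api.gotcontext.ai/v1/compress-code/structural"
MAX_FILES = 1_000
MAX_FILE_BYTES = 512 * 1024        # assumption: 512 KB means 512 * 1024 bytes
MAX_TOTAL_BYTES = 5 * 1024 * 1024  # assumption: 5 MB means 5 * 1024 * 1024 bytes


def build_payload(files: dict[str, str], focus_symbol: str, top_k: int = 25) -> dict:
    """Validate the documented limits locally, then build the JSON body."""
    if not 1 <= len(files) <= MAX_FILES:
        raise ValueError(f"expected 1 to {MAX_FILES} files, got {len(files)}")
    total = 0
    for path, content in files.items():
        size = len(content.encode("utf-8"))
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{path} is {size} bytes, over the per-file limit")
        total += size
    if total > MAX_TOTAL_BYTES:
        raise ValueError(f"payload is {total} bytes, over the total limit")
    return {
        "files": [{"path": p, "content": c} for p, c in files.items()],
        "focus_symbol": focus_symbol,
        "top_k": top_k,
    }


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_payload(
    {"api/app/mcp_gateway.py": "class McpAuthMiddleware: ..."},
    focus_symbol="McpAuthMiddleware",
)
request = build_request(payload, "gc_YOUR_KEY")
```

To send it, pass `request` to `urllib.request.urlopen` and read `ranked_context` from the decoded JSON response, as the jq filter in the curl example does.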
The free tier includes compress_codebase calls up to the monthly free limit. gc_blast_radius is Pro-tier. Both use the same bearer-token auth; the plan determines which tools are gated.
What's next
The same compression approach applies to contexts other than source files. Chat history accumulates tokens that have diminishing per-turn value as sessions grow. RAG retrievals return chunks where only a fraction is directly relevant to the query. MCP tools/list responses for large tool catalogs (ours is 142 tools, ~38K tokens at full profile) can be profile-routed to a 7-tool core digest for most use cases.
We have already shipped profile routing for the tool catalog. The conversation and RAG compression surfaces are on the roadmap. We will write about each when the data exists to back the claims.