gotcontext + rtk: stack two compression layers, kill 90%+ of your token bill

Two compression layers that don't compete

If you run an agentic coding loop (Claude Code, Cursor, Codex, Gemini CLI), your tokens come from two distinct sources, and no single compression strategy fixes both.

Source 1: structured command output. Every `git status`, `pytest`, `tsc`, `ls`, or `grep` your agent runs returns text the LLM has to read. A single 30-minute coding session typically routes ~118K tokens through the Bash tool, and most of it is whitespace, repeated headers, and noise.

Source 2: unstructured documentation. Specs, plans, READMEs, conversation history, ingested files. This is what gets piped into the agent's context window every time you ask "what does this codebase do?"

Compressing each source needs a different mechanism. rtk (rtk-ai/rtk, 44k stars, Apache 2.0) handles source 1. gotcontext handles source 2. Together they remove the bulk of total token volume: roughly 82% in the worked example below.

Where each one operates

| | rtk | gotcontext |
|---|---|---|
| Domain | Structured command output | Arbitrary text |
| Method | Deterministic per-command filters | Semantic chunking + PageRank importance scoring |
| Architecture | Local CLI hook | Remote API + MCP gateway |
| Where it sits | Between agent's Bash tool and the shell | Between agent and your KB / docs / specs |
| License | Apache 2.0 | Proprietary (free + paid plans) |
| Setup | `brew install rtk && rtk init -g` | MCP config + API key |
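To make "semantic chunking + PageRank importance scoring" concrete, here is a toy version of the general technique. It is not gotcontext's implementation, which is proprietary and not described in this post. The sketch assumes paragraph-level chunks, word-overlap (Jaccard) similarity as edge weights, and the conventional damping factor of 0.85; every function name is hypothetical.

```python
import re


def chunk(text):
    """Split text into paragraph chunks (assumption: blank lines separate chunks)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def similarity(a, b):
    """Jaccard overlap of lowercase word sets; a stand-in for a real semantic measure."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def pagerank_scores(chunks, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted chunk-similarity graph."""
    n = len(chunks)
    if n == 0:
        return []
    weights = [[similarity(chunks[i], chunks[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for j in range(n):
            rank = 0.0
            for i in range(n):
                if out_sum[i] > 0:
                    rank += scores[i] * weights[i][j] / out_sum[i]
                else:
                    rank += scores[i] / n  # isolated chunk: spread its score evenly
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores


def compress(text, keep=2):
    """Keep the `keep` highest-scoring chunks, preserving original order."""
    chunks = chunk(text)
    scores = pagerank_scores(chunks)
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:keep]
    return "\n\n".join(chunks[i] for i in sorted(top))
```

Chunks that share vocabulary with many other chunks accumulate score, while an off-topic paragraph with no overlap ends up at the bottom and is dropped first. A production system would presumably use embeddings rather than word overlap, but the ranking step has the same shape.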

rtk rewrites Bash commands transparently: when your agent calls `git status`, the hook intercepts the call and runs `rtk git status` instead. The agent never sees the rewrite; it just gets compressed output.
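The rewrite can be pictured as a simple prefix rule. The following is a sketch of the idea only, not rtk's actual hook code: the set of supported commands and the function name are assumptions made for illustration.

```python
import shlex

# Hypothetical set of commands that have a compressing filter; rtk's real list may differ.
SUPPORTED = {"git", "pytest", "tsc", "ls", "grep"}


def rewrite(command):
    """Prefix a supported command with `rtk`; leave everything else untouched."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return command  # unbalanced quotes: pass through unchanged
    if not tokens or tokens[0] == "rtk" or tokens[0] not in SUPPORTED:
        return command
    return "rtk " + command
```

Under these assumptions, `rewrite("git status")` yields `rtk git status`, while an unsupported command such as `make build` passes through as-is. Falling back to the original command on anything unrecognised is what keeps a rewrite like this invisible to the agent.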

gotcontext exposes MCP tools (ingest_context, compress_codebase, gc_kb_query, gc_blast_radius, etc.) that your agent calls explicitly when it needs to compress arbitrary content.

The two systems do not conflict: they intercept at different points in the agent's data flow.

The math: joint impact

rtk's published numbers per 30-min Claude Code session (rtk-ai/rtk README):

| Operation | Standard (tokens) | rtk (tokens) | Savings |
|---|---|---|---|
| `git` family | 17,100 | 3,720 | -78% |
| Test runners | 39,000 | 3,900 | -90% |
| Lint / build | 6,000 | 1,200 | -80% |
| File reads / search | 56,000 | 15,200 | -73% |
| Total dev commands | ~118K | ~24K | -80% |
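The savings column follows from the two token columns as 1 - rtk / standard. The short check below recomputes each row and the totals from the figures in the table; the dictionary layout and function name are purely illustrative.

```python
# (standard tokens, tokens with rtk) per 30-minute session, copied from the table above.
ROWS = {
    "git family": (17_100, 3_720),
    "Test runners": (39_000, 3_900),
    "Lint / build": (6_000, 1_200),
    "File reads / search": (56_000, 15_200),
}


def savings_pct(before, after):
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))


total_before = sum(before for before, _ in ROWS.values())
total_after = sum(after for _, after in ROWS.values())

for name, (before, after) in ROWS.items():
    print(f"{name}: -{savings_pct(before, after)}%")
print(f"Total: {total_before:,} -> {total_after:,} "
      f"(-{savings_pct(total_before, total_after)}%)")
```

The rows sum to 118,100 and 24,020 tokens, which matches the rounded ~118K and ~24K totals.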

gotcontext's typical results on documentation + code corpora:

| Document Type | Original (tokens) | Compressed (tokens) | Savings |
|---|---|---|---|
| API docs | 7,200 | 1,440 | -80% |
| Source code (500 lines) | 4,200 | 1,260 | -70% |
| Large codebase (50 files) | 48,000 | 7,200 | -85% |

If your agent burns 50K tokens on commands and 50K on context per session, the raw cost is 100K tokens. With rtk at -80% and gotcontext at -85% (the large-codebase figure), that becomes ~10K + ~7.5K = 17.5K. Roughly an 82% joint reduction, dollar-for-dollar.
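The same arithmetic as a small function, so you can plug in your own split between command tokens and context tokens. The default rates are the 80% and 85% figures used in this example; the function and parameter names are hypothetical.

```python
def joint_reduction(command_tokens, context_tokens,
                    command_savings=0.80, context_savings=0.85):
    """Return (tokens remaining, overall fraction saved) for one session."""
    remaining = (command_tokens * (1 - command_savings)
                 + context_tokens * (1 - context_savings))
    total = command_tokens + context_tokens
    return remaining, 1 - remaining / total


remaining, saved = joint_reduction(50_000, 50_000)
print(f"{remaining:,.0f} tokens left, {saved:.1%} saved")
```

For the 50K/50K split this prints `17,500 tokens left, 82.5% saved`. The overall figure is a weighted average of the two rates, so it shifts with how your sessions divide between command output and context.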

Setup: 90 seconds for both

rtk

```bash
brew install rtk   # or curl install: see rtk-ai.app
rtk init -g        # installs Claude Code hook
# restart Claude Code
```

That's it. Your next `git status` will route through rtk transparently. The hook only fires on Bash tool calls. The built-in Read, Grep, and Glob tools don't pass through it, so call `rtk read` / `rtk grep` directly when you want their output compressed.

gotcontext

Add to your Claude Code MCP config (~/.claude/claude_desktop_config.json):

```json
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "headers": {
        "Authorization": "Bearer gc_live_YOUR_KEY"
      }
    }
  }
}
```

Get a key at gotcontext.ai/sign-up. Free tier covers 1,000 compressions/month, no card required.

Why we're recommending a "competitor"

rtk and gotcontext sit at different layers of the agent's stack. rtk's hook intercepts the Bash tool; gotcontext serves MCP tools. We can't compress your git status from the API side because we never see it. rtk can't compress your 200KB design doc because it doesn't have a semantic graph engine.

The token-savings space is big enough that two products can serve it without colliding. What hurts users is choosing one and missing the other half. So: install both.

Operational notes

rtk is local, gotcontext is remote. rtk runs on your machine; no data leaves. gotcontext requires sending content to our API for compression. If your KB is sensitive, decide source by source what you are willing to send off your machine.

`rtk init -g` writes hooks to your Claude Code config. Inspect `~/.claude/settings.json` after install if you want to see exactly what changed.
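One way to do that inspection is to print just the hooks section of the settings file. This sketch assumes hook configuration sits under a top-level `hooks` key in that file; the exact entries rtk writes are not shown in this post, so the function simply returns whatever it finds. The function name is hypothetical.

```python
import json
from pathlib import Path


def read_hooks(settings_path=Path.home() / ".claude" / "settings.json"):
    """Return the `hooks` section of a Claude Code settings file, or {} if absent."""
    path = Path(settings_path)
    if not path.exists():
        return {}
    settings = json.loads(path.read_text(encoding="utf-8"))
    return settings.get("hooks", {})
```

Running `print(json.dumps(read_hooks(), indent=2))` before and after `rtk init -g` and comparing the two outputs shows what the installer added.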

gotcontext's free tier is real: 1,000 compressions/month with no credit card. Pro is $49/mo for 50K compressions + 100+ MCP tools.

Licensing differs. rtk's source is fully open under Apache 2.0; gotcontext is proprietary on the server, but its client SDKs are MIT-licensed.

TL;DR

rtk kills ~80% of your dev-command token bill (git, test, lint, build, ls, cat, grep)

gotcontext kills ~70-85% of your documentation / context-window token bill

They cannot conflict: different layers of the agent stack

Install both for a roughly 80-85% joint reduction (82% in the 50K/50K example above)

90 seconds total setup


Cite this

Researchers, analysts, or journalists referencing this post can use either format below.

BibTeX

@misc{rtk-companion-token-savings-2026,
  title  = {gotcontext + rtk: stack two compression layers, kill 90%+ of your token bill},
  author = {James Hollingsworth},
  year   = {2026},
  month  = {May},
  url    = {https://gotcontext.ai/blog/rtk-companion-token-savings},
  note   = {gotcontext.ai engineering blog.},
}

APA

James Hollingsworth. (2026, May 8). gotcontext + rtk: stack two compression layers, kill 90%+ of your token bill. gotcontext.ai. Retrieved from https://gotcontext.ai/blog/rtk-companion-token-savings.
