Contributor

James Hollingsworth

Writes about compression internals, model evals, cost economics, and the operational reality of running a token-priced API at scale.

49 published posts

2026-07-08 · Token-level compression breaks agents at any ratio · Research · 6 min read
2026-07-08 · Compressing prompts harder made them more expensive · Research · 6 min read
2026-07-08 · When to compact matters more than how much · Research · 6 min read
2026-06-13 · Smaller code context, higher fix rate · Research · 6 min read
2026-06-13 · A smaller context made the agent more accurate · Research · 6 min read
2026-06-01 · Perfect retrieval isn't enough: the hidden cost of long LLM context · Research · 7 min read
2026-05-29 · How to Reduce Claude Code Token Costs · Tutorial · 9 min
2026-05-20 · One Query Parameter Cut 19% of Our Claude Code Context Burn · Engineering · 7 min
2026-05-20 · Why Google Killed CNET's AI Article Strategy (And What We're Doing Instead) · Research · 11 min
2026-05-20 · MCP Token Costs: A Quantitative Breakdown · Cost · 9 min
2026-05-20 · We Compressed a 3,448-Line MCP Gateway to a Searchable Symbol Map · Engineering · 10 min
2026-05-19 · Resend bounce handling, PostHog funnel instrumentation, and an AST fence against billing drift · Engineering · 8 min
2026-05-09 · How we measure quality at 5× compression: corpora, judge prompts, and anti-bias controls · Research · 12 min
2026-05-09 · gotcontext + TokenSpeed: stack input compression with TRT-LLM-class inference for self-hosters · Engineering · 6 min
2026-05-08 · gotcontext + rtk: stack two compression layers, kill 90%+ of your token bill · Engineering · 6 min
2026-05-08 · Your LLM Gets Measurably Worse as the Conversation Grows. All of Them Do. · Research · 5 min
2026-05-08 · You Can Cut Chain-of-Thought Token Costs ~66% With One Prompt Change · Engineering · 5 min
2026-05-08 · Why Long Agent Sessions Fall Apart (And the Paper That Explains It) · Engineering · 6 min
2026-05-08 · What Chunking Strategy Actually Matters for RAG Quality · Research · 6 min
2026-05-08 · Vision Tokens Are Expensive and Nobody Reads the Pricing Page · Cost · 5 min
2026-05-08 · The Batch API Playbook: 50% Off for Workloads That Can Wait · Cost · 5 min
2026-05-08 · The 1,000x Token Multiplier: What Agentic AI Actually Costs · Cost · 6 min
2026-05-08 · Speculative Decoding: The Throughput Trick That Changes What Fast Means · Engineering · 6 min
2026-05-08 · Semantic Caching Can Cut LLM API Calls by 68.8%. But the Threshold Is Everything · Research · 5 min
2026-05-08 · Same Prompt, Different Token Count: The Hidden Cost of Switching Providers · Engineering · 5 min
2026-05-08 · RAG Is 60-65% Cheaper Than Long Context. But Only If Your Retrieval Is Precise · Cost · 7 min
2026-05-08 · Prompt Caching: Anthropic vs OpenAI vs Google. The Mechanics That Actually Determine Your Bill · Cost · 6 min
2026-05-08 · Output Tokens Cost 5x More Than Input. And Most Teams Budget as If They Don't · Cost · 5 min
2026-05-08 · NVIDIA's kvpress Library Puts 30 KV Cache Compression Methods Behind One API · Engineering · 6 min
2026-05-08 · Model Routing: How to Use Frontier Models at 24% of Their Cost · Engineering · 6 min
2026-05-08 · KV Cache Compression: A Field Guide for Practitioners · Engineering · 7 min
2026-05-08 · How to Measure Context Waste Before It Becomes a Cost Problem · Engineering · 7 min
2026-05-08 · GraphRAG vs. Vector RAG: When Relationships Beat Similarity · Engineering · 6 min
2026-05-08 · Fine-Tuning Costs 100x More Than Few-Shot Prompting and Rarely Wins · Cost · 6 min
2026-05-08 · 97% of MCP Tool Descriptions Have Quality Problems. And Your Agent Pays for It · Engineering · 5 min
2026-04-23 · Claude Opus 4.7 Quietly Costs 35% More. Here's How to Claw It Back · Cost · 5 min
2026-04-14 · How to Reduce LLM Token Costs by 85% · Tutorial · 8 min
2026-04-10 · Context Window Optimization: Beyond Naive Truncation · Engineering · 6 min
2026-04-07 · Connect gotcontext.ai to Claude Code in 30 Seconds · Tutorial · 4 min
2026-03-28 · Token budgets that survive a model swap · Research · 7 min
2026-03-22 · Streaming compression and 4x faster cold starts · Engineering · 6 min
2026-03-15 · How a Linear-style team cut their RAG context bill 71% with semantic dedupe · Customer story · 9 min
2026-03-08 · A taxonomy of bad chunking, with examples · Engineering · 8 min
2026-03-01 · Why we replaced our vector DB with a SQLite extension · Engineering · 11 min
2026-02-22 · Benchmarking compression loss across 14 retrieval tasks · Research · 10 min
2026-02-14 · Per-route compression policies · Engineering · 5 min
2026-02-07 · Field notes from a 4M-token customer support corpus · Customer story · 12 min
2026-01-28 · The OpenTelemetry trace we wish every LLM provider emitted · Engineering · 7 min
2026-01-19 · A short defense of writing your own eval harness · Research · 6 min

Reach

For procurement, pricing, or enterprise compliance reviews, see /pricing#enterprise-contact. For sales: sales@gotcontext.ai. For everything else: hello@gotcontext.ai.