James Hollingsworth
Last updated · May 14, 2026
Contributor
42 published posts on the gotcontext.ai blog.
§01
About
Writes about compression internals, model evals, cost economics, and the operational reality of running a token-priced API at scale.
§02
All posts (42)
- 2026-05-20 · Why Google Killed CNET's AI Article Strategy (And What We're Doing Instead) · Research · 11 min
- 2026-05-19 · Resend bounce handling, PostHog funnel instrumentation, and an AST fence against billing drift · Engineering · 8 min
- 2026-05-09 · How we measure quality at 5× compression: corpora, judge prompts, and anti-bias controls · Research · 12 min
- 2026-05-09 · gotcontext + TokenSpeed: stack input compression with TRT-LLM-class inference for self-hosters · Engineering · 6 min
- 2026-05-08 · gotcontext + rtk: stack two compression layers, kill 90%+ of your token bill · Engineering · 6 min
- 2026-05-08 · Speculative Decoding: The Throughput Trick That Changes What Fast Means · Engineering · 6 min
- 2026-05-08 · Semantic Caching Can Cut LLM API Calls by 68.8% — But the Threshold Is Everything · Research · 5 min
- 2026-05-08 · Same Prompt, Different Token Count: The Hidden Cost of Switching Providers · Engineering · 5 min
- 2026-05-08 · RAG Is 60-65% Cheaper Than Long Context — But Only If Your Retrieval Is Precise · Cost · 7 min
- 2026-05-08 · Output Tokens Cost 5x More Than Input — And Most Teams Budget as If They Don't · Cost · 5 min
- 2026-05-08 · NVIDIA's kvpress Library Puts 30 KV Cache Compression Methods Behind One API · Engineering · 6 min
- 2026-05-08 · 97% of MCP Tool Descriptions Have Quality Problems — and Your Agent Pays for It · Engineering · 5 min
- 2026-03-28 · Token budgets that survive a model swap · Research · 7 min
- 2026-03-22 · Streaming compression and 4x faster cold starts · Engineering · 6 min
- 2026-03-15 · How a Linear-style team cut their RAG context bill 71% with semantic dedupe · Customer story · 9 min
- 2026-03-08 · A taxonomy of bad chunking, with examples · Engineering · 8 min
- 2026-03-01 · Why we replaced our vector DB with a SQLite extension · Engineering · 11 min
- 2026-02-22 · Benchmarking compression loss across 14 retrieval tasks · Research · 10 min
- 2026-02-14 · Per-route compression policies · Engineering · 5 min
- 2026-02-07 · Field notes from a 4M-token customer support corpus · Customer story · 12 min
- 2026-01-28 · The OpenTelemetry trace we wish every LLM provider emitted · Engineering · 7 min
- 2026-01-19 · A short defense of writing your own eval harness · Research · 6 min
§03
Reach
For procurement, pricing, or enterprise compliance reviews, see /pricing#enterprise-contact. For sales: sales@gotcontext.ai. For everything else: hello@gotcontext.ai.