Measured savings across 11 LLMs, from Claude Opus 4.7 to Gemini Flash. → See per-model data
Get free API key →
Plans and pricing

Monthly plans for the MCP compression gateway.

A compression is one POST /v1/compress call (or one MCP tool call that wraps it), capped at the per-tier document size below. Five plans — from a free developer tier through reserved-capacity Enterprise. All MCP tools included on every paid plan.

Prices in USD. Annual contracts on Business and above. Procurement artifacts, sub-processor list, and DPA are under Compliance and procurement below.

What compression saves in practice
Pro · $49/mo
~$450/mo in token cost
Claude Sonnet 4 at $3/MTok input
at 60% avg reduction, 5K avg doc
Team · $99/mo
~$900/mo in token cost
Claude Sonnet 4 at $3/MTok input
pooled across unlimited seats
Business · $199/mo
~$4,500/mo in token cost
Claude Sonnet 4 at $3/MTok input
metered overage past 500K limit
Enterprise Dedicated
Custom ROI projection
Any model, any scale
we model your traffic, you see the delta

Estimates multiply each plan's full monthly quota by the live 60% rolling average from /v1/global-savings, the Claude Sonnet 4 list rate ($3/MTok input), and a 5K-token average document. Your actual savings depend on doc mix and model; use the calculator below to project your volume.
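The arithmetic behind the cards above can be reproduced with a short sketch. The function name and parameter names are illustrative and not part of any gotcontext API. The defaults are the figures quoted on this page (60% reduction, 5K-token documents, $3/MTok input), and each plan is assumed to use its full monthly quota.

```python
def monthly_token_savings(compressions, avg_doc_tokens=5_000,
                          reduction=0.60, usd_per_mtok=3.00):
    """Estimated monthly input-token spend avoided, in USD.

    Illustrative helper: the defaults are this page's stated assumptions.
    """
    raw_tokens = compressions * avg_doc_tokens           # tokens sent without compression
    raw_cost = raw_tokens / 1_000_000 * usd_per_mtok     # list-price cost of those tokens
    return raw_cost * reduction                          # share of that cost removed


# Full monthly quota per plan, as listed on this page.
for plan, quota in [("Pro", 50_000), ("Team", 100_000), ("Business", 500_000)]:
    print(f"{plan}: ~${monthly_token_savings(quota):,.0f}/mo")
# Pro: ~$450/mo
# Team: ~$900/mo
# Business: ~$4,500/mo
```

Swapping in your own average document size, reduction rate, or model price gives a rough figure for your traffic; it is an estimate, not a quote.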

Tier estimator

What does your usage cost?

Drag the slider to your expected monthly compression volume. We’ll recommend a tier and project your monthly cost. Real reduction depends on document mix; the calculator gives the floor.

Billing cadence
Monthly volume: 50K (slider range 500 to 5M)
Recommended tier
Pro
Effective monthly cost
$49/mo
Tier limits
Up to 50K compressions / month · 1 MB max doc size · 30 days payload retention · 1 seat.
Start Pro

Numbers are calculator estimates. Real reduction depends on document mix, fidelity, and downstream model.
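As a rough illustration of how a tier recommendation could be derived from volume, the sketch below picks the smallest plan whose included monthly quota covers the expected volume. The quotas and prices come from the plan cards on this page; the function itself is hypothetical and is not the logic of the live calculator. Volumes above 500K return `None`, because the choice between metered Business overage and Enterprise Dedicated is not modeled here.

```python
# (plan name, included compressions per month, list price in USD/mo), from this page.
PLANS = [
    ("Free", 1_000, 0),
    ("Pro", 50_000, 49),
    ("Team", 100_000, 99),
    ("Business", 500_000, 199),
]


def recommend_tier(monthly_compressions):
    """Return (plan, monthly price) for the smallest plan whose quota fits.

    Hypothetical helper. Returns None above the Business quota, where
    metered overage or Enterprise Dedicated would apply.
    """
    for name, quota, price in PLANS:
        if monthly_compressions <= quota:
            return name, price
    return None


print(recommend_tier(50_000))   # ('Pro', 49)
print(recommend_tier(80_000))   # ('Team', 99)
```

This ignores seat counts, document size caps, and compliance features, all of which can push a team to a higher tier than volume alone suggests.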

Live production data from /v1/global-savings · Rolling avg: ~60% token reduction · MCP tools: 142 included on every paid plan · Uptime: status.gotcontext.ai
Billing cadence (Pro and Team)
Free: 1 seat · Pro: 1 seat · Team, Business, Enterprise Dedicated: unlimited seats, flat price
Free
Solo dev · try before you buy
$0/mo

For individuals validating semantic compression on their own inputs.

Start free

Works with Claude Code, Cursor, and any MCP client

Included
  • 1,000 compressions / month — hard-stop at 1,200
  • 100 KB max document size
  • 1 concurrent compression slot
  • 17 core MCP tools (compression + filter_cli + search_semantic)
  • 14-day payload retention
  • Community support · no SLA
Pro
Individual developer · solo AI engineering
$49/mo

For individual developers running up to 50K context compressions per month.

Get Pro — $49/mo
Included
  • 50,000 compressions / month — hard-stop at 60,000
  • 1 MB max document size
  • 2 concurrent compression slots · 60 req/min rate limit
  • All MCP tools (compression, memory, code analysis, multimodal, ACE)
  • Accelerated ONNX embedding tier (3-5× throughput)
  • 30-day payload retention
  • Email support · 2 business-day first response · no SLA
  • Card payment · 14-day refund on monthly · annual saves 20%
Team
Recommended for 10–50 engineers
$99/mo

Shared compression budget across your engineering org. Unlimited seats — no per-seat add-ons.

Get Team — $99/mo
Included
  • 100,000 compressions / month, pooled across unlimited seats · hard-stop at 120,000
  • 5 MB max document size
  • 4 concurrent compression slots · 300 req/min rate limit
  • All MCP tools + async batch queue + compression projects
  • RBAC roles: owner, admin, operator, viewer
  • GitHub integration + advanced analytics + CSV export
  • 90-day payload retention
  • Email support · 1 business-day first response · no SLA
  • Card + ACH payment · 14-day refund on monthly · annual saves 20%
Business
Growth-stage company · compliance + self-hosted
$199/mo

Shared infra with the compliance wraparound. Unlimited seats. 99.5% SLA. Metered overage past 500k.

Get Business — $199/mo
Included
  • 500,000 compressions / month pooled · metered overage $0.50 per 1,000 (auto-billed)
  • 10 MB max document size
  • 8 concurrent compression slots · 500 req/min rate limit
  • SBERT embedding tier (highest semantic fidelity)
  • Self-hosted Docker with the data plane in your VPC (our answer to BYOK requirements)
  • SSO via SAML 2.0 + OIDC (Okta, Entra ID, Auth0, Keycloak)
  • Audit-log export (NDJSON + CSV)
  • 1-year payload retention · zero-retention mode in self-hosted
  • 99.5% monthly SLA with 10/25/50% credit schedule
  • Priority email + Slack-connect · 1 business-day first response
  • Card + ACH + Wire + PO + Invoice · annual invoicing · DPA + IP indemnity + custom MSA
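To make the metered overage concrete, here is a minimal sketch of a Business monthly bill. The base price, included quota, and overage rate are the figures listed above. The function name is illustrative, and pro-rata billing of partial blocks of 1,000 is an assumption; the page does not say whether overage is rounded up to whole blocks.

```python
def business_monthly_cost(compressions, base=199.0,
                          included=500_000, overage_per_1k=0.50):
    """Estimated Business bill in USD for one month.

    Hypothetical helper. Assumes overage is billed pro rata per
    compression rather than rounded up to whole blocks of 1,000.
    """
    extra = max(0, compressions - included)       # compressions beyond the pooled quota
    return base + extra / 1_000 * overage_per_1k


print(business_monthly_cost(400_000))   # 199.0
print(business_monthly_cost(750_000))   # 324.0
```

For example, 750,000 compressions in a month would be 250,000 over the quota, adding $125 of overage to the $199 base under these assumptions.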
Enterprise Dedicated
Fortune 500 · reserved capacity · single-tenant
From $499/mo

Reserved capacity pool — your traffic never shares a process with another customer.

Talk to sales — from $499/mo
Included
  • Everything in Business
  • Single-tenant capacity (4-8 dedicated nodes, your VPC region of choice)
  • 20+ guaranteed concurrent compressions at peak
  • No noisy-neighbor — your queue is isolated from shared traffic
  • Custom rate limits, no /v1/compress throughput cap
  • Configurable payload retention · zero-retention mode available
  • Data residency available on request (US today; EU on roadmap for H2 2026)
  • 99.9% monthly SLA · custom credit schedule · status.gotcontext.ai
  • Dedicated channel + named CSM · 4h P1 first response
  • Quarterly business review with usage + capacity plan
  • On roadmap (H2 2026): cloud BYOK / CMEK · SCIM provisioning · EU + APAC regions

Plans differ on volume and fidelity — not capability. All 142 MCP tools ship on every paid plan: compression, semantic memory, code analysis, multimodal, and ACE workflows.

View all 142 tools →
How it compares

Why not just run LLMLingua?

The obvious question. LLMLingua is free and open source. Here’s what you trade when you self-host vs using a managed MCP gateway.

Comparison: gotcontext vs LLMLingua, Langfuse, and per-token APIs (Cohere/Voyage)
| Dimension | gotcontext | LLMLingua (OSS) | Langfuse ($0–$29) | Cohere/Voyage Compact |
| --- | --- | --- | --- | --- |
| MCP gateway built in | ✓ Native: Claude Code, Cursor, any MCP client | Build it yourself | Not a compression tool | API call, not MCP-native |
| Compression engine | Semantic (ONNX + PageRank). Local, no LLM API call. | Prompt-compression (token-level) | No compression; tracing only | Embedding model reranking |
| Setup time | < 5 min: add MCP server URL to claude_desktop_config.json | Python env, GPU recommended, write integration (see LLMLingua docs) | ~10 min (SDK + API key) | ~5 min (API key + write call) |
| Maintenance burden | Zero: managed infra, version upgrades automatic | Model updates, infra, embedding drift, all on your ops team | Low (managed SaaS) | Low (managed SaaS) |
| Self-hosted option | Business and above, data plane in your VPC | Always self-hosted (that's the product) | $0 self-host or $29/mo cloud | Cloud API only |
| Pricing model | Per-compression flat (not per-token), predictable at scale | Free (your infra cost) | Free tier / $29 team | Per 1M tokens (variable) |

LLMLingua and Langfuse are open source projects we respect. This comparison reflects their architectures, not a claim of superiority — choose what matches your deployment model and team capacity.

Feature comparison by tier

Limits, support, and security across all five plans.

Feature comparison by tier: Free, Pro, Team, Business, and Enterprise Dedicated

| Specification | Free | Pro | Team | Business | Enterprise Dedicated |
| --- | --- | --- | --- | --- | --- |
| Compression | | | | | |
| Monthly compressions | 1,000 | 50,000 | 100,000 | 500,000 | Unlimited (within capacity pool) |
| Max document size | 100 KB | 1 MB | 5 MB | 10 MB | Custom (negotiable) |
| Overage policy | Hard-stop | Hard-stop at 60K | Hard-stop at 120K | Metered $0.50 / 1K | Contractual |
| Batch ingestion | - | Included | Included | Included | Included |
| Async batch queue | - | - | Included | Included | Included |
| Compression projects | - | - | Included | Included | Included |
| Seats, retention, SLA | | | | | |
| Seats | 1 | 1 | Unlimited (pooled quota) | Unlimited (pooled quota) | Unlimited |
| Payload retention | 14 days | 30 days | 90 days | 1 year | Configurable + zero-retention mode |
| Monthly uptime SLA | - | - | - | 99.5% | 99.9% |
| Service-credit schedule | - | - | - | 10 / 25 / 50% | Custom terms |
| Status page | - | - | - | status.gotcontext.ai | status.gotcontext.ai + custom |
| Embeddings | | | | | |
| Standard compression | Included | Included | Included | Included | Included |
| Accelerated compression (3-5x faster) | - | Included | Included | Included | Included |
| Custom embedding models | - | - | - | - | Included |
| Security & control | | | | | |
| API key management | Included | Included | Included | Included | Included |
| API rate limit | 10 req/min | 60 req/min | 300 req/min | 500 req/min | Custom |
| MCP Server tool access | 17 core compression tools | All MCP tools | All MCP tools | All MCP tools | All MCP tools |
| Fidelity Profiles | Included | Included | Included | Included | Included |
| Prompt Cache Audit | - | Included | Included | Included | Included |
| Advanced analytics & CSV export | - | - | Included | Included | Included |
| Teams | - | - | Included | Included | Included |
| Webhooks | - | Included | Included | Included | Included |
| Audit-log export (NDJSON/CSV) | - | - | - | Included | Included |
| SSO via SAML 2.0 + OIDC | - | - | - | Included | Included |
| Self-hosted Docker (data plane in your VPC) | - | - | - | Included | Included |
| BYOK (via self-hosted = your VPC) | - | - | - | Included | Included |
| SCIM provisioning | - | - | - | - | On roadmap H2 2026 |
| Customer-managed encryption keys (cloud) | - | - | - | - | On roadmap H2 2026 |
| Data residency (US / EU / APAC) | US | US | US | US | US (EU + APAC on roadmap) |
| DPA, IP indemnity, custom MSA | - | - | - | Included | Included |
| Support & billing | | | | | |
| Support | Community | Email · 2 BD | Email · 1 BD | Priority email + Slack-connect | Dedicated channel + named CSM |
| P1 first response | - | - | - | 1 business day | 4 hours |
| Payment methods | - | Card | Card + ACH | Card + ACH + Wire + PO + Invoice | Custom (invoice / PO / ACH / Wire) |
| Annual discount | - | 20% (2.4 months free) | 20% (2.4 months free) | Annual invoice only | Custom contract |
| Refund policy | - | 14-day money-back | 14-day money-back | Annual prorated within 30 days | Per contract |
| Platform | | | | | |
| Command Palette (Cmd+K) | Included | Included | Included | Included | Included |
| Activity Feed | Included | Included | Included | Included | Included |
| Dark/Light Theme | Included | Included | Included | Included | Included |
| CSV Export | Included | Included | Included | Included | Included |
| Queue Monitor (real-time SSE) | - | Included | Included | Included | Included |
| Webhook Notifications | - | Included | Included | Included | Included |
| Usage Analytics | - | Included | Included | Included | Included |
| GitHub Integration | - | - | Included | Included | Included |
| RBAC Roles | - | - | Included | Included | Included |
| Shared Projects | - | - | Included | Included | Included |
| MCP Tool Compression | - | - | Included | Included | Included |
| SSO / SAML | - | - | - | Included | Included |
| Audit Trail | - | - | - | Included | Included |
| Dedicated Support | - | - | - | Included | Included |
| Custom Integrations | - | - | - | Included | Included |
Savings

Project your savings.

The /v1/global-savings number on the landing hero is a rolling average across all production traffic. It is useful as a directional signal, not a projection of your savings. Real reduction depends on your document mix, fidelity choice, and downstream model. Per-model breakdowns (Opus 4.7 vs Gemini Flash vs GPT-5.5) live at /savings-by-model.

Want a number for your own traffic? Use the tier estimator above to project your monthly cost by compression volume — or contact us with 7 days of usage data and we’ll model the monthly delta against your raw token cost across any model.

Compliance and procurement

Built for procurement review

The artifacts a Fortune-500 vendor risk team will ask for, ready before the call.

  • SOC 2

    Type I in progress, target Q3 2026. Not yet certified — stated honestly.

  • DPA

    Available on request — emailed within one business day; GDPR Art. 28 conformant.

  • Sub-processors

    Cloudflare · Fly · Supabase · Upstash · Clerk · Polar · Resend · Sentry · PostHog. Full list with 30-day change notice.

  • Self-hosted Docker

    Business and Enterprise Dedicated. Data plane in your VPC; control plane SaaS. This serves as our answer to BYOK requirements.

  • Audit log export

    NDJSON + CSV; 90-day retention on Business, configurable on Enterprise Dedicated.

  • Status page

    status.gotcontext.ai with 90-day rolling uptime per component. Required reading before signing any SLA tier.

  • Liability cap

    Negotiable on annual contracts; default capped at 12 months of fees in MSA template.

  • On roadmap (H2 2026)

    SCIM provisioning · cloud BYOK / CMEK · EU + APAC data residency · SOC 2 Type II close.

Frequently asked questions

Answers before the call

Anything not covered here? Use the contact form below.

One compression is one POST /v1/compress request (or one MCP tool call that wraps it), capped at the per-tier document size: 100 KB Free, 1 MB Pro, 5 MB Team, 10 MB Business, custom on Enterprise Dedicated. A single 30 KB design doc, a 500 KB GitHub diff, and a 2 MB transcript all count as one compression each — regardless of how many tokens are saved.
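The counting rule above can be sketched as follows: each document is one compression, and the only per-document constraint is the tier's size cap. The helper name is hypothetical, and treating 1 KB as 1,000 bytes is an assumption; the page does not say whether limits are decimal or binary.

```python
KB = 1_000          # assumption: decimal units; the page does not specify
MB = 1_000 * KB

# Per-tier document size caps as listed on this page.
SIZE_CAPS = [
    ("Free", 100 * KB),
    ("Pro", 1 * MB),
    ("Team", 5 * MB),
    ("Business", 10 * MB),
]


def smallest_tier_for(doc_bytes):
    """Return the lowest tier whose size cap accepts a document (hypothetical helper)."""
    for tier, cap in SIZE_CAPS:
        if doc_bytes <= cap:
            return tier
    return "Enterprise Dedicated"   # custom cap, negotiated per contract


docs = {"design doc": 30 * KB, "GitHub diff": 500 * KB, "transcript": 2 * MB}
for name, size in docs.items():
    print(f"{name}: 1 compression, fits {smallest_tier_for(size)} and above")
# design doc: 1 compression, fits Free and above
# GitHub diff: 1 compression, fits Pro and above
# transcript: 1 compression, fits Team and above

print(len(docs))   # 3 compressions used in total
```

Token savings do not enter the count at all: the three documents above consume three compressions whether they shrink by 20% or 80%.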

Contact sales

Enterprise volume and self-hosted

Compliance reviews, custom SLAs, dedicated capacity, on-prem deployments. Tell us about your use case and we’ll respond within one business day.


© 2026 gotcontext.ai · Effective May 2026 · Security · Changelog · Docs