
Connect via MCP (recommended)

The MCP server at https://api.gotcontext.ai/mcp gives your AI agent 148 compression, ingestion, and context-management tools without running anything locally. Steps 1, 4, and 5 get you connected and seeing compression in about two minutes. Steps 2 and 3 (a dedicated project) are optional: add them whenever you want per-project budgets and usage attribution. Until then, usage rolls up to your auto-created Default project.

Free plans include 17 compression tools (compression, advisory, budget awareness) and 1,000 compressions/month for validation. Pro, Team, and Enterprise all include the same 148 MCP tools, including ACE (Agent Context Engineering), knowledge management, multimodal ingestion, quality detection, memory, prompt cache, connectors, handoffs, and experiments. Tiers differ on monthly compression volume, embedding fidelity, and enterprise wraparound (self-hosted Docker, OIDC/SSO, audit-log export, dedicated SLA, named support, custom contract), not on which tools you can call.

Setup (5 steps)

1. Get a free API key

Sign in and create a gc_-prefixed key from your dashboard. The free tier includes 17 tools and 1,000 compressions per month, enough to validate the workflow before upgrading.

2. (Optional) Mint a dedicated project for this workspace

Without a project, all traffic attributes to your Default rollup alongside test fixtures and unrelated sessions, making per-project budgets and usage stats meaningless. Create a project from the Projects page or call the MCP tool directly from inside Claude Code:

create_project(name="my-repo", description="Compression project for my-repo")
# Returns: { project_id: "abc123", name: "my-repo" }

3. (Optional) Bind your key to that project

Go to Settings → API Keys and use the inline rebinder to assign the key to your new project. The plan cache has a 5-minute TTL, so allow up to 5 minutes for the change to propagate. After that, per-project compression counts increment on the new project.

4. Configure the MCP server

Run the one command that matches your CLI. It prompts for your gc_ key; alternatively, pass the key via --key gc_... or the GOTCONTEXT_API_KEY environment variable. Restart your CLI afterwards.

npx gotcontext wrap claude
npx gotcontext wrap codex
npx gotcontext wrap gemini

Run npx gotcontext doctor at any time to see which CLIs are detected and configured.

Or configure manually (paste JSON into your client config)

Choose your client from the snippets below and paste your gc_ key in place of gc_your_key_here. For Claude Code, the key must be available as a shell environment variable. The .mcp.json substitution reads from the shell at session start, not from .env.local. Run export GOTCONTEXT_KEY=gc_... before launching Claude Code.

The snippets default to ?profile=core: 7 essential tools at ~2,000 tokens, so a session starts lean. Swap to ?profile=full (or drop the query parameter) for the complete tool catalog at ~38,000 tokens.
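
As an alternative to editing JSON by hand, the same entry can be generated from the environment. The following is a minimal Python sketch, not part of the gotcontext CLI: the helper name build_mcp_config is hypothetical, and it assumes the key is exported as GOTCONTEXT_KEY and that your client accepts the mcpServers layout used in the manual snippets.

```python
import json
import os
from urllib.parse import urlencode

MCP_URL = "https://api.gotcontext.ai/mcp"


def build_mcp_config(api_key: str, profile: str = "core") -> dict:
    """Return an mcpServers entry for the given key and tool profile.

    Hypothetical helper: the layout mirrors the manual JSON snippets.
    """
    if not api_key.startswith("gc_"):
        raise ValueError("expected a gc_-prefixed API key")
    if profile not in ("core", "full"):
        raise ValueError("profile must be 'core' or 'full'")
    url = f"{MCP_URL}?{urlencode({'profile': profile})}"
    return {
        "mcpServers": {
            "gotcontext": {
                "url": url,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # Falls back to the placeholder so the script still prints a template.
    key = os.environ.get("GOTCONTEXT_KEY", "gc_your_key_here")
    print(json.dumps(build_mcp_config(key), indent=2))
```

Passing profile="full" produces the URL for the complete catalog; everything else stays the same.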

5. Run your first compression: see the savings

You are connected. Now ask your agent to compress something verbose: a git diff, a pytest -v run, or a large file. With filter_cli_output you typically see 50 to 60% fewer tokens on real output. Then call project_stats to confirm that the usage was attributed to your project, not to Default.

filter_cli_output(text="<paste a git diff or pytest -v run>")
# -> compressed text + tokens_saved + savings_pct (typically 50-60% on verbose output)

project_stats()
# -> { project_name: "my-repo", compressions_this_month: 1, ... }

Manual JSON config (alternative to the CLI)

Claude Code

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}

Cursor

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}

VS Code (settings.json)

{
  "mcp": {
    "servers": {
      "gotcontext": {
        "url": "https://api.gotcontext.ai/mcp?profile=core",
        "headers": {
          "Authorization": "Bearer gc_your_key_here"
        }
      }
    }
  }
}

Gemini CLI (settings.json)

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "type": "http",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      },
      "timeout": 30000
    }
  }
}

Authentication

All MCP connections require a gc_-prefixed API key passed in the Authorization header. Create one from your dashboard.

For custom MCP clients

The MCP endpoint uses Streamable HTTP transport. Requests must include Accept: application/json, text/event-stream and carry the Mcp-Session-Id header from the initialize response on all subsequent calls. Claude Code, Cursor, and VS Code handle this automatically.
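
For a custom client, the two requirements above come down to a fixed set of headers plus one value carried over from the initialize response. The sketch below uses only the Python standard library. The helper names are hypothetical, and the protocolVersion and clientInfo values are assumptions; use whatever your client actually supports.

```python
import json
import urllib.request

MCP_URL = "https://api.gotcontext.ai/mcp"


def base_headers(api_key: str) -> dict:
    """Headers required on every Streamable HTTP request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }


def initialize_payload(client_name: str = "example-client") -> dict:
    """JSON-RPC initialize request (protocolVersion is an assumption)."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }


def session_headers(api_key: str, session_id: str) -> dict:
    """Headers for every call after initialize: base set plus the session id."""
    headers = base_headers(api_key)
    headers["Mcp-Session-Id"] = session_id
    return headers


def open_session(api_key: str) -> str:
    """POST initialize and return the Mcp-Session-Id (makes a network call)."""
    req = urllib.request.Request(
        MCP_URL,
        data=json.dumps(initialize_payload()).encode("utf-8"),
        headers=base_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        session_id = resp.headers.get("Mcp-Session-Id")
    if not session_id:
        raise RuntimeError("server did not return an Mcp-Session-Id header")
    return session_id
```

A client would call open_session once, then pass session_headers(key, session_id) on each later request.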

Project instructions file (CLAUDE.md / AGENTS.md)

Add a CLAUDE.md (or AGENTS.md) to your project root so the AI knows when and how to use gotcontext compression. Without this, the AI may not use the tools effectively. Copy this starter:

# gotcontext.ai Compression

This project uses gotcontext.ai for semantic compression via MCP.

## When to compress
- Before sending large files or docs to the AI context window
- When terminal output is verbose (git diff, test results, logs)
- When reviewing code across many files
- Before reviewing a PR or explaining a diff: compress the changed
  files or run `gc_blast_radius` to see only transitively-touched code

## Compression workflow
1. Use `ingest_context` to add a document (give it a unique file_id)
2. Use `read_skeleton` to get an adaptive structural skeleton.
   Compression adapts to size: small/medium docs stay faithful (most
   sections kept, with meaningful savings); large docs compress hard.
   Drill into any referenced section with `modulate_region`.
3. For a targeted read, pass `selection_mode="evidence_aware"` + a
   `query` (and optional `top_k`) to anchor the relevant sections.
4. Use `search_semantic` to find specific sections by query.
5. Use `filter_cli_output` to compress git diffs, pytest output, etc.

## Code understanding (Pro+)
- `compress_codebase`: AST-aware digest of an entire repo; function
  and class signatures only, bodies stripped
- `gc_blast_radius`: ranked context for a focus symbol: tensor-grep
  blast-radius + BM25 fusion. Best for PR review and bug triage
- `gc_compress_manifest`: compress an MCP tools/list response so
  downstream agents see shorter tool descriptions without losing
  inputSchema semantics (v1.8.0+)
- `batch_ingest_documents`: submit up to 50 docs as one async job;
  poll status via `GET /v1/batch-queue/{id}`

## Tips
- Use `estimate_tokens` first to see if compression is worthwhile
- For code files, the compressor understands function/class boundaries
- Use `get_compression_presets` to see available fidelity levels
- Call `tool_help` for documentation on any specific tool

When to compress

The recommended per-call decision loop for any file or output you are about to pass to the model:

1. Check whether compression is worthwhile

For any file or output larger than roughly 6,000 bytes (about 1,500 tokens), call estimate_tokens first. If the estimate comes back below ~1,500 tokens, send the content as-is: the compression overhead is not worth it.
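
The two thresholds can be applied locally before any tool call. This is a small Python sketch of that rule of thumb; the function names are hypothetical and the cut-offs are simply the figures quoted above, not values read from the service.

```python
BYTE_THRESHOLD = 6_000   # roughly 1,500 tokens, per the rule of thumb
TOKEN_THRESHOLD = 1_500


def worth_estimating(text: str) -> bool:
    """Cheap local check: is the payload big enough to call estimate_tokens?"""
    return len(text.encode("utf-8")) > BYTE_THRESHOLD


def worth_compressing(estimated_tokens: int) -> bool:
    """Apply the token threshold to the number returned by estimate_tokens."""
    return estimated_tokens > TOKEN_THRESHOLD
```

Measuring bytes rather than characters keeps the check consistent for non-ASCII text, where one character can occupy several bytes.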

2. Call gc_pre_flight first — it routes you to the right tool

gc_pre_flight is the recommended entry point before any gotcontext operation. It returns a verdict (one of four actions) and a mode field that tells you which gotcontext tool to reach for next: scout (use read_skeleton / search_semantic), compress (ingest + read_skeleton), read or write (KB operations), or idle (nothing needed).

gc_pre_flight()
# verdict (what to do):
#   send_as_is:         context is small; no action needed
#   send_compressed:    ingest + read_skeleton before sending
#   warn_context_limit: approaching limit; compress or summarize
#   clear_first:        context is saturated; clear before proceeding
# mode (which tool to reach for next):
#   scout    → read_skeleton / search_semantic
#   compress → ingest + read_skeleton
#   read     → gc_kb_get / gc_kb_query
#   write    → gc_kb_ingest / gc_kb_edit
#   idle     → nothing needed
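
The verdict and mode tables translate naturally into a lookup. The Python sketch below assumes the gc_pre_flight result arrives as a dict with "verdict" and "mode" keys, which is a guess at the response shape. Reading "ingest" as ingest_context is likewise an interpretation of the table, and next_step is a hypothetical helper.

```python
from typing import List, Tuple

VERDICT_ACTIONS = {
    "send_as_is": "context is small; no action needed",
    "send_compressed": "ingest + read_skeleton before sending",
    "warn_context_limit": "approaching limit; compress or summarize",
    "clear_first": "context is saturated; clear before proceeding",
}

MODE_TOOLS = {
    "scout": ["read_skeleton", "search_semantic"],
    "compress": ["ingest_context", "read_skeleton"],
    "read": ["gc_kb_get", "gc_kb_query"],
    "write": ["gc_kb_ingest", "gc_kb_edit"],
    "idle": [],
}


def next_step(pre_flight: dict) -> Tuple[str, List[str]]:
    """Map a pre-flight result to (what to do, which tools to reach for)."""
    verdict = pre_flight.get("verdict")
    mode = pre_flight.get("mode", "idle")
    if verdict not in VERDICT_ACTIONS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if mode not in MODE_TOOLS:
        raise ValueError(f"unknown mode: {mode!r}")
    return VERDICT_ACTIONS[verdict], MODE_TOOLS[mode]
```

Raising on unknown values, rather than defaulting silently, makes it obvious when the server adds a verdict or mode the client does not yet handle.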

3. Compress if recommended

If the verdict is send_compressed:

ingest_context(file_id="my-doc", content="...")
read_skeleton(doc_id="...")
# Use the adaptive skeleton in your prompt instead of the raw content.
# Drill into any referenced section with modulate_region, or pass a
# selection_mode="evidence_aware" + query for a targeted read.

For verbose CLI output

Pipe pytest output, fly logs, or git diff through filter_cli_output before passing it to the model. The result is typically 70 to 90% smaller, with the failure signal preserved.

For code review questions

When asking “what does changing X affect?” or reviewing a diff, call gc_blast_radius with the focus symbol. It returns ranked context (callsites, callers, transitively-touched code) without you reading every file manually.

Common pitfalls

Key bound to the wrong project

Per-project budget alerts fire against the project the key is bound to. If your key is bound to Default (or to a different project), every compression call increments the wrong counter, budget thresholds trigger at the wrong time, and per-project usage charts show nothing. Rebind via Settings → API Keys.

Key with no project binding (project_id NULL)

Legacy keys minted before the per-project update carry a project_id of null and fall back to the user-scoped Default rollup. All traffic appears under Default, polluting that project’s stats. Verify with project_stats(): if project_name returns "Default" but you created a dedicated project, the key needs rebinding.

.mcp.json environment variable not set before launching Claude Code

The .mcp.json substitution reads the shell environment at session start, not from .env.local or any dotenv file. If GOTCONTEXT_KEY is only in .env.local the MCP server will fail to authenticate. Run export GOTCONTEXT_KEY=gc_... in your shell before launching Claude Code, or add it to your shell profile.
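
A pre-launch check can catch this before the session starts. The Python sketch below is a hypothetical helper, not something gotcontext ships; it only inspects the environment it is given and makes no network calls.

```python
import os
from typing import List, Mapping, Optional


def key_problems(env: Optional[Mapping[str, str]] = None,
                 var: str = "GOTCONTEXT_KEY") -> List[str]:
    """Return human-readable problems with the key variable (empty if fine)."""
    env = os.environ if env is None else env
    value = env.get(var)
    if not value:
        return [f"{var} is not set in this shell; run: export {var}=gc_..."]
    if not value.startswith("gc_"):
        return [f"{var} is set but does not start with gc_"]
    return []


if __name__ == "__main__":
    for problem in key_problems():
        print(problem)
```

Run it from the same shell you launch Claude Code from, since that is the environment the substitution reads.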

Per-project counts not incrementing after rebind

The plan cache has a 5-minute TTL (Upstash). After rebinding a key to a new project, allow up to 5 minutes before project_stats() reflects the new attribution. Counts already attributed to the old project do not retroactively move.
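
Because of the TTL, a script that rebinds a key and then checks attribution needs to wait rather than fail on the first read. A minimal polling sketch in Python follows. It assumes get_stats is a hypothetical callable of yours that wraps project_stats() and returns a dict with a project_name field; the default timeout simply matches the 5-minute figure.

```python
import time
from typing import Callable


def wait_for_attribution(
    get_stats: Callable[[], dict],
    expected_project: str,
    timeout_s: float = 300.0,   # matches the 5-minute plan-cache TTL
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until stats report the expected project, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_stats().get("project_name") == expected_project:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.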

Hitting something not covered here? The full Troubleshooting guide walks through missing tools, 401s, the 421 Invalid Host error, plan gates, rate limits, and self-hosted gotchas, each with the exact fix.

5-Minute Tutorial

Once your MCP client is connected, run this four-step workflow to see gotcontext.ai in action. Each step is a single MCP tool call. Tell your agent to call the tool.

Step 1: Ingest a document

Tell your agent to call ingest_context with a file_id and the document text. The tool stores a compressed index and returns a doc_id.

ingest_context(
  file_id="readme",
  content="# My Project\n...",
  title="Project README"
)
# Returns: { doc_id: "doc_abc123", tokens_before: 1840, tokens_after: 312 }
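
To put the sample return values in perspective, the saving can be worked out directly. The Python sketch below assumes the percentage is tokens saved divided by tokens before, which is the natural reading but is not confirmed as the service's exact formula.

```python
from typing import Tuple


def savings(tokens_before: int, tokens_after: int) -> Tuple[int, float]:
    """Return (tokens saved, percentage saved rounded to one decimal)."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    saved = tokens_before - tokens_after
    return saved, round(100 * saved / tokens_before, 1)


print(savings(1840, 312))  # -> (1528, 83.0)
```

For the sample README, 1,840 tokens shrink to 312, a saving of 1,528 tokens or about 83%.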

Step 2: Read the compressed skeleton

Call read_skeleton with the doc_id from step 1. Compression is adaptive: small and medium documents stay faithful (most sections kept, with meaningful token savings), while large documents compress aggressively for the biggest savings. The skeleton anchors the most important sections and summarises the rest, and compression is reversible: call modulate_region on any summarised node to expand it back to full fidelity on demand. Nothing is discarded; the original is always reachable.

read_skeleton(doc_id="doc_abc123")
# Returns an adaptive structural skeleton: anchored sections (headings,
# key facts, code signatures) plus short summaries for referenced sections.
# Expand any referenced section:
#   modulate_region(node_ids=["doc_abc123_n3"], fidelity_level="DETAILED")

For a targeted read, anchor the sections relevant to a question with selection_mode="evidence_aware" plus a query (and an optional top_k):

read_skeleton(
  doc_id="doc_abc123",
  selection_mode="evidence_aware",
  query="how does authentication work",
  top_k=5
)
# Force-anchors the sections most relevant to the query

Step 3: Search for a specific section

Use search_semantic to find the most relevant chunks without loading the full document. Useful when your context window is tight.

search_semantic(
  query="how does authentication work",
  doc_id="doc_abc123",
  top_k=3
)
# Returns top-3 semantically matching chunks

Step 4: Compress CLI output on the fly

Pipe verbose terminal output through filter_cli_output before it lands in your agent context. Works with git diff, pytest -v, and build logs.

filter_cli_output(
  content=open("pytest_output.txt").read(),
  source="pytest"
)
# Returns a condensed failure summary, typically 70–90% smaller

What you just did: ingested a document, retrieved its semantic skeleton, searched within it, and compressed CLI output, all through your AI agent with no REST calls and no local setup. Run tool_help(tool_name="ingest_context") for inline docs on any tool, or get_compression_presets() to tune fidelity.
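
If you drive the tools from your own harness rather than through a chat agent, the four steps chain as follows. This Python sketch is illustrative only. call_tool is a hypothetical callable you supply (for example, a wrapper around your MCP client), and the assumption that ingest_context returns a dict containing doc_id is taken from the sample output in step 1.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def run_tutorial(call_tool: ToolCaller, readme_text: str, cli_output: str) -> dict:
    """Run the four tutorial steps in order and collect their results."""
    # Step 1: ingest the document and keep the returned doc_id.
    ingested = call_tool("ingest_context", {
        "file_id": "readme",
        "content": readme_text,
        "title": "Project README",
    })
    doc_id = ingested["doc_id"]

    # Step 2: read the adaptive skeleton for that document.
    skeleton = call_tool("read_skeleton", {"doc_id": doc_id})

    # Step 3: search within the same document.
    hits = call_tool("search_semantic", {
        "query": "how does authentication work",
        "doc_id": doc_id,
        "top_k": 3,
    })

    # Step 4: compress CLI output, independent of the ingested document.
    filtered = call_tool("filter_cli_output", {
        "content": cli_output,
        "source": "pytest",
    })
    return {"doc_id": doc_id, "skeleton": skeleton, "hits": hits, "filtered": filtered}
```

Only step 1 feeds later steps (through doc_id); step 4 stands alone and can run at any point.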

Built-in prompts & resources

The gateway also serves MCP prompts and resources, so they appear in your client (Claude Code, Cursor, and others) the moment you connect — no extra setup. Prompts are ready-to-run workflows that chain the tools below; resources are read-only context your agent pulls on demand. Both are available on every plan.

7 workflow prompts

compress-large-file, review-pr-diff, understand-codebase, debug-failing-test, find-then-expand, pre-flight, lookup-framework-docs

3 context resources

gotcontext://catalog/tools, gotcontext://docs/quickstart, gotcontext://savings/global

What's next

  • Recipes: exact tool sequences for PR review, CI output, large-file ingestion, and batch audits.
  • Full Tool Catalog: all 148 tools by category. Run gc_pre_flight() to see which tools your plan includes.
  • Fidelity Profiles: pick a compression level (skeleton through verbatim) per session or per call.
  • Troubleshooting: missing tools, 401s, 421 Invalid Host, plan gates, rate limits, self-hosted fixes.

MCP Tool Catalog

The MCP gateway exposes 148 tools in two profiles. Pass ?profile=core to your MCP URL for a lean 7-tool set (fastest tools/list response, recommended for bandwidth-constrained clients), or ?profile=full (default) for all 148. Use tool_help(tool_name="X") at runtime to get the full parameter schema for any tool without leaving your agent session.

Ingest & Read

  • ingest_context: store + compress a document
  • read_skeleton: get the compressed outline
  • batch_ingest_documents: async bulk ingest (up to 50)
  • ingest_multimodal: PDF, image, audio ingestion
  • refresh_document: re-ingest when source changes

Search & Retrieve

  • search_semantic: embedding-based chunk search
  • search_code: BM25 + AST-aware code search
  • search_memory: retrieve from agent memory
  • get_context_block: fetch a specific chunk by id
  • list_documents: enumerate ingested docs

CLI & Output Filters

  • filter_cli_output: compress git diff, pytest, logs
  • compress_codebase: AST-aware repo digest
  • gc_blast_radius: ranked context for a symbol
  • gc_compress_manifest: shrink MCP tools/list payload
  • estimate_tokens: count tokens before compressing

Context & Memory

  • add_memory: persist a fact across sessions
  • check_budget: context-window utilization check
  • adapt_to_context_window: auto-trim to fit model limit
  • advise_context: recommend compression vs clear
  • gc_pre_flight: call this first — returns a verdict + a mode field (scout/compress/read/write/idle) telling you which tool to reach for next

Knowledge Hub

  • gc_kb_ingest: add a file/URL to your KB
  • gc_kb_query: semantic search across KB
  • gc_kb_get: retrieve a KB document
  • gc_kb_list: list KB items in a project
  • gc_kb_diff: compare two KB document versions

Free Tier (no API key required)

  • gc_lookup: fetch live framework docs (Next.js, FastAPI, React…)
  • tool_help: inline parameter docs for any tool
  • get_compression_presets: list fidelity levels
  • check_environment: verify connectivity and plan
  • estimate_tokens: count tokens (no compression charged)

The full 148-tool list with parameter schemas is available in the OpenAPI spec and via the A2A agent card at /.well-known/agent.json.

REST quickstart

Get your API key from the dashboard, then make your first compression call:

curl -X POST https://api.gotcontext.ai/v1/demo/compress \
  -H 'Content-Type: application/json' \
  -d '{"text":"gotcontext.ai is a semantic compression API for large-language-model context windows. It reduces token usage by 80–90% on medium-to-large documents through graph-based PageRank analysis, without losing the meaning that drives accurate model responses.\n\nArchitecture overview\n\nThe core pipeline has four stages:\n\n1. Chunking. The document is split into overlapping windows of 200–400 tokens. Window size is configurable; the default balances granularity against embedding cost.\n\n2. Embedding. Each chunk is encoded into a high-dimensional vector using an ONNX-exported sentence-transformer model (all-MiniLM-L6-v2 by default; Pro/Team/Enterprise tiers use accelerated ONNX with INT8 quantisation at 3–5x throughput). Embeddings run fully in-process — no external embedding API call is made, which keeps latency under 90 ms end-to-end for most documents.\n\n3. Graph construction and PageRank. A similarity graph is built where each chunk is a node and edges are drawn when the cosine similarity exceeds a configurable threshold (default: 0.35). The graph is then scored with a damped PageRank (damping factor 0.85). High-rank chunks are the semantic backbone of the document.\n\n4. Skeleton assembly. Chunks are sorted by PageRank score. The top K chunks — where K is determined by the requested fidelity level — are concatenated in original document order (not score order, which preserves narrative flow). The result is a compressed skeleton.\n\nFidelity levels\n\ngotcontext supports five named fidelity levels:\n\n- abstract: retains ~5% of chunks. Keeps only the highest-PageRank semantic backbone. Use for fast fact-retrieval where reasoning across the full document is not required.\n- outline: retains ~10% of chunks. Preserves top-level structure and key claims. Good for getting a structural overview before diving into sections.\n- balanced (default): retains ~20% of chunks. 
The recommended starting point for most documents — strong compression while keeping enough context for accurate model responses.\n- detailed: retains ~40% of chunks. Recommended for legal, medical, or compliance documents where missing a clause is costly.\n- raw: returns the original document unchanged. Use when you want the token-count and cost-estimate analytics without applying compression.\n\nAPI surface\n\nPOST /v1/compress is the primary endpoint. It accepts a JSON body with:\n\n- text (required): the document string. Maximum size depends on plan: 100 KB free, 1 MB Pro, 5 MB Team, 10 MB Enterprise.\n- fidelity (optional, default \"balanced\"): one of the four levels above.\n- model (optional): the target LLM model name, used only for cost estimation in the response stats. Does not change compression behaviour.\n- output_style (optional, v1.4.0+): \"prose\" | \"bullets\" | \"structured\". Controls the skeleton format. \"prose\" stitches chunks with light connectors; \"bullets\" prefixes each chunk with a dash; \"structured\" emits a JSON object with section labels.\n\nThe response body includes:\n\n- compressed: the compressed skeleton string.\n- stats.original_tokens: token count of the input.\n- stats.compressed_tokens: token count of the skeleton.\n- stats.tokens_saved: the difference.\n- stats.savings_pct: percentage reduction (0–100).\n- stats.estimated_cost_saved_usd: dollar estimate at the model's published input price, or at Opus 4.7 rates ($5/MTok input) when no model is specified.\n\nMCP integration\n\ngotcontext exposes a Streamable-HTTP MCP server at https://api.gotcontext.ai/mcp. This lets Claude Code, Cursor, Windsurf, Gemini CLI, and OpenAI Codex CLI call gotcontext compression directly as a tool — the LLM reads a long document, routes it through gotcontext, and continues reasoning on the compressed skeleton. 
The round-trip latency is below the tool-call overhead in all three clients.\n\nTool plan gating: the core compress tool is available on all plans. gc_blast_radius (structural code analysis via tensor-grep BM25) and gc_compress_manifest (MCP tool-schema compression, new in v1.8.0) are Pro+ tools.\n\nAuthentication\n\nThree auth modes are supported:\n\n- gc_ API key: HMAC-signed key created from the dashboard. Pass as Authorization: Bearer gc_<key>. Rate limits apply per key.\n- Clerk JWT: used by the dashboard and MCP server. The session token issued by Clerk is accepted on every /v1/* route.\n- Polar license (self-hosted): Ed25519-signed license key validated locally by the self-hosted binary. Metering events are batched and reported asynchronously.\n\nPrompt-cache integration\n\nFrom v1.1.0, gotcontext is aware of provider prompt-cache semantics. When a document has been compressed before with identical fidelity and the cached embedding is still valid, the response includes X-Cache-Hit: true and the latency drops to under 10 ms (cache read only, no embedding pass). The /v1/usage/by-cache endpoint breaks down savings into compression-only and cache-adjusted figures, which the dashboard Cache-Adjusted Savings widget visualises.","fidelity":"balanced"}'

curl -X POST https://api.gotcontext.ai/v1/compress \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document text here...", "fidelity": "balanced"}'
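
The authenticated call can also be made without curl or an SDK. Here is a minimal Python sketch using only the standard library. The helper names are hypothetical, and the response is assumed to be the JSON object with compressed and stats fields that the SDK examples on this page destructure.

```python
import json
import urllib.request

API_URL = "https://api.gotcontext.ai/v1/compress"


def build_compress_request(api_key: str, text: str,
                           fidelity: str = "balanced") -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"text": text, "fidelity": fidelity}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def compress(api_key: str, text: str, fidelity: str = "balanced") -> dict:
    """Send the request and return the parsed JSON body (network call)."""
    req = build_compress_request(api_key, text, fidelity)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Splitting request construction from sending makes the payload easy to inspect or log before anything goes over the wire.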

Authentication

All API requests require a Bearer token in the Authorization header. Two token types are supported:

API Keys (recommended)

Prefixed with gc_. Create keys in the dashboard or via POST /v1/keys. Keys remain valid until revoked and can be rotated at any time.

Authorization: Bearer gc_a1b2c3d4e5f6...

Clerk JWT (session tokens)

Short-lived tokens issued by Clerk after sign-in. Used automatically by the dashboard frontend. For programmatic access, API keys are preferred.

Authorization: Bearer eyJhbGciOi...

SDKs & Plugins

Pre-built clients wrap the REST API so you don't need raw fetch() calls. All clients return the same response shape as the REST API.

TypeScript / JavaScript

@gotcontext/sdk: published to npm. Zero runtime dependencies.

npm install @gotcontext/sdk

import { GotContextClient } from "@gotcontext/sdk";
const gc = new GotContextClient({ apiKey: "gc_your_key_here" });
const { compressed, stats } = await gc.compress({ text: "...", fidelity: "balanced" });

Python

gotcontext: published to PyPI.

pip install gotcontext

from gotcontext import GotContext
gc = GotContext(api_key="gc_your_key_here")
result = gc.compress(text="...", fidelity="balanced")
print(result.stats.savings_pct)

Claude Code Plugin

One command installs the gotcontext plugin: pre-wired MCP config plus 9 outcome-oriented skills (shrink-for-claude, ingest-docs, review-pr-diff, extract-api-surface, batch-compress, session-summary, pre-flight, compress-mcp-manifest, quick-start).

/plugin marketplace add oimiragieo/gotcontext-plugin

Agent-to-Agent (A2A) Discovery

Agent frameworks can autodiscover all 148 MCP tools from the Linux Foundation Agent2Agent v1.0 card, no human required.

GET https://api.gotcontext.ai/.well-known/agent.json
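
A framework-free way to inspect the card is to fetch it and list what it advertises. The Python sketch below assumes the card follows the usual A2A layout with a top-level skills array whose entries carry id and name fields; whether gotcontext lists each MCP tool as a skill is an assumption, and both helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

AGENT_CARD_URL = "https://api.gotcontext.ai/.well-known/agent.json"


def skill_names(card: dict) -> List[str]:
    """Return the name (or id, if unnamed) of each skill on the card."""
    return [s.get("name") or s.get("id", "") for s in card.get("skills", [])]


def fetch_agent_card(url: str = AGENT_CARD_URL) -> dict:
    """Download and parse the agent card (network call)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for name in skill_names(fetch_agent_card()):
        print(name)
```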

For machine-readable product metadata and alternatives comparison, see llms.txt, OpenAPI, and /compare.

Where to next

  • Quickstart: connect your MCP client and run your first compression in under two minutes.
  • Recipes: copy-paste tool sequences for the workflows you'll run most.
  • Troubleshooting: symptom-to-fix for the errors people actually hit.
  • Glossary: plain-language definitions for skeleton, fidelity, profiles, and the rest.