
API Reference


gotcontext.ai compresses text and code before it enters an AI context window — saving up to 95% of input tokens. The REST API is simple: send text, get back a compressed skeleton plus token stats.

Quick Start#

Get your API key from the dashboard, then make your first compression call:

Try POST /v1/compress (no signup required):
{
  "text": "gotcontext.ai is a semantic compression API for large-language-model context windows. It reduces token usage by 80–90% on medium-to-large documents through graph-based PageRank analysis, without losing the meaning that drives accurate model responses.\n\nArchitecture overview\n\nThe core pipeline has four stages:\n\n1. Chunking. The document is split into overlapping windows of 200–400 tokens. Window size is configurable; the default balances granularity against embedding cost.\n\n2. Embedding. Each chunk is encoded into a high-dimensional vector using an ONNX-exported sentence-transformer model (all-MiniLM-L6-v2 by default; Pro/Team/Enterprise tiers use accelerated ONNX with INT8 quantisation at 3–5x throughput). Embeddings run fully in-process — no external embedding API call is made, which keeps latency under 90 ms end-to-end for most documents.\n\n3. Graph construction and PageRank. A similarity graph is built where each chunk is a node and edges are drawn when the cosine similarity exceeds a configurable threshold (default: 0.35). The graph is then scored with a damped PageRank (damping factor 0.85). High-rank chunks are the semantic backbone of the document.\n\n4. Skeleton assembly. Chunks are sorted by PageRank score. The top K chunks — where K is determined by the requested fidelity level — are concatenated in original document order (not score order, which preserves narrative flow). The result is a compressed skeleton.\n\nFidelity levels\n\ngotcontext supports four named fidelity profiles:\n\n- aggressive: retains the top 10% of chunks. Suitable for information retrieval tasks where a model only needs to find facts, not reason across them.\n- balanced (default): retains the top 30% of chunks. Tested at 87.4% token reduction on real quantum-computing papers while preserving answer accuracy on retrieval benchmarks.\n- conservative: retains the top 50% of chunks. 
Recommended for legal, medical, or compliance documents where missing a clause is costly.\n- lossless: returns the original document unchanged. Used when the caller wants the analytics (token counts, estimated cost savings) without any actual compression.\n\nAPI surface\n\nPOST /v1/compress is the primary endpoint. It accepts a JSON body with:\n\n- text (required): the document string. Maximum size depends on plan: 100 KB free, 1 MB Pro, 5 MB Team, 10 MB Enterprise.\n- fidelity (optional, default \"balanced\"): one of the four levels above.\n- model (optional): the target LLM model name, used only for cost estimation in the response stats. Does not change compression behaviour.\n- output_style (optional, v1.4.0+): \"prose\" | \"bullets\" | \"structured\". Controls the skeleton format. \"prose\" stitches chunks with light connectors; \"bullets\" prefixes each chunk with a dash; \"structured\" emits a JSON object with section labels.\n\nThe response body includes:\n\n- compressed: the compressed skeleton string.\n- stats.original_tokens: token count of the input.\n- stats.compressed_tokens: token count of the skeleton.\n- stats.tokens_saved: the difference.\n- stats.savings_pct: percentage reduction (0–100).\n- stats.estimated_cost_saved_usd: dollar estimate at the model's published input price, or at Opus 4.7 rates ($5/MTok input) when no model is specified.\n\nMCP integration\n\ngotcontext exposes a Streamable-HTTP MCP server at https://api.gotcontext.ai/mcp. This lets Claude Code, Cursor, Windsurf, Gemini CLI, and OpenAI Codex CLI call gotcontext compression directly as a tool — the LLM reads a long document, routes it through gotcontext, and continues reasoning on the compressed skeleton. The round-trip latency is below the tool-call overhead in all three clients.\n\nTool plan gating: the core compress tool is available on all plans. 
gc_blast_radius (structural code analysis via tensor-grep BM25) and gc_compress_manifest (MCP tool-schema compression, new in v1.8.0) are Pro+ tools.\n\nAuthentication\n\nThree auth modes are supported:\n\n- gc_ API key: HMAC-signed key created from the dashboard. Pass as Authorization: Bearer gc_<key>. Rate limits apply per key.\n- Clerk JWT: used by the dashboard and MCP server. The session token issued by Clerk is accepted on every /v1/* route.\n- Polar license (self-hosted): Ed25519-signed license key validated locally by the self-hosted binary. Metering events are batched and reported asynchronously.\n\nBilling and plan limits\n\nBilling is handled by Polar. Plans are Free (1,000 compressions/month), Pro ($49/month, 50,000 compressions), Team ($99/month, 100,000 compressions pooled), Enterprise ($199/month, 500,000+ compressions, self-hosted/OIDC/audit), and Enterprise Dedicated ($499/month, reserved-capacity pool). Overage is blocked by default; Pro+ plans can enable metered overage from the dashboard.\n\nUsage is tracked per user_id and reset on the billing-cycle anniversary. The /v1/usage endpoint returns current period usage, plan name, limits, and — from v1.5.5 onward — a plan field that the dashboard reads directly to avoid the three-page inference drift that existed pre-v1.5.5.\n\nPrompt-cache integration\n\nFrom v1.1.0, gotcontext is aware of provider prompt-cache semantics. When a document has been compressed before with identical fidelity and the cached embedding is still valid, the response includes X-Cache-Hit: true and the latency drops to under 10 ms (cache read only, no embedding pass). The /v1/usage/by-cache endpoint breaks down savings into compression-only and cache-adjusted figures, which the dashboard Cache-Adjusted Savings widget visualises.\n\nSelf-hosted deployment\n\nOperators can run gotcontext on their own infrastructure using the Docker image at ghcr.io/oimiragieo/gotcontext-api. 
The image bundles the token-saver-5000 compression engine, the FastAPI application, and a pre-staged Claude Code plugin bundle at /app/plugins/gotcontext/. Operators point the MCP config at their own endpoint and distribute the modified plugin bundle internally. License keys are Ed25519-signed and verified locally; no phone-home is required for air-gapped deployments.",
  "fidelity": "balanced"
}
See curl:

curl -X POST https://api.gotcontext.ai/v1/demo/compress \
  -H 'Content-Type: application/json' \
  -d '{"text": "<the same document text as in the request body above>", "fidelity": "balanced"}'
curl -X POST https://api.gotcontext.ai/v1/compress \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document text here...", "fidelity": "balanced"}'
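The same call from Python, using only the standard library. This is a minimal sketch: the endpoint, fields, and headers come from the reference below, while the helper names (`build_request`, `compress`) are illustrative.

```python
import json
import urllib.request

API_URL = "https://api.gotcontext.ai/v1/compress"

def build_request(text, api_key, fidelity="balanced"):
    """Assemble the POST /v1/compress request with auth and JSON body."""
    body = json.dumps({"text": text, "fidelity": fidelity}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def compress(text, api_key, fidelity="balanced"):
    """Send the request and return the parsed JSON response
    (compressed skeleton plus stats)."""
    with urllib.request.urlopen(build_request(text, api_key, fidelity)) as resp:
        return json.load(resp)
```

The returned dict carries the `compressed` skeleton and the `stats` object described under the Compression section.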

MCP Server#

gotcontext.ai hosts a remote MCP (Model Context Protocol) server at https://api.gotcontext.ai/mcp. Connect any MCP-compatible AI tool to get compression, ingestion, and context management tools without running anything locally.

Free plans include 17 core compression tools (compression, advisory, budget awareness) and 1,000 compressions/month for validation. Pro, Team, and Enterprise all unlock the same 100+ MCP tools — including ACE (Agent Context Engineering), knowledge management, multimodal ingestion, quality detection, memory, prompt cache, connectors, handoffs, and experiments. Tiers differ on monthly compression volume, embedding fidelity, and enterprise wraparound (self-hosted Docker, OIDC/SSO, audit-log export, dedicated SLA, named support, custom contract) — not on which tools you can call.

Claude Code

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}

Cursor

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}

VS Code (settings.json)

{
  "mcp": {
    "servers": {
      "gotcontext": {
        "url": "https://api.gotcontext.ai/mcp",
        "headers": {
          "Authorization": "Bearer gc_your_key_here"
        }
      }
    }
  }
}

Gemini CLI (settings.json)

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "type": "http",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      },
      "timeout": 30000
    }
  }
}

Authentication

All MCP connections require a gc_-prefixed API key passed in the Authorization header. Create one from your dashboard.

For custom MCP clients

The MCP endpoint uses Streamable HTTP transport. Requests must include Accept: application/json, text/event-stream and carry the Mcp-Session-Id header from the initialize response on all subsequent calls. Claude Code, Cursor, and VS Code handle this automatically.
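For a hand-rolled client, the required headers can be assembled like this. A sketch only: `mcp_request` is a hypothetical helper, and the JSON-RPC envelope follows the standard MCP convention rather than anything specific to this server.

```python
import json
import urllib.request

MCP_URL = "https://api.gotcontext.ai/mcp"

def mcp_request(method, params, api_key, session_id=None, req_id=1):
    """Build one Streamable-HTTP MCP call. Accept must offer both JSON
    and SSE; after initialize, echo the server's Mcp-Session-Id header
    on every subsequent call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        headers["Mcp-Session-Id"] = session_id
    envelope = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(envelope).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The session id arrives on the HTTP response to the initialize call; pass it as `session_id` on every request after that.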

Recommended: Add project instructions

For best results, add a CLAUDE.md (or AGENTS.md) to your project root so the AI knows when and how to use gotcontext compression. Without this, the AI may not use the tools effectively. Copy this starter:

# gotcontext.ai Compression

This project uses gotcontext.ai for semantic compression via MCP.

## When to compress
- Before sending large files or docs to the AI context window
- When terminal output is verbose (git diff, test results, logs)
- When reviewing code across many files
- Before reviewing a PR or explaining a diff — compress the changed
  files or run `gc_blast_radius` to see only transitively-touched code

## Compression workflow
1. Use `ingest_context` to add a document (give it a unique file_id)
2. Use `read_skeleton` to get the compressed version
3. Use `search_semantic` to find specific sections by query
4. Use `filter_cli_output` to compress git diffs, pytest output, etc.

## Code understanding (Pro+)
- `compress_codebase` — AST-aware digest of an entire repo; function
  and class signatures only, bodies stripped
- `gc_blast_radius` — ranked context for a focus symbol: tensor-grep
  blast-radius + BM25 fusion. Best for PR review and bug triage
- `gc_compress_manifest` — compress an MCP tools/list response so
  downstream agents see shorter tool descriptions without losing
  inputSchema semantics (v1.8.0+)
- `batch_ingest_documents` — submit up to 50 docs as one async job;
  poll status via `GET /v1/batch-queue/{id}`

## Tips
- Use `estimate_tokens` first to see if compression is worthwhile
- For code files, the compressor understands function/class boundaries
- Use `get_compression_presets` to see available fidelity levels
- Call `tool_help` for documentation on any specific tool

Authentication#

All API requests require a Bearer token in the Authorization header. Two token types are supported:

API Keys (recommended)

Prefixed with gc_. Create keys in the dashboard or via POST /v1/keys. Keys are permanent until revoked and can be rotated at any time.

Authorization: Bearer gc_a1b2c3d4e5f6...

Clerk JWT (session tokens)

Short-lived tokens issued by Clerk after sign-in. Used automatically by the dashboard frontend. For programmatic access, API keys are preferred.

Authorization: Bearer eyJhbGciOi...

Compression#

POST /v1/compress

Compress any text document using graph-based semantic compression. Achieves 80–95% token reduction on medium-to-large documents. Optionally supply a query to guide the compressor toward sections most relevant to your question.

Auth: Bearer token required
Fidelity levels: abstract (5% kept), outline (10%), balanced (20%), detailed (40%), raw (100% — no compression). Small documents under 100 tokens may expand slightly due to skeleton overhead.

Request body

{
  "text": string,       // required — document to compress (min 1 char)
  "fidelity": string,   // optional — "abstract" | "outline" | "balanced" | "detailed" | "raw"
                        //            default: "balanced"
  "query": string|null, // optional — query-guided mode; prioritises relevant sections
  "cost_model": string|null // optional — model name for cost estimate (e.g. "claude-opus-4")
}

Response

{
  "compressed": string,   // compressed skeleton text
  "stats": {
    "original_tokens": number,
    "compressed_tokens": number,
    "savings_pct": number,        // e.g. 87.4
    "compression_ratio": number,  // e.g. 7.9
    "estimated_cost_saved": string|null  // e.g. "$0.042" — only when cost_model supplied
  }
}
curl -X POST https://api.gotcontext.ai/v1/compress \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Transformer models fundamentally changed NLP...",
    "fidelity": "balanced",
    "query": "attention mechanism",
    "cost_model": "claude-sonnet-4-6"
  }'

Error responses

400: Invalid fidelity value — valid options: abstract, outline, balanced, detailed, raw
422: Missing or empty text field (Pydantic validation)
401: Missing or invalid Bearer token
429: Rate limit exceeded (see Rate Limits section)

POST /v1/compress-code

AST-aware code compression. Parses function/class boundaries, extracts imports and docstrings, ranks symbols by PageRank on the dependency graph. Returns a skeleton preserving signatures and docstrings. Significantly better than plain text compression for code.

Auth: Bearer token required
Supported languages with AST-native parsing: Python. JavaScript/TypeScript use regex-based chunking. Java, Go, Rust, C++ fall back to line-based chunking.

Request body

{
  "code": string,        // required — source code to compress (min 1 char)
  "language": string|null, // optional — hint: "python"|"javascript"|"typescript"|"java"|"go"|"rust"|"cpp"
                           //             auto-detected from content when omitted
  "fidelity": string       // optional — same levels as /compress, default: "balanced"
}

Response

{
  "compressed": string,
  "stats": {
    "original_tokens": number,
    "compressed_tokens": number,
    "savings_pct": number,
    "language_detected": string  // e.g. "python", "javascript", "unknown"
  }
}
curl -X POST https://api.gotcontext.ai/v1/compress-code \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "def process(items):\n    ...",
    "language": "python",
    "fidelity": "balanced"
  }'

Error responses

400: Invalid fidelity or unrecognised language hint
422: Missing or empty code field
401: Missing or invalid Bearer token

Structural Context (v1.5.0)#

POST/v1/compress-code/structural

Structural code-context compression. Submit a file bundle + optional focus symbol; the server runs tensor-grep blast-radius + BM25 on the sandboxed files and returns a Reciprocal-Rank-Fusion–ranked context list. Intended for PR-diff-scale code payloads (≤1000 files, ≤512 KB each, ≤5 MB total). Measured 34% token reduction on a 10-file corpus with focus_symbol=cache_lookup vs naive full-bundle submission — see the smoke benchmark at benchmarks/blast_radius_smoke.py.

Auth: Bearer token required — Pro or higher
Degraded path: if the tensor-grep binary is unavailable on the server or the subprocess times out, the response is still 200 but carries X-Degraded: true and stats.degraded=true with an explanatory message; the endpoint never returns 500 on subprocess failure. The BM25 arm uses `tg search --count-matches`; a failure there degrades only the BM25 signal, leaving the graph-distance arm functional. The corresponding MCP tool is `gc_blast_radius` — same input/output, exposed to Claude Code via the MCP gateway.

Request body

{
  "files": [
    { "path": "src/app.py", "content": "def handle_request(): ..." },
    { "path": "src/utils.py", "content": "..." }
  ],
  "focus_symbol": "handle_request",  // optional — focus blast-radius on this symbol
  "query": "error handling",         // optional — BM25 query (defaults to focus_symbol)
  "top_k": 25                        // optional — cap on ranked_context length (1–500, default 50)
}

Response

{
  "ranked_context": [
    {
      "path": "src/app.py",
      "score": 0.031,
      "rank": 1,
      "contributing_signals": ["bm25", "graph_distance"]
    }
  ],
  "stats": {
    "files_in": 10,
    "files_ranked": 5,
    "symbols_in": 23,
    "degraded": false
  },
  "message": null   // non-null only on degraded paths (tg missing, timeout, etc.)
}
curl -X POST https://api.gotcontext.ai/v1/compress-code/structural \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {"path":"src/app.py","content":"def handle_request(): pass"},
      {"path":"src/utils.py","content":"..."}
    ],
    "focus_symbol": "handle_request",
    "top_k": 25
  }'
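A client-side sketch of the bundle-and-rank flow. The size checks mirror the documented caps; `build_bundle` and `top_paths` are illustrative names, and the response shape follows the schema above.

```python
def build_bundle(sources, focus_symbol=None, query=None, top_k=50):
    """Assemble the request body from (path, content) pairs, enforcing the
    documented caps (max 1000 files, 512 KB each) before sending."""
    files = [{"path": p, "content": c} for p, c in sources]
    if len(files) > 1000:
        raise ValueError("max 1000 files per bundle")
    if any(len(f["content"].encode("utf-8")) > 512 * 1024 for f in files):
        raise ValueError("per-file cap is 512 KB")
    body = {"files": files, "top_k": top_k}
    if focus_symbol is not None:
        body["focus_symbol"] = focus_symbol
    if query is not None:
        body["query"] = query
    return body

def top_paths(response, limit=5):
    """Pull the best-ranked paths from the response; on the degraded path
    (tensor-grep unavailable) stats.degraded is true and message says why."""
    if response["stats"].get("degraded"):
        print("degraded:", response.get("message"))
    ranked = sorted(response["ranked_context"], key=lambda r: r["rank"])
    return [r["path"] for r in ranked[:limit]]
```

Validating the caps locally avoids a round-trip that would end in a 413.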

Error responses

400: {error_code: "sensitive_content", marker_class: string} — a submitted file matches the secret-marker detector (PEM headers, AWS AKIA keys, OpenAI sk- keys, .ssh/id_rsa, dotenv-style secrets). The marker value is never echoed back.
400: {error_code: "bad_path"} — path traversal (../), absolute POSIX/Windows root, or null byte detected in a submitted path
413: Per-file (>512 KB) or aggregate (>5 MB) size cap exceeded. Validation runs pre-subprocess.
402: Free-tier plan does not include structural context. Upgrade to Pro+ from the dashboard.
401: Missing or invalid Bearer token

POST /v1/batch-compress

Compress up to 50 documents in a single call. Documents are processed concurrently (max 4 at once to avoid saturating the embedding model). Each document may have its own fidelity and query. Failed documents are reported inline — the overall batch always returns 200.

Auth: Bearer token required

Request body

{
  "documents": [    // required — 1 to 50 items
    {
      "text": string,       // required
      "fidelity": string,   // optional, default "balanced"
      "query": string|null  // optional
    }
  ]
}

Response

{
  "results": [
    {
      "compressed": string,
      "original_tokens": number,
      "compressed_tokens": number,
      "savings_pct": number,
      "compression_ratio": number,
      "error": string|null   // set when this document failed; other fields are 0
    }
  ],
  "summary": {
    "total_documents": number,
    "successful": number,
    "failed": number,
    "total_tokens_in": number,
    "total_tokens_saved": number,
    "avg_savings_pct": number,
    "avg_compression_ratio": number
  }
}
curl -X POST https://api.gotcontext.ai/v1/batch-compress \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"text": "First document...", "fidelity": "balanced"},
      {"text": "Second document...", "query": "neural networks"},
      {"text": "Third document...", "fidelity": "outline"}
    ]
  }'
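Because the batch always returns 200 with per-document failures reported inline, callers should check each result's error field. A sketch (the helper name is illustrative):

```python
def split_results(batch_response):
    """Separate successes from inline failures in a /v1/batch-compress
    response. A failed document has a non-null error and zeroed stats."""
    ok, failed = [], []
    for index, result in enumerate(batch_response["results"]):
        (failed if result.get("error") else ok).append((index, result))
    return ok, failed
```

Indices line up with the submitted documents array, so failed items can be retried individually.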

Error responses

400: Empty documents list, more than 50 documents, or invalid fidelity in any document
401: Missing or invalid Bearer token

POST /v1/recommend

Analyse a document and recommend the optimal fidelity level. Considers document size and (optionally) the target model's context window. Use this to automatically pick the right compression level before calling /compress.

Auth: Bearer token required
Fidelity rules: <500 tokens → detailed, 500–2000 → balanced, 2000–10000 → outline, >10000 → abstract. If the compressed output would exceed 70% of the target model's context window, fidelity is automatically stepped up.

Request body

{
  "text": string,           // required — document to analyse
  "model": string|null,     // optional — target model (e.g. "claude-sonnet-4-6")
  "context_window": number|null  // optional — override context window size in tokens
}

Response

{
  "recommended_fidelity": string,  // e.g. "balanced"
  "estimated_ratio": number,       // fraction of tokens kept (0.0–1.0)
  "estimated_output_tokens": number,
  "original_tokens": number,
  "reasoning": string              // human-readable explanation
}
curl -X POST https://api.gotcontext.ai/v1/recommend \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long document...",
    "model": "claude-sonnet-4-6"
  }'
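The size thresholds above can be mirrored client-side to skip a round-trip when the context-window adjustment is not needed. A sketch; the behaviour at exactly 500, 2000, and 10000 tokens is an assumption, since the documented ranges share their boundaries.

```python
def recommend_fidelity(original_tokens):
    """Client-side mirror of the documented size rules: under 500 tokens
    detailed, 500-2000 balanced, 2000-10000 outline, above 10000 abstract.
    The server additionally steps fidelity up when the output would exceed
    70% of the target model's context window, which this mirror ignores."""
    if original_tokens < 500:
        return "detailed"
    if original_tokens < 2000:
        return "balanced"
    if original_tokens <= 10000:
        return "outline"
    return "abstract"
```

When a target model is in play, prefer the real /v1/recommend call, which accounts for the context window.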

API Keys#

Create and manage API keys programmatically. Keys are prefixed gc_ and stored as HMAC-SHA256 hashes. The raw key is returned once on creation and cannot be retrieved again.

POST /v1/keys

Create a new API key. Returns the full raw key — store it immediately.

Auth: Bearer token required

Request body

{
  "name": string  // required — human-readable label (1–100 chars)
}

Response

{
  "key": string,       // full raw key — shown ONCE, store securely
  "key_id": string,    // 16-char hex ID for management
  "name": string,
  "created_at": string // ISO 8601 UTC
}
curl -X POST https://api.gotcontext.ai/v1/keys \
  -H "Authorization: Bearer YOUR_CLERK_JWT" \
  -H "Content-Type: application/json" \
  -d '{"name": "Production server"}'
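Programmatic key creation from Python, using only the standard library. A sketch with illustrative helper names; the one point to carry over from the reference is that the raw key field is returned once and never again.

```python
import json
import urllib.request

def build_create_key_request(name, clerk_jwt):
    """Build the POST /v1/keys request; name is the 1-100 char label."""
    return urllib.request.Request(
        "https://api.gotcontext.ai/v1/keys",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Authorization": f"Bearer {clerk_jwt}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def create_key(name, clerk_jwt):
    """Create a key and return (raw_key, key_id). Persist raw_key
    immediately (a secret manager, not a plain file): it is shown once."""
    with urllib.request.urlopen(build_create_key_request(name, clerk_jwt)) as resp:
        created = json.load(resp)
    return created["key"], created["key_id"]
```

Keep the key_id around as well: it is what DELETE /v1/keys/:id takes when the key is later rotated out.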

Error responses

422: Missing name field or name too long (>100 chars)
401: Authentication required
503: Key storage unavailable (both Postgres and Redis down)

GET /v1/keys

List all API keys for the authenticated user. Returns masked key values — the raw key cannot be retrieved after creation.

Auth: Bearer token required

Response

{
  "keys": [
    {
      "key_id": string,
      "name": string,
      "masked_key": string,    // e.g. "gc_****ab12"
      "created_at": string,    // ISO 8601 UTC
      "last_used": string|null,
      "status": "active" | "revoked"
    }
  ]
}
curl https://api.gotcontext.ai/v1/keys \
  -H "Authorization: Bearer YOUR_CLERK_JWT"

DELETE /v1/keys/:id

Revoke an API key by ID. Takes effect immediately — the key is rejected by the auth middleware within milliseconds.

Auth: Bearer token required

Response

{
  "success": true,
  "key_id": string
}
curl -X DELETE https://api.gotcontext.ai/v1/keys/YOUR_KEY_ID \
  -H "Authorization: Bearer YOUR_CLERK_JWT"

Error responses

404: Key ID not found
400: Key is already revoked
401: Authentication required

GET /v1/usage

Monthly compression statistics for the authenticated user. Returns compression counts, token totals, plan limit, and the next reset timestamp.

Auth: Bearer token required

Response

{
  "period": string,            // "YYYY-MM", e.g. "2026-04"
  "compressions_used": number,
  "compressions_limit": number, // 1000 (free) or 50000 (pro)
  "pct_used": number,          // 0.0–100.0
  "tokens_in": number,         // total original tokens this month
  "tokens_saved": number,      // total tokens eliminated this month
  "resets_at": string          // ISO 8601 UTC, midnight 1st of next month
}
curl https://api.gotcontext.ai/v1/usage \
  -H "Authorization: Bearer YOUR_API_KEY"

Billing#

Billing is handled by Polar. The checkout and portal endpoints return redirect URLs — do not call these from server-side code without a user session.

POST /v1/billing/checkout

Create a Polar checkout session to upgrade to Pro. Returns a URL to redirect the user to.

Auth: Bearer token required (Clerk JWT)

Request body

{
  "plan": "pro"   // currently the only valid value
}

Response

{
  "checkout_url": string  // redirect the user to this URL
}
curl -X POST https://api.gotcontext.ai/v1/billing/checkout \
  -H "Authorization: Bearer YOUR_CLERK_JWT" \
  -H "Content-Type: application/json" \
  -d '{"plan": "pro"}'

Error responses

400: Unknown plan (only 'pro' is valid)
503: Billing service unavailable or POLAR_PRO_PRODUCT_ID not configured
401: Authentication required

POST /v1/billing/portal

Get the Polar customer portal URL to manage subscription, payment method, and invoices.

Auth: Bearer token required (Clerk JWT)

Response

{
  "portal_url": string  // redirect the user to this URL
}
curl -X POST https://api.gotcontext.ai/v1/billing/portal \
  -H "Authorization: Bearer YOUR_CLERK_JWT"

Error responses

404: No billing account found — user has not subscribed yet
503: Billing service unavailable
401: Authentication required

CLI Filter#

POST /v1/filter-cli

Compress verbose CLI output such as git diffs, test results, and npm install logs. Automatically detects the command type and applies type-specific compression. Typical savings: 80–99% on verbose output.

Auth: Bearer token required
Auto-detection supports git diff, pytest, jest, npm/yarn install, and other common CLI formats. Provide command_hint to skip detection and use a specific compressor.

Request body

{
  "output": string,        // required — raw CLI output to compress (min 1 char)
  "command_hint": string|null // optional — hint: "git_diff", "test_output", etc.
                              //            auto-detected if omitted
}

Response

{
  "filtered": string,      // compressed CLI output
  "original_chars": number,
  "filtered_chars": number,
  "savings_pct": number,   // e.g. 92.3
  "detected_type": string|null // e.g. "git_diff", "pytest"
}
curl -X POST https://api.gotcontext.ai/v1/filter-cli \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "output": "diff --git a/src/main.py b/src/main.py\n...",
    "command_hint": "git_diff"
  }'
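A typical local workflow captures the command output with subprocess and posts it with an explicit hint. A standard-library sketch; the helper names are illustrative.

```python
import json
import subprocess
import urllib.request

def build_filter_request(output, api_key, hint=None):
    """Build the POST /v1/filter-cli request; a command_hint skips the
    server-side auto-detection."""
    body = {"output": output}
    if hint is not None:
        body["command_hint"] = hint
    return urllib.request.Request(
        "https://api.gotcontext.ai/v1/filter-cli",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def filter_git_diff(api_key):
    """Capture the working-tree diff and return the compressed version."""
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    req = build_filter_request(diff, api_key, hint="git_diff")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["filtered"]
```

The same pattern works for pytest or npm output: swap the subprocess command and the hint.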

Error responses

422: Missing or empty output field
401: Missing or invalid Bearer token
503: CLI filter engine not available

Savings#

GET /v1/savings

Retrieve your cumulative compression savings across all time. Shows total compressions, tokens processed, tokens saved, and an estimated dollar amount saved based on mid-range model pricing.

Auth: Bearer token required

Response

{
  "total_compressions": number,
  "total_tokens_in": number,
  "total_tokens_saved": number,
  "savings_pct": number,              // e.g. 87.2
  "estimated_cost_saved_usd": number  // e.g. 12.45
}
curl https://api.gotcontext.ai/v1/savings \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

401: Authentication required

Cache Audit#

POST /v1/audit-cache

Audit how cache-friendly a prompt is for a specific AI provider. Returns a cacheability score, whether the prompt is cache-friendly, actionable recommendations to improve cache hit rates, and estimated savings.

Auth: Bearer token required
Use this to optimise prompts for provider-specific caching (e.g. Anthropic prompt caching). Higher scores mean better cache utilisation and lower costs.

Request body

{
  "text": string,      // required — prompt or document text to audit (min 1 char)
  "provider": string   // optional — "anthropic" | "openai" | "google"
                       //            default: "anthropic"
}

Response

{
  "provider": string,
  "cache_friendly": boolean,
  "score": number,                // 0.0 - 1.0 cacheability score
  "recommendations": [string],    // actionable suggestions
  "estimated_savings_pct": number // estimated cache hit savings
}
curl -X POST https://api.gotcontext.ai/v1/audit-cache \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "You are a helpful assistant that...",
    "provider": "anthropic"
  }'

Error responses

422 Missing or empty text field
401 Missing or invalid Bearer token
503 Cache audit service not available

Budget Check#

POST /v1/check-budget

Check how much of a model's context window a text would consume. Returns token estimates, percentage used, a status indicator (OK / WARNING / CRITICAL), and a recommendation on whether to compress.

Auth: Bearer token required
Status thresholds: OK (< 50% used), WARNING (50-80%), CRITICAL (> 80%). Use before sending large documents to an AI model to decide whether compression is needed.

Request body

{
  "text": string,            // required — text to check against budget (min 1 char)
  "context_window": number,  // optional — target context window in tokens
                             //            default: 200000
  "model": string            // optional — target model for cost estimation
                             //            default: "claude-opus-4"
}

Response

{
  "estimated_tokens": number,
  "context_window": number,
  "pct_used": number,       // e.g. 42.5
  "status": string,         // "OK" | "WARNING" | "CRITICAL"
  "recommendation": string  // human-readable guidance
}
curl -X POST https://api.gotcontext.ai/v1/check-budget \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long document or codebase...",
    "context_window": 200000,
    "model": "claude-opus-4"
  }'

Error responses

422 Missing or empty text field
401 Missing or invalid Bearer token
503 Budget check service not available
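The status thresholds above can be mirrored client-side to decide whether to compress before sending. A minimal sketch, assuming the documented OK/WARNING/CRITICAL bands; `budget_status` and `should_compress` are our own helper names, and the exact boundary handling (whether 50% and 80% fall in WARNING) is our assumption — the API response is authoritative:

```python
def budget_status(estimated_tokens: int, context_window: int = 200_000) -> str:
    """Mirror the documented /v1/check-budget thresholds locally.

    OK: < 50% of the window used; WARNING: 50-80%; CRITICAL: > 80%.
    Boundary handling at exactly 50% / 80% is our assumption.
    """
    pct_used = 100.0 * estimated_tokens / context_window
    if pct_used < 50:
        return "OK"
    if pct_used <= 80:
        return "WARNING"
    return "CRITICAL"


def should_compress(estimated_tokens: int, context_window: int = 200_000) -> bool:
    # Compress whenever the budget check is not comfortably green.
    return budget_status(estimated_tokens, context_window) != "OK"
```

This keeps the round trip to `/v1/check-budget` optional for rough pre-filtering; use the endpoint itself when you need the model-specific token estimate.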

Output Style Appendix (v1.4.0)#

POST /v1/compress accepts an optional style field taking one of "terse", "normal" (default), or "verbose". When you set style: "terse", the response carries a short system_prompt_suffix string plus a style_suffix_version tag. Inject the suffix into your downstream LLM's system prompt to cap output verbosity.

Independent April 2026 benchmarks measure ~63% output-token reduction with this class of rule block, and a March 2026 brevity-constraint paper shows a 26-percentage-point accuracy gain on verbosity-induced error cases — so this is a free win, not a fidelity-for-savings tradeoff.

{
  "compressed": "Short skeleton of the document…",
  "stats": { "original_tokens": 485, "compressed_tokens": 61, "savings_pct": 87.4, ... },
  "system_prompt_suffix": "Be concise. No filler, no hedging. State conclusions first. Omit sycophancy and preambles. Fragments are fine for prose; keep code blocks normal.",
  "style_suffix_version": "v1"
}

The suffix is a versioned constant — fast, deterministic, and prompt-cache-friendly. "verbose" is reserved for a future workflow; today it returns null (same as "normal").
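A minimal sketch of injecting the suffix into a downstream system prompt, assuming `response` is the parsed JSON body from `/v1/compress`; `apply_style_suffix` is a hypothetical helper name of ours:

```python
def apply_style_suffix(system_prompt: str, response: dict) -> str:
    """Append the style suffix from a /v1/compress response, if present.

    system_prompt_suffix is null for style "normal" (and, today, "verbose"),
    in which case the prompt passes through unchanged.
    """
    suffix = response.get("system_prompt_suffix")
    if not suffix:
        return system_prompt
    # Appending (rather than prepending) keeps any cached prompt prefix stable.
    return f"{system_prompt.rstrip()}\n\n{suffix}"
```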

Sensitive-Content Refuse (v1.4.0)#

POST /v1/compress and POST /v1/compress-code/structural run a pre-flight check on the submitted content for unambiguous secret markers. On a match, the request is refused with HTTP 400 and a machine-readable error body:

{
  "detail": {
    "error_code": "sensitive_content",
    "marker_class": "aws_access_key",
    "message": "Input appears to contain sensitive content; refusing to compress. Remove the secret value and retry."
  }
}

Detected marker classes: pem_private_key (PEM RSA/EC/DSA/OpenSSH/PGP/ENCRYPTED private-key headers), aws_access_key (AKIA prefix + 16-char suffix), openai_api_key (sk- or sk-proj- with a ≥20-char suffix), ssh_key_path (.ssh/id_rsa|ed25519|ecdsa|dsa fragments), and dotenv_secret (multi-line KEY=value with known-sensitive names like SECRET_KEY, DATABASE_URL, STRIPE_SECRET_KEY, POLAR_ACCESS_TOKEN, etc).

The matched value is never echoed. Error responses carry only the marker_class; the structured log line (sensitive-content-refuse user_id=… marker_class=…) likewise carries no content. Safe to log.
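If you want to catch obvious secrets before spending a request, a local pre-flight can approximate the first four marker classes. The regexes below are our illustrative guesses at the published descriptions, not the server's actual patterns (which are not disclosed), and `dotenv_secret` is omitted for brevity — treat a clean local result as best-effort only:

```python
import re
from typing import Optional

# Illustrative approximations of the documented marker classes.
MARKER_PATTERNS = {
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai_api_key": re.compile(r"\bsk-(?:proj-)?[A-Za-z0-9_-]{20,}"),
    "ssh_key_path": re.compile(r"\.ssh/id_(?:rsa|ed25519|ecdsa|dsa)"),
}


def find_sensitive_marker(text: str) -> Optional[str]:
    """Return the first matching marker class, or None if the text looks clean."""
    for marker_class, pattern in MARKER_PATTERNS.items():
        if pattern.search(text):
            return marker_class
    return None
```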

X-Fidelity-Warning Header (v1.4.0)#

Every POST /v1/compress call now counts the fenced code blocks, markdown headings, and URLs present in the input and compares those counts against the compressed output. If any class has fewer occurrences in the output, an advisory header is attached to the response:

HTTP/1.1 200 OK
X-Fidelity-Warning: code_blocks,urls
Content-Type: application/json
…

This is advisory — the request never fails on structural loss. Dashboards can alert on high per-tenant rates; auditors can answer "did I lose structure?" without a separate call. Possible values are a comma-separated subset of code_blocks, headings, urls; if the header is absent, all three classes were preserved.
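For local verification, you can parse the header and run your own structure counts on input and output. The counting rules below (fence-pair halving, ATX headings, http(s) URLs) are our approximation of the server's heuristics, shown only so the comparison idea is concrete:

```python
import re
from typing import Optional


def structure_counts(text: str) -> dict:
    """Count the three structural classes the fidelity check looks at."""
    return {
        # Each fenced block opens and closes with a ``` line, hence // 2.
        "code_blocks": len(re.findall(r"^```", text, flags=re.MULTILINE)) // 2,
        "headings": len(re.findall(r"^#{1,6} ", text, flags=re.MULTILINE)),
        "urls": len(re.findall(r"https?://\S+", text)),
    }


def parse_fidelity_warning(header_value: Optional[str]) -> set:
    """Turn an X-Fidelity-Warning header value into a set of lost classes."""
    if not header_value:
        return set()
    return {part.strip() for part in header_value.split(",") if part.strip()}
```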

Per-Tenant Cache Threshold (v1.4.0)#

The semantic-cache uses cosine similarity to match near-duplicate requests. Research (Portkey + Tianpan, April 2026) shows correct-hit and incorrect-hit similarity distributions overlap between ~0.85 and ~0.92, so a single global threshold is wrong for every non-median workload. Two endpoints let each tenant tune their own cutoff.

GET /v1/settings/semantic-cache-threshold

Read your current cosine-similarity cutoff. source=user means you've set an override; source=global means the server-wide default applies.

Auth: Bearer token required
Default is the server-wide threshold, currently ~0.95 similarity (corresponds to distance < 0.05).

Request body

— no body —

Response

{
  "threshold": 0.95,
  "source": "global"   // "user" | "global"
}
curl https://api.gotcontext.ai/v1/settings/semantic-cache-threshold \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

401 Missing or invalid Bearer token
PUT /v1/settings/semantic-cache-threshold

Set or clear your per-tenant cutoff. Pass threshold: null to reset to the server default; otherwise pass a cosine similarity in [0.80, 0.99].

Auth: Bearer token required
Research-recommended values: 0.95-0.97 for factual/high-stakes workloads; 0.92 for balanced (the global default ballpark); 0.85-0.90 for FAQ/support where a slightly imprecise hit is cheap. The dashboard Semantic Cache panel surfaces this as a slider.

Request body

{ "threshold": 0.92 }

Response

{
  "threshold": 0.92,
  "source": "user"
}
curl -X PUT https://api.gotcontext.ai/v1/settings/semantic-cache-threshold \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"threshold": 0.92}'

Error responses

422 threshold out of range (must be in [0.80, 0.99] or null)
401 Missing or invalid Bearer token

Hit-Source Breakdown (v1.4.0)#

GET /v1/usage/by-cache responses now include a by_source object that splits the cache hits by which mechanism matched — the request-hash fastpath (exact) vs. the embedding-distance fallback (semantic). The invariant exact_hits + semantic_hits == semantic_cache.hits holds across the response window.

{
  "period_days": 30,
  "semantic_cache": { "hits": 15, "misses": 35, "hit_rate": 0.30, … },
  "by_source": {
    "exact_hits": 12,
    "semantic_hits": 3,
    "misses": 35
  },
  …
}

Dashboards that rendered only the combined hits counter previously hid which half of the cache was doing the work. The breakdown lets operators see whether a workload is benefiting from fingerprinting (exact) or from embedding-based near-duplicate matching (semantic) — and tune the per-tenant threshold (above) accordingly.
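Turning the breakdown into per-mechanism shares is simple arithmetic. A sketch, assuming the documented invariant and the field names from the example payload (`cache_hit_shares` is our own helper name):

```python
def cache_hit_shares(by_source: dict) -> dict:
    """Split the combined hit rate into exact vs. semantic shares.

    Relies on the documented invariant:
    exact_hits + semantic_hits == semantic_cache.hits.
    """
    hits = by_source["exact_hits"] + by_source["semantic_hits"]
    total = hits + by_source["misses"]
    if total == 0:
        return {"hit_rate": 0.0, "exact_share": 0.0, "semantic_share": 0.0}
    return {
        "hit_rate": hits / total,
        "exact_share": by_source["exact_hits"] / total,
        "semantic_share": by_source["semantic_hits"] / total,
    }
```

On the example response above (12 exact, 3 semantic, 35 misses) this yields a 0.30 hit rate, of which 0.24 is exact and 0.06 semantic — a fingerprint-dominated workload.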

Issue Detection#

POST /v1/detect-issues

Detect hallucinations and blind spots in compressed output by comparing it against the original text. Finds claims not supported by the source (hallucinations) and critical information that was lost (blind spots). Requires a Pro or Enterprise plan.

Auth: Bearer token required (Pro / Enterprise)
This is a quality assurance tool for compression output. Run it after compressing important documents to verify no critical information was lost and no unsupported claims were introduced.

Request body

{
  "original_text": string,        // required — original uncompressed text (min 1 char)
  "compressed_text": string,      // required — compressed output to check (min 1 char)
  "check_hallucination": boolean, // optional — check for hallucinated content
                                  //            default: true
  "check_blind_spots": boolean    // optional — check for lost critical info
                                  //            default: true
}

Response

{
  "issues_found": number,
  "issues": [
    {
      "type": string,        // "hallucination" or "blind_spot"
      "severity": string,    // "low", "medium", or "high"
      "description": string,
      "location": string|null
    }
  ],
  "quality_score": number   // 0.0 - 1.0 (1.0 = no issues found)
}
curl -X POST https://api.gotcontext.ai/v1/detect-issues \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "original_text": "The full original document...",
    "compressed_text": "The compressed version...",
    "check_hallucination": true,
    "check_blind_spots": true
  }'

Error responses

403 Requires a Pro or Enterprise plan
422 Missing required text fields
401 Missing or invalid Bearer token
503 Issue detection service not available

Error Codes#

All errors return JSON with a detail field describing the problem.

// Example error response
{
  "detail": "Invalid fidelity 'garbage'. Valid: ['abstract', 'outline', 'balanced', 'detailed', 'raw']"
}
400 Bad Request
Invalid parameter value (e.g. invalid fidelity, unknown plan, already-revoked key).

401 Unauthorized
Missing, expired, or invalid Bearer token.

404 Not Found
Resource not found (e.g. unknown key_id).

422 Unprocessable Entity
Pydantic validation failed — missing required field or wrong type.

429 Too Many Requests
Rate limit exceeded. Check the Retry-After header.

500 Internal Server Error
Unexpected server error. Retry with exponential back-off.

503 Service Unavailable
Dependency unavailable (Redis, Postgres, or billing service).

Rate Limits#

| Plan | Rate limit | Monthly compressions |
| ---- | ---------- | -------------------- |
| Free | 10 requests / minute | 1,000 / month |
| Pro | 100 requests / minute | 50,000 / month |
Monthly limits reset at midnight UTC on the 1st of each month. Check GET /v1/usage for your current consumption. When you hit the rate limit, the API responds with HTTP 429 and a Retry-After header.
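A retry helper should honour Retry-After when present and fall back to exponential backoff otherwise. A sketch under our own constants (base 2 s doubling, 60 s cap, ±50% jitter — none of these are mandated by the API); the HTTP-date form of Retry-After is deliberately not handled:

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retrying a 429 (or transient 5xx) response.

    attempt is 0-based. Prefers the server's Retry-After header (delta-seconds
    form only); otherwise uses capped exponential backoff with jitter.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form not handled in this sketch
    base = min(60.0, 2.0 ** attempt)  # 1s, 2s, 4s, ... capped at 60s
    return base * random.uniform(0.5, 1.5)
```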

Projects#

Requires Team or Enterprise plan.

Organize compression workloads into projects. Each project tracks its own usage stats, making it easy to attribute token savings across teams or applications.

POST /v1/projects

Create a compression project.

Auth: Bearer token required (Team / Enterprise)

Request body

{
  "name": string,          // required — project name (1-100 chars)
  "description": string|null // optional — project description
}

Response

{
  "id": string,
  "name": string,
  "description": string|null,
  "created_at": string,
  "stats": { "compressions": 0, "tokens_saved": 0 }
}
curl -X POST https://api.gotcontext.ai/v1/projects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "backend-docs", "description": "API documentation compression"}'

Error responses

403 Requires Team or Enterprise plan
422 Missing or invalid name
GET /v1/projects

List all projects for the authenticated user.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "projects": [
    {
      "id": string,
      "name": string,
      "description": string|null,
      "created_at": string,
      "stats": {
        "compressions": number,
        "tokens_saved": number
      }
    }
  ]
}
curl https://api.gotcontext.ai/v1/projects \
  -H "Authorization: Bearer YOUR_API_KEY"
GET /v1/projects/{id}

Get project detail with usage statistics.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "id": string,
  "name": string,
  "description": string|null,
  "created_at": string,
  "updated_at": string,
  "stats": {
    "compressions": number,
    "tokens_saved": number,
    "avg_savings_pct": number
  }
}
curl https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

404 Project not found
PUT /v1/projects/{id}

Update a project's name or description.

Auth: Bearer token required (Team / Enterprise)

Request body

{
  "name": string|null,        // optional — new name
  "description": string|null  // optional — new description
}

Response

{
  "id": string,
  "name": string,
  "description": string|null,
  "updated_at": string
}
curl -X PUT https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "backend-docs-v2"}'

Error responses

404 Project not found
403 Requires Team or Enterprise plan
DELETE /v1/projects/{id}

Delete a project. Compression history is retained but unlinked.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "success": true,
  "id": string
}
curl -X DELETE https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

404 Project not found
403 Requires Team or Enterprise plan

Batch Queue#

Requires Team or Enterprise plan.

Submit large compression jobs asynchronously. The batch queue processes documents in the background and returns results when complete — ideal for bulk ingestion pipelines.

POST /v1/batch-queue

Submit an async batch compression job. Returns 202 Accepted with a job ID for polling.

Auth: Bearer token required (Team / Enterprise)

Request body

{
  "documents": [           // required — 1 to 500 items
    {
      "text": string,      // required
      "fidelity": string,  // optional, default "balanced"
      "query": string|null // optional
    }
  ],
  "project_id": string|null, // optional — associate with a project
  "webhook_url": string|null // optional — POST results on completion
}

Response

{
  "job_id": string,
  "status": "queued",
  "documents_count": number,
  "created_at": string
}
curl -X POST https://api.gotcontext.ai/v1/batch-queue \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"text": "First document..."},
      {"text": "Second document...", "fidelity": "outline"}
    ]
  }'

Error responses

403 Requires Team or Enterprise plan
400 Empty documents list or more than 500 items
GET /v1/batch-queue

List batch jobs for the authenticated user.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "jobs": [
    {
      "job_id": string,
      "status": "queued" | "processing" | "completed" | "failed",
      "documents_count": number,
      "completed_count": number,
      "created_at": string,
      "completed_at": string|null
    }
  ]
}
curl https://api.gotcontext.ai/v1/batch-queue \
  -H "Authorization: Bearer YOUR_API_KEY"
GET /v1/batch-queue/{id}

Get job status and progress.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "job_id": string,
  "status": "queued" | "processing" | "completed" | "failed",
  "documents_count": number,
  "completed_count": number,
  "failed_count": number,
  "created_at": string,
  "completed_at": string|null,
  "progress_pct": number   // 0.0 - 100.0
}
curl https://api.gotcontext.ai/v1/batch-queue/YOUR_JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

404 Job not found
GET /v1/batch-queue/{id}/results

Retrieve completed batch results. Only available when status is 'completed'.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "job_id": string,
  "results": [
    {
      "compressed": string,
      "original_tokens": number,
      "compressed_tokens": number,
      "savings_pct": number,
      "error": string|null
    }
  ],
  "summary": {
    "total_documents": number,
    "successful": number,
    "failed": number,
    "total_tokens_saved": number,
    "avg_savings_pct": number
  }
}
curl https://api.gotcontext.ai/v1/batch-queue/YOUR_JOB_ID/results \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

404 Job not found
409 Job not yet completed
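The submit/poll/fetch flow above is easy to wrap in a small polling loop. A sketch with the HTTP call injected as a callable so any client library works; `wait_for_job` is a hypothetical helper of ours, and the terminal-state set comes from the documented status enum:

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_status: Callable[[], dict],
                 poll_seconds: float = 2.0,
                 timeout_seconds: float = 600.0) -> dict:
    """Poll GET /v1/batch-queue/{id} until the job reaches a terminal state.

    fetch_status is any callable returning the parsed status JSON.
    Raises TimeoutError if the job never settles within the timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status['status']!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Once the returned status is "completed", fetch GET /v1/batch-queue/{id}/results; "failed" jobs should be inspected rather than retried blindly.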

Analytics#

Requires Team or Enterprise plan.

Detailed analytics for compression usage across projects. View per-project breakdowns, track trends over time, and export data for reporting.

GET /v1/analytics/summary

Per-project usage breakdown for the current billing period.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "period": string,           // "YYYY-MM"
  "total_compressions": number,
  "total_tokens_saved": number,
  "projects": [
    {
      "project_id": string,
      "project_name": string,
      "compressions": number,
      "tokens_saved": number,
      "avg_savings_pct": number
    }
  ]
}
curl https://api.gotcontext.ai/v1/analytics/summary \
  -H "Authorization: Bearer YOUR_API_KEY"

Error responses

403 Requires Team or Enterprise plan
GET /v1/analytics/export

Export analytics data as CSV for the specified date range.

Auth: Bearer token required (Team / Enterprise)

Response

Content-Type: text/csv

date,project,compressions,tokens_in,tokens_saved,savings_pct
2026-04-01,backend-docs,142,284000,248000,87.3
2026-04-01,frontend-app,89,178000,151300,85.0
...
curl "https://api.gotcontext.ai/v1/analytics/export?start=2026-04-01&end=2026-04-14" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -o analytics.csv

Error responses

403 Requires Team or Enterprise plan
400 Invalid date range or missing parameters

Command Palette#

Press Cmd+K (or Ctrl+K on Windows/Linux) to open the command palette. Navigate anywhere instantly.

Keyboard shortcuts

G+D Dashboard    G+B Billing    G+P Projects    G+Q Queue    G+S Settings

Press ? anywhere in the dashboard for the full shortcut reference.

Activity Feed#

Your dashboard overview shows your last 10 compressions with token counts, compression ratios, fidelity levels, and timestamps. Track your usage at a glance.

Theme#

Switch between Dark, Light, or System theme in Settings > General.

GitHub Webhooks#

Connect your GitHub repository to auto-compress documentation and code on push events. When a PR is opened, gotcontext compresses the diff and posts a comment with token savings. Configure in Settings > Integrations.

Setup

Enter your GitHub Personal Access Token, webhook secret, and repo URL in the Integrations settings tab.

Webhook events

push — triggers file compression on new commits.

pull_request — triggers diff compression + a PR comment with token savings.

Every incoming webhook is verified with HMAC-SHA256 signature validation.

MCP Tool Compression#

Requires Team or Enterprise plan.

Compress MCP tool descriptions to reduce token usage by 50–80%. Two tools are available:

compress_mcp_registry

Batch compress all tool descriptions from one or more MCP servers.

proxy_mcp_server

Proxy any MCP call through compression — transparently reduces tool description tokens for downstream consumers.

Real-Time Streaming#

Monitor batch compression jobs in real-time via Server-Sent Events. The Queue page has two views: List (table of jobs) and Monitor (live streaming cards with progress).

SSE endpoint: GET /v1/batch-queue/stream — subscribe to live job status updates.
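Consuming the stream requires parsing the SSE wire format (events separated by blank lines, `field: value` lines). A minimal parser sketch; the event names and payload shapes the stream emits are not specified above, so nothing here assumes them — only the generic `event` and `data` fields:

```python
def parse_sse_events(raw: str) -> list:
    """Parse a Server-Sent Events text stream into {event, data} dicts.

    Minimal sketch: handles only the `event` and `data` fields and ignores
    comments, `id`, and `retry`. Default event name per the SSE spec is
    "message".
    """
    events, current = [], {"event": "message", "data": ""}
    for line in raw.splitlines():
        if not line.strip():
            # Blank line terminates the current event.
            if current["data"]:
                events.append(current)
            current = {"event": "message", "data": ""}
        elif line.startswith("event:"):
            current["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            current["data"] += line[len("data:"):].strip()
    if current["data"]:
        events.append(current)
    return events
```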

Job Status#

Track active, queued, and failed jobs. Failed jobs show error messages with a retry button.

Stats Bar#

See aggregate metrics at the top of the Queue page — active jobs, queued jobs, failures, and average duration.

RBAC Roles#

Four permission levels for team collaboration:

| Role | Permissions |
| ---- | ----------- |
| Owner | Full access, billing, member management |
| Admin | Manage members, view billing, manage integrations |
| Operator | Create/run jobs, manage API keys |
| Viewer | Read-only dashboard access |

SSO#

Requires Enterprise plan.

Enterprise plans support SAML/OIDC single sign-on via Clerk. Configure in Settings > Security & SSO.

What's New#

Recent platform additions:

  • Command Palette (Cmd+K)
  • GitHub Integration
  • Queue Monitor with real-time SSE
  • RBAC (Owner, Admin, Operator, Viewer)
  • Dark / Light / System theme
  • Activity Feed on dashboard overview
  • MCP Tool Compression (Team+)
  • Team & Enterprise billing tiers

Fidelity Profiles#

Save named compression presets so repeat workflows fire one slug instead of three knobs. Each profile stores a fidelity level, chunk size, and skeleton ratio; pass profile="my-name" on any compress call instead of the raw parameters.

Five built-in fidelity tiers: abstract (most compressed) · outline · balanced (default) · detailed · raw. Manage profiles at /dashboard/profiles.

Webhooks#

Outbound webhooks deliver signed JSON payloads to your endpoint when compression events fire. Currently supported events: compression.completed.

Each delivery includes an X-GotContext-Signature HMAC-SHA256 header keyed off the secret returned at create time. Failed deliveries auto-retry with exponential backoff (3 attempts over ~10 min). Manage at /dashboard/webhooks or via POST /v1/webhooks.
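Verifying the signature is a constant-time HMAC comparison against the raw request body. A sketch assuming the header carries a hex-encoded HMAC-SHA256 digest with no scheme prefix — confirm the exact encoding against a real delivery before relying on this:

```python
import hashlib
import hmac


def verify_signature(secret: str, payload: bytes, signature: str) -> bool:
    """Check an X-GotContext-Signature header against the raw request body.

    Compute the HMAC over the exact bytes received, before any JSON parsing
    or re-serialisation, or the digest will not match.
    """
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking a matching prefix via timing.
    return hmac.compare_digest(expected, signature)
```

Reject any delivery that fails this check before touching the payload.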

Integrations#

GitHub integration: configure a repository webhook pointing at https://api.gotcontext.ai/v1/integrations/github/webhook with the secret from Settings → Integrations. Push events trigger automatic compression of the changed files so your CI assistant inherits a smaller context window.

Verify with HMAC-SHA256 against the X-Hub-Signature-256 header. Plain-text webhooks and unsigned events are rejected.

Semantic Cache#

Beyond compression, we operate a per-tenant semantic cache: an embedding-similarity index of the last 100 baseline calls. When a new prompt is close enough to a cached one, we return the prior compressed result instead of re-running the pipeline — free token reduction on top of normal compression savings.

The cache warms up over the first ~100 baseline calls. Typical hit rates after week 1 land in the 15–25% range. The per-tenant similarity threshold is tunable via PUT /v1/settings/semantic-cache-threshold (Team and Enterprise). Hit telemetry shows up at Billing → Cache-Adjusted Savings.