API Reference
gotcontext.ai is an MCP gateway with semantic compression — connect Claude Code, Cursor, VS Code, or any MCP client and immediately unlock 142 compression and context tools. The REST API gives the same results if you prefer direct HTTP.
Connect via MCP (recommended)
The MCP server at https://api.gotcontext.ai/mcp gives your AI agent 142 compression, ingestion, and context-management tools without running anything locally. Complete the five-step setup below to connect and start attributing usage to a dedicated project.
Setup (5 steps)
1. Get a free API key
Sign in and create a gc_-prefixed key from your dashboard. The free tier includes 17 core tools and 1,000 compressions per month — enough to validate the workflow before upgrading.
2. Mint a dedicated project for this workspace
Without a project, all traffic attributes to your Default rollup alongside test fixtures and unrelated sessions — making per-project budgets and usage stats meaningless. Create a project from the Projects page or call the MCP tool directly from inside Claude Code:
create_project(name="my-repo", description="Compression project for my-repo")
# Returns: { project_id: "abc123", name: "my-repo" }
3. Bind your key to that project
Go to Settings → API Keys and use the inline rebinder to assign the key to your new project. Allow up to 5 minutes for the change to propagate — the plan cache has a 5-minute TTL, so per-project compression counts begin incrementing on the new project shortly after.
4. Add the MCP server to your client config
Choose your client from the snippets below and paste your gc_ key in place of gc_your_key_here. For Claude Code, if .mcp.json references the key through environment-variable substitution instead of a literal value, the variable must already be set in the shell: the substitution reads from the shell at session start, not from .env.local. Run export GOTCONTEXT_KEY=gc_... before launching Claude Code.
5. Verify attribution
Ask your agent to call project_stats and confirm the compression counter is incrementing on your project, not on Default.
project_stats()
# Returns: { project_name: "my-repo", compressions_this_month: 12, ... }
Claude Code
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}
Cursor
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}
VS Code (settings.json)
{
  "mcp": {
    "servers": {
      "gotcontext": {
        "url": "https://api.gotcontext.ai/mcp",
        "headers": {
          "Authorization": "Bearer gc_your_key_here"
        }
      }
    }
  }
}
Gemini CLI (settings.json)
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "type": "http",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      },
      "timeout": 30000
    }
  }
}
Authentication
All MCP connections require a gc_-prefixed API key passed in the Authorization header. Create one from your dashboard.
For custom MCP clients
The MCP endpoint uses Streamable HTTP transport. Requests must include Accept: application/json, text/event-stream and carry the Mcp-Session-Id header from the initialize response on all subsequent calls. Claude Code, Cursor, and VS Code handle this automatically.
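For a custom client, the header handling described above can be sketched as follows. This is a minimal illustration rather than an official client: the helper name `build_mcp_message`, the `protocolVersion` value, and the `clientInfo` fields are assumptions, so check them against the MCP specification version your client targets. Nothing is sent over the network here.

```python
import json

MCP_URL = "https://api.gotcontext.ai/mcp"

def build_mcp_message(api_key, method, params, request_id, session_id=None):
    """Return (headers, body) for one JSON-RPC request over Streamable HTTP."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # The server may answer with plain JSON or with an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        # Echo the id from the initialize response on every later call.
        headers["Mcp-Session-Id"] = session_id
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode("utf-8")
    return headers, body

# First call: there is no session id yet.
init_headers, init_body = build_mcp_message(
    "gc_your_key_here",
    "initialize",
    {
        "protocolVersion": "2025-03-26",  # assumed; use the version you target
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
    },
    request_id=1,
)

# Later calls: reuse the Mcp-Session-Id value returned by the server.
list_headers, list_body = build_mcp_message(
    "gc_your_key_here", "tools/list", {}, request_id=2,
    session_id="session-from-server",  # placeholder value
)
```

POST each body to `MCP_URL` with its headers; the only per-session state the client has to keep is the session id.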
Project instructions file (CLAUDE.md / AGENTS.md)
Add a CLAUDE.md (or AGENTS.md) to your project root so the AI knows when and how to use gotcontext compression. Without this, the AI may not use the tools effectively. Copy this starter:
# gotcontext.ai Compression
This project uses gotcontext.ai for semantic compression via MCP.
## When to compress
- Before sending large files or docs to the AI context window
- When terminal output is verbose (git diff, test results, logs)
- When reviewing code across many files
- Before reviewing a PR or explaining a diff — compress the changed
files or run `gc_blast_radius` to see only transitively-touched code
## Compression workflow
1. Use `ingest_context` to add a document (give it a unique file_id)
2. Use `read_skeleton` to get the compressed version
3. Use `search_semantic` to find specific sections by query
4. Use `filter_cli_output` to compress git diffs, pytest output, etc.
## Code understanding (Pro+)
- `compress_codebase` — AST-aware digest of an entire repo; function
and class signatures only, bodies stripped
- `gc_blast_radius` — ranked context for a focus symbol: tensor-grep
blast-radius + BM25 fusion. Best for PR review and bug triage
- `gc_compress_manifest` — compress an MCP tools/list response so
downstream agents see shorter tool descriptions without losing
inputSchema semantics (v1.8.0+)
- `batch_ingest_documents` — submit up to 50 docs as one async job;
poll status via `GET /v1/batch-queue/{id}`
## Tips
- Use `estimate_tokens` first to see if compression is worthwhile
- For code files, the compressor understands function/class boundaries
- Use `get_compression_presets` to see available fidelity levels
- Call `tool_help` for documentation on any specific tool
When to compress
The recommended per-call decision loop for any file or output you are about to pass to the model:
1. Check whether compression is worthwhile
For any file or output larger than ~1,500 tokens (roughly 6,000 bytes), call estimate_tokens first. If the result is below that threshold, send as-is — the compression overhead is not worth it.
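The size check in this step can be approximated locally before spending a tool call. The sketch below only applies the 6,000-byte rule of thumb quoted above; the function name `worth_estimating` is hypothetical, and the byte count is a rough proxy, not a substitute for estimate_tokens.

```python
BYTE_THRESHOLD = 6_000  # roughly 1,500 tokens, per the rule of thumb above

def worth_estimating(text: str, threshold: int = BYTE_THRESHOLD) -> bool:
    """True when the payload is large enough to justify an estimate_tokens call."""
    return len(text.encode("utf-8")) > threshold

small = "def add(a, b):\n    return a + b\n"
large = "x" * 10_000
print(worth_estimating(small), worth_estimating(large))  # False True
```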
2. Get a routing verdict
Call gc_pre_flight to get one of four verdicts:
gc_pre_flight()
# Verdicts:
# send_as_is — context is small; no action needed
# send_compressed — ingest + read_skeleton before sending
# warn_context_limit — approaching limit; compress or summarize
# clear_first — context is saturated; clear before proceeding
3. Compress if recommended
If the verdict is send_compressed:
ingest_context(file_id="my-doc", content="...")
read_skeleton(doc_id="...")
# Use the skeleton in your prompt instead of the raw content
For verbose CLI output
Pipe pytest output, fly logs, or git diff through filter_cli_output before passing to the model — typically 70–90% smaller with failure signal preserved.
For code review questions
When asking “what does changing X affect?” or reviewing a diff, call gc_blast_radius with the focus symbol. It returns ranked context — callsites, callers, transitively-touched code — without you reading every file manually.
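The four gc_pre_flight verdicts listed earlier in this section map naturally onto a small dispatch table. This sketch assumes the verdict arrives as a plain string; the actual shape of the tool's return value is not documented here, and `next_action` is a hypothetical helper.

```python
# Verdict names are taken from the gc_pre_flight list above.
ACTIONS = {
    "send_as_is": "send the raw content",
    "send_compressed": "call ingest_context, then read_skeleton, and send the skeleton",
    "warn_context_limit": "compress or summarize before adding more context",
    "clear_first": "clear the context before proceeding",
}

def next_action(verdict: str) -> str:
    """Translate a verdict into the step the agent should take next."""
    try:
        return ACTIONS[verdict]
    except KeyError:
        raise ValueError(f"unknown gc_pre_flight verdict: {verdict!r}") from None

print(next_action("send_compressed"))
```

Raising on an unknown verdict keeps a future fifth verdict from being silently treated as "send as-is".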
Common pitfalls
Key bound to the wrong project
Per-project budget alerts fire against the project the key is bound to. If your key is bound to Default (or to a different project), every compression call increments the wrong counter, budget thresholds trigger at the wrong time, and per-project usage charts show nothing. Rebind via Settings → API Keys.
Key with no project binding (project_id NULL)
Legacy keys minted before the per-project update carry a project_id of null and fall back to the user-scoped Default rollup. All traffic appears under Default, polluting that project’s stats. Verify with project_stats() — if project_name returns "Default" but you created a dedicated project, the key needs rebinding.
.mcp.json environment variable not set before launching Claude Code
The .mcp.json substitution reads the shell environment at session start — not from .env.local or any dotenv file. If GOTCONTEXT_KEY is only in .env.local the MCP server will fail to authenticate. Run export GOTCONTEXT_KEY=gc_... in your shell before launching Claude Code, or add it to your shell profile.
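A quick preflight check can catch this before the session starts. The sketch below inspects the environment the way a child process of your shell would see it; `check_gotcontext_key` is a hypothetical helper, not part of any gotcontext tooling.

```python
import os

def check_gotcontext_key(env=None, name="GOTCONTEXT_KEY"):
    """Return a list of problems with the key as seen by the current shell."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    problems = []
    if not value:
        problems.append(f"{name} is not set in the shell environment")
    elif not value.startswith("gc_"):
        problems.append(f"{name} does not start with the gc_ prefix")
    return problems

for problem in check_gotcontext_key():
    print("warning:", problem)
```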
Per-project counts not incrementing after rebind
The plan cache has a 5-minute TTL (Upstash). After rebinding a key to a new project, allow up to 5 minutes before project_stats() reflects the new attribution. Counts already attributed to the old project do not retroactively move.
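If you script the rebind check, poll for at most the TTL window instead of failing on the first stale read. In this sketch `fetch_stats` is a hypothetical callable that wraps a project_stats call and returns its result as a dict; the 15-second interval is an arbitrary choice.

```python
import time

def wait_for_attribution(fetch_stats, expected_project, timeout_s=300.0,
                         interval_s=15.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until stats report the expected project or the TTL window passes."""
    deadline = clock() + timeout_s
    while True:
        stats = fetch_stats()
        if stats.get("project_name") == expected_project:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Example with a canned sequence of responses instead of a live call.
responses = iter([{"project_name": "Default"}, {"project_name": "my-repo"}])
print(wait_for_attribution(lambda: next(responses), "my-repo", sleep=lambda s: None))
```

The default `timeout_s` of 300 seconds mirrors the 5-minute TTL; a `False` result after that window suggests the key is still bound to the wrong project.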
5-Minute Tutorial
Once your MCP client is connected, run this four-step workflow to see gotcontext.ai in action. Each step is a single MCP tool call — no REST, no auth headers, just tell your agent to call the tool.
Step 1 — Ingest a document
Tell your agent to call ingest_context with a file_id and the document text. The tool stores a compressed index and returns a doc_id.
ingest_context(
file_id="readme",
content="# My Project\n...",
title="Project README"
)
# Returns: { doc_id: "doc_abc123", tokens_before: 1840, tokens_after: 312 }
Step 2 — Read the compressed skeleton
Call read_skeleton with the doc_id from step 1. You get back the semantic structure of the document — typically 60–85% fewer tokens than the original.
read_skeleton(doc_id="doc_abc123")
# Returns compressed structural outline: headings, key facts, code signatures
Step 3 — Search for a specific section
Use search_semantic to find the most relevant chunks without loading the full document. Useful when your context window is tight.
search_semantic(
query="how does authentication work",
doc_id="doc_abc123",
top_k=3
)
# Returns top-3 semantically matching chunks
Step 4 — Compress CLI output on the fly
Pipe verbose terminal output through filter_cli_output before it lands in your agent context. Works with git diff, pytest -v, and build logs.
filter_cli_output(
content=open("pytest_output.txt").read(),
source="pytest"
)
# Returns condensed failure summary, typically 70–90% smaller
What you just did: ingested a document, retrieved its semantic skeleton, searched within it, and compressed CLI output, all through your AI agent with no REST calls and no local setup. Run tool_help(tool_name="ingest_context") for inline docs on any tool, or get_compression_presets() to tune fidelity.
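Under the hood, an MCP client turns each of these four steps into a JSON-RPC `tools/call` request. The sketch below builds those payloads using the tool names and arguments from the tutorial; the `tool_call` helper and the request ids are illustrative, and the real doc_id comes from the ingest_context result.

```python
import json

def tool_call(name, arguments, request_id):
    """Build the JSON-RPC body an MCP client sends for one tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

doc_id = "doc_abc123"  # placeholder; use the doc_id returned by ingest_context

calls = [
    tool_call("ingest_context",
              {"file_id": "readme", "content": "# My Project\n...",
               "title": "Project README"}, 1),
    tool_call("read_skeleton", {"doc_id": doc_id}, 2),
    tool_call("search_semantic",
              {"query": "how does authentication work", "doc_id": doc_id,
               "top_k": 3}, 3),
    tool_call("filter_cli_output",
              {"content": "...pytest output...", "source": "pytest"}, 4),
]

print(json.dumps(calls[1], indent=2))
```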
MCP Tool Catalog
The MCP gateway exposes 142 tools in two profiles. Pass ?profile=core to your MCP URL for a lean 7-tool set (fastest tools/list response, recommended for bandwidth-constrained clients), or ?profile=full (default) for all 142. Use tool_help(tool_name="X") at runtime to get the full parameter schema for any tool without leaving your agent session.
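Selecting a profile is only a matter of the query string on the MCP URL. A minimal sketch, assuming `core` and `full` are the only accepted values (the two named above); `mcp_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

BASE_MCP_URL = "https://api.gotcontext.ai/mcp"

def mcp_url(profile: str = "full") -> str:
    """Return the MCP endpoint URL with the requested tool profile."""
    if profile not in ("core", "full"):
        raise ValueError("profile must be 'core' or 'full'")
    scheme, netloc, path, _query, fragment = urlsplit(BASE_MCP_URL)
    query = urlencode({"profile": profile})
    return urlunsplit((scheme, netloc, path, query, fragment))

print(mcp_url("core"))  # https://api.gotcontext.ai/mcp?profile=core
```

Paste the resulting URL into the "url" field of your client config in place of the bare endpoint.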
Ingest & Read
ingest_context: store + compress a document
read_skeleton: get the compressed outline
batch_ingest_documents: async bulk ingest (up to 50)
ingest_multimodal: PDF, image, audio ingestion
refresh_document: re-ingest when source changes
Search & Retrieve
search_semantic: embedding-based chunk search
search_code: BM25 + AST-aware code search
search_memory: retrieve from agent memory
get_context_block: fetch a specific chunk by id
list_documents: enumerate ingested docs
CLI & Output Filters
filter_cli_output: compress git diff, pytest, logs
compress_codebase: AST-aware repo digest
gc_blast_radius: ranked context for a symbol
gc_compress_manifest: shrink MCP tools/list payload
estimate_tokens: count tokens before compressing
Context & Memory
add_memory: persist a fact across sessions
check_budget: context-window utilization check
adapt_to_context_window: auto-trim to fit model limit
advise_context: recommend compression vs clear
gc_pre_flight: single-call context health check
Knowledge Hub
gc_kb_ingest: add a file/URL to your KB
gc_kb_query: semantic search across KB
gc_kb_get: retrieve a KB document
gc_kb_list: list KB items in a project
gc_kb_diff: compare two KB document versions
Free Tier (no API key required)
gc_lookup: fetch live framework docs (Next.js, FastAPI, React…)
tool_help: inline parameter docs for any tool
get_compression_presets: list fidelity levels
check_environment: verify connectivity and plan
estimate_tokens: count tokens (no compression charged)
The full 142-tool list with parameter schemas is available in the OpenAPI spec and via the A2A agent card at /.well-known/agent.json.
REST quickstart
Get your API key from the dashboard, then make your first compression call:
{
"text": "gotcontext.ai is a semantic compression API for large-language-model context windows. It reduces token usage by 80–90% on medium-to-large documents through graph-based PageRank analysis, without losing the meaning that drives accurate model responses.\n\nArchitecture overview\n\nThe core pipeline has four stages:\n\n1. Chunking. The document is split into overlapping windows of 200–400 tokens. Window size is configurable; the default balances granularity against embedding cost.\n\n2. Embedding. Each chunk is encoded into a high-dimensional vector using an ONNX-exported sentence-transformer model (all-MiniLM-L6-v2 by default; Pro/Team/Enterprise tiers use accelerated ONNX with INT8 quantisation at 3–5x throughput). Embeddings run fully in-process — no external embedding API call is made, which keeps latency under 90 ms end-to-end for most documents.\n\n3. Graph construction and PageRank. A similarity graph is built where each chunk is a node and edges are drawn when the cosine similarity exceeds a configurable threshold (default: 0.35). The graph is then scored with a damped PageRank (damping factor 0.85). High-rank chunks are the semantic backbone of the document.\n\n4. Skeleton assembly. Chunks are sorted by PageRank score. The top K chunks — where K is determined by the requested fidelity level — are concatenated in original document order (not score order, which preserves narrative flow). The result is a compressed skeleton.\n\nFidelity levels\n\ngotcontext supports five named fidelity levels:\n\n- abstract: retains ~5% of chunks. Keeps only the highest-PageRank semantic backbone. Use for fast fact-retrieval where reasoning across the full document is not required.\n- outline: retains ~10% of chunks. Preserves top-level structure and key claims. Good for getting a structural overview before diving into sections.\n- balanced (default): retains ~20% of chunks. 
The recommended starting point for most documents — strong compression while keeping enough context for accurate model responses.\n- detailed: retains ~40% of chunks. Recommended for legal, medical, or compliance documents where missing a clause is costly.\n- raw: returns the original document unchanged. Use when you want the token-count and cost-estimate analytics without applying compression.\n\nAPI surface\n\nPOST /v1/compress is the primary endpoint. It accepts a JSON body with:\n\n- text (required): the document string. Maximum size depends on plan: 100 KB free, 1 MB Pro, 5 MB Team, 10 MB Enterprise.\n- fidelity (optional, default \"balanced\"): one of the four levels above.\n- model (optional): the target LLM model name, used only for cost estimation in the response stats. Does not change compression behaviour.\n- output_style (optional, v1.4.0+): \"prose\" | \"bullets\" | \"structured\". Controls the skeleton format. \"prose\" stitches chunks with light connectors; \"bullets\" prefixes each chunk with a dash; \"structured\" emits a JSON object with section labels.\n\nThe response body includes:\n\n- compressed: the compressed skeleton string.\n- stats.original_tokens: token count of the input.\n- stats.compressed_tokens: token count of the skeleton.\n- stats.tokens_saved: the difference.\n- stats.savings_pct: percentage reduction (0–100).\n- stats.estimated_cost_saved_usd: dollar estimate at the model's published input price, or at Opus 4.7 rates ($5/MTok input) when no model is specified.\n\nMCP integration\n\ngotcontext exposes a Streamable-HTTP MCP server at https://api.gotcontext.ai/mcp. This lets Claude Code, Cursor, Windsurf, Gemini CLI, and OpenAI Codex CLI call gotcontext compression directly as a tool — the LLM reads a long document, routes it through gotcontext, and continues reasoning on the compressed skeleton. 
The round-trip latency is below the tool-call overhead in all three clients.\n\nTool plan gating: the core compress tool is available on all plans. gc_blast_radius (structural code analysis via tensor-grep BM25) and gc_compress_manifest (MCP tool-schema compression, new in v1.8.0) are Pro+ tools.\n\nAuthentication\n\nThree auth modes are supported:\n\n- gc_ API key: HMAC-signed key created from the dashboard. Pass as Authorization: Bearer gc_<key>. Rate limits apply per key.\n- Clerk JWT: used by the dashboard and MCP server. The session token issued by Clerk is accepted on every /v1/* route.\n- Polar license (self-hosted): Ed25519-signed license key validated locally by the self-hosted binary. Metering events are batched and reported asynchronously.\n\nPrompt-cache integration\n\nFrom v1.1.0, gotcontext is aware of provider prompt-cache semantics. When a document has been compressed before with identical fidelity and the cached embedding is still valid, the response includes X-Cache-Hit: true and the latency drops to under 10 ms (cache read only, no embedding pass). The /v1/usage/by-cache endpoint breaks down savings into compression-only and cache-adjusted figures, which the dashboard Cache-Adjusted Savings widget visualises.",
"fidelity": "balanced"
}
Save the JSON above as payload.json (the file name is only an example) and post it to the demo endpoint, which the original example calls without an Authorization header:
curl -X POST https://api.gotcontext.ai/v1/demo/compress \
-H 'Content-Type: application/json' \
-d @payload.json
With an API key, call the main endpoint:
curl -X POST https://api.gotcontext.ai/v1/compress \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{"text": "Your document text here...", "fidelity": "balanced"}'
Authentication
All API requests require a Bearer token in the Authorization header. Two token types are supported:
API Keys (recommended)
Prefixed with gc_. Create keys in the dashboard or via POST /v1/keys. Keys are permanent until revoked and can be rotated at any time.
Authorization: Bearer gc_a1b2c3d4e5f6...
Clerk JWT (session tokens)
Short-lived tokens issued by Clerk after sign-in. Used automatically by the dashboard frontend. For programmatic access, API keys are preferred.
Authorization: Bearer eyJhbGciOi...
SDKs & Plugins
Pre-built clients wrap the REST API so you don't need raw fetch() calls. All clients return the same response shape as the REST API.
TypeScript / JavaScript
@gotcontext/sdk — published to npm. Zero runtime dependencies.
npm install @gotcontext/sdk
import { GotContextClient } from "@gotcontext/sdk";
const gc = new GotContextClient({ apiKey: "gc_your_key_here" });
const { compressed, stats } = await gc.compress({ text: "...", fidelity: "balanced" });
Python
gotcontext — published to PyPI.
pip install gotcontext
from gotcontext import GotContext
gc = GotContext(api_key="gc_your_key_here")
result = gc.compress(text="...", fidelity="balanced")
print(result.stats.savings_pct)
Claude Code Plugin
One command installs the gotcontext plugin — pre-wired MCP config plus 5 skills (compress, blast-radius, dogfood-check, release-ship, knowledge-hub).
/plugin install oimiragieo/gotcontext-main
Agent-to-Agent (A2A) Discovery
Agent frameworks can autodiscover all 142 MCP tools from the Linux Foundation Agent2Agent v1.0 card — no human required.
GET https://api.gotcontext.ai/.well-known/agent.json
For machine-readable product metadata and alternatives comparison, see llms.txt, OpenAPI, and /compare.
Compression
POST /v1/compress
Compress any text document using graph-based semantic compression. Achieves 80–95% token reduction on medium-to-large documents. Optionally supply a query to guide the compressor toward sections most relevant to your question.
Request body
{
"text": string, // required — document to compress (min 1 char)
"fidelity": string, // optional — "abstract" | "outline" | "balanced" | "detailed" | "raw"
// default: "balanced"
"query": string|null, // optional — query-guided mode; prioritises relevant sections
"cost_model": string|null // optional — model name for cost estimate (e.g. "claude-opus-4")
}
Response
{
"compressed": string, // compressed skeleton text
"stats": {
"original_tokens": number,
"compressed_tokens": number,
"savings_pct": number, // e.g. 87.4
"compression_ratio": number, // e.g. 7.9
"estimated_cost_saved": string|null // e.g. "$0.042" — only when cost_model supplied
}
}
curl -X POST https://api.gotcontext.ai/v1/compress \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "Transformer models fundamentally changed NLP...",
"fidelity": "balanced",
"query": "attention mechanism",
"cost_model": "claude-sonnet-4-6"
}'
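The same call can be assembled with Python's standard library when the SDK is not available. This is a sketch, not the official client: `build_compress_request` and `derive_stats` are hypothetical helpers, and the two formulas in `derive_stats` are the conventional definitions, which the schema's sample values (87.4 and 7.9) happen to be consistent with.

```python
import json
import urllib.request

COMPRESS_URL = "https://api.gotcontext.ai/v1/compress"
FIDELITY_LEVELS = ("abstract", "outline", "balanced", "detailed", "raw")

def build_compress_request(api_key, text, fidelity="balanced",
                           query=None, cost_model=None):
    """Build (but do not send) a POST /v1/compress request."""
    if not text:
        raise ValueError("text must contain at least 1 character")
    if fidelity not in FIDELITY_LEVELS:
        raise ValueError(f"fidelity must be one of {FIDELITY_LEVELS}")
    body = {"text": text, "fidelity": fidelity,
            "query": query, "cost_model": cost_model}
    return urllib.request.Request(
        COMPRESS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def derive_stats(original_tokens, compressed_tokens):
    """Recompute the derived stats from the two token counts (assumed formulas)."""
    return {
        "savings_pct": round(100 * (1 - compressed_tokens / original_tokens), 1),
        "compression_ratio": round(original_tokens / compressed_tokens, 1),
    }

request = build_compress_request(
    "gc_your_key_here",
    "Transformer models fundamentally changed NLP...",
    query="attention mechanism",
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
print(request.get_method(), request.full_url)
print(derive_stats(10_000, 1_260))  # {'savings_pct': 87.4, 'compression_ratio': 7.9}
```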
Code Compression
POST /v1/compress-code
AST-aware code compression. Parses function/class boundaries, extracts imports and docstrings, and ranks symbols by PageRank on the dependency graph. Returns a skeleton preserving signatures and docstrings. Significantly better than plain text compression for code.
Request body
{
"code": string, // required — source code to compress (min 1 char)
"language": string|null, // optional — hint: "python"|"javascript"|"typescript"|"java"|"go"|"rust"|"cpp"
// auto-detected from content when omitted
"fidelity": string // optional: same levels as /v1/compress, default: "balanced"
}
Response
{
"compressed": string,
"stats": {
"original_tokens": number,
"compressed_tokens": number,
"savings_pct": number,
"language_detected": string // e.g. "python", "javascript", "unknown"
}
}
curl -X POST https://api.gotcontext.ai/v1/compress-code \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"code": "def process(items):\n ...",
"language": "python",
"fidelity": "balanced"
}'
Code Context Ranking (blast-radius + BM25), v1.5.0
POST /v1/compress-code/structural
Structural code-context compression. Submit a file bundle plus an optional focus symbol; the server runs tensor-grep blast-radius and BM25 on the sandboxed files and returns a context list ranked by Reciprocal Rank Fusion. Intended for PR-diff-scale code payloads (≤1000 files, ≤512 KB each, ≤5 MB total). Measured 34% token reduction on a 10-file corpus with focus_symbol=cache_lookup vs naive full-bundle submission; see the smoke benchmark at benchmarks/blast_radius_smoke.py.
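Checking a bundle against these limits on the client side avoids a rejected upload. The sketch below assumes 1 KB = 1024 bytes and that sizes are measured on the UTF-8 encoding of each file's content; neither detail is stated in this reference, and `check_bundle` is a hypothetical helper.

```python
MAX_FILES = 1000
MAX_FILE_BYTES = 512 * 1024        # 512 KB per file (assuming 1 KB = 1024 bytes)
MAX_TOTAL_BYTES = 5 * 1024 * 1024  # 5 MB per request

def check_bundle(files):
    """Return a list of limit violations for a list of {'path', 'content'} dicts."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the {MAX_FILES}-file limit")
    total = 0
    for f in files:
        size = len(f["content"].encode("utf-8"))
        total += size
        if size > MAX_FILE_BYTES:
            problems.append(f"{f['path']} is {size} bytes, over the per-file limit")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"bundle is {total} bytes, over the total limit")
    return problems

bundle = [
    {"path": "src/app.py", "content": "def handle_request(): pass\n"},
    {"path": "src/utils.py", "content": "..."},
]
print(check_bundle(bundle))  # []
```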
Request body
{
"files": [
{ "path": "src/app.py", "content": "def handle_request(): ..." },
{ "path": "src/utils.py", "content": "..." }
],
"focus_symbol": "handle_request", // optional — focus blast-radius on this symbol
"query": "error handling", // optional — BM25 query (defaults to focus_symbol)
"top_k": 25 // optional — cap on ranked_context length (1-500, default 50)
}
Response
{
"ranked_context": [
{
"path": "src/app.py",
"score": 0.031,
"rank": 1,
"contributing_signals": ["bm25", "graph_distance"]
}
],
"stats": {
"files_in": 10,
"files_ranked": 5,
"symbols_in": 23,
"degraded": false
},
"message": null // non-null only on degraded paths (tg missing, timeout, etc.)
}
curl -X POST https://api.gotcontext.ai/v1/compress-code/structural \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"files": [
{"path":"src/app.py","content":"def handle_request(): pass"},
{"path":"src/utils.py","content":"..."}
],
"focus_symbol": "handle_request",
"top_k": 25
}'
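To make the `score` and `contributing_signals` fields easier to read, here is a generic sketch of Reciprocal Rank Fusion. It illustrates the technique only: the constant `k = 60` is a commonly used default, and neither it nor the example rankings are confirmed to match what the server does.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of paths into (path, score) pairs, best first."""
    scores = {}
    for ranked_paths in rankings.values():
        for rank, path in enumerate(ranked_paths, start=1):
            # Each signal contributes 1 / (k + rank) for every path it ranks.
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

fused = reciprocal_rank_fusion({
    "bm25": ["src/app.py", "src/utils.py", "src/db.py"],  # hypothetical ranking
    "graph_distance": ["src/app.py", "src/db.py"],        # hypothetical ranking
})
for path, score in fused:
    print(f"{path}  {score:.4f}")
```

A file ranked first by both signals scores 2/61, about 0.033, which is the same order of magnitude as the 0.031 in the sample response.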
Batch Compression (synchronous)
POST /v1/batch-compress
Compress up to 50 documents in a single call. Documents are processed concurrently (max 4 at once to avoid saturating the embedding model). Each document may have its own fidelity and query. Failed documents are reported inline; the overall batch always returns 200.
Request body
{
"documents": [ // required — 1 to 50 items
{
"text": string, // required
"fidelity": string, // optional, default "balanced"
"query": string|null // optional
}
]
}
Response
{
"results": [
{
"compressed": string,
"original_tokens": number,
"compressed_tokens": number,
"savings_pct": number,
"compression_ratio": number,
"error": string|null // set when this document failed; other fields are 0
}
],
"summary": {
"total_documents": number,
"successful": number,
"failed": number,
"total_tokens_in": number,
"total_tokens_saved": number,
"avg_savings_pct": number,
"avg_compression_ratio": number
}
}
curl -X POST https://api.gotcontext.ai/v1/batch-compress \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"documents": [
{"text": "First document...", "fidelity": "balanced"},
{"text": "Second document...", "query": "neural networks"},
{"text": "Third document...", "fidelity": "outline"}
]
}'
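Because the batch always returns 200, the client has to do two things itself: keep each request within the 50-document cap and separate failures from successes using the inline `error` field. A minimal sketch with hypothetical helper names:

```python
def split_batches(documents, size=50):
    """Yield request bodies with at most `size` documents each."""
    for start in range(0, len(documents), size):
        yield {"documents": documents[start:start + size]}

def partition_results(results):
    """Separate successful results from failed ones using the inline error field."""
    ok = [r for r in results if r.get("error") is None]
    failed = [r for r in results if r.get("error") is not None]
    return ok, failed

docs = [{"text": f"Document {i}...", "fidelity": "balanced"} for i in range(120)]
print([len(body["documents"]) for body in split_batches(docs)])  # [50, 50, 20]
```

Results come back in request order, so the index of a failed entry in `results` identifies the document to retry.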
Fidelity Advisor
POST /v1/recommend
Analyse a document and recommend the optimal fidelity level. Considers document size and (optionally) the target model's context window. Use this to automatically pick the right compression level before calling /v1/compress.
Request body
{
"text": string, // required — document to analyse
"model": string|null, // optional — target model (e.g. "claude-sonnet-4-6")
"context_window": number|null // optional — override context window size in tokens
}
Response
{
"recommended_fidelity": string, // e.g. "balanced"
"estimated_ratio": number, // fraction of tokens kept (0.0–1.0)
"estimated_output_tokens": number,
"original_tokens": number,
"reasoning": string // human-readable explanation
}
curl -X POST https://api.gotcontext.ai/v1/recommend \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "Your long document...",
"model": "claude-sonnet-4-6"
}'
API Keys
Create and manage API keys programmatically. Keys are prefixed gc_ and stored as HMAC-SHA256 hashes. The raw key is returned once on creation and cannot be retrieved again.
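Storing only an HMAC of the key is why the raw value cannot be shown again. The sketch below illustrates the general technique with Python's standard library; the server secret, the key format, and the helper names are all assumptions, and the mask simply follows the "gc_****ab12" pattern shown in the key-listing response.

```python
import hashlib
import hmac
import secrets

def hash_key(raw_key: str, server_secret: bytes) -> str:
    """HMAC-SHA256 of the raw key, hex-encoded; this is what would be stored."""
    return hmac.new(server_secret, raw_key.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_key(presented: str, stored_hash: str, server_secret: bytes) -> bool:
    """Constant-time comparison of the presented key's HMAC with the stored one."""
    return hmac.compare_digest(hash_key(presented, server_secret), stored_hash)

def mask_key(raw_key: str) -> str:
    """Keep the prefix and the last four characters only."""
    return f"gc_****{raw_key[-4:]}"

secret = secrets.token_bytes(32)     # hypothetical server-side secret
raw = "gc_" + secrets.token_hex(16)  # hypothetical key format
stored = hash_key(raw, secret)

print(verify_key(raw, stored, secret))  # True
print(mask_key(raw))
```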
POST /v1/keys
Create a new API key. Returns the full raw key, so store it immediately.
Request body
{
"name": string // required — human-readable label (1–100 chars)
}
Response
{
"key": string, // full raw key — shown ONCE, store securely
"key_id": string, // 16-char hex ID for management
"name": string,
"created_at": string // ISO 8601 UTC
}
curl -X POST https://api.gotcontext.ai/v1/keys \
-H "Authorization: Bearer YOUR_CLERK_JWT" \
-H "Content-Type: application/json" \
-d '{"name": "Production server"}'
GET /v1/keys
List all API keys for the authenticated user. Returns masked key values; the raw key cannot be retrieved after creation.
Response
{
"keys": [
{
"key_id": string,
"name": string,
"masked_key": string, // e.g. "gc_****ab12"
"created_at": string, // ISO 8601 UTC
"last_used": string|null,
"status": "active" | "revoked"
}
]
}
curl https://api.gotcontext.ai/v1/keys \
-H "Authorization: Bearer YOUR_CLERK_JWT"
DELETE /v1/keys/:id
Revoke an API key by ID. Takes effect immediately: the auth middleware rejects the key within milliseconds.
Response
{
"success": true,
"key_id": string
}
curl -X DELETE https://api.gotcontext.ai/v1/keys/YOUR_KEY_ID \
-H "Authorization: Bearer YOUR_CLERK_JWT"
Usage
GET /v1/usage
Monthly compression statistics for the authenticated user. Returns compression counts, token totals, plan limit, and the next reset timestamp.
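Two of the fields in the response below can be sanity-checked on the client: `pct_used` follows from the used and limit counts, and `resets_at` is midnight UTC on the first day of the next month. A sketch with hypothetical helper names; the one-decimal rounding is an assumption.

```python
from datetime import datetime, timezone

def next_reset(now: datetime) -> datetime:
    """Midnight UTC on the first day of the month after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)

def pct_used(used: int, limit: int) -> float:
    """Share of the monthly allowance consumed, as a 0.0-100.0 percentage."""
    return round(100 * used / limit, 1) if limit > 0 else 0.0

now = datetime(2026, 4, 17, 9, 30, tzinfo=timezone.utc)
print(next_reset(now).isoformat())  # 2026-05-01T00:00:00+00:00
print(pct_used(250, 1000))          # 25.0
```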
Response
{
"period": string, // "YYYY-MM", e.g. "2026-04"
"compressions_used": number,
"compressions_limit": number, // varies by plan — see plan field
"pct_used": number, // 0.0–100.0
"tokens_in": number, // total original tokens this month
"tokens_saved": number, // total tokens eliminated this month
"resets_at": string, // ISO 8601 UTC, midnight 1st of next month
"plan": string, // free | pro | team | enterprise
"rate_limit_per_minute": number // varies by plan
}
curl https://api.gotcontext.ai/v1/usage \
-H "Authorization: Bearer gc_your_key_here"
Billing
Billing is handled by Polar. The checkout and portal endpoints return redirect URLs — do not call these from server-side code without a user session.
POST /v1/billing/checkout
Create a Polar checkout session to upgrade to Pro. Returns a URL to redirect the user to.
Request body
{
"plan": "pro" // currently the only valid value
}
Response
{
"checkout_url": string // redirect the user to this URL
}
curl -X POST https://api.gotcontext.ai/v1/billing/checkout \
-H "Authorization: Bearer YOUR_CLERK_JWT" \
-H "Content-Type: application/json" \
-d '{"plan": "pro"}'
POST /v1/billing/portal
Get the Polar customer portal URL to manage subscription, payment method, and invoices.
Response
{
"portal_url": string // redirect the user to this URL
}
curl -X POST https://api.gotcontext.ai/v1/billing/portal \
-H "Authorization: Bearer YOUR_CLERK_JWT"
CLI Output Compressor (git diff, pytest, npm)#
POST /v1/filter-cli
Compress verbose CLI output such as git diffs, test results, and npm install logs. Automatically detects the command type and applies type-specific compression. Typical savings: 80-99% on verbose output.
Request body
{
"output": string, // required — raw CLI output to compress (min 1 char)
"command_hint": string|null // optional — hint: "git_diff", "test_output", etc.
// auto-detected if omitted
}
Response
{
"filtered": string, // compressed CLI output
"original_chars": number,
"filtered_chars": number,
"savings_pct": number, // e.g. 92.3
"detected_type": string|null // e.g. "git_diff", "pytest"
}
curl -X POST https://api.gotcontext.ai/v1/filter-cli \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"output": "diff --git a/src/main.py b/src/main.py\n...",
"command_hint": "git_diff"
}'
Lifetime Token Savings (account)#
GET /v1/savings
Retrieve your cumulative compression savings across all time. Shows total compressions, tokens processed, tokens saved, and an estimated dollar amount saved based on mid-range model pricing.
Response
{
"total_compressions": number,
"total_tokens_in": number,
"total_tokens_saved": number,
"savings_pct": number, // e.g. 87.2
"estimated_cost_saved_usd": number, // e.g. 12.45
// Pricing basis: Opus 4.7 input rates ($5/MTok), valued at compression-time.
// See /pricing for current rates. No model-specific breakdown is returned here.
}
curl https://api.gotcontext.ai/v1/savings \
-H "Authorization: Bearer gc_your_key_here"
Prompt-Cache Friendliness Score#
POST /v1/audit-cache
Audit how cache-friendly a prompt is for a specific AI provider. Returns a cacheability score, whether the prompt is cache-friendly, actionable recommendations to improve cache hit rates, and estimated savings.
Request body
{
"text": string, // required — prompt or document text to audit (min 1 char)
"provider": string // optional — "anthropic" | "openai" | "google"
// default: "anthropic"
}
Response
{
"provider": string,
"cache_friendly": boolean,
"score": number, // 0.0 - 1.0 cacheability score
"recommendations": [string], // actionable suggestions
"estimated_savings_pct": number // estimated cache hit savings
}
curl -X POST https://api.gotcontext.ai/v1/audit-cache \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "You are a helpful assistant that...",
"provider": "anthropic"
}'
Context-Window Utilization Check#
POST /v1/check-budget
Check how much of a model's context window a text would consume. Returns token estimates, percentage used, a status indicator (OK / WARNING / CRITICAL), and a recommendation on whether to compress.
Request body
{
"text": string, // required — text to check against budget (min 1 char)
"context_window": number, // optional — target context window in tokens
// default: 200000
"model": string // optional — target model for cost estimation
// default: "claude-opus-4"
}
Response
{
"estimated_tokens": number,
"context_window": number,
"pct_used": number, // e.g. 42.5
"status": string, // "OK" | "WARNING" | "CRITICAL"
"recommendation": string // human-readable guidance
}
curl -X POST https://api.gotcontext.ai/v1/check-budget \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "Your long document or codebase...",
"context_window": 200000,
"model": "claude-opus-4"
}'
Output Verbosity Suffix (v1.4.0)#
POST /v1/compress accepts an optional style field taking one of "terse", "normal" (default), or "verbose". When you set style: "terse", the response carries a short system_prompt_suffix string plus a style_suffix_version tag. Inject the suffix into your downstream LLM's system prompt to cap output verbosity.
The suffix is a versioned constant — fast to inject, prompt-cache-friendly, and does not alter the compressed skeleton itself. It reduces output tokens without affecting compression fidelity.
{
"compressed": "Short skeleton of the document…",
"stats": { "original_tokens": 485, "compressed_tokens": 61, "savings_pct": 87.4, ... },
"system_prompt_suffix": "Be concise. No filler, no hedging. State conclusions first. Omit sycophancy and preambles. Fragments are fine for prose; keep code blocks normal.",
"style_suffix_version": "v1"
}
"verbose" is reserved for a future workflow; today it returns a null suffix (same as "normal").
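A minimal sketch of injecting the suffix on the client side, assuming the response has already been parsed into a dict. The helper name build_system_prompt and the blank-line separator are illustrative choices, not part of the API.

```python
def build_system_prompt(base_prompt: str, compress_response: dict) -> str:
    """Append the style suffix to a system prompt when the response carries one."""
    suffix = compress_response.get("system_prompt_suffix")
    if not suffix:
        # Null or absent suffix ("normal" or "verbose"): leave the prompt unchanged.
        return base_prompt
    # Separator is an assumption; any stable separator keeps the prompt cacheable.
    return f"{base_prompt.rstrip()}\n\n{suffix}"


response = {
    "compressed": "Short skeleton of the document…",
    "system_prompt_suffix": "Be concise. No filler, no hedging.",
    "style_suffix_version": "v1",
}
prompt = build_system_prompt("You are a code reviewer.", response)
```

Because the suffix is a constant per style_suffix_version, appending it at the end of the system prompt keeps the prefix stable across requests.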
Secret-Marker Pre-flight Block (v1.4.0)#
POST /v1/compress and POST /v1/compress-code/structural run a pre-flight against the submitted content for unambiguous secret markers. On match, the request is refused with HTTP 400 and a machine-readable error body:
{
"detail": {
"error_code": "sensitive_content",
"marker_class": "aws_access_key",
"message": "Input appears to contain sensitive content; refusing to compress. Remove the secret value and retry."
}
}
Detected marker classes:
- pem_private_key (PEM RSA/EC/DSA/OpenSSH/PGP/ENCRYPTED private-key headers)
- aws_access_key (AKIA + 16-char suffix)
- openai_api_key (sk- or sk-proj- with ≥20-char suffix)
- ssh_key_path (.ssh/id_rsa|ed25519|ecdsa|dsa fragments)
- dotenv_secret (multi-line KEY=value with known-sensitive names like SECRET_KEY, DATABASE_URL, STRIPE_SECRET_KEY, POLAR_ACCESS_TOKEN, etc.)
The matched value is never echoed. Error responses carry only the marker_class; the structured log line (sensitive-content-refuse user_id=… marker_class=…) likewise carries no content. Safe to log.
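One way a client might recognise this refusal, sketched under the assumption that it receives the status code and raw body as shown above. The function name sensitive_marker is hypothetical.

```python
import json
from typing import Optional


def sensitive_marker(status_code: int, body: str) -> Optional[str]:
    """Return the marker_class for a sensitive-content refusal, else None."""
    if status_code != 400:
        return None
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        return None
    # Other 400 errors carry a plain string in "detail"; only the refusal is a dict.
    if isinstance(detail, dict) and detail.get("error_code") == "sensitive_content":
        return detail.get("marker_class")
    return None


refusal = json.dumps({
    "detail": {
        "error_code": "sensitive_content",
        "marker_class": "aws_access_key",
        "message": "Input appears to contain sensitive content; refusing to compress.",
    }
})
marker = sensitive_marker(400, refusal)
```

A caller can surface marker to the user and skip retries, since the same input will be refused again until the secret is removed.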
Structural-Loss Advisory Header (code_blocks, headings, urls) (v1.4.0)#
Every POST /v1/compress response now counts the fenced code blocks, markdown headings, and URLs present in the input and compares to the compressed output. If any class has fewer occurrences in the output, an advisory header is attached:
HTTP/1.1 200 OK
X-Fidelity-Warning: code_blocks,urls
Content-Type: application/json
…
This is advisory — the request never fails on structural loss. Dashboards can alert on high per-tenant rates; auditors can verify "did I lose structure?" without a separate call. Possible values are a comma-separated subset of code_blocks, headings, urls. Absent means all three classes were preserved.
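A small sketch of reading the advisory header, assuming response headers are available as a plain dict. The helper name lost_structure is illustrative; HTTP header names are matched case-insensitively.

```python
KNOWN_CLASSES = {"code_blocks", "headings", "urls"}


def lost_structure(headers: dict) -> set:
    """Return the structural classes reported as lost; empty set if none."""
    raw = ""
    for name, value in headers.items():
        if name.lower() == "x-fidelity-warning":
            raw = value
    parts = {part.strip() for part in raw.split(",") if part.strip()}
    # Ignore any value outside the three documented classes.
    return parts & KNOWN_CLASSES


lost = lost_structure({
    "X-Fidelity-Warning": "code_blocks,urls",
    "Content-Type": "application/json",
})
```

An empty set means the header was absent, which per the text above means all three classes were preserved.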
Per-Account Semantic-Cache Similarity Threshold (v1.4.0)#
The semantic-cache uses cosine similarity to match near-duplicate requests. Research (Portkey + Tianpan, April 2026) shows correct-hit and incorrect-hit similarity distributions overlap between ~0.85 and ~0.92, so a single global threshold is wrong for every non-median workload. Two endpoints let each tenant tune their own cutoff.
GET /v1/settings/semantic-cache-threshold
Read your current cosine-similarity cutoff. source=user means you've set an override; source=global means the server-wide default applies.
Request body
(no body)
Response
{
"threshold": 0.95,
"source": "global" // "user" | "global"
}
curl https://api.gotcontext.ai/v1/settings/semantic-cache-threshold \
-H "Authorization: Bearer gc_your_key_here"
PUT /v1/settings/semantic-cache-threshold
Set or clear your per-tenant cutoff. Pass threshold: null to reset to the server default. Otherwise, pass a cosine similarity in [0.80, 0.99].
Request body
{ "threshold": 0.92 }
Response
{
"threshold": 0.92,
"source": "user"
}
curl -X PUT https://api.gotcontext.ai/v1/settings/semantic-cache-threshold \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{"threshold": 0.92}'
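A sketch of building the request body with a client-side range check, so out-of-range values fail before the request is sent. The function name threshold_payload is hypothetical; the bounds come from the endpoint description above.

```python
import json
from typing import Optional


def threshold_payload(threshold: Optional[float]) -> str:
    """Serialise a PUT body; None clears the override (sent as JSON null)."""
    if threshold is not None and not (0.80 <= threshold <= 0.99):
        raise ValueError("threshold must be null or within [0.80, 0.99]")
    return json.dumps({"threshold": threshold})


set_body = threshold_payload(0.92)
reset_body = threshold_payload(None)
```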
Cache-Hit Source Breakdown (exact vs semantic) (v1.4.0)#
GET /v1/usage/by-cache responses now include a by_source object that splits the cache hits by which mechanism matched: the request-hash fastpath (exact) vs. the embedding-distance fallback (semantic). The invariant exact_hits + semantic_hits == semantic_cache.hits holds across the response window.
{
"period_days": 30,
"semantic_cache": { "hits": 15, "misses": 35, "hit_rate": 0.30, … },
"by_source": {
"exact_hits": 12,
"semantic_hits": 3,
"misses": 35
},
…
}
Dashboards that rendered only the combined hits counter previously hid which half of the cache was doing the work. The breakdown lets operators see whether a workload is benefiting from fingerprinting (exact) or from embedding-based near-duplicate matching (semantic), and tune the per-tenant threshold (above) accordingly.
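The invariant and the split can be checked with a few lines, sketched here against the example response above. The function name cache_source_share is illustrative.

```python
def cache_source_share(usage: dict) -> dict:
    """Return the fraction of cache hits served by each mechanism."""
    by_source = usage["by_source"]
    exact = by_source["exact_hits"]
    semantic = by_source["semantic_hits"]
    total = usage["semantic_cache"]["hits"]
    if exact + semantic != total:
        raise ValueError("exact_hits + semantic_hits must equal semantic_cache.hits")
    if total == 0:
        return {"exact": 0.0, "semantic": 0.0}
    return {"exact": exact / total, "semantic": semantic / total}


usage = {
    "period_days": 30,
    "semantic_cache": {"hits": 15, "misses": 35, "hit_rate": 0.30},
    "by_source": {"exact_hits": 12, "semantic_hits": 3, "misses": 35},
}
share = cache_source_share(usage)
```

For the example figures, exact matches account for 12 of 15 hits, so lowering the similarity threshold would affect only the remaining 3.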
Enterprise & Security (Enterprise)#
gotcontext.ai Enterprise adds operational controls for teams that need security, compliance, and deployment flexibility. All Pro and Team features are included.
SAML / OIDC SSO
Connect Okta, Microsoft Entra, Google Workspace, or any SAML 2.0 / OIDC provider. Users authenticate through your IdP; provisioning and deprovisioning happen automatically via SCIM-style webhooks. SSO setup guide →
Self-Hosted Docker Image
Run the full MCP gateway + compression engine inside your VPC. The Docker image ships with the gotcontext Claude Code plugin pre-bundled. Activate with an Ed25519 license key — no phone-home required after activation. Contact us for the image registry token.
Audit-Log Export
Every API call, key creation, team membership change, and project event is written to an immutable append-only log. Export to your SIEM via webhook or pull via GET /v1/audit-log. Retention: 90 days by default; configurable up to 2 years.
Dedicated SLA & Support
99.9% uptime SLA with financial credits. Named customer success manager, private Slack channel, and custom onboarding. Custom contract and invoicing available (PO, NET-30, annual).
Role-Based Access Control
Four roles: Owner, Admin, Operator, Viewer. Scoped per project. API keys can be pinned to specific roles and projects. Available on Team and Enterprise plans. Roles & Permissions →
Data & Compliance
Documents submitted via the compression API are processed in-memory and not persisted beyond the request lifetime (unless you explicitly use Knowledge Hub). SOC 2 Type II audit in progress. Data-processing agreement (DPA) available on request.
All enterprise features require an Enterprise plan. See pricing or contact us at team@gotcontext.ai for a custom quote.
Compression Quality Check (hallucinations & blind spots)#
POST /v1/detect-issues
Detect hallucinations and blind spots in compressed output by comparing it against the original text. Finds claims not supported by the source (hallucinations) and critical information that was lost (blind spots). Requires a Pro or Enterprise plan.
Request body
{
"original_text": string, // required — original uncompressed text (min 1 char)
"compressed_text": string, // required — compressed output to check (min 1 char)
"check_hallucination": boolean, // optional — check for hallucinated content
// default: true
"check_blind_spots": boolean // optional — check for lost critical info
// default: true
}
Response
{
"issues_found": number,
"issues": [
{
"type": string, // "hallucination" or "blind_spot"
"severity": string, // "low", "medium", or "high"
"description": string,
"location": string|null
}
],
"quality_score": number // 0.0 - 1.0 (1.0 = no issues found)
}
curl -X POST https://api.gotcontext.ai/v1/detect-issues \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"original_text": "The full original document...",
"compressed_text": "The compressed version...",
"check_hallucination": true,
"check_blind_spots": true
}'
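A possible way to turn the report into a pass/fail gate in a pipeline. The 0.9 cutoff and the rule that any high-severity issue fails the check are assumptions for illustration, not documented behaviour.

```python
def passes_quality_gate(report: dict, min_score: float = 0.9) -> bool:
    """Accept a compression only if the score is high and no issue is severe."""
    has_high = any(issue["severity"] == "high" for issue in report["issues"])
    return report["quality_score"] >= min_score and not has_high


report = {
    "issues_found": 1,
    "issues": [
        {
            "type": "blind_spot",
            "severity": "high",
            "description": "Deadline omitted from the compressed text.",
            "location": None,
        }
    ],
    "quality_score": 0.95,
}
accepted = passes_quality_gate(report)
```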
Error Codes#
All errors return JSON with a detail field describing the problem.
// Example error response
{
"detail": "Invalid fidelity 'garbage'. Valid: ['abstract', 'outline', 'balanced', 'detailed', 'raw']"
}
- 400 Bad Request: Invalid parameter value (e.g. invalid fidelity, unknown plan, already-revoked key).
- 401 Unauthorized: Missing, expired, or invalid Bearer token.
- 404 Not Found: Resource not found (e.g. unknown key_id).
- 422 Unprocessable Entity: Pydantic validation failed (missing required field or wrong type).
- 429 Too Many Requests: Rate limit exceeded. Check the Retry-After header.
- 500 Internal Server Error: Unexpected server error. Retry with exponential back-off.
- 503 Service Unavailable: Dependency unavailable (Redis, Postgres, or billing service).
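The retry guidance above could be sketched as a delay calculator. Treating 503 as retryable, reading Retry-After as a number of seconds, and the base and cap values are all assumptions; the helper name retry_delay is hypothetical.

```python
from typing import Optional

RETRYABLE = {429, 500, 503}  # 503 is an assumption; the text only names 429 and 500


def retry_delay(status: int, headers: dict, attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is final."""
    if status not in RETRYABLE:
        return None
    if status == 429:
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                return max(0.0, float(retry_after))
            except ValueError:
                pass  # HTTP-date form is not handled here; fall back to back-off
    # Exponential back-off: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * 2 ** attempt)
```

For example, retry_delay(429, {"Retry-After": "7"}, 0) waits 7 seconds, while a 500 on the fourth attempt (attempt=3) waits 8 seconds. Production code would usually add random jitter to the back-off branch.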
Rate Limits#
See GET /v1/usage for your current consumption. When you hit the rate limit, the API responds with HTTP 429 and a Retry-After header.
Projects#
Organize compression workloads into projects. Each project tracks its own usage stats, making it easy to attribute token savings across teams or applications.
POST /v1/projects
Create a compression project.
Request body
{
"name": string, // required — project name (1-100 chars)
"description": string|null // optional — project description
}
Response
{
"id": string,
"name": string,
"description": string|null,
"created_at": string,
"stats": { "compressions": 0, "tokens_saved": 0 }
}
curl -X POST https://api.gotcontext.ai/v1/projects \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{"name": "backend-docs", "description": "API documentation compression"}'
GET /v1/projects
List all projects for the authenticated user.
Response
{
"projects": [
{
"id": string,
"name": string,
"description": string|null,
"created_at": string,
"stats": {
"compressions": number,
"tokens_saved": number
}
}
]
}
curl https://api.gotcontext.ai/v1/projects \
-H "Authorization: Bearer gc_your_key_here"
GET /v1/projects/{id}
Get project detail with usage statistics.
Response
{
"id": string,
"name": string,
"description": string|null,
"created_at": string,
"updated_at": string,
"stats": {
"compressions": number,
"tokens_saved": number,
"avg_savings_pct": number
}
}
curl https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
-H "Authorization: Bearer gc_your_key_here"
PUT /v1/projects/{id}
Update a project's name or description.
Request body
{
"name": string|null, // optional — new name
"description": string|null // optional — new description
}
Response
{
"id": string,
"name": string,
"description": string|null,
"updated_at": string
}
curl -X PUT https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{"name": "backend-docs-v2"}'
DELETE /v1/projects/{id}
Delete a project. Compression history is retained but unlinked.
Response
{
"success": true,
"id": string
}
curl -X DELETE https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
-H "Authorization: Bearer gc_your_key_here"
Batch Queue#
Submit large compression jobs asynchronously. The batch queue processes documents in the background and returns results when complete — ideal for bulk ingestion pipelines.
POST /v1/batch-queue
Submit an async batch compression job. Returns 202 Accepted with a job ID for polling.
Request body
{
"documents": [ // required — 1 to 500 items
{
"text": string, // required
"fidelity": string, // optional, default "balanced"
"query": string|null // optional
}
],
"project_id": string|null, // optional — associate with a project
"webhook_url": string|null // optional — POST results on completion
}
Response
{
"job_id": string,
"status": "queued",
"documents_count": number,
"created_at": string
}
curl -X POST https://api.gotcontext.ai/v1/batch-queue \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"documents": [
{"text": "First document..."},
{"text": "Second document...", "fidelity": "outline"}
]
}'
GET /v1/batch-queue
List batch jobs for the authenticated user.
Response
{
"jobs": [
{
"job_id": string,
"status": "queued" | "processing" | "completed" | "failed",
"documents_count": number,
"completed_count": number,
"created_at": string,
"completed_at": string|null
}
]
}
curl https://api.gotcontext.ai/v1/batch-queue \
-H "Authorization: Bearer gc_your_key_here"
GET /v1/batch-queue/{id}
Get job status and progress.
Response
{
"job_id": string,
"status": "queued" | "processing" | "completed" | "failed",
"documents_count": number,
"completed_count": number,
"failed_count": number,
"created_at": string,
"completed_at": string|null,
"progress_pct": number // 0.0 - 100.0
}
curl https://api.gotcontext.ai/v1/batch-queue/YOUR_JOB_ID \
-H "Authorization: Bearer gc_your_key_here"
GET /v1/batch-queue/{id}/results
Retrieve completed batch results. Only available when status is 'completed'.
Response
{
"job_id": string,
"results": [
{
"compressed": string,
"original_tokens": number,
"compressed_tokens": number,
"savings_pct": number,
"error": string|null
}
],
"summary": {
"total_documents": number,
"successful": number,
"failed": number,
"total_tokens_saved": number,
"avg_savings_pct": number
}
}
curl https://api.gotcontext.ai/v1/batch-queue/YOUR_JOB_ID/results \
-H "Authorization: Bearer gc_your_key_here"
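Because a single job accepts 1 to 500 documents, larger corpora need to be split across several submissions. A minimal sketch of that split, assuming the inputs are plain strings; the helper name batch_payloads is illustrative.

```python
def batch_payloads(texts: list, fidelity: str = "balanced",
                   max_items: int = 500) -> list:
    """Split texts into request bodies of at most max_items documents each."""
    payloads = []
    for start in range(0, len(texts), max_items):
        chunk = texts[start:start + max_items]
        documents = [{"text": text, "fidelity": fidelity} for text in chunk]
        payloads.append({"documents": documents})
    return payloads


payloads = batch_payloads([f"document {i}" for i in range(1201)])
sizes = [len(p["documents"]) for p in payloads]
```

Each payload can then be posted as its own job and tracked by its job_id.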
Analytics#
Detailed analytics for compression usage across projects. View per-project breakdowns, track trends over time, and export data for reporting.
GET /v1/analytics/summary
Per-project usage breakdown for the current billing period.
Response
{
"period": string, // "YYYY-MM"
"total_compressions": number,
"total_tokens_saved": number,
"projects": [
{
"project_id": string,
"project_name": string,
"compressions": number,
"tokens_saved": number,
"avg_savings_pct": number
}
]
}
curl https://api.gotcontext.ai/v1/analytics/summary \
-H "Authorization: Bearer gc_your_key_here"
GET /v1/analytics/trends
Daily or weekly compression trends. Use query parameters to control the window.
Response
{
"granularity": "daily" | "weekly",
"data": [
{
"date": string, // "YYYY-MM-DD"
"compressions": number,
"tokens_saved": number,
"avg_savings_pct": number
}
]
}
curl "https://api.gotcontext.ai/v1/analytics/trends?granularity=daily&days=30" \
-H "Authorization: Bearer gc_your_key_here"
GET /v1/analytics/export
Export analytics data as CSV for the specified date range.
Response
Content-Type: text/csv
date,project,compressions,tokens_in,tokens_saved,savings_pct
2026-04-01,backend-docs,142,284000,248000,87.3
2026-04-01,frontend-app,89,178000,151300,85.0
...
curl "https://api.gotcontext.ai/v1/analytics/export?start=2026-04-01&end=2026-04-14" \
-H "Authorization: Bearer gc_your_key_here" \
-o analytics.csv
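Once exported, the CSV can be aggregated with the standard library. This sketch sums tokens_saved per project using the two sample rows shown above; the function name tokens_saved_by_project is illustrative.

```python
import csv
import io


def tokens_saved_by_project(csv_text: str) -> dict:
    """Sum the tokens_saved column per project across all dates."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        project = row["project"]
        totals[project] = totals.get(project, 0) + int(row["tokens_saved"])
    return totals


sample = """date,project,compressions,tokens_in,tokens_saved,savings_pct
2026-04-01,backend-docs,142,284000,248000,87.3
2026-04-01,frontend-app,89,178000,151300,85.0
"""
totals = tokens_saved_by_project(sample)
```

To process the downloaded file instead, read analytics.csv into a string and pass it to the same function.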
Command Palette#
Press Cmd+K (or Ctrl+K on Windows/Linux) to open the command palette. Navigate anywhere instantly.
Keyboard shortcuts
- G+D: Dashboard
- G+B: Billing
- G+P: Projects
- G+Q: Queue
- G+S: Settings
Press ? anywhere in the dashboard for the full shortcut reference.
Recent Compressions (last 10)#
Your dashboard overview shows your last 10 compressions with token counts, compression ratios, fidelity levels, and timestamps. Track your usage at a glance.
Theme#
Switch between Dark, Light, or System theme in Settings > General.
GitHub Webhooks#
Connect your GitHub repository to auto-compress documentation and code on push events. When a PR is opened, gotcontext compresses the diff and posts a comment with token savings. Configure in Settings > Integrations.
Setup
Enter your GitHub Personal Access Token, webhook secret, and repo URL in the Integrations settings tab.
Webhook events
push — triggers file compression on new commits.
pull_request — triggers diff compression + a PR comment with token savings.
MCP Tool Compression#
Compress MCP tool descriptions to reduce token usage by 50–80%. Two tools are available:
compress_mcp_registry
Batch compress all tool descriptions from one or more MCP servers.
proxy_mcp_server
Proxy any MCP call through compression — transparently reduces tool description tokens for downstream consumers.
Real-Time Streaming#
Monitor batch compression jobs in real-time via Server-Sent Events. The Queue page has two views: List (table of jobs) and Monitor (live streaming cards with progress).
GET /v1/batch-queue/stream: subscribe to live job status updates.
Batch Job Lifecycle (queued → processing → completed / failed)#
Track active, queued, and failed jobs. Failed jobs show error messages with a retry button.
Queue Summary Metrics#
See aggregate metrics at the top of the Queue page — active jobs, queued jobs, failures, and average duration.
Roles & Permissions (Owner, Admin, Operator, Viewer)#
Four permission levels for team collaboration: Owner, Admin, Operator, and Viewer.
SSO#
Enterprise plans support SAML/OIDC single sign-on via Clerk. Configure in Settings > Security & SSO.
Recent Changes — see Changelog#
See the changelog for the full release history.
Fidelity Profiles#
Save named compression presets so repeat workflows fire one slug instead of three knobs. Each profile stores a fidelity level, chunk size, and skeleton ratio; pass profile="my-name" on any compress call instead of the raw parameters.
Five built-in fidelity tiers: abstract (most compressed) · outline · balanced (default) · detailed · raw. Manage profiles at /dashboard/profiles.
Webhooks#
Outbound webhooks deliver signed JSON payloads to your endpoint when compression events fire. Currently supported events: compression.completed.
Each delivery includes an X-GotContext-Signature HMAC-SHA256 header keyed off the secret returned at create time. Failed deliveries auto-retry with exponential backoff (3 attempts over ~10 min). Manage at /dashboard/webhooks or via POST /v1/webhooks.
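A sketch of verifying a delivery on the receiving side. The text above confirms HMAC-SHA256 and the header name, but the encoding is not specified; this example assumes the header carries a lowercase hex digest of the raw request body with no prefix. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-GotContext-Signature value against the raw request body."""
    # Assumption: signature = hex(HMAC-SHA256(secret, raw_body)).
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value.strip().lower()
    # compare_digest runs in constant time to avoid leaking the match position.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


secret = "whsec_example"  # placeholder, not a real secret format
body = b'{"event": "compression.completed"}'
signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
is_valid = verify_signature(secret, body, signature)
```

Verification must run on the exact bytes received, before any JSON parsing or re-serialisation, since re-encoding can change whitespace and break the digest.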
Email Deliverability (v1.28.0)#
gotcontext sends transactional emails for account events: welcome, team invites, API key expiry warnings, per-project budget alerts, and usage digests. v1.28.0 adds automatic suppression when a delivery fails permanently.
When Resend reports a hard bounce or spam complaint against your address, POST /webhooks/resend receives the event (Svix-verified), sets users.email_opt_out = true for that address, and stops all subsequent sends. This prevents repeated delivery to an address that has rejected mail, which protects sender reputation for every account on the platform.
What triggers suppression:
- Hard bounce: the destination mail server permanently rejected the address (user does not exist, domain does not accept mail). Resend event type: email.bounced with bounce_type: permanent.
- Spam complaint: the recipient marked the email as spam. Resend event type: email.complained.
Soft bounces (temporary failures, full inbox) do not trigger suppression.
Re-enabling notifications: if your address was suppressed in error (mis-delivered bounce report, overly aggressive spam filter), email support@gotcontext.ai to clear the opt-out flag. There is no self-serve toggle in the dashboard yet — tracked for a future settings release.
Manual opt-out is available at GET /v1/unsubscribe?token=<signed-token> — every transactional email includes a signed unsubscribe link in its footer. Visiting it sets email_opt_out = true for that user without requiring authentication.
Integrations#
GitHub integration: configure a repository webhook pointing at https://api.gotcontext.ai/v1/integrations/github/webhook with the secret from Settings → Integrations. Push events trigger automatic compression of the changed files so your CI assistant inherits a smaller context window.
Verify with HMAC-SHA256 against the X-Hub-Signature-256 header. Plain-text webhooks and unsigned events are rejected.
Semantic Cache#
Beyond compression, we operate a per-account semantic cache: an embedding-similarity index of the last 100 baseline calls. When a new prompt is close enough to a cached one, we return the prior compressed result instead of re-running the pipeline. Cache hits are an additional reduction and are not metered against your compression quota.
The cache warms up over the first ~100 baseline calls. Typical hit rates after week 1 land in the 15–25% range. The per-tenant similarity threshold is tunable via PUT /v1/settings/semantic-cache-threshold (Team and Enterprise). Hit telemetry shows up at Billing → Cache-Adjusted Savings.
gc_lookup (v1.23.1)#
Look up framework documentation across 9 indexed frameworks. Available on all plans. A gc_ API key is required for MCP auth, but lookups don't count against your compression quota.
Indexed frameworks: Drizzle ORM, FastAPI, FastMCP, LangChain, Next.js, Pydantic, React, SQLAlchemy, Tailwind CSS. See /context for the full list and per-framework slug reference.
Tool schema
{
"name": "gc_lookup",
"description": "Look up framework docs across 9 indexed frameworks. Free for all plans.",
"inputSchema": {
"type": "object",
"properties": {
"query_text": {
"type": "string",
"description": "Natural-language question or search phrase."
},
"slug": {
"type": "string",
"description": "Optional. Scope to one framework. Omit to search all.",
"enum": ["drizzle","fastapi","fastmcp","langchain","nextjs","pydantic","react","sqlalchemy","tailwind"]
}
},
"required": ["query_text"]
}
}
Example call
gc_lookup(
query_text="how to use server actions",
slug="nextjs"
)
Project Knowledge Base (per-project documents) (v1.23.1)#
Store, version, and query your own documents inside a project. Items are isolated per project — composite key (item_id, project_id). Ingest via text, URL fetch, or file upload (PDF, TXT, MD — 5 MB max). Query with natural language via gc_kb_query or pull structured diffs with gc_kb_diff.
MCP tools
gc_kb_ingest(item_id, content, mode) // mode: "text" | "url" | "file"
gc_kb_query(query, project_id?) // semantic search across items
gc_kb_list(project_id?) // list all items in a project
gc_kb_get(item_id, project_id?) // fetch one item
gc_kb_edit(item_id, content, project_id?)
gc_kb_diff(item_id, project_id?) // structured diff against previous version
gc_kb_delete(item_id, project_id?)
File upload
From the dashboard at /dashboard/knowledge: drag-drop or click to select a file (PDF, TXT, MD, max 5 MB). Upload progress is tracked in-page; ingestion status polls every 2 seconds with a 5-minute timeout. Files larger than 5 MB must be split before upload. Supported ingest modes: text (paste), url (fetch by URL), and file (binary upload).
Public, Unauthenticated Pages (v1.23.1)#
Four read-only pages require no authentication.
- /news: curated AI-context-engineering news feed (28 items across 7 categories).
- /context: framework documentation index for gc_lookup; lists all 9 indexed frameworks with slugs and version tags.
- /benchmarks/compression: compression cost and quality comparison across 13 frontier LLMs; quality scores update as live benchmark runs complete.
- /compare: side-by-side comparison vs LLMLingua, Langfuse, Cohere Compact, Voyage, and NotebookLM; no login required.
https://api.gotcontext.ai/.well-known/agent.json.