Connect via MCP (recommended)
The MCP server at https://api.gotcontext.ai/mcp gives your AI agent 148 compression, ingestion, and context-management tools without running anything locally. Steps 1, 4, and 5 get you connected and seeing compression in about two minutes. Steps 2 and 3 (a dedicated project) are optional: add them whenever you want per-project budgets and usage attribution. Until then, usage rolls up to your auto-created Default project.
Setup (5 steps)
1. Get a free API key
Sign in and create a gc_-prefixed key from your dashboard. The free tier includes 17 tools and 1,000 compressions per month, enough to validate the workflow before upgrading.
2. (Optional) Mint a dedicated project for this workspace
Without a project, all traffic attributes to your Default rollup alongside test fixtures and unrelated sessions, making per-project budgets and usage stats meaningless. Create a project from the Projects page or call the MCP tool directly from inside Claude Code:
create_project(name="my-repo", description="Compression project for my-repo")
# Returns: { project_id: "abc123", name: "my-repo" }
3. (Optional) Bind your key to that project
Go to Settings → API Keys and use the inline rebinder to assign the key to your new project. The plan cache has a 5-minute TTL, so allow up to 5 minutes before per-project compression counts begin incrementing on the new project.
4. Configure the MCP server
Run one command. It prompts for your gc_ key, or pass it via --key gc_... or the GOTCONTEXT_API_KEY env var. Restart your CLI afterwards.
npx gotcontext wrap claude
npx gotcontext wrap codex
npx gotcontext wrap gemini
Run npx gotcontext doctor at any time to see which CLIs are detected and configured.
Or configure manually (paste JSON into your client config)
Choose your client from the snippets below and paste your gc_ key in place of gc_your_key_here. For Claude Code, the key must be available as a shell environment variable. The .mcp.json substitution reads from the shell at session start, not from .env.local. Run export GOTCONTEXT_KEY=gc_... before launching Claude Code.
The snippets default to ?profile=core: 7 essential tools at ~2,000 tokens, so a session starts lean. Swap to ?profile=full (or drop the query parameter) for the complete tool catalog at ~38,000 tokens.
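To make the profile trade-off concrete, here is a minimal sketch that builds the endpoint URL for a profile and estimates how much of a context window the tool catalog would occupy. The helper names and the 200,000-token window are assumptions for illustration; only the URL, the profile names, and the approximate token figures come from the text above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.gotcontext.ai/mcp"

# Approximate tool-catalog sizes quoted above, in tokens.
PROFILE_TOKENS = {"core": 2_000, "full": 38_000}


def mcp_url(profile: str = "core") -> str:
    """Return the MCP endpoint URL with the given profile query parameter."""
    if profile not in PROFILE_TOKENS:
        raise ValueError(f"unknown profile: {profile!r}")
    return f"{BASE_URL}?{urlencode({'profile': profile})}"


def catalog_share(profile: str, context_window: int) -> float:
    """Fraction of a context window consumed by the tool catalog alone."""
    return PROFILE_TOKENS[profile] / context_window


print(mcp_url("core"))  # https://api.gotcontext.ai/mcp?profile=core
# Hypothetical 200,000-token context window, chosen only for illustration.
print(f"{catalog_share('core', 200_000):.1%}")  # 1.0%
print(f"{catalog_share('full', 200_000):.1%}")  # 19.0%
```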
5. Run your first compression: see the savings
Connected. Now ask your agent to compress something verbose: a git diff, a pytest -v run, or a large file. With filter_cli_output you typically see 50 to 60% fewer tokens on real output. Then call project_stats to confirm the usage was attributed to your project, not Default.
filter_cli_output(text="<paste a git diff or pytest -v run>")
# -> compressed text + tokens_saved + savings_pct (typically 50-60% on verbose output)
project_stats()
# -> { project_name: "my-repo", compressions_this_month: 1, ... }
Manual JSON config (alternative to the CLI)
Claude Code
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}
Cursor
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}
VS Code (settings.json)
{
  "mcp": {
    "servers": {
      "gotcontext": {
        "url": "https://api.gotcontext.ai/mcp?profile=core",
        "headers": {
          "Authorization": "Bearer gc_your_key_here"
        }
      }
    }
  }
}
Gemini CLI (settings.json)
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "type": "http",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      },
      "timeout": 30000
    }
  }
}
Authentication
All MCP connections require a gc_-prefixed API key passed in the Authorization header. Create one from your dashboard.
For custom MCP clients
The MCP endpoint uses Streamable HTTP transport. Requests must include Accept: application/json, text/event-stream and carry the Mcp-Session-Id header from the initialize response on all subsequent calls. Claude Code, Cursor, and VS Code handle this automatically.
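For a custom client, the header requirements above can be sketched as follows. This is a rough illustration, not a complete MCP client: the helper names are hypothetical, the Content-Type header is the usual choice for a JSON POST body rather than something stated above, and the protocolVersion string is an assumption that should be checked against the MCP specification revision you target.

```python
import json

ACCEPT = "application/json, text/event-stream"


def mcp_headers(api_key, session_id=None):
    """Headers for a Streamable HTTP request to the MCP endpoint.

    Pass session_id (the Mcp-Session-Id value returned by the initialize
    response) on every call after initialize.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": ACCEPT,
        "Content-Type": "application/json",  # assumption: JSON request body
    }
    if session_id is not None:
        headers["Mcp-Session-Id"] = session_id
    return headers


def initialize_body(client_name, request_id=1):
    """JSON-RPC 2.0 initialize request, serialized as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: verify against the spec
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    })
```

The first request sends mcp_headers(key) with initialize_body(...). Every later request reuses mcp_headers(key, session_id) with the session id taken from the initialize response headers.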
Project instructions file (CLAUDE.md / AGENTS.md)
Add a CLAUDE.md (or AGENTS.md) to your project root so the AI knows when and how to use gotcontext compression. Without this, the AI may not use the tools effectively. Copy this starter:
# gotcontext.ai Compression
This project uses gotcontext.ai for semantic compression via MCP.
## When to compress
- Before sending large files or docs to the AI context window
- When terminal output is verbose (git diff, test results, logs)
- When reviewing code across many files
- Before reviewing a PR or explaining a diff — compress the changed
files or run `gc_blast_radius` to see only transitively-touched code
## Compression workflow
1. Use `ingest_context` to add a document (give it a unique file_id)
2. Use `read_skeleton` to get an adaptive structural skeleton.
Compression adapts to size: small/medium docs stay faithful (most
sections kept, with meaningful savings); large docs compress hard.
Drill into any referenced section with `modulate_region`.
3. For a targeted read, pass `selection_mode="evidence_aware"` + a
`query` (and optional `top_k`) to anchor the relevant sections.
4. Use `search_semantic` to find specific sections by query.
5. Use `filter_cli_output` to compress git diffs, pytest output, etc.
## Code understanding (Pro+)
- `compress_codebase` — AST-aware digest of an entire repo; function
and class signatures only, bodies stripped
- `gc_blast_radius` — ranked context for a focus symbol: tensor-grep
blast-radius + BM25 fusion. Best for PR review and bug triage
- `gc_compress_manifest` — compress an MCP tools/list response so
downstream agents see shorter tool descriptions without losing
inputSchema semantics (v1.8.0+)
- `batch_ingest_documents` — submit up to 50 docs as one async job;
poll status via `GET /v1/batch-queue/{id}`
## Tips
- Use `estimate_tokens` first to see if compression is worthwhile
- For code files, the compressor understands function/class boundaries
- Use `get_compression_presets` to see available fidelity levels
- Call `tool_help` for documentation on any specific tool
When to compress
The recommended per-call decision loop for any file or output you are about to pass to the model:
1. Check whether compression is worthwhile
For any file or output larger than ~1,500 tokens (roughly 6,000 bytes), call estimate_tokens first. If the result is below that threshold, send as-is. The compression overhead is not worth it.
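The threshold above implies roughly 4 bytes per token (6,000 bytes for ~1,500 tokens). A minimal local pre-check based on that ratio might look like the sketch below. It is a cheap heuristic for deciding whether estimate_tokens is worth calling at all, not a substitute for it, and the function names are hypothetical.

```python
TOKEN_THRESHOLD = 1_500
BYTES_PER_TOKEN = 4  # implied by ~1,500 tokens being roughly 6,000 bytes


def rough_token_count(text: str) -> int:
    """Very rough token estimate from the UTF-8 byte length."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN


def worth_estimating(text: str) -> bool:
    """True when the text is large enough that estimate_tokens is worth calling."""
    return rough_token_count(text) > TOKEN_THRESHOLD


print(worth_estimating("short note"))   # False
print(worth_estimating("x" * 20_000))   # True
```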
2. Call gc_pre_flight first — it routes you to the right tool
gc_pre_flight is the recommended entry point before any gotcontext operation. It returns a verdict (one of four actions) and a mode field that tells you which gotcontext tool to reach for next: scout (use read_skeleton / search_semantic), compress (ingest + read_skeleton), read or write (KB operations), or idle (nothing needed).
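One way to act on the verdict and mode fields is a small dispatch table. The sketch below assumes the tool result has already been parsed into a dict with "verdict" and "mode" keys, which is an assumption about the response shape. It also assumes that "ingest" refers to ingest_context.

```python
# Mode -> tools to reach for next, following the description above.
NEXT_TOOLS = {
    "scout": ["read_skeleton", "search_semantic"],
    "compress": ["ingest_context", "read_skeleton"],  # assumes ingest == ingest_context
    "read": ["gc_kb_get", "gc_kb_query"],
    "write": ["gc_kb_ingest", "gc_kb_edit"],
    "idle": [],
}

VERDICTS = {"send_as_is", "send_compressed", "warn_context_limit", "clear_first"}


def next_tools(result: dict) -> list[str]:
    """Return the tools suggested by a parsed gc_pre_flight result."""
    verdict = result["verdict"]
    mode = result["mode"]
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if mode not in NEXT_TOOLS:
        raise ValueError(f"unexpected mode: {mode!r}")
    return NEXT_TOOLS[mode]


print(next_tools({"verdict": "send_compressed", "mode": "compress"}))
# ['ingest_context', 'read_skeleton']
```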
gc_pre_flight()
# verdict — what to do:
# send_as_is — context is small; no action needed
# send_compressed — ingest + read_skeleton before sending
# warn_context_limit — approaching limit; compress or summarize
# clear_first — context is saturated; clear before proceeding
# mode — which tool to reach for next:
# scout → read_skeleton / search_semantic
# compress → ingest + read_skeleton
# read → gc_kb_get / gc_kb_query
# write → gc_kb_ingest / gc_kb_edit
# idle → nothing needed
3. Compress if recommended
If the verdict is send_compressed:
ingest_context(file_id="my-doc", content="...")
read_skeleton(doc_id="...")
# Use the adaptive skeleton in your prompt instead of the raw content.
# Drill into any referenced section with modulate_region, or pass a
# selection_mode="evidence_aware" + query for a targeted read.
For verbose CLI output
Pipe pytest output, fly logs, or git diff through filter_cli_output before passing to the model. Typically 70 to 90% smaller with failure signal preserved.
For code review questions
When asking “what does changing X affect?” or reviewing a diff, call gc_blast_radius with the focus symbol. It returns ranked context (callsites, callers, transitively-touched code) without you reading every file manually.
Common pitfalls
Key bound to the wrong project
Per-project budget alerts fire against the project the key is bound to. If your key is bound to Default (or to a different project), every compression call increments the wrong counter, budget thresholds trigger at the wrong time, and per-project usage charts show nothing. Rebind via Settings → API Keys.
Key with no project binding (project_id NULL)
Legacy keys minted before the per-project update carry a project_id of null and fall back to the user-scoped Default rollup. All traffic appears under Default, polluting that project’s stats. Verify with project_stats(): if project_name returns "Default" but you created a dedicated project, the key needs rebinding.
.mcp.json environment variable not set before launching Claude Code
The .mcp.json substitution reads the shell environment at session start, not from .env.local or any dotenv file. If GOTCONTEXT_KEY is only in .env.local the MCP server will fail to authenticate. Run export GOTCONTEXT_KEY=gc_... in your shell before launching Claude Code, or add it to your shell profile.
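A quick way to catch this before launching Claude Code is to check the variable from the same shell. The sketch below is a hypothetical helper, not part of the gotcontext CLI. It only inspects the process environment, which is what the substitution reads.

```python
import os


def check_key(env=None, var="GOTCONTEXT_KEY"):
    """Return a problem description, or None if the key looks usable."""
    env = os.environ if env is None else env
    key = env.get(var, "")
    if not key:
        return f"{var} is not set in the shell environment"
    if not key.startswith("gc_"):
        return f"{var} is set but does not start with gc_"
    return None


problem = check_key()
print(problem or "GOTCONTEXT_KEY looks fine")
```

Running this from the terminal you are about to launch Claude Code from shows whether the export actually reached that shell.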
Per-project counts not incrementing after rebind
The plan cache has a 5-minute TTL (Upstash). After rebinding a key to a new project, allow up to 5 minutes before project_stats() reflects the new attribution. Counts already attributed to the old project do not retroactively move.
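If you script the rebind check, a polling loop bounded by the cache TTL avoids declaring failure too early. This is a sketch under stated assumptions: fetch_stats is a hypothetical callable you supply that returns the parsed project_stats() result as a dict with a "project_name" key, and the 30-second margin and 15-second interval are arbitrary.

```python
import time

CACHE_TTL_SECONDS = 5 * 60  # plan cache TTL described above


def wait_for_attribution(fetch_stats, expected_project,
                         timeout=CACHE_TTL_SECONDS + 30, interval=15,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until stats report the expected project or the timeout passes.

    fetch_stats is a caller-supplied function (hypothetical here) that
    returns a dict containing "project_name".
    """
    deadline = clock() + timeout
    while True:
        stats = fetch_stats()
        if stats.get("project_name") == expected_project:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.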
Hitting something not covered here? The full Troubleshooting guide walks through missing tools, 401s, the 421 Invalid Host error, plan gates, rate limits, and self-hosted gotchas, each with the exact fix.
5-Minute Tutorial
Once your MCP client is connected, run this four-step workflow to see gotcontext.ai in action. Each step is a single MCP tool call. Tell your agent to call the tool.
Step 1: Ingest a document
Tell your agent to call ingest_context with a file_id and the document text. The tool stores a compressed index and returns a doc_id.
ingest_context(
file_id="readme",
content="# My Project\n...",
title="Project README"
)
# Returns: { doc_id: "doc_abc123", tokens_before: 1840, tokens_after: 312 }
Step 2: Read the compressed skeleton
Call read_skeleton with the doc_id from step 1. Compression is adaptive: small and medium documents stay faithful (most sections kept, with meaningful token savings), while large documents compress aggressively for the biggest savings. The skeleton anchors the most important sections and summarises the rest, and compression is reversible: call modulate_region on any summarised node to expand it back to full fidelity on demand. Nothing is discarded; the original is always reachable.
read_skeleton(doc_id="doc_abc123")
# Returns an adaptive structural skeleton — anchored sections (headings,
# key facts, code signatures) plus short summaries for referenced sections.
# Expand any referenced section:
# modulate_region(node_ids=["doc_abc123_n3"], fidelity_level="DETAILED")
For a targeted read, anchor the sections relevant to a question with selection_mode="evidence_aware" plus a query (and an optional top_k):
read_skeleton(
doc_id="doc_abc123",
selection_mode="evidence_aware",
query="how does authentication work",
top_k=5
)
# Force-anchors the sections most relevant to the query
Step 3: Search for a specific section
Use search_semantic to find the most relevant chunks without loading the full document. Useful when your context window is tight.
search_semantic(
query="how does authentication work",
doc_id="doc_abc123",
top_k=3
)
# Returns top-3 semantically matching chunks
Step 4: Compress CLI output on the fly
Pipe verbose terminal output through filter_cli_output before it lands in your agent context. Works with git diff, pytest -v, and build logs.
filter_cli_output(
content=open("pytest_output.txt").read(),
source="pytest"
)
# Returns condensed failure summary, typically 70–90% smaller
What you just did: ingested a document, retrieved its semantic skeleton, searched within it, and compressed CLI output, all through your AI agent with no REST calls and no local setup. Run tool_help(tool_name="ingest_context") for inline docs on any tool, or get_compression_presets() to tune fidelity.
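To relate the tokens_before and tokens_after figures from Step 1 to a savings percentage, the arithmetic can be sketched as below. The function name is hypothetical, and this shows the plain percentage-reduction calculation rather than the service's internal accounting.

```python
def savings(tokens_before: int, tokens_after: int) -> tuple[int, float]:
    """Return (tokens saved, percentage reduction rounded to one decimal)."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    saved = tokens_before - tokens_after
    return saved, round(100 * saved / tokens_before, 1)


# Figures from the Step 1 example response.
print(savings(1840, 312))  # (1528, 83.0)
```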
Built-in prompts & resources
The gateway also serves MCP prompts and resources, so they appear in your client (Claude Code, Cursor, and others) the moment you connect — no extra setup. Prompts are ready-to-run workflows that chain the tools below; resources are read-only context your agent pulls on demand. Both are available on every plan.
7 workflow prompts
compress-large-file, review-pr-diff, understand-codebase, debug-failing-test, find-then-expand, pre-flight, lookup-framework-docs
3 context resources
gotcontext://catalog/tools, gotcontext://docs/quickstart, gotcontext://savings/global
MCP Tool Catalog
The MCP gateway exposes 148 tools in two profiles. Pass ?profile=core to your MCP URL for a lean 7-tool set (fastest tools/list response, recommended for bandwidth-constrained clients), or ?profile=full (default) for all 148. Use tool_help(tool_name="X") at runtime to get the full parameter schema for any tool without leaving your agent session.
Ingest & Read
ingest_context: store + compress a document
read_skeleton: get the compressed outline
batch_ingest_documents: async bulk ingest (up to 50)
ingest_multimodal: PDF, image, audio ingestion
refresh_document: re-ingest when source changes
Search & Retrieve
search_semantic: embedding-based chunk search
search_code: BM25 + AST-aware code search
search_memory: retrieve from agent memory
get_context_block: fetch a specific chunk by id
list_documents: enumerate ingested docs
CLI & Output Filters
filter_cli_output: compress git diff, pytest, logs
compress_codebase: AST-aware repo digest
gc_blast_radius: ranked context for a symbol
gc_compress_manifest: shrink MCP tools/list payload
estimate_tokens: count tokens before compressing
Context & Memory
add_memory: persist a fact across sessions
check_budget: context-window utilization check
adapt_to_context_window: auto-trim to fit model limit
advise_context: recommend compression vs clear
gc_pre_flight: call this first; returns a verdict plus a mode field (scout/compress/read/write/idle) telling you which tool to reach for next
Knowledge Hub
gc_kb_ingest: add a file/URL to your KB
gc_kb_query: semantic search across KB
gc_kb_get: retrieve a KB document
gc_kb_list: list KB items in a project
gc_kb_diff: compare two KB document versions
Free Tier (no API key required)
gc_lookup: fetch live framework docs (Next.js, FastAPI, React…)
tool_help: inline parameter docs for any tool
get_compression_presets: list fidelity levels
check_environment: verify connectivity and plan
estimate_tokens: count tokens (no compression charged)
The full 148-tool list with parameter schemas is available in the OpenAPI spec and via the A2A agent card at /.well-known/agent.json.
REST quickstart
Get your API key from the dashboard, then make your first compression call:
{
"text": "gotcontext.ai is a semantic compression API for large-language-model context windows. It reduces token usage by 80–90% on medium-to-large documents through graph-based PageRank analysis, without losing the meaning that drives accurate model responses.\n\nArchitecture overview\n\nThe core pipeline has four stages:\n\n1. Chunking. The document is split into overlapping windows of 200–400 tokens. Window size is configurable; the default balances granularity against embedding cost.\n\n2. Embedding. Each chunk is encoded into a high-dimensional vector using an ONNX-exported sentence-transformer model (all-MiniLM-L6-v2 by default; Pro/Team/Enterprise tiers use accelerated ONNX with INT8 quantisation at 3–5x throughput). Embeddings run fully in-process — no external embedding API call is made, which keeps latency under 90 ms end-to-end for most documents.\n\n3. Graph construction and PageRank. A similarity graph is built where each chunk is a node and edges are drawn when the cosine similarity exceeds a configurable threshold (default: 0.35). The graph is then scored with a damped PageRank (damping factor 0.85). High-rank chunks are the semantic backbone of the document.\n\n4. Skeleton assembly. Chunks are sorted by PageRank score. The top K chunks — where K is determined by the requested fidelity level — are concatenated in original document order (not score order, which preserves narrative flow). The result is a compressed skeleton.\n\nFidelity levels\n\ngotcontext supports five named fidelity levels:\n\n- abstract: retains ~5% of chunks. Keeps only the highest-PageRank semantic backbone. Use for fast fact-retrieval where reasoning across the full document is not required.\n- outline: retains ~10% of chunks. Preserves top-level structure and key claims. Good for getting a structural overview before diving into sections.\n- balanced (default): retains ~20% of chunks. 
The recommended starting point for most documents — strong compression while keeping enough context for accurate model responses.\n- detailed: retains ~40% of chunks. Recommended for legal, medical, or compliance documents where missing a clause is costly.\n- raw: returns the original document unchanged. Use when you want the token-count and cost-estimate analytics without applying compression.\n\nAPI surface\n\nPOST /v1/compress is the primary endpoint. It accepts a JSON body with:\n\n- text (required): the document string. Maximum size depends on plan: 100 KB free, 1 MB Pro, 5 MB Team, 10 MB Enterprise.\n- fidelity (optional, default \"balanced\"): one of the four levels above.\n- model (optional): the target LLM model name, used only for cost estimation in the response stats. Does not change compression behaviour.\n- output_style (optional, v1.4.0+): \"prose\" | \"bullets\" | \"structured\". Controls the skeleton format. \"prose\" stitches chunks with light connectors; \"bullets\" prefixes each chunk with a dash; \"structured\" emits a JSON object with section labels.\n\nThe response body includes:\n\n- compressed: the compressed skeleton string.\n- stats.original_tokens: token count of the input.\n- stats.compressed_tokens: token count of the skeleton.\n- stats.tokens_saved: the difference.\n- stats.savings_pct: percentage reduction (0–100).\n- stats.estimated_cost_saved_usd: dollar estimate at the model's published input price, or at Opus 4.7 rates ($5/MTok input) when no model is specified.\n\nMCP integration\n\ngotcontext exposes a Streamable-HTTP MCP server at https://api.gotcontext.ai/mcp. This lets Claude Code, Cursor, Windsurf, Gemini CLI, and OpenAI Codex CLI call gotcontext compression directly as a tool — the LLM reads a long document, routes it through gotcontext, and continues reasoning on the compressed skeleton. 
The round-trip latency is below the tool-call overhead in all three clients.\n\nTool plan gating: the core compress tool is available on all plans. gc_blast_radius (structural code analysis via tensor-grep BM25) and gc_compress_manifest (MCP tool-schema compression, new in v1.8.0) are Pro+ tools.\n\nAuthentication\n\nThree auth modes are supported:\n\n- gc_ API key: HMAC-signed key created from the dashboard. Pass as Authorization: Bearer gc_<key>. Rate limits apply per key.\n- Clerk JWT: used by the dashboard and MCP server. The session token issued by Clerk is accepted on every /v1/* route.\n- Polar license (self-hosted): Ed25519-signed license key validated locally by the self-hosted binary. Metering events are batched and reported asynchronously.\n\nPrompt-cache integration\n\nFrom v1.1.0, gotcontext is aware of provider prompt-cache semantics. When a document has been compressed before with identical fidelity and the cached embedding is still valid, the response includes X-Cache-Hit: true and the latency drops to under 10 ms (cache read only, no embedding pass). The /v1/usage/by-cache endpoint breaks down savings into compression-only and cache-adjusted figures, which the dashboard Cache-Adjusted Savings widget visualises.",
"fidelity": "balanced"
}
The same request as curl, against the demo endpoint (substitute the sample text from the request body above):
curl -X POST https://api.gotcontext.ai/v1/demo/compress \
-H 'Content-Type: application/json' \
-d '{"text": "<sample document text from the request body above>", "fidelity": "balanced"}'
With your API key:
curl -X POST https://api.gotcontext.ai/v1/compress \
-H "Authorization: Bearer gc_your_key_here" \
-H "Content-Type: application/json" \
-d '{"text": "Your document text here...", "fidelity": "balanced"}'
Authentication
All API requests require a Bearer token in the Authorization header. Two token types are supported:
API Keys (recommended)
Prefixed with gc_. Create keys in the dashboard or via POST /v1/keys. Keys are permanent until revoked and can be rotated at any time.
Authorization: Bearer gc_a1b2c3d4e5f6...
Clerk JWT (session tokens)
Short-lived tokens issued by Clerk after sign-in. Used automatically by the dashboard frontend. For programmatic access, API keys are preferred.
Authorization: Bearer eyJhbGciOi...
SDKs & Plugins
Pre-built clients wrap the REST API so you don't need raw fetch() calls. All clients return the same response shape as the REST API.
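For comparison, the raw request these clients wrap can be sketched with only the Python standard library. The endpoint, headers, and body fields follow the REST quickstart. The function names are hypothetical, error handling is omitted, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.gotcontext.ai/v1/compress"


def build_compress_request(api_key, text, fidelity="balanced"):
    """Build (but do not send) the POST /v1/compress request."""
    body = json.dumps({"text": text, "fidelity": fidelity}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def compress(api_key, text, fidelity="balanced"):
    """Send the request and return the parsed JSON response."""
    req = build_compress_request(api_key, text, fidelity)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling compress("gc_your_key_here", "...") performs the network request. Splitting out build_compress_request keeps the request construction inspectable without touching the network.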
TypeScript / JavaScript
@gotcontext/sdk: published to npm. Zero runtime dependencies.
npm install @gotcontext/sdk
import { GotContextClient } from "@gotcontext/sdk";
const gc = new GotContextClient({ apiKey: "gc_your_key_here" });
const { compressed, stats } = await gc.compress({ text: "...", fidelity: "balanced" });
Python
gotcontext: published to PyPI.
pip install gotcontext
from gotcontext import GotContext
gc = GotContext(api_key="gc_your_key_here")
result = gc.compress(text="...", fidelity="balanced")
print(result.stats.savings_pct)
Claude Code Plugin
One command installs the gotcontext plugin: pre-wired MCP config plus 9 outcome-oriented skills (shrink-for-claude, ingest-docs, review-pr-diff, extract-api-surface, batch-compress, session-summary, pre-flight, compress-mcp-manifest, quick-start).
/plugin marketplace add oimiragieo/gotcontext-plugin
Agent-to-Agent (A2A) Discovery
Agent frameworks can autodiscover all 148 MCP tools from the Linux Foundation Agent2Agent v1.0 card, no human required.
GET https://api.gotcontext.ai/.well-known/agent.json
For machine-readable product metadata and alternatives comparison, see llms.txt, OpenAPI, and /compare.