Measured savings across 11 LLMs, from Claude Opus 4.7 to Gemini Flash.
KB-2026.05 · gc_kb_query v1.0 BETA

Compressed RAG.

5-20× cheaper than NotebookLM.

Knowledge Hub brings semantic compression to your retrieval stack, averaging 64% compression in production (live from /v1/global-savings). Ingest, query, and retrieve at a fraction of the token cost without rebuilding your infrastructure.

64% average compression ratio
7 MCP tools (gc_kb_*)
MCP + REST: both interfaces
v1.0 versioned docs

How it compares

Competitor compression ratios are listed as "—" because we do not have audited numbers and will not fabricate them. Pricing is sourced from each provider's public documentation as of May 2026.

| Feature | gotcontext.ai | NotebookLM | Pinecone | Vectara |
|---|---|---|---|---|
| Pricing model | Free / $49 / $99 / $199 / $499 | $20/mo (Google One AI Premium) | Free starter; $0.096/hr per pod (Standard) | Free; $25/mo base (Growth) |
| Compression ratio | 64% avg in production (live) | — | — | — |
| MCP-native interface | 7 gc_kb_* tools | No MCP API | No MCP API | No MCP API |
| Code-level REST access | /v1/projects/{id}/knowledge/* | No public REST API | REST + gRPC | REST API |
| Audit log | Append-only, DB-enforced | None | None | None |
| Multi-tenant projects | Yes (partition-key isolation) | Google account only | Yes | Yes |
| Self-hosted option | Yes (Docker image + Ed25519 license) | No | No | Yes (enterprise) |

Comparison based on publicly documented features and pricing as of May 2026. Verify current capabilities at each provider's documentation.

What a query looks like

One MCP tool call. Compressed chunks back. Cost delta visible immediately.

TypeScript · MCP
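import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// mcpClient: an MCP Client already connected to the Knowledge Hub
// Streamable HTTP endpoint (connection and auth setup omitted).
declare const mcpClient: Client;

const result = await mcpClient.callTool({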
  name: "gc_kb_query",
  arguments: {
    project_id: "proj_abc123",
    query: "What were the latency SLA decisions?",
    top_k: 10,
  },
});

// result.chunks[0]:
// {
//   raw_text: "SLA targets: p50 <120ms, p99 <800ms...",
//   score: 0.94,
//   tokens: 187
// }
This query
billed tokens: 1,847
saved tokens: 12,360
cost delta: -$0.037

Cost delta priced at $3 per 1M input tokens. Savings scale with document size: longer corpora compress more aggressively.
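The cost delta is plain token arithmetic: tokens × price per 1M tokens ÷ 1,000,000. Here is a minimal sketch that uses this query's figures and the $3/1M input price quoted here. It ignores output tokens, and `tokenCostUsd` is a hypothetical helper.

```typescript
// Hypothetical helper: USD cost of `tokens` input tokens at
// `usdPerMillion` dollars per 1M tokens (output tokens ignored).
function tokenCostUsd(tokens: number, usdPerMillion = 3): number {
  return (tokens * usdPerMillion) / 1_000_000;
}

// Figures from the example query: tokens actually billed vs tokens avoided.
const billedTokens = 1_847;
const savedTokens = 12_360;

const billedUsd = tokenCostUsd(billedTokens); // cost of the compressed context
const savedUsd = tokenCostUsd(savedTokens); // cost avoided by compression
console.log(`billed $${billedUsd.toFixed(4)}, delta -$${savedUsd.toFixed(3)}`);
```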

By the numbers

A representative query against a 40-page technical document, at gotcontext.ai's observed 64% average compression ratio (sourced from live production data at /v1/global-savings).

| | Without compression | With Knowledge Hub |
|---|---|---|
| Approach | Traditional RAG: full raw chunks returned | Compressed retrieval: semantic skeleton per chunk |
| Chunk tokens | 12,400 retrieved | ~4,460 after compression (-64%) |
| Prompt tokens sent to LLM | ~13,800 | ~5,860 |
| Cost at $3/1M tokens | $0.041 | $0.018 |

~58% cost reduction on this query
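At exactly the stated 64% ratio, the comparison works out as simple token arithmetic. This sketch assumes the ~1,400-token gap between chunk tokens (12,400) and prompt tokens (~13,800) is fixed prompt overhead that compression leaves untouched. It writes r for the compression ratio, p for the $3/1M price, T_c for chunk tokens, T_o for overhead, and T_p for prompt tokens:

```latex
\begin{aligned}
T_c' &= (1 - r)\,T_c = 0.36 \times 12{,}400 = 4{,}464 \\
T_p  &= T_c + T_o = 12{,}400 + 1{,}400 = 13{,}800 \\
T_p' &= T_c' + T_o = 4{,}464 + 1{,}400 = 5{,}864 \\
C    &= p\,T_p = \frac{3}{10^6} \times 13{,}800 \approx \$0.0414 \\
C'   &= p\,T_p' = \frac{3}{10^6} \times 5{,}864 \approx \$0.0176 \\
1 - \frac{C'}{C} &= \frac{r\,T_c}{T_c + T_o} = \frac{0.64 \times 12{,}400}{13{,}800} \approx 0.575
\end{aligned}
```

The overhead is not compressed, so the end-to-end cost reduction r·T_c / (T_c + T_o) always stays below r. It approaches r as the retrieved context grows relative to the overhead.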

Methodology: 40-page technical PDF, semantic skeleton fidelity, GPT-5.5 / Claude Opus 4.7 pricing at $3/1M input tokens. The compression ratio is the live production average from the gotcontext.ai API. Actual results depend on document length, fidelity setting, and content type. Longer documents with repetitive structure compress more aggressively, with up to 20× fewer retrieval tokens than full-document RAG.
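A compression ratio and an "N× fewer tokens" factor are two views of the same quantity: removing a fraction r of tokens divides the token count by N = 1 / (1 - r). The small sketch below converts between the two; the function names are hypothetical. At 64%, the factor is roughly 2.8×. The 5× and 20× multiples correspond to removing 80% and 95% of tokens, and the methodology quotes them against full-document RAG, not against chunk-level retrieval.

```typescript
// Convert between a compression ratio r (fraction of tokens removed)
// and a reduction factor N ("N× fewer tokens"): N = 1 / (1 - r).
function reductionFactor(ratio: number): number {
  if (ratio < 0 || ratio >= 1) throw new RangeError("ratio must be in [0, 1)");
  return 1 / (1 - ratio);
}

// Inverse: the ratio needed to reach a given factor, r = 1 - 1 / N.
function ratioForFactor(factor: number): number {
  if (factor < 1) throw new RangeError("factor must be >= 1");
  return 1 - 1 / factor;
}

for (const r of [0.64, 0.8, 0.95]) {
  console.log(`${(r * 100).toFixed(0)}% removed -> ${reductionFactor(r).toFixed(1)}x fewer tokens`);
}
```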

How it works

Three steps. No new infrastructure. Works with any LLM through MCP or the REST API.

01
Ingest

Upload documents via the dashboard or call gc_kb_ingest. Each document is chunked, semantically compressed, and stored with halfvec embeddings for fast retrieval.

02
Query

Call gc_kb_query with a natural-language question. The engine retrieves the most relevant compressed chunks — returning context at 5-20× lower token cost than full-document retrieval.

03
Build

Use the returned compressed context in your LLM call. Edit, version, and diff documents over time with gc_kb_edit and gc_kb_diff. Every change is tracked.
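The Build step comes down to packing the returned chunks into your prompt. This minimal sketch assumes each chunk carries the `raw_text`, `score`, and `tokens` fields shown in the gc_kb_query example on this page. `buildContext` and its token budget are hypothetical helpers, not part of the Knowledge Hub API, and the second sample chunk is invented for illustration.

```typescript
// Chunk shape as shown in the gc_kb_query example on this page.
interface KbChunk {
  raw_text: string;
  score: number;
  tokens: number;
}

// Hypothetical helper: greedily pack the highest-scoring chunks into a
// token budget and join them into one numbered context block.
function buildContext(chunks: KbChunk[], tokenBudget: number): { context: string; tokens: number } {
  const ranked = [...chunks].sort((a, b) => b.score - a.score);
  const picked: KbChunk[] = [];
  let used = 0;
  for (const chunk of ranked) {
    if (used + chunk.tokens > tokenBudget) continue; // skip chunks that would overflow
    picked.push(chunk);
    used += chunk.tokens;
  }
  const context = picked.map((c, i) => `[${i + 1}] ${c.raw_text}`).join("\n\n");
  return { context, tokens: used };
}

const { context, tokens } = buildContext(
  [
    { raw_text: "SLA targets: p50 <120ms, p99 <800ms...", score: 0.94, tokens: 187 },
    { raw_text: "Latency budget review notes...", score: 0.71, tokens: 240 }, // invented sample
  ],
  300,
);
const prompt = `Answer using only this context:\n\n${context}\n\nQuestion: What were the latency SLA decisions?`;
console.log(tokens, prompt);
```

When a chunk would overflow the budget, the loop skips it instead of stopping. That lets smaller, lower-ranked chunks fill the remaining space.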

Built for AI teams

Research & analysis

Ingest PDFs, papers, and reports. Ask questions across your entire corpus without blowing your context budget on every query.

Agent memory

Give your agents a persistent, queryable knowledge store. Pull compressed context into prompts on demand — not as a static system prompt.

Internal documentation

Index runbooks, architecture docs, and on-call playbooks. Surface answers via MCP without exposing raw document content to every LLM call.

Enterprise RAG

Version-controlled documents with full audit trail. Diff edits between versions. Project-scoped isolation so each team keeps their own KB.

Multi-model pipelines

The same MCP interface works with Claude, GPT, Gemini, and any model accessible through a CLI that supports Streamable HTTP MCP.

Cost-sensitive production

When retrieval tokens are your biggest line item, compressing before retrieval changes the economics of running RAG at scale.

Security & isolation

Every Knowledge Hub tenant is isolated at the database layer. No shared index, no cross-project bleed.

Project isolation

Chunks are partitioned by project_id across 16 hash buckets. The partition key is enforced at the DB layer, so queries cannot cross project boundaries.
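Hash bucketing works like this: a hash of project_id, modulo 16, picks the bucket, so a given project always maps to the same one. The sketch below uses FNV-1a, which is an assumption for illustration only. PostgreSQL's hash partitioning uses its own internal hash functions, so these bucket numbers will not match real partitions. With more than 16 projects, several projects necessarily share a bucket, and the project_id value on each row is what keeps them apart.

```typescript
// Illustrative only: FNV-1a (32-bit) hash of a project_id, mod 16.
// PostgreSQL hash partitioning uses its own hash functions instead.
const BUCKETS = 16;

function fnv1a32(input: string): number {
  const bytes = new TextEncoder().encode(input);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep as uint32
  }
  return hash >>> 0;
}

function bucketFor(projectId: string): number {
  return fnv1a32(projectId) % BUCKETS;
}

for (const id of ["proj_abc123", "proj_def456", "proj_abc123"]) {
  console.log(id, "-> bucket", bucketFor(id)); // same id always lands in the same bucket
}
```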

Append-only audit trail

Every ingest, edit, and delete is recorded in an immutable audit log via DB-level BEFORE triggers. INSERT-only — no UPDATE or DELETE path exists on the audit table.
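Here is a minimal sketch of the BEFORE-trigger technique that makes a PostgreSQL table INSERT-only. It is written as a TypeScript helper that emits the DDL for a migration script. The table, function, and trigger names are hypothetical, since the actual audit schema is not published here.

```typescript
// Hypothetical migration helper: emits PostgreSQL DDL that rejects any
// UPDATE or DELETE on an audit table, leaving INSERT as the only write path.
function appendOnlyTriggerSql(table: string): string {
  // Only allow plain identifiers, since the name is interpolated into SQL.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) throw new Error(`unsafe table name: ${table}`);
  const fn = `${table}_reject_mutation`;
  return [
    `CREATE OR REPLACE FUNCTION ${fn}() RETURNS trigger AS $$`,
    `BEGIN`,
    `  RAISE EXCEPTION '${table} is append-only (% rejected)', TG_OP;`,
    `END;`,
    `$$ LANGUAGE plpgsql;`,
    ``,
    `CREATE TRIGGER ${table}_append_only`,
    `  BEFORE UPDATE OR DELETE ON ${table}`,
    `  FOR EACH ROW EXECUTE FUNCTION ${fn}();`,
  ].join("\n");
}

console.log(appendOnlyTriggerSql("kb_audit_log"));
```

Row-level triggers do not fire on TRUNCATE. A complete setup would also revoke TRUNCATE on the table or add a statement-level BEFORE TRUNCATE trigger.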

Encryption at rest

Data lives in Supabase (Postgres) with encryption at rest enabled by default. All data in transit uses TLS 1.2 or later, enforced by Fly.io and Cloudflare.

Soft-delete + recovery

gc_kb_delete soft-deletes items — content is retained and recoverable by an admin for 90 days. No silent data loss on misfire.
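The 90-day recovery window reduces to a timestamp comparison. This sketch assumes a hypothetical `deleted_at` timestamp set at soft-delete time; the field name is not documented here.

```typescript
const RECOVERY_WINDOW_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical shape of a soft-deleted item; `deleted_at` is an assumed field.
interface SoftDeletedItem {
  id: string;
  deleted_at: Date | null; // null means the item is live
}

// True while an admin can still restore the item.
function isRecoverable(item: SoftDeletedItem, now: Date = new Date()): boolean {
  if (item.deleted_at === null) return false; // nothing to recover
  const elapsedMs = now.getTime() - item.deleted_at.getTime();
  return elapsedMs <= RECOVERY_WINDOW_DAYS * MS_PER_DAY;
}

const deleted = { id: "item_1", deleted_at: new Date("2026-05-01T00:00:00Z") };
console.log(isRecoverable(deleted, new Date("2026-07-01T00:00:00Z"))); // 61 days after deletion
console.log(isRecoverable(deleted, new Date("2026-08-15T00:00:00Z"))); // 106 days after deletion
```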

gotcontext.ai does not use your documents for model training. Documents are stored solely to serve your queries and are never shared across accounts.

Works with your stack

Knowledge Hub exposes a standard Streamable HTTP MCP endpoint and a REST API. It works with any LLM or framework that can make an HTTP call.

Claude (Anthropic)
GPT-5.5 (OpenAI)
Gemini 3.1 (Google)
LangChain
LlamaIndex
Any HTTP client

No proprietary SDK required.
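The endpoint speaks standard MCP, so a tool call is just a JSON-RPC 2.0 `tools/call` request POSTed over Streamable HTTP. This minimal, SDK-free sketch builds that request for the gc_kb_query example on this page. It assumes the client has already completed the MCP `initialize` handshake, which a real client must do first, echoing any `Mcp-Session-Id` header the server returns. The endpoint URL and auth header are not specified here, so the sketch omits them.

```typescript
// JSON-RPC 2.0 request for MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Headers a Streamable HTTP client sends with each POST (auth omitted).
const headers = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream",
};

// id 2: id 1 is assumed to have been used by the initialize request.
const body = JSON.stringify(
  toolCall(2, "gc_kb_query", {
    project_id: "proj_abc123",
    query: "What were the latency SLA decisions?",
    top_k: 10,
  }),
);
console.log(headers, body);
```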

7 MCP tools, full lifecycle

Every operation is available as an MCP tool and a REST endpoint. Use the same interface from Claude Code, Gemini CLI, Codex, or any MCP-compatible client.

gc_kb_ingest

Upload a document and store semantic chunks with compressed embeddings.

gc_kb_query

Query your knowledge base with semantic search; results include compressed context.

gc_kb_get

Fetch a specific item by ID with its current version and metadata.

gc_kb_list

List all items in a project with pagination and status filters.

gc_kb_edit

Update an item's content with optimistic concurrency and version tracking.

gc_kb_diff

Compare two versions of an item to see what changed between edits.

gc_kb_delete

Soft-delete an item from the knowledge base (recoverable by admin).
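Optimistic concurrency, as gc_kb_edit uses it, means an edit succeeds only if the item is still at the version the editor last read. Otherwise the caller re-reads and retries. Below is a minimal in-memory sketch of that compare-and-set pattern. `ItemStore`, `expectedVersion`, and `VersionConflictError` are hypothetical names, not the tool's documented arguments.

```typescript
class VersionConflictError extends Error {
  readonly current: number;
  readonly expected: number;
  constructor(current: number, expected: number) {
    super(`version conflict: expected ${expected}, found ${current}`);
    this.name = "VersionConflictError";
    this.current = current;
    this.expected = expected;
  }
}

interface KbItem {
  id: string;
  version: number;
  content: string;
}

// Hypothetical in-memory store illustrating compare-and-set on a version number.
class ItemStore {
  private items = new Map<string, KbItem>();

  put(id: string, content: string): KbItem {
    const item = { id, version: 1, content };
    this.items.set(id, item);
    return item;
  }

  edit(id: string, expectedVersion: number, content: string): KbItem {
    const current = this.items.get(id);
    if (!current) throw new Error(`no item ${id}`);
    if (current.version !== expectedVersion) {
      // Someone else edited first: reject instead of silently overwriting.
      throw new VersionConflictError(current.version, expectedVersion);
    }
    const next = { id, version: current.version + 1, content };
    this.items.set(id, next);
    return next;
  }
}

const store = new ItemStore();
store.put("item_1", "p99 < 800ms");
store.edit("item_1", 1, "p99 < 750ms"); // succeeds, version becomes 2
try {
  store.edit("item_1", 1, "p99 < 900ms"); // stale version 1
} catch (err) {
  if (err instanceof VersionConflictError) console.log(err.message);
}
```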

All tools mirror a REST API at /v1/projects/{project_id}/knowledge/* — use whichever interface fits your stack.

Time to value

From zero to querying your first document in under an hour.

01
2 min
Get your API key

Sign up, upgrade to Pro, and mint a gc_ API key from the dashboard. No vendor approval process, no POC form.

02
10 min
Ingest your first document

Call gc_kb_ingest from Claude Code, Gemini CLI, or curl. The response includes chunks_created and savings_pct — you can see the compression immediately.

03
< 1 hour
Query in production

Replace your existing RAG retrieval step with gc_kb_query. Swap in compressed chunks wherever you were passing raw document context to your LLM.
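One way to make the swap low-risk is to hide retrieval behind a small interface, so the gc_kb_query call replaces only the retriever. This sketch assumes an injected `callTool` function, for example a thin wrapper around an MCP client, that returns parsed chunks. The `chunks` field and chunk shape follow the query example on this page; `Retriever`, `CallTool`, and `kbRetriever` are hypothetical.

```typescript
// Chunk shape as shown in the gc_kb_query example on this page.
interface KbChunk {
  raw_text: string;
  score: number;
  tokens: number;
}

// Your pipeline depends only on this signature, not on a specific backend.
type Retriever = (query: string) => Promise<string[]>;

// Hypothetical transport: performs an MCP tool call and returns the parsed payload.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<{ chunks: KbChunk[] }>;

// Retriever backed by gc_kb_query: returns compressed chunk text instead of raw document text.
function kbRetriever(callTool: CallTool, projectId: string, topK = 10): Retriever {
  return async (query: string) => {
    const { chunks } = await callTool("gc_kb_query", {
      project_id: projectId,
      query,
      top_k: topK,
    });
    return chunks.map((c) => c.raw_text);
  };
}

// Stub transport for local testing; replace with a real MCP client call.
const fakeCallTool: CallTool = async () => ({
  chunks: [{ raw_text: "SLA targets: p50 <120ms, p99 <800ms...", score: 0.94, tokens: 187 }],
});

const retrieve: Retriever = kbRetriever(fakeCallTool, "proj_abc123");
retrieve("What were the latency SLA decisions?").then((texts) => console.log(texts));
```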

Ready to try Knowledge Hub?

Available on the Pro tier and above. Sign up or upgrade to start ingesting documents and querying compressed context via MCP or REST.


Phase 1 BETA is live. Phase 2 (KnowledgeService domain layer + async upload pipeline) is on the roadmap for H2 2026.