Compressed RAG.
5-20× cheaper than NotebookLM.
Knowledge Hub brings semantic compression to your retrieval stack. 64% average compression in production (live from /v1/global-savings): ingest, query, and retrieve at a fraction of the token cost without rebuilding your infrastructure.
How it compares
Compression ratio for competitors is listed as "—" because we do not have audited numbers and will not fabricate them. Pricing is sourced from each provider's public documentation as of May 2026.
| Feature | gotcontext.ai | NotebookLM | Pinecone | Vectara |
|---|---|---|---|---|
| Pricing model | Free / $49 / $99 / $199 / $499 | $20/mo (Google One AI Premium) | Free starter; $0.096/hr per pod (Standard) | Free; $25/mo base (Growth) |
| Compression ratio | 64% avg in production (live) | — | — | — |
| MCP-native interface | 7 gc_kb_* tools | No MCP API | No MCP API | No MCP API |
| Code-level REST access | /v1/projects/{id}/knowledge/* | No public REST API | REST + gRPC | REST API |
| Audit log | Append-only, DB-enforced | None | None | None |
| Multi-tenant projects | Yes — partition-key isolation | Google account only | Yes | Yes |
| Self-hosted option | Yes — Docker image + Ed25519 license | No | No | Yes (enterprise) |
Comparison based on publicly documented features and pricing as of May 2026. Verify current capabilities at each provider's documentation.
What a query looks like
One MCP tool call. Compressed chunks back. Cost delta visible immediately.
const result = await mcpClient.callTool({
  name: "gc_kb_query",
  arguments: {
    project_id: "proj_abc123",
    query: "What were the latency SLA decisions?",
    top_k: 10,
  },
});
// result.chunks[0]:
// {
//   raw_text: "SLA targets: p50 <120ms, p99 <800ms...",
//   score: 0.94,
//   tokens: 187
// }

Cost figures assume $3 per 1M input tokens. Savings scale with document size: longer corpora compress more aggressively.
By the numbers
A representative query against a 40-page technical document, at gotcontext.ai's observed 64% average compression ratio (sourced from live production data at /v1/global-savings).
Methodology: 40-page technical PDF, semantic skeleton fidelity, GPT-5.5 / Claude Opus 4.7 pricing at $3/1M input tokens. Compression ratio is the live production average from the gotcontext.ai API. Actual results depend on document length, fidelity setting, and content type — longer documents with repetitive structure compress more aggressively (up to 20× reduction in retrieval tokens vs full-document RAG).
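The arithmetic behind this methodology can be sketched as follows. The 64% ratio and the $3 per 1M input tokens price come from the text above; the figure of 500 tokens per page is a placeholder assumption chosen for illustration, not a measured value, and `estimateCost` is a hypothetical helper rather than part of the gotcontext.ai API.

```typescript
// Sketch: estimate input-token cost before and after compression.
// compressionRatio is the fraction of tokens removed (0.64 means 64%).
interface CostEstimate {
  rawTokens: number;
  compressedTokens: number;
  rawCostUsd: number;
  compressedCostUsd: number;
}

function estimateCost(
  rawTokens: number,
  compressionRatio: number,
  usdPerMillionTokens: number,
): CostEstimate {
  if (compressionRatio < 0 || compressionRatio >= 1) {
    throw new RangeError("compressionRatio must be in [0, 1)");
  }
  const compressedTokens = Math.round(rawTokens * (1 - compressionRatio));
  const toUsd = (tokens: number): number => (tokens / 1e6) * usdPerMillionTokens;
  return {
    rawTokens,
    compressedTokens,
    rawCostUsd: toUsd(rawTokens),
    compressedCostUsd: toUsd(compressedTokens),
  };
}

// Hypothetical input: 40 pages at an assumed 500 tokens per page.
const estimate = estimateCost(40 * 500, 0.64, 3);
console.log(estimate);
```

Swap in your own token counts and model price to see how the per-query cost changes for your corpus.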
How it works
Three steps. No new infrastructure. Works with any LLM through MCP or the REST API.
Upload documents via the dashboard or call gc_kb_ingest. Each document is chunked, semantically compressed, and stored with halfvec embeddings for fast retrieval.
Call gc_kb_query with a natural-language question. The engine retrieves the most relevant compressed chunks — returning context at 5-20× lower token cost than full-document retrieval.
Use the returned compressed context in your LLM call. Edit, version, and diff documents over time with gc_kb_edit and gc_kb_diff. Every change is tracked.
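The third step, placing returned chunks into an LLM prompt, might look like the sketch below. The chunk fields (`raw_text`, `score`, `tokens`) match the query example earlier on this page; the `buildContext` helper and its token-budget policy are assumptions made for illustration, not part of Knowledge Hub.

```typescript
// Shape of one returned chunk, as shown in the gc_kb_query example.
interface KbChunk {
  raw_text: string;
  score: number;
  tokens: number;
}

// Hypothetical helper: take the highest-scoring chunks that fit a token budget
// and join them into a single context string for the prompt.
function buildContext(
  chunks: KbChunk[],
  tokenBudget: number,
): { context: string; tokensUsed: number } {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...chunks].sort((a, b) => b.score - a.score);
  const picked: KbChunk[] = [];
  let tokensUsed = 0;
  for (const chunk of ranked) {
    // Skip any chunk that would overflow the budget; a smaller one may still fit.
    if (tokensUsed + chunk.tokens > tokenBudget) continue;
    picked.push(chunk);
    tokensUsed += chunk.tokens;
  }
  return { context: picked.map((c) => c.raw_text).join("\n\n"), tokensUsed };
}
```

A greedy pass by score is the simplest policy; if ordering within the source document matters for your prompts, you would re-sort the picked chunks before joining them.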
Built for AI teams
Ingest PDFs, papers, and reports. Ask questions across your entire corpus without blowing your context budget on every query.
Give your agents a persistent, queryable knowledge store. Pull compressed context into prompts on demand — not as a static system prompt.
Index runbooks, architecture docs, and on-call playbooks. Surface answers via MCP without exposing raw document content to every LLM call.
Version-controlled documents with full audit trail. Diff edits between versions. Project-scoped isolation so each team keeps their own KB.
The same MCP interface works with Claude, GPT, Gemini, and any model accessible through a CLI that supports Streamable HTTP MCP.
When retrieval tokens are your biggest line item, compressing before retrieval changes the economics of running RAG at scale.
Security & isolation
Every Knowledge Hub tenant is cryptographically isolated at the database layer. No shared index, no cross-project bleed.
Chunks are partitioned by project_id across 16 hash buckets. Queries cannot physically cross project boundaries — the partition key is enforced at the DB layer.
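To make the bucketing idea concrete, here is a minimal sketch of mapping a project_id to one of 16 hash buckets. The hash function used by Knowledge Hub is not documented on this page; FNV-1a is swapped in purely for illustration, and `bucketFor` is a hypothetical name.

```typescript
// Number of hash buckets, taken from the description above.
const BUCKET_COUNT = 16;

// 32-bit FNV-1a over UTF-16 code units (equivalent to bytes for ASCII ids).
// Assumption: the production system may use a different hash entirely.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// The same project_id always lands in the same bucket.
function bucketFor(projectId: string): number {
  return fnv1a32(projectId) % BUCKET_COUNT;
}
```

Because the bucket is a pure function of project_id, every chunk for a project lives in one partition, and a query scoped to that project never needs to touch the other fifteen.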
Every ingest, edit, and delete is recorded in an immutable audit log via DB-level BEFORE triggers. INSERT-only — no UPDATE or DELETE path exists on the audit table.
Data lives in Supabase (Postgres) with encryption at rest enabled by default. All transit is TLS 1.2+ enforced by Fly.io + Cloudflare.
gc_kb_delete soft-deletes items — content is retained and recoverable by an admin for 90 days. No silent data loss on misfire.
gotcontext.ai does not use your documents for model training. Documents are stored solely to serve your queries and are never shared across accounts.
Works with your stack
Knowledge Hub exposes a standard Streamable HTTP MCP endpoint and a REST API. It works with any LLM or framework that can make an HTTP call.
No proprietary SDK required. See the full API reference.
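For clients without an MCP library, a tool call over Streamable HTTP is a JSON-RPC 2.0 POST. The sketch below only builds the request; the bearer-token header is an assumption about how the gc_ API key is passed, and a real session also needs the MCP initialization handshake, which is omitted here.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Build the options object you would hand to fetch(endpoint, init).
// Assumption: the API key travels as a Bearer token; check the API reference.
function buildRequestInit(apiKey: string, body: ToolCallRequest) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients accept both plain JSON and SSE responses.
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  };
}

const call = buildToolCall(1, "gc_kb_query", {
  project_id: "proj_abc123",
  query: "What were the latency SLA decisions?",
  top_k: 10,
});
console.log(buildRequestInit("gc_example_key", call).body);
```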
7 MCP tools, full lifecycle
Every operation is available as an MCP tool and a REST endpoint. Use the same interface from Claude Code, Gemini CLI, Codex, or any MCP-compatible client.
gc_kb_ingest: Upload a document and store semantic chunks with compressed embeddings.
gc_kb_query: Query your knowledge base with semantic search; results include compressed context.
gc_kb_get: Fetch a specific item by ID with its current version and metadata.
gc_kb_list: List all items in a project with pagination and status filters.
gc_kb_edit: Update an item's content with optimistic concurrency and version tracking.
gc_kb_diff: Compare two versions of an item to see what changed between edits.
gc_kb_delete: Soft-delete an item from the knowledge base (recoverable by admin).
All tools mirror a REST API at /v1/projects/{project_id}/knowledge/* — use whichever interface fits your stack.
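The optimistic concurrency mentioned for gc_kb_edit generally works like the in-memory sketch below: the caller states which version it last read, and the edit is rejected if the item has moved on. The `KbItem` shape, the `applyEdit` name, and the result type are all hypothetical; the actual gc_kb_edit parameters are defined in the API reference.

```typescript
// Hypothetical item shape for illustration only.
interface KbItem {
  id: string;
  version: number;
  content: string;
}

type EditResult =
  | { ok: true; item: KbItem }
  | { ok: false; reason: "version_conflict"; currentVersion: number };

// Apply an edit only if the caller saw the latest version.
function applyEdit(current: KbItem, expectedVersion: number, newContent: string): EditResult {
  if (current.version !== expectedVersion) {
    // Someone else edited first: report the conflict instead of overwriting.
    return { ok: false, reason: "version_conflict", currentVersion: current.version };
  }
  return { ok: true, item: { ...current, version: current.version + 1, content: newContent } };
}

const stored: KbItem = { id: "item_1", version: 3, content: "old text" };
console.log(applyEdit(stored, 3, "new text")); // accepted, version becomes 4
console.log(applyEdit(stored, 2, "stale edit")); // rejected as a conflict
```

On a conflict, a client would typically re-fetch the item, compare versions (gc_kb_diff is the natural fit), and retry with the new version number.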
Time to value
From zero to querying your first document in under an hour.
Sign up, upgrade to Pro, and mint a gc_ API key from the dashboard. No vendor approval process, no POC form.
Call gc_kb_ingest from Claude Code, Gemini CLI, or curl. The response includes chunks_created and savings_pct — you can see the compression immediately.
Replace your existing RAG retrieval step with gc_kb_query. Swap in compressed chunks wherever you were passing raw document context to your LLM.
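Reading the ingest response described in the steps above could look like this. The field names `chunks_created` and `savings_pct` come from this page; treating `savings_pct` as a 0-100 percentage is an assumption, and both helper functions are hypothetical.

```typescript
// Fields the page says the ingest response includes.
interface IngestResult {
  chunks_created: number;
  savings_pct: number; // assumed to be a percentage in the range 0-100
}

// Validate an untyped payload before trusting it.
function parseIngestResult(payload: unknown): IngestResult {
  if (typeof payload !== "object" || payload === null) {
    throw new TypeError("ingest response must be an object");
  }
  const record = payload as Record<string, unknown>;
  const chunks = record["chunks_created"];
  const savings = record["savings_pct"];
  if (typeof chunks !== "number" || typeof savings !== "number") {
    throw new TypeError("ingest response is missing chunks_created or savings_pct");
  }
  return { chunks_created: chunks, savings_pct: savings };
}

function describeIngest(result: IngestResult): string {
  return `${result.chunks_created} chunks stored, ${result.savings_pct.toFixed(1)}% token savings`;
}

// Example payload with made-up values, for illustration only.
console.log(describeIngest(parseIngestResult({ chunks_created: 12, savings_pct: 64 })));
```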
Ready to try Knowledge Hub?
Available on the Pro tier and above. Sign up or upgrade to start ingesting documents and querying compressed context via MCP or REST.
Phase 1 BETA is live. Phase 2 (KnowledgeService domain layer + async upload pipeline) is on the roadmap for H2 2026.