KB-2026.05 · gc_kb_query v1.0 BETA

Compressed RAG.

5-20× cheaper than NotebookLM.

Knowledge Hub brings semantic compression to your retrieval stack. At ~50% typical compression (a conservative figure; live /v1/global-savings runs higher), you ingest, query, and retrieve at a fraction of the token cost without rebuilding your infrastructure.

Open Knowledge Hub

$ pip install gotcontext

~50%

typical compression

7

MCP tools (gc_kb_*)

MCP + REST

both interfaces

v1.0

versioned docs

How it compares

Compression ratio for competitors is listed as —. We do not have audited numbers and will not fabricate them. Pricing is sourced from each provider's public documentation as of May 2026.

Feature	gotcontext.ai	NotebookLM	Pinecone	Vectara
Pricing model	Free / $49 / $99 / $199 / $499	$20/mo (Google One AI Premium)	Free starter; $0.096/hr per pod (Standard)	Free; $25/mo base (Growth)
Compression ratio	~50% typical (live runs higher)	—	—	—
MCP-native interface	7 gc_kb_* tools	No MCP API	No MCP API	No MCP API
Code-level REST access	/v1/projects/{id}/knowledge/*	No public REST API	REST + gRPC	REST API
Audit log	Append-only, DB-enforced	None	None	None
Multi-tenant projects	Yes, partition-key isolation	Google account only	Yes	Yes
Self-hosted option	Yes, Docker image + Ed25519 license	No	No	Yes (enterprise)

Comparison based on publicly documented features and pricing as of May 2026. Verify current capabilities at each provider's documentation.

What a query looks like

One MCP tool call. Compressed chunks back. Cost delta visible immediately.

TypeScript · MCP

// Assumes mcpClient is an MCP client already connected to the Knowledge Hub endpoint.
const result = await mcpClient.callTool({
  name: "gc_kb_query",
  arguments: {
    project_id: "proj_abc123",
    query: "What were the latency SLA decisions?",
    top_k: 10,
  },
});

// result.chunks[0]:
// {
//   raw_text: "SLA targets: p50 <120ms, p99 <800ms...",
//   score: 0.94,
//   tokens: 187
// }

This query

billed tokens: 1,847

saved tokens: 12,360

cost delta: -$0.037

At $3/1M tokens. Savings scale with document size: longer corpora compress more aggressively.

By the numbers

A representative query against a 40-page technical document, at gotcontext.ai's observed ~50% typical compression ratio (conservative figure; live production data at /v1/global-savings runs higher).

Without compression

Traditional RAG: full raw chunks returned

Chunk tokens retrieved: 12,400

Prompt tokens sent to LLM: ~13,800

Cost at $3/1M tokens: $0.041

With Knowledge Hub

Compressed retrieval: semantic skeleton per chunk

Chunk tokens after compression: ~4,960 (-60% in this example, above the ~50% typical figure)

Prompt tokens sent to LLM: ~6,360

Cost at $3/1M tokens: $0.019

54% cost reduction on this query
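As a rough check of the figures above, the arithmetic can be sketched as follows. The function names are illustrative only; the prompt token counts and the $3/1M price are the ones quoted in this comparison.

```typescript
// Price per one million input tokens, as quoted in the comparison above.
const PRICE_PER_MILLION_USD = 3;

// Convert a token count into dollars at a flat per-million-token price.
function tokenCostUsd(tokens: number, pricePerMillionUsd: number = PRICE_PER_MILLION_USD): number {
  return (tokens / 1000000) * pricePerMillionUsd;
}

// Fractional cost reduction when the prompt shrinks from beforeTokens to afterTokens.
function costReduction(beforeTokens: number, afterTokens: number): number {
  const before = tokenCostUsd(beforeTokens);
  const after = tokenCostUsd(afterTokens);
  return (before - after) / before;
}

const withoutCompression = tokenCostUsd(13800); // prompt tokens without compression
const withKnowledgeHub = tokenCostUsd(6360);    // prompt tokens with compressed chunks
const reduction = costReduction(13800, 6360);

console.log(
  `$${withoutCompression.toFixed(3)} vs $${withKnowledgeHub.toFixed(3)}, ` +
    `${Math.round(reduction * 100)}% lower`
);
// prints "$0.041 vs $0.019, 54% lower"
```

Because the price is flat per token, the percentage reduction depends only on the two prompt sizes, not on the price itself.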

Methodology: 40-page technical PDF, semantic skeleton fidelity, GPT-5.5 / Claude Opus 4.7 pricing at $3/1M input tokens. Compression ratio shown is a conservative ~50% typical figure; the live production average from the gotcontext.ai API runs higher. Actual results depend on document length, fidelity setting, and content type. Longer documents with repetitive structure compress more aggressively (up to 20× reduction in retrieval tokens vs full-document RAG).

How it works

Three steps. No new infrastructure. Works with any LLM through MCP or the REST API.

Ingest

Upload documents via the dashboard or call gc_kb_ingest. Each document is chunked, semantically compressed, and stored with halfvec embeddings for fast retrieval.

Query

Call gc_kb_query with a natural-language question. The engine retrieves the most relevant compressed chunks, returning context at 5-20× lower token cost than full-document retrieval.

Build

Use the returned compressed context in your LLM call. Edit, version, and diff documents over time with gc_kb_edit and gc_kb_diff. Every change is tracked.
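The Build step can be sketched as a small helper that packs returned chunks into a prompt under a token budget. This is a minimal sketch, not part of any gotcontext SDK: the chunk fields (raw_text, score, tokens) follow the sample result shown earlier on this page, while selectWithinBudget, buildContext, and the budget value are hypothetical.

```typescript
// Chunk shape as shown in the sample gc_kb_query result on this page.
interface KbChunk {
  raw_text: string;
  score: number;
  tokens: number;
}

// Hypothetical helper: keep the highest-scoring chunks that fit the budget.
function selectWithinBudget(chunks: KbChunk[], maxTokens: number): KbChunk[] {
  const ranked = [...chunks].sort((a, b) => b.score - a.score);
  const kept: KbChunk[] = [];
  let used = 0;
  for (const chunk of ranked) {
    if (used + chunk.tokens > maxTokens) continue; // skip chunks that would overflow
    kept.push(chunk);
    used += chunk.tokens;
  }
  return kept;
}

// Join the selected chunks into one numbered context string for the LLM prompt.
function buildContext(chunks: KbChunk[], maxTokens: number): string {
  return selectWithinBudget(chunks, maxTokens)
    .map((c, i) => `[${i + 1}] ${c.raw_text}`)
    .join("\n\n");
}

// Illustrative data: only the first chunk comes from the sample above.
const sample: KbChunk[] = [
  { raw_text: "SLA targets: p50 <120ms, p99 <800ms...", score: 0.94, tokens: 187 },
  { raw_text: "Second chunk (illustrative)", score: 0.71, tokens: 150 },
  { raw_text: "Third chunk (illustrative)", score: 0.88, tokens: 200 },
];

const context = buildContext(sample, 400);
console.log(context);
```

With a 400-token budget the two highest-scoring chunks (187 + 200 tokens) are kept and the lowest-scoring one is dropped. A greedy pass like this is simple, but it is only one possible policy.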

Built for AI teams

RES

Research & analysis

Ingest PDFs, papers, and reports. Ask questions across your entire corpus without blowing your context budget on every query.

AGT

Cross-agent memory

A project-scoped knowledge store that every agent on your team reaches over the same MCP endpoint. Claude Code, Cursor, and Codex all read the same compressed context. Pull what you need on demand instead of carrying it as a static system prompt that bloats every call.

DOC

Internal documentation

Index runbooks, architecture docs, and on-call playbooks. Surface answers via MCP without exposing raw document content to every LLM call.

ENT

Enterprise RAG

Version-controlled documents with a full audit trail. Diff edits between versions. Project-scoped isolation lets each team keep its own KB.

MUL

Multi-model pipelines

The same MCP interface works with Claude, GPT, Gemini, and any model accessible through a CLI that supports Streamable HTTP MCP.

OPS

Cost-sensitive production

When retrieval tokens are your biggest line item, compressing before retrieval changes the economics of running RAG at scale.

Security & isolation

Every Knowledge Hub project is isolated at the database layer. No shared index, no cross-project bleed.

Project isolation

Chunks are partitioned by project_id across 16 hash buckets. Queries cannot cross project boundaries: the partition key is enforced at the DB layer.
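Hash partitioning by project_id can be sketched as follows. This is an illustration under stated assumptions: the FNV-1a hash is a stand-in, since the hash function the database actually uses is not documented here, and ChunkRow and chunksForProject are hypothetical names. Because several projects can land in the same bucket, the sketch also filters on project_id itself.

```typescript
const BUCKET_COUNT = 16; // bucket count stated above

// 32-bit FNV-1a over UTF-16 code units (equivalent to bytes for ASCII ids).
// Stand-in hash for illustration; not the database's own hash function.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash >>> 0;
}

// Map a project id to one of the 16 buckets.
function bucketFor(projectId: string): number {
  return fnv1a32(projectId) % BUCKET_COUNT;
}

// Hypothetical row shape for a stored chunk.
interface ChunkRow {
  project_id: string;
  text: string;
}

// Narrow to the project's bucket first, then match the project id exactly,
// since a bucket may hold rows from more than one project.
function chunksForProject(rows: ChunkRow[], projectId: string): ChunkRow[] {
  const bucket = bucketFor(projectId);
  return rows.filter(
    (row) => bucketFor(row.project_id) === bucket && row.project_id === projectId
  );
}

const rows: ChunkRow[] = [
  { project_id: "proj_abc123", text: "SLA targets..." },
  { project_id: "proj_other", text: "Unrelated chunk" },
];

console.log(bucketFor("proj_abc123"), chunksForProject(rows, "proj_abc123").length);
```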

Append-only audit trail

Every ingest, edit, and delete is recorded in an immutable audit log via DB-level BEFORE triggers. INSERT-only: no UPDATE or DELETE path exists on the audit table.

Encryption at rest

Data lives in Supabase (Postgres) with encryption at rest enabled by default. All traffic in transit uses TLS 1.2+, enforced by Fly.io and Cloudflare.

Soft-delete + recovery

gc_kb_delete soft-deletes items: content is retained and recoverable by an admin for 90 days. No silent data loss on misfire.
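The recovery window can be sketched as a simple date check. The 90-day figure comes from the text above; the item shape, the field names, and the choice to treat day 90 itself as still recoverable are assumptions made for illustration.

```typescript
const RECOVERY_WINDOW_DAYS = 90; // retention period stated above
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical item shape: field names are illustrative, not the real schema.
interface KbItem {
  id: string;
  deleted_at: Date | null; // null means the item is live
}

// Soft delete: mark the item instead of removing its content.
function softDelete(item: KbItem, now: Date): KbItem {
  return { id: item.id, deleted_at: now };
}

// An item is recoverable if it was soft-deleted within the last 90 days.
function isRecoverable(item: KbItem, now: Date): boolean {
  if (item.deleted_at === null) return false; // live items have nothing to recover
  const elapsedDays = (now.getTime() - item.deleted_at.getTime()) / MS_PER_DAY;
  return elapsedDays <= RECOVERY_WINDOW_DAYS;
}

const live: KbItem = { id: "item_1", deleted_at: null };
const deleted = softDelete(live, new Date(Date.UTC(2026, 4, 1))); // 1 May 2026

console.log(isRecoverable(deleted, new Date(Date.UTC(2026, 5, 1)))); // 31 days later
console.log(isRecoverable(deleted, new Date(Date.UTC(2026, 8, 1)))); // 123 days later
```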

gotcontext.ai does not use your documents for model training. Documents are stored solely to serve your queries and are never shared across accounts.

Works with your stack

Knowledge Hub exposes a standard Streamable HTTP MCP endpoint and a REST API. It works with any LLM or framework that can make an HTTP call.

Claude (Anthropic)

GPT-5.5 (OpenAI)

Gemini 3.1 (Google)

LangChain

LlamaIndex

Any HTTP client

No proprietary SDK required. See the full API reference →
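For the plain-HTTP route, a request could be assembled roughly like this. Only the /v1/projects/{project_id}/knowledge/ path prefix comes from this page. The host, the "query" subpath, the Bearer header, and the body field names are placeholders, so check the API reference before relying on any of them.

```typescript
// Placeholder host: the real API base URL is not given on this page.
const BASE_URL = "https://api.example.com";

// Build a URL under /v1/projects/{project_id}/knowledge/.
// The subpath values used in this sketch are hypothetical.
function knowledgeUrl(projectId: string, subpath: string = ""): string {
  const id = encodeURIComponent(projectId);
  const rest = subpath
    .split("/")
    .filter((segment) => segment.length > 0)
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `${BASE_URL}/v1/projects/${id}/knowledge/${rest}`;
}

// Plain description of an HTTP request, usable with any HTTP client.
interface RequestSpec {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assumed auth scheme (Bearer token) and body fields (query, top_k).
function queryRequest(projectId: string, apiKey: string, query: string, topK: number): RequestSpec {
  return {
    method: "POST",
    url: knowledgeUrl(projectId, "query"),
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: query, top_k: topK }),
  };
}

const spec = queryRequest("proj_abc123", "YOUR_API_KEY", "What were the latency SLA decisions?", 10);
console.log(spec.method, spec.url);
```

The resulting spec maps directly onto fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body }) or an equivalent call in another client.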

7 MCP tools, full lifecycle

Every operation is available as an MCP tool and a REST endpoint. Use the same interface from Claude Code, Gemini CLI, Codex, or any MCP-compatible client.

gc_kb_ingest

Upload a document and store semantic chunks with compressed embeddings.

gc_kb_query

Query your knowledge base with semantic search; results include compressed context.

gc_kb_get

Fetch a specific item by ID with its current version and metadata.

gc_kb_list

List all items in a project with pagination and status filters.

gc_kb_edit

Update an item's content with optimistic concurrency and version tracking.

gc_kb_diff

Compare two versions of an item to see what changed between edits.

gc_kb_delete

Soft-delete an item from the knowledge base (recoverable by admin).

All tools mirror a REST API at /v1/projects/{project_id}/knowledge/*. Use whichever interface fits your stack.
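The optimistic concurrency mentioned for gc_kb_edit can be illustrated with an in-memory model: an edit is accepted only if the caller names the version it last read. This is a sketch of the general technique, not the service's implementation. The storage, function names, and result shape are all hypothetical.

```typescript
// One saved version of an item.
interface ItemVersion {
  version: number;
  content: string;
}

// ok reports whether the edit was applied; version is the latest
// stored version after the call, whether or not the edit was applied.
interface EditResult {
  ok: boolean;
  version: number;
}

// versions[id] holds every saved version in order, oldest first.
const versions: Record<string, ItemVersion[]> = {};

function createItem(id: string, content: string): void {
  versions[id] = [{ version: 1, content: content }];
}

// Apply an edit only if the caller saw the latest version.
function editItem(id: string, expectedVersion: number, content: string): EditResult {
  const saved = versions[id];
  if (!saved) throw new Error(`unknown item: ${id}`);
  const current = saved[saved.length - 1];
  if (current.version !== expectedVersion) {
    return { ok: false, version: current.version }; // stale write rejected
  }
  const next: ItemVersion = { version: current.version + 1, content: content };
  saved.push(next); // earlier versions are kept, which is what makes diffing possible
  return { ok: true, version: next.version };
}

createItem("item_1", "p99 target: 800ms");
const first = editItem("item_1", 1, "p99 target: 600ms"); // accepted, item is now at version 2
const stale = editItem("item_1", 1, "p99 target: 700ms"); // rejected, version 1 is stale

console.log(first, stale);
```

A rejected caller would typically re-read the item, reapply its change on top of the current version, and retry.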

Time to value

From zero to querying your first document in under an hour.

2 min

Get your API key

10 min

Ingest your first document

Call gc_kb_ingest from Claude Code, Gemini CLI, or curl. The response includes chunks_created and savings_pct: you can see the compression immediately.

< 1 hour

Query in production

Replace your existing RAG retrieval step with gc_kb_query. Swap in compressed chunks wherever you were passing raw document context to your LLM.
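Reading the ingest response from the second step might look like the sketch below. Only the field names chunks_created and savings_pct come from this page. The rest of the response shape is unknown, the sample payload is invented, and savings_pct is assumed to be on a 0-100 scale.

```typescript
// Fields named on this page; any other response fields are unknown here.
interface IngestResponse {
  chunks_created: number;
  savings_pct: number; // assumed to be a percentage on a 0-100 scale
}

// Narrow an untyped JSON payload to the two documented fields.
function parseIngestResponse(payload: unknown): IngestResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("ingest response is not an object");
  }
  const record = payload as Record<string, unknown>;
  const chunks = record["chunks_created"];
  const savings = record["savings_pct"];
  if (typeof chunks !== "number" || typeof savings !== "number") {
    throw new Error("ingest response is missing chunks_created or savings_pct");
  }
  return { chunks_created: chunks, savings_pct: savings };
}

// One-line summary suitable for a log line or CLI output.
function summarize(response: IngestResponse): string {
  return `${response.chunks_created} chunks stored, ${response.savings_pct}% tokens saved`;
}

// Illustrative payload, not real output from the service.
const example = parseIngestResponse(JSON.parse('{"chunks_created": 42, "savings_pct": 51.3}'));
console.log(summarize(example));
// prints "42 chunks stored, 51.3% tokens saved"
```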

Ready to try Knowledge Hub?

Available on the Pro tier and above. Sign up or upgrade to start ingesting documents and querying compressed context via MCP or REST.

Open Knowledge Hub · View pricing

Or explore the docs →

Phase 1 BETA is live. Phase 2 (KnowledgeService domain layer + async upload pipeline) is on the roadmap for H2 2026.