MCP Context Server turns AI coding agents from amnesiacs into collaborators with a persistent, searchable, shared memory. Every session, every decision, every past solution — accumulating, cross-linked, instantly retrievable — across agents, projects, and context compactions.
Your agent had a plan. The window compacted. Now it’s asking what you were working on. Sound familiar?
Without persistent memory: the agent builds a rich plan, starts executing, and hits compaction. The plan evaporates. You spend the next five minutes re-explaining, re-pasting, re-hoping.
With persistent memory: the agent stores its own plan in a persistent thread. After compaction it reads the plan back instantly and keeps executing. The plan lives outside the window.
Plan once, persist it, resume forever. The agent retrieves its own roadmap after every context reset and picks up exactly where it left off.
The orchestrator points a subagent at a thread, and the subagent reads the full original entries. No telephone-game summaries. Nothing compressed at the boundary.
Two weeks later, the agent can replay why a decision was made — original request, research, tradeoffs.
A pattern solved in project A is available to agents in project B — automatically searchable across your whole workflow.
Spin up a tasks or knowledge-base thread. Agents read and write there — a structured store that fits your workflow.
Meeting notes, architecture decisions, review findings, requirements. Store now, retrieve later — same session, next session, different project.
The command below downloads a Docker Compose stack (Postgres + pgvector, Ollama, MCP Context Server), starts it, and fully configures Claude Code, with hooks, skills, and rules included.
CLAUDE_CODE_TOOLBOX_ENV_CONFIG='https://raw.githubusercontent.com/alex-feel/mcp-context-server/refs/heads/main/agents/claude-code/environment-docker-ollama.yaml' \
CLAUDE_CODE_TOOLBOX_SKIP_INSTALL='1' \
bash -c "$(curl -fsSL https://raw.githubusercontent.com/alex-feel/claude-code-toolbox/main/scripts/macos/setup-environment.sh)"
Pulls ~1.2 GB of Ollama models on first start. Everything else is handled.
Hooks, skills and rules are wired so agents learn when to store and when to retrieve. No manual setup.
Other MCP clients: Point Cursor, Codex, or LangChain at http://localhost:8000/mcp
PyPI: pip install mcp-context-server
Backends: SQLite (zero-config) or PostgreSQL
Deploy: Docker Compose · Kubernetes · Helm
Full documentation ↗
Stemming, ranking, boolean queries; pgvector similarity; Reciprocal Rank Fusion when you want both — plus cross-encoder reranking on top.
Filter by nested JSON paths, tags, date ranges, and indexed fields. GIN indexes on array/object metadata for speed.
SQLite is zero-config for solo work; PostgreSQL gives 10× write throughput for teams. Same API; swap backends with one environment variable.
Embedding providers: Ollama (local, default), OpenAI, Azure, HuggingFace, Voyage.
Every stored entry gets a concise summary via Ollama, OpenAI, or Anthropic.
Docker Compose for local use; a Helm chart for Kubernetes; bearer-token authentication for the HTTP transport.
Long docs chunked for semantic search. Results over-fetched, then reranked with ms-marco-MiniLM for precision.
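The hybrid search described above fuses full-text and semantic rankings with Reciprocal Rank Fusion. Here is a minimal sketch of that fusion step, assuming each search returns a best-first list of entry IDs. The function name, the sample IDs, and the constant k=60 (a commonly used default) are illustrative assumptions, not the server's actual code or settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of IDs into one best-first list.

    Each list contributes 1 / (k + rank) to an ID's score, so an ID that
    ranks well in several lists rises above one that tops a single list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, entry_id in enumerate(ranking, start=1):
            scores[entry_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda entry_id: (-scores[entry_id], entry_id))


# Hypothetical result lists from a full-text search and a semantic search.
fts_hits = ["e7", "e2", "e9"]
semantic_hits = ["e2", "e5", "e7"]

fused = reciprocal_rank_fusion([fts_hits, semantic_hits])
print(fused)
```

Because only ranks are used, the two searches never need comparable score scales, which is the usual reason to choose this fusion method.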
Persist a text or multimodal entry to a thread.
Browse and filter entries by thread, metadata, tags, dates.
Retrieve full, untruncated entries by ID.
Patch text, metadata, tags, or images in place.
Remove stale or superseded entries.
Enumerate threads with entry counts and timestamps.
Server health, feature status, and usage metrics.
Stemming, ranking, boolean queries, cross-encoder rerank.
Vector similarity with cross-encoder rerank.
FTS + semantic fused with RRF, cross-encoder rerank.
Persist many entries in one call.
Bulk patches across entries.
Multi-criteria bulk deletion (IDs, threads, age).
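Several of the search tools above finish with a cross-encoder rerank over an over-fetched candidate pool. The pattern can be sketched roughly as follows. The function names, the over-fetch factor, and the sample texts are hypothetical, and the word-overlap scorer is only a toy stand-in for a real cross-encoder model.

```python
def overfetch_and_rerank(query, candidates, score_fn, limit=3, overfetch_factor=4):
    """Take a larger first-stage pool, rescore it, and keep the best `limit`.

    `candidates` is assumed to arrive best-first from the first-stage search.
    """
    pool = candidates[: limit * overfetch_factor]
    reranked = sorted(pool, key=lambda text: score_fn(query, text), reverse=True)
    return reranked[:limit]


def word_overlap(query, text):
    """Toy relevance score: fraction of query words present in the text.

    A real deployment would call a cross-encoder here instead.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


candidates = [
    "retry policy for the upload worker",
    "database migration plan for the billing service",
    "billing service migration rollback steps",
    "notes from the design review",
]

top = overfetch_and_rerank("billing service migration", candidates, word_overlap, limit=2)
print(top)
```

The first stage stays cheap and broad, and the more expensive scorer only ever sees the small pool.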
Point any MCP-compatible client at http://localhost:8000/mcp. No lock-in, no custom protocol.
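MCP messages are JSON-RPC 2.0, so "no custom protocol" means a client only needs to speak that format. As a rough illustration, the sketch below builds the body of a `tools/list` request, the standard MCP method for discovering a server's tools. The helper name is hypothetical, nothing is sent over the network, and a real client performs the MCP initialize handshake before making this call.

```python
import json
from itertools import count

MCP_URL = "http://localhost:8000/mcp"  # endpoint quoted on this page

_request_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request message as a plain dict."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes; the tool names come from the
# server's reply, so none are assumed here.
list_tools = jsonrpc_request("tools/list")
body = json.dumps(list_tools)
print(f"POST {MCP_URL}")
print(body)
```

In practice an MCP client library handles this framing, so most users only supply the URL.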
The project is open source under the MIT license. Star it, fork it, contribute to it — or just run the command and feel the difference on your next coding session.