A local-first MCP server that gives AI agents persistent memory and retrieval over large document corpora. Instead of dumping entire files into the context window, mnemo pre-digests them into searchable chunks and serves only what's relevant.
Named for Mnemosyne, the Greek Titaness of memory.
Mnemo combines vector similarity search (sqlite-vec) with keyword search (FTS5) using Reciprocal Rank Fusion to find the most relevant chunks from your corpus. Embeddings are generated by a remote Ollama instance.

    Agent ──MCP stdio──▶ mnemo ──▶ Ollama (embeddings)
                           │
                           ▼
                SQLite (sqlite-vec + FTS5)

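Reciprocal Rank Fusion scores each chunk by summing 1/(k + rank) over every result list in which it appears, so a chunk ranked well by both searches rises to the top. The following is a minimal sketch of that idea in Go, not mnemo's actual code: the function name `fuseRRF`, the constant k = 60, and the sample chunk IDs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the smoothing constant. 60 is a commonly used value;
// whether mnemo uses the same value is an assumption.
const rrfK = 60.0

// fuseRRF merges several ranked lists of chunk IDs (best match first)
// into one ranking. Each list contributes 1/(k + rank) for every ID it
// contains, with rank starting at 1.
func fuseRRF(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}

	// Highest fused score first; ties are broken by ID so the
	// output is deterministic.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	vector := []string{"c1", "c2", "c3"}  // hypothetical sqlite-vec results
	keyword := []string{"c2", "c4", "c1"} // hypothetical FTS5 results
	fmt.Println(fuseRRF(rrfK, vector, keyword))
}
```

With these sample lists the program prints `[c2 c1 c4 c3]`: `c2` and `c1` appear in both lists and outrank the chunks that only one search returned.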
Storage is divided into two "spaces":
- product — pre-ingested document corpus (markdown, JSON, plain text)
- default — episodic memory accumulated by the agent over time
| Tool | Description |
|---|---|
| `recall` | Hybrid search across memory and corpus |
| `remember` | Store a new memory (auto-classified by type) |
| `forget` | Soft-delete a memory by chunk ID |
| `memory_status` | Health check and per-space chunk counts |
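For readers wiring up their own client, the sketch below builds the JSON-RPC message that MCP uses to invoke a tool (`tools/call`), here for `recall` restricted to the product space. The `spaces` argument comes from this README; the `query` argument name and the helper `newRecallCall` are assumptions, so check the tool schema the server reports before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall mirrors the JSON-RPC 2.0 envelope MCP uses for "tools/call".
type toolCall struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newRecallCall builds a recall request. "spaces" appears in this
// README; "query" is an assumed argument name.
func newRecallCall(id int, query string, spaces []string) ([]byte, error) {
	args := map[string]any{"query": query}
	if len(spaces) > 0 {
		// Omitting "spaces" is assumed to search all spaces.
		args["spaces"] = spaces
	}
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: "recall", Arguments: args},
	})
}

func main() {
	msg, err := newRecallCall(1, "how do refunds work", []string{"product"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	// Over the stdio transport each message is written as one line of JSON.
	fmt.Println(string(msg))
}
```

A real client must complete the MCP `initialize` handshake before sending this message; agents such as Claude Code handle that automatically.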
- Go 1.22+
- An Ollama instance with an embedding model pulled (e.g., `nomic-embed-text`)
Build the binary:

    make build

Ingest a document corpus:

    mnemo ingest --ollama-host https://your-ollama-host ~/path/to/docs/

Register the server with Claude Code:

    claude mcp add --scope user \
      -e MNEMO_DB_PATH=/home/you/.mnemo/mnemo.db \
      -e MNEMO_OLLAMA_HOST=https://your-ollama-host \
      -- mnemo /path/to/mnemo serve

Add to your CLAUDE.md:

> Persistent memory and product knowledge are available via `recall`, `remember`, `forget`. Call `recall` with `spaces:["product"]` to query the product corpus. Call `recall` (all spaces) at task start to restore prior decisions. When a decision is made or something is learned, `remember` it as one concise standalone statement. Use `forget` only when information is explicitly superseded.

All configuration is done through environment variables (prefix `MNEMO_`) or flags:
| Variable | Default | Description |
|---|---|---|
| `MNEMO_DB_PATH` | `./mnemo.db` | Path to SQLite database |
| `MNEMO_OLLAMA_HOST` | `https://ollama.cluster.collins.is` | Ollama API URL |
| `MNEMO_EMBED_MODEL` | `nomic-embed-text` | Embedding model name |
| `MNEMO_EMBED_DIM` | `768` | Embedding dimension (must match model) |
| `MNEMO_OLLAMA_TIMEOUT` | `30s` | Request timeout for Ollama |
| `MNEMO_CHUNK_TARGET_TOKENS` | `400` | Target chunk size in tokens |
| `MNEMO_LOG_LEVEL` | `info` | Log level (`debug`, `info`, `warn`, `error`) |
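To make the table concrete, here is one way such environment-based configuration could be loaded in Go, with the documented defaults as fallbacks. It is an illustrative sketch rather than mnemo's real loader: the `config` struct, its field names, and `loadConfig` are hypothetical, and flag handling is left out.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// config holds the settings from the table above (field names are hypothetical).
type config struct {
	DBPath            string
	OllamaHost        string
	EmbedModel        string
	EmbedDim          int
	OllamaTimeout     time.Duration
	ChunkTargetTokens int
	LogLevel          string
}

// loadConfig reads MNEMO_* variables through lookup (os.LookupEnv in
// normal use) and falls back to the documented defaults.
func loadConfig(lookup func(string) (string, bool)) (config, error) {
	get := func(key, def string) string {
		if v, ok := lookup("MNEMO_" + key); ok && v != "" {
			return v
		}
		return def
	}

	cfg := config{
		DBPath:     get("DB_PATH", "./mnemo.db"),
		OllamaHost: get("OLLAMA_HOST", "https://ollama.cluster.collins.is"),
		EmbedModel: get("EMBED_MODEL", "nomic-embed-text"),
		LogLevel:   get("LOG_LEVEL", "info"),
	}

	// Numeric and duration values are parsed so that typos fail early.
	var err error
	if cfg.EmbedDim, err = strconv.Atoi(get("EMBED_DIM", "768")); err != nil {
		return config{}, fmt.Errorf("MNEMO_EMBED_DIM: %w", err)
	}
	if cfg.ChunkTargetTokens, err = strconv.Atoi(get("CHUNK_TARGET_TOKENS", "400")); err != nil {
		return config{}, fmt.Errorf("MNEMO_CHUNK_TARGET_TOKENS: %w", err)
	}
	if cfg.OllamaTimeout, err = time.ParseDuration(get("OLLAMA_TIMEOUT", "30s")); err != nil {
		return config{}, fmt.Errorf("MNEMO_OLLAMA_TIMEOUT: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, instead of calling `os.LookupEnv` directly, keeps the loader easy to test with a plain map.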
- Go — pure-Go build, no CGo
- modernc.org/sqlite with sqlite-vec — vector search
- SQLite FTS5 — keyword search with BM25 ranking
- MCP Go SDK — stdio transport
- Ollama — embedding generation via `/api/embed`
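As a rough illustration of the last item, the sketch below posts a batch of texts to Ollama's `/api/embed` endpoint and checks that each returned vector has the expected dimension, which is why `MNEMO_EMBED_DIM` must match the model. The function names `embed` and `parseEmbedResponse` are hypothetical, and the error handling is deliberately simple; mnemo's own client may differ.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// embedRequest and embedResponse cover only the /api/embed fields used here.
type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// parseEmbedResponse decodes the response body and verifies that it holds
// wantN vectors, each of length wantDim.
func parseEmbedResponse(body []byte, wantN, wantDim int) ([][]float32, error) {
	var resp embedResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode embed response: %w", err)
	}
	if len(resp.Embeddings) != wantN {
		return nil, fmt.Errorf("got %d embeddings, want %d", len(resp.Embeddings), wantN)
	}
	for i, v := range resp.Embeddings {
		if len(v) != wantDim {
			return nil, fmt.Errorf("embedding %d has dimension %d, want %d", i, len(v), wantDim)
		}
	}
	return resp.Embeddings, nil
}

// embed sends texts to <host>/api/embed and returns one vector per text.
func embed(ctx context.Context, client *http.Client, host, model string, dim int, texts []string) ([][]float32, error) {
	payload, err := json.Marshal(embedRequest{Model: model, Input: texts})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(host, "/") + "/api/embed"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	res, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return nil, err
	}
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s: %s", res.Status, body)
	}
	return parseEmbedResponse(body, len(texts), dim)
}

func main() {
	host := os.Getenv("MNEMO_OLLAMA_HOST")
	if host == "" {
		fmt.Println("set MNEMO_OLLAMA_HOST to run this example")
		return
	}
	// The 30s timeout mirrors the documented MNEMO_OLLAMA_TIMEOUT default.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	vecs, err := embed(ctx, http.DefaultClient, host, "nomic-embed-text", 768, []string{"hello world"})
	if err != nil {
		fmt.Println("embed failed:", err)
		return
	}
	fmt.Println(len(vecs), "vector(s) of dimension", len(vecs[0]))
}
```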
Mnemo's hybrid search algorithm, schema model, and classification heuristics are adapted from Memory Vault by mihaibuilds. The idea to build a local RAG memory server came from Yadullah Abidi's article "I fixed Claude's memory problem with a Postgres database..." (MakeUseOf, 2026-06-16).
MIT
