Every session with an AI coding agent starts from zero. You open Claude Code or Cursor, and the agent has no memory of the codebase decisions you made last week, the architectural constraints you explained two months ago, the bugs that bit you last sprint. You explain them again. Every time.
This is the cold-start problem, and it is the primary friction that keeps AI coding agents from feeling like real collaborators. The agent is smart, but it has no continuity. It is a senior engineer with amnesia who resets overnight.
agentmemory is an open-source project built specifically to solve this. It is not a new agent. It is a memory layer: 51 MCP tools and 12 auto hooks that give your existing agents persistent, retrievable memory across sessions. It has 8,825 GitHub stars, is written in TypeScript, and requires no external databases.
The cold-start problem in concrete terms
Consider what you end up telling the agent in a typical engineering session without persistent memory:
- “This codebase uses a custom event bus, not Redux. The event bus is in src/core/bus.ts.”
- “Don’t touch the UserSession model directly. There’s a known issue with the serializer, use UserSessionService instead.”
- “We’re on Postgres 14 and the jsonb_path_query function behaves differently on the staging environment. This caused three hours of debugging in March.”
None of that survives a session boundary. Next time you open a new conversation, you start over. If you work with multiple agents — Claude Code for implementation, Cursor for review, Gemini CLI for docs — none of them share context with each other either.
The naive fix is stuffing everything into the system prompt. That has two problems: token cost scales linearly with everything you add, and LLMs perform worse with long, indiscriminate context dumps (the “lost in the middle” problem). You need selective recall, not total recall.
What agentmemory actually is
agentmemory is an MCP server with 51 tools, backed by a local storage engine built on the iii engine. There are no new APIs to learn. If your agent supports the Model Context Protocol, it can use agentmemory.
The hooks are the part that makes it automatic. For Claude Code, agentmemory installs 12 lifecycle hooks — wired to session start, session end, tool calls, file edits, and other events. When the agent finishes a task, a hook fires and stores what was learned. When a new session starts, a hook fires and retrieves relevant memories before the agent takes its first action. You do not have to think about it.
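To make the store-on-end, retrieve-on-start cycle concrete, here is a minimal sketch. Everything in it is hypothetical: the MemoryStore class, the onSessionStart and onSessionEnd functions, and the crude word-overlap matching are illustrations only, not agentmemory's actual API or retrieval logic.

```typescript
// Hypothetical sketch: names and matching logic are illustrative,
// not agentmemory's API.
interface Memory {
  project: string;
  text: string;
}

// Lowercase, split on non-word characters, drop very short words.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
}

class MemoryStore {
  private memories: Memory[] = [];

  // Persist one learned fact for a project.
  save(project: string, text: string): void {
    this.memories.push({ project, text });
  }

  // Return this project's memories that share a word with the task.
  recall(project: string, task: string): string[] {
    const taskWords = tokenize(task);
    return this.memories
      .filter((m) => m.project === project)
      .filter((m) => tokenize(m.text).some((w) => taskWords.indexOf(w) !== -1))
      .map((m) => m.text);
  }
}

const store = new MemoryStore();

// Session-end hook: store what was learned during the session.
function onSessionEnd(project: string, learned: string[]): void {
  for (const text of learned) store.save(project, text);
}

// Session-start hook: fetch relevant memories before the first action.
function onSessionStart(project: string, task: string): string[] {
  return store.recall(project, task);
}

onSessionEnd("shop", [
  "Use UserSessionService instead of the UserSession model.",
  "Event bus lives in src/core/bus.ts.",
]);
const primed = onSessionStart("shop", "Fix login bug in UserSession handling");
```

Here primed contains only the UserSessionService note, because the event bus note shares no words with the task. The point of the sketch is the timing of the two hooks, not the matching, which the real project handles with hybrid search.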
The design is intentionally invisible. You configure it once, and then memory just works.
Memory architecture: storage, scoring, and recall
This is where agentmemory diverges from simpler approaches. Memory is not a flat append-only log. It has a lifecycle.
The decay curve matters. A memory that was reinforced by usage yesterday has a high confidence score. A memory that has sat untouched for 90 days has decayed toward zero — it will not pollute the context unless it becomes relevant again (at which point usage resets the clock). This prevents the memory store from becoming a graveyard of stale, misleading context.
The 827-test suite covers these lifecycle transitions explicitly. Confidence scoring is not a heuristic tacked on — it is a first-class feature of the storage engine.
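The decay formula itself is not given here, so the following is only one plausible shape: exponential decay with a half-life, reset whenever the memory is used. The 30-day half-life, the ScoredMemory fields, and the function names are all assumptions made for illustration.

```typescript
// Hypothetical sketch: exponential decay with reset-on-use.
// The half-life and field names are assumptions, not agentmemory internals.
interface ScoredMemory {
  text: string;
  lastUsedDay: number; // day index of the most recent retrieval or reinforcement
}

const HALF_LIFE_DAYS = 30; // assumed: confidence halves every 30 idle days

// Confidence in (0, 1]: 1.0 right after use, decaying toward 0 while idle.
function confidence(memory: ScoredMemory, today: number): number {
  const idleDays = Math.max(0, today - memory.lastUsedDay);
  return Math.pow(0.5, idleDays / HALF_LIFE_DAYS);
}

// Using a memory resets its clock, which restores full confidence.
function touch(memory: ScoredMemory, today: number): ScoredMemory {
  return { text: memory.text, lastUsedDay: today };
}

const stale: ScoredMemory = { text: "jsonb_path_query differs on staging", lastUsedDay: 0 };
const fresh = touch(stale, 90);

const staleScore = confidence(stale, 90); // 0.5^3 = 0.125
const freshScore = confidence(fresh, 90); // 1
```

Under these assumed parameters, a memory untouched for 90 days keeps an eighth of its confidence, and a single use brings it back to full strength.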
Multi-agent memory sharing
One of the more interesting architectural choices: all agents talk to the same agentmemory MCP server. Memory is not per-agent. It is per-project.
┌──────────────────────────────────────────────────────────────────────┐
│ agentmemory: Shared Memory Hub │
└──────────────────────────────────────────────────────────────────────┘
┌─────────────┐ ┌─────────────┐
│ Claude Code │ │ Cursor │
│ 12 hooks │ │ MCP server │
│ MCP + skills│ │ │
└──────┬──────┘ └──────┬──────┘
│ │
│ MCP (stdio / HTTP) │
│ │
▼ ▼
┌──────────────────────────────────────────────────┐
│ │
│ agentmemory MCP Server │
│ 51 tools · iii engine │
│ │
│ ┌────────────────────────────────────────────┐ │
│ │ Shared Memory Store │ │
│ │ SQLite + in-process vector index │ │
│ │ knowledge graph · confidence scores │ │
│ └────────────────────────────────────────────┘ │
│ │
└────────────┬─────────────────────┬───────────────┘
│ │
│ │
┌────────────┴──┐ ┌────┴────────────┐
│ Gemini CLI │ │ Codex CLI │
│ MCP server │ │ 6 hooks + MCP │
│ │ │ + skills │
└───────────────┘ └─────────────────┘
Also connects: OpenClaw · Hermes · Cline · Goose · OpenCode
+ any MCP-compatible client
────────────────────────────────────────────────────
When Claude Code learns: "UserSessionService, not
UserSession directly" → stored in shared memory.
When Cursor opens the same project → retrieves that
memory → already knows. No re-explanation needed.
────────────────────────────────────────────────────
This is a meaningful architectural decision. Cross-agent memory sharing means that the team of agents working on a project accumulates shared institutional knowledge, not siloed per-tool histories. If you switch from Claude Code to Cursor for a code review session, the memory moves with you.
Hybrid search and what 95.2% R@5 actually means
R@5 — recall at 5 — measures whether the correct memory is among the top 5 results returned for a query. At 95.2%, agentmemory finds the relevant memory in the top 5 results 95.2% of the time. That is the number that matters because the agent only gets those top 5 — if the right memory is not in that set, the agent operates without it.
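For readers who want the metric pinned down, here is a small sketch of how recall at k can be computed. The BenchmarkCase shape and the sample data are made up for illustration. The project's actual benchmark harness is not described here.

```typescript
// One benchmark case: the ranked ids a retriever returned for a query,
// plus the id of the memory that should have been found.
interface BenchmarkCase {
  rankedIds: string[];
  relevantId: string;
}

// Recall@k: fraction of cases where the relevant memory is in the top k.
function recallAtK(cases: BenchmarkCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter(
    (c) => c.rankedIds.slice(0, k).indexOf(c.relevantId) !== -1
  ).length;
  return hits / cases.length;
}

const cases: BenchmarkCase[] = [
  { rankedIds: ["m1", "m7", "m3"], relevantId: "m7" }, // hit at rank 2
  { rankedIds: ["m2", "m4", "m5", "m6", "m8", "m9"], relevantId: "m9" }, // rank 6: miss at k = 5
  { rankedIds: ["m3"], relevantId: "m3" }, // hit at rank 1
  { rankedIds: [], relevantId: "m1" }, // nothing returned: miss
];

const r5 = recallAtK(cases, 5); // 2 hits out of 4 cases = 0.5
```

Note that the second case counts as a miss even though the right memory was retrieved at rank 6. With a top-5 cutoff, a result just outside the window is as useless to the agent as one that was never found.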
The mechanism that achieves this is hybrid search. Pure semantic (vector) search is good at finding conceptually similar content but fails on exact matches — function names, error codes, version numbers, specific identifier strings. Pure BM25 (keyword) search is precise on exact matches but misses paraphrased content and synonym variations.
agentmemory runs both, then fuses the ranked lists using Reciprocal Rank Fusion (RRF). A memory that ranks well on both signals scores higher than one that ranks on only one. This is not novel — hybrid search is standard in enterprise search — but it is the right engineering decision here, and the 95.2% R@5 validates it against a realistic benchmark.
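Reciprocal Rank Fusion is simple enough to sketch in a few lines. Each memory's fused score is the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 below is the value commonly used with RRF and is an assumption here, as is the function name. The value agentmemory uses is not stated.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d).
// k = 60 is a commonly used default, assumed here for illustration.
function reciprocalRankFusion(rankedLists: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

const semantic = ["m1", "m2", "m3"]; // vector-search ranking (made-up ids)
const keyword = ["m3", "m1", "m4"]; // BM25 ranking (made-up ids)

const fused = reciprocalRankFusion([semantic, keyword]);
// m1: 1/61 + 1/62, m3: 1/63 + 1/61, m2: 1/62, m4: 1/63
// fused order: m1, m3, m2, m4
```

The two memories that appear in both lists (m1 and m3) rise above those that appear in only one. RRF works on ranks rather than raw scores, so it avoids having to normalize BM25 scores against vector similarities.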
For context: a stateless agent with no memory scores 0% by definition, naive context dumps achieve 60-70% (the “stuff everything in” approach, which also balloons token cost), and semantic-only search lands around 80-85%.
Token economics: how 92% reduction works
This one requires a precise explanation because “92% fewer tokens” is a strong claim and the mechanism is specific.
The alternative to selective memory recall is full-context injection: you dump your entire project history, architecture notes, and decision log into the system prompt every session. If that history is 50,000 tokens, every session costs 50,000 tokens in context before the agent writes a single line of code. This compounds — the cost per session grows as the project grows.
agentmemory retrieves at most 5 memories per query, which is the same top-5 window that R@5 measures. A typical memory is a few sentences, so five memories might total 200-400 tokens. Against a 50,000-token history, that is a reduction of more than 99% for a single query. The 92% figure is the measured average across a realistic workload distribution. Some queries benefit less, some benefit more.
The reduction is real because the retrieval is selective. The agent asks “what do I know about UserSession?” and gets the 2-3 memories that are directly relevant, not the entire project history. The 95.2% R@5 is what makes this safe — you can only afford selective recall if retrieval is reliable.
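The arithmetic behind the single-query figure can be checked directly. The function name below is hypothetical. The token counts are the ones used in the example above.

```typescript
// Percentage of context tokens saved by injecting only retrieved memories
// instead of the full project history.
function tokenReductionPercent(fullHistoryTokens: number, retrievedTokens: number): number {
  return (1 - retrievedTokens / fullHistoryTokens) * 100;
}

const fullHistory = 50000; // full-context injection baseline
const lowEstimate = 200; // five short memories, low end
const highEstimate = 400; // five short memories, high end

const best = tokenReductionPercent(fullHistory, lowEstimate); // about 99.6
const worst = tokenReductionPercent(fullHistory, highEstimate); // about 99.2

// Inverse: tokens injected per session at a given reduction percentage.
function tokensAtReduction(fullHistoryTokens: number, reductionPercent: number): number {
  return fullHistoryTokens * (1 - reductionPercent / 100);
}

const averageBudget = tokensAtReduction(fullHistory, 92); // about 4,000
```

On the same 50,000-token baseline, a 92% reduction corresponds to roughly 4,000 tokens of injected memory. How the measured workload arrives at that average is not broken down here, so read the 4,000 figure as arithmetic on the example rather than as a measured value.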
Knowledge graphs vs flat memory
Most memory systems for agents are key-value stores or append-only logs. agentmemory builds knowledge graphs alongside flat memory.
When the agent learns that function A calls function B, and that function B depends on service C, those relationships are stored as graph edges — not just as three separate memories. When you later query about function A, the graph traversal can surface the downstream dependency on service C even if the query does not mention it explicitly.
This is closer to how institutional knowledge actually works. Facts do not live in isolation — they have relationships, and those relationships are often more useful than the facts alone. “The UserSession model has a known serializer bug” is a fact. “The UserSession model has a known serializer bug, which is why UserSessionService was introduced, and why the auth tests mock UserSessionService rather than UserSession directly” is a knowledge graph traversal.
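A minimal sketch of the traversal idea follows, using the function A, function B, service C example. The Edge shape, the relation labels, and the reachable function are hypothetical and are not agentmemory's schema or query API.

```typescript
// Hypothetical sketch: a tiny directed graph with breadth-first traversal.
// Edge labels and node names are illustrative, not agentmemory's schema.
interface Edge {
  from: string;
  relation: string;
  to: string;
}

const edges: Edge[] = [
  { from: "functionA", relation: "calls", to: "functionB" },
  { from: "functionB", relation: "depends_on", to: "serviceC" },
  { from: "functionD", relation: "calls", to: "serviceE" },
];

// Return every node reachable from `start` within `maxHops` edges.
function reachable(start: string, graph: Edge[], maxHops: number): string[] {
  const visited: string[] = [start];
  let frontier: string[] = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const edge of graph) {
        if (edge.from === node && visited.indexOf(edge.to) === -1) {
          visited.push(edge.to);
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }
  return visited.slice(1); // exclude the starting node itself
}

const related = reachable("functionA", edges, 2); // ["functionB", "serviceC"]
```

A query about functionA surfaces serviceC through the two-hop path, even though no single stored fact mentions both. Limiting the hop count keeps the traversal from dragging unrelated parts of the graph into the context.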
The graph capability requires more storage overhead than flat memory, but the local SQLite backing means this stays on-machine with no external service dependencies.
Install and Claude Code integration
agentmemory is distributed as an npm package. No Docker, no Postgres, no external vector database.
npm install -g @agentmemory/agentmemory
agentmemory init
The init command sets up the local SQLite database and vector index. The entire state lives in a directory on your machine.
For Claude Code specifically:
agentmemory install claude-code
This single command wires 12 hooks into Claude Code’s configuration and registers the MCP server. The hooks cover:
- Session start: retrieve relevant memories before the first agent action
- Session end: persist what was learned
- Tool call events: capture tool outputs worth remembering
- File edit events: note which files were changed and why
- Error events: store failed approaches to avoid repeating them
After installation, you restart Claude Code and memory starts accumulating. There is no additional configuration required.
For Codex CLI, the install command is agentmemory install codex, which wires 6 hooks (the Codex lifecycle has fewer hook points). For Cursor, Gemini CLI, Cline, and OpenCode, the install registers the MCP server without hooks. Those clients do not expose a hook API, so memory is tool-mediated rather than automatic.
The iii Console: inspecting memory in real time
agentmemory ships a local web interface called the iii Console. You can use it to inspect what is currently in the memory store: the raw memories, their confidence scores, which queries triggered which retrievals, and which memories are decaying.
This is the debugging surface for the inevitable question: “Why did the agent forget that?” The iii Console lets you see whether the memory was stored at all, what its current confidence score is, and whether a different query term would have retrieved it.
In practice, the console is most useful during initial setup and for diagnosing retrieval failures on specific queries. Once a project’s memory is stable, you rarely need to open it.
Honest take: limitations and what to watch
agentmemory is a young project with a high star count and active development. A few things to be clear about:
The 95.2% R@5 benchmark: The benchmark methodology matters for interpreting this number. R@5 is measured on a specific benchmark corpus, not necessarily your codebase, your query patterns, or your memory volume, so real-world retrieval performance on a domain-specific codebase may differ. The number is credible and the architecture justifies it, but treat it as a reference point rather than a guarantee.
Hooks are agent-specific: The 12-hook experience requires Claude Code. For Cursor, Cline, and other MCP-only clients, you get the 51 MCP tools but not the automatic lifecycle integration. Memory retrieval in those clients is tool-mediated — the agent has to call a memory tool explicitly, which means retrieval quality depends on how the agent model decides to use the tools.
Local-only at the moment: The zero-external-database design is a feature for single-developer workflows. For team environments where multiple engineers share an agent, there is currently no sync or collaboration layer. Memory is local to the machine running agentmemory.
Knowledge graph maturity: The graph capability is real but newer than the flat memory system. Expect the graph traversal quality to improve over time as the iii engine matures.
Benchmark comparison
Retrieval R@5 by Memory Approach
─────────────────────────────────────────────────────────────
No memory (stateless)
0% ▓▓ 0%
└──┘
Naive context dump (full history in system prompt)
60-70% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ~65%
└────────────────────────────────┘
Semantic-only vector search
80-85% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ~83%
└─────────────────────────────────────────┘
agentmemory hybrid (semantic + BM25 + confidence)
95.2% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 95.2%
└────────────────────────────────────────────────┘
─────────────────────────────────────────────────────────────
0% 100%
Notes:
· Naive dump also burns 10-50× more tokens per session.
· Semantic-only misses exact identifier / error-code matches.
· agentmemory hybrid fuses both signals via RRF before scoring.
The gap between semantic-only (~83%) and hybrid (95.2%) is about 12 percentage points, or roughly 1 in 8 queries where semantic search would miss a relevant memory that hybrid search catches. For a developer working with an agent eight hours a day, those misses add up to a noticeable difference in agent behavior.
Conclusion
agentmemory addresses a real, concrete problem in the current state of AI coding agents. Stateless sessions are not a minor inconvenience — they are a structural tax on every engineer using these tools, paid in re-explanation time every morning.
The technical architecture is sound: hybrid search for retrieval quality, confidence decay to prevent stale context, knowledge graphs for relational recall, and zero external dependencies for deployment simplicity. The hook-based auto-integration with Claude Code and Codex CLI is the right design — memory that requires manual intervention gets forgotten.
The caveats are real: the benchmark is on a controlled corpus, the experience degrades gracefully on MCP-only clients, and the local-only storage is a constraint for team workflows. None of these are dealbreakers for a single developer looking to eliminate cold-start friction.
The design document alone, a GitHub Gist with 1,200+ stars and 172 forks before the implementation shipped, is evidence that this problem resonates. The implementation now has 8,825 stars and a passing suite of 827 tests. That is not hype. That is a project that found a genuine gap and shipped something technically credible to fill it.
If you work with AI coding agents daily and the cold-start problem costs you real time, agentmemory is worth an hour of setup.