01 Why LLMs are stateless
Every large language model has a context window — a finite buffer of tokens that it can attend to in a single inference call. When the call ends, nothing persists. The next call starts completely blank unless the application re-supplies prior context.
For a simple chatbot this is fine. For an agent doing real work, one that revisits the same customer, the same codebase, or the same legal matter across days or weeks, statelessness is a fundamental problem. The agent forgets everything it learned: it re-asks questions already answered, rediscovers constraints already found, and repeats mistakes already corrected.
Agentic memory is the solution: an external layer that stores what the agent has learned and retrieves the most relevant memories into the context window at the start of each new session. The agent appears to have a continuous working relationship with the world it operates in.
Context window vs. agentic memory. The context window is RAM — fast, limited, cleared between sessions. Agentic memory is the hard drive — persistent, queryable, and governed.
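The RAM-versus-disk analogy can be made concrete with a minimal sketch. The `build_prompt` helper and the shape of the memory dictionaries are assumptions for illustration, not part of any AgentPrizm SDK. The only point is that a stateless model sees prior knowledge when the application puts it back into the prompt.

```python
def build_prompt(user_message: str, memories: list[dict]) -> str:
    """Prepend recalled memories to the prompt for a fresh, stateless call."""
    if not memories:
        # Nothing stored: the model starts from a blank slate.
        return f"User: {user_message}"
    lines = [f"- ({m['type']}) {m['content']}" for m in memories]
    return "Known context:\n" + "\n".join(lines) + f"\n\nUser: {user_message}"


# Hypothetical memories, shaped as an external memory layer might return them.
recalled = [
    {"type": "fact", "content": "Acme Corp's procurement is frozen until Q4 2026"},
    {"type": "preference", "content": "User prefers responses in Spanish"},
]

prompt = build_prompt("Can we send Acme a quote this month?", recalled)
```

Without the `recalled` list the same call produces a prompt containing only the user message, which is the "starts completely blank" case described above.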
02 Agentic memory vs RAG
Retrieval-Augmented Generation (RAG) and agentic memory are often confused because both retrieve information at inference time. They solve different problems.
| Dimension | RAG | Agentic memory |
|---|---|---|
| Unit of storage | Documents / passages | Typed memories (facts, lessons, preferences…) |
| Who writes it | Humans (content teams, knowledge bases) | Agents (learned from interactions) |
| Retrieval signal | Semantic similarity to query | Semantic + keyword + type + container + recency |
| Validity / expiry | None — stale docs stay retrievable | First-class: fact-validity windows, contradiction handling |
| Governance | Minimal | Audit trail, confidence scores, right-to-forget |
| Typical use case | Q&A over a knowledge base | Agent remembering what it learned about a user, codebase, or account |
RAG is appropriate when the information to retrieve is static and human-authored — documentation, product specs, legal precedent. Agentic memory is appropriate when the information is dynamic, agent-generated, and needs to expire, contradict, or get forgotten.
Production agents typically use both: RAG for the static knowledge base, agentic memory for learned context about the specific entities the agent works with.
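The "Retrieval signal" row in the table is the easiest difference to show in code. The sketch below filters by container and type first, then ranks by a blend of semantic similarity, keyword overlap, and recency. The `Candidate` class, the weights, and the 30-day decay are all assumptions chosen for illustration; this is not AgentPrizm's actual ranking.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    content: str
    type: str
    container: str
    similarity: float      # assumed precomputed semantic similarity in [0, 1]
    created_at: datetime


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query words that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def score(c: Candidate, query: str, now: datetime) -> float:
    """Blend semantic, keyword, and recency signals (weights are arbitrary)."""
    age_days = (now - c.created_at).total_seconds() / 86400
    recency = 1.0 / (1.0 + age_days / 30)  # decays as the memory ages
    return 0.6 * c.similarity + 0.25 * keyword_overlap(query, c.content) + 0.15 * recency


def recall(cands, query, container, types, now, limit=5):
    """Filter by container and type first, then rank by blended score."""
    eligible = [c for c in cands if c.container == container and c.type in types]
    return sorted(eligible, key=lambda c: score(c, query, now), reverse=True)[:limit]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


now = utc(2026, 6, 1)
cands = [
    Candidate("Acme procurement is frozen", "fact", "acme-corp", 0.80, utc(2026, 5, 25)),
    Candidate("Acme prefers annual billing", "preference", "acme-corp", 0.85, utc(2026, 5, 1)),
    Candidate("Globex procurement contact", "fact", "globex", 0.95, utc(2026, 5, 30)),
]
top = recall(cands, "acme procurement status", "acme-corp", {"fact"}, now)
```

The Globex entry has the highest raw similarity but never appears, because the container filter runs before any scoring. A similarity-only retriever would have returned it first.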
03 The 6 memory types
A well-designed memory layer imposes structure on what is stored. Unstructured text is hard to retrieve precisely, hard to govern, and hard to prioritize. AgentPrizm uses six memory types, a set kept deliberately small so that choosing the right bucket stays easy for the agent:
| Type | What it stores | Example |
|---|---|---|
| fact | A true statement about the world, a user, or an entity — with an optional validity window | "Acme Corp's procurement is frozen until Q4 2026" |
| lesson | Something the agent learned from experience — a pattern, a gotcha, a strategy that worked | "This user prefers bullet-point summaries over prose" |
| directive | A standing instruction or rule the agent must follow | "Never surface competitor names in customer-facing output" |
| preference | A stated user preference — lighter-weight than a directive | "User prefers responses in Spanish" |
| contact | A person and their role, relationship, or relevance | "Sarah Chen — VP Engineering at Acme, primary technical contact" |
| bookmark | A URL or resource the agent should remember and be able to resurface | "https://acme.com/api-docs — their internal API reference" |
That is the full set. A small, fixed list of categories forces precision and makes retrieval filtering predictable.
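One way to model a closed set of types in application code is an enum plus a small record type. The six values come from the table above. The `Memory` class, its field names, and `parse_type` are hypothetical names for this sketch rather than an official schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class MemoryType(str, Enum):
    FACT = "fact"
    LESSON = "lesson"
    DIRECTIVE = "directive"
    PREFERENCE = "preference"
    CONTACT = "contact"
    BOOKMARK = "bookmark"


@dataclass
class Memory:
    content: str
    type: MemoryType
    containers: list[str] = field(default_factory=list)
    valid_until: Optional[datetime] = None  # optional validity window


def parse_type(raw: str) -> MemoryType:
    """Reject anything outside the six known types instead of guessing."""
    try:
        return MemoryType(raw.strip().lower())
    except ValueError:
        raise ValueError(f"unknown memory type: {raw!r}") from None


m = Memory(
    "Never surface competitor names in customer-facing output",
    parse_type("directive"),
    ["acme-corp"],
)
```

Because the enum is closed, a typo such as `"directve"` fails at write time instead of creating a seventh, unfilterable category.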
04 Memory architectures
Not all memory layers are built the same way. Three patterns appear in production agent stacks today:
| Architecture | How it works | Best for | Limitations |
|---|---|---|---|
| In-context memory | Re-inject prior conversation into the context window each session | Short conversation histories, simple chatbots | Hits context-window limits fast; no retrieval; no governance; no persistence beyond the app layer |
| External vector store | Embed all memories, store in a vector DB, retrieve by cosine similarity | High-recall document search | No type system, no validity windows, no contradiction handling; retrieval is similarity-only; governance built from scratch |
| Hosted governed memory layer | External service with typed memory, hybrid semantic + keyword retrieval, validity windows, contradiction detection, confidence scores, audit, and right-to-forget | Production agents handling real user data over time | External dependency; cost scales with usage |
The in-context pattern breaks at scale. The vector-store pattern requires building governance from scratch — validity, contradiction, audit, and forget are all non-trivial to implement correctly. The hosted governed memory layer handles all of that, letting agent engineers focus on the agent itself.
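To see why similarity-only retrieval needs governance added on top, here is a toy version of the vector-store pattern in plain Python. The three-dimensional vectors stand in for real embeddings, and the stored texts are made-up examples.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Rank stored (text, vector) pairs by similarity to the query alone."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors standing in for real embeddings.
store = [
    ("Procurement frozen until Q4 2025 (outdated)", [0.9, 0.1, 0.0]),
    ("Procurement frozen until Q4 2026", [0.8, 0.2, 0.0]),
    ("Sarah Chen is the primary technical contact", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], store)
```

In this toy data the outdated entry ranks first, because nothing in the ranking knows about validity or contradiction. That gap is what the governed pattern is meant to close.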
05 How to add agentic memory to your agent
There are two paths to wiring AgentPrizm into an agent. The MCP path is the fastest for MCP-capable agents (Claude Code, Cursor, any MCP-compatible orchestrator). The REST path works with any agent framework.
Path 1 — Remote MCP (zero install)
Add a single block to your agent's MCP server config. No local subprocess, no SDK to install.
    {
      "mcpServers": {
        "agentprizm-memory": {
          "type": "http",
          "url": "https://agentprizm.com/api/mcp",
          "headers": { "Authorization": "Bearer ap_YOUR_KEY_HERE" }
        }
      }
    }

Your agent immediately gets eight memory tools: memory_bootstrap, memory_recall, memory_create, memory_forget, memory_ingest, memory_ingest_url, memory_context, and memory_profile. No code changes required.
Path 2 — REST API
Call the REST API directly from any agent loop — Python, TypeScript, Go, or any HTTP client. Ingest a memory after a conversation; recall relevant memories before the next one.
    import httpx

    BASE = "https://agentprizm.com/api/v1/agent"
    HEADERS = {"Authorization": "Bearer ap_YOUR_KEY_HERE"}

    # Store a new fact after a conversation
    resp = httpx.post(f"{BASE}/memories", headers=HEADERS, json={
        "content": "Acme Corp's procurement is frozen until Q4 2026",
        "type": "fact",
        "containers": ["acme-corp"],
        "validUntil": "2026-10-01T00:00:00Z",
    })
    resp.raise_for_status()  # fail loudly on a 4xx/5xx response

    # Recall relevant memories before the next session
    r = httpx.post(f"{BASE}/recall", headers=HEADERS, json={
        "query": "Acme Corp budget and procurement",
        "containers": ["acme-corp"],
        "limit": 5,
    })
    r.raise_for_status()
    memories = r.json()["memories"]

The full API reference is at agentprizm.com/api-reference. The five-minute quickstart is at agentprizm.com/docs.
06 Governance primitives
Raw storage and retrieval is necessary but not sufficient for production agents. The governance layer is what separates a memory layer from a note-taking app:
Know how sure you are
Every recall response surfaces a confidence score alongside the retrieved memories. Agents can surface low-confidence memories to users differently — or suppress them below a threshold — rather than presenting everything as equally reliable.
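A minimal sketch of threshold handling on the agent side. It assumes each recalled memory carries a numeric `confidence` field between 0 and 1; that field name, the default threshold, and the sample data are illustrative assumptions.

```python
def partition_by_confidence(memories, threshold=0.7):
    """Split recalled memories into trusted and tentative groups.

    A memory with no confidence value is treated as tentative.
    """
    trusted = [m for m in memories if m.get("confidence", 0.0) >= threshold]
    tentative = [m for m in memories if m.get("confidence", 0.0) < threshold]
    return trusted, tentative


# Hypothetical recall results.
recalled = [
    {"content": "Acme's procurement is frozen until Q4 2026", "confidence": 0.93},
    {"content": "Acme may be evaluating a second vendor", "confidence": 0.41},
]
trusted, tentative = partition_by_confidence(recalled)
```

The agent can state `trusted` items plainly and either caveat or drop the `tentative` ones, depending on how costly a wrong claim would be.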
Facts expire
Any memory can be given a validUntil timestamp. When the window closes, the memory is automatically excluded from recall results. Procurement freezes, promotional offers, relationship statuses — anything that goes stale gets a window.
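The filtering rule is simple enough to sketch. The `validUntil` field name and ISO 8601 format match the REST example earlier in this article; the `is_valid` helper is a hypothetical client-side illustration of what the service is described as doing at recall time.

```python
from datetime import datetime, timezone


def is_valid(memory: dict, now: datetime) -> bool:
    """A memory with no validUntil never expires; otherwise compare to now."""
    raw = memory.get("validUntil")
    if raw is None:
        return True
    # fromisoformat only accepts a trailing "Z" on Python 3.11+, so normalize it.
    expires = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return now < expires


memories = [
    {"content": "Procurement frozen until Q4 2026", "validUntil": "2026-10-01T00:00:00Z"},
    {"content": "Sarah Chen is the primary technical contact"},
]
before = [m for m in memories if is_valid(m, datetime(2026, 9, 30, tzinfo=timezone.utc))]
after = [m for m in memories if is_valid(m, datetime(2026, 10, 2, tzinfo=timezone.utc))]
```

On 30 September both memories are recallable; on 2 October only the contact remains.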
New facts supersede old ones
When a new memory contradicts an existing one, AgentPrizm flags the conflict rather than silently keeping both. The agent or operator resolves the contradiction — the old memory is marked superseded so it stays in the audit trail but no longer surfaces in recall.
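Detecting that two memories conflict is the hard part and is not shown here. The sketch below covers only the resolution step: retiring the old memory without deleting it. The `StoredMemory` class, its status values, and the sample contents are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredMemory:
    id: str
    content: str
    status: str = "active"            # "active" or "superseded"
    superseded_by: Optional[str] = None


def supersede(store: dict, old_id: str, new: StoredMemory) -> None:
    """Add the new memory and retire the old one without deleting it."""
    old = store[old_id]
    old.status = "superseded"
    old.superseded_by = new.id
    store[new.id] = new


def recallable(store: dict) -> list:
    """Only active memories surface in recall; superseded ones stay for audit."""
    return [m for m in store.values() if m.status == "active"]


store = {"m1": StoredMemory("m1", "Procurement frozen until Q4 2026")}
supersede(store, "m1", StoredMemory("m2", "Procurement freeze lifted in July 2026"))
```

After the call, `recallable(store)` contains only `m2`, while `m1` remains in the store with a pointer to its replacement.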
Every recall is traceable
Every recall request returns an audit receipt — a record of what was retrieved, when, and by which agent. Required for compliance reviews, liability questions, and debugging production misbehavior.
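As a rough picture of what such a receipt might contain, the sketch below records the three things named above: what was retrieved, when, and by which agent. The `AuditReceipt` class and its field names are hypothetical and not AgentPrizm's actual response format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a receipt should not change after creation
class AuditReceipt:
    agent_id: str
    query: str
    memory_ids: tuple
    recalled_at: datetime


def make_receipt(agent_id, query, memories, now=None):
    """Record which memories were returned, to whom, and when."""
    now = now or datetime.now(timezone.utc)
    return AuditReceipt(agent_id, query, tuple(m["id"] for m in memories), now)


receipt = make_receipt(
    "support-agent-1",
    "Acme Corp budget and procurement",
    [{"id": "m1"}, {"id": "m7"}],
    now=datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc),
)
```

Storing memory IDs rather than content keeps the receipt useful for tracing even after the underlying memory has been forgotten.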
Memories belong to contexts
Memories are scoped to containers — named scopes like a customer ID, a repo name, or a project slug. Agents only recall memories from the containers they are authorized to query. No cross-tenant leakage.
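Container scoping amounts to an authorization check before any retrieval happens. In this sketch the `authorized_recall` function and the `allowed` set are assumptions; a hosted service would enforce this server-side from the API key, not in client code.

```python
def authorized_recall(memories, requested, allowed):
    """Return memories only from containers the agent may query."""
    denied = set(requested) - set(allowed)
    if denied:
        raise PermissionError(f"not authorized for containers: {sorted(denied)}")
    wanted = set(requested)
    return [m for m in memories if wanted & set(m["containers"])]


# Hypothetical memories belonging to two different tenants.
memories = [
    {"content": "Procurement frozen until Q4 2026", "containers": ["acme-corp"]},
    {"content": "Globex renewal is due in March", "containers": ["globex"]},
]
visible = authorized_recall(memories, ["acme-corp"], allowed={"acme-corp"})
```

Requesting `["globex"]` with the same `allowed` set raises `PermissionError` instead of returning an empty list, so a misconfigured agent fails visibly.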
One-call right-to-forget
POST /forget removes a memory, or an entire container, with a single API call. A soft forget marks it invisible to recall; a hard forget purges the content. An audit record of the forget event is kept for compliance either way. This is designed to support GDPR erasure requests, although a single endpoint does not by itself make a deployment compliant.
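The difference between the two forget modes can be sketched with an in-memory store. The `forget` function, the `forgotten` flag, and the log format are hypothetical; they illustrate the semantics described above, not the service's internals.

```python
from datetime import datetime, timezone


def forget(store: dict, memory_id: str, audit_log: list, hard: bool = False) -> None:
    """Soft forget hides a memory from recall; hard forget also purges content."""
    record = store[memory_id]
    record["forgotten"] = True
    if hard:
        record["content"] = None  # the content itself is gone
    # The forget event is logged either way, without copying the content.
    audit_log.append({
        "event": "hard_forget" if hard else "soft_forget",
        "memory_id": memory_id,
        "at": datetime.now(timezone.utc).isoformat(),
    })


def recallable(store: dict) -> list:
    """Forgotten memories never surface in recall."""
    return [m for m in store.values() if not m.get("forgotten")]


# Made-up example data.
store = {
    "m1": {"content": "User's home address is on file"},
    "m2": {"content": "User prefers responses in Spanish"},
}
log = []
forget(store, "m1", log, hard=True)
```

The audit entry keeps only the memory ID and a timestamp, so the log can prove the deletion happened without retaining what was deleted.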
07 Frequently asked questions
What is agentic memory?
Agentic memory is persistent, cross-session memory for AI agents: the ability to store, recall, and govern facts, lessons, preferences, and decisions across conversations instead of starting from zero each session. Unlike an LLM's context window, agentic memory survives between sessions and can be queried semantically at any time.
How is agentic memory different from RAG?
RAG (Retrieval-Augmented Generation) retrieves documents or passages to answer a query. Agentic memory stores structured, typed memories — individual facts, lessons, directives, and preferences — scoped to specific agents or containers, with governance features like fact-validity windows, contradiction detection, confidence scores, and a one-call right-to-forget. RAG is about fetching information; agentic memory is about an agent knowing and governing what it has learned.
How do I give my AI agent long-term memory?
You can add long-term memory to your AI agent in two ways: (1) via MCP — add one block to your agent's MCP config pointing at a hosted memory server like AgentPrizm and your agent can immediately store and recall memories; (2) via REST API — call POST /memories to ingest a memory and POST /recall to retrieve relevant ones based on a query string. Both approaches take under ten minutes to wire up.
What is a memory layer for AI agents?
A memory layer for AI agents is an external service that gives agents persistent storage separate from the LLM's context window. It stores structured memories, retrieves the most relevant ones for each new conversation, and handles governance concerns like expiry, contradiction, and audit. The memory layer pattern decouples what an agent knows from what fits in a single prompt.
Ready to give your agent a memory? Start on the free Hobby plan — 1,000 memories, no credit card. Wire it in under ten minutes via the quickstart.