Most agents that "have memory" actually have a notepad. Everything the agent learns — a customer's billing address, a hard-won lesson about a flaky API, the rule that it should never email without approval — gets dumped into one undifferentiated pile and retrieved by similarity search. It works in the demo. It rots in production.

The problem is not the storage. It is the loss of type. When every memory is the same kind of thing, the agent cannot reason about how long it should be trusted, what should override what, or which item is even safe to overwrite. A phone number and a standing safety rule are treated as the same kind of object. That is the bug.

Human memory does not work that way, and the distinction is old and well established. In 1972, the psychologist Endel Tulving drew a line between episodic memory — "the memory of everyday events (such as times, location geography, associated emotions, and other contextual information)" — and semantic memory, "general world knowledge that humans have accumulated throughout their lives" (Wikipedia: Episodic memory; Wikipedia: Semantic memory). Remembering that you stroked a particular cat last Tuesday is a different kind of memory from knowing what a cat is. Both are real. Treating them as one thing throws away information.

To be clear about what is established and what is ours: cognitive science gives us strong evidence that memory has kinds, not one bucket. It does not hand you six tidy categories for an AI agent. The six below are AgentPrizm's design decision — a practical taxonomy we chose because each type fails differently and therefore needs to be handled differently. Think of it as engineering inspired by the science, not a claim about the brain.

The six types

Here is the taxonomy, with a concrete example of each.

Fact — a stable, objective statement about the world or the user's environment. "The production database runs Postgres 16." Facts change rarely. When they change, the old one is usually wrong, not merely out of date.

Lesson — something learned from experience, usually from a mistake. "The billing webhook fires twice on plan upgrades, so the handler must be idempotent." A lesson carries an implicit "because" — it is the residue of an event that cost someone time.

Directive — a standing rule the agent must follow. "Never send a customer-facing email without explicit human approval." Directives are normative, not descriptive. They are about what should happen, and they should win arguments with everything else.

Preference — a soft, changeable inclination of the user. "Prefers terse answers with no closing summary." A preference is true until the user changes their mind, which they will. It should shape behavior, but it should never block it.

Contact — a person and the facts that matter for working with them. "Maria — head of RevOps, owns the Salesforce integration, approves data-export requests." Contacts are facts about people, but they earn their own type because agents query them by role and relationship, not by content similarity.

Bookmark — a pointer to an external resource. "runbook.internal/oncall — the on-call escalation runbook." A bookmark is not knowledge; it is an address where knowledge lives. The agent should fetch it, not recite it.

Six, deliberately. Not sixteen. A taxonomy you cannot hold in your head is a taxonomy nobody applies consistently, which means the types stop meaning anything. Keeping the categories painfully small is the whole point.
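One way to stop the type from being lost is to make it a required field rather than an optional tag. Below is a minimal sketch in Python. The `MemoryType` enum and `Memory` record are hypothetical, illustrative names and are not AgentPrizm's API. It encodes the six examples above.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    """The six kinds from the taxonomy above (names are illustrative)."""
    FACT = "fact"              # stable statement about the world or environment
    LESSON = "lesson"          # learned from experience; carries a "because"
    DIRECTIVE = "directive"    # standing rule the agent must follow
    PREFERENCE = "preference"  # soft, changeable inclination of the user
    CONTACT = "contact"        # a person plus what matters for working with them
    BOOKMARK = "bookmark"      # pointer to an external resource, fetched on use


@dataclass(frozen=True)
class Memory:
    kind: MemoryType  # required: there is no such thing as an untyped memory
    content: str


EXAMPLES = [
    Memory(MemoryType.FACT, "The production database runs Postgres 16."),
    Memory(MemoryType.LESSON,
           "The billing webhook fires twice on plan upgrades, so the handler must be idempotent."),
    Memory(MemoryType.DIRECTIVE,
           "Never send a customer-facing email without explicit human approval."),
    Memory(MemoryType.PREFERENCE, "Prefers terse answers with no closing summary."),
    Memory(MemoryType.CONTACT,
           "Maria: head of RevOps, owns the Salesforce integration, approves data-export requests."),
    Memory(MemoryType.BOOKMARK, "runbook.internal/oncall: the on-call escalation runbook."),
]
```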

What breaks when you collapse them

Here is where the abstract argument becomes a concrete bug report. Every failure below is one type being handled as if it were another.

Treating a preference like a fact

A user says, once, "just give me the short version." You store it as an immutable fact. Three weeks later the same user is debugging something gnarly and asks for detail — and the agent, dutifully, keeps clipping its answers, because the "fact" said so. Preferences have a short shelf life and should yield the moment the user signals otherwise. Facts are sticky by design. Store a preference as a fact and you have built an agent that argues with its own user about what they want.
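The fix is mostly a matter of precedence. This sketch assumes hypothetical `resolve` and `update_preference` helpers. A signal from the user in the current turn beats a stored preference, and a stored preference beats a default. Restating a preference overwrites the old value instead of piling up a second "fact."

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Preference:
    key: str
    value: str


def update_preference(prefs: Dict[str, Preference], key: str, value: str) -> None:
    # Preferences are overwritten: the most recently stated one replaces the old.
    prefs[key] = Preference(key, value)


def resolve(pref: Optional[Preference], current_signal: Optional[str], default: str) -> str:
    """A preference shapes behavior but yields to what the user asks for right now."""
    if current_signal is not None:  # an explicit request in this turn wins
        return current_signal
    if pref is not None:            # otherwise fall back to the stored preference
        return pref.value
    return default


prefs: Dict[str, Preference] = {}
update_preference(prefs, "verbosity", "terse")
print(resolve(prefs.get("verbosity"), None, "normal"))        # terse
print(resolve(prefs.get("verbosity"), "detailed", "normal"))  # detailed
```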

Treating a one-off event like a standing directive

During one incident, an operator says "skip the staging deploy this time, ship straight to prod." If that gets written as a directive — a standing rule — the agent will skip staging forever, on every deploy, having quietly promoted a one-time exception into policy. This is the most dangerous confusion of the set, because directives are exactly the memories you grant the most authority. A lesson ("staging was skipped during the March incident") records what happened. A directive ("always skip staging") commands what must happen. Conflate them and your agent's rulebook fills up with accidents.
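One guard is to make directives hard to create by accident. This sketch assumes a hypothetical write path where the system stores a directive only after a human explicitly confirms it as a standing rule. Everything else is recorded as what happened. The regex of one-off phrases is a deliberately crude, illustrative heuristic and should not be read as a recommended classifier.

```python
import re
from enum import Enum


class Kind(Enum):
    DIRECTIVE = "directive"  # standing rule: what must happen
    LESSON = "lesson"        # record of what happened


# Hypothetical, deliberately crude markers of a one-time exception.
ONE_OFF = re.compile(r"\b(this time|just this once|for now|today only)\b", re.IGNORECASE)


def classify_instruction(text: str, confirmed_as_standing_rule: bool) -> Kind:
    """Store a directive only when a human explicitly confirms it as a standing rule.

    Unconfirmed instructions, and anything phrased as a one-time exception
    (even if confirmed), are recorded as what happened, not promoted to policy.
    """
    if confirmed_as_standing_rule and not ONE_OFF.search(text):
        return Kind.DIRECTIVE
    return Kind.LESSON


print(classify_instruction("skip the staging deploy this time, ship straight to prod", False).value)
# lesson
```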

Treating a lesson like a fact

A lesson is conditional: it was true because of a situation. "Retry the payment API three times" was learned because the provider was flaky in Q1. Demote it to a flat fact and you strip the "because." When the provider fixes its reliability problems, nobody knows the lesson can be retired, because a fact has no expiry built into its meaning. Lessons should age and get revisited. Facts mostly should not.
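Here is a sketch of keeping the "because" attached. It assumes a hypothetical `Lesson` record with explicit fields for the reason and a review date; the date used here is invented. A periodic job can then surface the lessons whose underlying condition deserves a re-check.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Lesson:
    text: str
    because: str        # the condition that made the lesson true
    review_after: date  # when to re-check whether the condition still holds


def due_for_review(lessons: List[Lesson], today: date) -> List[Lesson]:
    """Lessons whose review date has passed, so someone can re-check the 'because'."""
    return [lesson for lesson in lessons if today >= lesson.review_after]


retry = Lesson(
    text="Retry the payment API three times",
    because="the provider was flaky in Q1",
    review_after=date(2024, 7, 1),  # hypothetical date
)
print([lesson.text for lesson in due_for_review([retry], date(2024, 8, 1))])
# ['Retry the payment API three times']
```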

Treating a bookmark like knowledge

The agent stores the contents of a runbook as a memory instead of a pointer to it. The runbook gets updated next week. The agent now confidently recites last week's escalation path — stale, wrong, and impossible to tell apart from current truth, because it looks exactly like every other memory. A bookmark stays fresh because it forces a fetch. Inlined knowledge silently expires.
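The contrast can be sketched as follows. It assumes a hypothetical `Bookmark` record and an injected `fetch` function. In production `fetch` might wrap an HTTP client. Here a dict stands in for the live runbook.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Bookmark:
    url: str
    description: str


def recall_bookmark(bm: Bookmark, fetch: Callable[[str], str]) -> str:
    """Resolve the pointer at read time instead of reciting a stored copy."""
    return fetch(bm.url)


live = {"runbook.internal/oncall": "escalate to team A"}  # stand-in for the real resource
bm = Bookmark("runbook.internal/oncall", "the on-call escalation runbook")
inlined_copy = live[bm.url]          # the anti-pattern: copy the contents at write time

live[bm.url] = "escalate to team B"  # the runbook is updated next week
print(inlined_copy)                              # stale: escalate to team A
print(recall_bookmark(bm, lambda u: live[u]))    # fresh: escalate to team B
```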

Notice the pattern. Each failure is a validity failure — the agent trusting a memory for longer than it deserved, or granting it more authority than it earned. Type is how you encode validity. Lose the type, lose the ability to reason about trust.

Why this is a retrieval problem, not just a storage problem

Storing typed memories is half of it. The payoff comes at recall.

When you retrieve by pure similarity, a question like "should I email this customer?" pulls back whatever is semantically nearest — maybe an old email draft, maybe a preference, maybe nothing useful — and the one memory that actually matters, the directive "never email without approval," might rank fifth and fall below the cutoff. Pure vector search has no notion that a directive should outrank a stylistic preference for a question about what the agent is allowed to do.
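To make the failure concrete, the sketch below ranks memories using similarity scores that are made up for illustration; a real system would compute them from embeddings. A plain top-3 by similarity drops the directive. One simple mitigation is a hypothetical `recall_pinning_directives`, which always includes any directive that clears a relevance floor.

```python
from typing import List, Tuple

Candidate = Tuple[str, str, float]  # (type, text, similarity to the query)

# Made-up similarity scores for the query "should I email this customer?".
CANDIDATES: List[Candidate] = [
    ("fact", "Last follow-up email to this customer went out on the 3rd", 0.82),
    ("preference", "Prefers emails that open with the customer's first name", 0.79),
    ("contact", "Dana: this customer's primary email contact", 0.77),
    ("preference", "Prefers a friendly tone in customer email", 0.74),
    ("directive", "Never send a customer-facing email without explicit human approval", 0.71),
]


def top_k(cands: List[Candidate], k: int) -> List[Candidate]:
    """Pure similarity ranking: no notion of what kind of memory each item is."""
    return sorted(cands, key=lambda c: c[2], reverse=True)[:k]


def recall_pinning_directives(cands: List[Candidate], k: int, floor: float = 0.5) -> List[Candidate]:
    """Top-k by similarity, plus any directive above a relevance floor, listed first."""
    picked = top_k(cands, k)
    pinned = [c for c in cands if c[0] == "directive" and c[2] >= floor and c not in picked]
    return pinned + picked


print([c[0] for c in top_k(CANDIDATES, 3)])
# ['fact', 'preference', 'contact']
print([c[0] for c in recall_pinning_directives(CANDIDATES, 3)])
# ['directive', 'fact', 'preference', 'contact']
```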

This connects to a real, documented constraint. Anthropic, describing context engineering for agents, defines the goal as finding "the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome," and warns about "context rot": "as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases" (Anthropic: Effective context engineering for AI agents). You cannot fix that by stuffing more memories into the prompt. You fix it by sending fewer, better ones — and "better" requires knowing a memory's type so you can prioritize a binding directive over a soft preference. Type is a ranking signal, not just a label.
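"Fewer, better" can be framed as a packing problem. This sketch assumes a hypothetical per-type priority table and substitutes a crude word count for a real tokenizer. It fills a fixed token budget with higher-priority types first and breaks ties by relevance, skipping anything that would overflow the budget.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical priority per type: lower numbers are packed first.
PRIORITY = {"directive": 0, "fact": 1, "contact": 1, "lesson": 2, "bookmark": 3, "preference": 4}


@dataclass
class Candidate:
    kind: str
    text: str
    relevance: float


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())


def pack_context(cands: List[Candidate], budget: int) -> List[Candidate]:
    """Greedily fill a token budget: higher-priority types first, then by relevance."""
    ordered = sorted(cands, key=lambda c: (PRIORITY[c.kind], -c.relevance))
    chosen: List[Candidate] = []
    used = 0
    for c in ordered:
        cost = approx_tokens(c.text)
        if used + cost <= budget:  # skip anything that would overflow the budget
            chosen.append(c)
            used += cost
    return chosen


cands = [
    Candidate("preference", "Prefers terse answers with no closing summary", 0.9),
    Candidate("directive", "Never send a customer-facing email without explicit human approval", 0.6),
]
print([c.kind for c in pack_context(cands, budget=10)])  # ['directive']
```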

The research lineage agrees that retrieval is the hard part. The Stanford "Generative Agents" paper, which gave 25 simulated characters a "memory stream," describes an architecture that extends a large language model to "store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior" (Park et al., 2023, arXiv:2304.03442). The interesting work was never the writing. It was the synthesis and the retrieval: deciding, at any moment, which slice of a large memory deserves to be in the prompt.
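For reference, the paper's retrieval scores each memory by combining recency, importance, and relevance. The following is a simplified sketch of that idea, not the authors' code. Recency decays exponentially with hours since the memory was last accessed, using a factor of 0.995 per hour. Importance is assumed to be a precomputed 1 to 10 rating. Relevance is cosine similarity. Each component is min-max normalized, and the three are summed with equal weights.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StreamEntry:
    text: str
    hours_since_access: float  # time since the memory was last retrieved
    importance: float          # assumed precomputed, e.g. an LLM's 1-10 rating
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(xs: List[float]) -> List[float]:
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def retrieve(entries: List[StreamEntry], query: Sequence[float], k: int,
             decay: float = 0.995) -> List[StreamEntry]:
    """Rank by normalized recency + importance + relevance, equally weighted."""
    if not entries:
        return []
    recency = min_max([decay ** e.hours_since_access for e in entries])
    importance = min_max([e.importance for e in entries])
    relevance = min_max([cosine(e.embedding, query) for e in entries])
    totals = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(zip(totals, range(len(entries))), reverse=True)
    return [entries[idx] for _, idx in ranked[:k]]


memories = [
    StreamEntry("ate breakfast", 1.0, 2, [1.0, 0.0]),
    StreamEntry("planning the party", 10.0, 9, [0.0, 1.0]),
]
print([e.text for e in retrieve(memories, [0.0, 1.0], k=1)])  # ['planning the party']
```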

How AgentPrizm puts this to work

AgentPrizm implements exactly these six types — fact, lesson, directive, preference, contact, bookmark — as first-class kinds rather than tags on a generic blob. A few design choices follow directly from the argument above.

Containers over taxonomies. Memories are scoped to containers (think: per-project or per-agent scopes) so a lesson learned in one project does not leak into an unrelated one. Routing memories to the right scope does more for relevance than any amount of clever categorization within a single pile.
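Container scoping can be sketched with a hypothetical in-memory `ScopedStore`. The container names, such as `project-billing`, are invented. The substring match is only a placeholder for real ranking. What matters is that recall searches only the containers it is asked for.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List


class ScopedStore:
    """Memories live in named containers; recall only searches the ones requested."""

    def __init__(self) -> None:
        self._by_container: DefaultDict[str, List[str]] = defaultdict(list)

    def remember(self, container: str, text: str) -> None:
        self._by_container[container].append(text)

    def recall(self, containers: Iterable[str], query: str) -> List[str]:
        # Substring match is a placeholder for real ranking; the scope filter is the point.
        q = query.lower()
        return [
            text
            for c in containers
            for text in self._by_container.get(c, [])
            if q in text.lower()
        ]


store = ScopedStore()
store.remember("project-billing", "Webhook fires twice on plan upgrades; make the handler idempotent")
store.remember("project-search", "Webhook retries are disabled for the indexer")
print(store.recall(["project-billing"], "webhook"))
# ['Webhook fires twice on plan upgrades; make the handler idempotent']
```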

Hybrid recall, type-aware ranking. Recall blends semantic similarity with keyword matching, then weights the results by type and severity — so a binding directive is not silently outranked by a stylistically similar preference. The type you stored at write time becomes leverage at read time.
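Type-aware hybrid scoring might look something like the sketch below. Everything in it is an illustrative assumption and does not describe AgentPrizm's actual scoring: the blend of cosine similarity and keyword overlap, the `TYPE_WEIGHT` table, the `severity` multiplier, and the two-dimensional toy embeddings.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Sequence, Set

# Hypothetical type weights; real values would need tuning.
TYPE_WEIGHT = {"directive": 2.0, "fact": 1.2, "lesson": 1.1,
               "contact": 1.0, "bookmark": 0.9, "preference": 0.8}


@dataclass
class Item:
    kind: str
    text: str
    embedding: Sequence[float]
    severity: float = 1.0  # hypothetical multiplier for high-stakes memories


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query words that also appear in the memory text."""
    q = words(query)
    return len(q & words(text)) / len(q) if q else 0.0


def hybrid_score(item: Item, query: str, query_emb: Sequence[float], alpha: float = 0.7) -> float:
    base = alpha * cosine(item.embedding, query_emb) + (1 - alpha) * keyword_overlap(query, item.text)
    return base * TYPE_WEIGHT[item.kind] * item.severity


def recall(items: List[Item], query: str, query_emb: Sequence[float], k: int) -> List[Item]:
    return sorted(items, key=lambda it: hybrid_score(it, query, query_emb), reverse=True)[:k]


pref = Item("preference", "Prefers a friendly tone when emailing this customer", [0.9, 0.1])
rule = Item("directive", "Never send a customer-facing email without explicit human approval",
            [0.6, 0.4], severity=1.5)
print([it.kind for it in recall([pref, rule], "should I email this customer?", [1.0, 0.0], k=1)])
# ['directive']
```

Using multiplicative weights is a deliberate choice. A directive that is only moderately similar can still beat a highly similar preference. An irrelevant directive, whose base score is near zero, stays near zero and does not float to the top.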

Validity windows. Because the system knows a preference is changeable and a fact is sticky, memories can carry validity windows and supersession links. A corrected fact then replaces the old one and leaves a trail, instead of two contradictory "facts" both ranking high.
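Validity windows with a supersession trail can be sketched using a hypothetical `FactLog`. The Postgres 17 upgrade in the example is invented. Correcting a fact does not delete anything: it closes the old record's window and links that record to the new one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Record:
    record_id: int
    text: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still valid
    superseded_by: Optional[int] = None  # trail to the correcting record


class FactLog:
    def __init__(self) -> None:
        self.records: List[Record] = []

    def add(self, text: str, now: datetime) -> Record:
        rec = Record(record_id=len(self.records) + 1, text=text, valid_from=now)
        self.records.append(rec)
        return rec

    def supersede(self, old_id: int, new_text: str, now: datetime) -> Record:
        """Close the old record's window and link it to its replacement; nothing is deleted."""
        old = next(r for r in self.records if r.record_id == old_id)
        new = self.add(new_text, now)
        old.valid_to = now
        old.superseded_by = new.record_id
        return new

    def current(self, at: datetime) -> List[Record]:
        """Records whose validity window contains `at`."""
        return [r for r in self.records
                if r.valid_from <= at and (r.valid_to is None or at < r.valid_to)]


log = FactLog()
old = log.add("The production database runs Postgres 16.", datetime(2024, 1, 1))
new = log.supersede(old.record_id, "The production database runs Postgres 17.", datetime(2024, 6, 1))
print([r.text for r in log.current(datetime(2024, 7, 1))])
# ['The production database runs Postgres 17.']
```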

Audit. Forgetting is a soft, audited operation by default, so when an agent's behavior changes you can see which memory drove it and when it entered or left scope. Trust in an agent is mostly trust that you can explain what it remembered.
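Soft, audited forgetting can be sketched with a hypothetical `AuditedStore`. Forgetting hides an entry from recall without discarding it, and each write or forget is logged together with its reason. That log is what lets you reconstruct why the agent's behavior changed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple

AuditRow = Tuple[datetime, str, str, str]  # (when, action, key, reason)


@dataclass
class Entry:
    text: str
    forgotten_at: Optional[datetime] = None  # set on soft delete; the entry is kept


@dataclass
class AuditedStore:
    entries: Dict[str, Entry] = field(default_factory=dict)
    audit: List[AuditRow] = field(default_factory=list)

    def remember(self, key: str, text: str, now: datetime, reason: str) -> None:
        self.entries[key] = Entry(text)
        self.audit.append((now, "remember", key, reason))

    def forget(self, key: str, now: datetime, reason: str) -> None:
        """Soft delete: hide the entry from recall, keep it, and log why."""
        self.entries[key].forgotten_at = now
        self.audit.append((now, "forget", key, reason))

    def recall(self) -> Dict[str, str]:
        return {k: e.text for k, e in self.entries.items() if e.forgotten_at is None}

    def history(self, key: str) -> List[AuditRow]:
        return [row for row in self.audit if row[2] == key]


store = AuditedStore()
store.remember("verbosity", "Prefers terse answers", datetime(2024, 1, 1),
               "user said 'just give me the short version'")
store.forget("verbosity", datetime(2024, 1, 22), "user asked for detail while debugging")
print(store.recall())                                  # {}
print([row[1] for row in store.history("verbosity")])  # ['remember', 'forget']
```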

For the developer-facing detail — types, container scoping, recall filters, and forgetting semantics — see the docs. For limits and plans, see pricing.

The takeaway

If you remember one thing: a memory's type is how you encode how much, and how long, to trust it. Collapse the types into one blob and you have not simplified your agent — you have deleted the information it needs to decide what to believe. Six types is not bureaucracy. It is the smallest set that still lets an agent tell a rule from a habit, a fact from a lesson, and a pointer from the thing it points at.

Getting that structure right is what separates an agent with a notepad from an agent with a memory.

The 6 Memory Types Every AI Agent Needs (and What Happens When You Confuse Them)