Why We Store Memory Embeddings on a $20/mo VPS (And What It Cost Us)

The honest story of running an agent memory layer — embeddings, vector search, the whole thing — on one $20/mo VPS, and the tradeoffs we made on purpose.

Gene Avakyan · Founder, AgentPrizm · 7 min read

There is a version of this company that exists only in a slide deck. In that version, AgentPrizm runs on a managed Kubernetes cluster, with a dedicated vector database, a separate search cluster, multi-region replicas, and a monthly bill that would make a seed round nervous. It has a beautiful architecture diagram.

We don't run that version. We run a memory layer for AI agents — embeddings, semantic recall, full-text search, the API, the dashboard, the database — on a single VPS that costs about twenty dollars a month. This post is the honest accounting of that decision: why we made it, how it actually works, and what it cost us. Not what it saved us in the abstract. What it cost.

The whole thing fits on one box

Here is the real architecture, no diagram-polish:

  • One Hostinger VPS. KVM-class, roughly 8GB of RAM. Call it ~$15/mo for the machine.
  • MongoDB, self-hosted, on that same box. Zero managed-database bill.
  • OpenAI for the embeddings. We send text out to their API, get vectors back, store them in Mongo.
  • A Next.js app serving the marketing site, the dashboard, the REST API, and the MCP server.
  • A domain, a free Let's Encrypt cert, and a transactional email provider on its free tier.

Add it up and infrastructure lands around twenty dollars a month. The only line that moves with usage is the model spend — embeddings and the occasional LLM call for RAG — and that scales with how much our customers actually use the thing, which is exactly the cost you want to have.

The part people expect to be expensive is the vector store. It isn't, because we never bought one. MongoDB now does native vector search and full-text search in its self-managed editions — not just in the hosted Atlas product. MongoDB announced the public preview of search and vector search for Community Edition and Enterprise Server, stating that $search, $searchMeta, and $vectorSearch are "all supported with functional parity to what is available in Atlas, excluding features in a preview state" (MongoDB, "Supercharge Self-Managed Apps With Search And Vector Search Capabilities"). It runs on a companion binary, mongot, alongside the database. That single change is most of why our bill looks the way it does. The hybrid recall — vector similarity for "this means the same thing" plus full-text for "this is the exact word" — happens inside the database we already run.

If you want the database mechanics, MongoDB's own docs describe how the two halves combine: hybrid search "is an aggregation of full-text and semantic search," using reciprocal rank fusion to "de-duplicate and combine the input query results into a final ranked results set" (MongoDB Docs, "Perform Hybrid Search"). We didn't invent the recall strategy. We just don't pay a separate vendor to host it.

Why embeddings are the cheap part

Founders hear "we generate embeddings for everything" and assume a frightening invoice. It isn't.

OpenAI's text-embedding-3-small costs $0.02 per million tokens (OpenAI API docs, text-embedding-3-small — verified on their pricing page). A million tokens is a lot of memories. A typical memory we store — a fact, a lesson, a preference — is short. You can embed an enormous number of them before you've spent the price of a coffee.

The cost of memory was never the vectors. It was always the machinery people bolt on around the vectors: a managed vector DB with its own per-month floor, a separate search service, the egress between them, the ops time to keep three systems talking. We collapsed that machinery into one database and one box, so what's left is the genuinely variable cost — a couple of OpenAI line items that only grow when usage grows. That's the cost structure you actually want as an early-stage company: small fixed floor, variable spend tied to value delivered.

What it cost us — the honest part

I promised honesty, so here is the bill we actually pay for running this lean. It is not zero.

There is no high-availability story yet. One box means one box. If that VPS goes down, AgentPrizm goes down. We have backups; we do not have a hot standby in another region, a failover cluster, or a five-nines SLA, and I'm not going to pretend we do. For an early-stage memory layer, we decided availability-at-all-costs was a problem to earn our way into, not one to pre-pay for. That is a real tradeoff and you should know we made it deliberately.

Self-hosting the database is work you own. When Mongo needs a version bump, a config change, an index rebuild — that's us, at 11pm, not a support ticket to a managed provider. The self-managed search feature we lean on is, as of this writing, in public preview, which means we're living a little closer to the edge of the product than a conservative shop would. We're comfortable there. Some teams wouldn't be.

Sending text to OpenAI is a dependency, and a tradeoff. Our embeddings are generated by a third party, which means text leaves our box to get vectorized. That's a deliberate choice — their embeddings are good and absurdly cheap — but it is a dependency we don't fully control, and we're honest with ourselves and our customers about where data flows. (The details live in our docs.)

Frugality has a discipline tax. Running lean means saying no a lot. No, we're not adding a second datastore for this one feature. No, we're not standing up a separate service when an aggregation pipeline will do. Every "no" is a small cost paid in features-not-shipped-the-easy-way. We think it buys more than it costs — but it isn't free.

That's the ledger. One box, real constraints, eyes open.

You don't need a big bill to ship

Here's the part I'd say to a founder, not an engineer.

The instinct to over-provision early is mostly fear wearing the costume of diligence. You buy the managed everything because it feels responsible, and because the architecture diagram looks like a real company. But the diagram isn't the company. The customers are the company, and they don't care how many clusters you run. They care whether the agent remembers them.

We're not the first to notice this, and the people who've gone furthest on it aren't fringe. DHH, after pulling 37signals off the cloud, made the case that excellent reliability comes "in part from picking boring, basic technologies" — "mature technologies run on redundant hardware with good backups," not exotic infrastructure (DHH, "Keeping the lights on while leaving the cloud," world.hey.com). Their scale is enormous and ours is tiny, but the principle scales down even better than it scales up. Boring is cheap, and boring is debuggable at 11pm.

The capital-efficiency argument is simple: every dollar you don't spend on idle infrastructure is a dollar of runway, and runway is the only resource that actually buys you the right to keep iterating. A twenty-dollar floor means we can be wrong about a lot of things and still be here next quarter. A four-figure monthly infra bill, pre-revenue, means your architecture is making product decisions for you — usually the decision to raise money you didn't need to raise.

For the developer reading this: none of the above is an argument against good engineering. It's an argument for right-sized engineering. We use real vector search, real hybrid recall, audit trails, container scoping — the actual hard parts of a memory system. We just refused to confuse "serious product" with "serious bill." When usage demands a second box, or a replica, or a managed datastore, we'll add it, and the cost will be justified by load instead of by anxiety.

What changes, and what doesn't

Some of this is temporary. We will outgrow one box. When we do, the embeddings approach won't change — vectors are cheap and that's structural, not a phase. The database choice probably won't change either; if anything, the self-managed search story keeps getting stronger. What changes is the redundancy: a replica set, real failover, the HA story we're honest about not having today. That's a problem we want to have, because it means usage made us have it.

If you're building an agent product and staring at a vector-database pricing page wondering how you'll afford it before you have a single paying customer — you might not need to. The cheap path and the serious path are not opposites. Ours runs on one VPS, costs about twenty dollars a month, and does real work. You can see the pricing we built on top of that, and the docs for how the memory actually works.

The slide-deck version of this company is still out there, with its beautiful diagram. We'll build toward it the day our customers' usage signs the check. Until then, the box is on, the memories persist, and the bill is twenty bucks.


Sources: OpenAI API docs — text-embedding-3-small pricing; MongoDB — Search & Vector Search for self-managed editions; MongoDB Docs — Hybrid Search; DHH — "Keeping the lights on while leaving the cloud".

← All postsRead the docsSee pricing

Give your agents a memory

Ship agents that remember.

Six memory types, container scoping, confidence scores, validity windows, and audit trails — over a REST API or MCP. Free until your agents ship.

Talk to us