"Off-Grid Operator #6: Self-Hosted Memory for AI Agents"
AI agents forget everything between sessions. That’s the fundamental problem nobody talks about when they show off their shiny agent demos.
You can have the smartest model in the world. End the session, start a new one, and it has no idea what happened five minutes ago. For a one-off chatbot interaction, that’s fine. For an AI that’s supposed to be your operational partner — tracking projects, remembering preferences, learning from mistakes — it’s a dealbreaker.
So I built a memory system. It’s called Open Brain, it runs on Postgres with pgvector, and it lives on a mini PC in my house. Here’s how it works and what I learned.
The problem with file-based memory
Before Open Brain, I used flat files. Markdown files organized by date, topic registers, a working memory file that got rewritten every session. It worked — sort of.
The issues showed up fast:
Search was terrible. Grep doesn’t understand meaning. Searching for “that thing Andrew said about deployments” returns nothing, even if the information exists in twelve different daily logs. You need semantic search — matching by meaning, not keywords.
Multiple agents couldn’t share it. I run several AI agents with different specializations. When my coding agent learned something about the codebase, my ops agent had no way to know. Each agent maintained its own file context, and cross-pollination was manual.
It didn’t scale. At a few hundred entries it was fine. At a few thousand, every retrieval meant scanning massive files. Token budgets aren’t infinite, and stuffing an entire memory corpus into context is wasteful.
Files are still part of the system — they handle session-specific context and operational state well. But long-term, shared, searchable memory needed something else.
What Open Brain actually is
Open Brain is a Phoenix/Elixir application backed by Postgres with the pgvector extension. It exposes a simple REST API that any agent can write to and read from.
The core operations:
- Add a memory: Text goes in, gets embedded into a vector, stored with metadata (tags, source agent, timestamp).
- Search memories: Query text gets embedded, pgvector finds the closest matches by cosine similarity. Returns ranked results with relevance scores.
- WIP tracking: Work-in-progress entries that agents create when starting non-trivial tasks and close when finished. This solves the “two agents working on the same thing” problem.
That’s it. No graph databases, no fancy ontologies, no knowledge representation framework. Text in, vectors out, similarity search. Simple enough to be reliable.
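To make the shape of those three operations concrete, here's a minimal in-memory sketch in Python (the real system is Elixir behind a REST API, and the method names, the toy hash-based "embedding," and the record fields below are all illustrative assumptions, not Open Brain's actual interface):

```python
import hashlib
import math
import time

def embed(text, dim=8):
    """Toy stand-in for a real embedding model: hash words into a
    small unit vector. A real pipeline would call a 1536-dim model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class OpenBrainSketch:
    """Illustrative model of the two core memory operations:
    add (embed at write time, store with metadata) and search
    (embed the query, rank by cosine similarity)."""

    def __init__(self):
        self.memories = []

    def add_memory(self, text, tags=(), agent="unknown"):
        self.memories.append({
            "text": text,
            "vector": embed(text),
            "tags": list(tags),
            "agent": agent,
            "ts": time.time(),
        })

    def search(self, query, limit=5):
        qv = embed(query)
        # Vectors are unit-normalized, so the dot product IS cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(qv, m["vector"])), m)
            for m in self.memories
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]

brain = OpenBrainSketch()
brain.add_memory("the deploy pipeline uses Docker Compose", agent="ops")
brain.add_memory("Andrew prefers short daily standups", agent="coordinator")
results = brain.search("Docker deploy")
```

The point of the sketch is the data flow, not the math: text is embedded once at write time, queries are embedded at read time, and retrieval is a ranked similarity scan with metadata riding along.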
The embedding pipeline
Every memory gets embedded at write time using an embedding model. The vectors are 1536-dimensional and stored directly in Postgres via pgvector. Search is a single SQL query — no external vector database, no separate service to maintain.
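That single SQL query looks roughly like this (table and column names here are assumptions, but `<=>` is pgvector's real cosine-distance operator, and `1 - distance` converts it back to a similarity score):

```python
# Hypothetical schema: a `memories` table with a vector `embedding` column.
# Placeholders are in psycopg's named-parameter style.
SEARCH_SQL = """
SELECT text, tags, source_agent, inserted_at,
       1 - (embedding <=> %(query_vec)s::vector) AS similarity
FROM memories
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(limit)s;
"""
```

One query, one database, no round-trip to a separate vector service: the query embedding goes in as a parameter and ranked rows come back.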
For an off-grid setup, this matters. Every additional service is another thing that can fail, another thing consuming power, another thing to monitor. Postgres already runs for other applications. Adding pgvector to it costs nothing extra in operational overhead.
Multi-agent memory
Here’s where it gets interesting. I run multiple AI agents — a coordinator, a coder, a researcher, a writer, a DevOps specialist. They all write to the same Open Brain instance. They all search the same memory.
This creates something like organizational knowledge. When my researcher investigates a topic, the findings are immediately available to every other agent. When my coder discovers a bug pattern, the ops agent can search for it later.
But shared memory introduces a new problem: conflicts.
What happens when two agents write contradictory information? Agent A says “the deploy pipeline uses Docker Compose” and Agent B says “we migrated to Dokploy last week.” Both are in the memory store. Both come back as relevant results.
The solution is boring but effective: metadata discipline. Every memory includes a timestamp and source agent. When conflicts appear in search results, the more recent entry wins. Old information gets tagged as superseded rather than deleted — because sometimes the old information was actually right, and you want the audit trail.
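As a sketch of that policy in Python (the function name and record fields are illustrative, not Open Brain's API), recency picks the winner and losers are tagged rather than deleted:

```python
from datetime import datetime, timezone

def resolve_conflicts(results):
    """Given conflicting search results on the same topic, keep the
    newest entry and tag the older ones as superseded, preserving
    the audit trail instead of deleting them."""
    ordered = sorted(results, key=lambda m: m["ts"], reverse=True)
    winner, older = ordered[0], ordered[1:]
    for old in older:
        old.setdefault("tags", []).append("superseded")
    return winner

a = {"text": "the deploy pipeline uses Docker Compose", "agent": "A",
     "ts": datetime(2025, 1, 1, tzinfo=timezone.utc)}
b = {"text": "we migrated to Dokploy last week", "agent": "B",
     "ts": datetime(2025, 1, 8, tzinfo=timezone.utc)}
current = resolve_conflicts([a, b])
```

After this runs, Agent B's newer entry is what search consumers act on, while Agent A's entry survives with a `superseded` tag in case the migration story turns out to be wrong.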
What I’d do differently
Namespace earlier. I started with a flat tag system and should have implemented agent-specific namespaces from day one. When you have five agents writing memories, you need to filter by who said what.
Embed less, retrieve smarter. Early on I stored every observation, every minor decision, every intermediate research note. The result was a noisy memory store where important things got buried under trivia. Now I apply a relevance filter before writing — does this fact matter in a week? If not, it stays in the session log and doesn’t go to Open Brain.
WIP tracking is non-negotiable. Before I added work-in-progress entries, I had multiple agents occasionally picking up the same task. One would start a feature, another would start the same feature with a slightly different approach, and I’d end up merging conflicting work. Now every non-trivial task gets a WIP entry that other agents can see. Duplication dropped to near zero.
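The claim-before-work pattern is simple enough to sketch in a few lines of Python (again illustrative, with assumed names, not the actual Open Brain endpoint):

```python
import time

class WipBoard:
    """Shared work-in-progress board: an agent must claim a task
    before starting it, and a claim fails if the task is already open."""

    def __init__(self):
        self.open_tasks = {}  # task name -> (agent, started_at)

    def claim(self, task, agent):
        if task in self.open_tasks:
            return False  # another agent already picked this up
        self.open_tasks[task] = (agent, time.time())
        return True

    def close(self, task):
        self.open_tasks.pop(task, None)

board = WipBoard()
```

The second agent's failed claim is the whole mechanism: instead of silently starting a parallel implementation, it sees who holds the task and moves on.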
The self-hosted advantage
I could use a hosted vector database. Pinecone, Weaviate Cloud, whatever the current favorite is. But then my agents’ memories — which include operational details, preferences, project context, and occasionally sensitive information — live on someone else’s servers.
Running it on my own hardware means the data stays in my house. Literally in my house, on a machine I can walk over and touch. For an AI memory system that contains the operational context of my entire workflow, that matters.
It also means I control the costs. Hosted vector databases charge per query, per gigabyte stored, and per embedding call. My Postgres instance costs electricity. At the scale I operate — thousands of memories, hundreds of queries per day — self-hosted is effectively free after the hardware investment.
Is it worth it?
Absolutely. The difference between an AI that forgets everything and an AI that can say “last time we tried this approach it didn’t work because of X” is the difference between a tool and a partner.
It’s not magic. It’s Postgres, an embedding model, and some discipline about what gets stored. The hard part isn’t the technology — it’s the operational practices around it. What to remember, what to forget, how to handle conflicts, when to trust old information.
But when it works, it compounds. Every memory makes the next interaction slightly better. Every lesson learned actually sticks. That’s leverage.
Next week: The Cost of Clever — what broke and what I’d do differently.
If you want this working in your stack — not a tutorial, actual implementation — that’s what I do. Work with me →