Open Brain: Why I Built a Database for My Thoughts
I have tried every note-taking system at least twice.
Evernote in 2011, then again in 2018. Notion. Bear. Apple Notes. Obsidian with a plugin maze. Roam Research for six months until the sync anxiety broke me. Each one promised to be the last note app I would ever need, and each one eventually became another archive I was afraid to search.
The real problem was never the app. It was the retrieval model.
Every note-taking app — from the simplest text file to the most elaborately linked knowledge graph — stores your words and retrieves them by the words themselves. You search for “Colombia GDP” and you get the notes that contain the phrase “Colombia GDP.” You search for “World Bank project meeting” and you miss every note where you wrote “the Bogotá team” or “the infrastructure lending side.” The filing system works, but only if the words in your head at retrieval time match the words in your head at capture time, and those are often not the same words.
I am a data scientist. I think a lot about representations of things. And at some point it clicked: keyword search treats text as a bag of characters. Semantic search treats text as a point in meaning space. The right tool for navigating your own thoughts is the second one.
The term: Open Brain
I need to give credit where it is due. I borrowed the term Open Brain from Nate B Jones, a YouTuber whose work on AI-augmented workflows gave the concept a name and a frame. His argument, roughly, is that a Second Brain as Tiago Forte describes it is a method — a manual, disciplined practice of capturing and organizing. An Open Brain is a stack — infrastructure that makes recall automatic so the method nearly disappears.
The distinction matters. Forte’s Building a Second Brain is an excellent book about habits and systems. I recommend it. But its core insight — that our minds are for having ideas, not storing them — points toward a conclusion the book does not reach: if storage and retrieval are the problem, the solution is a database and a retrieval model, not a better filing cabinet.
What a Second Brain gets wrong
The Second Brain method asks you to do three expensive things every time you capture something:
- Decide where it goes. Does this note belong in “Projects,” “Areas,” “Resources,” or “Archives”? Forte gives you the PARA system to answer that, and it is reasonable, but it still requires a decision at capture time, which is exactly when you are thinking about something else.
- Tag it correctly. So that future-you, searching under different vocabulary, can find it.
- Review and re-file regularly. Because tags decay and folders drift.
These are not bugs in the system — they are what the system requires to work. The cost is real: I have notebooks full of raw captures that I never processed into “real” notes because the processing was work I deferred until it became too large to ever finish.
The Open Brain approach removes the capture friction almost entirely. You send a thought — a voice message, a few sentences, a pasted paragraph — and the system handles the rest: extracting metadata, generating a semantic embedding, storing both. Retrieval later happens by meaning, not by the exact tags you assigned on a Tuesday when you were distracted.
Why a database, not a folder
The phrase “your data in a database you own” sounds heavier than it is. In practice, it means a free Supabase project — Postgres in the cloud, zero servers to manage, backups handled by Supabase, accessible from any device. The data format is a table of text and a column of floating-point numbers. You could export the whole thing to a CSV and read it in R.
The column of numbers is the embedding: a 1,536-dimensional vector that represents the meaning of each memory. The model that produces those numbers — OpenAI’s text-embedding-3-small — is worth pausing on. Think of it as a very elaborate dimensionality reduction that maps any piece of text onto a consistent coordinate system where semantic similarity corresponds to geometric proximity. If you have done PCA or UMAP on high-dimensional survey data, you have already worked with the same idea: representing each observation as a point in a space where distance carries meaning. The difference here is that the input is free-form text and the output is a standardized 1,536-number fingerprint whose geometry is rich enough to support nearest-neighbor search.
When you search your brain, you provide a query. The system embeds your query the same way it embedded every memory at capture time. Then it finds the memories whose vectors are closest to your query vector — the notes that mean the most similar thing, regardless of which exact words appear in either.
The result is that you can type “what did I decide about the Ecuador project last quarter” and find the note that says “Quito trip — agreed to focus on the poverty mapping module,” even though none of those query words appear in the note.
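The geometry is easier to see with toy numbers. The sketch below uses made-up 3-dimensional vectors in place of real 1,536-dimensional embeddings; the ranking logic is the same nearest-neighbor idea.

```typescript
// Toy illustration of semantic retrieval. Real embeddings have 1,536
// dimensions; these 3-dimensional vectors are invented for the example.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical embeddings: the query shares no words with either note,
// but its vector sits closer to the semantically related one.
const query = [0.9, 0.1, 0.2]; // "what did I decide about the Ecuador project"
const memories = [
  { text: "Quito trip — poverty mapping module", vec: [0.8, 0.2, 0.1] },
  { text: "Grocery list for the weekend",        vec: [0.1, 0.9, 0.7] },
];

// Rank memories by similarity to the query vector.
const ranked = memories
  .map(m => ({ ...m, score: cosineSimilarity(query, m.vec) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked[0].text); // the Quito note ranks first
```

No keyword overlap is needed: the Quito note wins purely because its vector points in nearly the same direction as the query's.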
How I capture things
I have three capture surfaces, and I use all three:
Claude Desktop and Codex — any time I am in a coding or analysis session and I want to save a decision, a finding, or a reference, I tell the AI and it saves to my brain in the background. This is the most frictionless path because I am already in the conversation.
Telegram — I have a bot set up on my phone. A voice note, a forwarded article, a stray thought while walking — I send it to the bot and it saves. Voice notes get transcribed via Whisper first. The whole capture takes under ten seconds.
Direct SQL — occasionally, when I want to bulk-import notes or write a structured update, I run SQL against Supabase directly. This is the power-user escape hatch.
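For illustration, the bulk-import path might look roughly like the sketch below. The column names (`content`, `contributor`, `created_at`) and the commented supabase-js call are my assumptions for this post; the actual schema is defined in Post 2.

```typescript
// Hypothetical sketch of a bulk import. Column names are assumptions;
// see Post 2 for the real schema.
type MemoryRow = {
  content: string;
  contributor: string;
  created_at: string;
};

// Shape a raw note into a row for the `memories` table.
function toMemoryRow(content: string, contributor = "andres"): MemoryRow {
  return {
    content: content.trim(),
    contributor,
    created_at: new Date().toISOString(),
  };
}

const notes = [
  "Quito trip — agreed to focus on the poverty mapping module",
  "Call with the Bogotá team about infrastructure lending",
];
const rows = notes.map(n => toMemoryRow(n));

// With supabase-js, the insert itself would look roughly like:
//   const supabase = createClient(url, process.env.DB_SERVICE_ROLE_KEY!);
//   await supabase.from("memories").insert(rows);
console.log(rows.length); // 2
```

Embeddings and metadata still get generated afterward; the point of the escape hatch is that the rows land in a plain Postgres table you can manipulate with ordinary SQL.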
The cost
This is a running concern I want to address directly, because it is the first question I get when I describe the system.
Supabase free tier covers a personal brain comfortably. The embeddings cost a fraction of a cent per note. I have been running this for over a year and the total OpenAI bill for embeddings and metadata extraction averages $0.10–0.30 per month. That is not a typo. A single Notion Pro subscription costs more in one month than my Open Brain will cost in years.
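The arithmetic is easy to check yourself. The price below is an assumption (text-embedding-3-small was listed at roughly $0.02 per million tokens when I set this up; verify against current OpenAI pricing), and the usage numbers are deliberately generous:

```typescript
// Back-of-the-envelope embedding cost. Price per million tokens is an
// assumption; check OpenAI's current pricing page before relying on it.
const pricePerMillionTokens = 0.02; // USD, text-embedding-3-small (assumed)
const notesPerMonth = 300;          // a heavy capture habit: ~10 notes a day
const tokensPerNote = 200;          // a few sentences each

const monthlyTokens = notesPerMonth * tokensPerNote;          // 60,000 tokens
const monthlyCost = (monthlyTokens / 1_000_000) * pricePerMillionTokens;

console.log(monthlyCost.toFixed(4)); // "0.0012", about a tenth of a cent
```

The metadata-extraction calls to gpt-4o-mini cost more than the embeddings do, which is how the monthly total climbs all the way to a few tens of cents.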
The other cost is setup time. This is real. The database setup takes an afternoon. The Telegram bot takes another hour. The web app, if you want one, is a weekend project. I am writing this series precisely because I want to make that setup time as short as possible for anyone who wants to follow along.
What this series covers
This is the first post in a series. The posts that follow will walk through each component of the stack:
- Post 2 — Setting up Supabase and pgvector, defining the schema, and building hybrid search that combines semantic and keyword matching
- Post 3 — The Telegram bot for mobile capture, including voice note transcription
- Post 4 — The family brain concept (a shared Supabase project for household memories) and the retrieval improvements I added after launch: server-side filters, Cohere reranking, and type-aware chunking for long notes
- Post 5 — A Next.js web app deployed on Vercel that gives you a browser interface and installs as a PWA on your phone
- Post 6 — Keeping the system alive: automated backups, security hardening, and GitHub hygiene
The whole stack runs on free tiers and a few dollars of OpenAI credit per month. Every piece is something you own: the database, the code, the data. Nothing disappears when a startup shuts down.
If you are an economist or data scientist who codes in R or Stata, you already have most of the conceptual background you need. Postgres is not foreign if you have worked with relational databases in any form. The new pieces — embeddings, pgvector, Node.js bots — will get proper introductions as they come up.
The next post sets up the database.
Agent prompt: orient your AI to the project
Before you write a single line of code, give your AI agent the following prompt. It establishes what you are building and why, so every subsequent conversation starts from a shared mental model rather than having to re-explain the concept each time.
Copy the block below, paste it into your agent (Claude Code, Codex, Cursor, or any coding assistant), and adjust the bracketed placeholders to match your setup.
I am building a personal AI second brain called an Open Brain. Here is what it is:
- A PostgreSQL database (hosted on Supabase, free tier) stores all my notes as text.
- Each note is embedded with OpenAI text-embedding-3-small (1536 dimensions) and stored
alongside the text in a vector(1536) column.
- OpenAI gpt-4o-mini extracts structured metadata from each note: title, summary, type,
importance score, tags, people mentioned, and action items.
- Retrieval uses a hybrid search function that combines cosine similarity (70%) with
full-text keyword ranking (30%), so searches work by meaning AND by exact terms.
- Capture happens via a Telegram bot on my phone and via this AI coding session.
- There is an optional MCP (Model Context Protocol) server that exposes the brain as
tools to Claude Desktop and Codex.
My Supabase project:
- URL: [YOUR_SUPABASE_URL]
- The main table is called `memories`.
- Service role key is in my .env file as DB_SERVICE_ROLE_KEY.
- OpenAI key is in my .env file as OPENAI_API_KEY.
The GitHub repository for the full implementation is at:
[YOUR_REPO_URL — or omit if you have not pushed yet]
When I ask you to help me build or extend the Open Brain, always:
1. Read the relevant source files before making changes.
2. Keep the hybrid search function (search_memories_v2) intact — do not replace it with
a simpler cosine-only search.
3. Use the DB_ prefix for Supabase env vars, not SUPABASE_ — this is intentional.
4. Contributor is always "andres" (or whatever I tell you my handle is).
5. Do not add error handling for cases that cannot happen. Trust the database constraints.
This prompt is your project context. Save it somewhere you can paste it quickly — a text snippet, a CLAUDE.md file in the project root, or an AGENTS.md file if you are using Codex. The subsequent posts each include a focused build prompt for that post’s component; the context prompt above is the foundation they all assume.
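To make the weighting in the prompt concrete, here is a toy version of the 70/30 blend. In the real `search_memories_v2` function the two scores would come from pgvector cosine distance and Postgres full-text ranking (the specifics arrive in Post 2); here both are stand-in numbers.

```typescript
// Sketch of the 70% semantic / 30% keyword blend described in the prompt.
// Both inputs are assumed to be normalized to the 0..1 range.
function hybridScore(semantic: number, keyword: number): number {
  return 0.7 * semantic + 0.3 * keyword;
}

// A note that matches the query's meaning but shares none of its words
// still beats a note that only shares a keyword.
const meaningMatch = hybridScore(0.92, 0.0); // ~0.644
const keywordMatch = hybridScore(0.35, 0.8); // ~0.485
console.log(meaningMatch > keywordMatch); // true
```

The keyword term is not decorative: when you search for an exact name or code identifier, the 30% full-text component is what pulls the literal match to the top.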