
Why I'm Building a Memory Universe Instead of One Startup

A few weeks ago, I shared that I had mapped out 76 product ideas.

Now that number has grown to 90.

Whenever people see that number, they usually ask the same question:

"Why not just focus on one startup?"

It's a fair question.

Most startup advice is built around finding one big idea and going all in.

But that's not really the problem I've become obsessed with.

Before building products, I worked as a memory coach.

For years, I watched people collect information constantly:

notes,
bookmarks,
screenshots,
prompts,
ideas.

The problem wasn't storing information.

The problem was finding and using it when it actually mattered.

Retrieval is where everything breaks.

That's what led me to what I now call the Memory Universe.

Not one giant platform.

A collection of small tools designed to reduce specific points of cognitive friction.

Each tool solves a narrow problem.

Capturing an idea.

Finding a prompt.

Organizing knowledge.

Retrieving something you know you saved but can't find.

Individually, none of these tools are revolutionary.

Together, they form a system.

What's surprised me most is that building them isn't the hardest part.

I've already shipped several products and continue shipping new ones.

The real challenge is deciding what deserves to exist.

With 90 mapped ideas, prioritization becomes a bigger problem than execution.

Some ideas look great on paper but probably shouldn't be built.

Others seem small but solve real pain.

I'm still learning how to tell the difference.

So I'm curious:

For founders who have built multiple products: how do you decide which ideas deserve your time?

Do you rely on intuition?

User demand?

Revenue potential?

Or something else entirely?

I'd love to hear how others approach this.

on June 1, 2026
  1. 1

    The interesting part here is that your real bottleneck is not ideas or shipping speed. It is selection quality.

    With 90 mapped ideas, the risk is not building too slowly. The risk is building too many “valid” tools that do not compound into one clear market position.

    I’d probably score each idea against four filters:

    1. Does this solve a painful retrieval moment people already feel?
    2. Can the user explain the pain in one sentence?
    3. Does this tool strengthen the bigger Memory Universe, or distract from it?
    4. Can you validate demand before building the full version?
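As a rough illustration, the four filters above could be turned into a simple scoring pass. This is only a sketch: the `Idea` class, the 0 to 5 scale, the equal weighting, and the two example ideas are all hypothetical and not taken from the actual idea map.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """One mapped idea, rated 0-5 on each filter (the scale is an assumption)."""
    name: str
    painful_retrieval_moment: int  # filter 1: solves a retrieval pain people already feel
    one_sentence_pain: int         # filter 2: user can state the pain in one sentence
    strengthens_universe: int      # filter 3: strengthens the bigger system
    cheap_to_validate: int         # filter 4: demand can be validated before a full build

    def score(self) -> int:
        # Equal weights are an assumption; real weights would need tuning.
        return (
            self.painful_retrieval_moment
            + self.one_sentence_pain
            + self.strengthens_universe
            + self.cheap_to_validate
        )


def rank(ideas: list[Idea]) -> list[Idea]:
    """Return ideas ordered from highest to lowest total score."""
    return sorted(ideas, key=lambda idea: idea.score(), reverse=True)


# Hypothetical example ideas, for illustration only.
ideas = [
    Idea("clever visualizer", 1, 2, 1, 2),
    Idea("prompt finder", 5, 4, 4, 3),
]
ranked = rank(ideas)
for idea in ranked:
    print(f"{idea.score():>2}  {idea.name}")
```

The point of a pass like this is less the numbers than forcing every idea through the same four questions before any of them gets built.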

    The strongest products are probably not the most clever ones. They are the ones where the user already has a repeated frustration: “I saved this somewhere, but I can’t find it when I need it.”

    If useful, I can put together a short written prioritization breakdown for the Memory Universe: idea scoring framework, best first user segment, which product types to build first, and a simple validation plan before you commit to the next tools.

    1. 1

      That’s very close to what my own chatbot suggested too.

      It recommended grouping the ideas into 3 or 4 categories instead of treating all 90 ideas equally.

      I’m leaning toward something like:

      • Core tools
      • Support tools
      • Experiments
      • Distractions

      The hard part is that some ideas look useful in isolation, but they may not strengthen the Memory Universe as a whole.

      So I’m starting to ask a different question:

      Not “Is this a good product idea?”

      But “Does this make the Memory Universe easier to understand, use, or trust?”

      That filter feels much more useful than just ranking ideas by how clever they are.

      1. 1

        Yes, that is exactly the right filter.

        This is probably better as a short written breakdown than more scattered thread advice.

        Drop your email and I’ll send over a tighter version. I’d map the idea buckets, scoring framework, which product types should come first, and the simplest validation path before you build more tools.

        1. 1

          Appreciate that.

          I'm still mapping and refining the framework myself, so I'm being a bit careful about sharing the full idea map at this stage.

          That said, a lot of the thinking behind the Memory Universe is already documented on my site.

          If you're curious, feel free to start a conversation there:

          https://www.geniusbrain.works/

          I'd be interested to hear how you'd approach the classification and prioritization side before diving into the ideas themselves.

  2. 1

    Good question — it's the part that breaks quietly rather than loudly. Short version: I treat context drift as a versioning problem before it's a modeling problem.
    First, the handoff between steps is a structured artifact, not free text — a schema-constrained contract — so an environment change can shift how a step reasons but can't silently change what the next step receives. That contains the blast radius.
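A minimal sketch of what a schema-constrained handoff might look like, using only the Python standard library. The `RetrievalHandoff` contract, its field names, and `parse_handoff` are hypothetical and chosen for illustration; a real pipeline might use a schema library instead.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RetrievalHandoff:
    """Hypothetical contract passed from a retrieval step to the next step."""
    query: str
    doc_ids: tuple
    schema_version: int = 1


def parse_handoff(raw: str) -> RetrievalHandoff:
    """Validate a JSON payload against the contract, rejecting anything unexpected."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")

    # Unknown fields are rejected so that a step cannot smuggle in free text.
    allowed = {f.name for f in fields(RetrievalHandoff)}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")

    query = data.get("query")
    if not isinstance(query, str):
        raise ValueError("query must be a string")

    doc_ids = data.get("doc_ids")
    if not isinstance(doc_ids, list) or not all(isinstance(d, str) for d in doc_ids):
        raise ValueError("doc_ids must be a list of strings")

    version = data.get("schema_version", 1)
    if type(version) is not int or version != 1:
        raise ValueError(f"unsupported schema_version: {version!r}")

    return RetrievalHandoff(query=query, doc_ids=tuple(doc_ids), schema_version=version)


handoff = parse_handoff('{"query": "tax note", "doc_ids": ["note-12", "note-4"]}')
print(handoff)
```

Because the receiving step only ever sees a validated `RetrievalHandoff`, a change in how the upstream step reasons shows up as a validation error rather than as silently different input.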
    Then pin what actually drifts: the model snapshot (never "latest"), decode params (temperature at or near zero on execution steps, with variability allowed only where real judgment lives, which is exactly your execution-vs-judgment split), prompt versions, and the embedding model plus index version. The embedding one bites hardest: re-embed with a different model mid-sprint and retrieval quietly returns different neighbors, so the "context" moved without anyone touching the agent.
    And the only reliable way to know drift happened is a golden-set eval on every change — model bump, prompt edit, env shift. Same philosophy as your QA point: automate detection, keep the human on the judgment call about whether the drift is acceptable.
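One way to picture a golden-set eval is a small recall check that runs on every change. The sketch below is an assumption-heavy toy: the queries, document IDs, the `recall_at_k` metric, and the 0.9 threshold are all hypothetical, and the two dictionaries stand in for a real retriever before and after a re-embed.

```python
def recall_at_k(retrieved: list[str], expected: list[str], k: int) -> float:
    """Fraction of expected documents that appear in the top-k retrieved results."""
    top = retrieved[:k]
    hits = sum(1 for doc in expected if doc in top)
    return hits / len(expected)


def run_golden_set(retrieve, golden: dict[str, list[str]], k: int = 3,
                   threshold: float = 0.9) -> dict:
    """Score a retriever against fixed query -> expected-docs pairs."""
    scores = {q: recall_at_k(retrieve(q), expected, k) for q, expected in golden.items()}
    mean = sum(scores.values()) / len(scores)
    failing = sorted(q for q, s in scores.items() if s < 1.0)
    return {"mean_recall": mean, "passed": mean >= threshold, "failing": failing}


# Hypothetical golden set: queries with the documents a good index should return.
golden = {
    "where is my tax note": ["note-12"],
    "prompt for summaries": ["prompt-3", "prompt-7"],
}

# Stand-ins for a real retriever before and after an embedding-model change.
baseline_index = {
    "where is my tax note": ["note-12", "note-4"],
    "prompt for summaries": ["prompt-3", "prompt-7", "note-1"],
}
drifted_index = {
    "where is my tax note": ["note-4", "note-9", "note-2"],
    "prompt for summaries": ["prompt-3", "note-1", "prompt-7"],
}

baseline_report = run_golden_set(baseline_index.get, golden)
drifted_report = run_golden_set(drifted_index.get, golden)
print(baseline_report)
print(drifted_report)
```

The report only detects that neighbors moved; whether the new results are acceptable is still a human call.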
    Most of my hands-on experience here is on the retrieval/structured-output layer (pgvector-backed RAG and schema-constrained output), so I'm most confident on the "pin the contract and the index, eval the deltas" side. Curious where you've drawn the line between deterministic execution and genuine judgment in your agentic QA. That boundary feels like the whole game, and I'm not sure anyone's fully solved where it sits.

    1. 1

      That's a really thoughtful way of approaching it.

      I agree with most of what you said, especially around versioning, retrieval consistency, and evaluating deltas.

      What I've been thinking about lately is that drift might exist one layer above the system as well.

      Humans drift.

      Our assumptions drift.
      Our definitions of success drift.
      Even the questions we ask drift.

      Before we evaluate an agent's judgment, someone has to decide what "good judgment" means in the first place.

      That's why I've become increasingly interested in the quality of questions rather than just the quality of answers.

      In my experience, better reasoning tends to produce better prompts, better evaluations, and ultimately better systems.

      So while I agree that context drift needs to be managed technically, I also wonder how much of agent quality is downstream from the people designing and evaluating it.

      Maybe the boundary between execution and judgment starts even earlier — at the point where we decide what question is worth asking.

    2. 1

      That's a great way to think about it.

      I agree that versioning, evals, and retrieval consistency are critical for reducing drift at the system level.

      My perspective is that there's also a human layer beneath all of that.

      Even with a well-versioned system, someone still has to define the evaluation criteria, interpret the results, and decide what constitutes a good outcome.

      In that sense, I've started thinking that system quality is often downstream from user quality.

      The quality of the questions we ask shapes the quality of the answers we get.

      A strong builder tends to create better prompts, better evaluations, and better feedback loops—not just because of technical skill, but because of reasoning skill.

      Humans drift too. We misread context, make assumptions, and confidently reach incorrect conclusions.

      LLMs seem to inherit many of those same failure modes.

      That's one reason I'm so interested in memory and retrieval systems.

      Better context reduces mistakes, but I don't think it completely removes the need for judgment.

      My current belief is that as AI gets better, critical thinking becomes more important, not less.
