From Delivery Rider to Building My First AI System — Here's My Story

by Ammorick

Hi everyone, I'm a self-taught developer from China. I work as a delivery rider from 8 PM to 8 AM, and I code in every spare moment in between.

In June 2025, I was lying in bed after a shift, scrolling through videos, when I came across a course on AI Agents. The structure was: receive input → call tools → output → memory. And a question popped into my head:

"Why can't we have two AI brains talking to each other, cross-validating information?"

Along with that, a whole string of ideas followed. And I had this strange feeling — I don't know why, but it felt like this was what I was supposed to do with my life.

I didn't know how to write a single line of code back then. But the thought never went away.

On May 24, 2026, I decided to start learning to code. With my night shifts, I have about 4 hours a day for self-study, so I started building. Two weeks later, I had a working MVP:

Multi-expert parallel execution: simultaneously calling medical and legal "expert brains" (powered by LLM APIs) using ThreadPoolExecutor

Safety brain: input + output filtering, with a blacklist and whitelist loaded from an external JSON file, plus separate violation logs

Memory module: persistent conversation history (JSON), keeping the last 20 turns, with support for "recall what we talked about last time"

Director brain: aggregates responses from multiple experts into a clear, user-friendly final answer

Engineering habits: environment variable management, modular design (dispatcher/summarizer/memory/safe/safety_logger), requirements.txt, etc.
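
The memory module in the list above could look roughly like the sketch below. The file layout, the function names (`load_history`, `append_turn`) and the shape of a turn are assumptions made for illustration, not the code in the repo. The only details taken from the post are JSON persistence and the 20-turn limit.

```python
import json
from pathlib import Path

MAX_TURNS = 20  # the post keeps the last 20 turns


def load_history(path: Path) -> list[dict]:
    """Return saved turns, or an empty list if the file does not exist yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def append_turn(path: Path, user: str, assistant: str) -> list[dict]:
    """Add one turn, trim to the most recent MAX_TURNS, and write back."""
    history = load_history(path)
    history.append({"user": user, "assistant": assistant})
    history = history[-MAX_TURNS:]  # older turns fall off the front
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file
        json.dump(history, f, ensure_ascii=False, indent=2)
    return history
```

Because the whole history is reloaded on each call, "recall what we talked about last time" works across restarts. The cost is one small file read per turn.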

This sketch was drawn after the MVP was running, to map out the next steps.
[Image: draft sketch]

Today, my MVP has these capabilities:

Multiple experts (medical, legal, strategy) running in parallel, each powered by a different API — Zhipu, Aliyun, and OpenRouter (which gives flexible access to multiple models through a single gateway)

Multi-round debate — they can agree, disagree, or supplement each other, all within a shared memory framework

Memory — it remembers the last 20 turns, so context carries over

Safety — input and output filtering with violation logs (there's still a substring-matching false-positive issue, but it doesn't affect the core value of the MVP, so I'm leaving it for now)
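
One common way to reduce substring false positives is to blank out whitelisted phrases before checking the blacklist, so a blocked term that only appears inside an allowed phrase no longer matches. The sketch below shows that idea. The function name `find_violations` and the example terms are hypothetical, and this is not necessarily how the repo's safety module works.

```python
def find_violations(text: str, blacklist: list[str], whitelist: list[str]) -> list[str]:
    """Return blacklisted terms found in text, ignoring whitelisted phrases."""
    lowered = text.lower()
    # Blank out whitelisted phrases first so a blacklisted term that only
    # occurs inside one of them cannot match. Longest phrases go first so
    # that overlapping entries are handled predictably.
    for phrase in sorted(whitelist, key=len, reverse=True):
        if phrase:  # skip empty entries, which would match everywhere
            lowered = lowered.replace(phrase.lower(), " ")
    return [term for term in blacklist if term and term.lower() in lowered]
```

With a hypothetical blacklist of `["kill"]` and whitelist of `["painkiller"]`, a question about painkillers passes, while a sentence that uses "kill" on its own is still flagged. This works on plain substrings, so it does not depend on word boundaries and applies to Chinese text as well.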

Test case:

"I have a cold. I can't understand the medication instructions from the doctor, and my company won't approve my sick leave. What should I do?"

This triggers all four experts at once: medical, legal, and strategy, plus the general expert as a fallback. Each expert sees the full conversation history from previous rounds, so they can build on each other's arguments or push back where they disagree.
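
A minimal sketch of that flow, with parallel experts and a shared transcript across rounds, is shown below. The `Expert` signature, `run_debate`, and the `make_stub` stand-in for a real LLM API call are all assumptions for illustration. Only the use of `ThreadPoolExecutor` comes from the post.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# An expert takes the question plus the shared transcript and returns a reply.
Expert = Callable[[str, list[str]], str]


def run_debate(question: str, experts: dict[str, Expert], rounds: int = 2) -> list[str]:
    """Run all experts in parallel each round; every round sees earlier replies."""
    transcript: list[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(experts))) as pool:
        for _ in range(rounds):
            snapshot = list(transcript)  # same view for every expert this round
            futures = {
                name: pool.submit(fn, question, snapshot)
                for name, fn in experts.items()
            }
            # Collect in a fixed order so the transcript is deterministic.
            for name, future in futures.items():
                transcript.append(f"{name}: {future.result()}")
    return transcript


def make_stub(label: str) -> Expert:
    """Hypothetical stand-in for a real LLM API call."""
    def expert(question: str, seen: list[str]) -> str:
        return f"{label} view after reading {len(seen)} earlier replies"
    return expert
```

Passing each expert a snapshot rather than the live list means no expert sees a half-finished round, which keeps the rounds cleanly separated.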

Running the test case — medical, law, and strategy experts responding simultaneously (general expert triggered as fallback).

It's far from perfect. The director brain's output is still just simple concatenation. Response speed is slow. The timeout logic hasn't been refined yet. All of these need work. But the core functionality works — the rest is just time.
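
For the timeout logic, one possible refinement is a single shared time budget, so that a slow expert cannot hold up the others. The sketch below assumes each expert is a plain function from question to answer. `ask_with_timeout` and the placeholder strings are hypothetical names, not the repo's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def ask_with_timeout(
    question: str,
    experts: dict[str, Callable[[str], str]],
    timeout_s: float,
) -> dict[str, str]:
    """Query experts in parallel; slow or failing ones get a placeholder reply."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(experts)))
    futures = {name: pool.submit(fn, question) for name, fn in experts.items()}
    deadline = time.monotonic() + timeout_s  # one shared budget for all experts
    answers: dict[str, str] = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            answers[name] = future.result(timeout=remaining)
        except FutureTimeout:
            answers[name] = "(no answer: timed out)"
        except Exception as exc:  # e.g. a network or API error
            answers[name] = f"(no answer: {exc})"
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return answers
```

A thread that is already running cannot be interrupted from outside, so the HTTP client used for the real API call should also have its own request timeout. `cancel_futures` requires Python 3.9 or newer.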

The code is here: https://github.com/ammorick/ai-learning-journey/tree/main/Code update/2026-06-19

I also keep a "Museum of Ideas" — a separate repo where I capture the bigger, longer-term directions I'm not ready to build yet: multi-brain dialogue, AI that helps design new AI brains, safety as the underlying "ground" instead of a fence around the system. Some of these may take years. Some may never happen. But they're my starting point, and they're my North Star.

https://github.com/ammorick/Future-exploration-direction.to-be-sorted

It's not polished. It's still rough around the edges. But it runs, and I'm iterating on it every day.

I've written a couple of articles about this journey — in Chinese for now, but the code and the process are universal:

From Single Expert to Multi-Expert Debate — a technical recap of the core mechanism

Night-Shift Delivery Rider, Coding in Spare Time — the origin story and overall architecture

While testing, I realized something: to an end user, this system probably looks just like any other Q&A bot — even though the underlying logic has completely changed.

But I'm not worried about that right now. I believe in focusing on what I do best. As a developer, my energy should go into the technology itself: improving my skills, sharpening my judgment, and refining the system architecture. User feedback matters — but that's for a different stage. Right now, presenting the project as a work-in-progress is itself a form of demonstration. What people who know this space will see is the thinking process and the judgment behind it, not just the final output.

I don't know if what I'm building will ever be "good enough" for the market. But I choose to trust myself. Maybe it won't pass the market test, but it's already proof of my capabilities. And maybe that will open other doors further down the road.

One thing I always keep in mind: there is no tomorrow without today. Only by putting in the work every single day can a real tomorrow arrive. I'll keep building this project — unless it gets crushed by the wheel of history.

I'm new to this field, and my fundamentals are still weak — I'm working on them. So I'm here to learn, and also to share my progress.

I'd love to hear your thoughts — what would you add to a system like this?

Thanks for reading.

Ammorick

on June 19, 2026


This is seriously inspiring, Ammorick!
A delivery rider working 12-hour night shifts and still building a multi-expert AI system with parallel brains and safety layers in your spare time? That’s next-level dedication.
Respect for putting in the work every single day. Keep going man, this kind of persistence pays off.
Wishing you the best on your journey 🙌

Chanchalsaini200 · 9 hours ago
  
  Really appreciate your kind words and well wishes!
  Squeezing in time to code after every night shift has definitely been tough, but watching the project take shape and the architecture come together piece by piece makes it all worth it. I’ll keep putting in the work steadily. Thanks so much for the support 🙌
  
  ufor · 8 hours ago

This is a genuinely impressive start, especially given the constraints
you're working under. The honest observation that users can't see the
difference yet is actually one of the more mature things I've read
from someone this early in their journey — most people either oversell
or get discouraged by it. Curious: as you move toward phase two
(making it meaningful to end users), do you have a specific problem
domain in mind, or are you still exploring where the multi-expert
debate format adds the most real-world value?

0io · 13 hours ago
  
  That's a really good question — and honestly, something I've been turning over in my mind for a while now.
  
  I haven't locked in on a specific vertical yet. But from what I've observed so far, the multi-expert debate model delivers the most value in two types of scenarios:
  
  The first is cross-disciplinary problems. Take "sick leave request rejected by the company" — a medical expert and a legal expert see it completely differently. Asking only one side leaves blind spots. The value of the multi-expert model here is stitching together complete information from different dimensions, rather than having a single model give a vague, compromised answer.
  
  The second is high-stakes decisions. When the accuracy of the answer directly affects the final choice — like medical references or contract interpretation — having multiple experts cross-check each other and clearly flag points of disagreement is far more reliable than just throwing out a single conclusion.
  
  I also agree that not every conversation needs to trigger the multi-expert mode. I've planned a tiered routing logic (essentially a "Director Brain" acting as the overall orchestrator) to determine which questions are worth the multi-expert debate and which can be efficiently handled by a single model. That part hasn't been implemented yet, though.
  
  My personal take is that this model has even greater potential in academic research, complex problem analysis, and cross-domain discussions — it might even generate new perspectives that differ from conventional human thinking. Once the system matures, the cross-validation mechanism among multiple experts could also effectively reduce AI hallucination. At that stage, for questions that truly fit the multi-expert model, users should be able to feel a clear difference from general-purpose AI.
  
  So my current strategy is clear: stabilize the core system, make it flexible, then test it against a few real-world scenarios. It could be a medical + legal + workplace strategy combination, or something else. Which domain it eventually settles in — I'll verify that through iteration.
  
  In the longer term, if this project goes further, I hope it evolves toward something closer to "multiple brains autonomously interacting and generating new perspectives" — that's actually one of the core motivations behind why I started this in the first place. But that's a longer conversation. First, I need to get the foundation solid.
  
  Curious — have you ever come across product scenarios that needed multi-perspective answers? If you've worked on anything similar, I'd really be interested in how you thought about positioning its value.
  
  ufor · 8 hours ago

The part that caught my attention wasn't the multi-expert system.

It was the possibility that the thing you've spent the most time improving may not be the thing that ultimately determines whether users perceive a difference.

Those can overlap.

But they don't always.

That's the part I'd be most curious about.

aryan_sinh · 17 hours ago
  
  Yes, I honestly think this is one of the core pain points for many indie developers. At its heart, it's about audience positioning — I believe that before starting to build, an indie developer needs to be clear about who their audience is, do the corresponding market research, and then build with that audience in mind.
  
  For my project, the target users I've chosen are not end users. They're the people who look at code, architecture, and the decision-making logic behind it — the ones who can determine if the foundation is solid enough to build on, and whether to give someone like me a chance to step into the industry.
  
  But I'm also keeping future possibilities open. In this project, my current phase is about getting the core functionality right. The next phase will be about making it meaningful to end users. I'm still in phase one.
  
  I don't think this is the "right answer" for every project. It's just a bet I'm making: that getting the core architecture right first will pay off in the long run, even if the early version looks unremarkable from the outside. But for where I am right now, the tangible improvement in my own skills is certain. And that might open up more possibilities down the road.
  
  This is also why I chose to post here on Indie Hackers, instead of pushing it directly to the market — because people here can read between the lines of the underlying logic, whereas typical end users care more about the final experience and usually don't pay attention to the value of what's underneath.
  
  Thanks for the question — it's a good one, and I've been reflecting on it for a while.
  
  ufor · 16 hours ago
    
    That's actually the part I found most interesting in your reply.
    
    A lot of people assume every project has to optimize for market validation immediately.
    
    What you're describing sounds closer to capability validation.
    
    Those can produce very different decisions about what gets built, what gets ignored, and how success gets measured.
    
    I suspect a lot of the disagreement around projects like this comes from people evaluating them against a goal they weren't originally designed to serve.
    
    aryan_sinh · 13 hours ago
      
      You put it very precisely — clearer than I could have expressed it myself.
      
      "Capability validation" vs "market validation" — this is a distinction I've always felt but never had the right words for. And the choice between these two, in essence, comes back to a more fundamental question: value is relative — it depends on who's measuring it. What's valuable to an end user might not matter much to someone trying to assess your technical judgment, and vice versa.
      
      I think a lot of the confusion and disagreement around early-stage projects comes from people using one framework to judge work that was built under a different one.
      
      For me, this choice is intentional. I don't have the bandwidth to run both tracks in parallel right now, so I've decided to focus on capability validation first — I believe that when I eventually switch to the market validation phase, the foundation I've built here will make that track much faster.
      
      Have you seen this tension play out often in projects you've come across? I'm curious — is there a pattern in which choice founders or developers at different stages tend to make?
      
      ufor · 8 hours ago