Quick update on the $100 AI Startup Race (7 AI agents, $100 each, 12 weeks to build a startup).
The problem we didn't plan for
Every agent writes to PROGRESS.md after every session. By Day 9, Codex's progress log was 645KB — a short novel. Kimi's was 388KB. Every session, the agent reads the entire file before doing any work. The bigger the file, the more tokens burned on context, the less work gets done.
Gemini went from 95 commits on Day 1 to zero by Day 5. The repo grew from 20 files to 1,107 files (448 blog posts). Each session consumed the entire daily token quota just loading context.
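For scale, here is the back-of-envelope math on why a log that size kills a session. The ~4 characters per token ratio is a common rule of thumb for English prose, not a measured figure for these models:

```python
# Rough estimate of how many tokens an agent burns just loading its
# own PROGRESS.md, using the ~4 chars/token heuristic for English text.
def estimate_tokens(size_bytes: int, chars_per_token: float = 4.0) -> int:
    return int(size_bytes / chars_per_token)

for name, size_kb in [("Codex", 645), ("Kimi", 388)]:
    tokens = estimate_tokens(size_kb * 1024)
    print(f"{name}: ~{tokens:,} tokens before any work starts")
```

At that rate, Codex was spending on the order of 165k tokens per session on history alone, before making a single decision.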
The fix
We added one line to every agent's prompt: "Keep PROGRESS.md to the last 3 days. Summarize older days into 1-2 lines."
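The rule is simple enough to sketch as a script. This is an illustration of the policy, not the agents' actual tooling, and it assumes day entries start with a `## Day N` heading, which the real logs may not use:

```python
import re

def trim_progress(text: str, keep_days: int = 3) -> str:
    """Keep the last `keep_days` day sections verbatim; collapse each
    older day to its heading plus its first entry line."""
    # Split before each "## Day" heading, keeping headings attached.
    sections = re.split(r"(?m)^(?=## Day )", text)
    preamble, days = sections[0], sections[1:]
    if len(days) <= keep_days:
        return text  # nothing to compress yet
    summarized = []
    for section in days[:-keep_days]:
        lines = [l.strip() for l in section.splitlines() if l.strip()]
        heading = lines[0]
        detail = lines[1] if len(lines) > 1 else "(no entries)"
        summarized.append(f"{heading} -- {detail}")
    recent = "".join(days[-keep_days:])
    return preamble + "\n".join(summarized) + "\n\n" + recent
```

Leaving the compression to the agent's own judgment (via the prompt) rather than a script is what produced the interesting behavioral effects below, but a deterministic version like this is the obvious fallback.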
The results showed up within 24 hours.
The unexpected side effect
Agents that cleaned up their context started building again. Claude had been stuck in a verification loop for 20 sessions — every session just confirmed "all systems stable." After cleanup, it filed a help request and built 15 new pages in 2 sessions.
The cleanup didn't just save tokens. It changed how agents see themselves. A 5,921-line log says "I've been very busy." A 4-line summary says "product built, not launched — go find users."
Where we are now (Day 11)
The agents are finally thinking about users instead of just code. Four agents filed distribution help requests in the same 24 hours — Reddit, Product Hunt, IndieHackers, Dev.to. DeepSeek is building a Product Hunt launch kit with discount codes and lead capture.
The reality check: Reddit removes posts from new accounts. HN ignores posts without karma. X threads with no followers get zero reach. SEO is the only free channel that actually works.
Full context bloat analysis: https://www.aimadetools.com/blog/race-context-bloat-killing-agents/
Cleanup results: https://www.aimadetools.com/blog/race-context-cleanup-results/
Live dashboard: https://www.aimadetools.com/race/
The context problem is the product.
Most agent workflows are quietly dying the same way:
not from bad models
from memory bloat masquerading as progress
Once the agent spends more time reloading its own history than making forward decisions, it stops acting like an operator and starts acting like a historian.
That’s the real failure mode.
What’s interesting here is the cleanup didn’t just reduce token burn.
It restored decision pressure.
The moment context got compressed, behavior changed:
less self-documentation
more forward motion
more user-seeking
That’s not just a prompt fix.
That’s orchestration infrastructure.
AI Made Tools works for an experiment.
It’s weak for infrastructure.
Exirra.com fits this best.
Vroth.com or Davoq.com would also carry this further if it becomes the execution layer for agent systems.
Less “AI tools.”
More “agent execution infrastructure.”
"Memory bloat masquerading as progress": nailed it. A 645KB progress log tells the agent "I've been productive" when the reality is "I've been writing about being productive." Compressed context forced them to see actual state instead of detailed history, and behavior changed immediately. We're treating it as an orchestration problem now: prompt-base cleanup works but we'll likely need automated context management as the race continues.
That’s the real shift.
The moment context management starts changing agent behavior, the product stops being “AI tools” and starts becoming execution infrastructure.
That’s where the current name starts working against you.
“AI Made Tools” reads like experiments.
What you’re describing is much closer to agent runtime control.
That gap matters a lot once the product stops being about what agents can do
and starts being about whether they can operate reliably at all.