AI runs 70% of my distribution. The exact stack.

TL;DR: Four months. Six AI distribution stacks. $400/month at peak. Zero signups attributable to any of them. Then I opened a spreadsheet, listed every distribution action I take in a typical week (40 rows), and labeled each one "AI can do this" or "AI cannot do this without killing the channel." 28 rows went to AI. 12 stayed mine. Output doubled, cost dropped to $19/month, and the 12 I kept are the only reason any of this converts.

The $1,600 mistake

For four months I bought every "AI distribution" promise on the timeline. Apollo for lead enrichment. Lemlist for sequences. An autonomous agent platform that prospects while you sleep. An LLM lead-scorer. An AI scheduler. A separate AI for founder voice replication.

Six stacks. $400/month at peak. Zero measurable signup lift on flowly.run, the productivity tool I ship for freelancers and solo founders.

I knew exactly when to stop. I had paid more for AI growth tools that month than I had collected in MRR from them.

The spreadsheet that fixed it

One evening I closed every dashboard and opened a spreadsheet. I listed every distribution action I had taken that week. The list ran 40 rows. Then I labeled each row with one of two tags.

  • AI can do this. Scanning, scoring, drafting, scheduling, summarizing.
  • AI cannot do this without killing the channel. Selection, tone, decision, customer reply, cold outreach.

28 rows ended up in the first column. 12 in the second. The 12 were the only rows that had ever produced a paying customer.

I had been paying $400/month to automate the 70% of distribution work that does not convert and ignoring the 30% that does.

The 12 I cannot delegate

These will look small. They are the entire engine.

  1. The final 30% edit of every shipped reply.
  2. Picking which 1 or 2 of the daily AI-generated drafts are actually worth shipping.
  3. Founder positioning calls on what stance to take this month.
  4. Replying to any customer DM or email. Every one. Always.
  5. The hand-edit of the opener and closer of any long-form post.
  6. Cold outreach to specific named humans (creators, journalists, podcasters).
  7. Any reply that names another person by handle.
  8. Comments under my own posts.
  9. Any time I am positioning against a named competitor.
  10. Any pitch that requires reading 3+ pieces of the recipient's prior work.
  11. The choice of which channel to retire when a metric drops.
  12. The closing line of every reply. AI cannot land a closer.

Each of these failed when I delegated it. Three of them cost me actual customers.

The list that makes the rest of the stack work

The single most valuable file in my distribution setup is not a prompt. It is a one-page "never-write" list. New rules show up every week. Each rule came from reading a bad AI draft and tagging the exact line where my voice broke.

Sample lines:

  • Never start a reply with "Great question."
  • Never use "leverage," "unlock," "synergy," "in the trenches."
  • Never end a comment with a question if the post is already over 60 words.
  • Never name the product in the first 80% of any reply.
  • Never quote a statistic without naming the source.
  • Never use an emoji unless quoting someone else.
  • Never write a sentence over 25 words in a draft for X or HN.
  • Never close with "Hope this helps."

Drafts produced with this list need a 30% edit. Drafts produced without it need a 70% edit, and at that point I am writing from scratch. The never-write list is the entire economic difference between "AI saves me time" and "AI costs me time."
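
A list like this can also be checked mechanically before a draft reaches the edit pass. The following is a minimal sketch, not the script used in this stack: the names `lint_draft`, `BANNED_PHRASES`, and `MAX_SENTENCE_WORDS` are hypothetical, and it encodes only four of the sample rules above.

```python
import re

# Hypothetical encoding of a few rules from the sample list above.
BANNED_PHRASES = ["leverage", "unlock", "synergy", "in the trenches"]
MAX_SENTENCE_WORDS = 25


def lint_draft(draft: str) -> list[str]:
    """Return human-readable rule violations found in a draft."""
    violations = []
    text = draft.strip()
    lowered = text.lower()

    if lowered.startswith("great question"):
        violations.append('starts with "Great question"')

    if lowered.rstrip(".!").endswith("hope this helps"):
        violations.append('closes with "Hope this helps"')

    for phrase in BANNED_PHRASES:
        # Word-boundary match, so "unlocked" does not trip the "unlock" rule.
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            violations.append(f'uses banned phrase "{phrase}"')

    # Crude sentence split on end punctuation; adequate for short replies.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > MAX_SENTENCE_WORDS:
            violations.append(f"sentence over {MAX_SENTENCE_WORDS} words")
            break

    return violations
```

A check like this only catches the rules that can be written as patterns. Rules about tone or positioning still need the human edit.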

This is what the 2026 vocabulary now calls context engineering at indie scale. You are not prompt-engineering. You are designing the information environment around the model so it can do the mechanical 70% and stay out of the 30% that needs skin in the game.

The $19 stack

Five scripts. About 200 lines of Python total. Cron-driven. Slack and email for human approvals.

  1. Inbox monitor. Polls Gmail IMAP for journalist platform queries every 2 hours. Each query gets scored by Claude Haiku against my 7 stances. Ranked list lands in Slack. I read it in 90 seconds and pick 0 to 2.
  2. Thread scanner. Pulls HN /front and /newest, my X home, my Bluesky timeline 3x daily. Returns top 5 candidates with a one-line "why this thread" and a suggested reply angle.
  3. Draft pack generator. End-of-day script produces 5 ready-to-edit drafts across 5 channels using my founder voice doc plus the never-write list. I edit roughly 30% of every draft before shipping.
  4. Daily digest. Pulls Umami plus Flowly analytics. Emails a 9pm summary of what shipped, what landed, what converted. 2 minutes to read.
  5. Pitch responder. Drafts journalist replies, 60-word constraint, founder voice. Always blocks on my one-click approval. Always.

Total API spend: about $19/month. Drifts to $24 in heavy weeks. Founder hours pulled back into product work: about 14 per week.
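
The scoring scripts share one shape: score every candidate, drop the weak ones, hand a short ranked list to a human. Here is a rough sketch of that shape, under stated assumptions. `Candidate`, `top_candidates`, and `keyword_score` are hypothetical names, and the keyword scorer is a stand-in for the model call, which is injected as a plain function so the ranking logic can run without any API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    source: str  # e.g. "hn", "x", "bluesky"
    title: str
    url: str


def top_candidates(
    candidates: list[Candidate],
    score: Callable[[Candidate], float],
    limit: int = 5,
    min_score: float = 0.0,
) -> list[tuple[float, Candidate]]:
    """Score every candidate, drop weak ones, return the best `limit`."""
    scored = [(score(c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]


def keyword_score(candidate: Candidate) -> float:
    """Placeholder scorer: counts hypothetical stance keywords in the title."""
    keywords = ("freelancer", "time tracking", "solo founder")
    title = candidate.title.lower()
    return float(sum(1 for k in keywords if k in title))
```

Swapping `keyword_score` for a function that asks a model for a 0 to 10 relevance score leaves the rest unchanged. The human still picks from the ranked list.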

The day I almost killed a story

The pitch responder ran on auto-send for 8 days in March. I had set a low-confidence threshold and trusted the queue. A journalist's follow-up questions to my original pitch went unread for 5 days while the script regenerated polite boilerplate. She stopped replying around message 4. The story she eventually ran skipped Flowly.

The fix that night: every outbound message blocks on my one-click approval. No exceptions. No thresholds. No "low-risk" auto-send buckets.

The cost of that 8-day mistake was one piece of press. The cost of leaving the auto-send running would have been all of them.

This is the version of context engineering nobody publishes. The rule is not "what should the model see." The rule is "what does the model see that, if it gets one thing wrong, ends a relationship the model cannot rebuild."
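
The approval rule can be enforced in code by making approval the only path to the sender. This is a minimal in-memory sketch, assuming a hypothetical `ApprovalQueue` class and an injected `send` function. A real version would persist the queue and wire `approve` to a Slack button or email link.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Outbound:
    recipient: str
    body: str
    status: Status = Status.PENDING


class ApprovalQueue:
    """Holds drafts until a human approves them; there is no auto-send path."""

    def __init__(self, send: Callable[[str, str], None]):
        self._send = send  # injected sender, e.g. an email function
        self._items: list[Outbound] = []

    def submit(self, recipient: str, body: str) -> int:
        """Queue a draft and return its id. Nothing is sent here."""
        self._items.append(Outbound(recipient, body))
        return len(self._items) - 1

    def pending(self) -> list[int]:
        """Ids of drafts still waiting for a decision."""
        return [i for i, m in enumerate(self._items) if m.status is Status.PENDING]

    def approve(self, item_id: int) -> None:
        """Send one draft. This is the only method that calls the sender."""
        item = self._require_pending(item_id)
        self._send(item.recipient, item.body)
        item.status = Status.APPROVED

    def reject(self, item_id: int) -> None:
        self._require_pending(item_id).status = Status.REJECTED

    def _require_pending(self, item_id: int) -> Outbound:
        item = self._items[item_id]
        if item.status is not Status.PENDING:
            raise ValueError(f"message {item_id} is already {item.status.value}")
        return item
```

The design choice is that there is no confidence threshold to tune. A draft that nobody approves stays in `pending()` and is never sent.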

Where Flowly fits

The reason I noticed the 12-versus-28 split at all is that I had been timing every distribution action inside the product I ship. Flowly is a single workspace for tasks, timers, and analytics for freelancers and solo founders who are tired of running four separate apps to answer one question: where did my week actually go. I had been running my own distribution inside it, the same way a freelancer tracks billable client work. The spreadsheet that started this rebuild came out of Flowly's analytics, not my head.

The lesson generalizes past the tool. If you cannot see the line between what AI did and what you did, you cannot price either one honestly. AI runs distribution. Flowly tells me whether running it was worth the 95 minutes a day I still own.

One ask, one bet

Do the 40-action sort before you buy another AI tool. List every distribution action you take in a week. Label each row "AI can" or "AI cannot without killing the channel." Then make the second column the only thing you spend founder hours on.
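
If the spreadsheet is exported as CSV, the split takes a few lines to compute. This sketch assumes a hypothetical layout with an `action` column and a `label` column holding `ai` or `human`. The function name `split_report` is likewise made up for illustration.

```python
import csv
import io
from collections import Counter


def split_report(csv_text: str) -> dict[str, tuple[int, float]]:
    """Map each label to (row count, share of all rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["label"].strip().lower() for row in reader)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rows found")
    return {label: (n, round(n / total, 2)) for label, n in counts.items()}
```

For a 40-row sheet with 28 rows labeled `ai` and 12 labeled `human`, this returns `{"ai": (28, 0.7), "human": (12, 0.3)}`.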

Mine is 28/12. Post yours in the comments. If anyone has a real 90/10 split working, with attribution that holds up, I will rebuild my stack to match. I want to be wrong about this. So far nobody has been.

Product: flowly.run. Free tier, 14-day reverse Pro trial, no card.

Most indie hackers who buy a $400 AI growth stack are paying to automate the 30% that converts and ignoring the 70% that bores them. They have it exactly backwards.

on May 20, 2026
  1. 1

    Read this with my coffee growing cold. The 28/12 ratio is almost exactly what I land on every time I do the same exercise, and the auto-send story is the specific bullet I dodged twice this year, both times by luck.

    The row I'd add to "cannot delegate": choosing which old thread to revive vs let die. The model finds candidates fine. It's terrible at the "is this still relevant 8 days later" call. I lost a real conversation last month because a draft sat in my approval queue too long. By the time I sent the polished reply, it landed as thread necromancy.

    Your never-write list is the part of this post I keep re-reading. The rule I keep adding to mine: "never write a sentence that combines two strong claims into one." The model loves rhetorical stacking and you can smell it the second you read it back.

    One real question: is the 30% edit measured in words changed, or in time spent vs writing from scratch? Those drift apart fast for me on long-form.

  2. 1

    The multi-tool problem nobody talks about: using Claude Code + Cursor + Copilot on the same project.

    Each one starts fresh. Each one has different defaults. Each one will make a different decision about the same architectural question — and none of them will tell you they disagree with the others.

    Six months in, the codebase reflects three different opinions about how to structure the same thing.

    The fix: a CLAUDE.md file (works as Cursor rules too) that defines the non-negotiables before any tool touches the code. Stack, patterns, what's forbidden. All tools read the same source of truth.

    It's not about which tool is better. It's about making them agree with each other.

    Anyone else running multiple AI tools on the same codebase? How do you keep them consistent?

  3. 1

    The never-write list is the most underrated part of this. Everyone talks about prompts. Nobody talks about constraints. But constraints are what separate a draft that ships in 30% edit time versus one that has to be rebuilt from scratch.

    The 28/12 split also maps to something I've noticed: AI excels at work where the output is reviewable in 10 seconds. If it takes longer to evaluate whether the AI did it right than to just do it yourself, you haven't gained time — you've just moved the bottleneck.

    The March journalist story is the real lesson buried in here. Auto-send is never low-risk. One relationship lost to boilerplate is never recoverable. The human approval gate isn't friction — it's the product.

  4. 2

    Great breakdown. If you were starting over from scratch, what's the one thing you'd do earlier?

    1. 1

      Written the never-write list. I had a voice doc for months that only said what to do. Drafts were 70% wrong. The day I added the never-write section, drafts became 70% right. That one page is worth more than any model upgrade. Start it on day one.

  5. 2

    Can you make this concrete with one real example? I'd find it way more useful to see exactly what the AI does start to finish for a single HN comment that ships, including where you step in. The high-level pipeline makes sense, it's the actual handoffs I can't picture.

    1. 1

      Cron at 09:00, 13:00, 17:00 pulls top 30 threads from HN /front and /newest. Haiku scores each thread 0-10 for relevance to my 7 stances and returns the top 5 with a one-line "why this thread" and a suggested reply angle. I read the 5 in 60 seconds and pick 1. Sonnet then generates 2 draft comments for that thread using my voice doc plus the thread context plus the chosen stance. I read both drafts, pick the better one, hand-edit the opener (always), hand-edit the closer (always), ship.

      AI did about 4 minutes of work across the whole flow. I did about 6. The comment ships in 10 total minutes versus 25-30 if I were doing it fully manual. Multiply that 60% time saving across 5 channels and that is where the 14 hours per week of pulled-back founder time comes from.

  6. 2

    Genuine pushback here. Every distribution post I read measures output volume and engagement, then quietly assumes that becomes signups. Have you actually tied this stack to real signups, or is it outputs and vibes? If you have numbers I'd love to see how you attribute them, because organic attribution is notoriously messy and I'd rather hear the honest version with the gaps than a tidy funnel chart.

    1. 1

      Both, honestly. Signups are the lagging metric I care about most and the one most resistant to clean attribution because organic distribution has a 30 to 90 day delay between first touch and conversion.

      What I can measure: weekly output count, channel-level engagement (replies, upvotes, click-throughs to flowly.run), referrer reports from Umami, signup rate from each referrer over a 30-day window. The "hybrid-output conversion roughly 10x AI-only conversion" split I implied in the post comes from looking at click-to-signup on referrers I can tag and from comparing drafts shipped at 30% edit versus drafts shipped at 0% edit during one bad week.

      What I cannot measure: the long-tail compound effect of consistent presence. A founder who saw 6 of my HN comments over 3 months and then signed up via a direct visit shows up as "organic, no referrer." That is the bulk of my signups. I assume the volume helps. The numbers agree but do not prove it.

      If you build this stack, set up Umami or PostHog before you start, tag every link with UTM params, and accept that you will be flying half-blind on the long tail. That is the nature of organic distribution at this scale. Anyone selling you cleaner attribution is selling you fiction.
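
      Tagging links by hand is easy to forget, so a small helper can do it. This is a sketch using only the Python standard library. The function name `tag_link` and the default `utm_medium` and `utm_campaign` values are assumptions for illustration, not values from this stack.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url: str, source: str, medium: str = "social",
             campaign: str = "organic") -> str:
    """Return `url` with UTM parameters added, keeping any existing query."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    # UTM values overwrite same-named keys already present in the URL.
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunsplit(parts._replace(query=urlencode(query)))
```

      For example, `tag_link("https://flowly.run/", "hackernews")` yields `https://flowly.run/?utm_source=hackernews&utm_medium=social&utm_campaign=organic`.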

  7. 2

    Really want to try this but I'm not a developer, no Python and definitely no Playwright. Is there a realistic version of this for non-technical founders, or is that a hard requirement? Would love to know where someone like me should even start.

    1. 1

      Start with two scripts, not five. The two with the highest leverage are (1) a daily analytics digest that summarizes your traffic into one email and (2) a draft pack generator for one channel only. Pick the channel that costs you the most time per output.

      You can build both in a weekend with Claude as your pair. The non-technical version is the same flow in n8n or Make.com. The pipeline matters more than the runtime. The hardest part is the voice doc, and that one you write by hand regardless.

  8. 2

    The journalist failure made me wince, so thanks for writing it up instead of pretending the stack just works. That's the useful part of these posts. Has anything else broken in a similar way since you patched it? Trying to get a realistic sense of the failure surface before I build my own version.

    1. 1

      Yes, smaller scars.

      One: the thread scanner once recommended a Bluesky thread for engagement. I shipped a reply. The thread turned out to be a quote-post of a tragedy. I deleted within 4 minutes but a few people saw it. Now the scanner has a "sensitive content" pre-flag and the day's top 5 candidates skip anything tagged.

      Two: the draft pack generator produced a reply that paraphrased a competitor's marketing copy almost word-for-word. I caught it because the closer was uncharacteristically smooth. The fix was a new line in the never-write list: never use any phrase that sounds like it was already used by a SaaS landing page.

      Three: the inbox monitor once scored a podcast booking request as 2/10 because the founder voice doc did not include a "podcast guesting" stance. I missed the email. The fix was adding an eighth stance, then immediately collapsing it back to seven by merging two adjacent ones.

      The pattern is the same: every failure produced one line of context engineering. The stack is mostly the accumulated failures of the founder, written down.

  9. 2

    Has any platform actually flagged or deboosted your AI-assisted posts? That's honestly the one thing stopping me from setting this up.

    1. 1

      Not that I can detect. The 2026 detection heuristics target unedited LLM output with characteristic structural tells (em dash density, three-bullet conclusions, "Here is the thing about X" openers). The 30% human edit removes those tells. The posts that get traction are always the ones where my edit pass adds a number, a specific personal example, or a contrarian line the model did not generate.

  10. 2

    Maybe I missed it, but you reference these 7 stances over and over and never actually show what one looks like. That's genuinely the part I clicked through for. Can you break down the format of a single entry? And I'm curious why 7 specifically, since that feels oddly precise compared to just picking 5 or 10.

    1. 1

      The doc is one page. Seven entries. Each entry has the same four fields.

      • Name. Two to four words. Mine include "single-tool stack undervalued," "AI removed design blocker not speed," "distribution is a feedback-signal problem," "founder voice is the asset."
      • First sentence template. The opener I have used enough times that I can identify the stance from the first 8 words.
      • Three bullets. The atomic claims this stance makes. Each bullet must be a complete idea, not a header.
      • One example reply that nailed it. A real comment or post of mine, copied verbatim. The example is the calibration target for the model.

      The stance doc plus the never-write list is the entire context the LLM gets per task. Total prompt header: about 1,200 tokens. Per-request input adds another 300-800 depending on channel. Output 200-400. Cheap.

      The reason 7 works and 15 does not: I cannot hold 15 stances in my head consistently. The model can. But the human approval step at the end will fail if I cannot recognize my own stance in the draft. Seven is the ceiling of what I can recognize at a glance.
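
      The four-field entry maps naturally onto a small data structure. This is one possible sketch, not the actual doc format: the `Stance` class, its field names, and the `render_header` layout are all hypothetical. The only rule it enforces from the description above is that each stance carries exactly three claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stance:
    name: str                # short label for the stance
    opener_template: str     # first-sentence template
    claims: tuple[str, ...]  # the three atomic claims
    example_reply: str       # one real reply, copied verbatim

    def __post_init__(self):
        if len(self.claims) != 3:
            raise ValueError(f"{self.name!r} needs exactly three claims")
        if not all(claim.strip() for claim in self.claims):
            raise ValueError(f"{self.name!r} has an empty claim")


def render_header(stances: list[Stance], never_write: list[str]) -> str:
    """Build a plain-text prompt header from stances plus never-write rules."""
    lines = []
    for s in stances:
        lines.append(f"STANCE: {s.name}")
        lines.append(f"Opener: {s.opener_template}")
        lines.extend(f"- {claim}" for claim in s.claims)
        lines.append(f"Example: {s.example_reply}")
        lines.append("")
    lines.append("NEVER WRITE:")
    lines.extend(f"- {rule}" for rule in never_write)
    return "\n".join(lines)
```

      Keeping the doc as data means the same seven entries can feed both the scoring prompt and the drafting prompt without copy-paste drift.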

  11. 2

    Solid post, but I'm stuck on this part. I've used Apollo and Lemlist and they already handle outreach fine, so why hand-roll Python scripts for it? Genuinely asking what they were missing, because maintaining your own stack sounds like real overhead for a solo founder.

    1. 1

      I did use them. For 4 months. They failed for the same reason most "AI distribution" SaaS products fail at indie scale: they are optimized for outbound B2B SDR workflows where the slow 30% is "personalized intro line" and the bulk 70% is "send 100 emails per day." My slow 30% is "decide whether this is worth shipping at all," which no SaaS exposes as a step. Those tools assume you already decided. I had not.

      The Python stack is about 200 lines total across the 5 scripts. It gives me a seam where the human picks live. SaaS tools paper over that seam and charge for it. The seam is the entire product.

  12. 2

    Nice writeup. Curious about the model side, are you running one model across the whole pipeline or swapping per step? And if you mix them, what made you land on that split instead of just defaulting to one provider?

    1. 1

      Haiku for the scanning and scoring steps where I am triaging 50-100 candidates per day. Throughput and cost matter more than voice there. GPT-4o for short-form draft generation (replies, posts, pitches) because it tracks instructions tighter on the 60-word constraint and stops adding extra paragraphs. Claude Sonnet for long-form blog drafts and journalist pitch responses where voice fidelity is the load-bearing metric.

      I rotate when a model starts drifting from my voice doc, which happens about every 8 weeks. The assignment above is correct as of this week. By next quarter it will probably look different.

  13. 2

    $19/mo total? I'm paying more than that for Lemlist alone. Where does that actually go?

    1. 1

      Roughly Claude Haiku for monitoring and scoring ($6), GPT-4o for thread relevance ranking and short-form drafts ($7), Claude Sonnet for blog first-drafts and pitch responses ($4), and about $1-2 in Playwright cloud minutes for form filling. Some months it nudges $24. It has not crossed $30.

  14. 1

    The never-write list is quietly the most important part of this post. Everyone obsesses over prompts and model selection but the constraint layer is where the actual time savings live. Without it you're just generating plausible-sounding text that still needs a full rewrite.

    We hit the same wall building aisa.to (AI skills assessment through conversation). Early on we tried to let the model handle everything in the assessment flow. Turns out about 30% of the conversation requires judgment calls the model consistently gets wrong: when to push back on a vague answer, when someone is actually demonstrating skill vs just repeating something they read, when to change direction entirely. The rest is mechanical and AI handles it fine.

    Your 28/12 split rings true. Most founders I talk to claim something closer to 90/10 but when you ask them to show attribution, the number falls apart fast. The honest split is always uglier than the vibes-based one.

    One thing worth adding: the split isn't static. Tasks that were firmly in my "AI cannot" column six months ago have migrated over as I got better at writing constraints. The spreadsheet exercise is worth repeating quarterly.
