AI runs 70% of my distribution. The exact stack.

by Max

TL;DR: Four months. Six AI distribution stacks. $400/month at peak. Zero signups attributable to any of them. Then I opened a spreadsheet, listed every distribution action I take in a typical week (40 rows), and labeled each one "AI can do this" or "AI cannot do this without killing the channel." 28 rows went to AI. 12 stayed mine. Output doubled, cost dropped to $19/month, and the 12 I kept are the only reason any of this converts.

The $1,600 mistake

For four months I bought every "AI distribution" promise on the timeline. Apollo for lead enrichment. Lemlist for sequences. An autonomous agent platform that prospects while you sleep. An LLM lead-scorer. An AI scheduler. A separate AI for founder voice replication.

Six stacks. $400/month at peak. Zero measurable signup lift on flowly.run, the productivity tool I ship for freelancers and solo founders.

I knew exactly when to stop. I had paid more for AI growth tools that month than I had collected in MRR from them.

The spreadsheet that fixed it

One evening I closed every dashboard and opened a spreadsheet. I listed every distribution action I had taken that week. The list ran 40 rows. Then I labeled each row with one of two tags.

AI can do this. Scanning, scoring, drafting, scheduling, summarizing.
AI cannot do this without killing the channel. Selection, tone, decision, customer reply, cold outreach.

28 rows ended in the first column. 12 in the second. The 12 were the only rows that had ever produced a paying customer.

I had been paying $400/month to automate the 70% of distribution work that does not convert and ignoring the 30% that does.
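
A minimal sketch of the tally, assuming each spreadsheet row is an (action, label) pair. The 28 and 12 counts come from the post; the action names are placeholders, not the real rows.

```python
from collections import Counter

AI_CAN = "AI can do this"
AI_CANNOT = "AI cannot do this without killing the channel"

# Placeholder rows: 28 delegated actions and 12 kept ones, matching the split above.
rows = [(f"action {i + 1}", AI_CAN) for i in range(28)]
rows += [(f"action {i + 29}", AI_CANNOT) for i in range(12)]

def split(rows):
    """Return {label: (count, share of all rows)}."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}

summary = split(rows)
for label, (n, share) in summary.items():
    print(f"{label}: {n} rows ({share:.0%})")
```

The same function works on a real export: swap the placeholder list for the rows of your own sheet.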

The 12 I cannot delegate

These will look small. They are the entire engine.

The final 30% edit of every shipped reply.
Picking which 1 or 2 of the daily AI-generated drafts are actually worth shipping.
Founder positioning calls on what stance to take this month.
Replying to any customer DM or email. Every one. Always.
The hand-edit of the opener and closer of any long-form post.
Cold outreach to specific named humans (creators, journalists, podcasters).
Any reply that names another person by handle.
Comments under my own posts.
Any time I am positioning against a named competitor.
Any pitch that requires reading 3+ pieces of the recipient's prior work.
The choice of which channel to retire when a metric drops.
The closing line of every reply. AI cannot land a closer.

Each of these failed when I delegated it. Three of them cost me actual customers.

The list that makes the rest of the stack work

The single most valuable file in my distribution setup is not a prompt. It is a one-page "never-write" list. New rules show up every week. Each rule came from reading a bad AI draft and tagging the exact line where my voice broke.

Sample lines:

Never start a reply with "Great question."
Never use "leverage," "unlock," "synergy," "in the trenches."
Never end a comment with a question if the post is already over 60 words.
Never name the product in the first 80% of any reply.
Never quote a statistic without naming the source.
Never use an emoji unless quoting someone else.
Never write a sentence over 25 words in a draft for X or HN.
Never close with "Hope this helps."

Drafts produced with this list need a 30% edit. Drafts produced without it need a 70% edit, and at that point I am writing from scratch. The never-write list is the entire economic difference between "AI saves me time" and "AI costs me time."
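
A few of the sample rules are mechanical enough to check before a draft ever reaches the edit pass. The sketch below is one possible way to do that, not the author's script: the function name, the default product name, and the naive sentence splitter are all assumptions.

```python
import re

BANNED_PHRASES = ["leverage", "unlock", "synergy", "in the trenches"]
MAX_SENTENCE_WORDS = 25  # the X/HN rule from the list

def lint_draft(draft: str, product: str = "Flowly") -> list[str]:
    """Return the never-write rules a draft breaks (empty list means clean)."""
    text = draft.strip()
    lower = text.lower()
    problems = []

    if lower.startswith("great question"):
        problems.append('starts with "Great question"')
    for phrase in BANNED_PHRASES:
        # Whole-word match, so "unlocks" or "leveraging" would slip through;
        # widen the pattern if that matters.
        if re.search(rf"\b{re.escape(phrase)}\b", lower):
            problems.append(f'uses "{phrase}"')
    if lower.rstrip(".! ").endswith("hope this helps"):
        problems.append('closes with "Hope this helps"')
    # Product name must not appear in the first 80% of the characters.
    cutoff = int(len(text) * 0.8)
    if product.lower() in lower[:cutoff]:
        problems.append("names the product in the first 80%")
    # Naive sentence split on ., ! and ? is enough for short replies.
    for sentence in re.split(r"[.!?]+", text):
        if len(sentence.split()) > MAX_SENTENCE_WORDS:
            problems.append(f"sentence over {MAX_SENTENCE_WORDS} words")
            break
    return problems

bad = "Great question. You can leverage Flowly for this. Hope this helps."
clean = (
    "Timers only help if you review them every week, and most people never do. "
    "That review habit is the whole reason I built Flowly."
)
print(lint_draft(bad))
print(lint_draft(clean))
```

A check like this only catches the rules that reduce to string matching. Rules about tone or stance still need the human read.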

This is what the 2026 vocabulary now calls context engineering at indie scale. You are not prompt-engineering. You are designing the information environment around the model so it can do the mechanical 70% and stay out of the 30% that needs skin in the game.

The $19 stack

Five scripts. About 200 lines of Python total. Cron-driven. Slack and email for human approvals.

Inbox monitor. Polls Gmail IMAP for journalist platform queries every 2 hours. Each query gets scored by Claude Haiku against my 7 stances. Ranked list lands in Slack. I read it in 90 seconds and pick 0 to 2.
Thread scanner. Pulls HN /front and /newest, my X home, my Bluesky timeline 3x daily. Returns top 5 candidates with a one-line "why this thread" and a suggested reply angle.
Draft pack generator. End-of-day script produces 5 ready-to-edit drafts across 5 channels using my founder voice doc plus the never-write list. I edit roughly 30% of every draft before shipping.
Daily digest. Pulls Umami plus Flowly analytics. Emails a 9pm summary of what shipped, what landed, what converted. 2 minutes to read.
Pitch responder. Drafts journalist replies, 60-word constraint, founder voice. Always blocks on my one-click approval. Always.

Total API spend: about $19/month. Drifts to $24 in heavy weeks. Founder hours pulled back into product work: about 14 per week.
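
The score-then-rank step in the inbox monitor and thread scanner can be sketched independently of any model call. In the sketch below the scorer is a plain function argument standing in for the LLM scoring call; `Query`, `rank_queries`, `keyword_score`, the keyword list, and the 0.5 floor are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    sender: str
    text: str

def rank_queries(queries, score: Callable[[Query], float],
                 top_n: int = 5, floor: float = 0.5):
    """Score every query, drop anything under the floor, return the best top_n."""
    scored = [(score(q), q) for q in queries]
    kept = [pair for pair in scored if pair[0] >= floor]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]

def keyword_score(q: Query) -> float:
    """Stand-in scorer: fraction of hypothetical stance keywords found in the query."""
    stance_keywords = ["freelancer", "time tracking", "solo founder"]
    hits = sum(kw in q.text.lower() for kw in stance_keywords)
    return hits / len(stance_keywords)

queries = [
    Query("a", "Looking for a freelancer time tracking tool"),
    Query("b", "Best pizza in town?"),
    Query("c", "How do solo founder teams handle invoicing"),
]
for value, q in rank_queries(queries, keyword_score):
    print(f"{value:.2f}  {q.sender}: {q.text}")
```

Keeping the scorer pluggable means the ranking and the floor can be tested offline, and the model call only has to return a number.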

The day I almost killed a story

The pitch responder ran on auto-send for 8 days in March. I had set a low-confidence threshold and trusted the queue. A journalist's follow-up questions to my original pitch went unread for 5 days while the script regenerated polite boilerplate. She stopped replying around message 4. The story she eventually ran skipped Flowly.

The fix that night: every outbound message blocks on my one-click approval. No exceptions. No thresholds. No "low-risk" auto-send buckets.
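
One way to make "no exceptions, no thresholds" structural rather than a setting is to give the code no send path that skips approval. A minimal sketch, assuming a hypothetical `ApprovalGate` class and a `send` callable supplied by the caller:

```python
class ApprovalGate:
    """Holds every outbound message until a human approves it. No auto-send path exists."""

    def __init__(self, send):
        self._send = send      # real delivery function (email, Slack, ...), supplied by caller
        self._pending = {}
        self._next_id = 1

    def submit(self, recipient, body):
        """Queue a draft and return its id. Nothing is sent here."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = (recipient, body)
        return msg_id

    def approve(self, msg_id):
        """The one click: send exactly this message and remove it from the queue."""
        recipient, body = self._pending.pop(msg_id)
        self._send(recipient, body)

    def reject(self, msg_id):
        """Drop a draft without sending it."""
        self._pending.pop(msg_id)

    def pending(self):
        """Return a copy of the queue for display in the approval channel."""
        return dict(self._pending)

sent = []
gate = ApprovalGate(send=lambda to, body: sent.append((to, body)))
first = gate.submit("journalist@example.com", "Short answer to your follow-up.")
second = gate.submit("podcaster@example.com", "Boilerplate pitch.")
gate.approve(first)
gate.reject(second)
print(sent)
```

There is deliberately no confidence parameter anywhere in the class, so a later "low-risk bucket" cannot be added by changing a number.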

The cost of that 8-day mistake was one piece of press. The cost of leaving the auto-send running would have been all of them.

This is the version of context engineering nobody publishes. The rule is not "what should the model see." The rule is "what does the model see that, if it gets one thing wrong, ends a relationship the model cannot rebuild."

Where Flowly fits

The reason I noticed the 12-versus-28 split at all is that I had been timing every distribution action inside the product I ship. Flowly is a single workspace for tasks, timers, and analytics for freelancers and solo founders who are tired of running four separate apps to answer one question: where did my week actually go. I had been running my own distribution inside it, the same way a freelancer tracks billable client work. The spreadsheet that started this rebuild came out of Flowly's analytics, not my head.

The lesson generalizes past the tool. If you cannot see the line between what AI did and what you did, you cannot price either one honestly. AI runs distribution. Flowly tells me whether running it was worth the 95 minutes a day I still own.

One ask, one bet

Do the 40-action sort before you buy another AI tool. List every distribution action you take in a week. Label each row "AI can" or "AI cannot without killing the channel." Then make the second column the only thing you spend founder hours on.

Mine is 28/12. Post yours in the comments. If anyone has a real 90/10 split working, with attribution that holds up, I will rebuild my stack to match. I want to be wrong about this. So far nobody has been.

Product: flowly.run. Free tier, 14-day reverse Pro trial, no card.

Most indie hackers who buy a $400 AI growth stack are paying to automate the 30% that converts and ignoring the 70% that bores them. They have it exactly backwards.

Max

on May 20, 2026


2

Four months and $400/month to learn what most distribution tools get wrong: they automate the posting, not the finding.
The real bottleneck was never “how do I reach people” — it was “how do I know which people are actively looking for what I built, right now.” Automated posting without intent detection is just noise at scale.
We ran into this exact wall building our page. The insight that changed everything: there’s a massive difference between someone mentioning your category and someone actively asking for a recommendation. The first is a vanity metric. The second is a conversation worth joining.
What’s your current signal for telling the two apart?

alicerun · a day ago
1. 1
  
  The thread scanner scores against 7 stances, not keywords. A mention of "productivity tools" scores low. A thread where someone is actively describing a workflow problem and asking what others use scores high. The difference is intent signal in the thread itself — is the poster looking for something or just talking about a category.
  
  The practical heuristic in the scoring prompt: threads that end with a question or contain "what do you use" or "looking for" language score higher than threads that are statements. Not perfect but it filters the vanity mentions fast.
  
  The harder signal is timing — a thread asking for recommendations 4 hours old is worth entering. The same thread 3 days old is not. Intent decays faster than category mentions do.
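
The heuristic and the decay described above can be combined into one number. This is a sketch under stated assumptions, not the actual scoring prompt: the weights, the 12-hour half-life, and the function name are hypothetical.

```python
ASKING_PHRASES = ("what do you use", "looking for")

def intent_score(text: str, age_hours: float, half_life_hours: float = 12.0) -> float:
    """Base intent score multiplied by an exponential freshness decay."""
    lower = text.lower().strip()
    base = 0.2                      # plain statements start low
    if lower.endswith("?"):
        base += 0.4                 # thread ends with a question
    if any(phrase in lower for phrase in ASKING_PHRASES):
        base += 0.4                 # explicit recommendation-seeking language
    freshness = 0.5 ** (age_hours / half_life_hours)
    return base * freshness

ask = "My invoicing workflow is a mess. What do you use?"
statement = "Productivity tools are a crowded category."
print(intent_score(ask, age_hours=4))
print(intent_score(ask, age_hours=72))
print(intent_score(statement, age_hours=4))
```

With these made-up numbers a four-hour-old ask outranks both the same ask at three days and a fresh category mention, which is the ordering the reply describes.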
  
  max_flowly_run · a day ago
3

It's interesting that you found the 12 tasks you kept in-house were the key to conversion, suggesting that human touch and judgment are still essential in your distribution process. I'd love to know more about what specifically those 12 tasks entail and how you've optimized them for maximum impact. What was the most surprising task that you found couldn't be effectively automated with AI, and how do you handle it now?

Propfirms · 4 days ago
1. 1
  
  The full list is in the post, but the most surprising one was comments under my own posts. I assumed that was low-stakes enough to delegate — it's my thread, the context is set, the model knows my voice. Every time I tried it the replies read like a founder who was present for the setup and checked out for the conversation. The first reply to your own thread sets the temperature for everything that follows. Turns out that's not a formatting problem the never-write list can fix. It requires actually being in the room.
  
  max_flowly_run · 4 days ago
3

This post landed at the exact right time for me.

I just got my first real product live, and distribution immediately became the part that felt messier than building. My instinct was to use AI to move faster, but the 28/12 split is making me think about the boundary way more clearly.

The part that clicked for me is that “selection” is not admin work. Choosing which thread is worth entering, when not to mention the product, and whether a reply actually sounds like a human with skin in the game is the whole trust layer.

I’m starting to use AI for the scanning/drafting/tracking side, but keeping the final comment, final judgment, and any direct reply human. Your “never-write” list feels like the missing artifact. I had a voice doc, but not a hard list of things that instantly make a reply feel fake.

One question: if you were starting the system again at zero audience, would you still build the scanner/draft workflow first, or would you manually comment for a few weeks before automating anything?

flynhawaiian · 5 days ago
1. 1
  
  Manual first, no question. The scanner needs something to score against. The draft generator needs a voice doc. The voice doc needs real shipped comments to calibrate from. You cannot write the never-write list until you've read enough bad AI drafts to know exactly where your voice breaks — and you cannot produce those drafts until you know what good looks like from your own manual output.
  
  Four to six weeks of fully manual commenting on one channel gives you the calibration data the whole stack depends on. Skip that and you're automating a voice you haven't found yet. The output will be fast and generic and you won't be able to tell why it isn't converting.
  
  Build the never-write list before you build the first script. That's the one thing I'd change.
  
  max_flowly_run · 5 days ago
  1. 1
    
    Yeah, this is useful. Appreciate the direct answer.
    
    The line “the scanner needs something to score against” is probably the part I needed. I’ve been treating AI as useful for scanning, drafting, and tracking, but you’re right that even the scoring rules need real manual reps first.
    
    I’m going to treat the next stretch as calibration: one channel, shipped comments, notes on what feels fake, then only automate around that.
    
    Did you keep that manual phase strictly to one channel, or was it more about one clear buyer/persona even if the conversations happened in a few places?
    
    flynhawaiian · 4 days ago
    1. 1
      
      One channel, not one persona. The persona clarity matters but the channel constraint is what makes the calibration useful — each platform has its own rhythm, word count tolerance, and failure mode. A comment that lands on HN reads as flat on Reddit and vice versa. If you're calibrating across three channels at once you can't isolate what's working.
      
      Pick the channel where your buyer already has conversations, not the one with the biggest audience. Manual reps there first. The never-write rules you generate will be channel-specific anyway — some will generalize, most won't.
      
      max_flowly_run · 4 days ago
3

The attribution piece is what most founders miss. AI can run a channel and look productive while contributing zero to pipeline, and you only catch it months later when you back into the actual numbers. I run a SaaS that automates social distribution, and the same pattern held internally: anything in the buyer's last few steps before conversion needs a human voice or the channel dies. Your 12-list is the more useful artifact here than the 28.

GregoryScottHenson · 5 days ago
1. 1
  
  "Looks productive while contributing zero to pipeline" is the failure mode that's hardest to catch because the dashboard stays green the whole time. Output metrics are easy to generate. The lag between activity and conversion is long enough that most founders have already credited the channel before the real signal comes in.
  
  The "human voice in the last few steps before conversion" framing matches what I see in the 12. The further a task sits from the actual decision moment, the safer it is to delegate. The closer it gets to someone deciding whether to trust you, the more it has to stay mine.
  
  max_flowly_run · 5 days ago
1

The spreadsheet step is the real insight here, not the stack. I did the same audit and the tasks AI killed were the ones that signaled a human was present, so your 12 are load-bearing. I'd be curious which of the 28 you'd pull back first if a channel started punishing automated output.

theuniverseson · a day ago
1

The biggest thing for me wasn't who I needed to sell to. It was creating the right language to engage with them. Generic marketing slop words don't work anymore. I found that if I sympathised with the problem I had the solution to, it attracted more people.

JoeyPayton · a day ago
2

This is gold. I'm building Scarlyfy in Peru — an AI WhatsApp bot for medical appointment scheduling. Distribution is my biggest challenge right now. What's your top channel for B2B SaaS targeting small businesses like clinics?

jordyvillanueva · 3 days ago
1. 1
  
  For clinics specifically I'd skip the content channels entirely at the start — HN and X won't get you in front of a clinic owner in Peru making scheduling decisions. The channel that works for local B2B health is almost always direct: find the 10 clinics you'd most want as customers, identify the owner or practice manager by name, and write them something that proves you understand their specific problem. No sequence, no automation. The trust bar for anything touching patient scheduling is high and the "verify me in two clicks" test matters more here than in almost any other vertical.
  
  WhatsApp itself is probably your best distribution channel given the context — if you have any existing users, a referral ask inside the product will outperform anything you build on top of content.
  
  max_flowly_run · 3 days ago
2

The "manual first, no question" reply is the most useful thing in this thread for where I am right now.
I have been doing fully manual commenting across two communities for about three weeks - no scripts, no drafts, nothing automated. Reading this, I finally understand why that phase cannot be skipped: I am generating the calibration data. The patterns I catch in real time (which openers land, which threads convert to profile clicks, which comments get replies versus upvotes) are exactly what any future automation would need to function on.
One thing I would add to the "verify me in two clicks" framing: profile consistency across posts matters as much as any individual comment. A reader who likes your reply and checks your last 10 posts is not just verifying you are human - they are checking whether you have a coherent point of view. A stack that produces slightly different voices across posts fails that check even if each individual draft sounds natural in isolation.
The never-write list is the artifact I am building manually before touching any tooling. First rule already in: never end a comment with a generic question if the post already has more than 60 words.

anatoli_kin · 3 days ago
1. 1
  
  Three weeks of manual output before touching tooling is exactly the right sequence. The calibration data you're building isn't just which openers land — it's the failure set the never-write list gets built from. You cannot write the constraints until you've read enough of your own bad drafts to know where the voice breaks. Manual-first gives you both the positive examples and the negative ones.
  
  The profile consistency point is sharp and I haven't seen it framed that cleanly before. A reader checking your last 10 posts is running a coherence test, not just a human test. A stack that produces slightly different voices across sessions fails that check even if each individual post is good. The stance doc is supposed to prevent that drift but it requires active maintenance — mine gets reviewed every time I notice a post that sounds like a slightly different founder wrote it.
  
  Your first never-write rule is already one of mine. Good starting point.
  
  max_flowly_run · 3 days ago
2

The never-write list is the part most people skip. We see the same pattern at SocialPost.ai. Founders pay for fancy AI scheduling and lead-gen, then wonder why nothing converts. The conversion always lives in the 30% nobody wants to do: the customer reply, the named-person outreach, the comment under your own post. My bet is in a quarter your 12 shrinks to 8 and the rules get stricter, not looser.

GregoryScottHenson · 3 days ago
1. 1
  
  Six months in and it hasn't shrunk — it's held at 12 with some composition shifts. My bet is the opposite of yours: the number stays stable and the rules get stricter, but the 12 doesn't compress. The tasks that stay human get more entrenched the longer I run the stack, not less, because I keep finding new ways the model fails at them specifically.
  
  The "rules get stricter" part I agree with. The never-write list grows faster than it shrinks. Each failure adds a line. Very few lines get retired.
  
  max_flowly_run · 3 days ago
2

"AI cannot land a closer" -- that line is the whole thing.

Spent 8 years in enterprise sales before building my own product and the pattern is identical to what happens with junior reps. You can delegate the research, the setup, the first draft. You cannot delegate the moment the deal is live. The closer has to be a human.

The spreadsheet framework is genuinely smart. Most people swing between "AI does everything" and "AI is hype" without ever doing the actual audit. 28/12 with a never-write list is systematic, not vibes.

Honest question: how often do you prune the never-write list? My guess is rules go stale faster than people expect.

AmandaBrown · 3 days ago
1. 1
  
  The sales parallel is exact. The research and setup can be delegated because the cost of a miss is recoverable. The closer cannot because the cost of a miss is the deal.
  
  On pruning: I review the never-write list when a draft ships that I'm genuinely proud of and I notice it broke a rule cleanly. That's the signal a rule has gone stale — not that it produced a bad draft, but that violating it produced a good one. Rules don't go stale on a schedule, they go stale when your voice evolves past them. Mine gets a meaningful edit roughly every six to eight weeks, usually two or three lines added, one removed. The list grows more than it shrinks. Bad patterns accumulate faster than good ones get retired.
  
  max_flowly_run · 3 days ago
2

Solid stack — the automation-to-human ratio you've landed on is close to what we see working. The one failure mode we kept hitting before we fixed it: single-model content that sounds confident but drifts from what five models would actually agree on — and that's the stuff that kills conversion quietly. ConsensusPress runs five LLMs to consensus before anything goes to distribution. Happy to share what the RM score delta looks like if you're benchmarking.

mohan_AIre · 4 days ago
1. 1
  
  The multi-model consensus approach is interesting in theory but I'd push back on the premise. The drift problem I was solving wasn't that a single model produces overconfident content — it was that any model produces content that sounds like a model without the right constraints upstream. Five models agreeing on a draft doesn't fix a weak never-write list or a vague stance doc. It just produces more confident mediocrity.
  
  The human edit pass at 30% is doing the consensus work in my stack. One model plus a founder who knows their voice beats five models averaging toward the mean.
  
  max_flowly_run · 4 days ago
2
A big shift I’m noticing is that AI distribution systems are slowly evolving from “content generation” into orchestration systems.

The interesting part isn’t just generating posts/emails anymore — it’s:
- signal detection
- audience segmentation
- timing
- workflow routing
- feedback loops
The stack matters less than whether the system can continuously learn what actually converts.

Most people automate output.
Very few automate decision-making.

CodePapa · 4 days ago
1. 2
  
  The orchestration framing is right but I'd push back on "automate decision-making" as the goal. The decisions that matter in my stack — which thread to enter, whether to ship a draft, which journalist is worth pursuing — are the ones I explicitly kept human. Automating them is what the $400/month stack was trying to do. It failed because the decision quality collapsed without skin in the game.
  
  What I'd say instead: the system should automate the inputs to the decision, not the decision itself. Signal detection, routing, timing — yes, all of that. The moment the system makes the call on whether to ship, you've lost the thing that makes the output convert.
  
  max_flowly_run · 4 days ago
  1. 1
    That’s a really important distinction actually.
    
    “Automating decisions” sounds attractive until the cost of a wrong decision becomes asymmetric.
    
    We’ve seen something similar in operational AI workflows:
    AI is incredibly effective at:
    - surfacing signals
    - reducing noise
    - preparing context
    - ranking likely actions

    …but the final judgment layer still matters a lot when:
    - reputation
    - customer trust
    - prioritization
    - strategic nuance

    are involved.
    
    I think the interesting future is probably not full autonomy, but high-context human amplification — where systems continuously narrow decision friction without completely removing human intent from the loop.
    
    CodePapa · 4 days ago
    1. 1
      
      "Narrow decision friction without removing human intent" is a clean way to put it. That's what the one-click approval gate does in practice — the system does everything up to the moment of consequence, then stops and puts the decision in front of a human who has skin in the game.
      
      The asymmetric cost point is the key variable. The higher the downside of a wrong call, the further from full autonomy the system should sit. My journalist outreach has unlimited downside. My thread scanner has almost none. The architecture reflects that difference directly.
      
      max_flowly_run · 4 days ago
2

"Unedited drafts pass the skimming test and fail the 'do I trust this person' test."
Missing this distinction is exactly why so many people scale their distribution to zero conversions.

The "never-write" list is pure gold. My stack was killing my channels because the product name was showing up way too early, making it read like automated spam bot promotion instead of a real contribution. Building my own hard constraint list today.

rivra · 4 days ago
1. 1
  
  The product name too early is one of the most common tells and one of the hardest to train out because the model defaults to solving your problem, which it thinks is promoting the product. The never-write list is the only reliable fix — the constraint has to be explicit or it creeps back in within a few drafts.
  
  "Passes the skimming test, fails the trust test" is the exact failure mode. Engagement looks fine until someone actually considers whether to click, and then it doesn't convert. The lag between those two signals is what makes it expensive to catch.
  
  max_flowly_run · 4 days ago
2

The spreadsheet exercise is the part most founders skip. They keep adding tools because stopping feels like giving up. But what you actually built is a forcing function for honesty about what is working.

The daily digest piece is interesting. Pulling analytics every night and reading it in 2 minutes is essentially what revenue intelligence should do for MRR and churn. Most founders open Stripe in the morning and close it knowing roughly the same things they knew before.

What made you decide to build it yourself versus finding something that already did this?

Gabriel_Perez · 4 days ago
1. 1
  
  Two reasons. First, nothing I found combined distribution activity with product analytics in one digest. Tools that pull Stripe data don't know what I shipped on HN yesterday. The signal I needed was "did the comment I posted Tuesday correlate with signups Thursday" — that join doesn't exist in any off-the-shelf tool because it requires connecting data sources that don't talk to each other by default.
  
  Second, 200 lines of Python I wrote myself is 200 lines I can modify in 10 minutes when something breaks or the format stops being useful. Every SaaS digest tool I tried had a settings page where I could choose from their metrics, not mine. The custom build is more fragile in theory and more useful in practice.
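
The join described above can be sketched as a windowed match between a log of shipped posts and a list of signups with referrers. The record shapes, field names, and the three-day window are assumptions for illustration; real analytics exports will look different.

```python
from datetime import date, timedelta

def signups_after_posts(posts, signups, window_days=3):
    """For each shipped post, count signups from the same channel within window_days after it."""
    window = timedelta(days=window_days)
    report = []
    for post in posts:
        hits = sum(
            1
            for s in signups
            if s["referrer"] == post["channel"]
            and post["date"] <= s["date"] <= post["date"] + window
        )
        report.append((post["title"], post["channel"], hits))
    return report

posts = [
    {"title": "HN comment on time tracking", "channel": "news.ycombinator.com",
     "date": date(2026, 5, 12)},
    {"title": "X reply on invoicing", "channel": "x.com", "date": date(2026, 5, 13)},
]
signups = [
    {"referrer": "news.ycombinator.com", "date": date(2026, 5, 14)},
    {"referrer": "news.ycombinator.com", "date": date(2026, 5, 20)},
    {"referrer": "google.com", "date": date(2026, 5, 13)},
]
for title, channel, hits in signups_after_posts(posts, signups):
    print(f"{title} ({channel}): {hits} signups within 3 days")
```

This counts co-occurrence inside a window, which is correlation rather than proof of attribution. It is still enough to flag which posts deserve a closer look.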
  
  max_flowly_run · 4 days ago
  1. 1
    
    That join is the whole problem. Most tools live in one data source and call it intelligence. The signal you actually need lives in the gap between them.
    
    The founders who can build what you built will always build it. The ones who cannot are the ones I am focused on.
    
    Gabriel_Perez · 3 days ago
    1. 1
      
      The gap between data sources is where most "intelligence" tools quietly give up. They optimize for the data they already have access to rather than the join the founder actually needs.
      
      Curious what you're building for the non-technical side — the founders who can't write the Python but still need that cross-source signal. That's the harder product problem and probably the bigger market.
      
      max_flowly_run · 3 days ago
2

The 40-row spreadsheet exercise is something every solo founder should steal. Writing down every distribution action you take in a week and honestly labeling "AI can do this" vs "AI cannot do this without killing the channel" is a better framework than any AI marketing course. The honesty part is the hard part — it's tempting to label everything as automatable because you don't want to do it manually.

The $400/mo to $19/mo trajectory is the real story here. The AI distribution industry sells tools. What actually works is five cron scripts, a Slack approval step, and a human who understands their audience. The approval gate is the key architectural decision — it keeps the human in the loop on tone and timing without making them do the mechanical work.

Frederik10 · 4 days ago
1. 1
  
  "Tempting to label everything automatable because you don't want to do it manually" is the honest version of why most founders end up with a broken stack. The audit only works if you label based on what converts, not what you'd prefer to delegate.
  
  The approval gate as an architectural decision is the right framing. It's not a workaround or a temporary fix until the AI gets better. It's the load-bearing element. The mechanical work runs unattended because the judgment call doesn't.
  
  max_flowly_run · 4 days ago
2

The fact that 12 manual distribution actions were the key to conversion is a powerful insight, suggesting that human judgment and nuance are still essential in certain aspects of distribution. I'd love to know more about what those 12 actions entail and how you've optimized them for maximum impact. What specific metrics or indicators led you to conclude that those manual actions were the primary drivers of conversion?

Propfirms · 4 days ago
1. 1
  
  The 12 are listed in the post. On attribution: I didn't identify them through clean metrics. I identified them by elimination — every signup I could trace back to a specific touchpoint led to one of those 12 rows, not the 28. The AI-only outputs drove engagement. The hybrid outputs with a human edit on opener and closer drove clicks. The fully human rows drove conversions.
  
  The metrics that confirmed it were referrer data in Umami plus a bad week where I shipped several drafts at 0% edit. Engagement held. Signups dropped. That week was the clearest natural experiment I've run.
  
  max_flowly_run · 4 days ago
2
The 40-action sort is useful, but I would add a third column after "AI can / AI cannot": "how would I know this worked?"

That forces the distribution stack to stay tied to learning instead of output volume. For example:
1. scanner output worked if it led to a thread worth entering
2. draft pack worked if the edited version created a real reply
3. digest worked if it changed tomorrow's channel choice
4. pitch responder worked only if it preserved the relationship and moved the conversation forward
The danger with AI distribution is that it makes the work feel finished at the draft/schedule step. Your 12 human actions are mostly the points where attribution, taste, and trust become visible. I would probably review those weekly and ask which AI step made the human step easier, not just which AI step saved time.

taploopdesgnai · 4 days ago
1. 1
  
  The third column is the right addition. "How would I know this worked" is the question that keeps the stack honest over time. Without it you're measuring output at the step that feels like completion rather than outcome at the step that actually matters.
  
  "Which AI step made the human step easier, not just which AI step saved time" is the sharper evaluation frame. Time saved is the wrong metric if the human step still fails. The scanner is only valuable if it surfaces threads I would have missed and wanted to enter. The draft pack is only valuable if the edit pass takes 3 minutes instead of 20. Measuring those as time-savers misses whether they're actually improving the judgment call downstream.
  
  Adding the third column to the next version of the audit.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

Really interesting — what tools are you using for the actual outreach? I'm currently doing manual email outreach for PWAButton and looking to automate more of it.

PWABOTTON

·
4 days ago
·
Reply
1. 1
  
  The pitch responder script handles journalist and creator outreach — Claude Sonnet drafts a 60-word reply using my voice doc, blocks on one-click approval, never auto-sends. That's the entire outreach automation. Cold outreach to named individuals stays fully manual — that's one of the hard 12.
  
  For where you are with PWAButton: before automating outreach I'd do the 40-action sort first. The temptation is to automate the sending. The bottleneck is usually the list and the message, neither of which automation fixes.
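
  For anyone wiring up something similar, here is a rough sketch of the "draft, block on approval, never auto-send" gate. It is not the actual script: `draft_reply` is a stand-in for the model call (it only trims text to 60 words), and `Draft`, `approve`, and `send` are hypothetical names.

```python
from dataclasses import dataclass

MAX_WORDS = 60  # reply length cap mentioned in the thread


@dataclass
class Draft:
    recipient: str
    body: str
    approved: bool = False


def draft_reply(recipient: str, text: str) -> Draft:
    """Stand-in for the model call: just trims the text to MAX_WORDS."""
    words = text.split()
    return Draft(recipient, " ".join(words[:MAX_WORDS]))


def approve(draft: Draft) -> Draft:
    """The one-click human approval step."""
    draft.approved = True
    return draft


def send(draft: Draft, outbox: list) -> None:
    """Refuse to send anything a human has not approved."""
    if not draft.approved:
        raise PermissionError("draft not approved; nothing auto-sends")
    outbox.append(draft)
```

  The point of the design is that `send` has no bypass flag: there is no confidence threshold or low-risk bucket that skips `approve`.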
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

It's fascinating that you were able to identify the 12 distribution actions that required a human touch, which ultimately drove conversions, and I'd love to know more about what those specific actions were and why you think they couldn't be replicated by AI. This exercise in auditing your workflow and separating tasks between AI and human capabilities is a great example of how to optimize efficiency without sacrificing effectiveness. Did you find that the tasks you kept for yourself were primarily related to high-touch, relationship-building activities or something else entirely?

Propfirms

·
4 days ago
·
Reply
1. 1
  
  The 12 are listed in the post — but the common thread across all of them is that the recipient could verify a real human was behind it in under two clicks. Relationship-building is part of it, but that's not quite the right frame. It's more specific: any moment where getting it slightly wrong ends something the model cannot rebuild. A journalist who receives boilerplate when she asked a follow-up question doesn't give you another shot. A cold pitch to a podcaster who reads your last 10 posts and finds three different voices doesn't convert. The tasks stayed human not because they were warm and fuzzy but because the downside of delegation was asymmetric and permanent.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

the never-write list is the thing i needed 3 weeks ago
just got temp banned from r/SaaS for exactly what this post describes
posts where the value depended on the click, product name too early, reads as promotion not contribution
the "AI can draft, never send" bucket is where most of my Reddit comments should have lived
building my own never-write list today
one i'd add already: never mention the product in the first 80% of any reply
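
A rule like that is easy to check mechanically before a draft reaches the approval step. A minimal sketch, assuming position is measured in characters (a word-based measure would work just as well); `mention_is_earned` is a hypothetical helper name.

```python
def mention_is_earned(reply: str, product: str, threshold: float = 0.8) -> bool:
    """True if the product is never named, or is first named at or after
    `threshold` of the way through the reply (measured in characters)."""
    if not product:
        raise ValueError("product name must be non-empty")
    idx = reply.lower().find(product.lower())
    if idx == -1:
        return True  # no mention at all passes the rule
    return idx / len(reply) >= threshold
```

A draft that opens with the product name fails immediately; one that only names it in the closing line passes.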

HaiderMakes

·
4 days ago
·
Reply
1. 1
  
  The r/SaaS ban is the hard version of the lesson but probably the fastest way to internalize it. Reddit in particular has zero tolerance for the exact failure mode you described — value depends on the click, product name too early, reads as promotion. The mod queue sees that pattern hundreds of times a day.
  
  "Never mention the product in the first 80% of any reply" is already in my list and it's the one that changed Reddit from a ban risk to a traffic source. The post has to earn the mention or the mention kills the post.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

Checking the spreadsheet is the smart move. If automation spend is higher than the revenue it actually brings in, you're mostly paying for movement. I keep seeing the same thing with capture workflows. DictaFlow is my attempt to keep the thought-to-text step right next to the cursor instead of turning it into another app hop.

ryanshrott

·
4 days ago
·
Reply
1. 1
  
  "Paying for movement" is the right description of what the $400/month was buying. The dashboards looked active. Nothing converted. Movement without signal is expensive.
  
  The thought-to-text proximity problem is real — context switches between capture and execution are where ideas die. Flowly is solving the adjacent problem of keeping task and timer in the same place so the context switch between "what am I doing" and "how long have I been doing it" disappears. Same principle, different friction point.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

The journalist story is the sharpest risk illustration in this post — 8 days of auto-send boilerplate on a story that was already in motion, and she stopped replying by message 4. The natural instinct is to set a confidence threshold and call it managed. Your fix (one-click approval, no exceptions, no low-risk buckets) is correct, but it's interesting that the trigger wasn't misses in general — it was nearly losing a specific relationship you couldn't rebuild. Has that story changed how you think about which tasks can ever be auto-send versus just needing approval, or is "nothing outbound ever auto-sends" now the permanent rule regardless of stakes?

dailydosebriefs

·
4 days ago
·
Reply
1. 1
  
  Nothing outbound ever auto-sends. That's the permanent rule now and I don't expect it to change.
  
  The journalist story clarified something I had been reasoning about wrong. I was treating auto-send as a risk management problem — set the threshold correctly and the downside is bounded. The actual problem is that the downside on a specific class of outbound message is unbounded and the model cannot identify which messages are in that class. It cannot tell the difference between a journalist who is lukewarm and one who is three messages away from filing. The confidence score is measuring the wrong thing entirely.
  
  The one-click approval gate costs me about 4 minutes a day. The journalist story cost me one piece of press. The asymmetry makes the rule permanent regardless of how good the drafts get.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

The spreadsheet move is the most honest thing in this post. 40 real actions > any AI stack.

I've been thinking about this from the other angle — the problem isn't distribution tools, it's that most founders (including me) don't actually know what they're doing each day. I built LifePilot for exactly this: you set a goal, it breaks it into 4 daily actions. No stack, no automation, just clarity.

Launched it today on Uneed if curious: https://www.uneed.best/product/lifepilot-ai-planner

LifePilot

·
4 days ago
·
Reply
1. 1
  
  The "don't know what you're doing each day" problem is exactly what Flowly is built around — tasks, timers, and analytics so you can see where the week actually went. The spreadsheet in the post came out of running my own distribution inside it.
  
  Different angle on the same problem. Good luck with the launch.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
  1. 1
    
    Flowly sounds like the execution layer — LifePilot is more the planning layer. You set the goal, it breaks it into 4 daily actions, then you go execute in something like Flowly. Different angle, as you said. Thanks for the good luck — needed today.
    
    LifePilot
    
    ·
    4 days ago
    ·
    Reply
    1. 1
      
      That's a clean way to split it. Planning layer into execution layer is a natural handoff — the failure mode is usually that they live in different apps and the friction between them is where the daily actions go to die. If LifePilot outputs into something a founder is already living in, that's where the combo gets interesting.
      
      max_flowly_run
      
      ·
      4 days ago
      ·
      Reply
      1. 1
        
        Already thinking about this. The Apple Calendar sync is actually live in the Pro plan right now — your 4 daily actions go straight into your calendar. Notion sync is next on the roadmap. The goal is exactly what you're describing: zero friction between the plan and the place you already live in.
        
        LifePilot
        
        ·
        4 days ago
        ·
        Reply
        
        1
        
        Calendar sync is the right first integration — that's where the plan has to live to survive contact with the actual day. Notion next makes sense for founders who already use it as their operating system. The sequence is correct.
        
        max_flowly_run
        
        ·
        4 days ago
        ·
        Reply
2

This is the right frame and most founders are missing it. AI is a force multiplier on distribution channels that already work for you. It is not a substitute for the human steps that build trust. The trap is letting AI handle the parts where the buyer can tell. Cold replies, first DMs, founder voice. Once those feel automated, the channel dies. Your 28/12 split is honest. Most spreadsheets like this end up 35/5 because founders convince themselves more can be automated. The discipline to keep 12 rows human is what makes the other 28 actually convert. Worth saying: $19/month replacing $400/month is the better metric than the 70% headline.

GregoryScottHenson

·
4 days ago
·
Reply
1. 1
  
  The 35/5 drift is real and I've felt the pull toward it. Every week there's a task in the 12 that looks like it could move over with one more prompt iteration. The discipline is recognizing that the pull itself is the signal to leave it alone.
  
  "$19 replacing $400" is the better headline — you're right. The 70% figure is about effort distribution, which is interesting to founders who've thought about it. The cost figure is what makes someone stop and actually do the audit.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
2

Running a consumer iOS app (LifePilot) and doing my first Uneed launch today — so this post hit at the right time.

My split today: probably 60/40 AI/human. AI helped with email pitches, tweet drafts, IH posts. But every upvote I actually got came from the 40% — direct personal messages, genuine replies in threads like this one.

The "AI cannot close" point in your list is exactly what I felt today. The outreach that converted was the stuff I wrote to specific people about their specific situation.

If anyone wants to see the live experiment: we're currently #2 on Uneed with 24 upvotes. https://www.uneed.best/product/lifepilot-ai-planner

LifePilot

·
4 days ago
·
Reply
1. 1
  
  The live data point is useful. 60/40 on launch day with conversion coming entirely from the 40% is exactly the pattern — and launch day is when the human tasks matter most because trust has to form fast with people who have never heard of you.
  
  The "outreach that converted was specific to their situation" is the closer problem in real time. The model cannot write that line because it requires actually knowing the person's situation, which requires having read it, which requires caring enough to read it. That care is not automatable.
  
  Good luck with the Uneed run.
  
  max_flowly_run
  
  ·
  4 days ago
  ·
  Reply
  1. 1
    
    Thanks Max — and that line about 'caring enough to read it' is exactly right. That's the part that doesn't scale, and probably shouldn't. Good thread.
    
    LifePilot
    
    ·
    4 days ago
    ·
    Reply
    1. 1
      
      "Probably shouldn't" is the part worth sitting with. The instinct is always to find a way to scale it. But if the thing that converts is genuine attention, scaling it means it stops being genuine. The constraint might be the feature.
      
      max_flowly_run
      
      ·
      4 days ago
      ·
      Reply
      1. 1
        
        'The constraint might be the feature' — that reframe is going straight into how I think about LifePilot's positioning. Thanks for the thread, Max.
        
        LifePilot
        
        ·
        4 days ago
        ·
        Reply
2

Curious what product category this works best for. The stack makes sense for consumer-facing or PLG products where volume and reach compound.

The pattern breaks in B2B — not because AI distribution doesn't work, but because the bottleneck isn't at distribution. It's earlier. Enterprise buyers don't Google their problems; they get referred by peers, or respond to a framing that maps directly to a P&L line item.

AI can accelerate the top of the funnel. But if what gets amplified is "here's what our AI does" instead of "here's the unit economics shift we drove for companies like yours" — more distribution just means more misses, faster.

What category are you seeing this work best in?

davidchen2026

·
5 days ago
·
Reply
1. 1
  
  Flowly is PLG and prosumer, so I'm building in the category where this works most cleanly. I can't give you an honest answer on enterprise B2B from experience.
  
  What I'd push back on slightly: the message-market fit problem you're describing isn't unique to enterprise. "More distribution just means more misses faster" is true at every stage if the framing is wrong. The stack amplifies whatever you're saying. If what you're saying doesn't map to a real pain point, volume makes it worse not better.
  
  The B2B referral dependency you're naming is real though. If the buyer's journey starts with peer trust rather than search or community, the HN comment stack is solving the wrong problem entirely. The 40-action sort would probably surface that fast — most of the rows would land in "AI cannot" before you even get to channel selection.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
2

The 28/12 framing is sharp. The thing I'd push back on a bit is the assumption that the 30% (the human part) stays static. It doesn't. The lines you'd never let AI touch six months ago move once you actually see what it can and can't pull off in your voice.

Also the never-write list is genuinely the move. That's where most people lose the plot with AI drafts.

playstore_psa

·
5 days ago
·
Reply
1. 1
  
  The split isn't static, agreed. I said as much a few replies down the thread. Mine has shifted from 15/25 at month one to 28/12 now, almost entirely because better constraints made previously-undelegatable tasks delegatable. The floor is stable. The composition moves.
  
  The direction matters though. In my experience tasks migrate from the 12 into the 28 as constraints improve, not the other way. Nothing has moved back. The tasks that stay human get more entrenched over time, not less, because running the stack longer makes the cost of getting them wrong clearer.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
2

The 28/12 framing hit. I just ran my own version on AI coding tools (Claude/Cursor/Codex daily for months) and the split is almost identical — most of the "value" is in the small set of decisions you cannot delegate.

For me the killer wasn't the coding itself. It was the 30 minutes before any prompt: where do I start, what's the structure, which patterns do I reuse. The AI can write the code. It cannot decide what code is worth writing first. That decision is the 12.

Building something to handle exactly that gap right now. Engine ~90% done. The "never-write" equivalent for me turned out to be a list of decisions I had to make BEFORE the AI ever sees the project — otherwise every output drifts generic.

Question: did your 12-row list shrink over time as you got better at scoping AI, or did it stay stable? I'm betting it stayed stable. The 12 feels like a floor, not a starting point.

useaidea

·
5 days ago
·
Reply
1. 1
  
  It stayed stable in count, shifted slightly in composition. Two tasks migrated out as I got better at writing constraints. Two new ones moved in as I found new ways to break the stack. The number has held at 12 for about six months now.
  
  Your pre-prompt 30 minutes is the exact equivalent. The decisions that have to happen before the model sees anything are the ones that determine whether the output is worth editing at all. That's not a scoping problem you can prompt your way out of — it's a judgment call about what matters, which is always yours.
  
  The "never-write list for decisions before the AI sees the project" is the right artifact. The model cannot tell you what's worth building first. It can only execute well once someone has already decided.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
2

Are you open to evaluating/including OSS models to optimize the inference economics and eventually push the pricing lower? Or focusing on frontier models only for now?

Karan_FAR_Labs

·
5 days ago
·
Reply
1. 1
  
  Not running any OSS models currently — the $19/month spend hasn't created enough pressure to justify the ops overhead of self-hosting. The economics only flip if volume scales significantly or API pricing moves. At current spend the frontier model convenience is worth the delta.
  
  That said the scoring and triage steps that run on Haiku are the obvious candidates if I ever do make the switch. Low stakes, high volume, tolerant of occasional misses. The voice-dependent steps stay on frontier regardless.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
  1. 1
    
    I didn't mean self-hosting, though self-hosting could also be an option. There are quite a few serverless inference providers with OpenAI-compatible endpoints for OSS models.
    
    Curious whether you’ve evaluated frontier vs OSS APIs yet from an intelligence + performance + economics perspective, or if it’s still too early
    
    Karan_FAR_Labs
    
    ·
    5 days ago
    ·
    Reply
    1. 1
      
      Haven't evaluated serverless OSS APIs seriously yet. At $19/month the switching cost in testing time exceeds the potential savings, so it hasn't made the priority list. The honest answer is I'd need to be at 5-10x current volume before the economics make the evaluation worth running.
      
      If you've done the frontier vs OSS comparison on scoring and triage tasks specifically I'd be curious what you found. That's where the tradeoff is most likely to favor OSS in my stack.
      
      max_flowly_run
      
      ·
      5 days ago
      ·
      Reply
2

The 28/12 split is exactly right, and the never-write list is brilliant, I'm stealing that. One thing I'd add: most founders try to automate the outreach before they've figured out who they're actually targeting. The right list matters more than the right email. I've watched people spend months optimizing subject lines while sending to the wrong segment entirely. Is your list-building part of the stack, or is that still a manual step before the automations kick in?

Gotogrow

·
5 days ago
·
Reply
1. 1
  
  List-building is manual and stays that way deliberately. The inbox monitor scores inbound queries against my stances but the outbound target list — journalists, podcasters, creators worth reaching — gets built by hand one name at a time. That's one of the hard 12 for the same reason you named: automating outreach to the wrong segment is just faster failure. The list is the strategy. You cannot delegate the strategy to the stack that executes it.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
2

the LinkedIn DM agent is the spicy one because most founders don't have the AE/SDR budget to A/B their outreach properly. moving from human-craft to 'agent writes the first draft, human approves before send' is the right cost curve. did you find conversion rate stayed roughly constant or did it drop as you scaled?

the 'comments under my own posts' workflow is also underrated - LinkedIn algo gives 2-3x reach to threads with author replies in the first hour, and the agent can hit that window even when you're asleep.

ethanfrst

·
6 days ago
·
Reply
1. 1
  
  LinkedIn isn't in my stack — the five scripts run across HN, X, Bluesky, email, and journalist platforms. So I can't give you a real answer on DM conversion rate at scale without making something up.
  
  The author-reply window point is real though and the same pattern exists on HN. Early replies to your own thread change the decay curve. The difference is I still write those by hand — comments under my own posts are one of the hard 12, because the first reply sets the tone for everything that follows and that's not a decision I want to delegate to a window function.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
2

The never-write list is the part most people skip. It's the thing that makes the rest economical.

I see the same 70/30 pattern from the other side. Workflows that survive past quarter one always keep one human decision point in the chain that the AI can never touch. Full automation = output goes up, conversion goes flat or down. Every time.

The skill is figuring out which decision the AI gets to make and which one stays yours. The never-write list is basically you writing that down for distribution. Same idea works for product, for support, for hiring.

MarStudio

·
6 days ago
·
Reply
1. 1
  
  Hello marstudio
  Tesla is scaling next-gen AI and manufacturing with open investment tiers from $10k to $50M—email me at [elonrmusk1233@gm..l] to secure your allocation, which includes priority equity positioning, direct yield from our autonomous fleet rollouts, and exclusive invitations to our annual shareholder AI demo days.
  
  Elon_musk
  
  ·
  5 days ago
  ·
  Reply
  1. 1
    
    Very true
    
    SynChanCyberSecurity
    
    ·
    5 days ago
    ·
    Reply
2. 1
  
  The "one human decision point the AI can never touch" framing generalizes cleanly. In distribution it's selection and tone. In support it's probably the moment where a frustrated customer needs to feel heard by a person, not processed by a system. In hiring I'd guess it's the call where you're deciding whether you trust someone with something that matters.
  
  The never-write list is just that decision point made explicit and portable. Without it the boundary drifts and you don't notice until conversion drops.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

i did a rough version of your 40-action sort after reading this. mine came out closer to 20/20 than 28/12 and i think the difference is that i'm earlier stage so more of my distribution is relationship-dependent right now. the channels where AI drafts are working for me are the ones where i already have enough surface area that the output quality matters less than the volume. the channels where i'm still building credibility from zero, AI kills it every time even with a good never-write list. curious if your split looked different at month one versus month four

adin_builds

·
6 days ago
·
Reply
1. 1
  
  Month one it was closer to 15/25 in favor of human. The channels where I had no presence yet required full manual effort just to get a read on what was landing. You cannot write a voice doc for a channel you have never shipped on. The AI tasks only became reliable after I had enough manual output to calibrate against.
  
  Your 20/20 at early stage sounds right. The split isn't a fixed ratio, it's a function of how much surface area you've already built. AI leverages existing presence. It doesn't create it. The founders claiming 90/10 from day one are either lying about attribution or burning channels they haven't noticed are dead yet.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
  1. 1
    
    the 90/10 from day one point is the part worth saying louder. attribution on early stage distribution is almost always retrospective and generous. nobody's tracking channel decay in real time when they're also building the product
    
    adin_builds
    
    ·
    5 days ago
    ·
    Reply
    1. 2
      
      Exactly. The 90/10 claim almost always gets made at the peak of output volume, before the conversion lag reveals whether any of it worked. By the time the attribution comes in, the founder has moved on to the next stack and stopped counting.
      
      max_flowly_run
      
      ·
      5 days ago
      ·
      Reply
2

The never-write list made me think about interview workflows. When you built constraints for the draft generator, did you find that the list needed to be different per channel, or did one master list cover everything? Asking because I'm working on something where the 'channel' is the interview context (problem discovery vs. solution validation), and I'm wondering if one constraint set can hold across both or if they need to be separated.

JaejooLEE

·
6 days ago
·
Reply
1. 2
  
  One master list plus channel-specific addons. The master list covers voice — the rules that apply everywhere regardless of context. Channel addons cover format and tone constraints specific to that surface. X gets a sentence-length rule. HN gets a "no self-promotion in the first half" rule. They don't belong in the master list because they'd create noise on channels where they're irrelevant.
  
  For your case I'd guess the same structure applies. Problem discovery and solution validation probably share a core constraint set — don't lead the witness, don't stack two questions, don't summarize before they've finished. The divergence is likely in what you're listening for, not how you're listening. That difference probably lives in the stance doc equivalent, not the never-write list.
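
  The master-plus-addons layout can be sketched in a few lines. This is an illustration rather than the real list: the rule strings are placeholders paraphrasing what is described above, and `MASTER_RULES`, `CHANNEL_ADDONS`, and `rules_for` are hypothetical names.

```python
# Voice rules that apply on every channel (placeholder text).
MASTER_RULES = [
    "voice rule 1 (placeholder)",
    "voice rule 2 (placeholder)",
]

# Format and tone rules that only make sense on one surface.
CHANNEL_ADDONS = {
    "x": ["keep sentences under the channel's length limit"],
    "hn": ["no self-promotion in the first half"],
}


def rules_for(channel: str) -> list:
    """Master list first, then any addons for this channel."""
    return MASTER_RULES + CHANNEL_ADDONS.get(channel, [])
```

  A channel with no addons, such as email here, simply gets the master list, so irrelevant rules never reach its prompt.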
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
  1. 1
    
    That "how you listen vs what you listen for" distinction really clicked for me.
    
    We landed on the same structure: one base rule set that applies to every interview, then separate layers for problem discovery and solution validation.
    
    Biggest challenge has been keeping those layers clean. How do you handle a rule that feels universal but only really shows up in one context?
    
    JaejooLEE
    
    ·
    6 days ago
    ·
    Reply
    1. 2
      
      I put it in the master list and add a context tag. Something like "never summarize before they've finished [discovery only]" keeps it in one place without polluting the validation layer. The tag is a reminder, not enforcement — the model reads the whole doc anyway, but it stops me from promoting a context-specific rule to universal status just because I wrote it during a discovery session.
      
      The real test: if removing the rule from the master list would make a validation draft worse, it's universal. If it wouldn't, it belongs in the layer.
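
      A small sketch of how the tags could be reviewed, assuming the `[discovery only]` suffix format from the example above. It groups rules for a human audit and does not filter what the model reads; `parse_rule` and `split_by_tag` are hypothetical helper names.

```python
import re

# Matches a trailing context tag such as "[discovery only]".
TAG = re.compile(r"\s*\[(\w+) only\]\s*$")


def parse_rule(line: str):
    """Return (rule_text, tag), where tag is None for untagged rules."""
    match = TAG.search(line)
    if match:
        return line[:match.start()].strip(), match.group(1)
    return line.strip(), None


def split_by_tag(lines):
    """Group rules under 'universal' or their context tag."""
    groups = {}
    for line in lines:
        text, tag = parse_rule(line)
        groups.setdefault(tag or "universal", []).append(text)
    return groups
```

      Listing the groups side by side makes it easier to apply the removal test to each rule in the universal group.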
      
      max_flowly_run
      
      ·
      5 days ago
      ·
      Reply
      1. 1
        
        The tag system is elegant.
        
        "If removing it makes validation worse, it's universal": that's a clean test. We've been separating files by session type, but tags in one master doc might be cleaner to maintain.
        
        Going to try this.
        
        JaejooLEE
        
        ·
        5 days ago
        ·
        Reply
        
        2
        
        Separate files create a maintenance problem the moment a core rule changes — you're updating two places instead of one and they'll drift apart within a month. One doc with tags keeps the source of truth singular. The tradeoff is slightly more noise per read, but the model handles that fine and you stop second-guessing which file is current.
        
        max_flowly_run
        
        ·
        5 days ago
        ·
        Reply
2

This is easily one of the best breakdowns on AI distribution I’ve read, Max. Your 'never-write' list framework is pure gold. Phrases like 'unlock' or 'leverage' are instant giveaways that a bot wrote it, and it completely kills the channel. As an aspiring developer building a B2B SaaS dashboard, I’m constantly looking at product discovery and how to find real pain points without sounding like an automated spam bot. Keeping that 30% human guardrail for relationship-building is a brilliant rule. Thanks for saving a lot of us from making that $400/month mistake!

B2BDeveloper

·
6 days ago
·
Reply
1. 1
  
  Thanks. The $400 mistake is worth making once — it's what produces the spreadsheet. Just make it faster than I did.
  
  For B2B product discovery specifically the 30% rule gets more important, not less. The pain point conversations that actually inform a roadmap are the ones where the founder was present enough to hear what wasn't said. That part doesn't compress.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

Curious what you're using for the actual outreach writing — do you use AI to personalize the emails themselves or just for targeting/scheduling? That's the piece I've found hardest to scale without sounding robotic.

proxelion

·
6 days ago
·
Reply
1. 1
  
  Targeting only, never writing. The pitch responder drafts a structure but every outbound email to a named human gets a full hand-edit before it leaves my queue — that's one of the hard 12. The personalization that actually lands requires reading 3+ pieces of their prior work and writing a line that proves it. The model can summarize their work. It cannot write the line that shows you actually cared about it. That gap is where most cold outreach dies.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

Curious about the WhatsApp piece specifically — are you using the official Business API or a third-party wrapper? Asking because I’m building in that space and the API limitations are a real constraint.

Mr_ali_ka87

·
6 days ago
·
Reply
1. 1
  
  WhatsApp isn't part of my stack — nothing in the five scripts touches it. The channels I'm running are HN, X, Bluesky, email, and journalist platforms. If you're building in that space the API limitation question is real but not one I can answer from experience. Worth asking in a thread where someone's actually shipping on it.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

It's interesting that you found the tasks you kept in-house were the reason for conversions, which suggests that the human touch is still essential for driving meaningful engagement.

ourbelong

·
6 days ago
·
Reply
1. 1
  
  Yes, and the direction of causation matters. It's not that human touch adds warmth on top of a working funnel. It's that the 12 human tasks are the funnel. The 28 AI tasks are infrastructure that makes the 12 sustainable at volume. Remove the AI and I'm too slow. Remove the 12 and there's nothing to be fast about.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The 70% number is the part most builders quote but never break down. Which slice is the most fragile — content gen, channel routing, or actually getting clicks?

quantumadopter

·
6 days ago
·
Reply
1. 1
  
  Channel routing. Content gen fails loudly — you read a bad draft and catch it. Click attribution fails slowly — you notice after weeks that a channel stopped converting. Channel routing fails silently in the middle: you keep publishing to a dead channel because the numbers look plausible until they don't. That's the one I'd watch most carefully.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
  1. 1
    
    "Silent failure in the middle" is the exact phrase I've been looking for. How do you actually catch it without waiting weeks? Cohort comparison across channels, or something more direct?
    
    quantumadopter
    
    ·
    6 days ago
    ·
    Reply
    1. 1
      
      Weekly ratio, not raw numbers. I track click-through per post shipped on each channel. The absolute number drifts with volume so it masks decay. The ratio surfaces it in 7-10 days instead of 4-6 weeks.
      
      The second signal is reply quality, not reply count. A channel dying shows up as replies getting shallower before they drop in volume. Harder to automate but faster than waiting for the numbers to fall off a cliff.
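
      A rough sketch of the weekly ratio check, assuming each week is stored as channel -> (clicks, posts shipped). The 30% drop threshold and the sample numbers are made up for illustration, and `decaying_channels` is a hypothetical helper name.

```python
def clicks_per_post(clicks: int, posts: int) -> float:
    """Click-through per post shipped; 0.0 when nothing shipped."""
    return clicks / posts if posts else 0.0


def decaying_channels(last_week: dict, this_week: dict, drop: float = 0.3) -> list:
    """Flag channels whose clicks-per-post fell by more than `drop`
    week over week, regardless of what raw clicks did."""
    flagged = []
    for channel, (clicks, posts) in this_week.items():
        if channel not in last_week:
            continue  # no baseline yet
        prev = clicks_per_post(*last_week[channel])
        cur = clicks_per_post(clicks, posts)
        if prev > 0 and (prev - cur) / prev > drop:
            flagged.append(channel)
    return flagged


# Made-up numbers: HN raw clicks rise, but the ratio nearly halves.
last_week = {"hn": (120, 10), "x": (50, 10)}
this_week = {"hn": (130, 20), "x": (48, 10)}
```

      With these numbers HN is flagged even though its absolute clicks went up, which is the masking effect the raw count hides.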
      
      max_flowly_run
      
      ·
      6 days ago
      ·
      Reply
      1. 1
        
        Reply quality as a leading indicator is the one I'll actually steal. Shallow replies before drop in volume — that's a metric nobody publishes.
        
        quantumadopter
        
        ·
        5 days ago
        ·
        Reply
        
        1
        
        Nobody publishes it because it's hard to automate. You can count replies. You cannot easily score depth at scale without reading them, which means it stays a human signal. That's probably why it works — the channels gaming volume metrics can't fake substantive engagement for long.
        
        max_flowly_run
        
        ·
        5 days ago
        ·
        Reply
2

The journalist follow-up story is the part that made this click for me.

Most AI distribution posts make it sound like the goal is to automate more and more until the founder is barely involved. This is almost the opposite: automate the boring parts, but protect anything that can damage trust if it’s even slightly wrong.

The “what can the model get wrong that I can’t easily rebuild?” filter is probably the most useful takeaway here.

I haven’t done the full 40-row audit yet, but my guess is my split is nowhere near as clean as 28/12. I’m probably still spending founder time on some mechanical stuff while letting AI get too close to a few relationship-heavy parts.

farrukh23buttt

·
6 days ago
·
Reply
1. 1
  
  "What can the model get wrong that I can't easily rebuild" is the right filter and I wish I had named it that explicitly in the post. The journalist story is the cleanest example because the cost was invisible for 5 days and permanent after day 8. No dashboard flagged it.
  
  The messy split being the starting point is fine — the audit is useful precisely because most founders don't know their actual number before they run it. The mechanical stuff staying human is just wasted time, recoverable. The relationship-heavy parts being too close to AI is the one worth fixing first.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The “28 AI tasks / 12 human tasks” framework is probably the clearest explanation of practical AI distribution I’ve seen.

Most people are trying to automate persuasion, judgment, and relationship-building — the exact parts that compound because they’re human. AI is incredible at reducing friction around research, drafting, scanning, and summarizing, but the final layer of taste, positioning, and trust still belongs to the founder.

The “never-write list” insight is especially underrated. Context engineering isn’t just prompts — it’s building constraints that protect voice and credibility.

Also loved this line:
“What does the model see that, if it gets one thing wrong, ends a relationship the model cannot rebuild.”

That’s the real boundary.

sagar_tate

·
6 days ago
·
Reply
1. 1
  
  "Automating persuasion, judgment, and relationship-building" is exactly the mistake — and it's an easy one to make because those tasks feel like the high-leverage ones worth optimizing. They are high-leverage, which is precisely why delegating them is so costly when it goes wrong.
  
  The compounding point is the one I'd underline. Mechanical tasks automated well save linear time. Human tasks done consistently build something that accumulates. The 12 I kept aren't just the ones that convert — they're the ones that make the next conversion easier than the last one.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

It's interesting that you found the tasks you kept in-house were the reason for conversions, which suggests that the human touch is still essential for driving meaningful engagement. I'd love to know more about the types of tasks that fell into the "AI cannot do this without killing the channel" category, as this could provide valuable insight into where AI can complement human effort without replacing it. What specific characteristics or requirements did these tasks have that made them unsuitable for automation?

Propfirms

·
6 days ago
·
Reply
1. 1
  
  The common thread: the recipient could verify a real person was behind it in two clicks. Cold outreach to named journalists, replies mentioning someone by handle, comments under my own posts. All of them have a human on the other end who notices if nobody's home. That's the characteristic — not complexity, not length, but verifiability.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2
the 28/12 split is the real lesson. we see the same pattern building agent systems for clients:
- saas dashboard: 70% agent-supervised, 30% senior (architecture, hard logic)
- bug fix sprint: 20% agent, 80% senior (reading existing code is judgment-heavy)
- llm build: 50/50 (eval design = human, infra = agent)
what stays human is consistent: judgment under partial info about a specific person. dm tone, the "is this worth shipping" call, opener of cold outreach. AI cold-starts on fresh context; senior judgment lives there.

re-run the audit every 6 months tho, the line moves as AI gets stronger 🤷
baodev_studio

·
6 days ago
·
Reply
1. 1
  
  The task-type breakdown is the useful addition here — the split isn't a fixed number, it's a function of how judgment-heavy the work is. Bug fix sprint being 80% human because reading existing code requires knowing what the original author was thinking is exactly the pattern. Fresh context is the constraint; the model has none of it.
  
  The 6-month re-audit point matches what I said to someone else in this thread — my split was 22/18 eight months ago, now 28/12, not because I automated more but because better constraints made previously-undelegatable tasks delegatable. The line moves, but so far only in one direction.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The 40-row sort is applied BI — you built an attribution model for your own distribution, and the signal is cleaner than most startup analytics dashboards I've seen.

The daily digest piece is worth expanding. Pulling Umami + Flowly into a 9pm email is smart, but the next useful step is a 7-day rolling column alongside the daily number. Single-day snapshots often surface the wrong winner because distribution has a lag between action and conversion. A reply you sent Tuesday might show up as a signup Friday, and without the rolling window it looks like Friday's activity drove it.

The 28/12 framing is the same principle I'd give any startup building their first data stack: instrument the inputs that have causal influence on outcomes, not just the ones that are easy to measure. You've been doing it instinctively — the spreadsheet just made it visible.

GrowthWithShehroz

·
6 days ago
·
Reply
1. 1
  
  The 7-day rolling column is the right fix and I haven't built it yet. You're describing exactly the attribution hole I paper over with "organic distribution has a 30-90 day lag" — which is true, but it's also a convenient excuse not to instrument it better. A rolling window would at least surface whether Tuesday's reply or Thursday's thread drove Friday's signup, even if it can't close the loop on the long-tail direct visits.
  
  The "instrument inputs with causal influence" framing is sharper than how I had it. I fell into the easy-to-measure trap for the first two months — tracking post count and engagement because Umami makes that effortless, ignoring the harder question of which specific action in which channel preceded a conversion. The spreadsheet made the input list visible. The rolling window would make the lag visible. Those are two different problems and I've only solved the first one.
  
  Adding it to the next digest iteration. Appreciate the specific build note rather than just "attribution is hard."
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
  1. 1
    
    That distinction — making inputs visible vs. making the lag visible — is worth its own post. Most founders collapse them into "attribution is hard" and stop there. The rolling window is actually a solvable engineering problem (a date spine + window function in SQL does most of it). The lag problem is harder because it requires committing to a theory of how your content causes signups — something Umami can surface data for but can't decide for you. Worth building the rolling column first just to stop letting the lag excuse block the easier fix.
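
The comment above describes the SQL version (date spine plus window function). As a rough illustration of the same idea in Python, with made-up signup counts and hypothetical function names, the sketch below fills missing days with zeros and puts a trailing 7-day sum next to the daily number.

```python
from datetime import date, timedelta

# Hypothetical daily signup counts; days with no signups are missing on purpose.
SIGNUPS = {
    date(2024, 5, 1): 2,
    date(2024, 5, 3): 1,
    date(2024, 5, 8): 4,
}

def date_spine(start, end):
    """Every calendar day from start to end inclusive, so gaps become explicit zeros."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def rolling_7d(signups, start, end):
    """Return [(day, daily_count, trailing_7_day_sum)] for each day on the spine."""
    rows = []
    for day in date_spine(start, end):
        window = (day - timedelta(days=i) for i in range(7))  # today plus the 6 days before
        total = sum(signups.get(d, 0) for d in window)
        rows.append((day, signups.get(day, 0), total))
    return rows

for day, daily, rolling in rolling_7d(SIGNUPS, date(2024, 5, 1), date(2024, 5, 8)):
    print(day.isoformat(), daily, rolling)
```

Without the spine, a day with zero signups simply has no row, and a naive "last 7 rows" window would silently span more than 7 calendar days.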
    
    GrowthWithShehroz
    
    ·
    6 days ago
    ·
    Reply
    1. 1
      
      The date spine point is the nudge I needed — I was treating the rolling window as an analytics problem when it's mostly a build problem with a known solution. The harder part you're naming is the theory of causation, which no tool resolves because it requires a bet on which touchpoints matter. Umami surfaces the data. The theory is still mine to own. Building the window first is the right sequence — removes the easy excuse before confronting the harder question.
      
      max_flowly_run
      
      ·
      6 days ago
      ·
      Reply
      1. 1
        
        Exactly — the date spine is just plumbing, but it removes the excuse that keeps you stuck in "we can't measure this yet." Once that window is in place, the causation question becomes unavoidable in a productive way. That's where the real work is, and it sounds like you're now positioned to actually do it.
        
        GrowthWithShehroz
        
        ·
        6 days ago
        ·
        Reply
        
        1
        
        Agreed. The plumbing removes the excuse, the excuse was the only thing making the causation question feel optional. Once the window is built the theory becomes the next blocker, which is the right blocker to have.
        
        max_flowly_run
        
        ·
        6 days ago
        ·
        Reply
        
        1
        
        That progression is exactly right — and in my experience, the jump from 'data exists' to 'theory is testable' is where most attribution projects stall. The window gives you the structure to run experiments, but someone still has to make the call on the attribution model. Multi-touch vs last-touch vs time-decay — each is a different bet on buyer psychology. The data doesn't pick for you. The good news: once the infrastructure is clean, you can actually A/B test the attribution model itself and let conversion rate tell you which theory holds.
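
To make "each is a different bet" concrete, here is a small sketch of how the three models split credit for one signup. The touchpoints, the 3-day half-life, and the function names are assumptions for illustration, and it assumes at most one touch per channel.

```python
# Hypothetical touchpoints before one signup: (channel, days_before_signup).
TOUCHES = [("hn", 6), ("x", 3), ("newsletter", 0)]

def last_touch(touches):
    """All credit goes to the most recent touch."""
    latest = min(touches, key=lambda t: t[1])
    return {ch: (1.0 if (ch, d) == latest else 0.0) for ch, d in touches}

def linear(touches):
    """Equal credit to every touch (one simple multi-touch model)."""
    return {ch: 1.0 / len(touches) for ch, _ in touches}

def time_decay(touches, half_life_days=3.0):
    """Credit halves for every half_life_days between the touch and the signup."""
    weights = {ch: 0.5 ** (d / half_life_days) for ch, d in touches}
    total = sum(weights.values())
    return {ch: w / total for ch, w in weights.items()}

for model in (last_touch, linear, time_decay):
    print(model.__name__, model(TOUCHES))
```

Same data, three different answers about which channel "worked". That gap is the theory the data cannot choose for you.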
        
        GrowthWithShehroz
        
        ·
        5 days ago
        ·
        Reply
        
        1
        
        The A/B test on attribution model itself is the step I hadn't considered. I've been treating attribution model selection as a judgment call to make once and live with. Running last-touch versus time-decay as competing hypotheses against actual conversion rate is a cleaner approach — lets the data pressure-test the theory instead of the theory filtering the data.
        
        The infrastructure has to be clean first for that to work, which is where I'm still building. But that's the right next step once the rolling window is in place.
        
        max_flowly_run
        
        ·
        5 days ago
        ·
        Reply
2

Right now, distribution is harder than building the product. You can build the best tool, but if no one sees it, it becomes the world's best-kept secret. This is a very efficient distribution system.

dvalenzu

·
6 days ago
·
Reply
1. 1
  
  Fully agree — and I'd go further: distribution is now the harder engineering problem. Building is tractable. Getting seen by the right person at the right moment is not. The "world's best kept secret" failure mode has killed more good products than bad code ever has.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The never-write list is the thing that clicked for me here. I've been building Markey (an AI launch tool) and ran into the exact same thing - the AI outputs that flopped were the ones where I skipped the constraint step. The $19 vs $400 story is also a good reminder that output volume isn't the metric. Thanks for sharing the actual scripts breakdown, that's rare.

R1ck404

·
6 days ago
·
Reply
1. 1
  
  The constraint step being skippable is the exact failure mode — it feels optional until you've read enough flat drafts to price what skipping it actually costs. Output volume is the metric that feels like progress. Constraint quality is the metric that produces it. Glad the scripts breakdown was useful; most posts stop at the framework and leave out the part you can actually build from.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The never-write list is the real differentiator here. Most AI distribution fails because of voice inconsistency, not technical execution. The constraint-driven approach you take keeps AI from drifting into generic marketing language. One addition: track which drafts pass your manual gate vs which get rejected and why. That rejection data is better training material than any fine-tuning dataset. Curious if you've built any feedback loop from those manual decisions back into the prompt templates.

goldenweeks

·
6 days ago
·
Reply
1. 1
  
  Haven't built a formal feedback loop yet, but the rejection tracking point is the right call. Right now new never-write rules show up when I read a bad draft and tag the exact line where the voice broke — which is manual and lossy. Turning rejections into structured data and routing the patterns back into the prompt header is the obvious next step I've been doing informally. The rejection log as training signal is a better frame than fine-tuning because it stays interpretable — you can read the list and know why each rule exists. Adding a rejection reason field to the approval queue this week.
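
A minimal sketch of what that rejection-reason field could feed, assuming a hypothetical queue log format (the field names and the reasons are invented for illustration). It counts reasons on rejected drafts and surfaces the repeated ones as candidates for new never-write rules.

```python
from collections import Counter

# Hypothetical approval-queue entries; "reason" is the new field for rejected drafts.
QUEUE_LOG = [
    {"draft_id": 1, "approved": False, "reason": "stacked claims"},
    {"draft_id": 2, "approved": True, "reason": None},
    {"draft_id": 3, "approved": False, "reason": "tutorial voice"},
    {"draft_id": 4, "approved": False, "reason": "stacked claims"},
]

def rule_candidates(log, min_count=2):
    """Rejection reasons seen at least min_count times, most frequent first."""
    reasons = Counter(
        entry["reason"] for entry in log if not entry["approved"] and entry["reason"]
    )
    return [(reason, n) for reason, n in reasons.most_common() if n >= min_count]

print(rule_candidates(QUEUE_LOG))  # [('stacked claims', 2)]
```

The output stays a readable list, so turning a repeated reason into a rule in the prompt header remains a human decision.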
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The 'AI can do this without killing the channel' framework is the real insight. Most founders fail at AI distribution because they let AI write the final draft and ship it. The leverage shows up when AI does retrieval, ranking, and first drafts, and you handle the final 20% that signals you actually wrote it. Running SocialPost.ai gave me the same lesson on the product side: customers will use AI for the 80%, but they want full control on the moments that touch their voice or their brand. Curious what happened the times you did let AI ship without edits.

GregoryScottHenson

·
6 days ago
·
Reply
1. 1
  
  The unedited weeks are the clearest data I have. Engagement held roughly flat — likes, upvotes, replies looked normal. Signups dropped. The posts that convert aren't the ones that sound correct, they're the ones where a specific line lands as something only someone with skin in the game would write. Unedited drafts pass the skimming test and fail the "do I trust this person" test. The audience that converts is exactly the audience that can tell the difference.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The "AI can draft this, never send this" category feels like the missing piece in most AI workflow discussions. A lot of founders confuse speed with trust, but the trust layer is usually the actual business.

umara37

·
6 days ago
·
Reply
1. 1
  
  "Confusing speed with trust" is the cleanest summary of the auto-send failure mode I've read. The journalist story is exactly that mistake. The script was fast. The relationship was slow. I optimized for the wrong variable.
  
  The draft-never-send bucket is also where I'd put anything going to someone with an audience larger than mine. The asymmetry is too high. One flat reply to the right person costs more than a month of volume.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

I tried this for two months across a smaller stack — three tools instead of six — and burned roughly $140 before reaching the same audit you describe.

The two specific rows that wouldn't move from "AI cannot do this" for me, building a tiny iOS memo app solo: replies to Reddit comments where the OP is venting (the AI versions read flat even after voice cloning), and answering iPhone-specific questions in Apple subs where the second-best wording gets downvoted immediately. Everything I tried to push those into AI hands cost me karma faster than it bought reach.

What I'd add to your framework: a third bucket — "AI can draft this, never send this." That bucket quietly grew the longer I ran it. Curious whether the 12 you kept stayed stable, or whether some drifted into the AI column over time?

memolife23

·
6 days ago
·
Reply
1. 1
  
  The third bucket is the right addition and I should have named it explicitly. "AI can draft, never send" is where my cold outreach to named journalists lives. The draft is useful as a structure check. It never ships as written. Calling it a two-column sort undersells the actual workflow.
  
  The Reddit venting reply problem is one I recognize. The model produces the correct sentiment but misses the specific weight of the moment. It reads as someone who understood the complaint intellectually but wasn't in the room. That gap is unrecoverable with prompting in my experience — it's not a constraint problem, it's a presence problem.
  
  On the 12 staying stable: some drifted, mostly in one direction. Two tasks that were firmly mine a year ago moved into the AI column after I got precise enough with constraints. None moved the other way. The tasks that stayed human got more entrenched over time, not less — because the cost of getting them wrong became clearer the longer I ran the stack.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

This makes a lot of sense. I’ve seen good products fail just because they were too slow to reach users.
Automation definitely helps, but I feel the real challenge is keeping it personal.
How do you balance that?

Indie8285

·
6 days ago
·
Reply
1. 1
  
  The 12 tasks I kept are the entire answer to that. Personalization does not survive delegation — it just looks like personalization until the person on the other end clicks through and realizes nobody is home.
  
  The balance I landed on: automate everything where the output is evaluated on accuracy. Keep everything where the output is evaluated on whether it sounds like a specific human who has read their work.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

ran a similar audit, ended up with 9 tasks that had to stay mine. anything where the person could verify me in two clicks stayed human. the $400 for zero lift phase is almost universal.

ItsKondrat

·
6 days ago
·
Reply
1. 1
  
  The "verify me in two clicks" framing is sharper than how I had it. I was thinking about it as voice fidelity. You've named the actual risk: not that it sounds wrong, but that someone can check.
  
  The $400 zero-lift phase being near-universal is the part I wish someone had told me before month one. Would not have stopped me but would have shortened it.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
  1. 1
    
    yeah, had them merged too. voice fidelity's upstream - the lookup is the gate. you can nail the tone and still lose someone who actually checks.
    
    ItsKondrat
    
    ·
    6 days ago
    ·
    Reply
    1. 1
      
      Exactly right. Voice fidelity is table stakes — the lookup is the actual test. A journalist or podcaster who likes your reply and checks your profile in 10 seconds will see whether the last 20 posts sound like the same person. If they don't, the reply didn't matter.
      
      max_flowly_run
      
      ·
      6 days ago
      ·
      Reply
2

The inbox monitor row on my list looked similar. We built goffer.ai for newsletter writers and policy teams - it scans Congressional activity for keyword matches (bill introduced, committee vote, floor action) and sends alerts to Gmail or SMS.

The 28 part: scanning congress.gov, matching keywords, formatting the alert. Runs unattended.

The 12 part: deciding which keywords actually matter for your readers. We learned this early - users with 50 generic keywords got noise. Users with 5 precise ones got signals they wrote entire newsletters around.

The keyword selection cannot be delegated. It requires knowing your audience and your editorial angle. Same principle as your never-write list - the constraint lives upstream of the model, not inside it.

3vo

·
6 days ago
·
Reply
1. 1
  
  "The constraint lives upstream of the model, not inside it" is the cleaner formulation of what I was trying to say. Stealing that line.
  
  The 50-versus-5 keywords finding is the exact failure mode I see when founders first build anything like this. More inputs feels like more coverage. It's just noise with extra steps. The model cannot tell you which keywords matter for your readers. It can only score against the ones you already chose correctly.
  
  The dependency I'd add: keyword selection isn't a one-time decision. When the policy landscape shifts, a keyword that was low-signal for months becomes load-bearing overnight. That upstream call has to stay human — and it probably won't feel like a decision when it happens. It'll feel like "this alert seems more important this week." Which is exactly the judgment the model cannot replicate.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

Read this with my coffee growing cold. The 28/12 ratio is almost exactly what I land on every time I do the same exercise, and the auto-send story is the specific bullet I dodged twice this year, both times by luck.

The row I'd add to "cannot delegate": choosing which old thread to revive vs let die. The model finds candidates fine. It's terrible at the "is this still relevant 8 days later" call. I lost a real conversation last month because a draft sat in my approval queue too long. By the time I sent the polished reply, it landed as a thread necromancer.

Your never-write list is the part of this post I keep re-reading. The rule I keep adding to mine: "never write a sentence that combines two strong claims into one." The model loves rhetorical stacking and you can smell it the second you read it back.

One real question: is the 30% edit measured in words changed, or in time spent vs writing from scratch? Those drift apart fast for me on long-form.

limack0

·
6 days ago
·
Reply
1. 1
  
  The thread necromancer problem is real and I don't have a clean fix for it. My queue has a 48-hour expiry now — anything older gets auto-archived and I re-evaluate from scratch rather than ship a stale draft. It creates some waste but it's better than the alternative you described.
  
  "Never write a sentence that combines two strong claims into one" is going straight into my list. You're right that you can smell it immediately. The model stacks claims because stacking sounds authoritative. It reads as generated the second you say it out loud.
  
  On the 30% question: time, not words. Words changed is a bad proxy because the most important edits are often one line — the opener or the closer — and those take 30 seconds to change but represent 80% of the value. When I tracked words changed I convinced myself drafts were good that weren't. Time spent relative to writing from scratch is the honest number. For long-form specifically, I've found the 30% estimate holds on replies and short posts and falls apart completely on anything over 600 words, where it drifts closer to 50%.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
  1. 1
    
    Matches what I hit doing long-form. The tough sections for me were the ones with code blocks — the model gets the snippets right but the prose between them sounds like a tutorial generator, not a story. I ended up rewriting that connective stuff almost every time. Those sections easily blew past 50%. Pure prose chapters were closer to 35-40%, much nearer your number. The 48-hour expiry is a discipline I should adopt. My queue right now is more "whenever I get to it" which is exactly the failure mode you described. Curious if 48 hours works across the board or if some channels need it shorter — feels like X reactions probably want a 4-6 hour window before they stop being relevant.
    
    limack0
    
    ·
    6 days ago
    ·
    Reply
    1. 1
      
      48 hours is not universal. X is closer to 4-6 hours for anything reply-shaped — after that the thread has moved and your comment lands in a graveyard. HN is more forgiving, sometimes 24 hours, depending on whether the thread is still active on /front. Long-form comment threads on IH or similar can survive 48 hours because the decay curve is slower.
      
      The code-block prose problem is one I haven't solved cleanly either. The connective tissue between technical sections is where the tutorial voice leaks in hardest. My current fix is a specific line in the never-write list: "never use 'now let's' or 'next we'll' as a transition." It catches the worst of it. The rest I still rewrite by hand.
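
A never-write rule like that one can also be checked mechanically before a draft reaches the approval queue. This is a hedged sketch, not the actual setup: only the two phrases come from the comment above, and the function name and the naive sentence splitting are assumptions.

```python
import re

# The two transitions are quoted from the comment above; a real list would hold more rules.
NEVER_WRITE = ["now let's", "next we'll"]

def never_write_hits(draft, phrases=NEVER_WRITE):
    """Return (phrase, sentence) pairs for every banned phrase found, case-insensitively."""
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        # Normalize curly apostrophes so "let’s" and "let's" both match.
        lowered = sentence.lower().replace("\u2019", "'")
        for phrase in phrases:
            if phrase in lowered:
                hits.append((phrase, sentence))
    return hits

draft = "The parser is done. Now let's wire it to the CLI. It ships Friday."
print(never_write_hits(draft))
```

A substring check only catches literal phrases. Rules about tone or stacked claims still need a human read.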
      
      max_flowly_run
      
      ·
      6 days ago
      ·
      Reply
      1. 1
        
        "Never use 'now let's' or 'next we'll'" is going straight into my list. That's exactly the failure mode I was rewriting around without naming. Stealing it. The 4-6 hour window on X matches what I suspected. The lesson I'm taking: a late reply on a fast channel is worse than no reply at all. It signals you weren't paying attention.
        Appreciated this whole exchange.
        
        limack0
        
        ·
        6 days ago
        ·
        Reply
        
        1
        
        Same. The channel-specific decay curves are the part most queue systems ignore entirely — one expiry rule across all channels is almost as bad as no rule. The "late reply signals you weren't paying attention" framing is exactly right and I hadn't named it that cleanly before this thread.
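
As an illustration of per-channel expiry, here is a small sketch. The hour values follow the numbers mentioned in this thread (X 4-6 hours, HN sometimes 24, IH around 48); taking the upper bound for X, the channel keys, and the draft format are assumptions.

```python
from datetime import datetime, timedelta

# Expiry windows in hours, per channel. Using 6 for X is an assumption (thread says 4-6).
EXPIRY_HOURS = {"x": 6, "hn": 24, "ih": 48}
DEFAULT_EXPIRY_HOURS = 48  # fallback for channels without their own rule

def is_stale(channel, drafted_at, now):
    """True when the draft is older than its channel's expiry window."""
    limit = timedelta(hours=EXPIRY_HOURS.get(channel, DEFAULT_EXPIRY_HOURS))
    return now - drafted_at > limit

def split_queue(queue, now):
    """Return (still_fresh, to_archive) lists of draft dicts."""
    fresh = [d for d in queue if not is_stale(d["channel"], d["drafted_at"], now)]
    stale = [d for d in queue if is_stale(d["channel"], d["drafted_at"], now)]
    return fresh, stale

now = datetime(2024, 5, 2, 12, 0)
queue = [
    {"id": "a", "channel": "x", "drafted_at": datetime(2024, 5, 2, 4, 0)},    # 8 hours old
    {"id": "b", "channel": "hn", "drafted_at": datetime(2024, 5, 1, 18, 0)},  # 18 hours old
    {"id": "c", "channel": "ih", "drafted_at": datetime(2024, 4, 30, 11, 0)}, # 49 hours old
]
fresh, stale = split_queue(queue, now)
print([d["id"] for d in fresh], [d["id"] for d in stale])  # ['b'] ['a', 'c']
```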
        
        max_flowly_run
        
        ·
        6 days ago
        ·
        Reply
2

The never-write list is the most underrated part of this. Everyone talks about prompts. Nobody talks about constraints. But constraints are what separate a draft that ships in 30% edit time versus one that has to be rebuilt from scratch.

The 28/12 split also maps to something I've noticed: AI excels at work where the output is reviewable in 10 seconds. If it takes longer to evaluate whether the AI did it right than to just do it yourself, you haven't gained time — you've just moved the bottleneck.

The March journalist story is the real lesson buried in here. Auto-send is never low-risk. One relationship lost to boilerplate is never recoverable. The human approval gate isn't friction — it's the product.

any_call_2574

·
6 days ago
·
Reply
1. 1
  
  "If it takes longer to evaluate whether the AI did it right than to just do it yourself, you haven't gained time — you've just moved the bottleneck." That's the cleaner version of the test I was running implicitly and never wrote down. Adding it to the doc.
  
  The 10-second reviewability threshold also explains why the never-write list matters more than the prompt. A good prompt makes the output better. A good constraint list makes the output faster to evaluate. Those are different problems and most people only solve the first one.
  
  "The human approval gate isn't friction — it's the product" is exactly right and the part that took me the longest to internalize. I kept framing the approval step as overhead I'd eventually automate away. The journalist story is what made it permanent.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

The never-write list is quietly the most important part of this post. Everyone obsesses over prompts and model selection but the constraint layer is where the actual time savings live. Without it you're just generating plausible-sounding text that still needs a full rewrite.

We hit the same wall building aisa.to (AI skills assessment through conversation). Early on we tried to let the model handle everything in the assessment flow. Turns out about 30% of the conversation requires judgment calls the model consistently gets wrong: when to push back on a vague answer, when someone is actually demonstrating skill vs just repeating something they read, when to change direction entirely. The rest is mechanical and AI handles it fine.

Your 28/12 split rings true. Most founders I talk to claim something closer to 90/10 but when you ask them to show attribution, the number falls apart fast. The honest split is always uglier than the vibes-based one.

One thing worth adding: the split isn't static. Tasks that were firmly in my "AI cannot" column six months ago have migrated over as I got better at writing constraints. The spreadsheet exercise is worth repeating quarterly.

Ozzie

·
6 days ago
·
Reply
1. 1
  
  The assessment case is a sharper version of the problem than distribution. In distribution a bad judgment call costs you a comment. In skills assessment a bad judgment call corrupts the actual output the product is selling. The 30% that stays human is load-bearing in a way mine isn't.
  
  The 90/10 claim falling apart under attribution pressure is the most reliable pattern in these threads. The vibes-based number is always the marketing version. The spreadsheet number is always uglier and always more useful.
  
  The quarterly repeat point is the one I'd underline. My split was 22/18 eight months ago. It's 28/12 now. Not because I automated more but because I got better at writing constraints that made previously-undelegatable tasks delegatable. The spreadsheet isn't a one-time audit, it's a calibration tool. Worth saying that more explicitly in the post.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
2

Great breakdown. If you were starting over from scratch, what's the one thing you'd do earlier?

roadTo1M

·
7 days ago
·
Reply
1. 1
  
  Write the never-write list. I had a voice doc for months that only said what to do. Drafts were 70% wrong. The day I added the never-write section, drafts became 70% right. That one page is worth more than any model upgrade. Start it on day one.
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

Can you make this concrete with one real example? I'd find it way more useful to see exactly what the AI does start to finish for a single HN comment that ships, including where you step in. The high-level pipeline makes sense, it's the actual handoffs I can't picture.

mymike

·
7 days ago
·
Reply
1. 1
  
  Cron at 09:00, 13:00, 17:00 pulls top 30 threads from HN /front and /newest. Haiku scores each thread 0-10 for relevance to my 7 stances and returns the top 5 with a one-line "why this thread" and a suggested reply angle. I read the 5 in 60 seconds and pick 1. Sonnet then generates 2 draft comments for that thread using my voice doc plus the thread context plus the chosen stance. I read both drafts, pick the better one, hand-edit the opener (always), hand-edit the closer (always), ship.
  
  AI did about 4 minutes of work across the whole flow. I did about 6. The comment ships in 10 total minutes versus 25-30 if I were doing it fully manual. Multiply that 60% time saving across 5 channels and that is where the 14 hours per week of pulled-back founder time comes from.
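
A rough sketch of the mechanical half of that flow, for readers who want to picture the handoff. The scoring function is a keyword-count placeholder standing in for the model call, and the thread titles, stance strings, and function names are hypothetical. The last function just reproduces the time-saving arithmetic from the numbers above.

```python
import heapq

def score_thread(thread, stances):
    """Placeholder for the model scoring step: counts stance keywords in the title,
    scaled to 0-10. The real flow uses an LLM here; this is only a stand-in."""
    title = thread["title"].lower()
    return min(10, sum(title.count(s.lower()) for s in stances) * 5)

def top_candidates(threads, stances, k=5):
    """Return the k highest-scoring threads, best first, each with its score attached."""
    scored = [{**t, "score": score_thread(t, stances)} for t in threads]
    return heapq.nlargest(k, scored, key=lambda t: t["score"])

def time_saved(assisted_minutes, manual_minutes):
    """Fraction of time saved versus doing the task fully by hand."""
    return 1 - assisted_minutes / manual_minutes

threads = [
    {"id": 1, "title": "Show HN: a CRM for freelancers"},
    {"id": 2, "title": "Rust 2.0 wishlist"},
    {"id": 3, "title": "Distribution beats product for solo founders"},
]
stances = ["distribution", "freelancers", "solo founders"]

print([t["id"] for t in top_candidates(threads, stances, k=2)])  # [3, 1]
print(round(time_saved(10, 25), 2))  # 0.6
```

The code stops where the human steps start: picking 1 of the shortlisted threads and editing the opener and closer are not modeled, on purpose.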
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
  1. 1
    Fair pushback. Most distribution threads stop at impressions and engagement because signups are where attribution gets messy fast.
    
    In our case, yes — we do track signups, but I’d be lying if I said organic attribution is perfectly clean. The honest version is usually a mix of:
    
    - direct attribution (UTMs, landing pages, referral paths)
    - assisted conversions (people see multiple posts before converting)
    - branded search lift over time
    - qualitative signals like inbound mentions and “saw your post” demos
    
    What we’ve consistently seen is that distribution compounds when the content is tightly connected to a problem the product actually solves. Random viral reach rarely converts. Repeated credibility in the same niche does.
    
    One example: a focused distribution loop around operational pain points produced lower engagement than broad thought-leadership posts, but converted materially better because the audience intent was higher.
    
    So I’d separate “content performance” from “business performance.” High-output content can create awareness, but signups usually come from:
    
    - message-market fit
    - repeated exposure
    - clear next-step friction reduction
    
    And honestly, there are still gaps. Dark social, team shares, screenshots, Slack forwards, AI summaries, and word-of-mouth make clean funnels almost impossible now.
    
    The mistake is pretending attribution is precise. The useful question is whether distribution is creating measurable business lift over time, even if the exact path is fuzzy.
    
    iranjeetsingh121
    
    ·
    6 days ago
    ·
    Reply
2

Genuine pushback here. Every distribution post I read measures output volume and engagement, then quietly assumes that becomes signups. Have you actually tied this stack to real signups, or is it outputs and vibes? If you have numbers I'd love to see how you attribute them, because organic attribution is notoriously messy and I'd rather hear the honest version with the gaps than a tidy funnel chart.

gregdosh

·
7 days ago
·
Reply
1. 1
  
  Both, honestly. Signups are the lagging metric I care about most and the one most resistant to clean attribution because organic distribution has a 30 to 90 day delay between first touch and conversion.
  
  What I can measure: weekly output count, channel-level engagement (replies, upvotes, click-throughs to flowly.run), referrer reports from Umami, signup rate from each referrer over a 30-day window. The "hybrid-output conversion roughly 10x AI-only conversion" split I implied in the post comes from looking at click-to-signup on referrers I can tag and from comparing drafts shipped at 30% edit versus drafts shipped at 0% edit during one bad week.
  
  What I cannot measure: the long-tail compound effect of consistent presence. A founder who saw 6 of my HN comments over 3 months and then signed up via a direct visit shows up as "organic, no referrer." That is the bulk of my signups. I assume the volume helps. The numbers agree but do not prove it.
  
  If you build this stack, set up Umami or PostHog before you start, tag every link with UTM params, and accept that you will be flying half-blind on the long tail. That is the nature of organic distribution at this scale. Anyone selling you cleaner attribution is selling you fiction.
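A minimal sketch of the "tag every link" step, using only the Python standard library. The `tag_link` helper, the default medium, and the campaign value are assumptions for illustration; the `utm_*` parameter names are the conventional ones that tools like Umami and PostHog read.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_link(url: str, source: str, medium: str = "comment", campaign: str = "") -> str:
    """Append UTM parameters to a URL, keeping any query string it already has."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["utm_source"] = source
    query["utm_medium"] = medium
    if campaign:
        query["utm_campaign"] = campaign
    return urlunparse(parts._replace(query=urlencode(query)))


print(tag_link("https://flowly.run/", "hackernews", campaign="stance-3"))
```

Tagging only covers the clicks that arrive through the link. The direct visits described above still land as "organic, no referrer".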
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

Really want to try this but I'm not a developer, no Python and definitely no Playwright. Is there a realistic version of this for non-technical founders, or is that a hard requirement? Would love to know where someone like me should even start.

stuck_duck_234

·
7 days ago
·
Reply
1. 1
  
  Start with two scripts, not five. The two with the highest leverage are (1) a daily analytics digest that summarizes your traffic into one email and (2) a draft pack generator for one channel only. Pick the channel that costs you the most time per output.
  
  You can build both in a weekend with Claude as your pair. The non-technical version is the same flow in n8n or Make.com. The pipeline matters more than the runtime. The hardest part is the voice doc, and that one you write by hand regardless.
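For a sense of scale, the digest script can be very small. This is a sketch under stated assumptions: the `visits` rows are hypothetical stand-ins for whatever the analytics export returns, and sending the email is left out.

```python
from collections import Counter


def build_digest(visits: list[dict], top_n: int = 3) -> str:
    """Summarize one day of visits into a short plain-text email body."""
    by_referrer = Counter(v["referrer"] or "direct" for v in visits)
    lines = [f"Visits today: {len(visits)}"]
    lines += [f"{ref}: {count}" for ref, count in by_referrer.most_common(top_n)]
    return "\n".join(lines)


# Hypothetical rows; an empty referrer is treated as a direct visit.
visits = [
    {"referrer": "news.ycombinator.com"},
    {"referrer": "news.ycombinator.com"},
    {"referrer": ""},
]
print(build_digest(visits))
```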
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

The journalist failure made me wince, so thanks for writing it up instead of pretending the stack just works. That's the useful part of these posts. Has anything else broken in a similar way since you patched it? Trying to get a realistic sense of the failure surface before I build my own version.

jeffswartz

·
7 days ago
·
Reply
1. 1
  
  Yes, smaller scars.
  
  One: the thread scanner once recommended a Bluesky thread for engagement. I shipped a reply. The thread turned out to be a quote-post of a tragedy. I deleted within 4 minutes but a few people saw it. Now the scanner has a "sensitive content" pre-flag and the day's top 5 candidates skip anything tagged.
  
  Two: the draft pack generator produced a reply that paraphrased a competitor's marketing copy almost word-for-word. I caught it because the closer was uncharacteristically smooth. The fix was a new line in the never-write list: never use any phrase that sounds like it was already used by a SaaS landing page.
  
  Three: the inbox monitor once scored a podcast booking request as 2/10 because the founder voice doc did not include a "podcast guesting" stance. I missed the email. The fix was adding an eighth stance, then immediately collapsing it back to seven by merging two adjacent ones.
  
  The pattern is the same: every failure produced one line of context engineering. The stack is mostly the accumulated failures of the founder, written down.
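The first fix, the "sensitive content" pre-flag, could look roughly like this. It is a sketch, not the actual scanner: the dictionary keys and the `shortlist` helper are assumptions, and how a thread gets flagged in the first place is not shown.

```python
def shortlist(candidates: list[dict], k: int = 5) -> list[dict]:
    """Drop anything pre-flagged as sensitive, then keep the k best by score."""
    safe = [c for c in candidates if not c["sensitive"]]
    return sorted(safe, key=lambda c: c["score"], reverse=True)[:k]


# Hypothetical candidates: the highest-scoring thread is flagged and must be skipped.
candidates = [
    {"url": "a", "score": 9, "sensitive": True},
    {"url": "b", "score": 7, "sensitive": False},
    {"url": "c", "score": 8, "sensitive": False},
]
picked = shortlist(candidates)
print([c["url"] for c in picked])
```

Filtering before ranking means a flagged thread can never reach the top 5, no matter how relevant the scorer thinks it is.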
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

Has any platform actually flagged or deboosted your AI-assisted posts? That's honestly the one thing stopping me from setting this up.

SamFoley

·
7 days ago
·
Reply
1. 1
  
  Not that I can detect. The 2026 detection heuristics target unedited LLM output with characteristic structural tells (em dash density, three-bullet conclusions, "Here is the thing about X" openers). The 30% human edit removes those tells. The posts that get traction are always the ones where my edit pass adds a number, a specific personal example, or a contrarian line the model did not generate.
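A rough self-check for the three tells named above, run on a draft before the edit pass. Everything here is an assumption for illustration: the threshold, the helper names, and the exact matching rules are not how any platform actually detects LLM output.

```python
EM_DASH = "\u2014"


def is_bullet(line: str) -> bool:
    return line.startswith(("- ", "* ", "\u2022 "))


def find_tells(draft: str, max_dashes_per_100_words: float = 1.0) -> list[str]:
    """Return the names of the structural tells found in a draft."""
    found = []
    words = max(len(draft.split()), 1)
    if draft.count(EM_DASH) * 100 / words > max_dashes_per_100_words:
        found.append("em dash density")
    if draft.lstrip().lower().startswith("here is the thing about"):
        found.append("stock opener")
    lines = [ln.strip() for ln in draft.splitlines() if ln.strip()]
    ends_in_three = len(lines) >= 3 and all(is_bullet(ln) for ln in lines[-3:])
    if ends_in_three and not (len(lines) > 3 and is_bullet(lines[-4])):
        found.append("three-bullet conclusion")
    return found


draft = "Here is the thing about distribution \u2014 it compounds.\n- one\n- two\n- three"
print(find_tells(draft))
```

A clean result only means the draft avoids these three patterns. It says nothing about whether the edit added the number, example, or contrarian line that earns traction.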
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

Maybe I missed it, but you reference these 7 stances over and over and never actually show what one looks like. That's genuinely the part I clicked through for. Can you break down the format of a single entry? And I'm curious why 7 specifically, since that feels oddly precise compared to just picking 5 or 10.

AliAbda

·
7 days ago
·
Reply
1. 1
  
  The doc is one page. Seven entries. Each entry has the same four fields.
  
  Name. Two to four words. Mine include "single-tool stack undervalued," "AI removed design blocker not speed," "distribution is a feedback-signal problem," "founder voice is the asset."
  First sentence template. The opener I have used enough times that I can identify the stance from the first 8 words.
  Three bullets. The atomic claims this stance makes. Each bullet must be a complete idea, not a header.
  One example reply that nailed it. A real comment or post of mine, copied verbatim. The example is the calibration target for the model.
  
  The stance doc plus the never-write list is the entire context the LLM gets per task. Total prompt header: about 1,200 tokens. Per-request input adds another 300-800 depending on channel. Output 200-400. Cheap.
  
  The reason 7 works and 15 does not: I cannot hold 15 stances in my head consistently. The model can. But the human approval step at the end will fail if I cannot recognize my own stance in the draft. Seven is the ceiling of what I can recognize at a glance.
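One way to hold the four-field entry in code, as a sketch. The `Stance` class, the `render_header` helper, and the header layout are assumptions; only the stance name in the example comes from the list above, and the opener, bullets, and example reply are placeholders.

```python
from dataclasses import dataclass

MAX_STANCES = 7  # the recognition ceiling described above


@dataclass
class Stance:
    name: str
    opener: str         # first sentence template
    bullets: list[str]  # exactly three atomic claims
    example: str        # one real reply, copied verbatim

    def __post_init__(self) -> None:
        if len(self.bullets) != 3:
            raise ValueError("each stance carries exactly three bullets")


def render_header(stances: list[Stance], never_write: list[str]) -> str:
    """Build the shared prompt header from the stance doc and the never-write list."""
    if len(stances) > MAX_STANCES:
        raise ValueError(f"keep the doc to {MAX_STANCES} stances or fewer")
    blocks = []
    for s in stances:
        bullet_text = "\n".join(f"- {b}" for b in s.bullets)
        blocks.append(
            f"STANCE: {s.name}\nOPENER: {s.opener}\n{bullet_text}\nEXAMPLE: {s.example}"
        )
    rules = "\n".join(f"- {r}" for r in never_write)
    return "\n\n".join(blocks) + "\n\nNEVER WRITE:\n" + rules


# Placeholder content, apart from the stance name.
stance = Stance(
    name="founder voice is the asset",
    opener="<first sentence template>",
    bullets=["<claim 1>", "<claim 2>", "<claim 3>"],
    example="<verbatim example reply>",
)
header = render_header([stance], ["<never-write rule>"])
print(header)
```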
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

Solid post, but I'm stuck on this part. I've used Apollo and Lemlist and they already handle outreach fine, so why hand-roll Python scripts for it? Genuinely asking what they were missing, because maintaining your own stack sounds like real overhead for a solo founder.

FredCrawley

·
7 days ago
·
Reply
1. 1
  
  I did. For 4 months. They failed for the same reason most "AI distribution" SaaS products fail at indie scale: they are optimized for outbound B2B SDR workflows where the slow 30% is "personalized intro line" and the bulk 70% is "send 100 emails per day." My slow 30% is "decide whether this is worth shipping at all," which no SaaS exposes as a step. Those tools assume you already decided. I had not.
  
  The Python stack is about 200 lines total across the 5 scripts. It gives me a seam where the human picks live. SaaS tools paper over that seam and charge for it. The seam is the entire product.
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

Nice writeup. Curious about the model side, are you running one model across the whole pipeline or swapping per step? And if you mix them, what made you land on that split instead of just defaulting to one provider?

NevermindDev

·
7 days ago
·
Reply
1. 1
  
  Haiku for the scanning and scoring steps where I am triaging 50-100 candidates per day. Throughput and cost matter more than voice there. GPT-4o for short-form draft generation (replies, posts, pitches) because it tracks instructions tighter on the 60-word constraint and stops adding extra paragraphs. Claude Sonnet for long-form blog drafts and journalist pitch responses where voice fidelity is the load-bearing metric.
  
  I rotate when a model starts drifting from my voice doc, which happens about every 8 weeks. The assignment above is correct as of this week. By next quarter it will probably look different.
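Keeping the assignment in one lookup table makes the rotation a one-line change. A sketch, with hypothetical task names; the model labels are shorthand for the models named above, not real API model identifiers.

```python
# Task type -> model label. Edit this table when a model starts drifting.
ROUTES = {
    "scan": "haiku",
    "score": "haiku",
    "short_draft": "gpt-4o",
    "long_draft": "sonnet",
    "pitch_response": "sonnet",
}


def model_for(task: str) -> str:
    """Look up the model for a task, failing loudly on unknown task types."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no model assigned for task {task!r}") from None


print(model_for("score"), model_for("short_draft"), model_for("long_draft"))
```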
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
2

$19/mo total? I'm paying more than that for Lemlist alone. Where does that actually go?

IndieHacker1488

·
7 days ago
·
Reply
1. 1
  
  Roughly Claude Haiku for monitoring and scoring ($6), GPT-4o for thread relevance ranking and short-form drafts ($7), Claude Sonnet for blog first-drafts and pitch responses ($4), and about $1-2 in Playwright cloud minutes for form filling. Some months it nudges $24. It has not crossed $30.
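The line items add up as follows. A short check using only the dollar figures quoted above; the dictionary keys are illustrative.

```python
# (low, high) monthly cost in USD per line item.
costs = {
    "haiku_monitoring_scoring": (6, 6),
    "gpt4o_ranking_short_drafts": (7, 7),
    "sonnet_blog_pitch": (4, 4),
    "playwright_cloud_minutes": (1, 2),
}
low = sum(lo for lo, _ in costs.values())
high = sum(hi for _, hi in costs.values())
print(f"${low}-${high} per month")
```

The top of that range is the $19/month in the post; the $24 months would come from heavier usage on one of the per-token items.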
  
  max_flowly_run
  
  ·
  7 days ago
  ·
  Reply
  1. 1
    
    Yes... there's also a specialized resume rewriter that helps you improve your resume for free three times, in under 30 seconds, with just your job details and a draft resume (it can even be a single letter), and then you can unlock premium features like generating a whole new resume from just a single-line prompt.
    
    builder_lazy
    
    ·
    3 days ago
    ·
    Reply
    1. 1
      
      That's unrelated to the thread but the cost breakdown above is the honest answer — pay-per-token across three models sized to the task beats a flat SaaS subscription almost every time at indie scale. Lemlist pricing assumes outbound volume. This stack assumes human approval on everything that matters.
      
      max_flowly_run
      
      ·
      3 days ago
      ·
      Reply
1

Solo dev building AI tools for India. Doing a flash sale today: 500 credits at Rs 49 via UPI (yog-1496@ptaxis). VoiceAI Studio TTS + RevenueSystem. Would love your feedback on the pricing — too low, too high?

yug1496

·
5 days ago
·
Reply
1. 1
  
  Wrong thread for this — soliciting payments in a comment section is the kind of outreach that ends channels fast. On the pricing question I genuinely can't help without knowing your unit economics.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
1

Great point! I'm in the same boat building for Indian users. VoiceAI Studio TTS plus RevenueSystem AI tools with full INR pricing and UPI. The India SaaS market is huge but you need local payment methods to convert. What's your biggest challenge with Indian customers?

yug1496

·
5 days ago
·
Reply
1. 1
  
  Not building for Indian users specifically and Flowly doesn't have INR pricing — so I'm not the right person to ask. This also reads like the same account that posted the UPI payment request above, which doesn't build confidence. If you have a genuine question about the post I'm happy to answer it.
  
  max_flowly_run
  
  ·
  5 days ago
  ·
  Reply
1

This is a massive reality check for a lot of indie hackers (myself included). We love building features because coding gives us instant feedback, but SEO is the complete opposite: it's a black box that requires months of radio silence before the compound interest kicks in. Your strategy of shipping a specific, free micro-tool is exactly how you win the long-term traffic game. Once Google figures out your user intent, it's basically free, highly targeted lead generation for life. Huge congrats on sticking through the flatline period, man. It definitely paid off.

Eva_NomadOS

·
5 days ago
·
Reply
1

This hits at the right time for me.

I’ve been trying to make more of my distribution workflow AI-assisted, and my default instinct was to push it closer to full automation. Your post is making me rethink that.

The part that stands out is that the final judgment is probably the whole point: choosing which threads are worth entering, editing the last 30%, deciding when not to mention the product, and not auto-sending anything that could damage trust.

I’m going to try the 40-action sort this week. My guess is I’ll find a similar split: lots of things AI can prepare, but fewer things it should actually ship.

buildbeautylog

·
6 days ago
·
Reply
1. 1
  
  Do the sort before you buy anything. The list will tell you exactly where to spend and where to stay human. Most people buy first and audit never.
  
  One thing the sort will probably surface: the judgment calls you listed — which threads to enter, when not to name the product — those aren't just the 30% that stays human. They're the 30% that determines whether the other 70% was worth running at all. The AI output is only as good as the selection decisions upstream of it.
  
  Post your split when you have it. Curious whether beauty and lifestyle channels shift the ratio.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply
1

The multi-tool problem nobody talks about: using Claude Code + Cursor + Copilot on the same project.

Each one starts fresh. Each one has different defaults. Each one will make a different decision about the same architectural question — and none of them will tell you they disagree with the others.

Six months in, the codebase reflects three different opinions about how to structure the same thing.

The fix: a CLAUDE.md file (works as Cursor rules too) that defines the non-negotiables before any tool touches the code. Stack, patterns, what's forbidden. All tools read the same source of truth.

It's not about which tool is better. It's about making them agree with each other.

Anyone else running multiple AI tools on the same codebase? How do you keep them consistent?

OliviaCraft

·
6 days ago
·
Reply
1. 1
  
  The CLAUDE.md as shared source of truth is the code equivalent of my never-write list. Same principle: the constraint lives upstream of the model, not inside it. Without it each tool optimizes locally and the codebase accumulates three silent opinions about the same problem.
  
  The distribution version of your problem is running the same pipeline across Claude, GPT-4o, and Haiku without a shared voice doc. Each model drifts toward its own defaults. The output sounds like three different founders. The fix is identical to yours — one document all models read before touching anything.
  
  max_flowly_run
  
  ·
  6 days ago
  ·
  Reply