
I run CI on AI workflows so they stop rotting. If you have one that works, I will verify it.

Most "AI workflow" guides are screenshots. You copy the setup, it half-works, and a few weeks later, a flag gets renamed, and the whole thing is dead. Nobody re-ran it.

I have been building FlowStacks (https://flowstacks.xyz) to deal with that: AI workflow recipes where CI re-runs each recipe's deterministic setup on every push and grades it. If a recipe breaks, the badge goes red. CI only claims what it can actually check: the config, the wiring, the structure. The step where a model thinks is fenced off as non-CI, because no green check can promise a model's judgment.
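A minimal sketch of that grading rule, assuming a recipe is a list of steps and each step records whether CI is allowed to check it. The names `Step` and `grade` and the badge strings are hypothetical, for illustration only, and not FlowStacks' actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    ci_checked: bool        # False for the fenced-off model step
    passed: Optional[bool]  # None when CI never ran the step


def grade(steps: list[Step]) -> str:
    """Return "green" only if every CI-checked step passed."""
    checked = [s for s in steps if s.ci_checked]
    if not checked:
        # Assumption: with nothing checkable, there is nothing to claim.
        return "red"
    return "green" if all(s.passed is True for s in checked) else "red"


recipe = [
    Step("config parses", ci_checked=True, passed=True),
    Step("CLI flag present", ci_checked=True, passed=True),
    Step("model drafts summary", ci_checked=False, passed=None),
]
print(grade(recipe))
```

The model step never influences the badge in either direction, so the green check stays a claim about the config, the wiring, and the structure only.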

There are more than 100 verified recipes now, and most of them came from posts in my community. I would like to widen that, so here is a genuine open invite.

If you have an AI workflow you actually run, send it. If it passes verification:

It gets a recipe page on FlowStacks with the CI badge and your name on it as the builder.
Your tool gets a backlink from the open-source awesome-ai-workflows list on GitHub.

That part is for everyone whose recipe earns its badge. Nothing gated, no signup.

The strongest few get more than that: I feature them in the WebAfterAI newsletter (320+, three times a week) and post them to r/WebAfterAI (11k+ members) from my account, which tends to travel further than a cold self-post (my Reddit posts average 30-50K views each). To be straight about it, that amplification is selective; the verified page and the backlink are not.

Building an open-source or AI tool yourself?
This is the same invite from the other side. A recipe that exercises your tool becomes a machine-checked "it works" page for it: proof that it runs today, re-checked on every push, instead of a screenshot from launch week or a star count that says nothing about whether the thing still installs. Your tool gets its own page on FlowStacks and a link from both the recipe and the awesome-ai-workflows list. If you would rather send people a page that keeps re-proving your tool works than a README that quietly went stale, that is what this does. It might also be genuinely useful for you: a third party verifying your setup reads very differently from your own docs.

Three ways to submit, whichever fits:

Open an issue (there is a "suggest a workflow" template): https://github.com/Neeeophytee/awesome-ai-workflows/issues/new/choose
Use the request form on the site: https://flowstacks.xyz/submit
Or just post it in r/WebAfterAI yourself. Anyone can. I run the strongest ones from my account because they get more eyes there, but the subreddit is open.

What makes something verifiable: a workflow with a deterministic spine we can check (a config that parses, a flag that must be present, a round-trip that returns a known fact).
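As a rough illustration of those three kinds of check, here is a sketch in Python. The function names, the sample config, the help text, and the stand-in round-trip call are all hypothetical. A real recipe would point them at its own config file, CLI, and endpoint.

```python
import json
from typing import Callable


def config_parses(text: str) -> bool:
    """Check 1: the config is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def flag_present(help_text: str, flag: str) -> bool:
    """Check 2: the flag still appears in the tool's help output."""
    tokens = (tok.strip("[],").split("=")[0] for tok in help_text.split())
    return flag in tokens


def round_trip_returns(call: Callable[[], str], expected: str) -> bool:
    """Check 3: a deterministic call returns the known fact."""
    return call().strip() == expected


checks = {
    "config parses": config_parses('{"model": "example-model", "temperature": 0}'),
    "flag present": flag_present("usage: tool [--model NAME] [--json]", "--json"),
    # Stand-in for a real request; a recipe would call its own tool here.
    "round-trip": round_trip_returns(lambda: "4\n", "4"),
}
for name, ok in checks.items():
    print(f"{name}: {'pass' if ok else 'FAIL'}")
```

Each check returns a plain boolean and involves no model output, which is what keeps it re-runnable on every push.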

If the repo is useful to you, a star genuinely helps the next person find it: github.com/Neeeophytee/awesome-ai-workflows. Only if it is useful.

I will start posting a build-in-public update here each week with the real numbers and what broke. Teardowns of the site are welcome, too.

posted to Building in Public on June 29, 2026
  1. 1

    The rot problem is real and worse for anything that touches the outside world. Renamed flags are one axis, but the nastier one is upstream drift: a model update changes the output format, or a site an agent depends on ships a new layout or anti-bot rule — and the workflow fails silently, still runs, just returns garbage. CI that only checks "did it error" misses those; you need assertions on the shape/quality of the output, not just exit codes. Curious how FlowStacks handles non-deterministic outputs — snapshot expected structure, or score ag

  2. 1

    The trust asymmetry is the real product here: a third party re-verifying your setup reads completely differently than your own README, and most dev tools never build that wedge. My one worry is the badge itself. You fence off model judgment as non-CI, but that's exactly where AI workflows rot in practice (the config still parses while the output quietly degrades), so a green check can signal 'works' when the part users actually care about has drifted.

  3. 2

    The screenshots rot point is real. The part that usually breaks isn't the clever prompt, it's the dumb last mile: a flag changes, an env var disappears, the selected text is different, or the active app steals focus. I built DictaFlow, and we've run into the same thing with voice workflows. The workflow looks solid until one dependency or UI assumption shifts. A machine-checked recipe page is way more convincing than a launch-week demo gif.

    1. 1

      Hey Ryan,

      Glad that our concept resonated with you. This is the exact reason that motivated me to build this. I got burned too many times! :)
      On a separate note, do you have a workflow that involves DictaFlow? If so, please share. I would love to verify it, share it on our platform and add it to our GitHub Repo.
      Thanks again for your reply!

  4. 2

    I like this a lot — it feels like you’re solving a real problem people just accept.

    Most workflow content does rot, and no one really questions it. Your approach of re-running and showing when something breaks just makes sense.

    What stands out to me is that you’re not trying to overpromise. You’re only verifying the parts that can actually be checked, and leaving the “model thinking” outside of CI. That actually makes the whole thing more credible.

    Also the red/green badge idea is simple but powerful. It instantly answers the only question that matters: “does this still work?”

    If I’m being honest, I think the idea is strong — the only thing I’d tweak is how quickly it clicks for someone new. Once I get it, it’s obvious, but it takes a few seconds to fully land.

    Something like:
    screenshots show what worked once
    this shows what still works

    Overall though, this feels way more like infrastructure than content, which is probably why it stands out.

    1. 1

      Thank you for the comment, this is really useful. You are right that it took a beat too long to land, and your "screenshots show what worked once / this shows what still works" framing nailed the gap better than my own copy did.

      I just rewrote the homepage subhead to lead with exactly that, so the first thing a newcomer reads is the contrast, not the category.
      The "feels more like infrastructure than content" line is going on the wall, too. Genuinely grateful, this is the kind of feedback that changes the product.

  5. 2

    The distinction between verifying the deterministic parts of a workflow and deliberately not claiming to verify the model's reasoning is what stood out to me. A lot of AI products blur that line, but drawing a clear boundary around what can actually be tested makes the trust signal much stronger. The idea of turning "works today" into something continuously re-verified instead of a one-time screenshot also feels like a meaningful shift.

    1. 1

      Hey Aryan,

      Thanks for your kind reply, glad that it resonated with you. That boundary is the whole product, honestly. The moment you claim to verify the model's reasoning, you are back to selling vibes with a green checkmark on top, and everyone can feel it.
      I got burned by rotten workflows firsthand, and FlowStacks was born of that experience. If you have a workflow that you want to keep up and running, feel free to submit it. I will verify it and add it to FlowStacks.

      1. 1

        That's exactly what I was curious about.

        Reading your reply, I think there's one strategic business decision sitting underneath that boundary which becomes much more significant as Flowstacks grows, but I don't think I can do the reasoning behind it justice in a thread.

        Happy to explain what I mean if it's useful. What's the best email to reach you?

  6. 1

    The deterministic spine vs model judgment split is the right call and it mirrors what we see in business automation. Most breakages get blamed on the AI when it's actually the trigger config, the webhook endpoint, or the env variable that quietly changed. The model is fine. The pipe rusted. Separating those two failure modes is the only way to know which one to fix. What tooling do you use to detect config drift separately from output drift, or do you handle both through the same verification pass?

    1. 1

      Thanks for the comment! Yes, we handle both through the same verification process.

  7. 1

    Interesting idea! AI workflows can break over time as models and dependencies change. What kinds of issues do you find most often during verification?

    1. 1

      Three things break most often, in order:

      1. Output format drift. CLI tools change their stdout without a major version bump.

      2. Flag and config key renames. A model provider renames a parameter, a CLI tool drops a flag or moves it to a subcommand. If the shell step exits non-zero, the recipe aborts. This is the most common real breakage: the workflow is conceptually fine, but the invocation is wrong.

      3. Credential requirements are buried mid-recipe. A workflow looks like it has a deterministic spine, but step 3 silently needs a token to make a real round-trip.

  8. 1

    the "no green check can promise a model's judgment" framing is exactly right. we see the same pattern from the people side — in data we've collected (aisa.to/state-of-ai-fluency), most people's AI output verification is basically "does this sound right?" which isn't verification at all. curious how you handle when a recipe's structure passes but the underlying model has degraded

    1. 1

      A green check genuinely cannot catch a model quietly getting worse while still returning something valid-shaped.

      The badge only ever claims the deterministic spine (config parses, flag present, round-trip returns a known fact), and the model's judgment is fenced off and labeled as not checked. Where a recipe can assert a concrete golden output, drift does flip it red. Everything is date-stamped and version-pinned, so it reads as "verified on this date against this model," not forever.

      Continuous structural re-runs are what we do, and closing the model output quality gap without a trustworthy grader is still an open problem. I think with aisa you are trying to solve it. Did I understand it right?

  9. 1

    Interesting approach. I run a similar automated pipeline for an AI tools directory — n8n orchestrating discovery and content generation daily. Workflow rot is a real problem once you're not watching it manually. How are you detecting the rot — error rates, output drift, or something else?

  10. 1

    The “deterministic spine” framing is solid. One thing I’d love to see on each recipe page is the last replay log or fixture it passed against, not just the badge. That makes the green check feel less abstract and gives builders something concrete to debug when it goes red.

    1. 1

      Thanks for your reply! A lot of that already exists; it is just one click in.
      Hit "Run" on any verified recipe and the badge expands into the recorded replay: the actual transcript output plus a per-fixture list (each fixture id, pass/fail, and latency) pulled from the last CI run, with the date it last passed. The expected value for each fixture is shown above that even before you click.

  11. 1

    This is a genuinely interesting approach. Too many AI workflow tutorials become outdated within weeks, so having CI verify the deterministic parts of a workflow is a smart way to keep recipes trustworthy. I also like that you're explicit about what CI can and can't validate instead of overclaiming. Looking forward to seeing how the library grows and what kinds of workflows the community contributes.

    1. 1

      Thank you, that is the line we care most about holding. The temptation to slap a green check on the model step is real, and the moment you do it, the whole signal is worthless, so we would rather make a smaller true claim than a big fake one.
      The community contribution part is what I am most curious about, too.
      Open invite stands: if you have a workflow you actually run, send it, and we will verify it.
