
I'm building OrchestrAI — the reliability layer for automation workflows. Here's exactly where I am.

The problem I kept seeing

I've spent the last year working with automation agencies and founders running production workflows on n8n, Make, and similar tools. The same problem came up constantly: it's easy to build automations, but operating them reliably at scale is a different beast entirely.

Workflows fail silently. An AI summarisation node starts returning 40% shorter outputs — no error thrown, no alert fired. A webhook stops receiving data and three downstream automations quietly stop doing anything. A client's invoice workflow runs but processes 2 items instead of the usual 60.

Businesses don't notice the system failed. They notice when outcomes stop happening. Usually after a client complaint.

That's the specific gap OrchestrAI is built to close.


What I'm building

OrchestrAI is the orchestration and reliability layer that sits above your existing automation tools. I'm calling the category AutoOps — the same way DevOps was the operational layer that had to exist after cloud computing proliferated, AutoOps is the layer that has to exist now that every company is running dozens of automations.

MVP scope (4 features, hard locked):

  1. Workflow Health Dashboard — unified view of all n8n and Make workflows, health status, 7-day success rate trend, real-time updates
  2. Detection Engine — 5 rules: hard failure, execution silence, success rate drop, duration spike, output volume anomaly. The last two use a 7-day rolling baseline to catch silent degradation (see the sketch just after this list)
  3. Retry & Recovery Engine — auto-retries failed workflows up to 3 times, logs every attempt, flags for manual intervention when exhausted
  4. Dependency Graph — user-declared workflow relationships rendered as an interactive node graph, with downstream warning indicators when upstream nodes fail
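
To make rules 4 and 5 concrete, here's a rough sketch of the baseline check in TypeScript. The ExecutionSample shape, the 3x duration threshold, and the 50% volume threshold are illustrative placeholders, not the final engine:

```ts
// Sketch of the two baseline-driven rules (duration spike, output volume
// anomaly). Types, names, and thresholds are illustrative placeholders.
interface ExecutionSample {
  durationMs: number;
  outputItemCount: number;
}

interface Anomaly {
  rule: "duration_spike" | "output_volume_anomaly";
  observed: number;
  baseline: number;
}

const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Compare the latest execution against a 7-day rolling baseline.
function evaluateBaselineRules(
  latest: ExecutionSample,
  last7Days: ExecutionSample[],
): Anomaly[] {
  const anomalies: Anomaly[] = [];
  if (last7Days.length < 5) return anomalies; // not enough history yet

  const baseDuration = mean(last7Days.map((e) => e.durationMs));
  const baseVolume = mean(last7Days.map((e) => e.outputItemCount));

  // Rule 4: duration spike, e.g. 3x the rolling average.
  if (latest.durationMs > baseDuration * 3) {
    anomalies.push({ rule: "duration_spike", observed: latest.durationMs, baseline: baseDuration });
  }

  // Rule 5: output volume anomaly, e.g. under half the usual item count.
  if (latest.outputItemCount < baseVolume * 0.5) {
    anomalies.push({ rule: "output_volume_anomaly", observed: latest.outputItemCount, baseline: baseVolume });
  }
  return anomalies;
}
```

This is exactly how the "invoice workflow processed 2 items instead of 60" case gets caught: 2 is well under half the 7-day average, even though every node reported success.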

The technical stack

Frontend: Google Stitch (Firebase App Hosting) — the dependency graph screen is a standalone React + React Flow page hosted separately because Stitch can't render it natively.
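
For anyone curious what that standalone page amounts to, a minimal React Flow setup is enough to express the failure-propagation idea. The node data and styling below are placeholders:

```tsx
import ReactFlow, { Node, Edge } from "reactflow";
import "reactflow/dist/style.css";

// Placeholder nodes: an upstream failure renders a warning state downstream.
const nodes: Node[] = [
  { id: "leads", position: { x: 0, y: 0 }, data: { label: "Lead capture (failed)" }, style: { border: "2px solid #e5484d" } },
  { id: "crm", position: { x: 260, y: 0 }, data: { label: "CRM sync (at risk)" }, style: { border: "2px dashed #f5a524" } },
];
const edges: Edge[] = [{ id: "leads-crm", source: "leads", target: "crm", animated: true }];

export default function DependencyGraph() {
  return <ReactFlow nodes={nodes} edges={edges} fitView />;
}
```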

Backend: Node.js + Express on Cloud Run. Monitoring engine is a scheduled Cloud Function that polls connected n8n/Make instances every 5 minutes, evaluates detection rules, and queues alert delivery jobs via Cloud Tasks.
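
The handoff from detection to delivery is a single Cloud Tasks call per alert. Project, queue name, and endpoint below are placeholders; only the client API itself is real:

```ts
import { CloudTasksClient } from "@google-cloud/tasks";

const tasks = new CloudTasksClient();

// Enqueue one alert-delivery job as an HTTP task. Cloud Tasks will POST the
// payload to the delivery service with built-in retry semantics.
async function enqueueAlert(alert: { workflowId: string; rule: string }) {
  const parent = tasks.queuePath("my-project", "us-central1", "alert-delivery"); // placeholders
  await tasks.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: "POST",
        url: "https://alerts-service.a.run.app/deliver", // hypothetical Cloud Run endpoint
        headers: { "Content-Type": "application/json" },
        body: Buffer.from(JSON.stringify(alert)).toString("base64"),
      },
    },
  });
}
```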

Database: Firestore. 6 collections: users, integrations, workflows, executions, alerts, dependencies.
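
One reason composite indexes show up in the build plan: the baseline read filters executions on two fields plus an order-by, which Firestore won't serve from single-field indexes. Field names here are assumed from the collection list above:

```ts
import { getFirestore, Timestamp } from "firebase-admin/firestore";

// Fetch the last 7 days of executions for one workflow. This query needs a
// composite index on (workflowId ASC, startedAt DESC); field names assumed.
async function fetchBaselineWindow(workflowId: string) {
  const cutoff = Timestamp.fromMillis(Date.now() - 7 * 24 * 60 * 60 * 1000);
  const snap = await getFirestore()
    .collection("executions")
    .where("workflowId", "==", workflowId)
    .where("startedAt", ">=", cutoff)
    .orderBy("startedAt", "desc")
    .get();
  return snap.docs.map((doc) => doc.data());
}
```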

Auth: Firebase Authentication (email + Google).

First integration: n8n (their REST API gives us execution history, workflow list, and the ability to trigger retries). Make.com is second. Zapier is explicitly out of scope for MVP — their API is too closed.
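
For the retry engine, the contract is simple: up to 3 attempts with backoff, log each one, flag for a human when exhausted. A rough shape, with the caveat that the exact retry endpoint varies by n8n version and should be checked against their API docs:

```ts
// Rough shape of the retry loop. The 3-attempt cap and the flag-for-manual-
// intervention step are from the MVP scope; the endpoint path and backoff
// schedule are assumptions, not n8n's documented API.
const MAX_RETRIES = 3;

async function retryExecution(baseUrl: string, apiKey: string, executionId: string) {
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    // Hypothetical retry endpoint; verify against your n8n version's API reference.
    const res = await fetch(`${baseUrl}/api/v1/executions/${executionId}/retry`, {
      method: "POST",
      headers: { "X-N8N-API-KEY": apiKey },
    });
    console.log(`retry attempt ${attempt} for execution ${executionId}: HTTP ${res.status}`);
    if (res.ok) return { recovered: true, attempts: attempt };
    await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1))); // 1s, 2s, 4s backoff
  }
  return { recovered: false, attempts: MAX_RETRIES }; // flag for manual intervention
}
```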


Where I am right now

  • Problem validated with automation builders and agency owners
  • Full build-ready PRD written — data model, API endpoint reference, Firestore security rules, composite indexes, detection logic, sprint task breakdown
  • All 7 screens designed (dark mode, interactive mockups)
  • Google Stitch prompt guide written for each screen
  • Joined a pre-accelerator program
  • Waitlist live: orchestraii.carrd.co
  • Actively looking for a technical co-founder (Node.js + Firebase)
  • Entering MVP build phase this week

What I need now: 20 beta testers

Specifically looking for:

  • Automation agencies running client workflows on n8n or Make
  • Founders running production automation stacks
  • Anyone who has experienced a silent workflow failure and had to find out from a client

Beta users get free access for the full beta period and direct input on the roadmap, and I'll personally onboard each one.

If that's you, drop a comment or reach out directly. Would also love feedback from anyone who has built in this space — what am I missing, what would make you actually pay for this?

Posted to AI Tools on April 8, 2026
  1.

    Hi Nwachi,

    Really enjoyed your post about OrchestrAI. Defining the "AutoOps" category is a sharp move, and solving silent failures is a massive pain point.

    I saw you are entering the MVP phase and actively looking for a technical co-founder. I'm a full-stack developer heavily focused on Node.js and building out robust backends. The stack you've laid out (Node.js + Express on Cloud Run, Firestore) aligns perfectly with the architecture I like to build.

    I'd love to connect, hear more about the dependency graph challenges, and see if there's a mutual fit for the technical side of things. Let me know if you're open to a quick chat this week.

    1.

      Really appreciate the thoughtful message — and glad the OrchestrAI post resonated.

      The silent failure problem is exactly the part I think most teams underestimate until workflows start scaling and trust starts breaking down.

      It’s great to hear the stack and architecture direction align with how you like to build as well. At this stage, I’m being very intentional about finding someone who thinks beyond implementation — architecture, product decisions, and long-term ownership matter a lot.

      I’d definitely be open to a quick chat to explore fit and exchange ideas around the dependency graph challenges and the broader technical direction.

      Let me know what your availability looks like this week.

      Looking forward to it.

  2.

    Defining the AutoOps category is a sharp move, Nwachi. You've identified that the real cost of automation isn't the build — it's the silent failure that erodes client trust. Moving beyond hard errors to "output volume anomalies" is the exact reliability layer needed for professional-grade workflows.

    I'm currently running Tokyo Lore, a project that highlights high-utility infrastructure logic and builders who solve "scale" problems. Since you are building the definitive reliability layer for automation agencies, entering OrchestrAI into Tokyo Lore could be the perfect way to turn your "AutoOps" framework into a winning case study while your odds are at their absolute peak.

    ⚙️ AutoOps & Automation Reliability Leads
      • The Silent Failure Audit: A cautionary tale from a founder who needs your "execution silence" rule right now.
      • Baseline Monitoring Strategies: A technical thread on the duration and volume anomalies you mentioned.
      • n8n vs. Make Scalability: A deep dive into the operational overhead of the exact stacks you're integrating with.
    💡 Key Anchor: Reliability is a feature, not just a status code.

    1.

      Really like how you’re thinking about this — especially the decision to validate with a toolkit before forcing a full SaaS.
      I’m building in the automation/orchestration space too (more around operational decision workflows), so I end up thinking about similar tradeoffs between flexibility vs convenience.
      Would be great to connect and exchange notes as you keep validating this.

      1.

        Appreciate that — yeah, the toolkit-first approach felt like the fastest way to validate real pain before locking into a full SaaS.

        Interesting that you’re in the orchestration space too — the flexibility vs convenience tradeoff keeps coming up everywhere.

        Would definitely be good to compare notes — especially around what signals you’re seeing from users so far.

        1.

          Exactly — I’ve found that once you move closer to operational workflows, users care less about “features” and more about trust, reliability, and whether the system can handle messy real-world conditions.

          A lot of the strongest signals I’m seeing are around silent failures — automations technically running, but decisions breaking because context is missing or ownership becomes unclear. That trust gap is what pushed me toward OrchestrAI.

          I’m still early in validation, but the consistent pattern is that teams don’t just want automation — they want confidence in how decisions are made and visibility when things drift.

          Would be great to compare notes on what you’re seeing as well. Happy to jump on a quick chat this week if that works for you.

          1.

            Yeah, that lines up a lot — especially the shift from “is it running?” → “is it still correct?”
            Silent failures are scary because they look like success on the surface. By the time someone notices, the damage is already done.
            Your point on “confidence + visibility” is key — feels like:
            → logs tell you what happened
            → but what teams need is why it drifted
            If you can surface that clearly, that’s a real moat.
            Would definitely be great to compare notes — especially across different workflows (agencies vs internal ops).
            Also, on Tokyo Lore — this is exactly the kind of system we like testing, because it’s not about features, it’s about whether people actually trust and rely on it over time.
            We usually run it with a small builder group and look at:
            → where things break in real workflows
            → what signals actually matter to users
            → what makes them trust (or stop trusting) the system
            Since you’re early but already seeing real patterns, this could be a strong fit.
            Happy to share details if you’re open 👍
