
After 40+ comments on AI agent failures, one pattern is clear: the model is not the main problem

A few days ago, I posted that most AI agent failures are actually workflow failures.

I expected some disagreement.

Instead, builders from very different areas started saying the same thing:

  • customer support agents
  • DeFi support agents
  • WordPress AI tools
  • voice AI
  • ETL/data pipelines
  • sprint planning agents
  • real estate workflows
  • multi-agent chains
  • court recording systems

Different industries.

Same failure pattern.

The AI model was usually not the main problem.

The surrounding system was.

What kept showing up

Across the comments, the same problems appeared again and again:

  • unclear decision boundaries
  • hidden business rules
  • missing escalation paths
  • weak permission boundaries
  • no traceability
  • no provenance
  • unclear memory/context scope
  • handoff failures between agents
  • confident wrongness
  • poor visibility into why the AI did something

One comment described it perfectly:

The AI did not create the ambiguity.

It exposed it.

The hidden human layer

A lot of teams think they are automating a clean workflow.

But what they are actually automating is:

  • undocumented judgment
  • tacit heuristics
  • social context
  • exceptions
  • invisible escalation behavior
  • “everyone knows this” logic

Humans silently compensate for all of that.

AI agents do not.

So when the AI fails, it often looks like model unreliability.

But in reality, the model was the first participant forced to operate only from the written system instead of the implied one.

Confident wrongness is the dangerous failure mode

Visible failures are usually survivable.

If the AI says “I don’t know,” escalates, asks for clarification, or refuses to act, humans can intervene.

The dangerous case is confident wrongness.

That is when the agent:

  • sounds correct
  • responds quickly
  • satisfies the user in the moment
  • but violates the actual business rule underneath

Examples people shared:

  • promising a refund it cannot authorize
  • treating a high-value customer like a free-trial user
  • using fallback data as if it were real data
  • reading one agent’s output as instruction instead of context
  • inventing certainty because the UI wanted a clean number

That is not always a model intelligence problem.

It is often a workflow governance problem.
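To make that concrete, here is a minimal sketch of the kind of permission gate that turns confident wrongness into a visible escalation. Everything in it is hypothetical: the names (`can_authorize`, `EscalationNeeded`) and the $50 limit are made up for illustration, not part of any real API.

```python
# Hypothetical sketch: an explicit permission gate in front of an agent action.
# The role name and the $50 refund limit are assumptions, not real rules.

class EscalationNeeded(Exception):
    """Raised when the agent must hand off to a human instead of acting."""

REFUND_LIMIT_USD = 50.00  # the kind of rule that usually lives only in someone's head

def can_authorize(agent_role: str, amount: float) -> bool:
    # An explicit rule, instead of letting the model improvise one.
    return agent_role == "support_agent" and amount <= REFUND_LIMIT_USD

def issue_refund(agent_role: str, amount: float) -> str:
    if not can_authorize(agent_role, amount):
        # The safe failure mode: visible, reviewable, interruptible.
        raise EscalationNeeded(f"Refund of ${amount:.2f} exceeds agent authority")
    return f"Refund of ${amount:.2f} approved"
```

The point is not the code. The point is that the rule lives in the system rather than in the prompt, so the agent cannot sound its way past it.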

What production AI actually needs

The thread made one thing clear to me:

Production AI needs more than better prompts.

It needs runtime structure around the model:

  • policy boundaries
  • memory scope
  • role and identity control
  • permission checks
  • escalation paths
  • trace IDs
  • provenance
  • audit logs
  • fallback behavior
  • uncertainty visibility
  • reviewable decisions

These are not just “AI safety features.”

They are operational trust primitives.
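To show what a few of those primitives might look like in practice, here is a rough sketch of a governed wrapper around a model call. Everything here is illustrative: `call_model` is a stand-in for a real provider call, and the policy fields and log format are assumptions, not a real library.

```python
import json
import time
import uuid

def call_model(prompt: str) -> str:
    # Placeholder for a real provider call (OpenAI, Anthropic, etc.).
    return "..."

def governed_call(prompt: str, policy: dict, audit_log: list) -> dict:
    trace_id = str(uuid.uuid4())  # traceability: every response gets an ID

    # Policy boundary: refuse and escalate instead of guessing.
    if len(prompt) > policy["max_prompt_chars"]:
        return {"trace_id": trace_id, "escalate": True,
                "reason": "prompt exceeds policy limit"}

    answer = call_model(prompt)

    # Audit log + provenance: enough to review the decision later.
    record = {
        "trace_id": trace_id,
        "ts": time.time(),
        "policy": policy["name"],
        "prompt": prompt,
        "answer": answer,
    }
    audit_log.append(json.dumps(record))
    return {"trace_id": trace_id, "escalate": False, "answer": answer}
```

Even a toy version like this changes the failure mode: a bad call now leaves a trace you can review instead of a vibe you have to reconstruct.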

What NEES Core Engine is trying to solve

This is the problem I am building toward with NEES Core Engine.

NEES Core Engine is a governed AI runtime layer for production AI apps.

The idea is simple:

User → App → NEES Core Engine → Model Provider → Governed Response

Instead of sending a prompt directly to a model and hoping for the best, NEES adds a governance layer around the AI call.

It is designed to help answer questions like:

  • what was the AI allowed to do?
  • what mode was active?
  • what memory scope applied?
  • what policy boundary was used?
  • when should escalation happen?
  • what trace ID belongs to this response?
  • how can the response be reviewed later?

The goal is not to replace the model.

The goal is to make the system around the model more controllable, inspectable, and production-ready.
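One way to picture the output of such a layer is a response envelope that carries the answers to those questions alongside the model's text. This shape is my sketch of the idea, not the actual NEES Core Engine schema.

```python
from dataclasses import dataclass

@dataclass
class GovernedResponse:
    # Illustrative fields only; not the real NEES Core Engine schema.
    text: str           # the model's answer
    trace_id: str       # which request this response belongs to
    mode: str           # e.g. "support" or "draft-only"
    memory_scope: str   # what context the model was allowed to see
    policy: str         # which policy boundary was applied
    escalated: bool     # whether a human needs to take over
```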

Governance is also diagnostic

One of the best insights from the discussion was this:

Governance is not only protection.

It is diagnostic.

When you force an AI agent to operate inside explicit rules, you discover which parts of the workflow were never actually defined.

That means a governance layer does two things:

  1. It helps constrain AI behavior.
  2. It exposes workflow gaps before they become production failures.

That second part may be just as valuable as the first.
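Here is a tiny sketch of that diagnostic effect: a policy lookup that refuses to guess when no explicit rule covers a case. The `RULES` table is hypothetical; in a real system it would come from the business. Every `UndefinedWorkflow` error is a workflow gap surfaced before it becomes a production failure.

```python
class UndefinedWorkflow(Exception):
    """No explicit rule covers this case; nobody ever wrote one down."""

# Hypothetical rules table, for illustration only.
RULES = {
    ("refund", "trial_user"): "deny",
    ("refund", "paid_user"): "allow_up_to_limit",
}

def decide(action: str, user_tier: str) -> str:
    try:
        return RULES[(action, user_tier)]
    except KeyError:
        # Instead of letting the model improvise, surface the gap.
        raise UndefinedWorkflow(f"No rule defined for {action!r} / {user_tier!r}")
```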

Developer preview

I recently opened a developer preview repo for NEES Core Engine:

https://github.com/NEES-Anna/nees-core-developer-preview

It includes:

  • Python quickstart
  • Node.js quickstart
  • cURL examples
  • API reference
  • governance flow docs
  • 15-minute integration guide
  • API key request template
  • developer feedback template

There is also a live sample app connected to the governed runtime:

https://naina.nees.cloud

What I am looking for

I am looking for feedback from builders working with real AI workflows.

Especially around:

  • traceability
  • escalation
  • memory boundaries
  • runtime governance
  • AI auditability
  • workflow control
  • production failure modes

My current belief:

Production AI is not just a model problem.

It is workflow design + governance + observability around a model.

Curious if others are seeing the same thing.

When your AI system fails, what usually broke first?

The model?

Or the system around it?

Posted to Startups on May 11, 2026