I'm building a small reliability layer for LLM outputs, and I've been talking to engineers who run AI systems in production.
One thing I've learned this week is that AI reliability problems fall into several distinct layers:
Structural failures
Logical failures
Agent/runtime failures
My MVP currently focuses on the first layer (structure validation), but those conversations suggest that many production issues happen after outputs have already passed schema checks.
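To make the gap between the first two layers concrete, here is a minimal Python sketch. It is not taken from any particular tool, and the schema, field names, and refund rule are hypothetical. It shows an output that passes a structural check but still violates a business rule:

```python
import json

# Hypothetical schema: required field names mapped to expected Python types.
# Floats only, for brevity (a JSON integer like 120 would fail the float check).
SCHEMA = {"order_id": str, "refund_amount": float, "order_total": float}

def validate_structure(raw: str) -> list[str]:
    """Layer 1: is the output parseable JSON with the expected fields and types?"""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for field, expected in SCHEMA.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

def check_business_rules(data: dict) -> list[str]:
    """Layer 2 (hypothetical rule): a refund must be non-negative
    and must not exceed the order total."""
    errors = []
    if data["refund_amount"] < 0:
        errors.append("refund_amount is negative")
    if data["refund_amount"] > data["order_total"]:
        errors.append("refund_amount exceeds order_total")
    return errors

# Well-formed output that is still wrong: the refund is larger than the order.
raw = '{"order_id": "A-17", "refund_amount": 250.0, "order_total": 120.0}'
print(validate_structure(raw))                # []
print(check_business_rules(json.loads(raw)))  # ['refund_amount exceeds order_total']
```

A schema validator alone would accept this output; only a rule that knows what the fields mean catches the error.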
For those building AI products:
What's the most common failure mode you've seen in production?
Schema issues?
Business-rule violations?
Hallucinations?
Context problems?
Something else?