Building AI Safety from the Decision Layer: Why Routing, Not Detection, Determines System Integrity

Most conversations about trust and safety in AI systems still orbit around detection: Did the system identify the violation? Did the classifier flag the content? Did the alert fire? These are important questions, but they do not capture the whole story, and they miss the point where production systems frequently fail.

In real-world, large-scale AI deployments, harm does not come only from undetected issues; it can also originate further downstream, in what the system does after detection. This gap is reflected in the fact that fewer than 10% of organizations report scaling agent-based AI beyond pilots and limited deployments.

Sanyam Mehra, a Senior IEEE member and author, has focused on building, deploying and advancing AI systems where detection is essential and the accuracy and low latency of decisions carry immediate consequences. For over a decade, he has worked in environments where automation frequently operates without the buffer of manual review, spanning advanced AI research, large-scale software and hardware systems, and strategy consulting. His work underpins the safety and security of users across global consumer platforms, where systems must perform accurately and reliably at scale with minimal error.

“The violation is almost never the root cause of failure,” Sanyam Mehra says. “The failure is often in what the system decides to do next.”

That distinction separates safety in theory from safety operationalized in real-world production systems.

From Detection Success to Routing Failure

Most large-scale user safety systems detect more violations than they can accurately act on. Severe and unambiguous threats are caught with high precision, yet user risk remains. The reason is that detection is only the first step; the architecture that determines the subsequent protective action is what defines real-world safety.

In the systems Sanyam has worked on, automated safety and integrity decisions affect distribution, account status, and downstream access across interconnected products. When routing logic fails to distinguish between violations of different severity, low-prevalence, high-severity harms can follow pathways built for routine infractions. The detection signal may be accurate while the resulting response is inadequate.

These challenges emerge only at scale. Systems processing billions of interactions daily cannot depend on human-in-the-loop oversight without creating untenable backlogs. Once integrity pathways are automated, safety and reliability hinge on how those pathways are designed, prioritized, and bounded.

The industry’s blind spot is often subtle. Teams celebrate improvements in detection recall, assuming an appropriate response will follow by default. In practice, routing logic frequently becomes the weakest link, where the overwhelming volume of low-impact cases can obscure the urgency and risk of the most critical ones.
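The volume problem described above can be sketched with a priority queue. This is a minimal illustration, not the routing used in any system the article describes: the integer severity scale, the case identifiers, and the `enqueue`/`dequeue` helpers are all hypothetical. In a first-in, first-out queue the severe case below would wait behind a thousand routine ones; ordering by severity lets it surface first.

```python
import heapq
from itertools import count

# Tie-breaker so cases of equal severity keep their arrival order.
_arrival = count()


def enqueue(queue: list, severity: int, case_id: str) -> None:
    """Add a case; a higher severity number means a more serious case (assumed scale)."""
    # heapq is a min-heap, so severity is negated to pop the highest first.
    heapq.heappush(queue, (-severity, next(_arrival), case_id))


def dequeue(queue: list) -> str:
    """Remove and return the id of the most severe, earliest-arrived case."""
    return heapq.heappop(queue)[2]


queue: list = []
for i in range(1000):
    enqueue(queue, 1, f"routine-{i}")   # high-volume, low-impact cases
enqueue(queue, 4, "severe-1")           # a single critical case, arriving last

first = dequeue(queue)    # the severe case is handled before any routine one
second = dequeue(queue)   # then routine cases resume in arrival order
```

The design choice here is only the ordering key. Real systems would also need aging or quotas so that routine cases are not starved, which this sketch does not attempt.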

Why Severity Breaks Static Safety Systems

Static rule systems are effective when the frequency of violating events correlates with their impact, yet they fail to adapt when low-frequency events can have disproportionately severe outcomes.

Sanyam’s work has extensively dealt with low-prevalence incidents carrying extreme risk: sexual exploitation, graphic violence, scams, abuse, and child safety threats. A low-confidence signal routed aggressively can cause harm. The same signal, routed cautiously, can prevent it.

The redesign focused on the decision framework itself. Enforcement was modeled as a function of three variables: detection confidence, assessed severity, and user context. This introduced a natural proportionality: higher-stakes decisions followed narrower, more auditable pathways. The objective was to contain clear harms swiftly with precise actions while applying greater procedural depth to complex, nuanced cases. This traceable, tiered approach offers a model for scaling safety effectively, aiming to reduce the prevalence of harmful content by aligning system design with user protection.
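A minimal sketch of what "enforcement as a function of confidence, severity, and user context" could look like in code follows. Everything specific in it is an assumption made for illustration: the severity tiers, the numeric thresholds, the pathway names, and the use of a single `repeat_offender` flag as a stand-in for user context. The article does not publish the actual decision logic.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    EXTREME = 4


@dataclass(frozen=True)
class Signal:
    confidence: float       # detector confidence in [0, 1]
    severity: Severity      # assessed severity of the suspected harm
    repeat_offender: bool   # hypothetical stand-in for "user context"


def route(signal: Signal) -> str:
    """Return an illustrative pathway name for one detection signal."""
    if not 0.0 <= signal.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")

    # Highest stakes: act automatically only when very confident,
    # otherwise hand off to a reviewer instead of guessing.
    if signal.severity == Severity.EXTREME:
        if signal.confidence >= 0.9:
            return "contain_and_audit"
        return "priority_human_review"

    # High severity: user context decides how cautious the fallback is.
    if signal.severity == Severity.HIGH:
        if signal.confidence >= 0.8:
            return "contain_and_audit"
        if signal.repeat_offender:
            return "priority_human_review"
        return "limit_distribution"

    # Routine tiers: automated action when confident, otherwise just observe.
    if signal.confidence >= 0.7:
        return "automated_action"
    return "monitor"


decision = route(Signal(confidence=0.5, severity=Severity.EXTREME, repeat_offender=False))
```

In this sketch a low-confidence, extreme-severity signal is routed to review rather than to an aggressive automated action, which is one way to express the proportionality described above.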

This was not a policy change. It was an architectural one.

The result was measurable. Architecture-level controls drove a 45% reduction in the prevalence of violations, while safety action precision and recall improved significantly across severe categories. The gains did not come from adding reviewers or rewriting guidelines. They came from encoding restraint into the decision layer.

Privacy constraints were treated as first-order requirements. Safety and integrity signals were processed through pipelines designed to minimize exposure while preserving accountability. Safety that depends on unrestricted data access does not survive scale.

Oversight Load Is the True Scaling Limit

Automation increases action volume. Oversight capacity does not increase at the same rate. This imbalance is where many safety systems fail quietly.

A common response is to expand stewardship teams. Sanyam, who also serves as an editorial board member and peer reviewer at the ESP International Journal of Advancements in Computational Technology, suggests an alternative drawn from his experience. If a system requires constant human intervention to remain safe, it is not ready for responsibility. Industry data reflects this limitation: nearly 90% of enterprises report experimenting with artificial intelligence, but only one-third successfully deploy AI at scale. Human judgment should be reserved for irreducible ambiguity, not predictable system behavior.

The project introduced tiered escalation paths that separated routine safety actions from extreme-risk cases. Automated containment handled the majority of decisions. Human involvement was triggered only when the consequences exceeded what the system was designed to resolve on its own.

This design reduced oversight pressure while improving reliability. It also produced a secondary effect that matters in production environments: efficiency. Model distillation and system optimization within the safety and integrity pipeline delivered approximately $8 million in annual operating cost savings while maintaining safety standards.
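The tiered escalation idea can be sketched as a small function that escalates only above a fixed ceiling, plus a helper that measures how much of the traffic reaches a reviewer. The tier numbers, the `AUTO_RESOLVE_MAX_TIER` ceiling, the outcome names, and the sample traffic mix are hypothetical and chosen only to make the arithmetic visible.

```python
from collections import Counter

# Hypothetical ceiling: the highest tier the automated path may resolve alone.
AUTO_RESOLVE_MAX_TIER = 2


def handle(tier: int, confident: bool) -> str:
    """Route one case to an automated path or to human escalation."""
    if tier <= AUTO_RESOLVE_MAX_TIER:
        # Routine tiers never consume reviewer time; uncertain ones
        # get a reversible action instead.
        return "automated_containment" if confident else "automated_soft_action"
    return "human_escalation"


def oversight_load(cases) -> float:
    """Fraction of (tier, confident) cases that reach a human reviewer."""
    outcomes = Counter(handle(tier, confident) for tier, confident in cases)
    total = sum(outcomes.values())
    return outcomes["human_escalation"] / total if total else 0.0


# Illustrative traffic mix, not measured data.
cases = [(1, True)] * 90 + [(2, False)] * 8 + [(4, True)] * 2
load = oversight_load(cases)   # 2 of 100 cases are escalated
```

Tracking a figure like `load` per tier is one way to notice when automation is pushing more decisions to reviewers than the oversight team can absorb.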

The savings served not merely as a goal but also as evidence that safety and efficiency need not be in conflict when engineering systems are properly designed.

“A safe system is one that limits how often it invokes human intervention,” Sanyam notes, “not one that relies on it.”

The Preventative Layer

AI safety is often framed as something applied after decisions are made. In production systems, however, safety is determined before an action is ever allowed to proceed.

Sanyam’s perspective is shaped by building systems that operate in adversarial settings, respond reliably in crises, and meet regulatory requirements without the luxury of delay. A recurring lesson emerges across these environments: detection alone does not protect users; routing and decisioning are equally important. Decisions about where signals flow, when actions are permitted, and how uncertainty is handled matter more than post hoc analysis.

This philosophy is also reflected in his scholarly work, Corporate Strategy for Secure Semiconductor Supply Chains: ML-Driven Risk and Market Intelligence, which examines how machine learning systems must embed risk awareness directly into strategic and operational decision layers rather than treating security as an external control.

As AI systems take on greater responsibility, integrity will be defined by how decision layers behave when stakes are high and signals are imperfect. That view is reinforced by his role as a Judge at the Business Intelligence Awards, where evaluating real-world systems requires balancing performance with risk, traceability, and operational impact. The teams that succeed will be those that engineer restraint, transparency, and consequence-aware response into their systems from the start.

That is where safety truly lives.


Agree — routing is where systems quietly fail. We’ve seen good detectors create bad outcomes when low‑severity flows share the same path as high‑risk cases. A simple severity‑aware routing matrix + explicit escalation budgets per risk tier helped a lot. Do you publish any example decision trees / schemas teams can copy?

easy_ai · 4 months ago