When AI Agents Fail Silently: A Conversation with Moda’s Co-Founders Pranav Bedi and Mohammed Al-Rasheed

AI agents are moving from demos into real customer and operational workflows, and a new problem is emerging: they do not fail like traditional software. A normal system fails loudly, with an error, exception, timeout, or crash. An agent can fail while still looking useful. It can take the wrong path, use the wrong context, make the wrong decision, or say the task is done when the underlying work never happened.

That is the problem Moda is built to solve. Moda is the continual learning layer for AI agents. It turns production agent traces into validated improvements for your agent harness. Production conversations go in, and continually improving agents come out. The idea is simple: every trace should make your agents better.

Moda runs a single loop in production: diagnose where agent runs break, generate concrete harness improvements, and validate those fixes against historical traces before your team ships. No model retraining required.

At the center of this work are Pranav Bedi, Co-Founder and CEO of Moda, and Mohammed Al-Rasheed, Co-Founder and CTO. Bedi and Al-Rasheed lead product together, with Al-Rasheed driving the core technical systems behind Moda’s trace analysis, clustering, replay, and validation.

We sat down with Bedi and Al-Rasheed to discuss why production agents fail quietly, why observability alone is not enough, and how the trace-to-fix loop turns real conversations into better prompts, tools, workflows, and evals.

Let’s start with the phrase you lead with: “the continual learning layer for AI agents.” What does that actually mean?
Pranav Bedi: Thank you for having us. It means agents should get better from their own production traffic, automatically. Today most teams ship an agent, watch usage climb, and treat launch as the finish line. But agent quality is not static. Every real conversation contains a signal about what worked and what broke, and almost none of that gets turned into an improvement.

Moda sits on top of your agent as that learning layer. Production conversations go in, and continually improving agents come out. We turn traces into validated fixes for the harness (the prompts, tools, workflows, memory, and evals) so the agent keeps getting better without anyone reading transcripts by hand.

Mohammed Al-Rasheed: And it works without retraining the model. The failure mode for agents is behavioral, not a crash. The agent may respond, but still use the wrong context, skip a needed action, get stuck in a loop, or make a decision that seems reasonable but breaks the workflow. Traditional software gives you a stack trace. Agents give you a conversation.

So the trace becomes the source of truth. You need the user message, the agent’s memory and state, the tools it considered, the tools it called, and the outcome together. That is the raw material we learn from.
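As a rough illustration of what keeping those pieces "together" could look like, here is a minimal Python sketch of a trace record. The class names, field names, and the `claimed_done_without_work` check are hypothetical and chosen for this example; they are not Moda’s actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]
    succeeded: bool


@dataclass
class Trace:
    """One agent run, kept together so it can be analyzed as a unit (hypothetical schema)."""
    user_message: str
    agent_state: dict[str, Any]      # memory and state visible to the agent
    tools_considered: list[str]      # tools the agent weighed
    tools_called: list[ToolCall] = field(default_factory=list)
    outcome: str = "unknown"         # e.g. "completed", "abandoned"

    def claimed_done_without_work(self) -> bool:
        """Flag a silent failure: the run reports completion but no tool call succeeded."""
        return self.outcome == "completed" and not any(
            call.succeeded for call in self.tools_called
        )


trace = Trace(
    user_message="Cancel my subscription",
    agent_state={"plan": "pro"},
    tools_considered=["cancel_subscription", "lookup_account"],
    tools_called=[ToolCall("cancel_subscription", {"user_id": "u1"}, succeeded=False)],
    outcome="completed",
)
print(trace.claimed_done_without_work())  # True
```

The check at the end captures the case described earlier, where an agent says the task is done when the underlying work never happened. No exception is raised anywhere, so only the combined record reveals the problem.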

Teams already have tracing and observability tools. Why isn’t that enough?
Mohammed Al-Rasheed: Tracing and observability tools are built around system health and visibility. They are good at latency, errors, logs, and spans, and the popular tracing platforms do that well. But they stop at showing you what happened. They do not tell you what to change, and they do not verify a fix.

For agents, the root cause could be the prompt, tools, workflow, memory, model, or product logic. Moda is built to attribute each failure to the right source and then close the loop: diagnose where the run broke, generate a specific improvement, and validate it against your historical traces before anyone ships. That is the difference between a dashboard and a learning layer.

Pranav Bedi: From the company side, the difference is trust. Users do not care that an agent is technically responding if it is not finishing the job. So we do not position Moda as another analytics or tracing tool. We help teams prioritize the fixes most likely to improve the agent, based on real production behavior instead of internal guesses. The companies that win with agents will not just be the ones that launch fastest. They will be the ones that learn fastest.

Walk us through the loop. What actually happens between a broken trace and a validated fix?
Pranav Bedi: Three steps. First, diagnose: Moda analyzes your agent logs to find where each run broke, why it failed, and whether the issue came from the prompt, tools, workflow, memory, model, or product logic. Second, generate: it turns repeated failure patterns into specific fixes your team can review, such as prompt changes, tool updates, workflow edits, verifier gates, eval cases, and reusable skills. Third, validate: it tests those proposed fixes against your historical production traces, measures impact and regressions, and recommends the change most likely to improve the agent.

The key is that the output is never a vague chart. It is a concrete change with evidence behind it.
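To make the step from "diagnose" to "repeated failure patterns" concrete, the sketch below counts diagnosed failures by source and pattern and keeps only those that recur. The sample data, the `rank_patterns` helper, and the `min_count` threshold are assumptions for illustration, not a description of how Moda ranks issues.

```python
from collections import Counter

# Hypothetical diagnosed failures as (source, pattern) pairs.
diagnosed = [
    ("tools", "schema mismatch on create_ticket"),
    ("prompt", "skips refund policy check"),
    ("tools", "schema mismatch on create_ticket"),
    ("workflow", "retry loop without escalation"),
    ("tools", "schema mismatch on create_ticket"),
]


def rank_patterns(failures, min_count=2):
    """Return (source, pattern) pairs seen at least min_count times, most frequent first."""
    counts = Counter(failures)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


ranked = rank_patterns(diagnosed)
print(ranked)  # [(('tools', 'schema mismatch on create_ticket'), 3)]
```

Filtering out one-off failures is a design choice in this sketch: a pattern seen once may be noise, while one seen repeatedly is a stronger candidate for a fix that the team reviews.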

Mohammed Al-Rasheed: And to make that loop reliable, you have to respect structure. A long conversation is rarely one clean task. A user can start with onboarding, switch to a product question, hit an error, and then ask for a workaround. If you treat the whole conversation as one object, you lose the signal.

So we break conversations into coherent parts, understand each part in context, and compare it against similar behavior across production. That lets Moda learn from real patterns: a user correction that reveals the canonical answer, a tool call that broke because of schema drift, a workflow loop that should have escalated, or an emerging intent with no handler yet. Each becomes a candidate fix mapped back to the harness.
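One simple way to picture breaking a conversation into coherent parts is to group consecutive turns that share an intent. The sketch below assumes each turn already carries an `intent` label (in practice that label would have to come from some classifier); the field names and the `segment` helper are hypothetical.

```python
from itertools import groupby

# Hypothetical conversation: each turn already has an intent label.
turns = [
    {"intent": "onboarding", "text": "How do I invite my team?"},
    {"intent": "onboarding", "text": "Where do I set roles?"},
    {"intent": "product_question", "text": "Does the API support webhooks?"},
    {"intent": "error", "text": "I get a 500 when saving."},
    {"intent": "workaround", "text": "Is there another way to save?"},
]


def segment(turns):
    """Group consecutive turns with the same intent into one segment."""
    return [
        {"intent": intent, "turns": [t["text"] for t in group]}
        for intent, group in groupby(turns, key=lambda t: t["intent"])
    ]


segments = segment(turns)
for s in segments:
    print(s["intent"], len(s["turns"]))
```

Here one conversation becomes four segments (onboarding, product question, error, workaround), each of which can be compared against similar segments from other conversations instead of against whole transcripts.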

Mohammed, what was the hardest part of making that work at production scale?
Mohammed Al-Rasheed: Making traces useful, not just storing them. Production agents generate a flood of data: messages, tool calls, intermediate state, retrieved context, decisions, and outcomes. The challenge is turning that into an accurate picture of what the user was trying to do and where the agent fell off track.

The naive approach is to cluster whole conversations, but that creates vague buckets that are hard to act on. We built around segmentation, summaries, embeddings, and replay so the system groups meaningful failure patterns and attributes each one to its source. Then it replays candidate fixes against historical traces to measure impact and catch regressions. The loop only matters if the fix is validated before it ships.
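The replay idea can be sketched in a few lines: run the current agent and a candidate fix over the same historical inputs, then count what the fix repairs and what it breaks. Everything here is a simplifying assumption for illustration, including modeling an agent as a plain function, the toy routing rule, and the `replay` helper; real replay would involve full traces and tool calls.

```python
from typing import Callable

# An agent is modeled as a function from user message to a routing label (toy assumption).
Agent = Callable[[str], str]


def baseline(message: str) -> str:
    # Current behavior: case-sensitive keyword match.
    return "billing" if "refund" in message else "general"


def candidate(message: str) -> str:
    # Proposed fix: case-insensitive keyword match.
    return "billing" if "refund" in message.lower() else "general"


def replay(old: Agent, new: Agent, history: list[tuple[str, str]]) -> dict[str, int]:
    """Compare two agent versions on historical (input, expected_output) pairs."""
    result = {"fixed": 0, "regressed": 0, "unchanged": 0}
    for user_input, expected in history:
        before = old(user_input) == expected
        after = new(user_input) == expected
        if after and not before:
            result["fixed"] += 1
        elif before and not after:
            result["regressed"] += 1
        else:
            result["unchanged"] += 1
    return result


history = [
    ("I want a refund", "billing"),
    ("Refund please", "billing"),
    ("How do I log in?", "general"),
]
report = replay(baseline, candidate, history)
print(report)  # {'fixed': 1, 'regressed': 0, 'unchanged': 2}
```

Counting regressions separately from fixes matters: a change that repairs one pattern while breaking another would look neutral in an aggregate score but is visible in this breakdown.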

Pranav, what convinced you this was a company and not just a feature inside someone else’s tool?
Pranav Bedi: Customer urgency. Teams deploying agents had usage, but they did not have confidence. They knew users were hitting issues, but they could not reliably answer basic questions: where is the agent failing, why, did our last change help, and what should we fix next?

That is not a small feature gap. It is a blocker to putting agents into more important workflows. If a team cannot trust how its agent behaves in production, it cannot scale that agent responsibly. A learning layer is not a bolt-on. It is its own layer of the stack.

What do most teams get wrong about improving agents today?
Pranav Bedi: They treat deployment as the finish line, and they treat fixes as one-off firefighting. Something breaks, an engineer patches a prompt, and the lesson never makes it back into the system. New prompts, new tools, new users, and new edge cases keep creating regressions.

The better question is not “Did we launch an agent?” It is “Do we have a loop that turns every failure into a validated fix?” Production conversations in, continually improving agents out. That is the shift we think matters.

Mohammed Al-Rasheed: People also underestimate how much structure is hidden inside production conversations. A conversation is not just text. It contains goals, turns, state changes, tool decisions, retries, failures, and sometimes security risk. If you flatten it, you miss what matters. The next generation of agent infrastructure has to understand that structure and connect it back to the harness, without retraining the model every time.

Where does AI agent reliability go over the next few years?
Pranav Bedi: The continual learning layer becomes a required part of the stack. Right now many teams are still asking, “Can we build an agent?” The next question is, “Can we make this agent improve safely from real usage?” That is where Moda fits.

The vision is that every production conversation improves the system. A user correction becomes a patched prompt. A tool failure becomes a relearned schema. A workflow loop becomes an escalation gate. A missing case becomes the next eval. If a change causes a regression, the team catches it in validation before customers feel it at scale.

Mohammed Al-Rasheed: The future is not just agents that respond. It is agents that can be diagnosed, improved, validated, and trusted in production. If agents are going to handle more important workflows, learning from production cannot be an afterthought. It has to be built into the harness.

Thank you for your time, Mr. Bedi and Mr. Al-Rasheed.

Thank you. It was great to speak with you.

As AI agents move deeper into customer support, sales, operations, and product workflows, Moda’s argument is becoming harder to ignore: the next challenge is not simply building agents, but building the loop that makes them better. Bedi and Al-Rasheed have built a company around that gap, combining founder-led customer discovery, technical depth, production trace analysis, and fix validation for a market that is quickly outgrowing demo-stage tooling. Both are Y Combinator founders, with Moda part of YC’s Winter 2026 batch. Al-Rasheed has also judged emerging technical talent at hackathons, including the OpenAI Codex Hackathon and the BrowserUse Web Agents Hackathon at YC.

Together, Bedi and Al-Rasheed are helping define a new category of AI infrastructure: the continual learning layer, where Moda diagnoses production failures, generates concrete improvements, and validates what to ship.
Production conversations in, continually improving agents out. Every trace should make your agents better.