When AI Goes Rogue: Guardrails for Agentic Systems

A few months ago, I watched two AIs get stuck in a politeness loop. That moment kicked off a journey into multi-agent orchestration, recursive failure, and the realities of building AI systems that don’t eat your wallet or implode from existential doubt. Here’s what we learned building ScrumBuddy’s AI orchestrator.

The Conference That Broke My Brain (and a Few AIs)

Earlier this year, I was at a tech conference in Dubai listening to a speaker talk about the pitfalls of agentic AI. He described an experiment: two agents tasked with solving a problem overnight. By morning, the problem was long solved but the agents were still going, locked in an infinite loop of politeness, endlessly thanking each other for their great contributions.

That moment resonated hard. We’d seen the same issues. This wasn’t just our problem. It was something bigger. How do you orchestrate AI agents so they know when to stop? And what does that mean more widely?

It’s easy to think of AI orchestration as just solving tasks. But the real questions come after that:

What happens when the task finishes?
How do agents agree they’re done?
What if they can’t finish?
What happens if the whole thing gets interrupted halfway?

We were already facing these issues ourselves. And spoiler: they’re not easy to solve.

AI, in many ways, is like a toddler with a PhD. Brilliant, but also wildly unreliable. You can say, “When you're done, return some JSON or call this function”. But that’s an aspiration, not a guarantee.

What if it calls the wrong function? Or outputs the answer in plain text instead of structured data? Or the JSON is malformed, wrapped in rogue Markdown, or contains strings so poorly escaped they should be in witness protection?

The task is done. But also... it isn’t.

The Think Tank Epiphany

At the next session, I found myself struggling to focus on a painfully vague discussion around AI marketing. Instead, my brain locked onto the orchestration problem like a terrier with a chew toy.

What if, instead of one big brain trying to do everything, we treated AI more like a think tank?

Imagine a table. Around it, there are several expert agents. Same brief, different roles. No egos, no interruptions, no one positioning themselves for a promotion at the expense of the others. Just a group of focused minds working through a problem. Efficiently. Cleanly. Without the mess of humanity piled on top.

That was the vision. But it only works if they could self-govern. That meant setting up rules of engagement for the team.

Here’s what felt essential:

All agents work towards the same goal, but with clearly defined roles
They take turns in a round-robin format so no one dominates the discussion
They can ask each other targeted questions to refine their understanding
Any agent can propose a vote on a topic, and others respond yes/no, with optional comments
Agents can propose a solution, triggering a vote. If it passes unanimously, the task ends

As soon as I could escape for the day, I went back to the hotel and threw together the first prototype. Just a barebones console app riding on OpenAI’s APIs.

It had a handful of baked-in tool calls:

Ask another agent a question
Propose a solution
Propose a vote
Vote yes / Vote no

All agents saw the same shared context (think Discord thread), but took turns, with the system dynamically stitching context for each one in the loop.

Was it elegant? God, no.
The string parsing was fragile, the voting logic was a mess, and it broke constantly.

But it worked. Remarkably well.

The first task I gave it was to generate a marketing article under strict constraints. What came out in 90 seconds was… surprisingly usable. Better yet, it followed style guides more tightly than anything I’d ever wrangled out of ChatGPT directly. It was more than the sum of its parts.

Something was happening.

The Evolution

Over the next few evenings in Dubai, the dirty hack started growing teeth. I began wiring in function calls to key services like Google Search and web scraping, improved the voting logic, tightened context handling, and bolted on a backend to recover from crashes or failed runs.

What began as a toy was fast becoming the core orchestration model we now use in Maitento (ScrumBuddy’s AI orchestrator).

To push it harder, I gave the agents a new challenge: write a news article covering current political affairs from the last 24 hours.

But not just any article. I set up two versions of the task; one targeting a left-leaning audience, the other right-leaning. In both cases, the agents had to:

Choose the topic themselves
Research it
Write the article in the voice and tone of the target demographic

Critically, the content had to remain factually accurate.

I added four specialized agents:

Copywriter
Fact Checker
Researcher
Editor

Each could skip their turn if they had nothing useful to add, and together they worked toward producing the final output.

Average time to complete? About 3 minutes.

Cost? A few cents.

Accuracy? Solid.

And the results? Genuinely impressive.

This was content that was nuanced, audience-aware, and aligned to style guides. Better than anything I’d previously wrangled out of a single AI.

The output wasn’t just good. It was better than the sum of its parts.

It felt like a team.

How a User Story Caused Recursive Madness

By this point, we’d built a lot on top of the model. ScrumBuddy now had entire teams of AI agents working together; clarifying requirements, writing specifications, estimating work, even generating code.

And then it happened again.

During one of our test cycles, we started getting billing alerts. Not occasionally but every few minutes in our dev environment. Something was eating through API credits like it was trying to bankrupt us on purpose.

We dug in.

Two AI agents had been asked to estimate a user story. Simple enough. But they decided the story was so poorly written that estimation was impossible. They weren’t wrong.

They tried to escalate.

Over the course of several hours, these two agents:

Tried to call functions that didn’t exist
Attempted to split the story into multiple subtasks
Rewrote the original story into a new spec
Repeatedly attempted to pass everything they could back to the user

They wanted to alert the humans. And we weren’t listening.

Eventually, they gave up.

They estimated the story as a 13 (basically saying “this is a dumpster fire”) and left a detailed explanation in the output explaining their existential crisis.

We'd run thousands of tests on good and bad stories. But this time something broke in an unexpected way.

These agents didn’t just fail. They went rogue trying to escape the boundaries of their task in an attempt at fixing the underlying dysfunction.

Think Skynet… Then Add Rate Limiting

That incident pushed us to build a lot more guardrails into the orchestration model, because round-robin and our existing restrictions still weren’t enough. We needed more structure, accountability, and limits.

Here’s what we added:

Leader–follower interactions, where a single agent takes control and directs the others
Execution time limits, to stop runaway conversations
Failure detection and auto-correction, where we flag root causes to the models and inject steering context
Cost caps, with hard limits on API spend per interaction
Automatic retries, triggered when tasks fail due to time, cost, or model confusion
Routed agents, where a single request dynamically routes to the most appropriate agent or model based on context, allowing hot-swapping of roles without breaking flow

These changes massively reduced runaway behavior, improved output accuracy, and made the entire system more resilient.

We still let the agents roam, but now it’s inside a well-fenced park and the whole world is not their oyster.

Assume Failure. Design for Chaos.

Don’t trust your agents.

If you ask them to do something: assume they won’t.

If you tell them to answer in a certain way: assume they’ll ignore you.

If you give them a function to call: assume they’ll misuse it.

If you need to trust the result: run it multiple times and aggregate the answers.

If you need it done fast: let several agents compete and take the first sane one.

AI will make mistakes. It will go rogue. It will do everything you didn’t expect, and nothing you explicitly asked for. You’ll stare at the output and wonder how this is the same technology that’s meant to change the world.

So… treat it that way.

Any code you write should assume users will try to break it. Treat AI the same.

Assume it wants to break your parser.

Assume it’s scheming to waste your API budget.

Assume it’s seconds away from going on a philosophical tangent about cherry blossoms.

Then build systems that can contain that chaos.

If you’re not starting from this perspective, you’re going to end up woefully disappointed in what you create.

Try My Orchestration For Yourself

ScrumBuddy’s beta will be available at the end of October, which means you will be able to test out the AI orchestration yourself. We’re looking for people who will use ScrumBuddy from start-to-end and provide us with useful feedback. Register to be on our waitlist here: https://scrumbuddy.com/register/ . While you wait for the beta to launch, join our Discord and get real-time updates from our developers. Find our Discord here: https://discord.gg/WvpWNWJT