We stopped pointing AI agents at legacy codebases without a history-aware pass first. Here's the bug it would've shipped.

A few months back, we almost shipped a "fix" that would have quietly broken a client's payment reconciliation for a partner integration nobody on the current team remembered existed.

Here's what happened. We pointed a coding agent at a legacy module during a modernization pass. The function looked like dead weight, a redundant-seeming conditional wrapping an otherwise clean payment call. Standard AI code review flagged it as simplifiable. On paper, removing it was a safe, obvious cleanup.

It wasn't.

That condition existed because of a vendor API quirk from a partner integration set up three years earlier, documented nowhere except a single PR comment thread that the current-state code review never looks at. Diff-only tooling reads what's there today. It doesn't read why it's there.

That near-miss changed how we approach legacy AI modernization now. Before an agent touches an old codebase, we run a history-aware pass first:

Full commit history
PR discussion threads
High fan-in files (lots of dependents)

All of this gets weighted and mapped before a single line gets touched. It's a slower first step, but it means the agent understands intent, not just current syntax.

The Numbers Back This Up

The data proves this isn't just our unique experience:

Teams using repository-aware tooling catch 40–60% more cross-file issues than diff-only review.
A bank trial (ANZ) reported a 42%+ drop in task completion time using the same approach.

The gains concentrate almost entirely in codebases with real history; greenfield code barely benefits, because there's no history to mine.

⚠️ The Lesson for 2026: If you're building or buying AI dev tooling, ask specifically whether the tool reasons over commit history and PR context, or just the current snapshot. That one question separates tools that look identical on a demo repo from tools that behave very differently on a 10-year-old one. We learned this the expensive way: internally, before it hit a client's production reconciliation flow.

We've since built this into how we approach every legacy AI modernization engagement, happy to go into more detail on the dependency-mapping step if anyone's evaluating similar tooling for their own stack.

Curious if others here have hit a similar near-miss with AI code review on an old codebase, what caught it before it shipped?

Tags: #ai #softwareengineering #legacycode #git #devops