
How to Map Dependencies in a Legacy Codebase Before You Touch Anything

I used to think the hardest part of a migration was writing the transformation logic. Clean the data, map the schemas, run the pipeline. Done.

Then I inherited a fintech codebase that had been running in production for nine years. Nobody had written the docs. Half the original team was gone. And every time I thought I understood what a module did, I'd find another system quietly depending on it three layers down.

The migration did not fail because of bad code. It failed because we did not know what we were actually touching.

That is the dependency problem. And it kills migrations faster than anything else.

Why This Keeps Going Wrong

Legacy system maintenance already consumes up to 80% of IT budgets in some sectors. The money is going into keeping things alive, not understanding them. And when a migration finally gets greenlit, teams move fast because everyone is tired of the old system.

The result is predictable. Modernization programs that attempt a single cutover from legacy to modern architecture almost universally encounter integration failures they did not anticipate, because the legacy system has behaviors that were never formally documented. Those behaviors only become visible when something downstream stops receiving them.

The fix is not better migration scripts. The fix is doing the dependency work first, before anything else moves.

What "Dependencies" Actually Means Here

In a modern codebase with decent documentation and tests, dependencies are mostly visible. You can read the import statements. You can trace the call graph. The system makes sense.

Legacy codebases are different. They often contain data that is inconsistent, undocumented, or encoded in formats that no longer make sense. A customer record touched by 15 different versions of the software over 20 years can have fields that mean different things depending on when they were last updated. Business logic leaks into the data layer. Patches from 2011 are load-bearing.

Dependency mapping in this context means finding four categories of connections:

Code-level dependencies. Which modules call which other modules. Which functions share state. Where the same data structure is read or written from multiple places.

Data dependencies. Which systems read from the same tables or files. Where a schema change in one place will silently break a query somewhere else.

Process dependencies. Batch jobs, scheduled tasks, ETL pipelines. These are often the least documented and the most dangerous.

Integration dependencies. Third-party APIs, webhook endpoints, external consumers, audit log destinations. In fintech especially, a single payment service can have tentacles reaching into reporting tools, compliance exports, and integrations that nobody bothered to write down.

Miss any of these and you will find them in production.

The Practical Mapping Process

Step 1: Static Analysis First

Before running anything, run a static analysis pass on the codebase. The goal is to build a dependency graph from source code itself, without executing it.
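To make that concrete, here is a minimal sketch for a codebase that happens to be Python, using the standard library's ast module to read imports without running anything. The function name and the sample source are made up for illustration, and a real legacy estate needs a parser for its own languages.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they point inside the same package.
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


# Hypothetical source file contents, never executed, only parsed.
SAMPLE = '''
import os
import ledger.core
from payments.gateway import charge
from . import helpers

def run():
    import json
'''

print(sorted(imported_modules(SAMPLE)))  # ['json', 'ledger', 'os', 'payments']
```

Note that it also picks up the import hidden inside a function body, which is exactly the kind of thing a quick skim of the file header misses.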

For legacy environments with COBOL, PL/I, Natural, RPG, or mainframe JCL, you need tooling built specifically for those languages. IN-COM's impact analysis tools can analyze call relationships, data usage, job execution paths, and control flow to identify upstream and downstream impact zones across languages and systems. The key is being able to initiate an analysis from a specific program, field, or database element and see everything affected by a change to it.

The output you are building toward is a visual, queryable map of what depends on what. Not a document. A model you can interrogate.
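As a rough illustration of what a model you can interrogate looks like, the sketch below stores dependencies as (consumer, provider) edges and answers one question: if I change this element, what is affected? The component names and both function names are hypothetical, and a real tool would track far more than a flat edge list.

```python
from collections import defaultdict, deque


def build_reverse_index(edges):
    """Map each component to the set of components that depend on it directly."""
    dependents = defaultdict(set)
    for consumer, provider in edges:
        dependents[provider].add(consumer)
    return dependents


def impacted_by(target, dependents):
    """Return everything that directly or transitively depends on `target`."""
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for consumer in dependents.get(node, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    seen.discard(target)  # a cycle should not list the target as its own dependent
    return seen


# Hypothetical edges: (consumer, provider)
EDGES = [
    ("billing_batch", "ledger_table"),
    ("compliance_export", "billing_batch"),
    ("reporting_ui", "ledger_table"),
    ("payment_api", "fraud_check"),
]

index = build_reverse_index(EDGES)
print(sorted(impacted_by("ledger_table", index)))
# ['billing_batch', 'compliance_export', 'reporting_ui']
```

The compliance export never touches the ledger table directly, but it shows up anyway because it sits behind the billing batch. That is the three-layers-down problem in miniature.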

Step 2: Git History as a Dependency Signal

Source control history is one of the most underused tools for understanding legacy systems. The commit log tells you which files change together, which modules are touched in the same bug fixes, and which parts of the codebase get modified most frequently.
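A minimal sketch of pulling that signal out of the log, assuming the text was produced by something like git log --name-only --pretty=format:--commit-- so that each commit starts with a marker line followed by its file paths. The marker string, the function name, and the sample paths are my own choices for illustration.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(log_text, marker="--commit--"):
    """Count how often each pair of files appears in the same commit."""
    pairs = Counter()
    for block in log_text.split(marker):
        files = sorted({line.strip() for line in block.splitlines() if line.strip()})
        pairs.update(combinations(files, 2))
    return pairs


# Hypothetical log output; in practice this comes from the git command above.
SAMPLE_LOG = """--commit--
payments/charge.py
reports/daily.sql
--commit--
payments/charge.py
reports/daily.sql
batch/reconcile.jcl
--commit--
docs/readme.txt
"""

counts = co_change_counts(SAMPLE_LOG)
for pair, n in counts.most_common(1):
    print(pair, n)  # ('payments/charge.py', 'reports/daily.sql') 2
```

On a real repository you would want to ignore huge commits such as mass reformatting, since they pair every file with every other file and drown out the real coupling.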

Files that change together are coupled, whether or not that coupling is visible in the code. A useful formula for prioritizing effort: Hotspot Score = Change Frequency × Cyclomatic Complexity. High-change, high-complexity modules are the most likely to cause surprises during migration. Map those first.
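The hotspot formula is simple enough to sketch directly. The numbers below are invented, and I am assuming you already have a commit count per file from the log and a complexity figure per file from whatever analyzer fits your language.

```python
def hotspot_scores(change_frequency, complexity):
    """Rank files by change frequency multiplied by cyclomatic complexity."""
    scores = {
        path: change_frequency[path] * complexity[path]
        # Only score files that have both measurements.
        for path in change_frequency.keys() & complexity.keys()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: commits touching each file, and its cyclomatic complexity.
changes = {"ledger/post.cbl": 42, "ui/theme.css": 60, "batch/eod.jcl": 8}
complexity = {"ledger/post.cbl": 35, "ui/theme.css": 1, "batch/eod.jcl": 20}

for path, score in hotspot_scores(changes, complexity):
    print(path, score)
# ledger/post.cbl 1470
# batch/eod.jcl 160
# ui/theme.css 60
```

The stylesheet changes more often than anything else but barely registers, while the rarely touched end-of-day job outranks it. Multiplying the two signals is what keeps noisy but trivial files off the top of the list.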

Step 3: Runtime Observation

Static analysis tells you what could depend on what. Runtime observation tells you what actually depends on what under production load. The main sources:

  • Network traffic analysis to discover undocumented service-to-service calls
  • Log parsing to surface hidden integrations
  • Database query logs to find which applications read from which tables
  • API gateway logs to find external consumers
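As one example of the query-log item above, here is a minimal sketch that turns log lines into a table-to-application map. The log line format (app=... query="...") is invented for illustration, and the regex is a crude keyword match rather than a real SQL parser, so treat its output as leads to verify.

```python
import re
from collections import defaultdict

# Assumed log format: '<timestamp> app=<name> query="<sql>"'
LINE = re.compile(r'app=(?P<app>\S+)\s+query="(?P<sql>[^"]*)"')
# Crude heuristic: the identifier following FROM, JOIN, INTO, or UPDATE.
TABLE = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_by_application(log_lines):
    """Map each table name to the set of applications seen touching it."""
    users = defaultdict(set)
    for line in log_lines:
        match = LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        for table in TABLE.findall(match.group("sql")):
            users[table.lower()].add(match.group("app"))
    return users


# Hypothetical log lines.
LOG = [
    '23:59:01 app=recon_batch query="SELECT * FROM ledger_entries JOIN accounts ON ledger_entries.account_id = accounts.id"',
    '23:59:07 app=reporting_ui query="select total from ledger_entries"',
    '00:00:02 app=audit_export query="INSERT INTO audit_log SELECT id FROM accounts"',
    "malformed line",
]

for table, apps in sorted(tables_by_application(LOG).items()):
    print(table, sorted(apps))
# accounts ['audit_export', 'recon_batch']
# audit_log ['audit_export']
# ledger_entries ['recon_batch', 'reporting_ui']
```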

Automated discovery tools can turn these signals into real-time dependency maps of the relationships between applications, databases, and infrastructure. Keep the observation running long enough to catch the batch jobs, the monthly reconciliation runs, the quarterly reporting processes. In financial systems, there are always processes that run on unusual schedules. A dependency you only discover because it runs at quarter-end is not one you want to find during a cutover.

Step 4: Interviews and Archaeological Work

No tool finds everything. The implicit knowledge in the heads of long-tenure developers and the ops engineers who have kept the system alive for years is a real part of your dependency map.

Useful questions to ask:

  • What are the things you would be afraid to touch?
  • What breaks in ways nobody expects?
  • What does this system do at month-end that it does not do the rest of the time?
  • Which external partners consume data from this system in ways we might not know about?

Document everything you find here alongside the technical analysis. Future teams will need both.

Step 5: Build the Risk Inventory

The output of all this work is a risk inventory: a prioritized list of every dependency found, ranked by migration risk and business impact.

Before any code changes, map every integration point, data dependency, and undocumented behavior. Every dependency you do not find during mapping becomes a surprise during cutover.

The risk inventory drives your migration sequence. Start with isolated, low-dependency modules. High-risk, high-dependency components come later, when your team has already moved successfully through easier terrain.
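One way to turn the inventory into a sequence, sketched under loud assumptions: each entry carries a count of dependents and a 1 to 5 business-impact rating the team assigns, and risk is just (dependents + 1) × impact. That scoring rule is mine, not a standard, and the component names are hypothetical. Swap in whatever weighting your team trusts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    component: str
    dependents: int       # how many other things rely on this component
    business_impact: int  # 1 (low) to 5 (critical), assigned by the team

    @property
    def risk(self):
        # Assumed scoring rule; the +1 keeps isolated components from scoring zero.
        return (self.dependents + 1) * self.business_impact


def migration_sequence(entries):
    """Order entries from lowest to highest risk, ties broken by name."""
    return sorted(entries, key=lambda e: (e.risk, e.component))


# Hypothetical inventory.
inventory = [
    InventoryEntry("payment_core", dependents=9, business_impact=5),
    InventoryEntry("address_formatter", dependents=0, business_impact=1),
    InventoryEntry("statement_pdf", dependents=2, business_impact=2),
]

for entry in migration_sequence(inventory):
    print(entry.component, entry.risk)
# address_formatter 1
# statement_pdf 6
# payment_core 50
```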

Where AI Fits In (and Where It Does Not)

AI-assisted analysis is genuinely useful for dependency mapping at scale. Tools can map call graphs across thousands of files and surface technical debt in hours rather than weeks, compressing what used to take months of consultant time into days.

The limitation is the implicit knowledge problem. AI tools pattern-match against what is visible in the code. They lack exposure to proprietary enterprise codebases and do not understand internal frameworks or organization-specific patterns. One financial services company deployed an AI dependency tool across 30 repositories without proper validation and spent six months correcting hallucinated connections while real dependencies went missing.

Use AI to accelerate the mechanical parts. Do not use it to replace the validation work.
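A minimal sketch of what that validation can look like: compare the edges an AI tool proposes against edges you have confirmed through static analysis or runtime observation, and triage the differences. The function name and the edge data are hypothetical.

```python
def triage_edges(ai_edges, verified_edges):
    """Split AI-proposed dependency edges by whether independent evidence backs them."""
    ai, verified = set(ai_edges), set(verified_edges)
    return {
        "confirmed": ai & verified,
        "unverified": ai - verified,  # possible hallucinations: check by hand
        "missed": verified - ai,      # real dependencies the tool did not report
    }


# Hypothetical edges as (consumer, provider) pairs.
ai_suggested = [("billing", "ledger"), ("billing", "crm"), ("reports", "ledger")]
observed = [("billing", "ledger"), ("reports", "ledger"), ("quarterly_export", "ledger")]

result = triage_edges(ai_suggested, observed)
print(sorted(result["unverified"]))  # [('billing', 'crm')]
print(sorted(result["missed"]))      # [('quarterly_export', 'ledger')]
```

An unverified edge is not automatically wrong, since your observation window may simply have missed it. The point is that nothing goes into the map on the tool's word alone.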

The Fintech-Specific Layer

If you are working in fintech, dependency mapping has a compliance dimension that enterprise software in most other verticals does not have.

Banks spend on average 4.7x more on compliance for legacy systems than for modern alternatives. You are mapping not just technical dependencies but regulatory ones: which data flows are subject to audit requirements, which processes feed into regulatory reporting, which integrations have contractual requirements about change notification.

35% of cloud migration projects have failed to meet industry-specific compliance standards post-migration. In financial services, that is not a technical setback. It is a regulatory incident. Your dependency map needs to include the compliance team's risk view, not just the engineering team's.

IN-COM's dependency mapping tooling is particularly useful here when dealing with cross-language, cross-system codebases that span multiple environments and regulated data flows.

What Good Output Looks Like

A completed dependency map before migration should give you:

  • A visual graph of every component and its dependencies, queryable by system, language, or data element
  • A hotspot analysis identifying the highest-risk modules
  • A documented inventory of all external integrations, including ones not visible in the codebase
  • A compliance dependency map showing which flows have regulatory constraints
  • A sequence recommendation for which modules to migrate in which order

This is what turns a migration from a risky big-bang event into a predictable, phased process. The strangler fig pattern only works if you know which components can be isolated. The dependency map is how you find out.

The Real Cost of Skipping This Step

A 2024 Stack Overflow survey found over 80% of developers regularly work with legacy code. Most of them are touching systems they do not fully understand. That is not a skills problem. It is an information problem.

The dependency mapping phase feels slow. It delays the visible work. Stakeholders get impatient. The temptation to skip it is real.

But the cost shows up clearly in the numbers. Gartner research shows 83% of data migration projects either fail outright or exceed their budgets and timelines. The most common cause is not bad tooling or bad developers. It is undiscovered dependencies surfacing in the middle of a live migration.

Three weeks of dependency mapping before a migration starts is worth considerably more than three months of firefighting after it goes wrong.

The docs will tell you what the system was designed to do. The dependency map tells you what it actually does. That gap is where migrations go sideways. Close it first.


Have you gone through a dependency mapping process before a major migration? What worked, what was useless, and what did you miss until it was too late? Drop it in the comments.

Posted to Developers on May 20, 2026