Building Resilient Ad Tech: An Interview with Ex-Amazon Engineer Jeet Mehta

The digital advertising industry lives and dies by precision. Every impression, every click, and every view is tied directly to revenue. Yet the infrastructure powering this ecosystem is fragile—downtime, cache failures, and outages can cost millions in lost opportunities. Jeet Mehta, a senior software engineer at Netflix and formerly at Amazon & A9.com, has spent his career designing large-scale distributed systems that keep these high-stakes workflows reliable.

As a judge for the Globee Awards for Innovation, Mehta evaluates cutting-edge technologies across industries. But his own contributions, particularly in ad tech, showcase just how much engineering discipline is required to secure revenue streams at scale.

We sat down with him to get his personal perspective on one of his most impactful projects: building a stateless and replayable billing system that transformed the economics and resilience of online advertising.

Jeet, thanks for joining us. Can you explain the problem you were tackling in ad tech systems?

Thanks for having me. At its core, ad tech is a transaction engine. When a bid is won and an ad is shown, the platform starts receiving signals (impressions, clicks, views) that need to be recorded accurately and billed. The issue was that we were dropping a small fraction of impressions and clicks because of cache downtime or outages that came around roughly once a year. In advertising, that translates into huge revenue losses, not just for us but for the sellers and partners who rely on accurate accounting.

I wanted to solve two things: resilience, so we wouldn’t lose these events, and cost, because the existing infrastructure was burning around $20 million annually.

What was your solution?

We re-architected the system in two big ways. First, I designed an offline replay mechanism. Instead of relying entirely on online caches, we stored state in S3, up to 200 terabytes a day. If impressions or clicks got lost, we could replay them from that durable copy, so an outage no longer meant permanently lost billing events. That replay pipeline recovered millions of dollars per year in revenue that would otherwise have been lost.
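
As an illustration of the replay idea, here is a minimal sketch rather than the production system Mehta describes. It assumes every event carries a unique ID so that replaying a durable log is idempotent: events the online path already billed are skipped, and only the dropped ones are added. The `Event` and `Ledger` types and the `replay` function are hypothetical names, and an in-memory list stands in for the log stored in S3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str       # hypothetical unique ID per impression or click
    bid_id: str
    kind: str           # "impression" or "click"
    amount_micros: int  # billable amount in millionths of a dollar


class Ledger:
    """Toy billing ledger that ignores events it has already recorded."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.total_micros = 0

    def record(self, event: Event) -> bool:
        if event.event_id in self._seen:
            return False  # duplicate: already billed, so skip it
        self._seen.add(event.event_id)
        self.total_micros += event.amount_micros
        return True


def replay(durable_log: list[Event], ledger: Ledger) -> int:
    """Re-apply every logged event; return how many were newly billed."""
    return sum(ledger.record(event) for event in durable_log)


# Stand-in for the durable copy of all events (assumed to live in S3).
log = [
    Event("e1", "b1", "impression", 1_500),
    Event("e2", "b2", "impression", 1_500),
    Event("e3", "b1", "click", 20_000),
]

ledger = Ledger()
ledger.record(log[0])  # the online path saw e1 and e3 ...
ledger.record(log[2])  # ... but e2 was dropped during an outage

recovered = replay(log, ledger)  # bills only the missing e2
```

Because `record` deduplicates on `event_id`, the replay can be run again after a partial failure without double-billing anyone.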

Second, we eliminated the dependency on expensive online caches altogether. By encoding transaction state into outgoing URLs and using the end client for state storage, we slashed infrastructure costs. Then, instead of maintaining big offline clusters, I designed a way to do streaming joins directly on S3. That gave us near real-time accuracy at a fraction of the cost.
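
One way to picture state carried in outgoing URLs is the sketch below. It is an assumption-laden illustration, not the scheme used in the project: the state is serialized as JSON, base64-encoded, and signed with an HMAC so the server can trust it when the client sends it back, with no cache lookup. The key, the `s` and `sig` parameter names, the example URL, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: a server-side secret; a real system would load and rotate this securely.
SECRET_KEY = b"demo-key-not-for-production"


def build_tracking_url(base_url: str, state: dict) -> str:
    """Embed signed transaction state in the URL handed to the client."""
    raw = json.dumps(state, sort_keys=True).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'s': payload, 'sig': sig})}"


def read_state(url: str) -> dict:
    """Verify the signature and recover the state when the client calls back."""
    query = parse_qs(urlparse(url).query)
    payload, sig = query["s"][0], query["sig"][0]
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("state signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload))


state = {"bid_id": "b0042", "price_micros": 1500, "campaign": "c7"}
url = build_tracking_url("https://ads.example.com/imp", state)
recovered_state = read_state(url)  # same dict, no cache involved
```

The signature matters because the client is untrusted: without it, anyone could edit the price in the URL before the impression is reported.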

That sounds like a major shift. What was the financial and technical impact?

Financially, it was huge. Recovering millions of dollars a year from lost impressions, while also saving millions a year in infrastructure costs, is transformational for any platform.

Technically, it changed how we thought about resilience. Suddenly outages weren’t catastrophic—we could backfill safely, and developers could deploy with more confidence and agility. It also pushed the boundaries of what’s possible: to my knowledge, no other company has done streaming joins on S3 at that scale and cost point.

What were the biggest challenges in making this work?

The hardest part was rethinking state. In distributed systems, everyone assumes you need an online transactional store for real-time processing. Convincing stakeholders that we could externalize state to URLs and S3—and still meet latency, scale, and correctness requirements—was a challenge.

On the engineering side, the biggest hurdle was building the join mechanism. Imagine finding the one failed impression—the needle—inside billions of bid states—the haystack—while maintaining accuracy for billing. That took careful design and iteration.
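
The needle-in-a-haystack join can be sketched as a streaming sort-merge join. This is a generic technique chosen for illustration, not a description of the mechanism Mehta built: it assumes both inputs arrive sorted by bid ID with one state per bid, so the large side is read once and never held in memory. Plain Python iterables stand in for objects streamed from S3, and `merge_join` is a hypothetical name.

```python
from typing import Iterable, Iterator


def merge_join(
    failed_impressions: Iterable[tuple[str, str]],  # (bid_id, impression_id), sorted by bid_id
    bid_states: Iterable[tuple[str, int]],          # (bid_id, price_micros), sorted by bid_id
) -> Iterator[tuple[str, str, int]]:
    """Yield (bid_id, impression_id, price_micros) for each impression with a matching bid."""
    bids = iter(bid_states)
    bid = next(bids, None)
    for bid_id, impression_id in failed_impressions:
        # Advance the large stream until it catches up with this impression.
        while bid is not None and bid[0] < bid_id:
            bid = next(bids, None)
        if bid is not None and bid[0] == bid_id:
            yield (bid_id, impression_id, bid[1])
        # Impressions with no matching bid state are skipped.


# The haystack: a lazily generated stream of bid states (zero-padded IDs keep string order sorted).
haystack = ((f"b{i:04d}", i * 10) for i in range(1000))
# The needles: the few impressions that failed to bill.
needles = [("b0042", "imp-1"), ("b0777", "imp-2")]

matches = list(merge_join(needles, haystack))
```

Since the bid states are consumed as a stream, memory use stays constant no matter how large the haystack grows; the cost is that both inputs must already be sorted on the join key.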

How do you see this work shaping the industry?

This project was about more than one company. It’s part of a broader industry trend toward stateless architectures, cloud-native data processing, and resilience as a product feature, not an afterthought.

If you think about streaming platforms or real-time marketplaces, the lesson applies everywhere: outages are inevitable, but revenue loss doesn’t have to be. With the right design, you can build systems that recover automatically and even operate at lower cost.

And finally, what advice would you give to engineers entering this field?

Don’t just optimize for performance—optimize for resilience. Ask what happens when the system breaks, because it will. Some of the biggest wins in my career, like this project, came not from shaving milliseconds but from ensuring revenue and user trust even under failure conditions.

Also, never underestimate cost. At scale, every engineering decision has a dollar figure attached to it. Building something state-of-the-art is great, but building it to save millions of dollars a year while making millions more—that’s engineering as a business driver.

The story of Jeet Mehta’s work in ad tech is ultimately a story about rethinking assumptions. By challenging the reliance on traditional transactional stores and embracing stateless architectures, he not only recovered millions of dollars a year in revenue but also showed how resilient, scalable systems can be built at lower cost. His career reflects a broader truth in modern engineering: resilience, efficiency, and innovation are no longer separate goals. They are inseparable parts of building technology that powers the world at scale.