— Photo taken at Gen AI Summit
Generative AI has moved beyond the novelty phase in insurance. The industry is no longer asking whether large language models can summarize documents, surface knowledge, or support service teams. The harder question now is whether those systems can be trusted once they begin operating inside regulated workflows, where a flawed answer is not just a technical defect but a compliance risk, a customer risk, or a decision-making risk. That pressure is becoming more visible across the sector. Capgemini identifies 2026 as the point when AI begins to drive serious value in insurance as carriers move beyond pilots and scale its use across underwriting, claims, and customer engagement.
Abhishek Kumar, AI Product Manager at New York Life and a Senior IEEE Member, works precisely at that fault line. His recent work has centered on building AI products that improve frontline productivity and reduce operational friction without weakening the controls that regulated institutions depend on. In his view, the future of insurance AI will not be determined by which firms gain access to the most powerful models. It will be determined by which firms build the discipline to deploy those models responsibly.
“Insurance has reached the stage where the real differentiator is not access to AI. It is the ability to govern how intelligence is retrieved, interpreted, and applied inside live business workflows,” Kumar says.
The Production Gap
The industry’s bottleneck has changed. For the past two years, much of the market conversation revolved around experimentation. Enterprises launched copilots, tested internal assistants, and explored retrieval-based use cases across service and operations. But experimentation is easier than institutionalization. Deloitte reported in early 2026 that only 25% of respondents had moved 40% or more of their AI pilots into production, underscoring how difficult it remains to convert promising trials into dependable operating systems.
Kumar’s work at New York Life speaks directly to that production gap. He led the product vision for a centralized retrieval-augmented generation platform designed to serve as a shared foundation for multiple business-unit-specific applications. Rather than treating each assistant as an isolated deployment, the platform established a reusable backbone for knowledge retrieval across underwriting, service, finance, marketing, field experience, and corporate strategy. That architecture reduced time to market for AI products by 75%, compressing deployment cycles from 12 months to 3 months. It also supported tools such as underwriting and service assistants that improved access to institutional knowledge for agents and representatives, contributing to annual operational savings of about $5 million.
“The hardest part is not proving that a GenAI use case can work once,” Kumar says. “It is building a foundation that lets multiple business units deploy it repeatedly, safely, and with enough control to earn trust in production.”
What makes that work significant is not simply the cost reduction. It is the operating logic behind it. In regulated insurance settings, retrieval is never just a convenience layer. The answer path itself has to be governed. Which source is being surfaced, how sensitive information is handled, how the model is constrained, and when a human remains accountable are all part of the product.
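The governed answer path described above can be illustrated with a minimal sketch. The source names, confidence threshold, and escalation messages here are hypothetical, not details of New York Life's actual platform; the point is only that governance checks run before any answer reaches a user.

```python
from dataclasses import dataclass

# Hypothetical governance parameters, for illustration only.
APPROVED_SOURCES = {"underwriting_manual", "service_kb"}
CONFIDENCE_FLOOR = 0.75

@dataclass
class Retrieval:
    source: str   # corpus the passage was retrieved from
    score: float  # retrieval confidence
    text: str     # passage content

def route_answer(retrieval: Retrieval) -> str:
    """Gate a retrieved answer before it is surfaced:
    unapproved sources and low-confidence retrievals
    escalate to a human instead of answering."""
    if retrieval.source not in APPROVED_SOURCES:
        return "ESCALATE: unapproved source"
    if retrieval.score < CONFIDENCE_FLOOR:
        return "ESCALATE: low retrieval confidence"
    return f"ANSWER: {retrieval.text}"
```

The design choice worth noting is that escalation is the default path: the system answers only when every check passes, which keeps a human accountable whenever the retrieval falls outside governed bounds.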
Trust by Design
For Kumar, trustworthy AI is not a communications principle. It is a systems principle. At New York Life, he established an Enterprise Guardrail Framework built around PII redaction and insurance regulatory requirements, turning governance into part of the system design rather than an afterthought added after deployment. That matters because in insurance, answer quality alone is not enough. A response can sound fluent and still be operationally unsafe. What matters is whether the model has been constrained to behave within the boundaries of a regulated business.
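To make the idea of PII redaction as a system-level guardrail concrete, here is a minimal sketch. The patterns below are illustrative assumptions; an enterprise framework of the kind described would rely on vetted PII-detection services and regulatory review, not a handful of regexes.

```python
import re

# Illustrative patterns only; a production guardrail would use a
# vetted PII detection service, not ad-hoc regexes.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII with typed placeholders before the text
    reaches the model, its prompt logs, or downstream storage."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text
```

The key point is where this runs: redaction sits in the request path itself, so sensitive fields are stripped before the model ever sees them, rather than being filtered after the fact.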
“Retrieval quality is only one part of the challenge,” Kumar says. “In insurance, you also need control over what the system can access, what it can say, and where human accountability has to remain intact.”
That emphasis on trustworthy behavior has shaped his broader public and technical work as well. In January 2026, he appeared as a panelist at the Generative AI Summit in Washington, D.C., joining a panel titled "Human Oversight and LLMs: Ensuring Trustworthy Behavior" alongside vice presidents from New York Life, leaders from Wells Fargo, and a product data science leader from J.D. Power. The discussion examined how human oversight plays a critical role in ensuring large language models behave safely, ethically, and reliably, and explored emerging frameworks for evaluation, red teaming, and governance designed to mitigate risk and bias in real-world deployments. The conversation also highlighted practical strategies for maintaining accountability and trust as LLMs become more deeply integrated into enterprise and public-sector systems. The connection between those conversations and his day-to-day product work is direct. As insurers expand AI into sensitive workflows, oversight is no longer a procedural concern handled on the side. It is part of what makes the product deployable.
The regulatory climate is moving in the same direction. In December 2025, the NAIC stated that its 2023 model bulletin requires insurers to implement written AI governance programs emphasizing transparency, fairness, and risk management, and that over half of all states had already adopted that bulletin or similar guidance. That shift matters because it confirms that trust is no longer a vague aspiration for insurers. It is becoming part of the operating environment in which AI products must function.
From Tool to Workflow
Kumar’s earlier work outside insurance reinforces the same principle from another regulated angle. In one initiative for a global bank, he led the roadmap for a customer-facing conversational AI solution designed for a credit card service center. The system reduced call volumes by 30% and generated roughly $5 million in annual savings. On the surface, that sounds like a straightforward service-efficiency story. In practice, the harder work sat underneath the user experience: establishing enough reliability, explainability, and behavioral control for the tool to be trusted in a customer-facing, regulated environment.
“AI creates durable value only when it fits the workflow around it,” Kumar says. “If the system cannot support human judgment, escalation, and accountability at the point of use, it remains a tool demonstration rather than an operational capability.”
He also developed frameworks for testing retrieval-based assistants for hallucination, bias, and drift, reinforcing a pattern visible across his work: value comes when AI systems are disciplined enough to survive production, not when they merely perform well in demonstrations. That distinction is what separates tool deployment from workflow transformation. Enterprises often talk about AI as an enhancement layer, something added on top of an existing process. But the institutions that create durable value are usually doing something more difficult. They are redesigning the workflow itself so that retrieval quality, escalation logic, human judgment, and operational control work together.
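One common building block of such testing frameworks is a groundedness check: measuring how much of an answer is actually supported by the retrieved context. The token-overlap heuristic below is a deliberately crude sketch of that idea; the thresholds and function names are assumptions, and real evaluation suites combine entailment models, red teaming, and human review.

```python
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved
    context -- a rough proxy for whether the answer is grounded."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def flag_hallucination(answer: str, context: str,
                       threshold: float = 0.6) -> bool:
    """Flag answers whose overlap with the source passage falls
    below a tunable threshold for human review."""
    return grounding_score(answer, context) < threshold
```

Run continuously against production traffic, checks like this also surface drift: if flag rates climb over time, either the knowledge base or the model's behavior has shifted and needs attention.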
Kumar’s acceptance as a Technical Program Committee member for BigData 2026 reflects the broader relevance of that approach. His work sits at a point where product strategy, model behavior, enterprise data architecture, and institutional accountability increasingly overlap. That overlap is where much of the enterprise AI market is now being tested.
The Next Control Layer
The next stage of AI adoption in insurance will push beyond systems that read and recommend. It will move toward systems that can trigger actions, write back into workflows, and coordinate activity across business processes. Kumar is already working in that direction, exploring how AI agents can evolve from knowledge assistants into more operational tools without losing the safeguards that made the first generation deployable.
That transition will make governance more central, not less. Once AI stops merely surfacing information and begins influencing or executing work, oversight becomes inseparable from product design.
“The next chapter is not just about better answers,” Kumar says. “It is about controlled action. Once AI starts influencing or executing work, oversight is no longer an external review step. It becomes part of the product itself.”
That is why trust is the right lens for this moment in insurance AI. Models will improve. Vendors will multiply. New agents will enter the enterprise. But the firms that matter in this market will be the ones that can make intelligence usable inside the discipline of a regulated business. In insurance, that is what separates experimentation from infrastructure.