The recommendation engine market sits at $9.15 billion in 2025 and is projected to reach $38.18 billion by 2030, growing at roughly 33% annually. Healthcare and life sciences are among the fastest-rising verticals inside that curve, expanding at a 19% CAGR as personalization moves from a retail nicety into a regulated workflow. The technical bar moves with it. A consumer recommendation can tolerate a few hundred milliseconds and a noisy ranking; a plan recommendation in health insurance has to land in under 100 milliseconds, draw on fresh user signals, and stand up to compliance review.
Haricharan Shivram Suresh, Principal Data Engineer at eHealth, Inc., has spent the last two years building inside that envelope. With fourteen years of experience in data engineering, machine learning systems, and ML/LLMOps, he led the design of a real-time plan recommendation engine that helps Medicare beneficiaries find a plan that actually fits their needs. His recent technical article for HackerNoon, "Building a High-Scale Real-Time Recommendation Engine with Feature Stores and Redis Observability," walks through the architecture decisions behind that work.
We spoke with Haricharan about the engineering realities of moving a recommendation engine from prototype to production in one of the most regulated verticals in the United States.
Why do so many recommendation engine projects look great in a notebook and stall the moment they have to serve real users?
The notebook story is always cleaner. You have a static dataset, a single model, and an offline metric that tells you the ranking is good. Production has none of those luxuries. Features go stale, traffic spikes, the catalog of items shifts under you, and the user’s context changes between the moment they land on the page and the moment they click. You stop optimizing for offline accuracy and start optimizing for the worst tail of latency, the freshness of the features you are scoring against, and the cost of serving a recommendation that no one actually sees.
In a regulated industry the gap is wider. A retail bot can ship a slightly off recommendation and absorb the cost in returns. In health insurance, a misranked plan has downstream consequences for the beneficiary’s coverage. So the bar is not “is the model accurate offline.” It is “does the system serve a defensible recommendation, with auditable lineage, at sub-second latency, while the underlying eligibility and pricing data is changing in the background.” That bar is what separates a notebook from a production system.
Your team at eHealth deployed a real-time plan recommendation engine. What was the actual problem you were solving?
Choosing a Medicare plan is not a one-click decision. A beneficiary is looking at premiums, deductibles, drug coverage, provider networks, supplemental benefits, and a window of eligibility that may be measured in weeks. Multiply that by the dozens of plans available in a given ZIP code, and the cognitive load on the user is the actual barrier. The traditional answer is to throw a long list on a screen and add filters. The better answer is to rank the list by what matters to that specific person, and to do it fast enough that the experience feels like a conversation rather than a search.
The win condition was a recommendation that responds in real time, reflects the user’s current context, and is grounded enough in the underlying plan data that compliance teams can stand behind it. If the engine takes a second too long, the user has already moved on. If it ranks against stale eligibility data, the recommendation is wrong before it lands.
Walk us through the architecture. Feature stores, Redis, KNN. What are the pieces actually doing?
The architecture splits cleanly into an offline layer and an online layer. The offline layer is where features get engineered, validated, and stored — user attributes, plan attributes, historical interactions, and derived signals. The online layer is what the serving path actually touches when a request comes in, and that is where Redis sits. Redis stores the features the model needs at scoring time, plus the embeddings used for similarity search, and it returns them in milliseconds.
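As a rough illustration of that online lookup, features might be stored as Redis hashes keyed by entity ID and fetched in a single pipelined round trip. The key names and fields below are placeholders, not eHealth's production schema:

```python
# Illustrative sketch: online feature lookup from Redis. Key names and fields
# are placeholders, not a production schema.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_online_features(user_id: str, plan_ids: list[str]) -> dict:
    """Pull the freshest user and plan features in one pipelined round trip."""
    pipe = r.pipeline(transaction=False)        # batch reads to keep latency low
    pipe.hgetall(f"features:user:{user_id}")    # user attributes and derived signals
    for pid in plan_ids:
        pipe.hgetall(f"features:plan:{pid}")    # plan attributes refreshed by the offline layer
    user_feats, *plan_feats = pipe.execute()
    return {"user": user_feats, "plans": dict(zip(plan_ids, plan_feats))}
```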
On top of that, KNN similarity search is the candidate-generation step. Given a user’s vector, the engine pulls the nearest plans from the index, narrows the universe from many to a handful, and hands them to a ranking model. The ranker scores each candidate using the freshest features available, applies eligibility and business rules, and emits a final list. The whole path has to fit inside a sub-100ms budget, which is why every layer matters: a slow feature lookup or a poorly tuned index will eat your latency budget before the ranking model gets to do anything useful.
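Continuing that sketch, candidate generation and ranking might be wired together roughly as follows, assuming the embeddings live in a RediSearch vector index and reusing the client and fetch_online_features helper above. The index name, field names, ranker interface, and eligibility check are assumptions for illustration, not the production design:

```python
# Illustrative sketch: KNN candidate generation over a RediSearch vector index,
# followed by a ranking pass. Reuses r and fetch_online_features from the sketch above;
# "plan_index", "plan_embedding", ranker.score, and the eligibility flag are placeholders.
import numpy as np
from redis.commands.search.query import Query

def recommend(user_id: str, user_vector: np.ndarray, ranker, k: int = 20) -> list[str]:
    # 1. Candidate generation: the k plans whose embeddings are nearest the user's vector.
    knn = (
        Query(f"*=>[KNN {k} @plan_embedding $vec AS dist]")
        .sort_by("dist")
        .return_fields("plan_id", "dist")
        .dialect(2)
    )
    hits = r.ft("plan_index").search(
        knn, query_params={"vec": user_vector.astype(np.float32).tobytes()}
    )
    candidates = [doc.plan_id for doc in hits.docs]

    # 2. Score each candidate against the freshest features, filtering by eligibility rules.
    feats = fetch_online_features(user_id, candidates)
    scored = [
        (pid, ranker.score(feats["user"], feats["plans"][pid]))
        for pid in candidates
        if feats["plans"][pid].get("eligible") == "1"   # stand-in for real business rules
    ]
    # 3. Final ordering handed back to the serving layer.
    return [pid for pid, _ in sorted(scored, key=lambda s: s[1], reverse=True)]
```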
Observability tends to be the part teams underinvest in. What did you instrument, and why?
Observability is the difference between a recommendation engine you can trust in production and one you cannot. With a voice agent, a wrong answer is audible. With a recommender, a wrong answer looks like a slightly different ordering on a screen, and the user may never know. So you have to instrument what the model cannot tell you on its own.
We log feature distributions, candidate sets, ranking scores, final orderings, and the lineage from each recommendation back to the features that produced it. That gives the ML team early warning when a feature drifts, gives the compliance team a reconstructable trail when a recommendation is questioned, and gives the engineering team latency telemetry at every stage of the path. Redis-level observability matters in particular because the cache hit rate and key access patterns are leading indicators of whether you are about to miss your latency budget. You catch it in the metrics before the user feels it.
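A minimal sketch of what that kind of instrumentation can look like: per-stage latency timing plus a Redis keyspace snapshot pulled from INFO. The metric names and the emit sink are placeholders for whatever telemetry backend a team actually uses:

```python
# Illustrative sketch: per-stage latency telemetry plus Redis-level signals.
# Metric names and the emit() sink are placeholders.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def emit(metric: str, value: float) -> None:
    print(f"{metric}={value:.2f}")              # stand-in for a real metrics backend

def timed(stage: str, fn, *args, **kwargs):
    """Run one stage of the serving path and record how long it took."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    emit(f"latency_ms.{stage}", (time.perf_counter() - start) * 1000.0)
    return result

def redis_health_snapshot() -> dict:
    """Cache hit rate and eviction counts: leading indicators of a latency-budget miss."""
    stats = r.info("stats")                     # keyspace_hits, keyspace_misses, evictions
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    return {
        "hit_rate": hits / max(hits + misses, 1),
        "evicted_keys": stats.get("evicted_keys", 0),
        "expired_keys": stats.get("expired_keys", 0),
    }
```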
You also serve as a peer reviewer for IEEE Access. How does that work inform what you build?
Reviewing is one of the most underrated learning loops for a practitioner. When you evaluate submissions for IEEE Access, you read methodology sections at a level of detail you rarely apply to your own systems. You are looking for reproducibility, for benchmarking rigor, for whether the results actually support the conclusions. Bringing that lens back into your day job changes how you design experiments.
It also keeps you current. A lot of what eventually shows up in production systems appears first in conference papers and journals—sometimes a year or two before anyone turns it into a product. Reading that work early, and having to write careful feedback on it, means you are not learning about new retrieval techniques or evaluation methods from a vendor pitch six months after everyone else. You are seeing them at the source and thinking critically about whether they would hold up outside the paper’s controlled conditions.
Where are recommendation systems headed next in healthcare, and what still has to be solved?
The next wave is about depth and accountability at the same time. The infrastructure is increasingly available — feature stores, vector databases, real-time serving — and the ranking models are catching up. What is still being worked out is how these systems reason about regulated decisions without losing the explainability that compliance and the user both need.
In healthcare specifically, the unsolved problems are around longitudinal personalization that respects privacy boundaries, integration with eligibility and pricing data that is constantly changing, and explanations of why a plan was recommended that hold up to scrutiny from a beneficiary, a broker, and a regulator. The engine that works today can rank plans well. The engine we need next can do that and tell you, in plain language, why this plan and not that one.
Getting there is less about bigger models and more about better feature engineering, better observability, and a real discipline about what you are willing to automate. That is where I expect to spend the next several years.
You also judge the Business Intelligence Group’s Excellence in Customer Service Awards. What patterns are you seeing across submissions?
Judging for the Excellence in Customer Service Awards gives you a cross-industry view of what companies are actually doing with AI in customer operations. The submissions I review cover everything from retail to telecom to financial services, and the organizations that stand out are not the ones with the most ambitious AI pitches. They are the ones with the clearest measurement frameworks.
The winners tend to share a pattern. They can tell you exactly how long a workflow took before AI and after. They can tell you conversion lift, abandonment rates, and what percentage of interactions were fully automated versus assisted. They treat the system as a measurable surface, not a marketing asset. The submissions that fall short usually have impressive demos and no instrumentation. That split is instructive. It matches what I see in my own field: the teams that invest early in observability are the ones whose AI systems are still running, and still trusted, even years after launch.