Guide to model training: Part 3 — Estimating your missing data

TLDR

Oftentimes when collecting consumer data, there are times when you’re unable to retrieve all the data. Instead of having a lack of data ruin your results, you’ll want to “guestimate” what the data should be.

Outline

Recap
Before we begin
What does impute mean?
3 ways to impute
Impute in Pandas
Next step

Recap

In the last section, we completed scaling categorical and numerical data so that all of our data is scaled properly. The higher ups want a list of past customers to target for our sales campaign, so we’re given new data that shows the history of how past customers interacted with our past 4 promotional emails.

<center> Image description Big sales are coming soon!</center>

Using the new data, our goal is to build a model for the remarketing campaign. There’s just one small problem. Code embedded in the marketing campaign email contained bugs, leaving us unable to identify what actions the people who clicked the email took. The bug occurs every 5 emails, but was patched by the 2nd wave of emails. In this section, we’ll go over imputing, a technique used to fill in unknown results.