May 21, 2019

IH favor to ask: willing to share B2B SaaS in-app activity data (no emails or identifying info needed)? #ask-ih

Gen Furukawa @genfurukawa

Hey Indie Hackers,

I’m in the unfortunate position of putting the cart before the horse (should have done more presales!).

I am currently building the machine learning algorithms to identify customers at risk of churning for B2B SaaS apps. However, I don’t have a large or relevant enough data set from which to build the initial predictive churn model.

Would any of you who run B2B SaaS apps be interested in and willing to provide in-app user activity data to build and refine the algorithms? No identifiable info (e.g., email addresses) is needed, just historical in-app activity so I can determine correlations between behavior and churn (or retention). Or really whatever you feel comfortable sharing.

I have looked online for public data sets (like Kaggle), but there isn’t a lot of B2B SaaS data. Any suggestions on where there may be open data sets?

Thanks in advance. I’m happy to answer any questions here, or just email me at [email protected]

Gen

In return, I can offer a lot of genuine gratitude, and ideally some insights on which users are at risk of churn. Many thanks.

  1. 1

    Just curious, what kind of ML algorithm are you implementing? Are you customizing each model per customer?

    1. 1

      Still TBD, but thus far random forest has been the most accurate with our initial data set, over decision trees and XGBoost.

      And at this stage, yes, we plan to modify the algorithm for each customer data set depending on the importance of various features. Not scalable per se, but necessary in order to arrive at the most accurate predictions. Do you have any thoughts or suggestions here?
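For anyone following along, here is a rough sketch of what that kind of comparison could look like with scikit-learn. It is not our actual pipeline: the data is synthetic (`make_classification`), the feature names are made up for illustration, and `GradientBoostingClassifier` is only a stand-in for XGBoost so the example needs no extra install.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names; synthetic data stands in for real in-app activity.
FEATURES = ["logins_30d", "sessions_30d", "features_used", "support_tickets", "seats_active"]

# Roughly 20% of the synthetic accounts are labeled as churned (y == 1).
X, y = make_classification(
    n_samples=400,
    n_features=len(FEATURES),
    n_informative=3,
    n_redundant=1,
    weights=[0.8, 0.2],
    random_state=0,
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Stand-in for XGBoost (assumption): scikit-learn's own gradient boosting.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean ROC AUC over 3 folds; AUC is less misleading than accuracy when churn is rare.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Per-data-set feature importances, the kind of signal that could guide
# customer-specific tuning.
forest = models["random_forest"].fit(X, y)
importances = dict(zip(FEATURES, forest.feature_importances_))

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC={score:.3f}")
for name, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: importance={weight:.3f}")
```

Which model comes out ahead will depend entirely on the data set, so treat the ranking on synthetic data as meaningless; the point is only the shape of the comparison.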

      1. 1

        Sounds like a cool use of ML. What kind of data are you applying the algorithms to? User activity? Transactional data? For my grad studies in ML, I have personally asked companies for data in the past, and they were pretty open to sharing non-sensitive data. If you offer your services, they might be more keen to try it out.

        1. 1

          Yes, to start it will be in-app events (e.g., Mixpanel data). The goal is to layer in more data points (support interactions, NPS/CSAT, email engagement, etc.) to get a more holistic profile of the user. Good to know that companies were open to sharing data when possible!
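To make "in-app events" a bit more concrete, here is a minimal sketch of rolling raw events up into one feature row per user. The event tuples, event names, and the `build_user_features` helper are all hypothetical; this is not Mixpanel's export format, just an assumed `(user_id, event_name, timestamp)` shape.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical event rows: (user_id, event_name, ISO 8601 timestamp).
EVENTS = [
    ("u1", "login", "2019-05-01T09:00:00"),
    ("u1", "create_report", "2019-05-01T09:05:00"),
    ("u1", "login", "2019-05-18T14:00:00"),
    ("u2", "login", "2019-04-02T08:00:00"),
    ("u2", "invite_teammate", "2019-04-02T08:10:00"),
]


def build_user_features(events, as_of, window_days=30):
    """Aggregate raw events into one feature dict per user.

    Counts each event type inside the trailing window and records how many
    whole days have passed since the user's most recent event.
    """
    window_start = as_of - timedelta(days=window_days)
    counts = defaultdict(Counter)
    last_seen = {}

    for user_id, name, timestamp in events:
        when = datetime.fromisoformat(timestamp)
        if when > as_of:
            continue  # ignore events after the snapshot date
        last_seen[user_id] = max(last_seen.get(user_id, when), when)
        if when >= window_start:
            counts[user_id][name] += 1

    features = {}
    for user_id, seen in last_seen.items():
        row = {f"{name}_count": n for name, n in counts[user_id].items()}
        row["events_in_window"] = sum(counts[user_id].values())
        row["days_since_last_event"] = (as_of - seen).days
        features[user_id] = row
    return features


features = build_user_features(EVENTS, as_of=datetime(2019, 5, 21))
for user_id, row in features.items():
    print(user_id, row)
```

Other sources (support interactions, NPS/CSAT, email engagement) could be joined onto the same per-user rows later, which is why the sketch keys everything by `user_id`.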