Ideas and Validation February 14, 2020

Is this too niche for a data analytics startup?


I'm planning to start my big data consultancy and trying to come up with an initial product that suits my skills.

I've been working with big data frameworks for a few years, especially Apache Spark (an analytics engine), with which I have a lot of experience.

My idea is a data processing / analytics service available to companies with a lot of data that they can't process themselves. It would include ETL reporting and live streaming (both can be done using Spark).

Scenario 1: A market research company prints impressions to a log file and accumulates 10TB of new data every day. They want to analyse this data and receive a report of the results on a daily basis, but they don't have the hardware or the know-how. They hire me and give me access to their S3 bucket so I can create the Spark job and generate the report every day using my own AWS Spark cluster. The company has no idea and doesn't care how I generate the report.
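To make Scenario 1 a bit more concrete, here's a rough sketch of the kind of aggregation the daily job might do. The log format (tab-separated timestamp, campaign ID, user ID) and the function name are made up for illustration, and this is plain Python standing in for the logic only; at 10TB a day the same filter-and-count would run as a Spark job, not a loop like this.

```python
from collections import Counter
from datetime import date


def daily_impression_report(lines, day):
    """Count impressions per campaign for a single day.

    Assumes a hypothetical log format: timestamp<TAB>campaign_id<TAB>user_id,
    with ISO 8601 timestamps such as 2020-02-14T09:00:01.
    """
    counts = Counter()
    day_prefix = day.isoformat()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines instead of failing the whole run
        timestamp, campaign_id, _user_id = parts
        if timestamp.startswith(day_prefix):
            counts[campaign_id] += 1
    return dict(counts)


# Tiny made-up sample standing in for one day's log file.
sample = [
    "2020-02-14T09:00:01\tcampaign_a\tu1",
    "2020-02-14T09:00:02\tcampaign_b\tu2",
    "2020-02-14T09:05:00\tcampaign_a\tu3",
    "2020-02-13T23:59:59\tcampaign_a\tu4",  # previous day, ignored
    "garbage line",                          # malformed, ignored
]
report = daily_impression_report(sample, date(2020, 2, 14))
print(report)
```

Because the job only reads the input and recomputes the counts from scratch, running it twice for the same day gives the same report.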

Scenario 2: A company needs a live view of an aggregated measure based on new data inserted into a Mongo database in the last hour. They give me access to the database, I write the streaming job and run it on my standalone Spark cluster and expose the results via a REST endpoint. Their front end guys can then generate fancy graphs from this aggregated data.
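The "aggregated measure over the last hour" in Scenario 2 is essentially a sliding-window aggregate. Below is a minimal sketch of that idea in plain Python, assuming events arrive in timestamp order and the measure is a simple sum; the class name and the epoch-second timestamps are my own placeholders. A real streaming job would lean on the engine's windowing instead of hand-rolling this.

```python
from collections import deque


class HourlyRollingSum:
    """Running sum of values seen within the last `window_seconds`.

    Assumes timestamps are numeric seconds and arrive in non-decreasing order.
    """

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record a new event and drop anything that fell out of the window."""
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)

    def current(self, now):
        """Return the sum of all events newer than `now - window_seconds`."""
        self._evict(now)
        return self.total

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value


agg = HourlyRollingSum()
agg.add(0, 10.0)
agg.add(1800, 5.0)
agg.add(3600, 2.0)           # the event at t=0 is now exactly one hour old and is dropped
print(agg.current(3600))     # sum of the events at t=1800 and t=3600
print(agg.current(5400))     # only the event at t=3600 is still inside the window
```

The value returned by `current` is what the REST endpoint would serve to the client's front end.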

So basically I would only handle the data processing part based on their needs. I wouldn't provide insights or in-depth analytics they didn't ask for, and I wouldn't provide charts or any kind of front end either. I'd just crunch the data for them using either my own hardware or AWS.

So is this too niche or does it sound workable?

  1. 1

    So you're not going in as a consultant and setting up their infrastructure; instead you offer an outsourcing solution that gives you recurring revenue. I'm thinking of the objections you have to get past: data security is probably the first one. Then for Scenario 2, I (the target company) will be looking for a professional-looking SLA, and I'd be asking what happens if your cluster goes down, etc.
    Good luck!

    1. 1

      Data security is indeed an issue. At first I would try to find local clients so I can establish some trust via face-to-face meetings. As for reliability, I would most likely use AWS for mission-critical, non-replayable processing (e.g. live data) and my own cluster for processing historical data, so if there's any hiccup I can just re-run the job.

  2. 1

    It's definitely an idea I've played around with myself as well, and I don't think it's too small of a niche, especially since data engineers are in such high demand. You'll just need to make sure that the stakeholders do a lot of upfront work themselves defining the requirements and whatnot, since you likely won't have domain knowledge for every industry that the clients represent.

    1. 1

      To be honest, there wouldn't be any stakeholders, just me at first, maybe another cofounder at most.

      1. 1

        Apologies, I was referring to stakeholders on the client side.

        1. 1

          Ah my bad. Thanks for clarifying.

  3. 1

    If you just add the visualization layer, you have a consulting firm that could be selling to enterprises (i.e. very large dollars and very slow deals with a high propensity to renew).

    I have seen, up close, 6-figure contracts for situations very similar to Scenario 1.

    1. 1

      In the beginning I wouldn't have the time to take care of the visualization layer on top of the core offering beyond a few simple graphs and pie charts. Over time the service might evolve to offer that too.

    2. 1

      Seriously? That's wild. Spark isn't even that hard to learn.

      1. 1

        Easy to learn, hard to master. :)

  4. 1

    This comment was deleted 3 months ago.

    1. 1

      I agree, I would have to think about how to explain to non-technical people what they will gain from all this.
