I'm planning to start my big data consultancy and trying to come up with an initial product that suits my skills.
I've been working with big data frameworks for a few years, especially Apache Spark (an analytics engine), with which I have a lot of experience.
My idea is a data processing / analytics service for companies that have a lot of data but can't process it themselves. It would cover batch ETL/reporting and live streaming (both can be done with Spark).
Scenario 1: A market research company prints impressions to a log file and accumulates 10TB of new data every day. They want this data analysed and a report of the results delivered daily, but they have neither the hardware nor the know-how. They hire me and give me access to their S3 bucket, so I can create the Spark job and generate the report every day on my own AWS Spark cluster. The company neither knows nor cares how the report is generated.
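For concreteness, scenario 1 boils down to a group-and-count over the log. Here's a minimal plain-Python sketch of that aggregation; the field layout (`timestamp,campaign_id,user_id`) is invented for illustration, and the real job would be the equivalent Spark transformation over the S3 data:

```python
from collections import Counter

def daily_impression_report(log_lines):
    """Count impressions per campaign from raw log lines.

    Toy stand-in for the Spark job: the real thing would run the same
    groupBy/count over 10TB of S3 logs. The CSV layout
    ("timestamp,campaign_id,user_id") is made up for illustration.
    """
    counts = Counter()
    for line in log_lines:
        _, campaign_id, _ = line.strip().split(",")
        counts[campaign_id] += 1
    return dict(counts)

sample = [
    "2024-01-01T00:00:01,camp_a,u1",
    "2024-01-01T00:00:02,camp_b,u2",
    "2024-01-01T00:00:03,camp_a,u3",
]
print(daily_impression_report(sample))  # {'camp_a': 2, 'camp_b': 1}
```

The deliverable would just be this kind of aggregate, rendered as whatever report format the client wants.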
Scenario 2: A company needs a live view of an aggregated measure over the data inserted into a MongoDB database in the last hour. They give me access to the database; I write the streaming job, run it on my standalone Spark cluster, and expose the results via a REST endpoint. Their front-end people can then generate fancy graphs from the aggregated data.
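The core of scenario 2 is a sliding one-hour aggregate. A minimal plain-Python sketch of that measure (the `ts`/`value` schema is invented; the real job would maintain this incrementally with Spark Structured Streaming and serve the result from the REST endpoint):

```python
from datetime import datetime, timedelta

def hourly_average(events, now):
    """Average of `value` over events from the last hour.

    Plain-Python stand-in for the streaming job; the event schema
    ({"ts": datetime, "value": float}) is hypothetical. In production
    this would be a windowed aggregation updated as new documents
    arrive in Mongo.
    """
    cutoff = now - timedelta(hours=1)
    recent = [e["value"] for e in events if e["ts"] >= cutoff]
    return sum(recent) / len(recent) if recent else None

now = datetime(2024, 1, 1, 12, 0)
events = [
    {"ts": datetime(2024, 1, 1, 11, 30), "value": 10.0},
    {"ts": datetime(2024, 1, 1, 11, 45), "value": 20.0},
    {"ts": datetime(2024, 1, 1, 10, 0), "value": 99.0},  # older than an hour, excluded
]
print(hourly_average(events, now))  # 15.0
```

The REST endpoint would simply return the latest value of this aggregate as JSON.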
So basically I would handle only the data processing part, based on their needs. I wouldn't provide insights or in-depth analytics they didn't ask for, nor charts or any kind of front end. I'd just crunch the data for them, using either my own hardware or AWS.
So is this too niche, or does it sound workable?