I was thinking about making a simple review consolidation service like gatherup.com, endorsal.io, or repuso.com, where you provide links to your Yelp page, Facebook Reviews page, Trustpilot page, etc. (up to around 100 sources), and it pulls all your reviews and displays them in a centralized place.
I was trying to determine what the easiest way to do this would be. Since these sites just ask for links to the review pages, rather than having you sign in through an integration of some sort, I assume they're just scraping the reviews from the page?
So my solution would need a cron job of some sort that fires once a day to fetch new reviews. At that time, for each of my customers, for each of their review sources (Google, FB, Yelp...), I send out a web scraper (Puppeteer?) that has to find each review on the page, page through the results if there is more than one page, and save everything back to my database.
This seems doable, but it also seems pretty complex and easy to break: if Trustpilot changes an element on the page it could break my scraper, my scraper could easily get blocked, or a review could easily be missed. Am I thinking about this correctly, or is there a simpler, more obvious route?
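A minimal Python sketch of that daily job, assuming each site exposes some stable per-review ID you can dedupe on and that each source gets its own scraper function. The `PageFetcher` callables are hypothetical stand-ins for whatever actually drives the browser or HTTP requests (Puppeteer or similar), and a plain dict stands in for the database. The job itself could be kicked off by a crontab entry such as `0 3 * * * python sync_reviews.py` (script name hypothetical).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Review:
    source: str      # e.g. "yelp", "trustpilot"
    review_id: str   # site-specific ID scraped from the page (assumed to exist)
    author: str
    rating: int
    text: str


# A fetcher takes (review page URL, page number) and returns the reviews on that
# page plus the next page number, or None when there are no more pages.
# Hypothetical: in production this is where the site-specific scraping happens.
PageFetcher = Callable[[str, int], Tuple[List[Review], Optional[int]]]


def sync_customer(
    sources: Dict[str, str],                # source name -> review page URL
    fetchers: Dict[str, PageFetcher],       # source name -> site-specific scraper
    store: Dict[Tuple[str, str], Review],   # stand-in for the reviews table
    max_pages: int = 50,
) -> int:
    """Walk every page of every source and save reviews not already stored.

    Returns the number of new reviews saved.
    """
    new_count = 0
    for source, url in sources.items():
        page: Optional[int] = 1
        pages_seen = 0
        # max_pages guards against a broken "next page" selector looping forever
        while page is not None and pages_seen < max_pages:
            reviews, page = fetchers[source](url, page)
            pages_seen += 1
            for review in reviews:
                key = (review.source, review.review_id)
                if key not in store:  # dedupe so re-running the daily job is safe
                    store[key] = review
                    new_count += 1
    return new_count
```

Because reviews are keyed on (source, review_id), running the job twice doesn't create duplicates. If a site lists reviews newest first, you could also stop paging as soon as an entire page is already stored, which cuts down on requests.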
All review consolidation products achieve this by scraping. There isn't any better way to do this...
To keep an eye on consistency, you should scrape one particular page every day and compare the result with a predefined, known-good output.
You only need to do this for one review. If the output doesn't match, fire off an email to yourself. This helps you notice layout changes quickly...
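The canary check could be sketched like this, assuming you pin the scraped fields of one known review (author, rating, text) as the expected snapshot. The sender address and SMTP host are placeholders, and the comparison treats a missing field (for example, a selector that now matches nothing) as a mismatch.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict, List


def diff_canary(scraped: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Return the names of fields whose scraped value no longer matches the snapshot."""
    return [field for field, value in expected.items() if scraped.get(field) != value]


def build_alert(site: str, mismatched: List[str], to_addr: str) -> EmailMessage:
    """Compose the 'your scraper probably broke' email."""
    msg = EmailMessage()
    msg["Subject"] = f"[scraper canary] {site} output changed"
    msg["From"] = "alerts@example.com"  # placeholder sender
    msg["To"] = to_addr
    msg.set_content(f"Fields that no longer match on {site}: {', '.join(mismatched)}")
    return msg


def run_canary(site: str, scraped: Dict[str, str], expected: Dict[str, str],
               to_addr: str, smtp_host: str = "localhost") -> List[str]:
    """Compare one known review against its snapshot; email yourself on any mismatch."""
    mismatched = diff_canary(scraped, expected)
    if mismatched:
        # Assumes an SMTP server is reachable at smtp_host (placeholder).
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(build_alert(site, mismatched, to_addr))
    return mismatched
```

Pin a review that is unlikely to be edited or deleted; otherwise the canary will fire for reasons that have nothing to do with the page layout.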
That's the technique for getting the data. However, pay attention to the duplicate content penalty in SEO: resharing unedited content without canonical links can get you in trouble.
It's going to be a risky business, and scaling it is going to be another issue. I don't believe the review sites will do anything about it as long as you don't hurt their business.
As for scraping: first I'd look for an official API, then inspect their private APIs (the internal endpoints their own pages call), and only if those don't work out fall back to a headless browser.
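That order of preference could look roughly like the following: a list of strategies tried from most to least stable, returning the result of the first one that succeeds. The three strategy functions in the example are hypothetical placeholders, not real endpoints.

```python
from typing import Callable, List, Sequence, Tuple

# A strategy takes a review-page URL and returns a list of review dicts,
# raising an exception if it can't (no API, endpoint changed, blocked, ...).
Strategy = Tuple[str, Callable[[str], List[dict]]]


class AllStrategiesFailed(Exception):
    """Raised when every strategy for a source has failed."""


def fetch_reviews(url: str, strategies: Sequence[Strategy]) -> Tuple[str, List[dict]]:
    """Try each strategy in order and return (strategy_name, reviews) from the first success."""
    errors: List[str] = []
    for name, strategy in strategies:
        try:
            return name, strategy(url)
        except Exception as exc:  # fall through to the next, more fragile option
            errors.append(f"{name}: {exc!r}")
    raise AllStrategiesFailed("; ".join(errors))


if __name__ == "__main__":
    # Hypothetical placeholders for the three approaches.
    def official_api(url: str) -> List[dict]:
        raise NotImplementedError("site has no public reviews API")

    def private_api(url: str) -> List[dict]:
        # e.g. the JSON endpoint the site's own front end calls (hypothetical)
        return [{"author": "Ann", "rating": 5}]

    def headless_browser(url: str) -> List[dict]:
        raise RuntimeError("not reached in this example")

    used, reviews = fetch_reviews(
        "https://www.example.com/biz/acme/reviews",
        [
            ("official_api", official_api),
            ("private_api", private_api),
            ("headless", headless_browser),
        ],
    )
    print(used, reviews)  # private_api [{'author': 'Ann', 'rating': 5}]
```

Recording which strategy succeeded also tells you when a source has silently dropped down to the headless fallback, which is usually the first sign that something upstream changed.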
It's doable, since others have already done it. Things will break, and your test cases for each review site will alert you when that happens.
Thanks for the info! Yeah, I figured there might be some ToS issues too, so that's yet another problem. I'll dig into the APIs to see if I can leverage them, but I'm thinking this has turned out to be more trouble than it's worth.
I don't have any accounts on these review sites, so I'm not familiar with their logic, but after a quick look I noticed some of them have "Save" buttons on business profiles, potentially allowing regular users to follow those businesses' feeds/reviews. If that's the case, that could be another way to fetch the data.
I can't say whether it's worth the trouble, so good luck with it.