October 2, 2018

Automated web-scraping service

Hello everyone,

This is my first post. I'm a big fan of the podcast and sometimes go to the London meetup.

For the last year I've been developing an automated web-scraping service. It uses some visual AI techniques to mine meaningful data from webpages whenever there is a list/detail structure (e.g. a list of Amazon product search results).

Given a webpage URL it will return a table with 'columns' from both the list page and the detail pages.
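
To make the shape of that output concrete, here is a minimal Python sketch of the kind of table described above. It is not the service's actual code or API: the function name `build_table`, the field names, and the example.com URLs are all hypothetical, and it assumes the list rows and detail-page fields have already been extracted.

```python
from typing import Dict, List


def build_table(
    list_rows: List[Dict[str, str]],
    detail_by_url: Dict[str, Dict[str, str]],
) -> List[Dict[str, str]]:
    """Join each list-page row with the fields from its detail page."""
    table = []
    for row in list_rows:
        # A row whose detail page was not scraped keeps only its list columns.
        detail = detail_by_url.get(row["detail_url"], {})
        merged = dict(row)
        for key, value in detail.items():
            # Prefix detail columns so they cannot clash with list columns.
            merged[f"detail_{key}"] = value
        table.append(merged)
    return table


# Hypothetical, hand-written data standing in for extracted content.
list_rows = [
    {"title": "Example product A", "price": "9.99", "detail_url": "https://example.com/a"},
    {"title": "Example product B", "price": "19.99", "detail_url": "https://example.com/b"},
]
detail_by_url = {
    "https://example.com/a": {"description": "Placeholder description A"},
}

table = build_table(list_rows, detail_by_url)
for record in table:
    print(record)
```

Each record in `table` is one row, with columns drawn from the list page plus any `detail_`-prefixed columns found on the matching detail page.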

I built this for a particular service and now I'm thinking of spinning it out into an API product. It does make the task of building content extraction workflows much easier. It's very similar to something like import.io, although I often find their automated mode inadequate.

I'm trying to figure out whether it's a good enough idea to spend an extra month on. I have the difficult parts of the implementation done so it seems a shame not to make it available for other uses somehow.

It would definitely need to remain a side project, however; I'm not able to put much sales oomph behind it.


  1. 1

    Hey @RossR

    I used to use a service back in 2015 which was simply awesome, but it was acquired by another company and I believe they killed it. Unfortunately I can't remember the name of the tool.

    How does your product differ from the other web scrapers available in the market?

    Might be worth talking to and selling to web scrapers who offer their services on websites like Upwork and Freelancer.

    1. 1

      Thank you, speaking to people on Upwork is a great idea.

      Yeah there are quite a few similar services out there. I quite like ParseHub.

      What I haven't found is a framework or API that helps when you're scraping similar data from lots of different sources (e.g. articles, product listings).

      Diffbot is quite good if you're scraping a domain that they have trained on.

      1. 1

        It's been a long time since I've had to use scraping tools. I just remembered the tool I used to use three years back http://www.kimonolabs.com/ . Sadly they got acquired :/