Ideas and Validation February 18, 2020

Crawling a website or using an API?


Hi all,

I'm at the stage where I'm deciding between a crawling approach and an API approach to collect current sales from online stores and showcase them to shoppers.

Do you think I should use an API (because it may be legally sensible)?

My goal is a search engine that showcases the latest sales based on your country.

What're your thoughts?

Thank you.

  1. 2

    What are the drawbacks of the API? Is it paid? Does it allow only a very limited number of calls?

    For scraping: is the information you want public (i.e. everyone can see it), or is it behind a login? If it is behind a login and you scrape it to provide to your users, that may be illegal. Besides eBay, I can't recall any site that shows all of its sales data in a public view.

  2. 2

    Scraping can be a pain: it is expensive (you often need proxies) and far less stable than an API.
    If they have an API, just use that. If you really need to scrape, I suggest taking a look at Apify.

  3. 1

    API for sure. Avoid scraping (IP blocking, bot detection, etc.).

  4. 1

    You should definitely use the API first. You won't have to spend time scraping and cleaning data; everything will be provided in a nice, structured format. Take a look at the website's terms of service to see how you are allowed to use the data they provide.

    If the data provided by the API is not enough, you could start scraping, but scraping is quite a grey area. Try to be a good website citizen: respect robots.txt, don't spam the site with requests, etc.
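
    To make the "good citizen" advice concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not a definitive crawler: the bot name `sales-finder-bot`, the `shop.example` URLs, the helper names, and the 1-second fallback delay are all assumptions made up for the example. It skips any URL that robots.txt disallows and pauses between requests, using the site's Crawl-delay when one is given.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "sales-finder-bot"  # hypothetical bot name, for illustration only


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser: RobotFileParser, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it sets one, else an assumed default."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


def crawl(parser, urls, fetch, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing after each request."""
    delay = polite_delay(parser)
    results = {}
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt, so leave it alone
        results[url] = fetch(url)
        sleep(delay)  # throttle so the site is not flooded with requests
    return results


if __name__ == "__main__":
    rules = load_rules("User-agent: *\nDisallow: /account/\nCrawl-delay: 2\n")
    pages = crawl(
        rules,
        ["https://shop.example/sales", "https://shop.example/account/orders"],
        fetch=lambda url: "<html>placeholder</html>",  # stand-in for a real HTTP call
        sleep=lambda seconds: None,  # skip the real pause in this demo
    )
    print(list(pages))
```

    In real use, `fetch` would be an actual HTTP request and `sleep` would be left at its default. Passing them in as parameters just keeps the sketch runnable without touching anyone's site. Note that robots.txt is separate from the terms of service, so both still need checking.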