Legalities of web scraping

Hello!

I'm a developer based in the UK and am considering a project that would include tools for scraping data from other websites. Some of this data may be available in a structured form (e.g. Schema.org Microdata / JSON-LD) - other times it will mean parsing HTML, or possibly even OCR. It would always be user initiated - e.g. a user logs in, enters a URL and the scraping process runs.
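
To make the structured-data case concrete, here is a minimal sketch of pulling JSON-LD out of a page that has already been fetched, using only the Python standard library. The names `JsonLdExtractor`, `extract_jsonld` and `SAMPLE_HTML` are made up for illustration, and fetching the URL, Microdata, general HTML parsing and OCR are all left out.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The type attribute may be missing or valueless, hence the "or".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents can arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip blocks that are not valid JSON rather than failing.
                pass


def extract_jsonld(html):
    """Return a list of decoded JSON-LD blocks found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page, used only to show the call.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script>
</head><body><p>Hello</p></body></html>
"""

items = extract_jsonld(SAMPLE_HTML)
```

With the sample above, `items` holds a single dictionary describing the product. Pages without JSON-LD simply give an empty list, which is where the HTML-parsing or OCR fallback would come in.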

The pages being scraped would all be public - nothing behind a paywall / nothing that requires a login to access. The sites being scraped could be owned by companies based in, or hosted in, countries around the world. I certainly won't be targeting the scraping of any personal information - the data that I'm intending to scrape and store would be more along the lines of details of products / services / sets of instructions - but of course, as is the nature of scraping, there is a small chance I'd accidentally pick up personal details (e.g. somebody's name). The data being scraped would only be shown to the person that requested the scrape.

Does anyone have any experience with this - and where I'd stand from a legal point of view?

I know some websites have some sort of "don't scrape our content" clause in their Terms. Whilst initially I don't have any plans for monetisation of my service, I'd expect to introduce some sort of revenue stream in the future - possibly making the scraping service paid-for.

I don't want to pursue the idea too hard if it's going to end up in legal issues.

Thanks.

  1. 2

    A draft of the copyright reform the European Union approved last year effectively made scraping illegal, or something very close. You may want to have a look at the final version of the approved reform to check whether such limitations on scraping are still there.

  2. 1

    Hey @gbuckingham89, did you find a final solution for your idea? I have a similar idea for an Ireland-based project.

  3. 1

    Hmm, bummer. In the US I think web scraping is still legal, afaik. Whole companies have been acquired on the back of scraping; see LinkedIn's acquisition of Connectifier.

  4. 1

    This comment was deleted 3 years ago.

    1. 1

      Hey @anilkilic - thanks for taking the time to reply!

      Sorry, I probably should have been clearer in my post (updated now) - I wouldn't be purposely scraping any personal information, so don't think GDPR itself would be too much of an issue.

      The data I'm looking to scrape and store could be described as details of products / services / sets of instructions etc.
