Legal, Tax, and Accounting August 20, 2020

Legalities of web scraping

gbuckingham89

Hello!

I'm a developer based in the UK and am considering a project that would include tools for scraping data from other websites. Some of this data may be available in a structured form (e.g. Schema.org Microdata / JSON-LD) - other times it will mean parsing HTML, or possibly even OCR. It would always be user initiated - e.g. a user logs in, enters a URL and the scraping process runs.
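For the structured case, here is a minimal sketch of what pulling JSON-LD out of a page could look like, using only Python's standard library. The names `JsonLdExtractor`, `extract_jsonld` and `SAMPLE_HTML` are hypothetical, and the sample markup is made up for illustration; a real tool would also need to fetch the page and cope with Microdata and malformed markup.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The type attribute may be missing or valueless, hence the "or ''".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip blocks that are not valid JSON rather than failing the scrape.
                pass


def extract_jsonld(html):
    """Return a list of parsed JSON-LD objects found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Made-up sample page for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script>
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_HTML):
        print(item.get("@type"), item.get("name"))
```

Pages without a JSON-LD block simply yield an empty list, which is where the HTML parsing fallback would come in.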

The pages being scraped would all be public - nothing behind a paywall / nothing that requires a login to access. The sites being scraped could be owned by companies, and hosted in countries, around the world. I certainly won't be targeting the scraping of any personal information - the data that I'm intending to scrape and store would be more along the lines of details of products / services / sets of instructions - but of course, as is the nature of scraping, there is a small chance I'd accidentally pick up personal details (e.g. somebody's name). The data being scraped would only be shown to the person that requested the scrape.

Does anyone have any experience with this - and where I'd stand from a legal point of view?

I know some websites have some sort of "don't scrape our content" clause in their Terms. Whilst initially I don't have any plans for monetisation of my service, I'd expect to introduce some sort of revenue stream in the future - possibly making the scraping service paid-for.

I don't want to pursue the idea too hard if it's going to end up in legal issues.

Thanks.

  1. 2

    A draft of the copyright reform the European Union approved last year effectively made scraping illegal, or something very close. You may want to have a look at the final version of the approved reform to check whether such limitations on scraping are still there.

  2. 1

    Hey @gbuckingham89, did you find a final solution for your idea? I have a similar idea for an Ireland-based project.

  3. 1

    mmm bummer, in the US I think web scraping is still legal afaik. Whole companies have been acquired based on scraping, see LinkedIn's acquisition of Connectifier.

  4. 1

    Hey, I'm just gonna ping @carlosw. He's also based in the UK and has a company called Treasure Cloud storage with a privacy-first approach. He seems to dig into these privacy concerns.

    It may seem irrelevant, but we had a conversation about scraping e-mails, and he told me that even sending an e-mail to a UK citizen without their consent could put me in some legal trouble. Storing their data without their consent seems to be in the same basket. He pointed me to the ICO and GDPR. Let's hope he'll catch up. But I can tell he'll advise you not to do it. ;)

    1. 1

      Hey @anilkilic - thanks for taking time to reply!

      Sorry, I probably should have been clearer in my post (updated now) - I wouldn't be purposely scraping any personal information, so I don't think GDPR itself would be too much of an issue.

      The data I'm looking to scrape and store could be described as details of products / services / sets of instructions etc.
