I'm a developer based in the UK and am considering a project that would include tools for scraping data from other websites. Some of this data may be available in a structured form (e.g. Schema.org Microdata / JSON-LD) - other times it will mean parsing HTML or possibly even OCR. It would always be user-initiated - e.g. a user logs in, enters a URL and the scraping process runs.
The pages being scraped would all be public - nothing behind a paywall / nothing that requires a login to access. The sites being scraped could be owned by companies, or hosted, in countries around the world. I certainly won't be targeting personal information - the data that I'm intending to scrape and store would be more along the lines of details of products / services / sets of instructions - but of course, as is the nature of scraping, there is a small chance I'd accidentally pick up personal details (e.g. somebody's name). The scraped data would only be shown to the person that requested the scrape.
Does anyone have any experience with this - and know where I'd stand from a legal point of view?
I know some websites have some sort of "don't scrape our content" clause in their Terms. Whilst I don't initially have any plans to monetise my service, I'd expect to introduce some sort of revenue stream in the future - possibly making the scraping service paid-for.
I don't want to pursue the idea too hard if it's going to end up in legal issues.