Hi Guys,
I want to build an application that collects the weekly special offers from supermarket websites and suggests to users a list of stores where they can shop at cheaper prices.
Unfortunately, most supermarkets have no APIs. Can I scrape their content, or manually copy it from their sites, and use it in my application? Is that legal?
AFAIK it is considered legal as long as you are scanning publicly available data: https://towardsdatascience.com/web-scraping-is-now-legal-6bf0e5730a78
Hey @anton_ogarkov, thanks for the link.
If web scraping were illegal, Google would not exist.
TIL that Google uses web scraping to get its data.
If I add my site to Google, does that mean I opt in to being scraped by Google?
Google offers an opt-out.
The opt-out is pretty useless against scraping in general, right? Should you go through every existing web scraping service and opt out? That's an impossible task.
It's the opposite. All scraping systems should offer an opt-out, just like Google and all the major scrapers do. That's the point.
What do you mean it's the opposite? My point is that it should be opt-in, not opt-out. Because opt-out assumes you already know about the service. You can't find all scraping services in the world and go to each of them and ask for an opt-out.
Opt-in assumes you already know the service as well. So it's pretty simple: opt out through robots.txt or another similarly simple mechanism. We don't need to reinvent the wheel; we just need to use what is already available. The issue is that most small companies don't offer an opt-out and won't read robots.txt either. From my point of view that is not illegal for public content, it's just "not OK", so my suggestion is that small companies start respecting robots.txt.
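For reference, honoring robots.txt from a Python scraper takes only a few lines with the standard library's `urllib.robotparser`. The sketch below parses a made-up robots.txt from a string so it runs offline; the rules, the `shop.example` URLs and the `OfferBot` user agent are all hypothetical. Against a real site you would call `set_url(...)` and `read()` on the parser instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary supermarket site (shop.example).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    agent = "OfferBot"  # hypothetical user agent for the offers scraper
    for url in ("https://shop.example/offers/weekly",
                "https://shop.example/account/orders"):
        # can_fetch() applies the group matching this user agent, else the "*" group.
        print(url, "allowed" if rules.can_fetch(agent, url) else "disallowed")
    # Crawl-delay from the matching group, or None if the site doesn't set one.
    print("crawl delay:", rules.crawl_delay(agent))
```

With these example rules the weekly offers page is allowed, the account page is disallowed, and the crawl delay is 5. Nothing forces a scraper to call `can_fetch()`, though: robots.txt only works if the scraper chooses to respect it.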
I think you are missing the point of privacy in general. All consent should be explicit, not implicit.
Yes, I do want to know the services I opt in to. Would you like your photo to be used on dodgy websites or for commercial purposes just because you had never heard of them and never opted out? Or would you prefer that it never be used unless you specifically give your consent?
Robots.txt is a good start for limiting scraping of your content, but it's not legally binding in any way, and crawlers/scrapers can still do whatever they want with the data, regardless of what robots.txt says.
Now you understand my point :) Since there is no legal binding, and since someone who wants to use your pics on dodgy websites doesn't even need a scraper, we get to the main subject: there is no privacy for public content, and there is no law against scrapers either. With that in mind, something global and easy to set up, such as robots.txt, would be a start. The main issue is who will police the scrapers that don't respect the rules, whether it's robots.txt or anything else. I honestly don't believe small scraping companies would follow any rule, so we come back to what I said: although it's legal, it's still "not OK", and we will need to live with that.
Even where scraping is legal, it’s a tough thing to build a business model on. At any point if the source decides to change whether and how they display their data, you may have to scramble to catch up. When they decide to block you, you will have to implement complex workarounds. And if their lawyers decide they want to get some billable hours in, you could spend a lot of money defending how legal it really is. Be careful going this route.
I co-founded a startup based on scraping short-term housing rental platforms (Airbnb, Homeaway, etc). We were acquired by another company that did the same thing. It's legal as long as the sites are public (they don't require a user account to log in), you are not hitting the site so often that you adversely affect their servers, the site doesn't use a "clickwrap" agreement (where it forces you to agree to its terms of service before you can see or do anything on the site), and you are not collecting and storing personally identifiable information (name, email address, SSN, etc).
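One way to stay on the "not hitting the site too often" side is to enforce a minimum gap between requests to the same host. Below is a minimal sketch; `PoliteThrottle` and its 5-second default are hypothetical choices, not a legal threshold, and in practice you might derive the interval from the site's `Crawl-delay` if it publishes one.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforce a minimum interval between requests to the same host (hypothetical helper)."""

    def __init__(self, min_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval   # assumed default, not a legal threshold
        self._clock = clock                # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of the previous request

    def wait(self, url: str) -> float:
        """Sleep until this URL's host may be requested again; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        slept = 0.0
        previous = self._last.get(host)
        if previous is not None:
            remaining = self.min_interval - (now - previous)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last[host] = now
        return slept


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval=1.0)
    for url in ("https://shop-a.example/offers",
                "https://shop-a.example/offers?page=2",
                "https://shop-b.example/offers"):
        print(url, f"waited {throttle.wait(url):.2f}s")
        # ...fetch the page here...
```

The delay is tracked per host, so scraping several supermarkets in one run doesn't slow down unnecessarily. `time.monotonic` is used instead of `time.time` so the spacing stays correct even if the system clock is adjusted.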
One caveat, though: constantly checking your scrapers to see whether they're broken because of changes to the websites will be a cost of doing business that you need to account for. It will NOT be a "set and forget" business.
Can't emphasize this caveat enough. One site change and all your data can go out the window, and you won't know until you start getting complaints. If I were you, I'd set up tests on a server that run throughout the day.
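A cheap form of such a test is a sanity check that runs after every scrape and flags data that suddenly looks wrong, since a redesigned page usually shows up as zero results, missing fields, or prices that no longer parse. The sketch assumes each scraped offer is a dict with hypothetical `store`, `product` and `price` keys; the thresholds are arbitrary.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("store", "product", "price")  # hypothetical schema for one scraped offer


def _is_positive_number(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0


def check_scrape(offers: List[Dict[str, object]],
                 min_offers: int = 1,
                 max_missing_ratio: float = 0.1) -> List[str]:
    """Return human-readable problems; an empty list means the scrape looks healthy."""
    if len(offers) < min_offers:
        # Often the first symptom of a changed page layout: selectors match nothing.
        return [f"expected at least {min_offers} offer(s), got {len(offers)}"]
    if not offers:
        return []

    problems: List[str] = []
    for field in REQUIRED_FIELDS:
        missing = sum(1 for offer in offers if offer.get(field) in (None, ""))
        if missing / len(offers) > max_missing_ratio:
            problems.append(f"field '{field}' missing in {missing}/{len(offers)} offers")

    # Prices that are present but not positive numbers (e.g. text like "1,20 €" left unparsed).
    bad_prices = sum(1 for offer in offers
                     if offer.get("price") not in (None, "")
                     and not _is_positive_number(offer.get("price")))
    if bad_prices:
        problems.append(f"{bad_prices} offer(s) have a non-numeric or non-positive price")
    return problems


if __name__ == "__main__":
    todays_offers = [
        {"store": "Store A", "product": "milk 1L", "price": 0.99},
        {"store": "Store B", "product": "bread", "price": None},  # price selector broke?
    ]
    for problem in check_scrape(todays_offers):
        print("ALERT:", problem)  # in practice: send an email or chat notification
```

Running a check like this on a schedule (cron, a CI job, or whatever you already use) and alerting on any non-empty result means you hear about a broken scraper before your users do.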
@ThinkDigital Yes. Regular testing is critical.
Hey!
I found this post in the Legal, Taxes & Accounting group regarding exactly the same topic.
Have a look: https://www.indiehackers.com/post/legalities-of-web-scraping-616a12ac77
Thanks @meellbn, it seems to be the same idea.
Maybe don't call yourself a scraper, just an indexer like Google.
There are plenty of things built like that, but legality changes with locality and it's all a big spider's nest, so you should seek professional answers. There is also the question of whether the sites would mind and would want you not to do that.
Thanks for the advice @hatkyinc. I am in Ireland and my business idea is limited to the country; I'm not sure whom I can check with.
I think I'll have to look for a business advisor.
I notice you said you’re based in Ireland - in that case it might not be as simple as others have said.
I asked a similar question a few days ago and was pointed towards the EU Copyright Reform: https://www.indiehackers.com/post/legalities-of-web-scraping-616a12ac77
I haven’t had time to fully look into it yet myself - but I think it looks like a no-go.
I meant a lawyer who deals with web tech businesses, I guess.
It would be better to protect yourself with a limited liability company if you are doing anything that is even slightly grey area, i.e. anything that a court might accept for proceedings.
This is probably the best article I read on the topic https://blog.apify.com/is-web-scraping-legal/
The legality and ethics of web scraping depend on how you intend to use the data you scrape. Doing your research and asking for permission before publishing any data is one way to avoid breaching copyright rules. Though it may appear to be a simple task, there are several essential factors to consider if you want to stay within the law.
For more information about web scraping and its legality, visit proxycrawl.com/blog/web-scraping-the-comprehensive-guide-for-2021/
Just created an infographic related to the advantages and disadvantages of web scraping.
Hope it can be helpful:
https://miro.medium.com/max/1400/1*Az2Hmk5-fFhXXfQvRIL7Kw.jpeg
Yes, it's legal. Look into LinkedIn vs hiQ:
https://www.eff.org/cases/hiq-v-linkedin