
Congrats! Web scraping is legal! (US precedent)

Disputes about whether web scraping is legal have been going on for a long time. Now, a couple of months ago, the high-profile web scraping case hiQ v. LinkedIn was decided.

You can read about the progress of the case here: https://parsers.me/web-scraping-or-parsing-internet-resources-what-is-it-and-is-it-legal/

Finally, the court concludes: "Giving companies like LinkedIn the freedom to decide who can collect and use data – data that companies do not own, that is publicly available to everyone, and that these companies themselves collect and use – creates a risk of information monopolies that will violate the public interest."

  1.

    Great....🔥🔥🔥

  2.

    Counterpoint: hiQ is scraping LinkedIn's public data so they can determine whether someone appears to be job hunting and snitch to their employer. That is creepy and invasive, and if I found out someone were doing that on my website I'd want to stop them too.

    1.

      But, it's PUBLIC data!

      If you hang your laundry outside (dirty or otherwise) for ALL of your neighbors to see, then it's absolutely ridiculous to tell them that they can't look at it (or take pictures of it, or whatever).

      In this case, if you don't want anyone to know you're job hunting, then don't submit any info to any websites or services that are going to broadcast it publicly, right?

      1.

        So? If I hang my laundry outside, I actually don't think it's unreasonable for me to ask neighbors to not take pictures of it. It's certainly not unreasonable to tell someone they can't go around taking pictures of everyone's laundry and then selling them to their bosses.

        What hiQ is doing here is slimy. So why is the response that if we don't like it we should retreat from public life, rather than figuring out how to prevent hiQ from doing the slimy thing?

        1.

          What hiQ is doing with the data is slimy, for sure. We can work on getting them to stop that (there are any number of disincentives, from regulation to public shaming). But what they're doing with the data is a different thing from how they're gaining access to it.

          As another example, let's take your gf/bf/significant other sunbathing on a public beach. Nothing is stopping some creep from taking pictures, other than the fear of looking like a creep and/or getting beat up. It's not illegal. And it's not unreasonable at all to hope no one does it and even ask people not to. But, if you really want to be protected, then you have to go to a private beach. Otherwise you are running the risk, whether you like it or not and however small, that someone will take those pics and may even sell them to others (like the paparazzi!).

          1.

            Sure, in real life if someone is determined to take creepy pictures of you in public it's hard to prevent them (although stalking is illegal). In this case, though, there was something to stop hiQ from scraping LinkedIn's data: LinkedIn took measures to prevent them from doing it. The whole reason for this lawsuit is that hiQ wants it to be illegal for LinkedIn to prevent them from scraping.

            I'd also like to point out that in the physical world, people realize that recording someone's public life is creepy and they mostly don't do it. So yes, technically it can happen, but in practice it's a non-issue. But for some reason people in tech think the same sort of surveillance is fine if it involves technology. I can go to a public beach and be pretty confident that no one is secretly taking pictures of me, but if I go on the Internet I have to worry about people tracking me at every step. We should be trying to prevent that, but instead we're pushing to make it easier?

            1.

              Online surveillance sucks, too. But it's a slightly different issue from supplying a third-party app or website with data that you know is going to be publicly available, because most people aren't aware of the scope of data sharing, even if they signed up for it by agreeing to the terms. And sometimes the sharing even goes beyond those terms.

              I guess the beach analogy is incorrect insofar as most public beaches are publicly owned and free to access. Whereas an online service that is free is almost always privately owned and often the "cost" is your data (more people understand this now).

              I agree that we should be trying to prevent tracking, especially unauthorized tracking. GDPR is a great step forward.

              I disagree that we should have an expectation of data privacy when we willfully provide that data to an open public space.

              Edit: would you be surprised and/or upset if another service took your public comments here and published them somewhere else?

          2.

            uhm, in some parts of the free world, exactly this is not legal. you absolutely have to ask for permission first, before taking pictures of said person. every. single. time. no. exceptions.

            1.

              you have to ask everyone in public before you take a pic? only if you want to sell the pics, or no matter what? i'd be interested to know where that is, as well. can't be too many tourists visiting there, lol!

              1.

                it's a little trickier than that. you can absolutely take pictures of whatever you want, but you DO have to ask permission if you want to show your pictures of people who can be recognized (this is where it gets tricky, since you can't guarantee that you won't show them to anybody; and no, auto-uploading snapped pictures to dropbox, google drive, whatever still counts towards that; and no, it's not only faces that count). I am not a lawyer, so I presume there are still many ifs and buts, but if you stand on the street snapping a pic of your girlfriend in front of the Oktoberfest, and some random stranger tells you to delete the photo you just took, you basically have to make sure that this person is not in the pic, and if he is, you have to delete it.

                If you haven't figured out the country yet, it's Germany. We Germans actually do care very much about our privacy and we'd like to keep it that way too.

                Although the chances of someone demanding that you delete something in 2020, when most people are smombies with their personal follow-drones hovering in the air, are marginal. At best. Walk 200 meters to the Isar river, where people are sunbathing naked, and people will not ask you to delete it; they will enforce it.

          3.

            IANAL, but I'm pretty sure taking the photos isn't necessarily illegal. Selling them without the subject's consent is. I'm pretty sure paparazzi have an exception due to the "noteworthy"-ness of the subject (similar to the claim for why Twitter allows content that violates its policy).

            1.

              "rules for thee... not for me", lol.

              i can see how selling might be illegal, but that opens up some interesting questions - like if i visit a foreign city and take a photo of some iconic building, do i have to ask everyone in the frame for permission before i can print it as a postcard and sell it to others? this is a rhetorical question, as we could be here all day...

      2.

        I think the more comparable example is if your neighbour looked at your clothes on the line and then told another neighbour something about you based on what they perceive about it.

        While not illegal or immoral, that person is giving their subjective opinion based on their opaque world-view about something public that was never intended for that purpose... and that may have consequences for the person who just wanted to dry their clothes.

        1.

          interpretation of the data is a different argument from how it was procured, and it's a whole other Pandora's box. illegal, immoral, or not, it can definitely create problems (or GREAT comedy - see Curb Your Enthusiasm :) ).

          a lot of data interpretation/interpolation/extrapolation should be treated just like subjective opinions: taken with a HUGE grain of salt. that, plus context and nuance, should be kept in mind when making any value judgements.

          1.

            I'm really in @jakelazaroff 's camp here. Your notion that people shouldn't use LinkedIn if they're not job seeking is flawed as LI is a "networking" site, and job seeking is one of the options available to you.

            As well, in the privacy settings of LI a registered user can hide almost all of their profile from public view to unregistered users. Therefore, to access this information one has to register and log in, AND THEN they are bound by LI's User Agreement, Privacy Policy, and Professional Community Policies, all of which of course prohibit the slimy thing that hiQ is doing.
            https://www.linkedin.com/legal/user-agreement
            https://www.linkedin.com/legal/privacy-policy
            https://www.linkedin.com/help/linkedin/answer/34593?trk=microsites-frontend_legal_privacy-policy&lang=en

            I completely understand that people CAN access my information, but to do it they're also entering a contract and are bound by its terms.

            (I'm not against web scraping, but in this particular instance I just can't agree with your position.)

            1.
              1. I never advocated that "people shouldn't use LinkedIn if they're not job seeking". I use it for many things other than job seeking myself!
              2. I also never advocated for the sharing of data that's only available to registered users or any other data behind authentication.

              Did you reply to the correct comment?

              Edit: also, those agreements are only binding on those who have registered an account. if a web page is public (as opposed to private/behind a login), anyone can read it and they're not bound by those conditions.

              1.
                1. "In this case, if you don't want anyone to know you're job hunting, then don't submit any info to any websites or services that are going to broadcast it publicly, right?"

                Um, that is kind-of what you said...

                2. You got me here, as you were very explicit about the data being public, and I read into it that you meant anything that is posted on LinkedIn, regardless of user status and privacy settings.

                LinkedIn could end this whole debacle by flipping to privacy first and making public profiles optional in the privacy settings - notifying the users of the change. Maybe this would hurt their network effect, but it would also hurt potential competitors or leeches. (I'm not even a big fan of LI, but I'm less a fan of what hiQ is doing.) hiQ's argument that their business model depends on this data is pathetic.

                1.

                  You can still use LinkedIn without publicly submitting data that would signal to others that you're job hunting. I do this.

                  I'm not a fan of a lot of the stuff that's going on around data privacy either. I wish it was more locked down. My stance, though, is that if it's public, we can't expect it to be limited to just that site. As you said, LI could choose to lock it down. I'm not familiar at all with hiQ, but it sounds stupid if they based their entire business model on LI deciding not to do that, lol.

                  1.

                    "HiQ pointed out that its business model is based on access to publicly available data of people who have chosen to share this information on LinkedIn, and if it is deprived of this data source, HiQ will not be able to fulfill its contractual obligations, including contracts with large clients, and its business will be irreparably damaged."

                    This is from the Parsers article in the link from the original post, presumably paraphrased from the court ruling, but I haven't seen it directly. The audacity! LOL

    2.

      But now California law says that you can't! I'm all for scraping, but this enforcement action is crazy government interference.

    3.

      I have really bad news for you...

  3.

    That's amazing news. It opens so many doors and opportunities!

  4.

    I'm really curious about this! Hopefully someone has more info and knowledge than I have on this topic:

    Is it legal to scrape house listing websites? Every such website states that it's not allowed to collect data from their sites "automatically", but what can happen if you decide to do it anyway? Does this LinkedIn case change anything for other industries?

    Like, the data is public and free for anyone to access, so according to the linked article it should be OK. But almost all websites say in their terms of use that it's not allowed, which is understandable, as much of their value lies in the data they collect, which they want to protect.

  5.

    Hey, so this is a summary of the 9th Circuit Court ruling from September, right? My most recent understanding is that LinkedIn is considering taking this to the Supreme Court, which I could easily see agreeing to hear it and making a decision.

  6.

    Wow. Just wow. It's gold rush time. Again.

  7.

    I didn't have time to read through all of this but it seems like it's only applicable for public-facing data?

    1.

      Haha, how would you be able to access private/secure data without explicit permission mate?

      1.

        I'm thinking in terms of being a member of the website. Like, there are certain things you may have access to after signing in vs. being a lurker.

        Unless I'm misinterpreting the concept of "public data", which may very well be the case here.

        1.

          After signing in, you've implicitly (or, in some cases, explicitly) agreed to their terms of service, which I'm sure prohibit scraping.

          1.

            That's what I figured as well.

        2.

          Ooh, makes sense. I hadn't counted that as "private/secure" data, since you can make your scraper log in for you too.

  8.

    I use this amazing web scraping tool that helps me schedule 10+ meetings per week via LinkedIn. It's called LinkedHelper. Check out my post about it...

    https://www.indiehackers.com/post/scheduled-36-meetings-in-14-days-using-linkedhelper-7748b6fad4

    Co-founder @ scribbl.co

  9.

    Does this now also apply to scraping from Instagram?

  10.

    Great outcome, we live in free markets, if there's demand, there'll be supply!

  11.

    Ben Thompson recently wrote a pretty good piece on this and referenced the Hi-Q case as well -- worth a read if you're a member. If you're not, worth signing up because his writing is top-notch.

    He focused as much on Clearview AI as Hi-Q.

    Clearview claims that it uses “a facial recognition algorithm that was derived from academic papers” that is “a ‘state-of-the-art’ neural net”.

    The interesting point here is that (as this thread indicates), there are relatively few objections to scraping when used on sites like LinkedIn... but when that web scraping then feeds into algorithms that power facial recognition used in security systems installed in Xinjiang, that tends to arouse a different response.

    https://stratechery.com/2020/clearview-ai-the-problem-with-scraping-tradeoffs/

  12.

    Hooray! My site scrapes data from quite a number of sources...

  13.

    In the past the CFAA has been wielded haphazardly and even maliciously.¹ These cases are not over but hopefully we’re shifting towards a more sensible approach that protects the open web and those who use it.

  14.

    This is fantastic news for indie hackers 🙌🤗

  15.

    Awesome! Especially for lead generation. I think it's HOW you use that information that needs to be monitored closely.

    What tools are you guys using for scraping???

    1.

      We have created a tool at www.leadsfury.com to generate business leads from Instagram, from followers of profiles to users who post in hashtags and more.


  16.

    This is awesome! I just web scraped last weekend.

  17.

    Does this mean their API is fully opened up to pull company and user profile information?

    1.

      If LinkedIn opened their API to the public, that would probably be a great day. I am using services like this one https://proxycrawl.com/scraping-api-avoid-captchas-blocks to get access to public profiles and companies. It costs a fair amount, but I found no other way around the nightmare of LinkedIn's join walls, and I don't even know how services like proxycrawl manage it.

      1.

        Well, this would also lead to the end of Linkedin, I believe. Especially for the B2B nature of it. Linkedin is already transitioning from a business network to a job/job hunter network, and something like an unregulated public facing API would be the final nail in their coffin.

        Maybe it's just me, but I can think of a gazillion ways to monetize a public LinkedIn API, and most of them not to the benefit of true LinkedIn users... How would you like your LinkedIn feed to look like your Facebook feed does today: most probably 99% ads and irrelevant, retargeted shit...

      2.

        Is it only possible with the Professional package? It's pretty expensive for small projects and IHers


    2.

      Probably not. I think it just means that if you can get to a web page, you're allowed to scrape it. They aren't allowed to say that you can't scrape their public data.

      1.

        They will probably still make it difficult for you to scrape too. It just means they can't sue you, which is nice to know!

  18.

    Great news, thank you for the update!

  19.

    Thanks for posting! I'm outside the U.S. and mightn't have seen this.


  22.

    This comment was deleted 2 years ago.

    1.

      Ha! Love your user name. You'll never have to explain what you do! Also, I checked out your site. Looks solid. If I understand correctly it only works on Chrome desktop?
