
Is it legal to scrape emails from websites or social media using software or a tool?

I'm asking about the rules in your country (e.g. the US, UK or Canada) or in Europe. Please share some advice.

Do you know of any references to the relevant policies?

If it is illegal, to what extent? Who is the offender? The owner of the tool OR the user of the tool? If not, please let me know your thoughts and reasons.

Thanks a lot everyone!

posted to Ideas and Validation on September 11, 2021

    Hey there!

    I'm a lawyer who works with startups. This will be a lengthy response, because you asked for references to policies and whatnot.

    However, I will put a nasty disclaimer at the end, just like any self-respecting lawyer would, haha ;)

    tl;dr (GDPR-related):
    It is not allowed unless you have notified the people concerned of such processing and collected their explicit consent to it. The person responsible for processing personal data is the person who (i) determined the purpose of the processing ("I [do something] with this data because I want to...") and (ii) determined the means of the processing ("I chose [this tool] to process data").

    Now, regarding your question:

    I see you mentioned Europe as one of your potential areas of interest. I'm pretty sure you've heard about the GDPR.

    Now, it is crucial to understand who the "data controller" is (in your case, the potential offender) and what "data processing" really means.

    "Data controller" means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law. (Art. 4(7) GDPR)

    "Processing" means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction. (Art. 4(2) GDPR)

    As you can see, processing covers many different activities performed on personal data. If you collect publicly available data, store it in a database, organise it in a structured form and then make it available to other recipients (public or private), the processing will include at least the following: collection, organisation, structuring, storage, and disclosure by transmission, dissemination or otherwise making available.

    Now, who's the data controller? Whether it's a natural person (an individual) or a legal entity (LLC, etc.), their status under the GDPR is determined as follows:

    If you determine the purpose and the means of personal data processing, then you're the controller.

    So, you are the data controller, if you:

    (i) determine the purpose of data processing --> "Why do I scrape contact details? Because I want to do something with them later"; and
    (ii) determine the means of such processing --> "I collect data via this tool that I chose".

    It becomes more complicated if processing is performed by an employee on behalf of their company, and such processing is part of their job. In this case the company is the controller, because the company decided on (i) the purpose of the processing and (ii) the means with which the processing should be performed.

    However, if an employee processes data on their own behalf (not because the company asked them to, and not as part of their job duties), then that employee will be deemed the data controller, as they decided to do so in their own interest. I'm pretty sure you've got this part of the problem.

    Now, what about the tool? Is the company or individual behind the tool responsible for processing? In short, yes. But they are not deemed data controllers if they didn't determine the two key points mentioned above.

    A company or individual that developed the scraping tool will be deemed a data processor.

    "Processor" means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller. (Art. 4(8) GDPR)

    This means that tools such as Octoparse, for example, are deemed data processors, because they execute the instructions you give them.

    Again, remember what I said before? You determined (i) the purpose of processing and (ii) the means of processing (by choosing and using Octoparse).

    Yes, Octoparse, as the data processor, has many of the same obligations as the data controller, but it is not responsible for what you do with the acquired data later.

    Scraping tools have their own Terms and Conditions, Acceptable Use Policies and other policies that explicitly state that you must not use them to acquire data illegally, must not use the acquired data to commit crimes or fraud, must not otherwise use it illegally, and so on.

    So, it all boils down to your last question: who is responsible?

    Well, if the owner of the tool is the one who collects the personal info (i.e., they determined the purpose and means of the processing), then the owner is the data controller and is responsible for any further use of the data.

    If the natural person or legal entity behind a tool is merely a data processor, they are only responsible for keeping the tool privacy-friendly; for example, they should build it following a "privacy-by-design" approach. They cannot be held liable for further use of the information as long as they were not aware of any unacceptable use of their tool or of that information. I could write a lot here about cases in which a data processor is liable, but that's not needed now.

    The last question: is it okay to collect publicly available info?
    Well, the person made it publicly available, so why should I care, right?

    Wrong.

    You should always keep in mind that the data must be processed in line with the purpose for which it was made publicly available.

    Think of it this way. If I provide Facebook with my email and do not hide it on my public profile, then Facebook will display it publicly on my page. Facebook is not responsible for disclosing, disseminating or otherwise making my email publicly available, because I have accepted their Terms of Use, which state that I can either hide it or have it appear publicly on my profile page.

    However, I did not make my email publicly available so that it could be included in some sort of database that would later be sold or used for any other purpose, and neither did Facebook. In short, I never intended or consented to my data being treated this way.

    It's a bit tricky and should be considered separately for every piece of personal data you intend to collect (process).

    Therefore, you should always obtain prior consent from the people whose personal data you collect, store in a database and later put behind a paywall as a "lead database" or similar. Even if you do not put a price tag on it and merely create a product such as a "Free list of 1000 cryptobloggers' contact details", they might sue you.

    A database is just an example. Your ultimate use might be different, but it still falls under the rules above.

    A good example: in the European Union and many countries outside it, official authorities publicly disclose information about companies, including some personal information about their directors, etc.

    Not long ago, a company that collected such info was fined ~220,000 EUR. It was also stated that, if the company wanted to collect publicly available info and put it in a database, it should have contacted each person individually to inform them about the processing and to collect any consent they might give. The company said that complying would have cost it ~8,000,000 EUR.

    Of course, they filed an appeal, but you can see that the GDPR does not tolerate this kind of use of publicly available data, including further use of scraped data.

    You will find similar regulations in the US, UK, Canada and other countries.

    Hope it wasn't too complicated or boring.

    Best of luck!

    The nasty disclaimer I warned you about :)

    The contents of this comment do not constitute legal advice, are not intended to be a substitute for legal advice and should not be relied upon as such. You should seek legal advice or other professional advice in relation to any particular matters you or your organisation may have.


      I wish I could meet you in person. I would give you a big hug. Thank you so much.

      So, just to be clear: people who use a tool that only extracts public data are responsible for how they use that data.

      I like that you mentioned that each person whose information is collected must give consent.

      Does this mean that the person who extracted the data can send one email to all contacts asking for their consent?

      If yes, this helps them get their YES and NO answers so they can be compliant.

      In other words, is this strategy of sending one consent email permitted by law?


        For your use case, it might be enough to send a link explaining what data you want to collect and for what purposes, including how it will be used.

        Basically, a form that links to a privacy policy tailored to your use case.

        The form should have a field where the person can enter the email address they want to provide you with, and a checkbox that says "by providing your email you agree to...".

        Or

        The form might include a pre-filled email field containing the address you want to use. It would also contain the checkbox mentioned above.

        However, I would strongly advise against using the second type of form, as it's difficult to prove that the person you sent the form to is the owner of the email address you intend to use.

        It's better to let the person enter the email themselves. This way you are safer, as the person who entered the email into the field is responsible for providing correct data.

        Remember that in the era of the internet, you cannot control whether the person providing the email is indeed its owner.

        Use it to your advantage :)

        Ideally, you would enable a double opt-in (just Google "GDPR double opt-in").

        Anyway, you should link to a privacy policy explaining what you intend to do.
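A double opt-in flow like the one mentioned above can be sketched roughly as follows. This is only a minimal illustration in Python, assuming an in-memory store. The names `ConsentStore`, `ConsentRecord`, `request` and `confirm` are hypothetical and do not come from any real library, and the actual sending of the confirmation email is left out. The idea is that the person types in their own address, ticks the checkbox, and only becomes "confirmed" after clicking a single-use link sent to that inbox. Along the way, the sketch records which privacy-policy version they agreed to and when.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class ConsentRecord:
    email: str            # address the person typed in themselves (not pre-filled)
    policy_version: str   # hypothetical label for the privacy-policy text they agreed to
    requested_at: datetime
    confirmed_at: Optional[datetime] = None


class ConsentStore:
    """Hypothetical in-memory double opt-in store; a real system would persist records."""

    def __init__(self) -> None:
        self._pending: Dict[str, ConsentRecord] = {}   # token -> unconfirmed record
        self.confirmed: Dict[str, ConsentRecord] = {}  # email -> confirmed record

    def request(self, email: str, checkbox_ticked: bool, policy_version: str) -> Optional[str]:
        """Step 1: form submitted. Returns a one-time token, or None if the box is unticked."""
        if not checkbox_ticked:
            return None
        # Unguessable token; in practice it would be emailed as a confirmation link.
        token = secrets.token_urlsafe(32)
        self._pending[token] = ConsentRecord(
            email=email.strip(),
            policy_version=policy_version,
            requested_at=datetime.now(timezone.utc),
        )
        return token

    def confirm(self, token: str) -> bool:
        """Step 2: the link in the inbox is clicked. Each token works only once."""
        record = self._pending.pop(token, None)
        if record is None:
            return False
        record.confirmed_at = datetime.now(timezone.utc)
        self.confirmed[record.email] = record
        return True
```

Usage: `token = store.request("a@example.com", True, "privacy-policy-v1")` creates a pending record. Only after `store.confirm(token)` returns `True` does the address appear in `store.confirmed`. Keeping the token secret and single-use is what ties the confirmation to whoever actually controls the inbox.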

        Considering that I don't know your exact use case, I'm not able to provide any further guidance.

        Take care!

        The contents of this comment do not constitute legal advice, are not intended to be a substitute for legal advice and should not be relied upon as such. You should seek legal advice or other professional advice in relation to any particular matters you or your organisation may have.


          Thank you so much @Klience.

          Speaking of seeking legal advice, I would really love to connect with you on that basis so I can engage you for paid work.

          Please share your email with me privately. My personal email is on my profile. Sadly, Twitter is still banned in my country, so email or LinkedIn is our best bet. Cheers.


    Because web scraping is still a relatively new concept, in most countries the line between legitimate and malicious use of the technique is hard to draw. As a result, there have been many lawsuits over its legality in recent years, and the line is still unclear.

    Although the law is not clear, some countries still have regulations that apply to unauthorized web scraping.

    In the US, website owners can rely on several major types of legal claims to stop unwanted web scraping. The legality of web scraping varies across countries, and in most of them the law specifically governing web scraping is not yet clearly defined. In my opinion, web scraping is definitely not a crime as long as you are on the right track.
