
So... How do you guys feel about webscraping?

I am currently on the lookout for a project idea.

Many products in the market I am researching rely heavily on web scraping and "questionable" use of publicly available APIs (directly going against their terms of service).

How would you feel about starting a business of that type? Would you do it if the product were in high demand?

It seems kind of shady to me, and I would be scared that my project would be shut down before it even started generating serious revenue.

And yet there are multiple projects that base their functionality on exactly that: flight search systems, social media monitoring tools, any kind of "domain-specific" search engine.

We had a high-profile case in Poland where a very successful startup of that kind caught the attention of Facebook. And yet they are still operating and generating revenue, and the founder lives a lavish lifestyle, posting YouTube videos of his travels...

So... What do you think? :)

  1. 4

    It’s legal in the U.S. and it’s a good way to solve the chicken & egg problem with some ideas. I’m all for it.

  2. 2

    Standard IANAL disclaimer. This advice is re: ethics of scraping.

    I think it depends on what you're doing with the information. If you're doing it to benefit people whose information is being scraped, or scraping large corporations' information to give to the community, I say it's fair game. The sibling comment mentioning Plaid is a good example.

    On the other hand, if you'd be scraping people's public LinkedIn profiles so you can snitch to their employer that you think they're job hunting… don't do it! (There's a case before the U.S. Supreme Court where the plaintiff is literally doing this exact thing).

    Whatever you do, think long and hard about ways your scraped information might be misused. And always allow people to opt out.

  3. 1

    I've built a BrickSeek competitor that scrapes inventory and pricing data from different retailers much faster than BrickSeek does. People on SlickDeals have shown interest, and I'd like to monetize it. But I'm also scared of it getting big enough and then being blocked.

    1. 1

      Hey man, I'm interested in talking to you about an inventory checker similar to BrickSeek.

  4. 1

    Gonna make a dangerous comment here, but if you get 0 users even if it is a crime, you have no witnesses ;)

    In all seriousness, just try it and see if it works. If you get traction, you can contact the company you are scraping from, and they might give you access to that data for a price. They are very unlikely to be able to stop you if you use proxies anyway, so they should be open to selling you access to their public data.

  5. 1

    The thing is, the terms of service only apply to users. A website can't legally prevent others from scraping its (publicly available) data any more than a company can prevent others from taking pictures of its billboards.

    They can try to prevent it from happening by blocking bot traffic, however. It's one of the greatest obstacles I had to overcome when I started this service.

    But web scraping is not illegal. And in my opinion, copying publicly available data from huge corporate websites for people to use is not unethical either.

  6. 1

    It's certainly an interesting space, and I've used such a service before, and then wrote a crawler myself, for a contract with a 'well known bookseller' here in the UK (against Google Shopping in fact, to get comparison prices).

    Being rather more contrarian though, and having also had to deal with contact-form spam and my weblogs filling up with daily drive-by attacks, I'm thinking about the other side of that. Could a service that detects and blocks bad bots and crawlers be useful?

    So I've made a little exploratory project I call 'BotRegistry' (https://www.indiehackers.com/post/bot-or-not-2cb9ab12c0) to see if there is any interest from others.

  7. 1

    @mpodlasin I personally wouldn't go in that space again.

    Not sure if you're referring to Growbots (another successful scraping-based startup from Poland), but our previous startup was a competitor and we decided to shut it down after a year of playing cat and mouse with LinkedIn.

    You pretty much hit the nail on the head.

    1. Avoid starting a domain-specific search engine; it puts all your eggs in one basket.
    2. Not all scraping is legal. Login-gated scraping is against the ToS of all the platforms that own data moats.
    3. Even if the content is public, most of these platforms have dedicated anti-scraping departments that you would need to play cat and mouse with.
    4. Successful tools DO get shut down very often. So, yes, it could happen to you, too.
    5. The ones that still operate have a few lawsuits or cease-and-desist requests pending.

    The only exception I'd contemplate is something like import.io, which provides the tools for people to scrape, as it's not based on a specific website and is mostly geared towards users in need of small-scale scraping of publicly available sources (like scraping a table from an eCommerce website).

    ---
    That doesn't mean you can't be successful by doing exactly the opposite. SEO tools scrape tons of data from Google. Clearbit has managed to scrape LinkedIn and other platforms, and it's still operating and making a ton of money. There are plenty of examples. At the end of the day, it's up to you to decide if you want to play that game and what your risk tolerance is. For me, it was a no-go.

  8. 1

    IANAL but it's interesting to note that this is exactly the technique used by Plaid when they started out.

    I.e., many banks did not have an official API, so they logged in through forms intended for people and reverse engineered the connections that way.

    It's a classic case of "it's easier to ask forgiveness than to get permission", and while it worked out for them, your experience may vary.

    1. 1

      Not only Plaid, but every bank account aggregator in the world is doing this: Mint.com, Bankin' in Europe, etc. The situation is starting to change with the PSD2 directive in Europe, which forces banks to provide an API.

      Source: I worked for Fiduceo, a big bank account aggregator in France. We finally got acquired by a big French bank.
