April 9, 2019

Do you do web scraping frequently?

Amie Chen @hyperyolo

Hi Indie Hackers,

I'm making a web scraping tool and really need to ask y'all for some inputs:

  • Do you or your company pay for a scraping service now (like import.io)?
  • What's the use case? What data would you like to collect?
  • What's your current process like?
  • How often do you do it?

Thanks in advance for sharing!

  1. 2

    We use them at my job, but depending on what you're using it for it can feel icky. We’ve used seamless and we’ve used skrapp before.
    I’ve never really used any others I can think of.

    1. 1

      Thanks for replying! skrapp seems interesting.

      So were you mostly interested in emails/contact lists? What kinda steps did you have to take to prepare the data for analyzing (assuming that's the purpose of scraping)?

      1. 1

        We look for specific companies or company types, and then certain job titles at that company.

  2. 1

    Hey Amie,
    doing something similar at the moment: https://www.indiehackers.com/forum/finally-launched-my-web-scraping-tool-scrapify-6dcbb9634c Looking for a co-founder right now.

  3. 1

    If you have a few hours to learn a tool, you can learn iMacros. It's a browser extension (Chrome, Firefox, io...) that is incredibly powerful and intuitive.

  4. 1

    You could say that I spend almost all my free time on the computer. At work, the administrator monitors the state of the programs. This is normal. But at home, I often have to sit at the PC too. Most of the cache is information from https://duckdice.io/lottery and data that uses Google. Since I play a lot and look for a lot of information, I have to clean it myself. Good programs help me with this. Did I answer your question?

  5. 1

    I have built a custom solution to scrape data from GitHub and build metrics on it (link: http://opensourceanalyzer.herokuapp.com/ ).

    The process: when a user submits a GitHub link that hasn't been scraped yet, the scrape starts, does the job, and displays the result, so it runs every time a user needs a new scrape.

    Note: I plan to work more on it; this is basic stuff for now.
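
    The scrape-on-first-request flow described above could be sketched roughly like this. It is only an illustration, not the actual code behind the linked app: `make_cached_scraper`, the in-memory dict, and the URL normalization are all assumptions, and the real scraper is replaced by whatever callable you pass in.

```python
from typing import Callable, Dict


def make_cached_scraper(scrape: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a scraper so each link is scraped only the first time it is seen.

    `scrape` is a hypothetical callable that takes a repository URL and
    returns a dict of metrics. The cache is an in-memory dict; a real app
    would more likely use a database.
    """
    cache: Dict[str, dict] = {}

    def get_metrics(repo_url: str) -> dict:
        # Assumption: trailing slashes and letter case do not distinguish repos.
        key = repo_url.strip().rstrip("/").lower()
        if key not in cache:
            # "Not-scraped" link: run the scrape now and remember the result.
            cache[key] = scrape(key)
        return cache[key]

    return get_metrics


if __name__ == "__main__":
    def stub_scrape(url: str) -> dict:
        # Stand-in for a real scraper; returns made-up placeholder data.
        print("scraping", url)
        return {"repo": url, "metrics": {}}

    get_metrics = make_cached_scraper(stub_scrape)
    get_metrics("https://github.com/example/repo")   # triggers a scrape
    get_metrics("https://github.com/example/repo/")  # served from the cache
```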

  6. 1

    I'm yet to find a great scraping solution that isn't outrageously expensive. Price is where I tend to start. Demonstrating becomes the next critical thing for me.

    I've found that when I need a scraper it's to automate a human task that I'd rather do with a scraper than a turk. For me it's never an ongoing operation, given that the scraper fails once the xpaths of a page change. Once it breaks, the value goes out the door.

    Basic scraping I'd rather do from the console directly.

    I avoid scraping with sessions and cookies because it's incredibly difficult.

    There once was a tool called kimono. IT WAS AMAZING! I was sad to see it go.

    Hope this feedback helps.
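
    One common way to soften the breakage when a page's xpaths change is to try several candidate paths, from most specific to most general. A minimal sketch, assuming well-formed markup so the standard library parser can be used; `first_match` and the sample paths are made up for illustration, and real pages usually need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def first_match(root: ET.Element, paths: Sequence[str]) -> Optional[str]:
    """Return the stripped text of the first path that matches, else None."""
    for path in paths:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


# Hypothetical paths: a strict one for the current layout, then a looser fallback.
PRICE_PATHS = [
    ".//div[@id='main']/span[@class='price']",
    ".//span[@class='price']",
]

if __name__ == "__main__":
    old_layout = ET.fromstring(
        "<html><body><div id='main'><span class='price'>19.99</span></div></body></html>"
    )
    new_layout = ET.fromstring(
        "<html><body><section><span class='price'>19.99</span></section></body></html>"
    )
    print(first_match(old_layout, PRICE_PATHS))
    print(first_match(new_layout, PRICE_PATHS))
```

    A `None` result is the signal that every known path broke, which is a cheaper failure to notice than silently collecting empty rows.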

    1. 1

      Interesting! This is very helpful.

      I agree that due to the ever-changing nature of webpages, it's usually not an ongoing operation. What do you do with the data you scraped? Do you usually have to clean it in, say, Google Sheets before it's usable for business purposes?

      1. 1

        If I build it I tend to model the data in JSON while I digest. It's a bit silly to create the additional workload.

        At that point I drop it in mongo or a similar db, wrap it in an API and go to town.
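
        Modeling the data as JSON while ingesting, so no separate cleanup pass is needed, might look something like the sketch below. The `Listing` fields and the cleaning rules are invented for illustration, and the database step is left out; each resulting dict is the kind of document you could then insert into mongo or a similar store.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class Listing:
    """Hypothetical shape for one scraped record."""
    title: str
    price: float
    url: str


def to_listing(raw: dict) -> Listing:
    """Clean one raw scraped row (all strings) into a typed record."""
    return Listing(
        title=" ".join(raw["title"].split()),  # collapse stray whitespace
        price=float(raw["price"].replace("$", "").replace(",", "")),
        url=raw["url"].strip(),
    )


def to_documents(raw_rows: Iterable[dict]) -> List[dict]:
    """Turn raw rows into JSON-ready dicts, one per record."""
    return [asdict(to_listing(row)) for row in raw_rows]


if __name__ == "__main__":
    rows = [{"title": "  Blue   widget\n", "price": "$1,299.50",
             "url": " https://example.com/w "}]
    print(json.dumps(to_documents(rows), indent=2))
```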