
What do you use your web scraper for?

I was curious to know what you currently use your web scraper for, and whether the data you scrape would make a good info product.

  1. 4

    I have crawled/am crawling these:

    • Finnish newspaper headlines - I built this for ML, because I wanted to make something that recommends articles I'd find interesting. I never actually built the recommender, but I'm sharing the data anyway, since why not.
    • New Chrome Web Store items - I started this 2.5 years ago to collect stats about popular categories, to discover truly new items (the Web Store does not surface these), and to gain insight into the "bigger picture".
    • Since last month I've been collecting Twitter lists. I wanted to find some myself, but they are hard to find on Twitter, even with advanced search.
    • I used to collect tweets from Trump and run sentiment analysis and other experiments on them, mainly out of curiosity while playing with IBM Watson services. I stopped after I started collecting Twitter lists.

    I use Python with whatever API clients (e.g. for Twitter) and packages I need, and deploy to gcloud because you can easily set up cron jobs there.
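For illustration, here is a minimal sketch of the kind of Python job a cron schedule could run, such as pulling headlines out of a news page. It assumes headlines sit in `<h2>` tags, which is a hypothetical choice; the actual markup of the Finnish news sites isn't described here. It uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text inside one heading tag (assumed here to be <h2>)."""

    def __init__(self, tag: str = "h2") -> None:
        super().__init__()
        self.tag = tag
        self._inside = False
        self._buffer: list[str] = []
        self.headlines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            # Collapse internal whitespace so "  Foo   bar " becomes "Foo bar".
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)


def extract_headlines(html: str, tag: str = "h2") -> list[str]:
    parser = HeadlineParser(tag)
    parser.feed(html)
    parser.close()
    return parser.headlines


if __name__ == "__main__":
    sample = "<h2>First <b>story</b></h2><p>body</p><h2> Second  story </h2>"
    print(extract_headlines(sample))  # ['First story', 'Second story']
```

A scheduler (cron or a cloud equivalent) would fetch the page, call `extract_headlines`, and append the results to storage on each run.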

    1. 1

      Thanks @neea, there seem to be a lot of people scraping Twitter data.

      1. 1

        I think it's because Twitter is easy to crawl. For my latest crawler I didn't even create a proper app: I opened the site in the browser's incognito mode and picked up a guest token, which allows making requests without an app. It took a while to get the rate limit/request frequency right, but beyond that it works. Twitter is much more open to this kind of activity than some other platforms. Reddit is also very accessible, and I want to do something with Reddit as well.
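Getting "the rate limit/frequency correct" usually comes down to enforcing a minimum gap between requests. Below is a minimal throttle sketch, assuming a fixed interval that you tune by trial and error; the interval value is a placeholder and does not reflect Twitter's actual limits. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Ensures at least `min_interval` seconds pass between consecutive wait() calls.

    The interval is an assumption you tune yourself; nothing here encodes
    any platform's real rate limits.
    """

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return how long we slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now += remaining
        self._last = now
        return slept
```

In a crawl loop you would call `throttle.wait()` immediately before each request; if the platform starts returning rate-limit errors, increase `min_interval`.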

  2. 3

    I use cheerio to collect data from the Shopify App Store. It turns out that it's possible to sell a simple spreadsheet as an info product.
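The commenter uses cheerio (Node.js) for the scraping itself. As a sketch of the last step, turning scraped records into a spreadsheet someone can buy, here is a Python version using the standard `csv` module. The column names are hypothetical, chosen only for illustration; they are not the actual fields of the Shopify App Store dataset.

```python
import csv
import io
from typing import Iterable, Mapping

# Hypothetical columns for an app-store dataset; adjust to whatever you scrape.
FIELDS = ["name", "category", "rating", "reviews"]


def records_to_csv(records: Iterable[Mapping[str, object]], fields: list[str] = FIELDS) -> str:
    """Serialize scraped records to CSV text that any spreadsheet app can open.

    Keys not listed in `fields` are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

The `csv` module handles quoting for you, so an app name containing a comma still lands in a single cell.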

    1. 1

      Thanks @LukaszWiktor, your post was actually the inspiration for mine.

    2. 1

      Oh wow, this is new to me. I had no idea you could sell a dataset like this on Gumroad. How has your experience been? Does Gumroad have any policy that prohibits selling scraped data?

      1. 3

        Gumroad prohibits selling private data, but publicly available information seems to be okay.

  3. 2

    There is an online grocery platform called BigBasket. During the lockdown, extraordinary demand for online groceries led them to introduce a random slot system, so you can only book a grocery order when a slot is available.

    I had this need gap, 'Alert me when slots in BigBasket are open', posted on my problem validation platform, so I built a tool, 'BigBasket Slot Alert', using a Go port of Puppeteer to address it.

    It's open source and used by many people to get alerted when a BigBasket slot is available.
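The tool described above is written in Go with a Puppeteer port; the core polling idea can be sketched in Python independently of the browser automation. This is a minimal sketch, assuming a hypothetical `check()` callable that returns `True` when a slot is open (in practice that is where the headless-browser code would live) and a hypothetical `notify()` callable. It alerts only when availability flips from closed to open, so users aren't spammed on every poll.

```python
import time
from typing import Callable, Optional


def watch_for_slot(
    check: Callable[[], bool],
    notify: Callable[[str], None],
    interval: float = 300.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll check() and call notify() each time a slot goes from closed to open.

    Returns the number of alerts sent. max_checks=None means poll forever.
    """
    was_open = False
    alerts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        is_open = check()
        checks += 1
        if is_open and not was_open:
            notify("A delivery slot just opened up")
            alerts += 1
        was_open = is_open
        if max_checks is None or checks < max_checks:
            sleep(interval)
    return alerts
```

Tracking the previous state (`was_open`) is the design choice that matters here: a slot that stays open across several polls triggers one alert, not one per poll.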

  4. 2

    I have been working with some clients to scrape arbitrary websites and analyze things like keywords, images, and whether the site is still active.

    I'm also doing large-scale eCommerce scraping to build an API that makes it easy to fetch data from any eCommerce website.
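As an illustration of the keyword-analysis part, here is a minimal sketch using only the standard library: it extracts visible text from HTML (skipping `<script>` and `<style>`), then counts the most frequent words. The tiny stop-word list and the minimum word length are assumptions for the example, not anything the commenter described.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# A deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


class _TextCollector(HTMLParser):
    """Gathers visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def top_keywords(html: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common non-stop-words (3+ characters) in the page text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    words = re.findall(r"[a-z0-9']+", " ".join(collector.parts).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)
```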

    1. 1

      What library/tech were you using to scrape the data? Did you get blocked? Did you need proxies? Any CAPTCHAs?

      1. 1

        Python and Scrapy for pretty much everything. I use ScraperAPI as a proxy, which works reasonably well unless you need fast responses (e.g. under 2 seconds). Most of my jobs are for clients who need a spreadsheet or a scheduled cron job, so it works fine for me.
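Proxied requests are often slow or fail intermittently, so scraping jobs commonly wrap fetches in a retry with exponential backoff. The sketch below is a generic pattern, not ScraperAPI's or Scrapy's own retry mechanism; `fn` stands for whatever hypothetical fetch function you use, and the attempt count and base delay are placeholder values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on failure with exponentially growing, jittered delays.

    Delays are base_delay * 2**attempt_index plus up to 10% random jitter.
    The last failure is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("attempts must be at least 1")
```

For scheduled, spreadsheet-style jobs like the ones described above, the extra seconds spent backing off rarely matter, while the reduced failure rate does.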

  5. 1

    I use Crawlera to scrape Amazon delivery notice pages. For packages delivered by "Amazon Logistics", Amazon's own delivery notice pages are the only way to see the current status of a delivery. (By contrast, UPS and USPS shipping data is available via APIs like EasyPost.)

    I do this to add value to my side project, wheresmystuff.co, so people can track Amazon packages alongside deliveries from traditional carriers like UPS.

    1. 1

      Hey @ethanteng, thanks. It seems e-commerce is a massive use case for scraping. Best of luck with wheresmystuff.co.

  6. 1

    I am using Selenium to collect comments from Google Maps.

    1. 1

      Hey @joaoRMCarvalho, I'd be interested to hear what you use the comments for.

      1. 1

        Hello @papertrail,
        In the first version, we used it to extract business data (such as email, phone number, address, etc.) to promote the project nopers.net.
        By the way, I wrote a post about it: https://www.indiehackers.com/post/feedback-on-nopers-your-reusable-qr-code-e0c2291668
        Feel free to comment; I'd appreciate it.

        Now we are exploring a new feature whose main goal is to extract the comments of businesses registered on our platform and analyze them by keyword.
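One simple way to "analyze by keywords" is to tag each comment with the topics whose keywords it mentions. The sketch below assumes hypothetical topic groups and keywords chosen for illustration; it matches whole words case-insensitively, so "wait" does not match "waiting" (add variants to the lists if you need them).

```python
import re

# Hypothetical keyword groups; a real feature would tune these per business type.
TOPICS: dict[str, list[str]] = {
    "service": ["staff", "friendly", "rude", "service"],
    "price": ["price", "expensive", "cheap", "value"],
    "wait": ["wait", "queue", "slow", "late"],
}


def tag_comments(comments: list[str], topics: dict[str, list[str]] = TOPICS) -> dict[str, list[str]]:
    """Map each topic to the comments mentioning any of its keywords (whole-word, case-insensitive)."""
    patterns = {
        topic: re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
        for topic, words in topics.items()
    }
    tagged: dict[str, list[str]] = {topic: [] for topic in topics}
    for comment in comments:
        for topic, pattern in patterns.items():
            if pattern.search(comment):
                tagged[topic].append(comment)
    return tagged
```

A comment can land in several topics, which is usually what you want for reviews that mention both service and price.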

  7. 1

    Packt Publishing offers a free ebook every day, provided you visit their website, so I wrote a simple script that scrapes the book title and emails it to me so I can decide whether it's worth claiming.

    https://www.packtpub.com/free-learning

    The data is worthless on its own, but if someone compiled a few sources like this, they could build an automated "Daily Free Stuff" newsletter or something similar.
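The "email it to me" half of a script like the Packt one can be done with the standard library. This sketch only composes the message; how the title is scraped depends on the page's markup, which isn't described here, and the sender and recipient addresses are hypothetical. The commented-out sending lines use `smtplib` with a placeholder host and credentials.

```python
from email.message import EmailMessage


def build_daily_email(
    title: str,
    sender: str,
    recipient: str,
    url: str = "https://www.packtpub.com/free-learning",
) -> EmailMessage:
    """Compose a short notification email for today's free ebook title."""
    msg = EmailMessage()
    msg["Subject"] = f"Today's free ebook: {title}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f'Today\'s free title is "{title}".\nClaim it at {url} if it looks interesting.\n')
    return msg


# Sending is left to smtplib; host and credentials below are placeholders:
#   import smtplib
#   with smtplib.SMTP("smtp.example.com", 587) as server:
#       server.starttls()
#       server.login(user, password)
#       server.send_message(msg)
```

Several such sources could feed the same function, which is roughly what a "Daily Free Stuff" digest would need.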

  8. 1

    I'm using it to collect coronavirus data and unemployment data from all around the world. I've created a web dashboard in React with all the stats, and I've published the API for free for any developers who need it. Both projects, client side and server side, are open source and you can find them on GitHub. The client is at https://www.ncovid19.it/ and the server at https://github.com/emulk/covid19API. I'm using both Python and PHP on a Linux server.

  9. 1

    This comment was deleted 4 years ago.

  10. 2

    This comment was deleted 4 months ago.

    1. 1

      Hey @peterparker, what would you scrape for private use?

      1. 1

        This comment was deleted 4 months ago.

          1. 1

            This comment was deleted 4 months ago.

        1. 1

          I am currently looking for a flat in France. This product is excellent: exactly what I need, with connectors for the French market.

          1. 1

            This comment was deleted 4 months ago.

  11. 1

    This comment was deleted 3 years ago.

    1. 1

      Is this online somewhere where we can take a look? Sounds like an interesting idea.

      1. 1

        This comment was deleted 3 years ago.

    2. 1

      That sounds like a really interesting project. What library/tech did you use to scrape?

      1. 1

        This comment was deleted 3 years ago.

        1. 1

          Did you ever get blocked or banned? Did you have to use proxies? Any CAPTCHA problems? How did you get past those?

          1. 1

            This comment was deleted 3 years ago.

    3. 1

      Nothing at the moment. In the past I've scraped product and price information, so mainly eCommerce applications. What piqued my curiosity was an IH post where scraped data was repurposed and sold as an info product.

      1. 1

        This comment was deleted 3 years ago.

    4. 1

      This comment was deleted 3 years ago.
