September 13, 2018

Scraping job posts

For those of you who have created job sites, how are you scraping the jobs? What tools are you using, or is it custom built?

  1. 2

    For a project I've started that is very dependent on scraping pages, I use Diffbot. It's expensive, but the results it achieves from so little input are quite astonishing.

  2. 2

    I have a site that does this, and it's a mix of RSS feeds, hitting the page with an HTTP client (I'm using Ruby), or, in extreme cases where JavaScript is required, the same headless Chrome tools (e.g. Capybara) that are more typically used for automated feature testing.

    It's very hit-or-miss. For example, I often have problems with Stack Overflow because they use several different layouts and change them frequently, which makes scraping a difficult cat-and-mouse game (they have RSS, but the feed doesn't provide the full job description). AngelList works very aggressively to block scraping using a bunch of techniques, so I stopped trying.

    I've been experimenting with using to auto-generate RSS feeds from sites that don't have RSS, but haven't implemented any yet.

  3. 1

    Just a post I needed to see. Looking for this exact info.

  4. 1

    For a side project I scraped job postings for keyword analysis, and used a Go package called colly.

  5. 1

    You don't? At least I didn't for

    1. 1

      So did you source the jobs yourself first?

      1. 2

        I told my friends I had launched this project and gave them a lot of coupons so they could help me fill the website with free job postings.
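
For anyone who wants to try the RSS-first approach from reply 2, here is a minimal sketch in Python using only the standard library. It parses an RSS 2.0 feed into job records and flags entries whose description looks too short to be the full posting, which is the case where you would fall back to fetching the page itself. The names `JobPost`, `parse_job_feed`, `needs_page_fetch`, the 200-character threshold, and the sample feed are all made up for illustration; nobody in the thread describes their actual code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class JobPost:
    # Hypothetical record type; the field names are assumptions.
    title: str
    link: str
    description: str


def parse_job_feed(rss_xml: str) -> List[JobPost]:
    """Return one JobPost per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        posts.append(
            JobPost(
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                description=(item.findtext("description") or "").strip(),
            )
        )
    return posts


def needs_page_fetch(post: JobPost, min_chars: int = 200) -> bool:
    """Heuristic (an assumption): a short description suggests the feed
    is truncated and the full page has to be fetched separately."""
    return len(post.description) < min_chars


# Made-up sample feed so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example jobs</title>
    <item>
      <title>Backend Engineer</title>
      <link>https://example.com/jobs/1</link>
      <description>Short teaser only.</description>
    </item>
    <item>
      <title>Data Analyst</title>
      <link>https://example.com/jobs/2</link>
      <description>Another teaser.</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for post in parse_job_feed(SAMPLE_FEED):
        action = "fetch page" if needs_page_fetch(post) else "use feed text"
        print(post.title, "|", post.link, "|", action)
```

In practice you would download the feed with an HTTP client first and only reach for a headless browser on the pages flagged by `needs_page_fetch` that also require JavaScript, since that is the slowest and most fragile option.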