For those of you who have created job sites, how are you scraping the jobs? What tools are you using, or is it custom built?
For a project I've started that is very dependent on scraping pages, I use Diffbot. It's expensive, but the results it achieves from so little input are quite astonishing.
It's very hit-or-miss. For example, I often have problems with Stack Overflow because they use several different layouts and change them frequently, which makes scraping a difficult cat-and-mouse game (they have RSS, but the feed doesn't provide the full job description). AngelList works very aggressively to block scraping using a bunch of techniques, so I stopped trying.
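One way to cope with a site that rotates between several layouts is to try a list of known containers in order and treat "nothing matched" as a signal that the layout changed again. Below is a minimal sketch of that idea using only the Python standard library. The class names `job-description` and `posting-body` are made up for illustration and are not the real markup of any site mentioned here.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collects the text inside the first element carrying a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []
        self.done = False   # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_description(html, layout_classes):
    """Try each known layout in order; return (matched_class, text) or (None, None)."""
    for cls in layout_classes:
        parser = ClassTextExtractor(cls)
        parser.feed(html)
        # Collapse runs of whitespace left over from the markup.
        text = " ".join(" ".join(parser.chunks).split())
        if text:
            return cls, text
    return None, None


# Hypothetical page using the second of two known layouts.
page = ('<section class="posting-body main"><p>Build <b>APIs</b><br>in Go</p>'
        '</section><p>footer</p>')
layout, description = extract_description(page, ["job-description", "posting-body"])
print(layout, "->", description)
```

When `extract_description` returns `(None, None)`, that is a cheap cue to log the page and go look at what changed, rather than silently storing empty descriptions.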
I've been experimenting with using fetchrss.com to auto-generate RSS feeds from sites that don't have RSS, but haven't implemented any yet.
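Once a feed exists, whether native or generated, reading it is the easy part. Here is a rough sketch of pulling job entries out of an RSS 2.0 document with Python's standard library. The feed content and the `example.com` URLs are invented for the example, and I'm assuming the plain `item` / `title` / `link` / `description` layout rather than anything specific to fetchrss.com's output.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real one would be downloaded first.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example jobs</title>
    <item>
      <title>Backend Engineer</title>
      <link>https://example.com/jobs/1</link>
      <description>Short teaser only</description>
    </item>
    <item>
      <title>Data Analyst</title>
      <link>https://example.com/jobs/2</link>
      <description>Another teaser</description>
    </item>
  </channel>
</rss>"""


def parse_jobs(feed_xml):
    """Return one dict per <item> with its title, link, and summary text."""
    root = ET.fromstring(feed_xml)
    jobs = []
    for item in root.iter("item"):
        jobs.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return jobs


for job in parse_jobs(SAMPLE_FEED):
    print(job["title"], job["link"])
```

If the feed only carries a teaser, as with the Stack Overflow feed mentioned above, the `link` field is what you would follow to fetch the full description.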
Just the post I needed to see. I was looking for this exact info.
For a side project I scraped job postings for keyword analysis, using a Go package called Colly.
You don't? At least I didn't for findamaker.io.
So did you source the jobs yourself first?
I told my friends I had launched this project and gave them a lot of coupons so they could help me fill the website with free job postings.
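For anyone wondering what the keyword-analysis step might look like after the scraping is done, here is a small sketch. It is not the Colly-based code from the post above; it is a standard-library Python stand-in that assumes the posting text has already been collected, and the sample postings and keyword list are made up.

```python
import re
from collections import Counter


def keyword_counts(postings, keywords):
    """Count how many postings mention each keyword (case-insensitive, whole tokens)."""
    counts = Counter()
    for text in postings:
        # Keep +, # and . inside tokens so "c++", "c#" and "node.js" survive,
        # then drop trailing dots left over from sentence endings.
        raw_tokens = re.findall(r"[a-z0-9+#.]+", text.lower())
        tokens = {t.rstrip(".") for t in raw_tokens}
        for kw in keywords:
            if kw.lower() in tokens:
                counts[kw] += 1
    return counts


# Made-up posting texts standing in for scraped data.
postings = [
    "We use Go and PostgreSQL.",
    "Python or Go, plus Docker",
    "Senior C++ developer",
]
counts = keyword_counts(postings, ["go", "python", "c++", "rust"])
for keyword, n in counts.most_common():
    print(keyword, n)
```

Counting postings rather than raw occurrences keeps one keyword-stuffed ad from skewing the numbers, which seemed like the more useful choice for comparing demand across technologies.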