January 14, 2020

Rolling Product Update Sync

Jesse Bethke @JesseBethke

Getting good product data is hard. For Social Inertia, I had to individually curate those listings by pulling content from the manufacturers, publishers, and various gaming communities. That good data is what made Social Inertia successful in the first place. WSG would flounder if it couldn't get the same quality of data.

My distributor does not have a REST API for product details, but they do have an ordering interface exclusively available to retailers. The data is there.

I prepared an HTML scraper that could request those product detail pages, parse out the details, and route them back in through my product API. This includes getting references to images that could be queued for download.
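
For illustration, here is a minimal sketch of what the parsing step of such a scraper might look like, using only Python's standard library. The markup it expects (an `h1` with class `product-title`, a `div` with class `product-description`) and the names `ProductPageParser` and `parse_product_page` are assumptions made up for this example. The distributor's real pages will differ, and this is not the scraper described above.

```python
from html.parser import HTMLParser


class ProductPageParser(HTMLParser):
    """Collects title, description, and image URLs from a product page.

    The tag and class names are hypothetical placeholders.
    """

    def __init__(self):
        super().__init__()
        self.details = {"title": "", "description": "", "image_urls": []}
        self._field = None  # text field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and attrs.get("class") == "product-title":
            self._field = "title"
        elif tag == "div" and attrs.get("class") == "product-description":
            self._field = "description"
        elif tag == "img" and attrs.get("src"):
            # Only the reference is recorded; downloading happens elsewhere.
            self.details["image_urls"].append(attrs["src"])

    def handle_endtag(self, tag):
        # Stop capturing text once the enclosing element closes.
        if tag in ("h1", "div"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.details[self._field] += data


def parse_product_page(html):
    """Return a dict of product details parsed from one page of HTML."""
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    details = parser.details
    details["title"] = details["title"].strip()
    details["description"] = details["description"].strip()
    return details
```

The returned dictionary could then be posted to a product API, and each entry in `image_urls` pushed onto a download queue.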

I wanted to make sure I didn't inundate my distributor's platform with crawls and scrapes. So I set up another queue for products in need of updating. A Lambda routine pops a product off the queue once a minute, fakes a login to the distributor to get a session cookie, and then scrapes the product's data. It takes about a week to roll through the entire product inventory and capture data, but that's a healthy spread. Product details will now be perpetually up-to-date.
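
The one-per-minute loop can be sketched roughly as follows. This is a simplified stand-in, not the actual Lambda code: an in-memory `deque` takes the place of the real queue, the `login`, `scrape`, and `save` callables are hypothetical placeholders passed in by the caller, and putting the product back on the end of the queue is an assumption about how the sync keeps rolling.

```python
from collections import deque


def handle_tick(queue, login, scrape, save):
    """Process at most one product per invocation.

    Intended to be called once a minute by a scheduler. Returns the
    product ID that was processed, or None if the queue was empty.
    """
    if not queue:
        return None
    product_id = queue.popleft()
    session_cookie = login()  # fresh session for each scrape
    details = scrape(product_id, session_cookie)
    save(product_id, details)
    # Assumption: re-queue at the back so the product is refreshed
    # again on the next pass through the inventory.
    queue.append(product_id)
    return product_id


def days_to_cycle(product_count, per_minute=1):
    """Days needed to visit every product once at the given rate."""
    return product_count / (per_minute * 60 * 24)
```

At one product per minute, that's 1,440 products a day, so `days_to_cycle(10080)` comes out to 7.0. A week-long cycle therefore corresponds to an inventory of roughly ten thousand products.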

Next up: resizing those uber-huge images to be web and social media friendly.
