What are you guys using for background workers nowadays?

by Wil Chung

Back when I was working with Rails, about 4-10 years ago, I used background workers (Sidekiq, Faktory, Delayed Job, Resque) to offload processing that could happen outside the request/response cycle and keep response times down. Examples include sending email or generating different sizes of uploaded images.
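For readers coming from other stacks, the basic idea behind these workers can be sketched with Python's standard library. The request handler only enqueues work and returns immediately, and a separate worker drains the queue. This is a single-process toy under that assumption. Sidekiq, for example, keeps its queue in Redis and runs workers as separate processes. `make_thumbnail` and `handle_upload` are hypothetical names.

```python
import queue
import threading

# Hypothetical job: stands in for "resize an uploaded image" or "send an email".
def make_thumbnail(image_id: str, size: int) -> str:
    return f"{image_id}@{size}px"

jobs: "queue.Queue[tuple]" = queue.Queue()
results: list = []

def worker() -> None:
    # Runs outside the request/response cycle, pulling jobs until it sees None.
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        func, args = item
        results.append(func(*args))
        jobs.task_done()

def handle_upload(image_id: str) -> str:
    # The hypothetical "request handler" only enqueues work and returns at once.
    for size in (64, 256, 1024):
        jobs.put((make_thumbnail, (image_id, size)))
    return "202 Accepted"

thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = handle_upload("img-1")
jobs.put(None)  # sentinel: tell the worker to stop
thread.join()
print(status, results)
```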

What do people on all stacks use nowadays? Do JAMstack devs use web workers and service workers? Or are people using AWS Lambda for this stuff (or Beanstalk? blah)? Or are you using Kubernetes's Job runners? Or do you offload it to Zapier? Or do you leverage something specific, like GitHub Actions? Or do you use big data tools like Apache Spark, because they fit the job and a generic runner won't do? Or are there mature third-party services for the specific offline processing you need, so you can just punt it to them?

In our case, we've co-opted event handlers in Hasura as a poor man's background job runner, and I wish there were something that gave me more visibility into job throughput and performance, easier debugging, chainability between jobs, and integration with third-party services. Does anyone have a job-running framework or platform that they love?
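To make that wish list concrete, here is a small sketch of what a runner needs to record to support chaining and debugging. It is not Hasura's API or any existing framework. `run_chain`, `JobRecord`, and the three steps are hypothetical, and everything runs synchronously in one process.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class JobRecord:
    # One entry per job run, so timing and failures can be inspected afterwards.
    name: str
    seconds: float
    ok: bool
    error: Optional[str] = None

def run_chain(steps: List[Callable[[Any], Any]], payload: Any,
              log: List[JobRecord]) -> Any:
    """Run jobs in order, feeding each job's output into the next one.

    Stops at the first failure so later steps never see a bad payload.
    """
    for step in steps:
        start = time.perf_counter()
        try:
            payload = step(payload)
        except Exception as exc:  # record the failure instead of swallowing it
            log.append(JobRecord(step.__name__, time.perf_counter() - start,
                                 False, repr(exc)))
            return None
        log.append(JobRecord(step.__name__, time.perf_counter() - start, True))
    return payload

# Hypothetical steps for processing an uploaded document.
def extract_text(doc: str) -> str:
    return doc.strip()

def count_words(text: str) -> int:
    return len(text.split())

def reject_empty(n: int) -> int:
    if n == 0:
        raise ValueError("empty document")
    return n

log: List[JobRecord] = []
steps = [extract_text, count_words, reject_empty]
first = run_chain(steps, "  hello background jobs ", log)   # 3
second = run_chain(steps, "   ", log)                       # None: reject_empty fails
print(first, second, [(r.name, r.ok) for r in log])
```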

Wil Chung posted to Developers on June 15, 2020

@iamwil, I just read your post about how you used Ruby on Jets to build a job engine for Ruby. It sounded awesome, so I built a prototype. It came out great.

I'm curious: did you have some sort of dashboard for your jobs? Any way to manage them outside of the AWS console pages?

Thanks so much. Really appreciate your insight and expertise!!

kenmazaika · 6 years ago

We use Google Kubernetes Engine (GKE) and Kubernetes's Job abstraction.

maxweber · 6 years ago

I mostly use AWS Lambda or AWS Batch jobs, triggered by cloud events, depending on how long the job needs to run. If it takes less than 15 minutes (Lambda's maximum timeout), Lambda is a great fit. For anything longer, an AWS Batch job, perhaps running on Fargate, works great.
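That rule of thumb amounts to routing each job on its expected runtime. Below is a hedged sketch: `JobSpec`, `pick_backend`, and the 1.5x safety margin are hypothetical, and only the 15-minute cap is Lambda's actual limit. It makes no AWS calls.

```python
from dataclasses import dataclass

# AWS Lambda's maximum timeout is 15 minutes (900 seconds).
LAMBDA_MAX_SECONDS = 15 * 60

@dataclass
class JobSpec:
    name: str
    expected_seconds: float

def pick_backend(job: JobSpec, safety_factor: float = 1.5) -> str:
    """Return "lambda" if the job comfortably fits in Lambda's limit, else "batch".

    safety_factor is an assumption: pad the estimate because runtimes vary.
    """
    if job.expected_seconds * safety_factor <= LAMBDA_MAX_SECONDS:
        return "lambda"
    return "batch"

for spec in (JobSpec("resize-image", 5), JobSpec("nightly-report", 700),
             JobSpec("reindex", 3600)):
    print(spec.name, pick_backend(spec))
```

With the padding, a 700-second job goes to Batch even though it would technically fit under 900 seconds. This margin is a design choice, not part of the original advice.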

sam26880 · 6 years ago

Celery (for Python) + RabbitMQ

I also run a bunch of background tasks with crontab + bash scripts.

simplisticallysimple · 6 years ago

I moved my Sidekiq workers to Jets / Lambda.

KevinColemanInc · 6 years ago
  
  Curious: how come? What does serverless get you that Sidekiq didn't?
  
  iamwil · 6 years ago
    
    I was building a data extraction tool for processing a large number of documents.

    Originally, we had 1 box with ~6 threads running Sidekiq to run the text extraction code. We were constantly re-processing 100k+ documents as we updated our tooling, and Sidekiq would take hours to process all of them after a code update.

    By moving to Lambda, we were able to spawn ~1k workers that processed everything in minutes. Getting something comparable with Sidekiq would have meant running it on a larger number of EC2 machines that we scaled up and down (while we worked during business hours), or just using Lambda.
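The difference described here is mostly about how wide you can fan out. Below is a local sketch of that shape under a few assumptions. Documents are split into batches, each batch is one unit of work, and a thread pool stands in for parallel Lambda invocations. It doesn't use the AWS SDK or Jets, and `extract_batch` is hypothetical. With real Lambdas, parallelism is bounded by the account's concurrency limits rather than a pool size.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List

def batches(items: List[str], size: int) -> Iterator[List[str]]:
    # Split the document IDs into fixed-size batches, one per worker invocation.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def extract_batch(doc_ids: List[str]) -> int:
    # Hypothetical stand-in for one invocation running the extraction code;
    # it just reports how many documents it handled.
    return len(doc_ids)

def fan_out(doc_ids: List[str], batch_size: int, concurrency: int) -> int:
    # `concurrency` plays the role of the number of parallel workers
    # (~6 threads on one box vs ~1k Lambdas in the comment above).
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(extract_batch, batches(doc_ids, batch_size)))

docs = [f"doc-{n}" for n in range(100_000)]
print(fan_out(docs, batch_size=100, concurrency=32))
```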
    
    KevinColemanInc · 6 years ago

I previously used Celery (Python), but I mostly do backend work in Node/TypeScript now, so I'm using Bull (https://github.com/OptimalBits/bull). It's worked great for us so far.

yonidejene · 6 years ago
  
  I used Bull back around 2014/15, and some of my jobs got stuck. I'm guessing it has improved a lot since then.

  I did miss the instrumentation and monitoring that Sidekiq and Faktory give you, though. With these jobs, I usually want to know each one's individual throughput and error rate.
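Those two numbers are cheap to compute if each worker reports every job it finishes. The sketch below is framework-agnostic and is not Sidekiq's, Faktory's, or Arena's API. `record`, `JobStats`, and the sample events are hypothetical. "Throughput" here is taken to mean runs per second of worker time spent on that job type, which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class JobStats:
    runs: int = 0
    failed: int = 0
    busy_seconds: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.failed / self.runs if self.runs else 0.0

    @property
    def throughput(self) -> float:
        # Runs per second of worker time spent on this job type.
        return self.runs / self.busy_seconds if self.busy_seconds else 0.0

stats: "defaultdict[str, JobStats]" = defaultdict(JobStats)

def record(job_name: str, seconds: float, ok: bool) -> None:
    # Called by a worker after each job finishes (successfully or not).
    s = stats[job_name]
    s.runs += 1
    s.busy_seconds += seconds
    if not ok:
        s.failed += 1

# Hypothetical sample events.
for name, secs, ok in [("send_email", 0.5, True), ("send_email", 0.5, False),
                       ("resize_image", 2.0, True), ("send_email", 1.0, True)]:
    record(name, secs, ok)

for name, s in sorted(stats.items()):
    print(f"{name}: {s.throughput:.2f} jobs/s, error rate {s.error_rate:.0%}")
```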
  
  iamwil · 6 years ago
    
    Agreed.
    
    For monitoring, we're using Arena (https://github.com/bee-queue/arena). I really wish it was all integrated like Sidekiq tho.
    
    yonidejene · 6 years ago

Sidekiq + Active Job (Rails) backed by Redis. I have processed more than 9 million jobs at this point, but don't remember having problems with them. Occasionally, Redis connections time out (for a few seconds), but it is never a big deal.

I considered using AWS Lambda + SQS for my current project (a website monitoring service that depends heavily on background jobs), but didn't want to get locked in for such a critical component of my application. In contrast, I can run Sidekiq on a run-of-the-mill virtual server, which is a commodity these days.
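One common way to keep a critical queue swappable is to have application code depend only on a narrow interface. Below is a minimal sketch under that assumption. `JobQueue`, `InMemoryQueue`, and `schedule_checks` are hypothetical, and any Redis- or SQS-backed implementations are left out.

```python
from collections import deque
from typing import Deque, List, Optional, Protocol

class JobQueue(Protocol):
    # The only operations application code is allowed to depend on.
    def enqueue(self, job: str) -> None: ...
    def dequeue(self) -> Optional[str]: ...

class InMemoryQueue:
    # Local/test backend; another backend would expose the same two methods.
    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def enqueue(self, job: str) -> None:
        self._items.append(job)

    def dequeue(self) -> Optional[str]:
        return self._items.popleft() if self._items else None

def schedule_checks(q: JobQueue, urls: List[str]) -> None:
    # Hypothetical app code for a monitoring service: it only knows JobQueue.
    for url in urls:
        q.enqueue(f"check:{url}")

q = InMemoryQueue()
schedule_checks(q, ["https://example.com", "https://example.org"])
print(q.dequeue(), q.dequeue(), q.dequeue())
```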

jmstfv · 6 years ago

I've always stuck with a plain old Python (async now) executable running in a Docker container and reading from an SQS queue. We even use this pattern at my work, and it has scaled flawlessly to millions (maybe billions now) of jobs a day with no problems at all. Having a distinct queue helped us ensure we were truly decoupled, whereas using a framework like Celery led to mistakes where things looked decoupled but really weren't, and you wouldn't find out until you hit the nasty side effects.
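The pattern of a plain async Python executable reading from a queue can be sketched without a framework. This is a hedged sketch, not the commenter's code. `FakeQueue` is a hypothetical in-memory stand-in so it runs without AWS. Real code would call SQS's ReceiveMessage and DeleteMessage actions (for example through boto3 or an async client). The idea it shows is deleting a message only after its job succeeds, so failures get redelivered.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class FakeQueue:
    """Hypothetical in-memory stand-in for an SQS queue (not boto3).

    Like SQS, a received message stays in the queue until it is explicitly
    deleted, so a message whose handler fails is delivered again later.
    (Real SQS hides it for a visibility timeout first; that delay is omitted.)
    """
    messages: Dict[str, str] = field(default_factory=dict)

    async def receive(self, max_messages: int = 10) -> List[Tuple[str, str]]:
        return list(self.messages.items())[:max_messages]

    async def delete(self, message_id: str) -> None:
        self.messages.pop(message_id, None)

attempts: Dict[str, int] = {}

async def handle(body: str) -> None:
    # Hypothetical job: "flaky" fails on its first attempt to show redelivery.
    attempts[body] = attempts.get(body, 0) + 1
    if body == "flaky" and attempts[body] == 1:
        raise RuntimeError("transient failure")

async def process(q: FakeQueue, message_id: str, body: str) -> None:
    try:
        await handle(body)
    except Exception:
        return  # leave the message in the queue so it is retried
    await q.delete(message_id)  # acknowledge only after success

async def consume(q: FakeQueue, max_rounds: int = 5) -> None:
    # A real worker would loop forever with long polling; this stops when empty.
    for _ in range(max_rounds):
        batch = await q.receive()
        if not batch:
            break
        await asyncio.gather(*(process(q, mid, body) for mid, body in batch))

q = FakeQueue({"m1": "email", "m2": "flaky", "m3": "thumbnail"})
asyncio.run(consume(q))
print(q.messages, attempts)
```

Standard SQS queues deliver messages at least once, so handlers written in this style should be idempotent.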

krishan711 · 6 years ago

I think it depends on a number of things, but I've used the following in different scenarios:
- Node worker threads
- Lambdas (often for things like image resizing, etc)
- RabbitMQ (often for things like sending email, etc)

rfitz · 6 years ago
  
  What are the things it depends on?
  
  iamwil · 6 years ago