14 Comments

What are you guys using for background workers nowadays?

Back when I was working with Rails about 4-10 years ago, I used background workers (sidekiq, faktory, delayed job, resque) to offload processing that can happen outside of the request/response cycle, to keep response times down. Examples might be sending email or making different sizes of uploaded images.

What do people of all stacks use nowadays? Do JAM stack devs use web workers and service workers? Or are people using AWS Lambda for this stuff (or Beanstalk? blah)? Or are you using Kubernetes' job runners? Or do you offload it to Zapier? Or do you leverage something specific, like GitHub Actions? Or do you use big data tools like Apache Spark, because it's specific to the job, and a generic runner won't do? Or are there mature services for the specific offline processing you need, so you can just punt it to a 3rd party?

For me, we've co-opted event handlers in Hasura as a poor man's background job runner, and I wish there was something that gave me more visibility into job throughput and performance, easier debugging, chainability between jobs, and integration with 3rd party services. Anyone have a job running framework/platform that they love?

posted to Developers on June 15, 2020
  1. 1

    @iamwil, just read your post about how you used Ruby on Jets to build a job engine for ruby. Sounded awesome so I built a prototype. Came out great.

    I'm curious, did you have some sort of a dashboard with your job tasks? Any way to manage the jobs outside of the AWS login pages?

    Thanks so much. Really appreciate your insight and expertise!!

  2. 1

    We use Google Cloud Kubernetes Engine and Kubernetes's job abstraction.

  3. 1

    I mostly use AWS Lambda or an AWS Batch job using Cloud events, depending on how long the job needs to run. If it's less than 15 minutes, Lambda is a great fit. For anything longer than that, an AWS Batch job, perhaps with Fargate, works great.
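
A minimal sketch of that rule of thumb, assuming you can estimate a job's runtime up front. The 15-minute figure is Lambda's maximum timeout; the function name, the `headroom` parameter, and the returned labels are hypothetical, not part of any AWS API.

```python
# AWS Lambda's maximum configurable timeout, in seconds.
LAMBDA_MAX_SECONDS = 15 * 60


def choose_runner(expected_seconds: float, headroom: float = 1.0) -> str:
    """Pick a runner label from an estimated runtime.

    headroom is a hypothetical safety factor: 1.5 means "assume the job
    may take 50% longer than estimated" before comparing to the limit.
    """
    if expected_seconds * headroom < LAMBDA_MAX_SECONDS:
        return "lambda"
    return "batch"


if __name__ == "__main__":
    print(choose_runner(60))                 # short job
    print(choose_runner(3600))               # one-hour job
    print(choose_runner(700, headroom=1.5))  # close to the limit
```

Adding headroom keeps jobs that sit close to the limit off Lambda, where a timeout would kill them mid-run.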

  4. 1

    Celery (for Python) + RabbitMQ

    I run a bunch of background tasks too with crontab + bash scripts

  5. 1

    I moved my sidekiq workers to jets / Lambda

    1. 1

      Curious: How come? What does serverless get you that sidekiq didn't?

      1. 2

        I was building a data extraction tool for processing a large number of documents.

        Originally, we had 1 box with ~6 threads running Sidekiq to run the text extraction code. We were constantly re-processing 100k+ documents as we updated our tooling. Sidekiq would take hours to process all of the documents after a code update.

        By moving to Lambda, we were able to spawn ~1k workers that processed everything in minutes. To get something comparable with Sidekiq, we would have had to either run Sidekiq on a larger number of EC2 machines that we scaled up and down (while we worked during biz hours) or use Lambda.
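
A rough back-of-the-envelope check of the "hours vs. minutes" claim above. The document and worker counts come from the comment; the one second per document is an assumption chosen only for illustration, and the model ignores cold starts, concurrency limits, and downstream throttling.

```python
import math


def wall_clock_seconds(documents: int, seconds_per_document: float, workers: int) -> float:
    """Time to drain a backlog when each worker handles one document at a time."""
    rounds = math.ceil(documents / workers)
    return rounds * seconds_per_document


DOCUMENTS = 100_000          # "100k+ documents" from the comment
SECONDS_PER_DOCUMENT = 1.0   # assumed, not stated in the thread

six_threads = wall_clock_seconds(DOCUMENTS, SECONDS_PER_DOCUMENT, workers=6)
thousand_workers = wall_clock_seconds(DOCUMENTS, SECONDS_PER_DOCUMENT, workers=1_000)

if __name__ == "__main__":
    print(f"6 threads:     {six_threads / 3600:.1f} hours")
    print(f"1,000 workers: {thousand_workers / 60:.1f} minutes")
```

Under that assumption the backlog takes several hours on 6 threads and under two minutes on 1,000 workers, which is the same order of magnitude the comment describes.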

  6. 1

    Previously used Celery (Python) but mostly do backend in Node/Typescript so I'm using Bull (https://github.com/OptimalBits/bull). It's worked great for us so far.

    1. 1

      I had used bull back in 2014/15ish, and I had some jobs that got stuck. I'm guessing it's improved lots since then.

      I missed the instrumentation and monitoring that sidekiq and faktory gave you though. Usually with these jobs, I want to know the individual throughput, and error rate.

      1. 1

        Agreed.

        For monitoring, we're using Arena (https://github.com/bee-queue/arena). I really wish it was all integrated like Sidekiq tho.

  7. 1

    Sidekiq + Active Job (Rails) backed by Redis. I have processed more than 9 million jobs at this point, but don't remember having problems with them. Occasionally, Redis connections time out (for a few seconds), but it is never a big deal.

    I considered using AWS Lambda + SQS for my current project (a website monitoring service which heavily depends on background jobs), but didn't want to get locked in for such a critical component of my application. In contrast, I can run Sidekiq on a run-of-the-mill virtual server, which is a commodity these days.

  8. 1

    I've always stuck to a plain old Python (async now) executable running in a Docker container reading from an SQS queue. We even use this pattern at my work and it's scaled flawlessly to millions (maybe billions now) of jobs a day with no problems at all. Having a distinct queue helped us ensure we were truly decoupled, whereas using a framework like Celery led to mistakes where things looked decoupled but really weren't, and you wouldn't find out until you got the nasty side effects.
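
A minimal sketch of what such a worker loop might look like, not the commenter's actual code. It assumes a client exposing `receive_message` and `delete_message` with the same keyword arguments and response shape as boto3's SQS client; `InMemoryQueue`, `QUEUE_URL`, `run_worker`, and `max_polls` are hypothetical names so the example runs without AWS.

```python
import asyncio
import json

# Hypothetical queue URL; a real deployment would read this from config.
QUEUE_URL = "https://sqs.example.invalid/123456789012/jobs"


class InMemoryQueue:
    """Test stand-in that mimics the two SQS client calls the worker uses."""

    def __init__(self, bodies):
        self._messages = [
            {"ReceiptHandle": f"rh-{i}", "Body": body}
            for i, body in enumerate(bodies)
        ]

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        batch = self._messages[:MaxNumberOfMessages]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._messages = [
            m for m in self._messages if m["ReceiptHandle"] != ReceiptHandle
        ]


async def run_worker(client, handler, max_polls, wait_seconds=0):
    """Poll the queue, run the handler, and delete only messages that succeeded."""
    processed = 0
    for _ in range(max_polls):
        # The client calls are blocking, so run them in a thread
        # to keep the event loop free.
        response = await asyncio.to_thread(
            client.receive_message,
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=wait_seconds,
        )
        for message in response.get("Messages", []):
            try:
                await handler(json.loads(message["Body"]))
            except Exception:
                # Not deleted: with real SQS the message becomes visible
                # again after the visibility timeout and is retried.
                continue
            await asyncio.to_thread(
                client.delete_message,
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
            )
            processed += 1
    return processed


async def print_job(job):
    print("handled", job)


if __name__ == "__main__":
    queue = InMemoryQueue([json.dumps({"id": n}) for n in range(3)])
    asyncio.run(run_worker(queue, print_job, max_polls=1))
```

The worker only knows the message format, not the producer's code, which is the decoupling the comment describes. Swapping `InMemoryQueue` for a real boto3 SQS client and raising `wait_seconds` for long polling would be the next step.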

  9. 1

    I think it depends on a number of things, but I've used the following in different scenarios:

    • Node worker threads
    • Lambdas (often for things like image resizing, etc)
    • RabbitMQ (often for things like sending email, etc)
    1. 1

      What are the things it depends on?
