Developers October 26, 2020

Using MongoDB as a Message Queue

Shivansh Vij @doctorqubit

Hi All,

We've recently been testing MongoDB as a persistent message queue vs something like RabbitMQ for https://lynk.sh and https://pigeonpost.io. I'm trying to decide if we should write a blog article explaining how we went about using both tools or if there's already a general consensus.

Edit:
It looks like there's definitely a general consensus — we'll be publishing a blog article about our adventures with MongoDB queues sometime next week!

Blog article on MongoDB vs RabbitMQ for message queues
  1. I'm interested!
  2. It's definitely RabbitMQ
  3. It's definitely MongoDB
  1. 2

    Oh, I've never heard of MongoDB being used for a message queue. Does it have data structures to support locking and popping items from a list (async or blocking)? How do you handle ACKs and failure scenarios where you need to add items back to a queue? Because if it doesn't have data structures to support these mechanisms, is it any better than using a database like Postgres?

    At the moment I use Redis for message queues - I honestly thought Redis was the standard, and it's pretty easy to use too. I've used RabbitMQ before, but it makes more sense at higher volumes and when you need to support different queue types (I've used it on a system that handled 1M+ queue items per day).

    1. 1

      My issue with Redis is just that it's an in-memory DB. It's not persistent, and I really like the idea of having our job states nice and persistent.

      MongoDB is also right there and isn't another piece of the puzzle that the API or Infrastructure teams need to manage.

      As for ACKs and locking, I've built a pretty solid way of atomically updating the status of a job to "processing" so no other workers pick it up; then, if our lifecycle manager sees a job stuck for too long or a dead worker, it re-queues the jobs that were being processed.
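
      The atomic claim described above could be sketched as follows. The collection layout and field names (`status`, `worker_id`, `locked_at`) are assumptions, not the poster's actual schema; the helpers only build the filter/update documents that would be passed to the driver (e.g. pymongo's `find_one_and_update` for claiming and `update_many` for re-queueing):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical staleness threshold for the lifecycle manager.
STALE_AFTER = timedelta(minutes=5)

def claim_filter_update(worker_id):
    """Filter/update pair for an atomic claim. Because the status
    check and the status change happen in one server-side operation
    (find_one_and_update), two workers can never claim the same job."""
    filter_doc = {"status": "queued"}
    update_doc = {"$set": {
        "status": "processing",
        "worker_id": worker_id,
        "locked_at": datetime.now(timezone.utc),
    }}
    return filter_doc, update_doc

def requeue_filter_update(now=None):
    """Filter/update pair a lifecycle manager could pass to
    update_many() to put jobs stuck in "processing" back on the queue."""
    now = now or datetime.now(timezone.utc)
    filter_doc = {"status": "processing",
                  "locked_at": {"$lt": now - STALE_AFTER}}
    update_doc = {"$set": {"status": "queued"},
                  "$unset": {"worker_id": "", "locked_at": ""}}
    return filter_doc, update_doc
```

      With pymongo this would look like `jobs.find_one_and_update(*claim_filter_update("worker-1"))`; the returned document (or `None`) tells the worker whether it won a job.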

      1. 1

        Yeah, I get the idea of reducing tech debt and keeping the stack simple. But that's a common misconception about Redis: it can be configured to persist to disk (anyone running Redis in production should do this).

        I work in the Python ecosystem, and for workers and schedulers there are libraries built for this purpose. Most of them support Redis by default, but the larger tools also support Kafka and RabbitMQ as the message queue.

        So I've never had to build this ACKing and locking logic manually. I've had to understand it to tweak it for my specific tasks when needed.

        But yeah, this is why I use Redis and don't really think much about it. Any production app needs async tasks, scheduling, and background workers, so I don't think building your own makes sense unless your use case is really special.

  2. 2

    I suppose it depends on the complexity of your needs, but Mongo isn't really designed to be a message queue system and I would not be surprised if you ran into issues down the road. As others have mentioned, SQS is good for this. Personally, I am a big fan of GCP PubSub.

  3. 2

    Have you tried SQS?

    1. 1

      We were using AWS SQS back when our infrastructure was hosted on AWS. We've been trying to migrate as much of our infrastructure onto Digital Ocean as possible, and we would have used SQS if it was necessary but since we found a working solution with MongoDB we decided to go in that direction.

  4. 1

    Change streams may help? I haven't used them, but they seem very relevant.
    https://www.mongodb.com/blog/post/an-introduction-to-change-streams

  5. 1

    If you already have MongoDB in your deployment setup, this does seem attractive.

    Are you using some type of pub/sub with MongoDB? AFAIR that only works with tailable cursors on capped collections, which basically means data will get thrown away at some point.
    That may be OK for a message queue, though. But if you're basically polling to fetch your next set of commands, it's not a great architecture, depending on the frequency/size of the messages.
    Redis seems much more designed for a push service, and I think Postgres is better in that regard too.
    What else does RabbitMQ give you? It seems much more sophisticated in terms of things like being able to send messages to more than one host (fanout routing) and more control over acking when messages are handled.

    A very minimal example in 10 lines:
    https://gist.github.com/scttnlsn/3210919/fd275d69eb3dd3c9586e342d38187eabad809e35
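
    If polling is the route taken, its cost can at least be bounded with a backoff. Here's a rough sketch; `fetch_next` stands in for whatever atomic claim query the consumer runs, and the delay parameters are made-up defaults:

```python
import time

def poll_loop(fetch_next, handle, min_delay=0.05, max_delay=2.0,
              max_iterations=None):
    """Repeatedly poll fetch_next() for a message. On an empty poll,
    back off exponentially up to max_delay so an idle consumer doesn't
    hammer the database; reset to min_delay while the queue is busy."""
    delay = min_delay
    i = 0
    while max_iterations is None or i < max_iterations:
        i += 1
        msg = fetch_next()
        if msg is None:
            time.sleep(delay)          # empty poll: wait before retrying
            delay = min(delay * 2, max_delay)
        else:
            handle(msg)
            delay = min_delay          # busy queue: poll again immediately
```

    `max_iterations` is only there so the loop can be exercised in tests; a real worker would run it unbounded.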

    1. 1

      We were looking at cursors, but like you said, data might get thrown away. When it comes to sending emails, we definitely can't lose data. That's the main reason I don't want to use something like Redis for this.

      As for polling, we've actually pipelined our polling process so that one part of the worker fetches and locally queues emails while a different part takes jobs from the local queue and sends them as fast as it can. This keeps things running really fast and lets us scale out really easily.

      The usual problem with pipelining is the added latency: a message takes N seconds to get polled and added to the local queue, and another M seconds to be processed from the local queue.

      Thankfully, like I said, since we roughly know how long it takes to send an email, by keeping our local queues capped we can basically guarantee an email will get sent with about the same queue wait time as it would have with something like RabbitMQ.
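
      The two-stage pipeline with a capped local queue could be sketched like this (the function names and the cap are illustrative, not the poster's actual code); capping the local queue is what bounds the extra wait, since a claimed job sits behind at most `cap` others:

```python
import queue
import threading

def run_pipeline(fetch_batch, send, cap=10):
    """Two-stage worker: a fetcher thread pulls jobs from the database
    (via fetch_batch) into a bounded local queue while a sender thread
    drains it. An empty batch from fetch_batch signals shutdown."""
    local = queue.Queue(maxsize=cap)   # capped, as described above
    DONE = object()                    # sentinel to stop the sender

    def fetcher():
        while True:
            batch = fetch_batch()
            if not batch:
                local.put(DONE)
                return
            for job in batch:
                local.put(job)         # blocks when the local queue is full

    def sender():
        while True:
            job = local.get()
            if job is DONE:
                return
            send(job)

    threads = [threading.Thread(target=fetcher),
               threading.Thread(target=sender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

      Because `local.put` blocks once the cap is reached, the fetcher naturally stops claiming jobs from MongoDB whenever the sender falls behind.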

  6. 1

    I've been using MongoDB as a queue (emails, letters) for a few years, with no complaints for my use cases. I'd be interested to see your implementation.

  7. 1

    When I built ZoneMTA (https://github.com/zone-eu/zone-mta) I went with MongoDB instead of RabbitMQ because it was impossible in RabbitMQ to limit messages per destination, unless you created a dedicated queue for every MX, which would be super cumbersome. When you have thousands of emails queued for a single MX, you do not want to process them all right away, as the MX usually limits parallel TCP sockets from a single IP. You want to be able to take messages from further down the queue that are for some other MX, otherwise you end up with a bottleneck (e.g. you could process tens or hundreds of messages in parallel, but all the messages at the top of the queue are for an MX that only allows 2 open sockets). MongoDB is less efficient, but you can use queries instead of blindly taking whatever comes next.
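
    The selective dequeue being described — skip messages for saturated MXes instead of taking the head of the queue — can be illustrated with a small sketch. In ZoneMTA the selection is a MongoDB query; this is a pure-Python stand-in for the idea, and the field names and default limit are assumptions:

```python
def next_sendable(queued, active_counts, limits, default_limit=2):
    """Walk the queue in order and return the first message whose
    destination MX still has a free connection slot, skipping messages
    for MXes that are already at their parallel-socket limit -- the
    selective dequeue a strict FIFO queue can't express without one
    queue per MX."""
    for msg in queued:
        mx = msg["mx"]
        if active_counts.get(mx, 0) < limits.get(mx, default_limit):
            return msg
    return None   # every queued message targets a saturated MX
```

    With MongoDB the same effect falls out of a filter like `{"mx": {"$nin": saturated_mxes}}` on the claim query, rather than scanning in application code.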

  8. 1

    Having used RabbitMQ for a lot of things: was there something wrong with RabbitMQ's persistence capabilities? Using it requires syncing each message to disk before processing, but as you said, it seems like real-time (or near-real-time) performance isn't necessary for your use case. How would you handle scaling message-queue consumption and routing with MongoDB as a queue rather than RabbitMQ?

    Also, have you considered something like Kafka? Sure, it's probably overkill, but it would provide you with many more features and room to scale over time.

    In a way, I definitely want to know more about how you're doing this.

    1. 1

      Long story short, we have a pretty simple use case. We need to queue emails to be sent by our in-house mail servers for https://pigeonpost.io, and we know that sending an email can take up to 3 seconds depending on the receiver's mailbox.

      The only thing wrong with RabbitMQ was deploying and managing it. With MongoDB we get to use MongoDB Atlas for all the heavy lifting, but with RabbitMQ, unless we pay for an external cloud hosting service like CloudAMQP, we have to do it ourselves.

      We've set up a local queue with MongoDB which basically allows us to queue a message for sending while pulling down another message from the database. This keeps things properly pipelined, and it only really works because we don't need the speed of RabbitMQ. Scaling is also extremely straightforward, since we can just use different collections with different status markers to scale up.

      1. 1

        Have you tried an email service such as Postmark? They have built-in queues as part of their service.

        1. 1

          We're actually in the process of building a transactional email competitor to Postmark called Pigeon Post (https://pigeonpost.io). Obviously we want to be able to send our emails as fast (or in some cases faster) than our competitors so it was important for us to explore the various pros/cons of infrastructure decisions.

          1. 1

            Interesting! But your link to the docs is broken for me. Love that the pricing scales down to zero. Pay-as-you go FTW!

            1. 1

              Thanks! We've made sure that all of our products have a big free tier.

              We design our services to be as cost-effective as possible, and since we're a small team that's just looking to build good tools and get noticed in the community, we're not really after ridiculous profit margins either. Keeping our costs down means keeping our customers' costs down.

              As for the docs links, we're going to be releasing those in our closed beta next week, so we have it offline for now. If you sign up for our newsletter we'll make sure you have a spot in the closed beta!

  9. 1

    We've done a lot of internal testing, and while RabbitMQ is faster, it's much more cost-effective to use MongoDB if you don't need near-instantaneous job delivery.

    In situations where a queue builds up, we found there's actually very little difference between the two.
