Developers February 26, 2020

Database design for storing chats


I am interested in the best design for storing messages between different users something like WhatsApp you can have group chats and whatnot. I have experience with no SQL databases but what is the best way to store the messages as I can see it become extremely large.

Would it help to cache some of the messages to help reduce API calls to the DB?

What would the schema look like?

For a no SQL DB I imagined having the usernames of people in a chat separated by colons as a key e.g username1:username2. This would mean usernames can't have colons. That key could then have all the messages in the chat and from who it was sent from and with time stamps.

So essentially like with any DB I am trying to make it efficient and scalable any ideas?

  1. 3

    I would use a standard SQL db.

    Very simplified table structure:


    • id
    • name
    • etc


    • id
    • sender (user id)
    • recipient (user id)
    • content
    • timestamp


    • sender (user id)
    • recipient (user id)

    (Would be different if conversations can ave more than 2 participants).

    You would probably need to look into pagination (loaded as needed convo history chunks).

    If your application server is struggling with a high number of requests, and many of those requests are returning the same data each time (at least for some conversations), you could throw a redis cache or similar in front of the whole thing

  2. 3

    A lot of it comes down to the purpose of storing these messages, and how often they're going to get accessed. In general though, you can probably reach decent enough scale with any mainstream DB if you know what you're doing with it.

    My concern with the NoSQL approach here is for a significantly long chat log you're probably going to end up with some crazy long entries. Storing all chats a user has ever made under a single entry will probably be quite long, so you may need to start looking at ways to break it up. I'm most familiar with Mongo, and here you can start doing references and such, although I'd argue at that point you're starting to move towards a more SQL like structure.

    A standard SQL solution would probably suffice, with each each row being a chat message. You could reconstruct any conversation from there. This would scale fine, and gives you a very easy way to archive messages later on.

    In terms of caching, you can look at this independent of your DB. You can certainly cache the messages with any number of caching services if you expect the data to be accessed enough. Caching comes with own bag of worms in terms of your invalidation strategy, so I'd make sure that caching has enough benefits to spend time solving these problems.

  3. 1

    Let's start with some basic assumptions and create very naive (yet powerful up to a good amount of load) solution.

    1. Each conversation will be in a, virtual chatroom. (Even 2 people are talking to each other, still this is a chatroom with 2 people)
    2. As soon as new chatroom is created (one party send the first message to other), generate a unique id for chatroom. For simplicity, let's assume we are generating incremental integer IDs for each conversation (starting from 1)
    3. Out app will be viral and it's going to attract lots of users and so many messages will be passed between people. We can't rely on only one table for storing/querying messages. We need multiple tables.

    Now, let's go and create 10 tables with schema. (Do not append messages to each other. They all are different items.) (Our schemas primary/partition key is chatroom_id and sort key is sent_at)

    table1 > {chatroom_id, sender_name, message, sent_at}
    table10 > {chatroom_id, sender_name, message, sent_at}

    Your application logic will calculate which table to read/write for each chatroom. Let's say I am the first one sending message via this system. So my chatroom id is 1, application will perform 1mod10 operation and write my conversation in to table1.

    If your chatroom id is 78, than 78mod10===8 so your conversation will be written into table8.

    While querying chatroom messages, basically perform "get last 100 messages from table8 with chatroom_id is 78"

    Does such design makes sense? I hope it helps

  4. 1

    I would use ArangoDB for chats. I have played with it and it seems great as you can make these graph like structures.