Do you prefer SQL or NoSQL for your backend?

by Csaba Kissi

I wonder what DB do you use for your backend.

I usually use MySQL (Postgres). Never used Mongo so far. For very simple projects where security isn't a priority I just use a simple JSON formatted file.

Csaba Kissi

posted to

Developers

on November 27, 2019

Say something nice to ckissi…

Post Comment

6

SQL all the way. We currently have over 250 tables in our SaaS app, and to try and manage that in NoSQL would just be a nightmare. I know there is a popular argument that it is "hard to change the schema" in SQL databases, but if you design your schema right from the ground up, and have processes in place for this, then it is really a non-issue. I much prefer normalised SQL tables over duplicating data n times across several NoSQL tables.

That said, we DO use RethinkDB, which is a NoSQL database, extensively in our app as well - though it is used mainly to capture API data streams from external parties for temporary storage while we work on it. In these cases, we expect differing table structures, so using NoSQL works for this particular use case.

Devan

·
6 years ago
·
Reply
4

It is depend on the project. My default database is postgres. I can use its nosql features as well with json datastore. I think 90% of the projects can just happy with these setup especially because the tooling and infrastructure is very solid around it. we have multi TB database with 10^15 records and we can manage it with optimization. Having said that, i worked with project with mongo, dynamodb, firestore/datastore and all sort of “big data” solutions. all have a place in a complex project either as a primary database - firestore with firebase is especially good for mobile first, quick projects where the direction is not clear -, or secondary database as a part of a data pipeline - we used mongodb to store data from various data provider because their always changing data model and then we processed it and moved the clean data to progress.

szferi

·
6 years ago
·
Reply
1. 1
  
  Wow, 10^15 records are huuuge. I wonder how far can you go with optimization on such a large amount of data.
  
  ckissi
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    Very far. (eg. sub-second query responses with parallelization, of course with appropriate hardware). If you read the engineering blog of large products like twitter you can get a sense that with sharding, replication, tooling, and optimization you can go very very far. (eg: https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html )
    
    szferi
    
    ·
    6 years ago
    ·
    Reply
    1. 1
      
      Koszi Feri!
      
      ckissi
      
      ·
      6 years ago
      ·
      Reply
3

Yay, a data question!

As of now, the plan is to use MySQL (might switch to Postgres) and Cassandra (NoSQL).

I plan to insert user inputs into MySQL tables since I'm grabbing quite a few pieces of data from them that can easily be categorized into certain tables and pulled up via JOINS. This is particularly for scaling purposes, as I don't want to take up a lot of space on whichever cloud service I end up using.

As for Cassandra, that's where I store the data points that I gather from externals sources not provided by the end users (e.g., government data, Glassdoor). The reason why I chose Cassandra (or NoSQL in general) is because these datasets are very large, and NoSQL has the advantage over SQL databases in terms of reads/writes query times.

It's actually recommended to use NoSQL databases for MVP or "pilot" projects since most people end up changing their database structures after deploying their project/product. But if you're pretty confident that the overall architect of the database won't change, then starting out with SQL tables can be beneficial.

Shameless plug-in: I wrote a Medium article not too long ago on how to utilize Cassandra with Python and Docker, for those that are interested. For those that just want to better understand the use cases for NoSQL databases, skip all the way to the bottom to the "Real-World Use Cases for NoSQL Databases " section

https://medium.com/swlh/building-a-python-data-pipeline-to-apache-cassandra-on-a-docker-container-fc757fbfafdd

caitlinjohnson

·
6 years ago
·
Reply
1. 3
  
  I'm currently using SQL for all situations regardless of the state of the project. No problem so far.
  
  ckissi
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    You likely won't experience problems for a while until you start getting into "Big Data" territory, which is where NoSQL has the advantage over SQL. But even then - unless query response times are crucially important for your end users - you probably can keep using SQL without any issues.
    
    I'm only leveraging both NoSQL and SQL because I'm somewhat OCD with data architecture :)
    
    caitlinjohnson
    
    ·
    6 years ago
    ·
    Reply
    1. 1
      
      Storage tech is so varied and specific to use case these days that it's a bit problematic to even begin the conversation at SQL vs. NoSQL. But I can see the simplification is useful for discussion. The choice of technology itself is more often a function of your needs for consistency, atomicity, availability, authorization, write vs. read throughput, etc.
      
      I hope I don't come off too negative here, but what drove you to the conclusion that "NoSQL has the advantage over SQL" in the realm of "big data"? I'd be interested in some literature to back that up. Don't know that it is too wise to spread blanket statements like that.
      
      I'm making some assumptions about your definition of "advantage" here, but let's take "read and write performance" — it's more likely a function of use case, data modeling, query optimization, hardware, network, and plenty of other things that aren't specific to any particular technology.
      
      Technology choice is a variable there, but it's usually dependent upon the others (i.e. picking a bad technology for your use case will impact performance; e.g. picking HDFS when you don't have large table operations that can't fit into memory will probably not be as efficient as another choice).
      
      Speaking of big data: what kinds of operations are we talking for "big data" — just storage, or are we going to need to be performing real-time analytics? SQL vs. NoSQL might not even make sense in that context.
      
      adamhp
      
      ·
      6 years ago
      ·
      Reply
      1. 1
        
        "what drove you to the conclusion that "NoSQL has the advantage over SQL" in the realm of "big data"? I'd be interested in some literature to back that up".
        
        Sure: read up on Eric Brewer's work on CAP Theorem which proves you can simultaneously only ever guarantee two of: Consistency, Availability and Partition Tolerance. The third option can only be provided 'eventually'.
        
        Relational Databases (RDMS) traditionally favour Consistency and Availability. Partitioning is available - but it's eventual (usually via Master - Slave replication). Traditionally to scale up a relational database, it involves beefing up the database server to a more powerful machine - but this becomes expensive quickly and you can only go so far. Partitioning is what's required (so you can scale horizontally across multiple servers) - but this isn't guaranteed at real-time.
        
        NoSQL are typically either Available/Partition Tolerant (AP) - such as Cassandra. Or Consistency/Partition Tolerant (CP) such as Mongo.
        
        Really, what kind of database you decide upon depends entirely on the problem you're trying to solve. Look at CAP Theorem - and choose the two most important facets for your problem. And from that choose the right tool for the job. There's a diagram here which can help you decide on a tool: https://i.stack.imgur.com/a9hMn.png
        
        NathanSchultz
        
        ·
        6 years ago
        ·
        Reply
      2. 1
        
        Storage tech is so varied and specific to use case these days that it's a bit problematic to even begin the conversation at SQL vs. NoSQL
        
        No one should be having the "SQL vs. NoSQL" conversation. Instead, they should be asking themselves "What datasets should I put in a SQL database? And which ones should I put in NoSQL?"
        
        but what drove you to the conclusion that "NoSQL has the advantage over SQL" in the realm of "big data"?
        
        Not necessarily just me, but numerous people out there stating the advantages of NoSQL vs SQL when it comes to Big Data. They are more scalable than SQL but that usually comes at the cost of transactional guarantees. Read/write query response times become relatively faster with NoSQL compared to SQL as the size of the dataset grows. This has been proven at my previous place of employment when we compared response times between our MongoDB dataset and our Postgres dataset (we use same kind of data and executed same queries to eliminate bias).
        
        Don't know that it is too wise to spread blanket statements like that.
        
        You sound fun.
        
        I'm making some assumptions about your definition of "advantage" here, but let's take "read and write performance" — it's more likely a function of use case, data modeling, query optimization, hardware, network, and plenty of other things that aren't specific to any particular technology.
        
        You're not wrong as it relates to the function of use case, data modeling, etc. Not everyone is going to want the fastest response time if they care more about having normalized tables and saving space. But the assumption regarding "advantage" is wrong - I'm specifically talking about read/write responses in Big Data sets.
        
        Speaking of big data: what kinds of operations are we talking for "big data" — just storage, or are we going to need to be performing real-time analytics? SQL vs. NoSQL might not even make sense in that context.
        
        Both. I'm not talking about Big Data in terms of storing years of legacy data to meet federal/state requirements. I'm talking about Big Data in terms of (1) the data being received from end-users and external parties and (2) the utilizing of said data for real-time usage (e.g., reporting, end-user interactions).
        
        caitlinjohnson
        
        ·
        6 years ago
        ·
        Reply
2. 1
  
  Cassandra is a great tool - but it has hidden maintenance costs. Highly suggest using a managed service if you are going the Cassandra route.
  
  For the company I work for currently we were able to get enough performance out of SQL w/ the exact schema we used in Cassandra + a clustered index. (high resolution sensor data) Switching to SQL saved us lots of money in hosting, and in dev maintenance time.
  
  connorl33
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    I've seen articles that suggest Cassandra in production requires a dedicated head to manage any sized cluster, but I've also seen folks say that it's not necessarily true and that it can be avoided if done right (or with minimal maintenance). I'm willing to educate myself on how to properly structure Cassandra databases and avoid the headaches, especially since it will also benefit me with my current day job.
    
    'preciate the concerns though!
    
    caitlinjohnson
    
    ·
    6 years ago
    ·
    Reply
2

Don't use NoSQL as your source of truth! It seems easy at first and gets very hard very quickly.

MongoDB is the database I have worked with most (~6 years) during my professional life so far, other than that there was a bit of CouchDB and S3, but mostly SQL variants like MySQL, Postgres (~5 years) and or Microsoft SQL.

I've used MongoDB in two different projects and each time the project ended up being ported to Postgres.

What made MongoDB attractive at first were two things: we dind't really have relational data in the beginning and it seemed easier to get started with, since you don't have to think about schema.

Our data does get more and more relational over time though, as more requirements become apparent. Maintaining relations in NoSQL is a pain. Nested JSON objects become harder to query and have to be offloaded into their own collections eventually. There's no ON DELETE CASCADE in MongoDB.

The other trap is not having to think about schemas, because in the end, you do have to think about your schema, in your application logic. "you don't have to write migrations in MongoDB!" they said. Turns out, you do have to write migrations, when your app changes, but it's really hard. Either you end up writing a script looping over your database entries, or you write the migration as part of your application logic, or both.

What I found the hardest for me was actually writing queries in MongoDB. If you think SQL is hard, try MongoDB. Simple queries in MongoDB are very simple at first (see https://docs.mongodb.com/manual/reference/sql-comparison/ ), but anything more complex ends up in a huge JSON monstrosity that makes you beg for some SQL, just look at this SQL-to-MongoDB chart: https://medium.com/@pranava_swaroop/sql-to-mongodb-mapping-chart-2d7ccda5dbaf Want to know which clients have written more than three comments during the last two weeks? Good luck with that!

I don't know why anyone would not use Postgres. It gets more awesome features every year: text search is pretty darn good and fast since last year, and it just got an awesome calculated column feature these days and lets you have JSON with indexes if you want.

There were times when I had used NoSQL successfully: for caching, or scaling (S3), or when distributing a simple web app for offline use with easy sync (CouchDB) but always in combination with an SQL database as the source of truth.

strathausen

·
6 years ago
·
Reply
1. 1
  
  I agree with you on most of the points. Do you use an ORM with Postgres? What web framework do you use?
  
  berk
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    I'm usually using an ORM, but occasionally I was also fine just using raw queries with parameters. I'm using nodejs with express or koa, and django.
    
    strathausen
    
    ·
    6 years ago
    ·
    Reply
2

Great question! I’m not an expert at all but I was kind of under the impression that NoSQL was a fad and was dying out but actually especially on IH I’ve heard of a lot of people using it still. Would love to hear what the appeal is.

I use Postgres for all my projects that need a db

desmondmc

·
6 years ago
·
Reply
1. 2
  
  NoSQL is definitely not a fad. One of AWS's most used services is DynamoDB, which is their NoSQL solution. Unlike RDS or any SQL database, DynamoDB is fully scalable without you ever having to set up replication or sharding, taking care of bin logs etc.
  
  NoSQL is great if you dont know the database structure of your project, or if you do know the structure & you know that schemas will be changing periodically.
  
  As you know, in SQL, if you had to change the schema of a table, it would be an absolute headache. Especially if your product is live.
  
  sosa067
  
  ·
  6 years ago
  ·
  Reply
  1. 2
    
    Interesting. I guess i only thought that because some senior backend guys at my old job used to talk about how they switched to MongoDB once and it was a huge mistake and they had to switch back. Do any companies use NoSQL at huge scale? If so who? I'm really curious
    
    desmondmc
    
    ·
    6 years ago
    ·
    Reply
    1. 1
      
      It's never a question of "Is 'generic storage technology' bad or good?"
      
      It's always a question of which storage technology is going to fit my use case and my data needs.
      
      adamhp
      
      ·
      6 years ago
      ·
      Reply
    2. 1
      
      I'm really curious too. According to this link, https://www.mongodb.com/who-uses-mongodb seems lot of big guys use it.
      
      ckissi
      
      ·
      6 years ago
      ·
      Reply
      1. 1
        
        You have to remember that MongoDB is just a variant of NoSQL. There's still DynamoDB, CassandraDB, Redis, CouchDB etc.
        
        sosa067
        
        ·
        6 years ago
        ·
        Reply
  2. 1
    
    This is why I was thinking about Mongo for one of my next projects.
    
    ckissi
    
    ·
    6 years ago
    ·
    Reply
  3. 1
    
    "As you know, in SQL, if you had to change the schema of a table, it would be an absolute headache. Especially if your product is live." yes, that's true. This is why it's really worth it to push enough effort to design your schema from start.
    
    ckissi
    
    ·
    6 years ago
    ·
    Reply
2. 1
  
  I haven’t used a sql db in the last 15 years, only non sql.
  Totally depends on what data you’re trying to store.
  
  jaakit
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    This is interesting you were able to exist totally without relational DB.
    
    ckissi
    
    ·
    6 years ago
    ·
    Reply
    1. 1
      
      Our IT company was a bit ahead of it's time with Software AG Tamino XML Server.
      After doing the courses almost 18 years ago and seeing how much simpler and more powerful NoSQL is, it became my natural choice for most things. That said, I did spend about eight years on postgresql for larger systems. SQLITE is also really cool.
      
      I'm a big fan of key-value stores though. Simplicity is always a goal for me.
      
      jaakit
      
      ·
      6 years ago
      ·
      Reply
3. 1
  
  It's not dying out - if anything, it's growing in usage. Netflix, Facebook, Uber, and other big companies use NoSQL databases, in addition to a lot of startups.
  
  caitlinjohnson
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    Yes, true. Probably most of them use it in parallel with some kind of SQL DB.
    
    ckissi
    
    ·
    6 years ago
    ·
    Reply
    1. 1
      
      Exactly! They both have pros/cons and can be leveraged to those advantages wherever appropriate.
      
      caitlinjohnson
      
      ·
      6 years ago
      ·
      Reply
1

I'm using mongo on my current project. I had a hard time getting PostgreSQL working with node and express. Any tutorials on getting them connected and pushed to production?

chazie9

·
6 years ago
·
Reply
1

I decided to go with MongoDB for my current project. It’s flexible and easy to work with in NodeJS. By the nature of my app, the amount of data each user can reasonably generate is relatively small. I won’t be offering a free tier initially so, unless a paying customer decides to spam the service, I should hit scaling issues years down the line (if ever) with several thousand customers onboard. At that stage, I’ll either migrate to a more robust solution or, hopefully, be able to hire someone who can do that better than I could.

radek

·
6 years ago
·
Reply
1

As most previous posts already say... It really depends.
For my side project, I use a multi-model DB, mainly a document store. in both projects, I don't handle any billing/finance data on my side, so this is IMHO ok.
In the company I work we build software for telecom providers, where we have messaging part, transport and billing/finance and audit log domains and for all those cases we have different storage solutions in place. nontransactional data in mongo, auth stuff in Redis, billing in Percona/MySql cluster with sharding, audit log use time-series type storage.

altafino

·
6 years ago
·
Reply
1

I used to be a hardcore postgresql dude, but on my latest project switched to nosql, makes prototyping way easier cuz you don't get bogged down in migrations and stuff when you decide you need to add or delete a column

armen

·
6 years ago
·
Reply
1

I'm using MongoDB. Mostly because it's what I'm familiar with plus it's essentially JSON and I work primarily in JS.

nprail

·
6 years ago
·
Reply
1
If you find yourself asking this question, the answer is clear - SQL database.
You start thinking NoSQL only if you faced with:
- existing system with severe performance problems
- highly volatile data strictures (which is extremely rare).
javadiagrams

·
6 years ago
·
Reply
1. 2
  
  I'm curious as to why those are the only reasons you'd consider NoSQL over SQL?
  
  I'm working on a project that's built with SQL but it really shouldn't have been. And not for those reasons.
  
  nprail
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    I come from a pragmatic experience working in a different companies (and my own startup).
    
    If you asking as an employee: make friends with your architect and decide it over a beer.
    
    If you asking for your own product: think of your next potential customer and see what gets him on board faster with less QA time (that would be SQL I assume).
    As sad as it is... I don't see any reasons to rebuild architecture if your projects has traction and no pressing performance limitations.
    
    javadiagrams
    
    ·
    6 years ago
    ·
    Reply
    1. 2
      
      I still don't understand how SQL will automatically result in less QA time (I assume you mean Quality Assurance?). For the aforementioned project, SQL has actually slowed us down because the nature of the data makes NoSQL a much more logical choice. I'm not saying SQL never makes sense. There are a lot of cases were it does. It just depends on the project and who is building the project.
      
      But yeah, definitely not a good idea to rebuild the architecture of a project just to change the database without a really good reason.
      
      nprail
      
      ·
      6 years ago
      ·
      Reply
1

I naturally choose no sql as it fits better with the data I use.
Funny thing is that you say you use a Json file, which is nosql🤟🏼👍

jaakit

·
6 years ago
·
Reply
1. 1
  
  Yes, but it's super small. Let's say not more than 20 records. Mostly just for settings. Like config store.
  
  ckissi
  
  ·
  6 years ago
  ·
  Reply
1

Would you believe me if I said that I get pretty far using an Excel spreadsheet converted to CSV?

I'm successfully using MongoDB on some things and MariaDB on others. Redis tends to make an appearance in all of my projects. All have their pain points and all are good enough if you are already familiar enough with them.

Getting hung up on the database technology tends to be a scale problem most people don't / may never have.

Funny that you mention JSON files. I too go that route on smaller stuff. That's usually my MVP database if it's going to be read-only.

joshtronic

·
6 years ago
·
Reply
1. 1
  
  Do you use Redis for caching purposes?
  
  ckissi
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    Primary, yeah. Also use it a ton of persistent storage of statistical data (DAU / MAU tracking and stuff like that) as well as anything that I want to stuff in there with an expiration. Even though I use it a ton for persistent data, I still wouldn't use Redis as my primary data store (just the fact that it's 100% in memory makes it less than ideal).
    
    Been exploring using MongoDB more for the caching of things, but to the best of my knowledge (based on extremely limited research) there's no concept of a TTL / expiration date, which would rule it out for most of my use cases as a Redis replacement. MongoDB does have in-memory capabilities which is what has had me considering using it in place of Redis for caching.
    
    Hasn't been touched in these threads at all that I have seen, but regardless of which server you end up going with, including a caching layer is going to speed things up quite a bit. Feel like people lean into picking A technology instead of picking a series of technologies that can complement their strengths (Redis / Memcached in front of any RDBMS servers as a good example)
    
    joshtronic
    
    ·
    6 years ago
    ·
    Reply
    1. 1
      
      Been exploring using MongoDB more for the caching of things, but to the best of my knowledge (based on extremely limited research) there's no concept of a TTL / expiration date, which would rule it out for most of my use cases as a Redis replacement. MongoDB does have in-memory capabilities which is what has had me considering using it in place of Redis for caching.
      
      What about setting up a daily script to delete data that's older than ____ hours/days/months? I'm unaware of your tech stack so that may not be feasible in your situation, but at my previous job we had a daily AWS Lambda function that did this for us.
      
      caitlinjohnson
      
      ·
      6 years ago
      ·
      Reply
      1. 1
        
        Definitely an option, although I personally like to avoid too many moving parts so the built in mechanism is preferred :)
        
        With that, the project of mine that I was considering moving the caching from Redis to MongoDB has a mechanism in place already that clears the cache on deploy (since that's the only time things need cleared) so wouldn't take much for me to do that.
        
        joshtronic
        
        ·
        6 years ago
        ·
        Reply
        
        1
        
        Definitely an option, although I personally like to avoid too many moving parts so the built in mechanism is preferred :)
        
        That's fair! I absolutely resonate with this. Good luck!
        
        caitlinjohnson
        
        ·
        6 years ago
        ·
        Reply
1

I prefer datalog :-) https://youtu.be/hicQvxdKvnc

maxweber

·
6 years ago
·
Reply
1. 1
  
  Interesting. Never heard about it.
  
  ckissi
  
  ·
  6 years ago
  ·
  Reply
  1. 1
    
    You can think of it like https://immutable-js.github.io/immutable-js/ as a database.
    
    maxweber
    
    ·
    6 years ago
    ·
    Reply
1

This comment was deleted 5 years ago.

DeletedUser

·
6 years ago