11
53 Comments

Let's discuss databases? :D I would like to know about your database and your journey in using it

I wanted to check-in and ask developers to discuss about the databases being used underneath their applications. I'm very much interested in databases and most applications use them, so I have a few questions for You to start off the discussion! :D

What kind of database are you using? Relational database? Graph database? Key-Value store? And why did you choose to use it?

Which database are you using? Postgres? MySQL? Redis? Something else? And why did you choose to use it?

Do you self host your database? Why so? And how do you find managing the self-hosted database?

Or do you use a Cloud service (DBaaS) like AWS, GCP, Azure etc DB services? And why did you choose to use the service?

  1. 6

    That is a lot of questions :D Let me go through them one by one for Pirsch Analytics (pirsch.io).

    What kind of database are you using? Relational database? Graph database? Key-Value store? And why did you choose to use it?

    We use a relational database (Postgres) for user data and configuration. The database is tiny (< 200kb) right now. It's really only there to store emails, passwords, your dark/light mode preference, client IDs and secrets, and so on.

    The real data (website statistics) are stored in ClickHouse on a more beefy machine. It's about 12GB right now (we started in January) and most of the data got created by a single client within two days (~10 million entries). ClickHouse is an OLAP database and exactly what you need when doing analytics. It's amazing how fast it is. All data on our dashboards are calculated life, with no pre-aggregated statistics.

    Do you self host your database? Why so? And how do you find managing the self-hosted database?

    Yes. We use Docker and Compose for Postgres, making it really easy to set up and maintain. ClickHouse was installed on Ubuntu for maximum performance. We had to adjust a few configuration files, that get sometimes messed up when updating the database. But otherwise, it was really easy to install. I would say you should only do this if you have some prior experience with Linux.

    Or do you use a Cloud service (DBaaS) like AWS, GCP, Azure etc DB services? And why did you choose to use the service?

    No, but our VMs are based on Hetzner Cloud (really good pricing btw!). The reason you should consider setting it up yourself is pricing, really. Managed databases are nice, but can get really really expensive and you don't have as much control.

    1. 2

      Thanks for sharing your experience @m5blum ! And yes, I used to work for a client, a big company who opted to use their own hosted databases and we were one of the platform teams managing it, and their reason was cost too, as they had too many Postgres and Redis databases and few others too like MongoDB and MySQL, and also many Kafka message brokers and a few RabbitMQs. I always wondered about the cost of management though, because I remember some of our screw ups 😅 and some difficulties

      Do you think managing is not much cost? I guess ClickHouse is easy to self host and manage? Also, what kind of control do you mean when you say you don't have much control in DBaaS services? I mean, I do know that some of them don't allow customizations, for example, long ago when were managing Postgres instances, the client mentioned how Google at the time didn't allow managed Postgres to use Postgres Extensions, not sure what's the case now. And the client used Postgres Extensions heavily. Is that the kind of control and customization you mean? And / what other kind of control are you looking for?

      And are there managed services for ClickHouse? I know some databases and data stores that don't have managed services always, or even if they do, the options are pretty less, hence the question

      Btw, I have heard about Hetzner and few others too before. It's unfortunate that big corps are getting too much of the pie when there are many good players in the cloud market. Good to hear Hetzner is working out great for you! I think the cloud is all about the trust and some of these big corps have lots of it by default in many cases, unlike the smaller players.

      1. 2

        I always wondered about the cost of management though, because I remember some of our screw ups 😅 and some difficulties

        If you need to hire someone to manage that many databases, it's probably cheaper to use a service :)

        Do you think managing is not much cost? I guess ClickHouse is easy to self host and manage? Also, what kind of control do you mean when you say you don't have much control in DBaaS services?

        As long as I can do it myself, yes it's pretty cheap. ClickHouse doesn't require too much maintenance. I've done two updates this year. One with difficulties, leading to one hour of downtime, because of configuration mess up.

        What I'm looking for when setting up my own DB is control over performance-critical configuration and data ownership. I haven't done any optimizations to the configuration yet, but with ClickHouse you have many options to get the maximum out of it. This highly depends on the use case, so I'm not sure managed services can replicate that. What I mean by data ownership is that I know where my data is hosted, I have access to it, and can access the hard drive if required. I think this is important when building a privacy-friendly service.

        And are there managed services for ClickHouse?

        Sure, there is the obvious choice from Yandex (https://cloud.yandex.com/en/services/managed-clickhouse) and other bigger contributors, like Altinity (https://altinity.com/). But you can just google for offers.

        1. 1

          If you need to hire someone to manage that many databases, it's probably cheaper to use a service :)

          Haha, yeah

          As long as I can do it myself, yes it's pretty cheap.

          Makes sense!

          because of configuration mess up.

          I have been hearing a lot of people talk about configuration testing, and how many downtimes have been due to configuration issues. I guess we as a industry will soon start doing more and more of configuration testing too

          I think this is important when building a privacy-friendly service.

          Ah, okay. Do you think managed services are not privacy friendly? A lot of us do use VMs from the cloud along with storage, network etc. Using their SaaS on the cloud should be okay too, no? Or you think VMs etc is fine for privacy, but there might be less privacy at the SaaS level?

          but with ClickHouse you have many options to get the maximum out of it.

          I have heard about this before too. It's crazy how one has to get configurations right to get the best performance, but I guess it's not easy for the DB creators too, because they are trying to build general purpose software with many knobs. I noticed this interesting project which tries to solve the problem of tuning - https://ottertune.com/ . I think it used to be a research project previously, as that's how I found out about it - https://db.cs.cmu.edu/projects/ottertune/ , through the CMU DB group website

          obvious choice from Yandex

          Ah, I should have searched first. I did check the ClickHouse site and the creator Yandex site too and noticed it's a search engine, but didn't know Yandex had a cloud with services. Thanks for sharing! I just noticed there are quite some services for managed ClickHouse

          1. 3

            Using their SaaS on the cloud should be okay too, no? Or you think VMs etc is fine for privacy, but there might be less privacy at the SaaS level?

            Generally, it's fine, but in our case, I wanted to make sure I know where it is hosted and it's a German company. Sadly, Hetzner doesn't offer managed databases.

            I noticed this interesting project

            Thanks, I will check that out!

            1. 2

              wanted to make sure I know where it is hosted

              Ah yes, with data laws like GDPR and what not, it's good to know where the data is present. I'm starting to see DBaaS services partnering with cloud providers and giving more control as to where the database is being hosted, as the region where the data is present also brings with it the data laws of the region apart from the laws pertaining to the user's (data owner) nation, and when serving at global level, it becomes important to obey all the data laws

              Hetzner doesn't offer managed databases.

              It could go as feedback to them ;) Who knows, they might venture into it if they see it viable ;)

  2. 3

    I use Postgres and MySQL for almost all of my clients and side projects. I have tried Mongodb but didn't like it so went back to SQL again. Lately I have been playing around with Firebase but still love SQL and will be using it for a long long time.

    And yes I self host them on digitalocean or linode.

    1. 1

      Ah, interesting! About self hosting, what all things do you do / take care of? like disk resizing, getting alerts when cpu/memory/disk uage is high etc

      1. 1

        If you would like to know what goes into a production PostgreSQL setup for self-hosting, my book Deployment from Scratch features a stand-alone PostgreSQL demo. Here's the README: https://gist.github.com/strzibny/4f38345317a4d0866a35ede5aba99a1e

        1. 1

          Thanks for sharing @strzibnyj ! I learned a few things by managing 100+ production Postgres instances with my previous team but if I ever need to learn more, I'll be sure to checkout the book! My team did a talk on managing Postgres and some of the issues we faced - https://pgconf.in/conferences/pgconfin2020/program/proposals/142 but it was just the tip of the iceberg

          1. 2

            Oh, then I believe you won't need my book, it's probably more basic than what you did already! Cheers.

            1. 1

              I read the book's index in the site, it was good! Great work!

      2. 1

        Most of my works don't involve a lot of data. What I do basically is create a backup cretion script and store the backup. For alerts, I can set that in linode's dashboard so I get an email when the usage is over X%.

  3. 2

    I use MySql for persistent data. Redis of quick access data.

    I have built GraphQL in PHP to access the data in MySQL.

    Right now ,I do self host MySQL with back script running every 2 hours.

    I am planning to avoid self hosting of MySQL to avoid maintenance overhead.

    1. 1

      About backup, where do you store the backup? Something like S3 object storage? Do you use some tool as part of the script? Something like https://restic.net/ ?

      1. 1

        I just save them in Same VPS where MySQL is installed for now

          1. 1

            Am I doing anything wrong here ? Should I move the backup somewhere else ?

            1. 1

              There's no right or wrong, only features and tradeoffs. Storing in the same VPS makes things simple and easy to manage. Also, I'm assuming you also delete the older backup files. I think it all depends on use case and what features you need. If you think that the node disk getting corrupt is a possibility, then you may you wanna keep it elsewhere. But that makes things complex and maybe also expensive. Btw, I haven't seen disk corruptions happen personally though, in the cloud that is. Me and my team have done backups like a madmen previously without knowing what is required. I think you will have to understand your backup and restore strategy and what you are aiming for and also the criticality of the application. For example, if I'm doing a side project which has no users and is just a fun project, I wouldn't be checking out backups and restore and stuff. I would recommend reading more on backups and checking out different strategies if you really wanna focus on it. I'm still learning so I can't say much

              The last time me and my team did backups, we did two kinds - one was application level where application provided the backup data, for example pg_dump, mysqldump etc, which I think you already know and are doing, and we put the data in object storage buckets (GCS), and the other backup was disk level backup using cloud provider's disk backup feature, and this is something that the cloud provider takes care of and they also optimize how they store the backup and how to keep it safe etc too I guess. And we used to keep the GCS backups from long time ago too, no cleanup of very old backups. We also didn't have a very clear restore strategy, atleast for restore of disk backup. Restore of application backup was straightforward as applications provided tools and took care of that.

              I recently noticed an example of backup to S3 over here - https://litestream.io/ and that's how I asked if you backup to something like S3.

    2. 1

      Thanks for sharing @sachingk ! What's the overhead specifically in the maintenance by the way? Just curious what's the exact pain point

      1. 2

        I just don't want any failures and data loss. I am thinking if I use Data base as service, then I would stop worrying about

        • Scalability
        • Backup
        • Reliability

        And focus on Application and logic.

  4. 2

    I use just PostgreSQL if I can get away with it. It's versatile. Redis for non-persistent data. There is very little reason to deviate from these two for most projects I might be making...

    1. 1

      Thanks for sharing @strzibnyj ! I can clearly see Postgres and Redis as big projects in the database domain. I'm just wondering if the new and probably immature (compared to Postgres and Redis) projects will get any fair chance to show their might if they are actually good

      1. 1

        For example, there are projects like TiDB, TiKV, etcd, CockroachDB and more. https://dbdb.io/ has a whole list of databases, but most wouldn't have even heard of many of them 😅 Not to mention, I'm also interested in looking at DBs which are simple with very specific features compared to all-in-one general purpose DBs and trying to see how they all compare with each other and tradeoffs they all come with. Not to mention, most DBs are general purpose to a good extent so that they can work with different kinds of data and for quite a good set of use cases, and I guess it's for a very good reason - so that one doesn't have to build a database for every new app they build, which is apparently the main reason how general purpose out-of-the-process databases came into being

        1. 2

          PostgreSQL's NoSQL features are now faster than some single purpose NoSQL databases. It's really hard to complete. Why would I start with any other database?

          That said, things change all the time and we might get some new good database one day.

  5. 2

    This is an interesting question that is close to my heart because I spent the last 2 years solving this problem for developers building.

    As always, every case is unique and you should always use the best tool for the job but after spending my whole career building apps I know generally what works and what doesn't work.

    The short answer is: Relational database + GraphQL.

    Why?

    Most data is relational in applications being build.
    GraphQL is by far the most powerful way of querying and data.

    The problem is building a GraphQL server is hard and takes a lot of time.

    This all changed a few years ago with tools like Hasura GraphQL Engine, Prisma, Fauna that automatically generated a GraphQL API based on your database.

    For GraphQL it's easy. There is a protocol and there is good tooling and clients like GraphQL Code Generator for TypeScript and Apollo Client for requesting and managing data.

    For database, I'd recommend any modern SQL database. PostgreSQL is the most loved SQL database amongst developers.

    I told you at the beginning that I spent 2 years solving this problem for myself and other developers. You can learn more about that at my company nhost.io.

    1. 1

      Interesting! Thanks for sharing! It helps us all learn :D

  6. 2

    Postgres pretty much anything. Supports JSON and pretty much anything you want a DB for.

    1. 1

      How do people get around Postgres not supporting mult-master configurations? If you only have 1 master node and multiple slave nodes - and the master goes down - it will take down the ability to do all writes?

      To me this seems quite a big limitation that will always me go with Galera Cluster / Mariadb SkySQL

      1. 1

        What about sharding? It's possible through Postgres Extensions like Citus - https://github.com/citusdata/citus

        1. 1

          But yeah, I agree that I have heard a lot of people mention about how Postgres was not built with distributed database in mind. But I think it might change! I remember hearing about how Postgres will get multi master feature soon (long ago 😅), and that it's already present in a paid tool, I think this one - https://www.enterprisedb.com/products/postgresql-automatic-failover-manager-cluster-high-availability , search for bidirectional replication or multi master in that web page

    2. 1

      Interesting! I know lot of people who bet on Postgres. I have been on platform teams and have seen most backend developers use Postgres and Redis a lot for their high traffic services. Do you think a purpose built DB can outperform Postgres and the likes which are more general purpose and support a lot of things? Just curious and want to learn from your experience. And yeah, I know that Postgres is pretty mature as it's been around for long and has been battle tested by tons (almost the whole world I guess) in production environments, not to mention there are also few other such mature DBs. But yeah, these days, there are too many options out there, some general purpose, some very specific, what do you think about that in general?

  7. 2

    MongoDB :)
    NoSQL DB with self-hosting or cloud-hosting options available.

    You can deploy new connection in minutes using their cloud solution - MongoDB Atlas - and it even gives you 3 parallel instances on a free tier to maximize up-time, which is neat :P

    1. 1

      I did give MongoDB Atlas a try once to checkout how the getting started experience is and it was indeed neat! I think most DBaaS options are pretty much thriving to be fast when it comes to the time to get started with the Cloud DB including the deployment time which can take some time but it's getting better and better. How do you find the experience of using MongoDB Atlas in production / a live app?

      1. 2

        Have been building for 5+ years on MongoDB and never had any issues in production

  8. 2

    I use FaunaDB just because it seemed to be SaaS oriented, and originally I thought they also have native solution to multitenancy, which turned out to be misleading documentation... They actually admitted that to me that they should probably change the phrasing there.
    It's not bad, and you can start for free while your project is small, but -

    1. The learning curve wasn't easy. There's a lot of things you have to know in order to use it properly.
    2. I don't like all the user functions. I feel that major part of my code is not managed property in git etc, and it's very hard to test changes because it's not part of a new version that you can test before production

    In a new project I'm using firebase which so far is great.

    In addition, I use cloudflare key-value. It's super fast and globally available on the edge servers. For anything that k/v is a good fit, that's my first choice. I use it to make some data available for serving in run time, even when that data is stored in another DB. It makes things fast, more highly available, and also saves me read operations from Fauna/Firestore.

    I also just started to use influxdb for events that I can later query for analytics. Writing was easy but parsing query results is a mess because they return "annotated csv" which is a csv that can have multiple tables with different schemas, and each table can have more than one header row for all kind of metadata. I didn't like it at all. I think next time I'll try Amazon timestream for that purpose.

    1. 1

      Wow! Thanks for sharing the interesting experiences @Danbars ! :D I'm gonna checkout the cloudflare kv store which I haven't heard of before

      1. 2

        It's part of their workers deal.
        I actually also use their workers for serverless backend code. It's free / $5/month and you get A LOT. It's very fast and supports 9 languages. I use it with js.

        1. 1

          Ah yes, cloudflare workers is popular I guess, basically any serverless thing has become popular ;) Even serverless DBs ;) Thanks for sharing! I'll try it out sometime! I only read about workers in their blog when it came out

  9. 2

    Firebase for simple stuff. AWS for complicated stuff.
    Most times Firebase just suffice, I've my own wrappers for Firebase SDK to allow for POJO interaction with it.

    Reason to choose firebase - ease of use with web apps and generous free tier.
    Reason to use AWS - I know it quite well (I worked for amazon for several years in the past) + generous dynamoDB free tier + generous AWS lambda free tier.

    1. 1

      Didn't know about DynamoDB free tier, will check it out! And thanks for sharing the experience!! :D

  10. 2

    Im currently using Fauna, JAMStack focused. Easy to use and great free tier.

    I was sponsored by them to do a set of videos and after that I was hooked and bet on them for my SaaS. Wrote all the DB functions, indexed and collections in 1hr.

    Would recommend!

    1. 1

      Interesting! 1hr?! :O Nice!

      1. 2

        Yeah I’d say someone who hasn’t even used it before could probably get a fairly complex data structure under control.

        It’s just collections of documents, but the power is in how you request it. It’s transactional api calls, so if your using JavaScript based back end or frontend you can just write a very intuitive function call e.g

        Get(Collection(“collection_name), (“by_this_id))

  11. 2

    Firebase Firestore - after coming to the realization I don't want to self-host.
    Used to use MySQL for most things. For professional work with a team where hiring a DBA is a future possibility I've used mongodb, orientdb, aws neptune, mysql, postgres (and sqlite) only one I wouldn't use now is orientdb

    1. 1

      Ah! And yeah, self-hosting can be hard when the app becomes pretty critical and DB also in turn becomes very critical and not some toy side project

  12. 1

    For deposur.de we use DynamoDB on AWS (fully managed key-value database).
    It has several reasons:

    1. We use Amplify and AppSync. With VTL (Apache Templating language) we are able to resolve the fields in GraphQL and DynamoDB is the go-to solution here.

    2. Performance. DynamoDB is highly performant and if you want you can have global tables which are everywhere available.

    3. Cost. Typical cloud-native argument. The costs are really low if you don't have many write or read requests and you don't have to pay for any instance which runs 24/7. You will only be charged by requests.

    I think many of these arguments can be deflected with Aurora Serverless or other products. I am a huge fan of DynamoDB and know it quite well by now and I love working with it :-)

    The main con really is that you cannot do any aggregations on the fly without any overhead. So it is really a OLTP and not an OLAP database.

    1. 1

      Neat! Thanks for sharing @Sunny_ ! I have seen some more people praise about DynamoDB from AWS, I guess they are doing some things really well!

  13. 1

    I've been having a good experience with MariaDB, I did have a MongoDB stint that was pretty good and I may use it for some configuration documents in the future, but all my e-commerce data is handled by SQL.

    1. 1

      Makes sense! I have heard great things about MariaDB and MySQL too!

Trending on Indie Hackers
I will promote your startup to 50K+ people 216 comments I made Session, a productivity timer that makes $5K/month in net profit, AMA! 43 comments 📈 We raised $500K pre-seed for our Reddit Marketing Tool 15 comments Steph Smith on making $130k w/ an ebook, creating a course in 20 days, and the latest trends 10 comments Feedback, please: does the landing page explain the product? 6 comments Replacing Substack? 5 comments