Product Development September 16, 2020

A problem I've never had before - Too much traffic AKA Database Death

Mick @Primer

https://thehackstack.com is up and down like no-one's business today.

Bizarrely, I'm not getting THAT much traffic, but every visitor is looking at a lot of pages.

I'm having to learn on my feet here. Anyone good with this kinda shit raise your hand LOL

🤣

  1. 7

    When this happens, the culprit is (in order of most likely to least likely):

    (1) Poorly-constructed queries. This includes queries with a time complexity of O(n^2) or worse, as well as queries that lock rows for a long period of time. These can be found in your slow query log, or by issuing a SHOW PROCESSLIST; command during times of instability if your database is MySQL.

    (2) Too many queries. You should not be making 30 round-trips to the database on a page load; typically the limit should be 2 - 4. Consider re-structuring your database operations so you're not hammering it on each request. ORMs can be particularly bad at issuing many queries when a better-thought-out request would only issue one or two (see the sketch below).

    (3) Dangling connections. If you're using connection parameters which leave the database connection open to "improve performance", it's easy to create situations where the connection limit is broached prematurely despite lower traffic. Don't use persistent-open ("popen") operations unless you know what you're doing.

    (4) Poor database configuration. Even if you've set max_connections in /etc/my.cnf, for instance, there are situations which can cause MySQL to ignore the given parameter and use a smaller default (typically 214). Logging in and issuing a SHOW VARIABLES; or SHOW VARIABLES LIKE 'max_connections'; can help you diagnose this sort of problem.

    And always check your logs!
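
    Since Mick's stack turns out to be Laravel (mentioned further down the thread), here's a minimal sketch of the ORM trap from point (2), with hypothetical Post and Comment models - the lazy version issues one query per post, while eager loading issues two queries total:

        use App\Post;

        // N+1 pattern: 1 query for the posts, then 1 extra query per post
        // as each post's comments are lazy-loaded inside the loop.
        $posts = Post::all();
        foreach ($posts as $post) {
            echo $post->comments->count();
        }

        // Eager loading: 2 queries total, no matter how many posts exist.
        $posts = Post::with('comments')->get();
        foreach ($posts as $post) {
            echo $post->comments->count(); // already in memory, no extra query
        }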

    1. 2

      This is amazing. Thanks for taking the time!

    2. 1

      Great advice. I was about to ask how many queries you were making to load a page, but #1 on this list is indeed the place to start.

      This may sound like a crazy suggestion, but you may actually be able to get help if you livestream your attempts to debug it on Twitch. I'd never have thought of that a few months ago but I've stumbled across a lot of collaborative efforts there recently.

    3. 1

      Upvoting the one actually helpful answer!
      Kudos to maxk42!

  2. 1

    No idea, but I did add to your problem. Congrats on 5 dozen hacks 😉

    1. 2

      Lol thanks man I appreciate it!

  3. 1

    One trick I've used, starting decades ago when I worked at Yahoo! Movies, is to auto-cache the rendered page every 5 minutes and automatically switch to serving that static page if latency reaches a certain threshold.
    Simple and stupid, but it still works for certain situations today. 99.9% uptime! 🤣
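
    A rough sketch of that trick as Laravel middleware (the OP's stack, per the thread below) - the class name is invented, and the latency-triggered switch is simplified here to a plain 5-minute TTL:

        use Closure;
        use Illuminate\Http\Request;
        use Illuminate\Support\Facades\Cache;

        class ServeCachedPage
        {
            public function handle(Request $request, Closure $next)
            {
                $key = 'page:' . $request->fullUrl();

                // Serve the stored HTML if this URL was rendered recently.
                if ($cached = Cache::get($key)) {
                    return response($cached);
                }

                // Otherwise render normally and keep the result for 5 minutes.
                $response = $next($request);
                Cache::put($key, $response->getContent(), 300);

                return $response;
            }
        }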

    1. 1

      Yep, this is what I was going to say. Too many people run a full query of all their data on every single page load, when the page may not actually change that often. Cache it and render from the cache, and only run the full query when necessary - or, as in the example above, every 5 minutes. It will help out a ton. Reddit, Stack Exchange, etc. are not calling the DB for all the data on every page load.

    2. 1

      This is a great idea and it's actually what we did at my last day job. It was a big media company, and if someone famous died or there was some big celebrity event, our site could go from 2,000 visitors to 500,000 in seconds. It was crazy!

      It took us a long time to make the platform stable, and the thing we ended up doing was something like what you described.

      Most of the time we were serving straight-up HTML - never touched the database.

  4. 1

    Some quick thoughts, in addition to the good suggestions here.
    I've seen you tried caching already; that should help.
    Typically, writes are more taxing on the DB than reads, so make sure your writes are optimised - one way is to use transactions and batch your writes (see the sketch below).

    But generally speaking, you need to dig deeper to find which operation(s) are taxing your DB the most in terms of time/CPU. Maybe you can optimise them, or hand some of the calculations over to code instead of the DB server. Good luck!
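
    To make the batching point concrete, a minimal Laravel sketch - the page_views table and the $views data are assumptions:

        use Illuminate\Support\Facades\DB;

        // Wrapping many small writes in one transaction means one commit
        // instead of one per row.
        DB::transaction(function () use ($views) {
            foreach ($views as $view) {
                DB::table('page_views')->insert($view);
            }
        });

        // Better still, collapse them into a single multi-row INSERT:
        // $views = [['url' => '/a', ...], ['url' => '/b', ...], ...]
        DB::table('page_views')->insert($views);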

    1. 1

      Thanks for this reply... it has inspired me to set up some orbiting read databases and have a master "write" database.
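
      In Laravel, that read-replica / write-master split can be wired up in config/database.php - the hosts and database name below are placeholders:

          'mysql' => [
              'driver' => 'mysql',
              // Reads are spread across the replica hosts...
              'read' => [
                  'host' => ['replica-1.example.com', 'replica-2.example.com'],
              ],
              // ...while every write goes to the master.
              'write' => [
                  'host' => ['master.example.com'],
              ],
              // After a write, 'sticky' keeps that request's reads on the
              // master so it never sees stale data from a lagging replica.
              'sticky' => true,
              'database' => 'thehackstack',
              'username' => env('DB_USERNAME'),
              'password' => env('DB_PASSWORD'),
          ],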

  5. 1

    I see you got it sorted with query caching - good job.

    To catch any issues that may arise in the future as users scale, make sure to monitor the site - in particular, HTTP monitoring to catch the usual outages, and content (keyword) monitoring to catch on-page errors such as MySQL 'too many connections' errors. Check out Downtime Monkey for these - hope you don't mind the plug!

    1. 2

      Don't mind the plug at all - isn't that partially what this place is all about? 🤓

      I will check it out for sure.

  6. 1
    1. Add caching.
    2. Install New Relic APM - it will highlight any slow queries automatically (it's free).

    Both of the above are the fastest way to resolve this.

    1. 1

      Thanks man. Added caching yesterday and that helped hugely. Will add New Relic today - good shout. I didn't know it was free.

  7. 1

    If you have code control, you can add an APM like New Relic to find out what's going on...
    like if the DB needs indexing or some code is looping too much or whatev...
    (better partial-page caching than query caching ;) )
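
    For the partial-page idea, one hedged sketch in Laravel is to cache the rendered HTML of an expensive Blade fragment rather than the query behind it - the view name and cache key are invented:

        use Illuminate\Support\Facades\Cache;

        // Render the expensive fragment once, then reuse the HTML for
        // 5 minutes; requests in between never touch the query or the view.
        $sidebarHtml = Cache::remember('fragment:popular-hacks', 300, function () {
            return view('partials.popular-hacks')->render();
        });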

    1. 1

      Thanks man. Adding new relic today.

  8. 1

    Then probably not the best idea to post it again in here! :D
    It's difficult to help you with this lack of information.

    Keep it up, Mick! You got this!

    1. 2

      never let a disaster go to waste

      :P

    2. 2

      HAHA I guess you're right.

      Well, regardless, I quickly implemented some query caching and we're all good!
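
      In Laravel, query caching along these lines usually means Cache::remember: run the query once, keep the result for a TTL, and serve the cached copy in between. The cache key and Post model here are assumptions:

          use App\Post;
          use Illuminate\Support\Facades\Cache;

          // Cache the front-page hack list for 5 minutes (300 seconds);
          // every request inside that window skips the database entirely.
          $hacks = Cache::remember('front-page-hacks', 300, function () {
              return Post::latest()->take(60)->get();
          });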

      1. 2

        :) What is your stack?

        1. 2

          Laravel and MySQL hosted on Heroku

          1. 1

            Cool! Thanks for the response!