Developers May 28, 2020

Trying to understand the basics of scaling

RidenShark

I'm developing my own SaaS product, a blog service like wordpress and tumblr. It's mostly a learning project for me to get into product development. I'd like to understand how scaling works.

Say, I have launched my product and have 10 to 30 users.

They'd all be given a subdomain: alice.example.net, bob.example.net, and so on. Alice may be a rockstar, and she'd have huge traffic.

So to build such a highly scalable product, where you can't predict the traffic influx, how should I choose my production platform? I have some experience hosting my own server, but my experience stops there, i.e. with a single-server app with a MySQL database on the same server.

I want to understand the following:

  1. I have read about big companies having databases across different servers, but how does it even work?
  2. Is there one parent "alpha" server where the codebase lives, which commands and controls child servers across datacentres?
  3. Should each user (user.example.net) get a server?
  4. If yes, how much RAM or CPU should I choose for a droplet for each account?
  5. Is there a way to set things up on one server and have scaling happen automatically, on demand, without burning a hole in my pocket?

My stack for this learning project is Django with MySQL, plus some tasks with Celery. I am not that proficient in any modern front-end frameworks, so it's simple HTML and CSS with jQuery. As a novice, I know the terms horizontal scaling and vertical scaling only in an ELI5 sense.

It would be very helpful if anyone could clear all this up for me in simple language. Thank you in advance.

  1. 5

    There are a million ways to slice this, and tradeoffs come into play quite a bit.

    Scaling is generally going to depend heavily on the type of product and usage you see. In your case, a WordPress-like clone, here is what I'd recommend.

    First, let's address subdomains (not really scaling related). Having separate subdomains per user is easily achievable in most web frameworks (e.g. Ruby on Rails, Django, Express, etc.). This way you have a single server serving up both your main app (https://example.com) and the subdomains for each user (e.g. https://anthony.example.com). I would strongly recommend against having a separate server per user; that seems like a nightmare and would likely cost more than what you'd make per user.
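
    To make the single-server idea concrete, here is a rough sketch of how an app could work out which user's blog to serve from the request's Host header. It is plain Python rather than actual Django middleware, and `BASE_DOMAIN` and `blog_owner_from_host` are names made up for illustration.

```python
from typing import Optional

BASE_DOMAIN = "example.com"  # assumption: the domain of the main site


def blog_owner_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the user subdomain for a Host header, or None for the main site."""
    # Drop an optional port, normalise case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    if hostname in (base_domain, "www." + base_domain):
        return None  # the main app, not a user blog
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None  # some unrelated domain
    subdomain = hostname[: -len(suffix)]
    # Reject empty or nested subdomains such as "a.b.example.com".
    if not subdomain or "." in subdomain:
        return None
    return subdomain
```

    In a real app this lookup would run once per request (for example in middleware), and the returned name would be used to load that user's posts from the shared database.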

    Normally, the database is the biggest scaling obstacle, especially if you're dealing with relational databases. The good news in your case is that blog content doesn't change often (only when the user updates a post), which gives you some leverage for scaling.

    For your scenario, you can certainly take advantage of caching. Since posts rarely change, you can throw them in Redis or Memcached (I prefer Redis), so that subsequent requests pull from the cache instead of your DB.

    [Image: redis-caching]
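
    The caching idea above is often called cache-aside: check the cache first, fall back to the DB on a miss, then store the result. Below is a minimal sketch. `TTLCache` is a tiny in-memory stand-in for Redis so the example runs on its own, and `get_post` and `load_from_db` are hypothetical names, not part of any library.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: string values with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired, so treat it as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


def get_post(post_id: int, cache: TTLCache,
             load_from_db: Callable[[int], str],
             ttl_seconds: float = 300.0) -> str:
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    body = load_from_db(post_id)  # the slow path, hit only on a miss
    cache.set(key, body, ttl_seconds)
    return body
```

    When a user edits a post, deleting its key (`cache.delete("post:1")`) makes the next read go back to the DB, so readers don't see the old version until the TTL runs out.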

    I honestly don't think you need to worry about scaling for quite a while. You can scale vertically first (better/bigger hardware), then scale horizontally (load balancing, sharding, separate services if/when needed).
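
    For a feel of what "load balance" means once you do go horizontal, here is a toy round-robin picker. Real setups would normally use a managed load balancer or a reverse proxy rather than hand-written code, and the backend names are made up.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per request."""

    def __init__(self, backends: Iterable[str]) -> None:
        pool = list(backends)
        if not pool:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(pool)

    def next_backend(self) -> str:
        return next(self._cycle)


# Hypothetical pool of identical app servers.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_four = [balancer.next_backend() for _ in range(4)]
```

    Each request goes to the next server in the list, so adding a server to the pool adds capacity without changing the app itself.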

    It's for this reason that a lot of people decide to go serverless, so they can effectively scale without limit (e.g. Lambda, DynamoDB). However, the trade-off is that it's a lot trickier to develop and deploy, at least if you're accustomed to traditional app development. With DynamoDB you REALLY have to understand your access patterns before you start. With a relational DB, you just need a good grasp of your data model, without caring much about how it'll be used.

    Happy to dive deeper in any direction, but the question is fairly high level, so I kept it there. In general, I don't think it's something you need to worry about until you have traction. Scaling problems are usually good problems to have :)

    1. 1

      It was great reading your post. Could you please go deeper?

      1. What things to consider to scale to millions when picking a database?
      2. What are the common pitfalls and all?
    2. 1

      Thank you so much. This clarified a lot of things.

  2. 2

    For your use case I would look into a CDN like Cloudflare or Fastly. That way you can keep your existing system and database, while the CDN helps you serve the static blog content to millions of users.
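
    CDNs generally decide what to cache from the response headers your app sends, so a rough sketch of choosing `Cache-Control` per page might look like the following. The function name and the TTL values are assumptions for illustration, and each CDN has its own defaults and overrides.

```python
from typing import Dict


def cache_headers(is_public_post: bool, browser_ttl: int = 60,
                  cdn_ttl: int = 300) -> Dict[str, str]:
    """Pick Cache-Control headers for a blog page."""
    if is_public_post:
        # max-age applies to browsers, s-maxage to shared caches such as a CDN.
        return {"Cache-Control": f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}"}
    # Dashboards and drafts should never be stored by a shared cache.
    return {"Cache-Control": "private, no-store"}
```

    Published posts get cached at the edge for a few minutes, while logged-in pages always reach your own server.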

  3. 2

    This is obviously a broad topic and one could go on writing for hours about it. I don't want to claim to be an expert on scaling applications, but I do have some experience.

    In summary:

    I think what scaling really comes down to is understanding how to separate concerns in a way that lets each component do the maximum amount of work without affecting the other pieces.

    This, of course, isn't very helpful on its own, so I'll address your points and you can feel free to ask more questions:

    First, don't worry too much about scaling in the early days of writing an application. You can easily get by with 100s, even 1000s of users running all of your components (server, client, API, DB, and perhaps caching) on a single $5 - $10 DigitalOcean droplet.

    1. This is called sharding. This is probably the aspect of scaling I'm least familiar with, but it usually involves a master database and N shards that each handle their own specific portion of the data. For example, split by names: A-F, G-P, Q-Z or something like that. It all depends on the dataset.
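
    As a rough illustration of the A-F, G-P, Q-Z split, here is a sketch that maps a username to a shard by its first letter. The shard names and the fallback for names that don't start with a letter are assumptions, and real systems often hash the key instead so the shards stay evenly loaded.

```python
from typing import List, Tuple

# (first letter, last letter, shard name), following the A-F / G-P / Q-Z example.
SHARD_RANGES: List[Tuple[str, str, str]] = [
    ("a", "f", "shard-1"),
    ("g", "p", "shard-2"),
    ("q", "z", "shard-3"),
]
DEFAULT_SHARD = "shard-3"  # assumption: names not starting with a-z land here


def shard_for(username: str) -> str:
    """Return the name of the shard that stores this user's data."""
    first = username.strip().lower()[:1]
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    return DEFAULT_SHARD
```

    The app would call `shard_for("alice")` before every query to pick which database connection to use.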

    Another option is a read-only replica: a copy of the master database that may be a bit stale, which doesn't necessarily matter. It takes the load of repeated SELECTs off the master database.
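
    A minimal sketch of that read/write split: SELECTs go to a replica, everything else goes to the master. `ReplicaRouter` and the connection names are hypothetical, and frameworks usually offer their own hook for this (Django, for instance, has database routers), so treat it as an illustration of the idea only.

```python
import random
from typing import List, Optional


class ReplicaRouter:
    """Sends SELECT statements to a replica and all other statements to the master."""

    def __init__(self, primary: str, replicas: List[str],
                 rng: Optional[random.Random] = None) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._rng = rng or random.Random()

    def connection_for(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self.replicas:
            # Reads may be slightly stale, as replicas lag behind the master.
            return self._rng.choice(self.replicas)
        return self.primary
```

    Reads that must see the latest write (for example, right after a user saves a post) can still be sent to the master explicitly.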

    One of the quickest ways to decrease load on your database is caching. Throw up a Redis server (essentially a key->value store) and throw information that's accessed often and doesn't change much in there.

    2. I think the above kind of answers this.

    3. I would say probably not. It all depends, but even AWS, which literally runs millions of requests per second on Lambda, CloudFront, etc., does it all across shared hosts and shared computing power.

    4. This will depend on what you're running on the host. API? More CPU. Database? More storage and CPU. Caching? More RAM.

    Again, a basic host will do just fine in the early days of an application.

    5. Autoscaling is a pretty deep subject and something I'm not too familiar with, but usually when companies expect high traffic they over-prepare, and that definitely has its costs.

    I just want to repeat this because it sounds like you're in the early days: don't get too caught up on scaling. You will likely not make any progress on an actual prototype if you do. You can always go back and rewrite and modularize things as needed for scale at a later date!

    Like I said, feel free to ask anything if I didn't address your question correctly! :)

    1. 1

      Thank you.

  4. 1

    I curated a list of great articles on scaling in this issue of my Pragmatic CS newsletter. Hopefully they are helpful!
    https://pragmaticcs.substack.com/p/pragmatic-cs-1-software-architecture