Hello all
For those of you who are running a SaaS, what server architecture do you use?
What I mean:
How many servers do you use for load balancing and for failover?
Do you use DNS routing for load balancing, or just a reverse proxy?
How do you avoid a single point of failure?
Thanks
It depends on how mission critical your app is.
For "starter" apps, a single server is sufficient in my experience.
For apps that need to be up "no matter what", generally speaking you do need 1 LB backed by 2 or more app servers. This would be the bare minimum.
You can really go up from there in all sorts of directions.
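To make that "1 LB + 2 app servers" shape concrete, here is a toy round-robin proxy in Python. The backend addresses are made up, and in practice you'd use nginx, HAProxy, or a managed load balancer rather than rolling your own; this just shows what the LB is doing for you.

```python
import itertools
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical addresses of your two (or more) app servers.
BACKENDS = itertools.cycle(["http://10.0.0.11:3000", "http://10.0.0.12:3000"])

class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Each request goes to the next backend in the rotation.
        backend = next(BACKENDS)
        try:
            with urllib.request.urlopen(backend + self.path, timeout=10) as resp:
                status, body = resp.status, resp.read()
        except urllib.error.HTTPError as err:
            status, body = err.code, err.read()
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RoundRobinProxy).serve_forever()
```

A real LB also health-checks each backend and stops sending traffic to a dead one, which is what actually buys you the failover.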
Heroku is a great alternative if you want to leave dev ops out of your dev cycle.
Hatchbox is a great option if you want to tinker around with your servers.
Cloudflare came out with an option to load balance at the DNS level. I haven't tried this myself and I believe they charge per query. You can look it up on their pricing page. You'll still need a way to persist your sessions, however.
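On the "persist your sessions" point: the usual trick is to keep session data in a shared store (Redis, say) instead of in each server's memory, so any server behind the balancer can handle any request. A rough Python sketch, with the hostname and key names as placeholders:

```python
import json
import uuid

import redis  # pip install redis

# Shared Redis instance reachable from every app server (placeholder host).
r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)

def create_session(user_id: str) -> str:
    session_id = uuid.uuid4().hex
    # Store the session centrally and expire it after 24 hours.
    r.setex(f"session:{session_id}", 86400, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id: str):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```

Most web frameworks ship a session store backend that does exactly this for you.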
If you want to deal with bare metal yourself, then Ansible and similar tools are okay.
If you have a more complicated setup, having some sort of deployment and infrastructure automation is definitely nice. My team has experienced downtime every now and then in the past. For important apps, we've structured our automation so that we're able to resurrect our entire infrastructure in about 30 minutes.
For failover, you'll want to run two identical sets of infrastructure at the same time. We have a production app that does this, where an entire stack is on standby 24/7. Suffice it to say, I wouldn't recommend this from the get-go.
If a failover happens, which you'll need to monitor with something like heartbeat, you'll need a way to switch over the traffic. This can be done at the DNS level or if you have a load balancer/API set up where you can switch it. There are a couple of articles on this around the web.
If you're using Amazon EC2, you can just remove instances from the LB and add the new instances to it.
One thing to note, again this is EC2 specific, is that the LB health checks seem kind of slow. If for whatever reason they fail, the LB will take your instances out of rotation and you'll be scratching your head as to why nothing is working.
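For what it's worth, both of those steps (swapping instances on the LB, and tightening the health check so failures are caught sooner) are scriptable. Here's a sketch with boto3 against a classic ELB; the LB name and instance IDs are placeholders, and the calls differ if you're on an ALB/NLB:

```python
import boto3  # pip install boto3

elb = boto3.client("elb", region_name="us-east-1")

# Take the failed instance out of rotation and put the replacement in.
elb.deregister_instances_from_load_balancer(
    LoadBalancerName="my-app-lb",
    Instances=[{"InstanceId": "i-0123456789abcdef0"}],
)
elb.register_instances_with_load_balancer(
    LoadBalancerName="my-app-lb",
    Instances=[{"InstanceId": "i-0fedcba9876543210"}],
)

# Shorter interval and lower thresholds so unhealthy instances are
# detected (and removed) faster than with the defaults.
elb.configure_health_check(
    LoadBalancerName="my-app-lb",
    HealthCheck={
        "Target": "HTTP:80/health",
        "Interval": 10,
        "Timeout": 5,
        "UnhealthyThreshold": 2,
        "HealthyThreshold": 2,
    },
)
```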
My preference is to have an infrastructure design that's flexible enough that I wouldn't need a hot standby, but could reconstitute the whole set up in a very short period of time. It keeps things simple and less costly.
+1 for Hatchbox. I've had multiple clients rave about it and I'd be using it myself if it weren't for the fact that it far outstrips my current costs and I want to run multiple services and customize Ubuntu on my own. Especially if you're running on Rails, I'd strongly suggest moving from Heroku to Hatchbox once the $7/month isn't enough for you.
Big fan of serverless services these days. Firebase is a great option if you are just getting started.
How many people are using your SaaS, and how many do you predict will be using it at launch?
If you are starting out don't worry too much about this. Keep it simple and cheap. Get your product off the ground first.
I have more experience with AWS, so with that in mind I would recommend deploying your app using RDS, Elastic Beanstalk, and S3 (if you need to host files uploaded by your users or need a CDN for static assets).
You can start with just 1 server for each and you can set up auto scaling from the get go so you are covered if you ever experience a traffic spike. Start with the smallest instances and monitor their usage to adjust the instance size to fit your load and keep costs down.
For me, the upside of AWS is that they have all the other services that you can use under the same roof.
Do you need to set up DNS for your domain? You can use Route 53.
Send emails from your system? You can use SES.
The setup is very easy and you can get redundancy from the get go.
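As an example of how little glue those services need, here's a minimal SES send with boto3. The region, addresses, and content are placeholders, and the sender address has to be verified in SES first:

```python
import boto3  # pip install boto3

ses = boto3.client("ses", region_name="us-east-1")

ses.send_email(
    Source="noreply@example.com",  # must be a verified SES identity
    Destination={"ToAddresses": ["user@example.com"]},
    Message={
        "Subject": {"Data": "Welcome aboard"},
        "Body": {"Text": {"Data": "Thanks for signing up!"}},
    },
)
```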
I have been using AWS for the last 3 years deploying small business applications (for customers) used by up to 10,000 users a day, running off a single instance, and never had a problem (the apps are not mission critical and it's not the end of the world if they go down for 1 hour or so - but they never did!).
Your mileage and use case may vary, so this is not a solution that will work for every project, but it has worked for me and is definitely the route I suggest when starting out.
My projects are built in .NET C# with SQL Server (corporate customers) or Node.js and MongoDB.
Downtime: 0
This is the kind of thing my platform of choice (Elixir/Erlang VM in general) is specifically tailored for. You can get a lot of the same benefits with other systems using the Actor model (e.g. Akka, etc) and you can get some of the same benefits with containerization strategies. Obviously there are a lot of other levels to consider in making a reliable system.
Not knowing how many nines of uptime you need, I'd say at the minimum, you need to:
automate your deploy so you can rebuild everything on a new server in just a few commands
back up your database and stored assets regularly
set up a monitor that will send alerts to your phone if the site goes down (a minimal version is sketched after this list)
If you need still more uptime:
save entire images of your server regularly
be prepared for DDoS attacks, using something like Cloudflare
load balance multiple servers
invest in a premium logging service so you can diagnose and fix problems more quickly
use an expensive service that gives you an SLA with guaranteed levels of uptime and support in case of outages
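The monitoring item above can start out as something very small. A bare-bones sketch in Python; the site URL and the alerting webhook (Slack, PagerDuty, an SMS gateway, whatever you use) are placeholders:

```python
import time
import urllib.request

SITE_URL = "https://example.com/health"            # placeholder
ALERT_WEBHOOK = "https://hooks.example.com/alert"  # placeholder

def site_is_up() -> bool:
    try:
        with urllib.request.urlopen(SITE_URL, timeout=10) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers timeouts, connection errors, and HTTP errors
        return False

def send_alert(message: str) -> None:
    req = urllib.request.Request(
        ALERT_WEBHOOK, data=message.encode("utf-8"), method="POST"
    )
    urllib.request.urlopen(req, timeout=10)

if __name__ == "__main__":
    while True:
        if not site_is_up():
            send_alert(f"{SITE_URL} appears to be down")
        time.sleep(60)
```

Run it from somewhere that isn't the server you're monitoring, or use a hosted service like UptimeRobot or Pingdom instead.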
In my case, I run a content-based membership site with a few daily and weekly recurring tasks that get run. I only aim for 99.9% uptime, since the cost of longer outages would only be embarrassment and possibly giving existing members a few free days of compensatory service. No businesses would be ruined and no lives would be in danger.
So with this situation, I run everything on a single $5/month DO droplet, with automated backups for another $1. I log all business critical events to Keen.io, so in the extremely unlikely case of Digital Ocean losing not only my server data but also my backups, I could rebuild everything from scratch.
Thanks
How big is your DO droplet?
So you're saying your DB + web server + everything else is on 1 server?
It's the smallest one, listed at the top here.
Yes. I'm running four separate apps off of that one server. I'll probably break the biggest one off onto its own droplet soon.
Question. How do you know when your droplet is at capacity? Is there a rule of thumb to follow? Running a DO droplet currently and everything is fine... I think
DO has dashboards and my server also logs various information.
Thanks, so basically when things are peaking it's time to upscale?
Memory is a special case since usage doesn't change much with traffic.
The simplest thing to do is use a load testing service like loader.io and make sure your site can handle 10-100x the traffic it's currently getting. If it can't then either optimize or get better hardware. And spend ZERO time worrying about handling 1000000x the traffic you're getting. Just be ready for the next order of magnitude.
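If you want to see the shape of such a test without signing up for anything, a few lines of Python will hammer one URL with concurrent workers and report throughput and failures. The URL and numbers are placeholders; loader.io or JMeter will give you much better reporting:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.com/"  # placeholder
REQUESTS = 500
CONCURRENCY = 20

def hit(_):
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            return resp.status < 400
    except OSError:  # timeouts, connection errors, HTTP errors
        return False

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(hit, range(REQUESTS)))
elapsed = time.time() - start

print(f"{REQUESTS} requests in {elapsed:.1f}s ({REQUESTS / elapsed:.1f} req/s), "
      f"{results.count(False)} failures")
```

Don't run this against a production box during peak hours, obviously.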
Thanks!!
Also, you can use JMeter, which can simulate many load-testing scenarios.
Thanks I'll check that out
Here is a blog post describing the architecture of Archbee https://archbee.io/blog/archbee-architecture/
Missive (https://missiveapp.com/) is a team communication app that syncs with every email provider (Gmail, Office 365, iCloud, IMAP, etc.) and third-party messaging ecosystems like Twitter, Facebook, and Twilio. It offers draft collaboration, chat, etc. A lot is happening on the backend.
We host on Heroku. Our stack is lean and relies on proven technologies: Ruby, Rails, Sidekiq, Postgres & Redis.
Why Heroku? As a small team of three, we preferred paying a premium and having less to manage. We're currently in the process of migrating some of our stack directly to AWS to gain some flexibility and help manage our growth.
I wrote a post last year titled "The boring stack, the fun architecture", explaining how we use interesting techniques to offer a live UI without flooding our API with non-stop requests.
https://medium.com/missive-app/the-boring-stack-the-fun-architecture-b793c803462e
We are proud of this architecture; it has proven itself to be resilient, simple, and fun to work with. Even if you are using an "old" framework that isn't as shiny as the new kid on the block, there is always a creative way to make things work. In our case, we embraced the Pusher platform and implemented an architecture that minimizes the data exchanged between the frontend and backend.
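To give a flavour of that idea (a simplified sketch of the pattern with placeholder names and credentials, not our production code): push only a tiny "something changed" signal over Pusher, and let clients fetch the full record from the API when they actually need it.

```python
import pusher  # pip install pusher

client = pusher.Pusher(
    app_id="APP_ID",
    key="KEY",
    secret="SECRET",
    cluster="us2",
    ssl=True,
)

def notify_conversation_changed(conversation_id: str) -> None:
    # Send just the id, not the whole conversation payload; the client
    # decides whether it needs to re-fetch anything over the API.
    client.trigger(
        f"private-conversation-{conversation_id}",
        "updated",
        {"id": conversation_id},
    )
```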
Great article!
Question: why don't you just develop the "pusher" server yourselves?
You are putting too much of your infra's weight on a third party.
What will happen if they change pricing or go down?
We use only basic Pusher functionalities. We could probably move over to another provider in ~ 1-2 weeks. Or simply manage this ourselves.
It's very easy to over-engineer the infrastructure when starting to work on a project, wasting time and money. Few projects need stuff like failovers, load balancers, etc., and definitely not before reaching a certain, much bigger, scale.
Dokku is a simple tool to get started with: https://pawelurbanek.com/rails-heroku-dokku-migration
I'm no devops guru so I constructed my service specifically to avoid having to handle scaling myself. By relying on other services for the critical components I don't need to do failover or redundancy myself and scaling is easy.
I built on Firebase, serving the client statically as a single-page app that can query and update the database directly. This is all served and scaled by Firebase.
The heavy lifting of actual book layout is deferred to asynchronous workers. The clients add work items to a queue in Firebase and the workers take them one by one and post results back to Firebase.
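Roughly, a worker is just a loop like the one below (a simplified sketch against the Firebase Realtime Database with the firebase-admin SDK; the paths, credentials, and process_job body are placeholders, not my actual code):

```python
import time

import firebase_admin
from firebase_admin import credentials, db  # pip install firebase-admin

cred = credentials.Certificate("service-account.json")  # placeholder
firebase_admin.initialize_app(
    cred, {"databaseURL": "https://my-app.firebaseio.com"}  # placeholder
)

def process_job(job: dict) -> dict:
    # Placeholder for the real work (e.g. laying out a book).
    return {"status": "done"}

while True:
    # Pull whatever is waiting in the queue, work through it, write the
    # results back, and remove the finished items.
    queue = db.reference("queue").get() or {}
    for job_id, job in queue.items():
        result = process_job(job)
        db.reference(f"results/{job_id}").set(result)
        db.reference(f"queue/{job_id}").delete()
    time.sleep(5)
```

Because the workers hold no state of their own, you can run as many of them as you need (a real version would claim jobs atomically so two workers don't grab the same one).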
Currently I let Google Cloud manage and autoscale a pool of preemptible instances (cheap!), but I could just as easily spin up workers on DO, AWS, or even at home.
It's not an architecture I see very often and it probably won't work for every service. If you have a workload that is lumpy and asynchronous it might suit.
I love Convox for setting up production-ready and secure infrastructure on AWS: https://convox.com
I use them for FormAPI: https://formapi.io
You have to start with 3 instances for high availability, so it starts out costing a bit more than Heroku. But it's much cheaper in the long run, and the auto-scaling feature is awesome. I never have to worry about traffic spikes while I'm asleep or busy. (At least for now. Eventually I'll probably have to do some database sharding, but that's a long way away.)
I started on Heroku, which is also great. But I wanted to use my free AWS credits from Stripe Atlas, so I switched to Convox and AWS. Would highly recommend it.
Heroku