
Wrote an article on WebRTC architectures: mesh, MCU, SFU

WebRTC has a ton of confusing acronyms. What might not be obvious is that there are different architectures, and which one you choose really matters if you want to scale past 6 participants. I basically wrote this article to help me better understand the tradeoffs between the different architectures. It should be a good read if you are working on a video conferencing app.

Here's the link: https://medium.com/@toshvelaga/webrtc-architectures-mesh-mcu-and-sfu-12c502274d7


    Hey Tosh!
    Thanks for your article: I have never seen all of that laid out so concisely and clearly in one piece. That is truly awesome.

    Meanwhile, let me add some tips your readers might find helpful, too.

    Which one to pick - MCU, SFU or go full mesh?
    In our team, we usually follow the same pattern to decide: we assume we're building on an SFU solution (usually Kurento) and ask ourselves some questions to see if it really is the best option to pick. Some of our points match yours, some are extra.

    Why start with an SFU?
    Of all the options, SFU is the most:

    • Extensible: since every incoming or outgoing stream is handled individually, you can treat each one differently. Say, record only the tutor in a school webinar. In fact, you can easily develop an MCU-like add-on to work within an SFU: that's what Amazon advises for extending AWS Chime to multi-thousand-user audiences.
    • Scalable: you can distribute incoming and/or outgoing streams between different servers. That means you can not only run several conferences off a single server, but also the reverse: back a single conference with an array of machines, if needed.
    • Balanced in terms of client/server performance and bandwidth.
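
    The "balanced" point can be illustrated with a rough count of the media streams each client handles. This is a minimal sketch under simplifying assumptions that are not from the comment itself: every participant publishes exactly one stream, and simulcast, audio-only participants, and paused video are ignored. The names `ClientLoad` and `client_load` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientLoad:
    uploads: int    # media streams each client sends
    downloads: int  # media streams each client receives


def client_load(architecture: str, participants: int) -> ClientLoad:
    """Per-client stream counts, assuming every participant publishes one stream."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if architecture == "mesh":
        # Each client sends its stream to every peer and receives one from each.
        return ClientLoad(uploads=others, downloads=others)
    if architecture == "sfu":
        # One upload to the server; the server forwards the other streams unmixed.
        return ClientLoad(uploads=1, downloads=others)
    if architecture == "mcu":
        # One upload; the server mixes all streams into a single composite.
        return ClientLoad(uploads=1, downloads=1)
    raise ValueError(f"unknown architecture: {architecture}")
```

    For a 6-person call, this model gives 5 uploads and 5 downloads per client in a mesh, 1 upload and 5 downloads with an SFU, and 1 upload and 1 download with an MCU. The SFU sits in the middle: cheap uploads for the client, no mixing work for the server.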

    Why go MCU, then?
    Here are the questions that might lead you to conclude that an MCU is the best option for your case. Go MCU if you say yes to at least half of them.

    1. Most of your conferences will be webinars / event streams (a limited number of speakers and a massive audience that cannot present).
      With a different speaker-to-audience ratio, an SFU with an MCU-like extension will work better.

    2. Most of your conferences will have 500+ participants.

    3. You target mobile devices and/or regions with highly uneven connection coverage (e.g. rural regions of Southeast Asia or Central Africa).

    4. Your solution will run on-premise.
      In most cases, this implies you have

    5. Most of your events are scheduled.
      Handling streams the MCU way (merging all incoming media) is not a job that can be efficiently distributed between several hosts in real time. Because of this, you can only maintain service quality by keeping a fleet of "warm" servers (which is expensive), by enforcing some sort of scheduling so that your backend grid has time to spin up the required capacity (which might take a while), or both.
      Example: YouTube live streaming. If you start right away, there will be a short delay before your stream becomes available. If you schedule it, it will usually be reachable from the start.

    6. Your outgoing bandwidth is limited.

    7. You can estimate the max number of concurrent events pretty accurately.
      Merging an array of streams into a grid is not a job you’d split smoothly in real time.
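
    The "at least half" rule for the seven questions above can be written down as a tiny checklist helper. This is only a sketch: the question labels are shortened paraphrases, and the names `MCU_QUESTIONS` and `prefer_mcu` are hypothetical. With seven questions, "at least half" works out to four or more yes answers.

```python
from typing import Mapping

# Shortened paraphrases of the seven questions (labels are illustrative).
MCU_QUESTIONS = (
    "mostly webinars or event streams",
    "mostly 500+ participants",
    "mobile devices or uneven connection coverage",
    "runs on-premise",
    "mostly scheduled events",
    "limited outgoing bandwidth",
    "predictable number of concurrent events",
)


def prefer_mcu(answers: Mapping[str, bool]) -> bool:
    """Return True when at least half of the questions are answered yes.

    Questions missing from `answers` count as "no".
    """
    yes_count = sum(1 for question in MCU_QUESTIONS if answers.get(question, False))
    # Compare doubled count to avoid fractional halves (7 questions -> 4 needed).
    return 2 * yes_count >= len(MCU_QUESTIONS)
```

    For example, answering yes only to the webinar, scheduling, and bandwidth questions gives 3 out of 7, so `prefer_mcu` returns `False` and the SFU default stands.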

    Ok, then mesh maybe?

    1. Privacy is the number one priority.
      That is, you are ready to sacrifice features for the sake of privacy.

    2. No server side recording is planned within the next 10 sprints.

    3. In most cases, the calls will be one-on-one OR will take place within the same local network.

    4. The chance that a call will have more than about 8 active users is nil.
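
    The participant ceiling in the last point follows from how a full mesh grows. A rough sketch of the arithmetic, assuming one video stream per participant; the 1.5 Mbps bitrate used in the note below is an illustrative assumption, not a figure from the comment, and the function names are hypothetical.

```python
def mesh_peer_connections(participants: int) -> int:
    """Total peer-to-peer connections in a full mesh: n * (n - 1) / 2."""
    return participants * (participants - 1) // 2


def mesh_upload_mbps(participants: int, stream_mbps: float) -> float:
    """Upload bandwidth one client needs to send its stream to every other peer."""
    return (participants - 1) * stream_mbps
```

    With 8 participants, the call needs 28 peer connections in total, and at an assumed 1.5 Mbps per stream each client uploads 10.5 Mbps, on top of encoding its stream separately for each peer.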
