At 2,000 and Climbing: Successfully Set Up an Environment for Load-Testing 1,000 Jitsi Video Users on AWS; Onward to 5,000+ (However, a Standalone All-in-One Jitsi Fails at 650 Users)

I am working on a longitudinal series of projects that make heavy use of Jitsi, especially in educational settings. These are large-scale environments with highly demanding loads, intended as Zoom replacements in the USA and worldwide. This is just one of a number of components. Today it was great to finally reach the baseline of a 1,000-simulated-video-user load test. The next goal is 5,000+, followed by adding more features while keeping everything working at this scale (recordings, transcriptions, closed captioning, intricate chat options, complex permissions management, file management, and much more). Great things are happening!


UPDATE 20210605: Now at 2,000, and still climbing.

Pardon me while I perform a hamster dance of celebration after many weeks of extreme frustration from a long list of interference factors.

I finally managed to get a simulated load of 1,000 users/nodes running against a Jitsi setup in an AWS environment!

This is a success milestone. However, to scale up further, I hit the Fargate Spot limit at 1,000 nodes, so I now have to go through the approval process to raise the limit to 5,000 if possible (or whatever we can get, if lower). Until the limit is lifted, I can still test other scenarios and components at this level.

Once I have benchmarked the basic core Jitsi at 5,000 video users (or as close as possible), I can extrapolate the rest, because horizontal scaling is more predictable and closer to linear, whereas vertical scaling hits diminishing returns and is increasingly prone to compounding factors that make capacity planning less reliable.
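As a rough illustration of that extrapolation, the sketch below assumes capacity grows linearly with the number of identical servers (or shards) and reserves a fraction of each server's measured peak as headroom. The function name, the 20% headroom default, and the example figures are hypothetical placeholders, not measured results.

```python
import math


def servers_needed(target_users: int, users_per_server: int, headroom: float = 0.2) -> int:
    """Estimate how many identical servers a target load needs.

    Hypothetical helper: assumes capacity scales linearly with server count
    (the horizontal-scaling assumption) and leaves `headroom` (a fraction of
    each server's measured peak) unused as a safety margin.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_server = users_per_server * (1 - headroom)
    # Round up: a partial server still means deploying one more server.
    return math.ceil(target_users / usable_per_server)


if __name__ == "__main__":
    # Illustrative inputs only: ~600 users per server, 5,000-user target, 20% headroom.
    print(servers_needed(5000, 600))
```

With those illustrative inputs it returns 11 (about 480 usable users per server). The linear assumption is exactly what the benchmarks are meant to validate, so the per-server figure should come from measurement rather than from an instance spec sheet.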

I am so glad to finally get the rest of the issues out of the way and get that load test working at long last!

Many thanks to my current employer for providing the resources, and doing all they can to clear the road for building up this real capacity data and body of knowledge.

I am running MANY capacity analysis and planning scenarios across many different Jitsi configuration approaches, especially as new Jitsi releases come out.

I am using a combination of many tools, including but not limited to:

* Selenium 3

* Terraform

* Jitsi (and many related components for recording, CC, and more)

* AWS ECS, Fargate, EC2, CloudWatch, etc.

* ChromeDriver (headless Chrome)

* Java, Maven, Malleus Jitsificus, etc.

* Grafana, and more.
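For readers unfamiliar with how one simulated participant is typically set up, the sketch below shows the general idea: headless Chrome with Chromium's fake-media switches, pointed at a room URL whose hash fragment overrides Jitsi config options. This is not taken from Malleus Jitsificus; the helper names and room URL are hypothetical, and the config keys (`prejoinPageEnabled`, `startWithAudioMuted`) are assumptions that should be checked against the deployed Jitsi version, since option names change between releases.

```python
from typing import List
from urllib.parse import quote

# Chromium switches commonly used for WebRTC load testing (no real camera/mic needed).
_FAKE_MEDIA_ARGS = [
    "--headless",
    "--use-fake-ui-for-media-stream",      # auto-accept camera/microphone permission prompts
    "--use-fake-device-for-media-stream",  # feed a synthetic video/audio source
]


def chrome_args() -> List[str]:
    """Return a fresh copy of the Chrome arguments for one simulated participant."""
    return list(_FAKE_MEDIA_ARGS)


def participant_url(base_url: str, room: str) -> str:
    """Build a room URL that skips the pre-join screen and joins with audio muted.

    Hypothetical helper; the `config.*` keys in the hash are assumed overrides.
    """
    return (
        f"{base_url.rstrip('/')}/{quote(room, safe='')}"
        "#config.prejoinPageEnabled=false"
        "&config.startWithAudioMuted=true"
    )


if __name__ == "__main__":
    print(chrome_args())
    print(participant_url("https://meet.example.org/", "LoadTest 1"))
```

Each string from `chrome_args()` would be passed to Selenium's `ChromeOptions.add_argument()` before starting ChromeDriver, and the driver would then open the URL from `participant_url(...)`. Keeping the participant definition as plain data makes it easy to reuse across many identical participant containers.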

Before testing properly tuned and scaled setups, I am baseline testing simple, default all-in-one Jitsi installs with only the most basic video-conferencing features and no extras.

So far, the rough picture is that even smallish AWS t3 instances support up to 200 users, lower-end m5 instances over 600, and slightly beefier instances should support around 1,000. This is without any tuning, scaling, etc., just a basic all-in-one Jitsi server!

However, there appears to be a configuration issue or bug in the current version that caps an all-in-one standalone Jitsi setup at around 600 users. No matter how far I scaled vertically, all the way up to a c5a.24xlarge, this did not improve! I am writing a separate blog post on this issue and will be reaching out to the Jitsi community for suggestions on resolving it.

I can still simulate 1,000 users, but at around 650 they start dropping in and out of the rooms, and by 1,000 the drops are very frequent. At this point, however, I believe it is a performance-tuning issue in the default install; more information is pending.
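One way to pin down where that instability starts is to record, for each tested load level, the fraction of simulated participants that dropped out during the run, and report the first level that crosses a tolerance. The metric, the 1% threshold, and the sample numbers below are hypothetical; they only echo the rough ~650 figure described above and are not measured data.

```python
from typing import Dict, Optional


def first_unstable_load(drop_rate_by_load: Dict[int, float],
                        max_drop_rate: float = 0.01) -> Optional[int]:
    """Return the lowest tested participant count whose drop rate exceeds the tolerance.

    `drop_rate_by_load` maps participant count -> fraction of participants that
    dropped out during that run (a hypothetical metric). Returns None if every
    tested level stayed within tolerance.
    """
    for load in sorted(drop_rate_by_load):
        if drop_rate_by_load[load] > max_drop_rate:
            return load
    return None


if __name__ == "__main__":
    # Hypothetical sample data, not benchmark results.
    runs = {500: 0.0, 600: 0.002, 650: 0.03, 800: 0.08, 1000: 0.2}
    print(first_unstable_load(runs))  # 650
```

Sorting the keys means the runs can be recorded in any order, and the tolerance makes "dropping in and out" a repeatable pass/fail criterion rather than a visual judgment.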

I am definitely seeing diminishing returns with each vertical increment, as expected. The plan is to take this up to the maximum, documenting the results of each incremental step. That data then makes it much easier to calculate which horizontal scaling options give the best bang for the buck at very large scales.
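To make the diminishing returns visible step by step, each vertical increment can be reduced to the extra peak users gained per extra vCPU. The tuple layout, instance labels, and figures below are hypothetical placeholders, not benchmark results.

```python
from typing import List, Tuple

# (instance label, vCPU count, measured peak users); all values hypothetical.
Step = Tuple[str, int, int]


def marginal_users_per_vcpu(steps: List[Step]) -> List[Tuple[str, float]]:
    """For each step after the first, return the peak users gained per added vCPU."""
    results = []
    for (_, prev_cpu, prev_users), (label, cpu, users) in zip(steps, steps[1:]):
        extra_cpu = cpu - prev_cpu
        if extra_cpu <= 0:
            raise ValueError("steps must be ordered by increasing vCPU count")
        results.append((label, (users - prev_users) / extra_cpu))
    return results


if __name__ == "__main__":
    steps = [("small", 2, 200), ("medium", 4, 600), ("large", 8, 1000), ("xlarge", 16, 1100)]
    print(marginal_users_per_vcpu(steps))
    # [('medium', 200.0), ('large', 100.0), ('xlarge', 12.5)]
```

A falling marginal gain is the signal that adding another identical server is likely to be a better buy than moving up one more instance size.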

Once that pathway is maxed out, I (and later, others) will fill in the performance blanks for a wide range of use-case, scaling, and add-on scenarios. These include recordings, recordings management, real-time closed captions, offline higher-quality transcriptions, xR (VR/AR/etc.), BCI (brain-computer interface) and other accessibility tools, robust chat feature enhancements, moderation and permissions management, file management, UI & UX customizations, and much more, with each change tested repeatedly against the baseline.

Each time a significant new version of Jitsi comes out in the dev or stable branch that we want to consider integrating into our environment, I will repeat a subset of these baseline and more advanced tests to catch any gotchas that could significantly impact performance, reliability, etc., ideally long before they reach production.
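A release-over-release check like this can be sketched as a comparison of each candidate run's metrics against the stored baseline, flagging anything that got worse by more than a relative tolerance. The metric names, the 5% tolerance, and the assumption that higher is always better are all hypothetical; metrics where lower is better (latency, CPU usage) would need their direction inverted.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Describe every metric that regressed by more than `tolerance` (relative).

    Assumes higher is better for every metric (e.g. peak stable participants).
    """
    flagged = []
    for name, base in baseline.items():
        if name not in candidate:
            flagged.append(f"{name}: missing from candidate run")
            continue
        new = candidate[name]
        # Relative drop versus the baseline; non-positive baselines are skipped.
        if base > 0 and (base - new) / base > tolerance:
            flagged.append(f"{name}: {base} -> {new}")
    return flagged


if __name__ == "__main__":
    baseline = {"peak_stable_users": 600, "stable_rooms": 30}
    candidate = {"peak_stable_users": 550, "stable_rooms": 30}
    print(regressions(baseline, candidate))  # ['peak_stable_users: 600 -> 550']
```

A small relative tolerance absorbs run-to-run noise while still surfacing a release that quietly lowers capacity.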

Eventually this should be automated and taken over by others, but I am very glad my current employer hired me to really dig into this great product. They reached out to me because of my previous success using Jitsi to support 20,000+ concurrent users at last year's global conventions, where I helped build the platforms that hosted the events.

The educational settings for LM have a significantly different and more varied scope, and thus considerably more load challenges to overcome when scaling.

It is great to be working with such friendly, hard-working, talented, and terrific people at my current place of employment, and the associated clients for this project! It was great chatting with one of LM's CEOs (Sam) and getting to connect a little. I am excited by his vision and hoping to be able to help contribute to realizing his broader altruistic goals through these tools!

Since I started in February 2021, the hours have been blindingly long (100+/week). I hope this will improve soon, but otherwise I really do enjoy the people I am working with, and apart from the hours it is a great place to work!

I am also advocating to everyone, whenever it seems appropriate, to seek permission to give back as much of our work as possible to the open-source community. I am encouraging the company, team members, and even clients to develop a culture that allows non-proprietary components to be contributed back to the open-source communities they came from. I point out the many benefits of doing so, especially for keeping the overall open-source technical ecosystem healthy and improving it. That sometimes means framing the benefits from an "enlightened self-interest" perspective, if mindful, holistic, munificent altruism is not sufficient. :-)

Cheers for now!