Scalability: What Makes a Platform Scalable, Why It's Important, and How It Benefits You
Scalability is a core part of our product offering and has been one of our main priorities since day one of building our platform. A quick check shows that our website uses the word 'scale' no fewer than seven times. Everyone wants to be scalable, right? But what does this actually mean in the context of interactive streaming? In this post, I want to talk about what scalability means to us, how we build it into our platform, and how it benefits our customers and users.
To our team at PureWeb, scalable means that everyone who wants an interactive stream can get one, anywhere in the world, with little to no wait time. This is as true for two users who want to collaborate on an architectural model as it is for the 10,000th streamer joining a large live event for a product launch. Everyone gets a stream, and no one waits: that's our goal.
Achieving this goal is not straightforward. A big part of delivering the best possible experience is having enough of the right compute infrastructure available at the right time, and being able to route users to it optimally. Other factors, such as system and data architecture and code performance, also come into play.
As we start to dive in, it's important to acknowledge that "scalability" is an umbrella term that covers both "vertical" scalability and "horizontal" scalability. What do those mean, and why do they matter? Read on.
Horizontal Scalability – a.k.a. "Do More with More"
Horizontal scalability is probably the easiest type to reason about. It hinges on our ability to access and orchestrate large quantities of GPU-powered infrastructure across multiple clouds around the world. In the beginning, our platform grew in lockstep with AWS's global rollout of GPU-powered instances. This expansion was beneficial because it let us provide better streaming (read: lower interactive latency) by putting compute closer to our end users. It also meant we could serve a greater number of concurrent streams.
The real test cases for this kind of global scalability are digital events serving large geographies. In a digital event, you usually have a very large number of streamers (1,500 to 15,000) showing up at the same time for a fixed duration. In these circumstances, making users wait for a streaming session is unacceptable, because the event goes ahead whether they're there or not.
Asian Sky Group, for example, connected with Mytaverse, a Miami-based company that delivers a cloud-based solution for immersive 3D and multiplayer, true-to-life work environments. Together, they developed the Asian Sky Group Virtual Event & Conference (ASGVEC), which virtually welcomed 50 exhibitors and 1,200 unique visitors from all over the world.
To ensure that events like these go off without a hitch, we work with our cloud partners on anything that needs more than a few thousand users. We've also learned from these large-scale events that the amount of GPU infrastructure in any one cloud region is not limitless. These limitations have informed our cross-cloud expansion onto CoreWeave, ensuring that we can always stay comfortably ahead of the needs of our most demanding use cases.
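To make the routing idea concrete, here is a minimal Python sketch of sending each user to the lowest-latency region that still has free GPU capacity, and spilling over to another region or cloud when the closest one is full. The `Region` fields, the 150 ms latency budget, and the region names are illustrative assumptions, not PureWeb's actual scheduler or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    """Hypothetical router view of one GPU region."""
    cloud: str          # e.g. "aws" or "coreweave" (illustrative labels)
    name: str
    latency_ms: float   # measured round-trip time from this user
    free_slots: int     # streaming sessions that can still be started here


def route_user(regions: list[Region], max_latency_ms: float = 150.0) -> Optional[Region]:
    """Pick the lowest-latency region that still has capacity.

    Regions over the (assumed) latency budget are skipped. If every region is
    full or too far away, return None so the caller can provision more capacity.
    """
    candidates = [r for r in regions if r.free_slots > 0 and r.latency_ms <= max_latency_ms]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: r.latency_ms)
    best.free_slots -= 1  # reserve a slot for this user
    return best
```

With one free slot in the nearest AWS region, the first user lands there and the second spills over to the next-closest region, which in this example is on another cloud.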
Vertical Scalability – a.k.a. "Do More with Less"
Vertical scaling is a technique dating back to the earliest days of PureWeb, when we developed the ability to stack multiple streaming sessions onto a single server instance. If you can fit even two sessions on a dedicated server, you're either doubling your scalability or halving your costs, and the benefits grow from there.
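The arithmetic behind that claim is simple enough to sketch. The Python snippet below assumes sessions pack evenly onto identical instances and uses a hypothetical $1.00/hour instance price; the numbers are illustrative, not real pricing.

```python
import math


def servers_needed(concurrent_users: int, sessions_per_server: int) -> int:
    """Instances required when each instance hosts several streaming sessions."""
    if sessions_per_server < 1:
        raise ValueError("sessions_per_server must be at least 1")
    return math.ceil(concurrent_users / sessions_per_server)


def cost_per_stream_hour(instance_hourly_price: float, sessions_per_server: int) -> float:
    """Instance cost split evenly across the sessions it hosts (assumes full packing)."""
    return instance_hourly_price / sessions_per_server


# Illustrative only: 1,000 concurrent users on a hypothetical $1.00/hour instance.
for density in (1, 2, 4):
    print(density, servers_needed(1000, density), cost_per_stream_hour(1.00, density))
# 1 1000 1.0
# 2 500 0.5
# 4 250 0.25
```

Going from one to four sessions per server cuts the fleet to a quarter of its size for the same audience, which is the "scalability multiplier" and "cost saver" effect described here.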
We regularly work with customers who are highly invested in optimizing their Unreal Engine or Unity applications. This optimization does more than simply result in better framerates and faster load times; it can also be parlayed into a higher level of concurrency on a server, which can be both a powerful scalability multiplier and a huge cost saver.
Our current customer record is held by Cvent, the largest event and hospitality technology platform in the world, which optimized its application to run a solid four concurrent users on a single server instance.
Vertical scalability offers you a very powerful lever in making sure every one of your users gets the experience they're after, without having to rely on just throwing more compute infrastructure at the problem.
Elasticity – a.k.a. "Get Big, Quick"
Elasticity is the other side of the scalability coin. If scalability is about how big you can go, elasticity is about how quickly you can get there. In interactive streaming, elasticity means dynamically allocating resources to match a fluctuating number of users: expanding quickly during a surge and contracting when things quiet down, so we stay efficient without sacrificing user experience.
At PureWeb, elasticity means more than simple auto-scaling. We use that too, but we've gone as far as building our own auto-scalers into the platform to handle the unique demands of scaling highly variable, stateful workloads such as Unreal Engine and Unity remote streams. We've also invested heavily in reducing startup times across our infrastructure, so we can deploy and provision capacity more rapidly as demand changes.
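One way to see why stateful streams call for a custom policy is the following Python sketch of a single scaling decision. It assumes a hypothetical fleet model (`Instance`, `sessions_per_instance`, `warm_buffer_sessions`) and illustrates two ideas: keeping some spare warm capacity so new users don't wait on instance startup (the buffer itself is an assumption, not a documented PureWeb setting), and never terminating an instance that is still serving a live session. It is not PureWeb's auto-scaler.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    """Hypothetical streaming server and its current load."""
    id: str
    active_sessions: int


def plan_scaling(
    instances: List[Instance],
    sessions_per_instance: int,
    warm_buffer_sessions: int,
) -> Tuple[int, List[str]]:
    """Return (instances_to_launch, instance_ids_to_terminate).

    Desired capacity covers current sessions plus a warm buffer that hides
    startup time. Scale-in only picks idle instances, because a running
    stream is stateful and cannot simply be moved to another server.
    """
    active = sum(i.active_sessions for i in instances)
    desired = math.ceil((active + warm_buffer_sessions) / sessions_per_instance)
    current = len(instances)
    if desired > current:
        return desired - current, []
    surplus = current - desired
    idle = [i.id for i in instances if i.active_sessions == 0]
    return 0, idle[:surplus]
```

For example, a fleet of two full four-session instances with a four-session buffer yields one launch, while a quiet fleet only ever gives up its idle instances, even if sessions are spread thinly across busy ones.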
Traditionally, the scalability of a software system gets put in the bucket of "non-functional requirements." It's not a feature we can point to and say, "Yes, we have scalability"; rather, it's a system characteristic that has to align with the goals and expectations of our users.
While we could always be “more scalable,” what matters most is making sure we're scalable enough. To that end, we regularly load test the Reality platform to confirm that it can comfortably exceed the demands our users place on it. In concrete terms, we test that we can support roughly 10 times our typical baseline load (the average number of streams we see in a week) and 2 times our peak load (the largest number of concurrent streams we've ever supported on our platform). Every time we run these tests, we find areas for improvement and optimization, and we prioritize those improvements based on the demands our customers place on the platform, so that we're always one step ahead.
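As a rough sketch of how those two targets combine into a single test goal, the Python function below takes hypothetical concurrent-stream samples collected over a week plus the all-time concurrent peak, and returns both targets and the larger of the two. Treating "baseline" as the mean of weekly samples is an assumption about how the average is computed.

```python
from statistics import mean
from typing import Sequence, Tuple


def load_test_targets(
    weekly_concurrency_samples: Sequence[int],
    all_time_peak_concurrent: int,
) -> Tuple[float, float, float]:
    """Return (10x baseline target, 2x peak target, concurrency the test must reach).

    Assumption: baseline is the mean of concurrent-stream samples over a week.
    A test that reaches the larger target satisfies both multipliers at once.
    """
    baseline = mean(weekly_concurrency_samples)
    baseline_target = 10 * baseline
    peak_target = 2 * all_time_peak_concurrent
    return baseline_target, peak_target, max(baseline_target, peak_target)
```

With samples averaging 200 streams and a historical peak of 1,500, the baseline rule asks for 2,000 concurrent streams and the peak rule for 3,000, so the peak multiplier is the binding constraint.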
Interested in learning more about how our platform can help you reach the audience you want? Get in touch to speak with one of our experts today.