Dave Brown remembers when Amazon executives would pop into the small EC2 office in Cape Town, which housed the entire fourteen-person team, and tell them they would be working for a billion-dollar business one day.
Fifteen years later, as Brown looks back, he says he feels silly for finding that laughable. Today, with AWS revenue well over $50 billion, Amazon itself would not be profitable without it, and the plans to keep that trend going include a more physical presence for large customers, an ambitious chipmaking spree, and a continued focus on anticipating workloads.
At the core of all of AWS's future plans is a focus on latency. For Brown, that is where he started in 2007, and even in 2021 the network is still the biggest challenge. While Amazon has tried to get around latency issues for its largest customers the old-fashioned way (AWS Outposts and Local Zones that bring the datacenter to within single-digit milliseconds), Brown and his teams at EC2 are still focused on which workloads will be most demanding in this regard, and what else they might do to meet those challenges.
Latency and scale have run hand in hand since the beginning, when the first "rack" of EC2 was a stack of laptops used to test the early control plane. The first instance type, developed for Amazon's own diverse teams, which were strapped for hardware access, didn't even have a name. While the team figured out how to get four users onto a single machine, those early-2000s problems of heavy hypervisor performance hits and networks that couldn't keep pace with processor capabilities at scale were already rearing their heads.
“We were 300ms away from the nearest Amazon region so every single keystroke we typed into EC2 had that round trip time. We got very good at typing ahead of what we were seeing on the screen,” Brown tells us. In those days, VMware was available early but hadn't been tested at scale, so the team adopted the Xen hypervisor instead. At the time they were more worried about orchestration, but as the first startups came on board EC2, performance became paramount.
“Around 2008 or 2009, we had enough customers talking about things like network jitter and other problems that arise with a software-based hypervisor. We started a journey then to optimize some of its components and it has taken a decade for that journey to complete.”
That 300ms starting point has evolved into essentially bare metal performance, nearing single-digit millisecond latencies, and that is because of smart offload. With the arrival of SmartNICs in 2010 or so, Brown and his team were on a roll, offloading all functions (storage, network, EBS volumes, etc.) to custom offload engines, most recently built on Arm.
“What all this means is that today we have a server within EC2 where the customer gets to use 100 percent of the cores. There’s no AWS software running on the Intel, AMD, or Graviton2 processor. This means we are in a world now where customers get the same performance they would get on a bare metal machine,” Brown adds.
Despite all of this progress, the journey of scale and latency has been bumpy, with several iterative developments as workloads on EC2 expanded in size, as latency demands grew, and as Amazon kept adding instance types.
From that first M1 Small instance (even though it didn’t have the name at the time) in Brown’s small office until now, Amazon has created four hundred instance types.
The goal, he says, is to make sure AWS has an environment for every user, no matter the scale, and to be able to predict the needs of future workloads before they emerge. That next wave of applications will be the most latency-sensitive yet, as Amazon targets real-time ML and processing for autonomous vehicles, gaming platforms, and IoT devices. For its largest customers with more intense latency requirements, latency is about (literal) colocation: in Hollywood, for instance, film crews finish shooting, upload to a physical EC2 zone right outside their door, and let artists do their rendering right there. For the rest of the emerging workloads, the technology side is still being fleshed out.
Getting latency right at scale for true cloud (as opposed to on-site services) is a bit of a game of whack-a-mole, even fifteen years in. Brown says that scale is relative: even though AWS operates at massive levels today, scalability and latency were throwing up challenges from the very beginning.
“Even by the end of 2007 — and our service was tiny then, especially by comparison — we were running into scaling issues with networking devices,” Brown, who today is the VP of EC2 at AWS, tells The Next Platform. “The reason was that the rate of change in the cloud was so much higher than devices were designed for. In a normal datacenter, you’d update elements once per month or week, all by hand. By early 2008 we had to do that every few seconds, sometimes anytime a customer launched an instance. Now it’s a million times per second.”
And so began the AWS foray into custom hardware, one that is still underway with user-facing processors such as Graviton, Inferentia, and Trainium, and also on the backend, for networking and storage in particular.
On the custom NIC and storage processor side, AWS is getting around some of the latency and data movement bottlenecks for its cloud users, at least for now. Short of building on customer sites, Brown says it is focused now on a few other challenges, including increasing price/performance. The catch, he says, is that with current Intel and AMD hardware offering 15 percent improvements between generations, that does not let AWS win users on price/performance over time. He says that in addition to solving latency with on-site efforts, the compute portion is where the next shakeup will be.
“In the early 2000s we knew we wouldn’t see long-term doubling every 18 months, we saw the slowdown. We set out to see if we could build something with our custom silicon, a fully AWS-designed processor using Arm. That brought a 40 percent price/performance boost. What we’re seeing now is a large migration, from startups to big enterprises to even enterprises we wouldn’t expect to adopt new tech, all jumping on Graviton2 for that benefit.”
It is not inconceivable to see a future in which AWS runs entirely on its own hardware, free from the slings and arrows (but also the competitive pressure advantages) of Intel and AMD. Brown says he thinks there will still be enough workloads that run on those platforms over the coming years. “It’s not just a case of moving everything to Graviton, it’s about supporting customers on whatever platforms they want to run on,” he adds.
With latency and scale closely coupled, compute capability can be added to that list. Brown says that as his career continues, he will keep doing what he started doing in that small office with a scrappy, limited team of engineers: build, wait for feedback, and rebuild, albeit at unprecedented scale.