Building A Better Machine For An AI World

Raja Koduri has been in the thick of the past two eras of computing, which were marked by – among other things – the ability to architect systems and software that helped to get more performance into the hands of increasing numbers of people.

In two stints with AMD, Koduri was key in steering the development and use of the chip maker’s Radeon GPUs, expanding their use from client and gaming systems into the datacenter and HPC fields. In the middle of his almost 13 years at AMD, he left for four years to run Apple’s graphics architecture business, returning in 2013. And three years ago, of course, Koduri joined Intel, where he is now the company’s chief architect, vice president, and general manager of the Cores and Visual Computing and Edge Computing Solutions unit.

The job gives him an unobstructed view of what the future of computing looks like, and for all of the rapid changes that are happening and the challenges they present, the goal is in many ways the same as it was in the PC era and now the mobile and cloud era – make a lot of compute capability available to as many people as possible.

“In the PC era, we digitized everything we could and networked everything we can,” Koduri said this week during his keynote presentation at the Hot Chips 2020 conference, which like most events during the ongoing COVID-19 pandemic is a virtual show done completely online. “And then we got a billion people onto the Internet. It changed the way we learned, worked, and entertained ourselves. The next disruption was the mobile and cloud era, which connected 10 billion mobile devices to supercomputers in the cloud. And this changed the way we live. What’s the next disruption going to enable?”

Koduri admitted that he was getting into the “dangerous game of predicting future here, but I am excited about 100 billion devices that are connected and intelligent, protecting and enriching our lives in more ways than you can imagine. Intelligence will require a ton of computing – exascale and beyond – computing that should be equally accessible to anyone and everyone, that can access and address data, that can reside anywhere in the world. Computing that’s not just controlled by a few and not controlled by proprietary abstractions and service leaders. Exascale computing should be as accessible to everyone as electricity is today. This is the future that excites me.”

Driving this are not only the billions of intelligent devices that are on the way – from systems to sensors – but also the massive amounts of data that they will generate (and already are generating) and the artificial intelligence (AI) technologies that will be needed to analyze the data and squeeze the useful information from it, creating the foundation for more informed and timely business decisions. All of this represents a 10-fold opportunity for the IT industry, but it’s going to demand massive amounts of expensive compute power, a demand that already is doubling every three to four months, Koduri said. By 2025, the world will be creating 175 zettabytes of data.
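To put the doubling figure in perspective, here is a minimal back-of-the-envelope sketch in Python. It assumes the doubling period stays constant over a full year, which is a simplification and not something Koduri stated; the function name is made up for this illustration.

```python
# Annualized growth implied by a fixed doubling period.
# Assumption: the doubling period holds steady for 12 months.

def annual_growth_factor(doubling_months: float) -> float:
    """Return how many times demand multiplies over 12 months."""
    return 2.0 ** (12.0 / doubling_months)

fast = annual_growth_factor(3.0)  # doubling every three months
slow = annual_growth_factor(4.0)  # doubling every four months

print(f"Doubling every 3 months: {fast:.0f}x per year")
print(f"Doubling every 4 months: {slow:.0f}x per year")
```

Under that assumption, demand for compute would grow somewhere between 8X and 16X in a single year.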

“Not only do we need more compute at a faster pace than we have ever had, we also need it to be general-purpose,” he said. “And we’re making it even harder. One of the key breakthroughs in our progress on AI is data – leveraging data to build our intelligence. But as a world, we are generating a ton of data every second. We are generating more data than our ability to analyze and understand. Data not only stresses the need for more compute, it stresses our whole infrastructure. We need more capacity and bandwidth. We need a network bandwidth to go up on exponential trajectories while reducing latencies as well.”

With memory, there are challenges at every level of the hierarchy to fill the existing gap between what’s available for AI today and what’s needed. Koduri said the industry needs “5X to 10X reduction in memory power to be able to pay for the compute increase we need in the system.”

There are myriad challenges to evolving the compute architecture to meet the rapidly growing demand and it will take a combination of hardware and software innovation, he said. But there is work being done in all these areas around the globe. He touched on some of them.

Don’t Sleep On Moore’s Law

Koduri cautioned that despite several years of debate about Moore’s Law, the foundational belief that has driven past eras in computing still has life left. There’s no doubt that key metrics – such as performance per dollar and performance per watt – are getting more difficult to improve, but there is still a way to go before Moore’s Law runs out. Software – compilers, algorithms, different languages and such – plays as important a role as hardware does. With software updates, the same CPU can still get 100X the performance, and new AI workloads allow for optimization of vectors that weren’t needed before. That said, transistor scaling is not as useful as it once was.

However, architects still have levers to pull to get performance improvement “whether we call it Moore’s Law, more than Moore, or beyond Moore. We firmly believe there is a lot more transistor density scaling to come,” Koduri said. There is still the opportunity to get three times more density scaling out of FinFET, which will morph into nanowire for MOSFET transistors for another two-fold improvement. Stacking nanowire transistors can yield another three-times improvement. Beyond pitch scaling will come wafer-to-wafer stacking for two times more density improvement and die-on-wafer stacking, which will yield another 2X. All of this is being worked on in labs around the world, though it could take a decade or more for all this to come to fruition.
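As a rough illustration of how those steps could add up, the Python sketch below multiplies the factors quoted above. It assumes the gains are independent and compound fully, which the article does not claim, so the result is an optimistic ceiling and not a figure Koduri gave.

```python
from math import prod

# Density multipliers as quoted in the paragraph above.
# Assumption: each step compounds fully with the others.
density_steps = {
    "FinFET scaling": 3,
    "nanowire transistors": 2,
    "stacked nanowire transistors": 3,
    "wafer-to-wafer stacking": 2,
    "die-on-wafer stacking": 2,
}

combined = prod(density_steps.values())

for step, factor in density_steps.items():
    print(f"{step}: {factor}x")
print(f"Combined, if fully compounding: {combined}x")
```

In practice the steps are unlikely to stack this cleanly, so the realized gain would presumably land below that ceiling.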

Advanced Packaging

Heat dissipation will continue to be a challenge, but there are techniques – including ruling voltage scaling, capacitance scaling, wavelength reduction and frequency scaling – that eventually will deliver the 50X reduction the industry needs, he said.

Advanced packaging, which Intel and other chip makers have been working on for years, will help drive density scaling, and Koduri noted the recent announcement of hybrid bonding, an alternative to traditional thermal compression bonding that will deliver dense vertical interconnects between top and bottom with low capacitance and lower interconnect power. Intel has started shipping the 10-nanometer “Lakefield” chip, which features Intel’s Foveros 3D stacking and a hybrid computing architecture, and the company is also shipping GPUs and FPGAs with the advanced technologies. “It’s our most advanced packaging technology to date,” he said.

Speaking of GPUs, during this presentation, Koduri held up a four-tile Xe HP GPU, which is on the roadmap for the company’s upcoming GPU architecture. Koduri and other Intel officials this year have been talking about the Xe HP offerings, showing a lineup of chips with one, two and four tiles that will span from entry-level systems to the datacenter. Among the features of “petaflops-scale” Xe HP is Foveros die stacking.

All of this is aimed at eventually creating massive general-purpose systems that can scale to a million cores and that not only can host thousands and millions of users at the same time but also are affordable and accessible to everyone, according to Koduri.

The Contract Between Hardware And Software

The relationship between hardware and software is at the core of driving computing, he said, noting that the PC era had X86 and the Windows API, the mobile era has had iOS and Android, and the cloud has been defined by X86 chips and Linux software, a relationship that has led to a 10 million-developer ecosystem.

“The impact of an architecture in the ecosystem hasn’t just been about the performance of the architecture or how elegant and flexible the ISA has been,” Koduri said. “The generality is also defined by the breadth and depth of the software stack available on an architecture. We have over 20 million developers in the x86 ecosystem and a rich, comprehensive software stack is built around it. The reach and richness of that ecosystem is what makes it general as well.”

Koduri said the question, in this new era of billions of intelligent devices, becomes what will the hardware-software contracts look like? “We’re likely to see an expansion of architectures — X86, Arm, RISC-V on the CPU front, GPUs, AI architectures. There’s compute in memory and in-memory compute and there is compute already inside the network, a very heterogeneous mix of architectures. There’s definitely huge performance gains to be had.”

It’s a heterogeneous world, but the challenge is that heterogeneity can compromise the generality that is important to the adoption of new architectures. Intel has been adding more heterogeneity to its products and has learned a lot about software through its ISA extension support in many layers of the software stack. In particular, Intel architects learned that productive abstractions that hide the heterogeneity through libraries and middleware are a way to make the system look more general to the software. It takes three to five years for ISA extensions to gain broad adoption, though it’s faster for performance-oriented verticals like HPC, gaming, and AI.
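The idea of a library hiding heterogeneity can be sketched in a few lines of Python. This is a toy illustration only: the backend names, the registry, and the `vector_add` function are all hypothetical, and it is not how oneAPI or any Intel library is actually structured. Both backends run plain Python so the sketch works anywhere.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]

def _add_cpu(a: Vector, b: Vector) -> Vector:
    # Hypothetical CPU path.
    return [x + y for x, y in zip(a, b)]

def _add_accelerator(a: Vector, b: Vector) -> Vector:
    # Hypothetical accelerator path. A real library would offload
    # to a GPU or other device here; this stand-in stays on the host.
    return [x + y for x, y in zip(a, b)]

# Registry ordered from most to least preferred backend.
BACKENDS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "accelerator": _add_accelerator,
    "cpu": _add_cpu,
}

def vector_add(a: Vector, b: Vector,
               available: Sequence[str] = ("cpu",)) -> Vector:
    """Single entry point: callers never name a device."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    for name, kernel in BACKENDS.items():
        if name in available:
            return kernel(a, b)
    raise RuntimeError("no supported backend available")

result = vector_add([1.0, 2.0], [3.0, 4.0])
print(result)
```

The caller's code stays the same whether or not an accelerator is present, which is the sense in which the abstraction makes a heterogeneous system look more general to software.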

The key is to ensure that the abstractions that are introduced are scalable across the entire network and that they are open and accessible to developers up and down the stack.

Looking Ahead To 2021

Thinking about the near term, Koduri pointed to the Xe HP GPU silicon as an example of what can be done through a combination of hardware and software, delivering petaflops-scale performance at the edge.

“We can see a near-time horizon where we span the compute spectrum from a million neurons operating at milliwatts at the sensor level to exaflops consuming megawatts in the cloud. Through our oneAPI initiative we are making good progress on software productivity as well,” he said, referring to a program designed to deliver a set of tools to developers that gives them a unified programming model to make it easier to build workloads across diverse architectures. “It is quite feasible to widen this rate to over a billion neurons at the sensor level to a zettaflops in the cloud.”