Moore’s Law has underwritten a remarkable period of growth and stability for the computer industry. The doubling of transistor density at a predictable cadence has fueled not only five decades of increased processor performance, but also the rise of the general-purpose computing model. However, according to a pair of researchers at MIT and Aachen University, that’s all coming to an end.
Neil Thompson, a research scientist at MIT’s Computer Science and A.I. Lab and a visiting professor at Harvard, and Svenja Spanuth, a graduate student at RWTH Aachen University, contend, as we have been covering here at The Next Platform all along, that the disintegration of Moore’s Law, along with new applications like deep learning and cryptocurrency mining, is driving the industry away from general-purpose microprocessors and toward a model that favors specialized microprocessors. “The rise of general-purpose computer chips has been remarkable. So, too, could be their fall,” they argue.
As they point out, general-purpose computing was not always the norm. In the early days of supercomputing, custom-built vector-based architectures from companies like Cray dominated the HPC industry. A version of this still exists today in the vector systems built by NEC. But thanks to the speed at which Moore’s Law has improved the price-performance of transistors over the last few decades, the economic forces have greatly favored general-purpose processors.
That’s mainly because the cost of developing and manufacturing a custom chip is between $30 and $80 million. So even for users demanding high-performance microprocessors, the benefit of adopting a specialized architecture quickly dissipates as the shrinking transistors in general-purpose chips erase any initial performance gains afforded by customized solutions. Meanwhile, the costs incurred by transistor shrinking can be amortized across millions of processors.
But the computational economics enabled by Moore’s Law is now changing. In recent years, shrinking transistors has become much more expensive as the physical limitations of the underlying semiconductor material begin to assert themselves. The authors point out that in the past 25 years, the cost to build a leading-edge fab has risen 11 percent per year. In 2017, the Semiconductor Industry Association estimated that it costs about $7 billion to construct a new fab. Not only does that drive up the fixed costs for chipmakers, it has reduced the number of semiconductor manufacturers from 25 in 2002 to just four today: Intel, Taiwan Semiconductor Manufacturing Company (TSMC), Samsung, and GlobalFoundries.
The team also highlights a report by the US Bureau of Labor Statistics (BLS) that attempts to quantify microprocessor performance-per-dollar. By this metric, the BLS determined that improvements have dropped from 48 percent annually in 2000-2004, to 29 percent annually in 2004-2008, to 8 percent annually in 2008-2013.
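To get a feel for how sharp that slowdown is, it helps to compound those annual rates over their respective periods. The short Python sketch below does exactly that; the period lengths are simply inferred from the year ranges quoted above, so treat it as an illustration of the trend rather than a reproduction of the BLS methodology.

```python
# Rough compounding of the BLS annual improvement rates quoted above.
# Period lengths are inferred from the year ranges; this illustrates the
# trend and is not a reproduction of the BLS methodology.
periods = [
    ("2000-2004", 0.48, 4),
    ("2004-2008", 0.29, 4),
    ("2008-2013", 0.08, 5),
]

for label, annual_rate, years in periods:
    cumulative = (1 + annual_rate) ** years
    print(f"{label}: {annual_rate:.0%}/yr over {years} yrs "
          f"=> ~{cumulative:.1f}x better performance-per-dollar")
```

Compounded, that works out to roughly a 4.8x gain in performance-per-dollar over 2000-2004 versus roughly 1.5x over 2008-2013.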
All this has fundamentally changed the cost/benefit of shrinking transistors. As the authors note, for the first time in its history, Intel’s fixed costs have exceeded its variable costs due to the escalating expense of building and operating new fabs. Even more disconcerting is the fact that companies like Samsung and Qualcomm now believe that the cost of transistors manufactured on the latest process nodes is increasing, further discouraging the pursuit of smaller geometries. Such thinking was likely behind GlobalFoundries’s recent decision to jettison its plans for its 7nm technology.
It’s not just a deteriorating Moore’s Law. The other driver toward specialized processors is a new set of applications that are not amenable to general-purpose computing. For starters, you have platforms like mobile devices and the internet of things (IoT) that are so demanding with regard to energy efficiency and cost, and are deployed in such large volumes, that they necessitated customized chips even with a relatively robust Moore’s Law in place. Lower-volume applications with even more stringent requirements, such as in military and aviation hardware, are also conducive to special-purpose designs. But the authors believe the real watershed moment for the industry is being enabled by deep learning, an application category that cuts across nearly every computing environment – mobile, desktop, embedded, cloud, and supercomputing.
Deep learning and its preferred hardware platform, GPUs, represent the most visible example of how computing may travel down the path from general-purpose to specialized processors. GPUs, which can be viewed as a semi-specialized computing architecture, have become the de facto platform for training deep neural networks thanks to their ability to do data-parallel processing much more efficiently than CPUs can. The authors point out that although GPUs are also being exploited to accelerate scientific and engineering applications, it’s deep learning that will be the high-volume application that makes further specialization possible. Of course, it didn’t hurt that GPUs already had a high-volume business in desktop gaming, the application for which they were originally designed.
But for deep learning, GPUs may only be the gateway drug. There are already AI and deep learning chips in the pipeline from Intel, Fujitsu, and more than a dozen startups. Google’s own Tensor Processing Unit (TPU), which was purpose-built to train and use neural networks, is now in its third iteration. “Creating a customized processor was very costly for Google, with experts estimating the fixed cost as tens of millions of dollars,” write the authors. “And yet, the benefits were also great – they claim that their performance gain was equivalent to seven years of Moore’s Law – and that the avoided infrastructure costs made it worth it.”
Thompson and Spanuth also note that specialized processors are increasingly being used in supercomputing. They point to the November 2018 TOP500 rankings, which showed that for the first time specialized processors (mainly Nvidia GPUs), rather than CPUs, were responsible for the majority of added performance. The authors also performed a regression analysis on the list to show that supercomputers with specialized processors are “improving the number of calculations that they can perform per watt almost five times as fast as those that only use universal processors, and that this result is highly statistically significant.”
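The paper doesn’t reproduce its regression code, but the shape of such an analysis is easy to sketch. The snippet below is a simplified illustration rather than the authors’ actual method: the function name, input layout, and log-linear form are all assumptions. It fits separate exponential growth rates in flops-per-watt for accelerated and CPU-only systems drawn from TOP500-style records.

```python
import numpy as np

def flops_per_watt_growth(years, flops_per_watt, uses_accelerator):
    """Fit log-linear growth of energy efficiency, separately for
    accelerated and CPU-only systems.

    years            : list publication years (e.g. 2018.9 for Nov 2018)
    flops_per_watt   : measured efficiency of each system
    uses_accelerator : True if the system uses GPUs or other
                       specialized processors

    Returns the fitted annual improvement factor for each group.
    """
    years = np.asarray(years, dtype=float)
    log_eff = np.log(np.asarray(flops_per_watt, dtype=float))
    uses_accelerator = np.asarray(uses_accelerator, dtype=bool)

    growth = {}
    for label, mask in (("specialized", uses_accelerator),
                        ("cpu_only", ~uses_accelerator)):
        slope, _intercept = np.polyfit(years[mask], log_eff[mask], 1)
        growth[label] = np.exp(slope)  # multiplicative improvement per year
    return growth
```

Feeding in real TOP500 data and comparing the two fitted growth factors is the kind of comparison behind the “almost five times as fast” claim.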
Thompson and Spanuth offer a mathematical model for determining the cost/benefit of specialization, taking into account the fixed cost of developing custom chips, the chip volume, the speedup delivered by the custom implementation, and the rate of processor improvement. Since the latter is tied to Moore’s Law, its slowing pace means that it’s getting easier to rationalize specialized chips, even if the expected speedups are relatively modest.
“Thus, for many (but not all) applications it will now be economically viable to get specialized processors – at least in terms of hardware,” claim the authors. “Another way of seeing this is to consider that during the 2000-2004 period, an application with a market size of ~83,000 processors would have required that specialization provide a 100x speed-up to be worthwhile. In 2008-2013 such a processor would only need a 2x speedup.”
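The quoted numbers fall out of a break-even comparison between a one-off specialization cost and the improvement you would otherwise get for free from general-purpose chips. The sketch below is a toy version of that trade-off, not the authors’ exact formula: the $50 million NRE sits in the middle of the $30-$80 million range mentioned earlier, while the per-processor value figure is entirely made up.

```python
import math

def min_speedup_for_specialization(fixed_cost, market_size,
                                   value_per_processor_year,
                                   gp_improvement_rate):
    """Toy break-even model (not the paper's exact formulation).

    Idea: a specialized chip that is S times faster buys a head start of
    log(S)/log(1+r) years over general-purpose chips improving at annual
    rate r. Specialization roughly pays off when the value of that head
    start, summed over the market, exceeds the fixed cost:

        market_size * value_per_processor_year * log(S)/log(1+r) >= fixed_cost

    Solving for S gives the minimum worthwhile speedup.
    """
    exponent = fixed_cost / (market_size * value_per_processor_year)
    return (1.0 + gp_improvement_rate) ** exponent

# Illustrative inputs only: $50M NRE, a market of 83,000 processors, and a
# made-up $50 per processor-year of value.
for era, rate in (("2000-2004", 0.48), ("2008-2013", 0.08)):
    s = min_speedup_for_specialization(50e6, 83_000, 50.0, rate)
    print(f"{era}: general-purpose improving {rate:.0%}/yr "
          f"=> specialization needs ~{s:.1f}x speedup to pay off")
```

With these placeholder inputs the toy model happens to land in the same ballpark as the figures quoted above (on the order of 100x in the early period versus a few-x more recently), which captures the intuition: the slower general-purpose chips improve, the smaller the speedup a specialized design needs to justify its fixed cost.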
Thompson and Spanuth also incorporated the additional expense of re-targeting application software for specialized processors, which they pegged at $11 per line of code. This complicates the model somewhat, since you have to take into account the size of the code base, which is not always easy to track down. Here, they also make the point that once code re-development is complete, it tends to inhibit the movement of the code base back to general-purpose platforms.
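In code terms, the $11-per-line figure simply adds a second term to the fixed cost in the sketch above. A minimal extension, with a hypothetical 2-million-line code base as the example, might look like this:

```python
def total_fixed_cost(hardware_nre, lines_of_code, cost_per_line=11.0):
    """Fixed cost of specialization: hardware development (NRE) plus the
    cost of re-targeting the application software, using the article's
    $11-per-line estimate as the default."""
    return hardware_nre + lines_of_code * cost_per_line

# Example (hypothetical code base size): a $50M chip plus a 2M-line code
# base adds another $22M before specialization starts to pay off.
print(total_fixed_cost(50e6, 2_000_000))  # 72000000.0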
The bottom line is that the slow demise of Moore’s Law is unraveling what used to be a virtuous cycle of innovation, market expansion, and re-investment. As more specialized chips start to siphon off slices of the computer industry, this cycle becomes fragmented. As fewer users adopt the latest manufacturing nodes, financing the fabs becomes harder, slowing further technology advances. This has the effect of fragmenting the computer industry into specialized domains.
Some of these domains, like deep learning, will be in the fast lane, by virtue of their size and their suitability for specialized hardware. However, areas like database processing, while widely used, may become a backwater of sorts, since this type of transactional computation does not lend itself to specialized chips, say the authors. Still other areas, like climate modeling, are too small to warrant their own customized hardware, although they could benefit from it.
The authors anticipate that cloud computing will, to some extent, blunt the effect of these disparities by offering a variety of infrastructure to smaller and less well-served communities. The growing availability of more specialized cloud resources like GPUs, FPGAs, and, in the case of Google, TPUs, suggests that the haves and have-nots may be able to operate on a more even playing field.
None of this means CPUs or even GPUs are doomed. Although the authors didn’t delve into this aspect, it’s quite possible that specialized, semi-specialized, and general-purpose compute engines will be integrated on the same chip or processor package. Some chipmakers are already pursuing this path.
Nvidia, for example, incorporated Tensor Cores, its own specialized circuitry for deep learning, in its Volta-generation GPUs. By doing so, Nvidia was able to offer a platform that served both traditional supercomputing simulations and deep learning applications. Likewise, CPUs are being integrated with specialized logic blocks for things like encryption/decryption, graphics acceleration, signal processing, and, of course, deep learning. Expect this trend to continue.
The complete paper from Thompson and Spanuth is definitely worth a read. You can download it for free here.
Nvidia’s RTX Turing line of GPUs employs Tensor Cores to run trained AIs for denoising the noisy output of Turing’s limited ray tracing cores, and also for trained-AI-based upscaling, which Nvidia calls DLSS (Deep Learning Super Sampling), instead of traditional methods of upscaling.
So even though RTX Turing’s dedicated ray tracing cores can complete 10 billion ray paths per second (RTX 2080 Ti), frame times are measured in milliseconds: 33.33 ms at 30 frames per second (FPS), 16.67 ms at 60 FPS, and less than 16.67 ms at higher frame rates. Within those limited frame times there still aren’t enough rays traced to generate anything but a grainy ray-traced image, which must be denoised by a Tensor Core-hosted AI pass before being mixed in with the traditional raster output.
Nvidia employs a hybrid raster/ray tracing approach, helped along by Tensor Core-hosted AI trained to denoise an image created with a rather limited number of rays per millisecond of available frame time. That can include, if desired, ambient occlusion, reflection, refraction, and shadow/lighting passes that use rays instead of traditional raster methods such as shadow mapping. Each of those passes needs ray tracing core computation, and in gaming workloads there simply isn’t enough ray-generation capacity to produce anything close to a clean image without AI-based denoising.
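A quick back-of-the-envelope calculation, using the 10 gigarays-per-second figure mentioned above and standard display resolutions, shows why denoising is unavoidable. The per-pixel counts below ignore multiple bounces and multiple ray-traced passes, so real budgets are tighter still.

```python
# Rough ray budget per pixel for the RTX 2080 Ti's quoted ~10 Gigarays/s.
# Ignores multiple bounces and multiple ray-traced passes, so actual
# per-effect budgets are even smaller.
RAYS_PER_SECOND = 10e9

resolutions = {"1080p": 1920 * 1080, "1440p": 2560 * 1440, "4K": 3840 * 2160}

for fps in (30, 60):
    rays_per_frame = RAYS_PER_SECOND / fps
    frame_time_ms = 1000.0 / fps
    print(f"{fps} FPS ({frame_time_ms:.2f} ms/frame): "
          f"{rays_per_frame / 1e6:.0f}M rays per frame")
    for name, pixels in resolutions.items():
        print(f"  {name}: ~{rays_per_frame / pixels:.0f} rays/pixel")
```

That works out to only a few dozen rays per pixel per frame at 4K and 60 FPS, whereas offline path tracers typically use hundreds or thousands of samples per pixel to converge on a clean image.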
So most GPUs will eventually get dedicated tensor cores, especially console CPU/APU processors, where upscaling is relied on more heavily due to the limited power of the integrated graphics used in consoles.
There are already hints of MCMs whose chiplets will include GPUs and other specialized dies/chiplets instead of only CPU dies/chiplets.
Look at your average cell phone: those devices have already been using DSPs and neural processors (tensor cores) in addition to the usual CPU cores and integrated GPUs.
Tensor cores will be adopted in most consumer processor systems because training tasks can be offloaded to giant computing/AI clusters, with the trained AIs then hosted on mobile devices to take more and more varied tasks off the power-hungry general-purpose CPU cores. That’s how Apple stays within its power budget on its smartphones and tablets, and ditto for Qualcomm, Samsung, and others.
The era of general-purpose computing has been over for some years now on mobile devices. And really that process started once GPUs became more commonplace and were integrated alongside CPU cores in APUs and similar offerings, and once GPUs hosted on servers began to be used for non-gaming computational workloads.
Necessity has dictated the use of specialized processors on smartphones and tablets in order to perform ever more complex computing tasks more efficiently, within the limited power budgets and thermal constraints of mobile devices.
And the same is going to be true for exascale computing, where general-purpose processors use too much power, in the megawatts range. GPUs have already been of great help in petascale computing on the way to exascale, delivering more GFlops/watt than any general-purpose processor (CPU) is capable of.
All of that is fine, but it doesn’t mean this “law” that Moore’s name is used for holds true, or will become true again in the future. Why is everyone ignoring the actual definition of the law? If some law of thermodynamics were shown not to hold true anymore, then would everyone ignore it – keep using the name – and then write about how it doesn’t matter because it is basically going to become true again in the future? No… That’s insane.
As of 2018, there are possibly more than just four semiconductor manufacturers with 14 – 16nm production capabilities. According to Wikipedia, there are also Toshiba (Fab 5) and United Microelectronics (Fab 12a), which would make it six.