March 28, 2017 Nicole Hemsoth
It is one thing to scale a neural network on a single GPU or even a single system with four or eight GPUs. But it is another thing entirely to push it across thousands of nodes. Most centers doing deep learning have relatively small GPU clusters for training and certainly nothing on the order of the Titan supercomputer at Oak Ridge National Laboratory.
In the past, the emphasis on machine learning scalability has often focused on node counts for single-model runs. This is useful for some applications, but as neural networks become more integrated into existing workflows, including those …