Former Nervana Leads Target Optimal Training Configurations

Ex-Nervana Systems engineers have moved from a hardware-centric approach to efficient training toward providing better insight into how models and systems are optimized.

Nervana Systems was one of the first AI chip startups to generate big buzz, culminating in an acquisition by Intel in the summer of 2016. The startup’s co-founder and CEO, Naveen Rao, moved into the VP role for the AI products group, while fellow Nervana engineers, including Hanlin Tang (who led development of the Neon software stack for Nervana’s devices), also stuck around at Intel, focusing on practical AI algorithms and federal programs.

Rao and Tang, among others, are together again with a new startup, MosaicML, which came out of stealth today with $37 million in funding from a wide range of VC partners, including Lux Capital, Future Ventures, E14, and others. The target is machine learning training. As the world quickly learned, optimized deep learning has far less to do with efficient, high-performance hardware than VCs believed in the 2014-2021 timeframe.

This is not to say that there is no hardware hook to MosaicML, but it relates to choosing the right systems for individual training runs.

This includes evaluating suitability for cloud or on-prem, use of accelerators, and which CPU is best tuned for ultimate efficiency and/or performance. That “and/or” is because each need is unique: some shops need fast runs at any cost, others value the cheapest route. MosaicML has a set of tools that lets model training teams look at the various tradeoffs comprehensively rather than making best-guess decisions.

The real product is open source and is split between two related efforts. First is MosaicML’s “Composer,” a library of methods for optimal training that can be strung together as “recipes” based on benchmarked findings and published works. By the way, for those who are actually doing machine learning at scale, this little tool on MosaicML’s site provides what is completely missing in the market, whether from analysts, benchmarks like MLPerf, or even anecdotes.

According to MosaicML, “The compositions in this library have allowed us to achieve training speedups and cost reductions of 2.9X on ResNet-50 on ImageNet, 3.5X on ResNet-101 on ImageNet, and 1.7X on the GPT-125 language models (as compared to the optimized baselines on 8xA100s on AWS), all while achieving the same model quality as the baselines. To make sure that these results reflect fair comparisons, all of these data come from training on a fixed hardware configuration on publicly available clouds, and none of these methods increase the cost of inference.”

The other side is the Explorer interface, which lets users set a desired tradeoff among accuracy, cost, and time to result, and then see a visualization of those tradeoffs across thousands of training runs on standard benchmarks.
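To make the idea of weighing accuracy, cost, and time against each other concrete, here is a minimal Python sketch of one common way to do it: keep only the runs that are not beaten on every axis by some other run (a Pareto front), then pick the cheapest run that meets a quality and time target. This is an illustration only, not MosaicML's code or the Explorer's actual method; the `Run` record, the function names, and every number in `RUNS` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One hypothetical training run; all figures below are made up."""
    name: str
    cost_usd: float   # lower is better
    hours: float      # lower is better
    accuracy: float   # higher is better


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = (a.cost_usd <= b.cost_usd and a.hours <= b.hours
                and a.accuracy >= b.accuracy)
    better = (a.cost_usd < b.cost_usd or a.hours < b.hours
              or a.accuracy > b.accuracy)
    return no_worse and better


def pareto_front(runs):
    """Keep the runs that no other run dominates, in input order."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


def cheapest_meeting(runs, min_accuracy, max_hours):
    """Cheapest non-dominated run that meets both targets, or None."""
    ok = [r for r in pareto_front(runs)
          if r.accuracy >= min_accuracy and r.hours <= max_hours]
    return min(ok, key=lambda r: r.cost_usd) if ok else None


# Hypothetical data, for illustration only.
RUNS = [
    Run("baseline", 100.0, 10.0, 0.76),
    Run("recipe_a", 40.0, 4.0, 0.76),
    Run("recipe_b", 30.0, 6.0, 0.74),
    Run("recipe_c", 120.0, 3.0, 0.77),
    Run("recipe_d", 50.0, 7.0, 0.75),
]

if __name__ == "__main__":
    for run in pareto_front(RUNS):
        print(run)
    print(cheapest_meeting(RUNS, min_accuracy=0.76, max_hours=5.0))
```

A shop that needs fast runs at any cost would tighten `max_hours`, while one that values the cheapest route would relax it and let cost decide.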

“We believe that unfettered growth of computing is not a sustainable path towards a future powered by artificial intelligence. Our mission is to reduce the time, cost, energy, and carbon impact of AI / ML training so that better AI models and applications can be developed,” the company says.

“We tackle this problem at the algorithmic and systems level. MosaicML makes machine learning more efficient through composing a mosaic of methods that together accelerate and improve training,” the company adds.

This is all very useful, of course, but how it translates into a thriving business remains to be seen.
