If you read The Next Platform, you probably love hardware, and there are probably two reasons for that. First, it is the substance on which ephemeral codes flit about. Second, it is more immediately graspable in terms of feeds and speeds than is software. The focus on either hardware or software, and the relative dependency of the two in the end, is on full display in the cornucopia of devices and platforms that are being created to run machine learning workloads.
But getting the balance of hardware and software right is not easy, even if it will separate the winners from the losers. It is difficult, but still relatively easy, to create a dense, highly specialized device that can beat everything on the benchmarks. But it is not so easy to radically change architecture and not also radically change the software development environment. And so, the people who founded a startup called SiMa.ai are determined to get that balance right as they create devices and software stacks that span from the embedded markets down to the edge and back into the datacenter.
It is a tall order, but one that Krishna Rangasayee, co-founder and chief executive officer of SiMa.ai, says the company can eventually fulfill across the computing spectrum, as he discussed at our recent The Next AI Platform online event.
“Clearly there is no one cure-all for all of the world’s problems,” explains Rangasayee with the appropriate amount of insight and humility, “but we are starting with the embedded market as our first priority. And the key thing that the embedded market needs is primarily a software experience that defines the company. It is a difficult and fragmented market. There are tens of thousands of customers. Everyone’s architecture is different, and everyone’s software needs are different. And in my mind, ML has done a really good job in being in clients, and ML has done a fantastic job in the past ten plus years evolving its architecture in the cloud. The embedded market is an interesting one in that it is still nascent in its embracing of ML, and the largest entry point in the embedded market is around computer vision.”
Rangasayee knows this embedded market like the back of his hand, starting out as a senior application engineer at Cypress Semiconductor nearly three decades ago, and then moving to FPGA maker Altera (now part of Intel) and then jumping over to rival Xilinx for nearly two decades, rising to become general manager of global sales and markets. In 2017, Rangasayee became chief operating officer at Groq, which is making machine learning chips inspired by Google’s Tensor Processing Unit, and then he did a two and a half year stint as a board member at FPGA maker Lattice Semiconductor, completing the FPGA Triple Crown. Seeing what was happening with machine learning at Groq and Xilinx during all this time, Rangasayee decided it was time to take all of his own learning and figure out a way to add machine learning to the embedded market, in a power efficient manner, without upsetting the entire hardware and software ecosystem. SiMa.ai announced in May that it has raised $30 million in Series A venture capital, led by Dell Technologies Capital with participation from Amplify Partners, Wing Venture Capital, and +ND Capital, and in June the company announced that Gopal Hegde, who ran the ThunderX Arm server processor effort at Cavium and then Marvell for many years, joined SiMa.ai as the senior vice president in charge of engineering and operations.
Computer vision, where SiMa.ai is making its first product, is of course a crowded market, with Qualcomm, NXP Semiconductors, Microchip, Xilinx, Nvidia, Texas Instruments, and many others all chasing the dollars with their hardware and software. Nvidia is a good case in point of the challenges that these vendors face as they try to bring machine learning to the embedded market. If you want to do computer vision enhanced with machine learning on Nvidia GPUs, you have to program in CUDA. But embedded programmers are seasoned C and C++ veterans who know how to squeeze every last flop and int out of a chip. Some of the hotshots can make the jump to CUDA, Rangasayee concedes, but in the embedded markets in general, computer vision or otherwise, you don’t mess with legacy code that is designed explicitly to just work and to be in the field for 10 years or 15 years.
“Inherently, to service this market, you really need to natively support C and C++ and really enable the customers to have an experience where they can take their existing IP and make it work on your silicon from Day One,” says Rangasayee. “And if you jeopardize that, you have added a lot of risk, and that is one important learning and that’s why as we build our architecture, we have an Arm-based subsystem so anybody can take their legacy computer vision code and get it running ASAP. We did not want that to be a learning curve or a hurdle to migrate their existing design.”
Rangasayee does not think that ML is a panacea to solve all problems, but rather thinks of it as an add-on that can enhance the functionality of existing software that companies have spent decades evolving and perfecting. What customers in this market are also focused on is low cost and low power consumption, so the SiMa.ai architecture, which is called Mosaic, is designed with energy efficiency in mind. To be specific, the Mosaic system on chip has an Arm complex with external DRAM, Ethernet, and PCI-Express ports as well as camera inputs with data encryption and computer vision IP in that SoC, and then a custom matrix math machine learning accelerator also cooked into the SoC. The resulting SoC can process 50 teraops per second (TOPS) within a 5 watt thermal envelope, and the design can scale up to eight Mosaic SoCs cobbled together to deliver 400 TOPS in 40 watts. And, just for fun, we can imagine 4,000 TOPS in 400 watts, but this would be a truly huge SoC.
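For those who like to check the math, the scaling figures above all work out to the same 10 TOPS per watt. The short Python sketch below reproduces them from the quoted single-SoC numbers. It assumes perfectly linear scaling of both throughput and power, which is our simplification and not something SiMa.ai has specified; the function name `scale` and the 80-SoC case (simply 4,000 divided by 50) are ours for illustration only.

```python
# Figures quoted in the article for a single Mosaic SoC.
TOPS_PER_SOC = 50   # teraops per second
WATTS_PER_SOC = 5   # thermal envelope in watts


def scale(num_socs):
    """Return (total TOPS, total watts, TOPS per watt).

    Assumes throughput and power both scale linearly with the
    number of SoCs, which is an idealization.
    """
    tops = TOPS_PER_SOC * num_socs
    watts = WATTS_PER_SOC * num_socs
    return tops, watts, tops / watts


if __name__ == "__main__":
    # 1 SoC, the eight-SoC configuration, and the hypothetical 4,000 TOPS case.
    for n in (1, 8, 80):
        tops, watts, efficiency = scale(n)
        print(f"{n:>2} SoC(s): {tops} TOPS in {watts} W ({efficiency:.0f} TOPS/W)")
```

Under that assumption the efficiency stays flat as the design scales, so the 4,000 TOPS thought experiment is equivalent to 80 of the 5 watt SoCs; real multi-chip designs would lose some of that to interconnect and memory overhead.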