AI inference hardware startup Untether AI has secured a fresh $125 million in funding to push its novel architecture out to its first commercial customers in edge and datacenter environments.
Intel Capital has been a primary investor in Untether AI since its founding in 2018. When we did a deep dive on the architecture with the company's CEO in October 2020, the Toronto-based startup had already raised $27 million and was sampling its runAI200 devices. The team, which includes several ex-FPGA hardware engineers, was bullish on the potential for custom ASICs for ultra-low power inference and, apparently, so are its investors.
This latest funding round, led by Tracker Capital and Intel Capital, also roped in a new investor, Canada Pension Plan Investment Board (CPP Investments), which manages money for the country’s 20 million-strong pension program with a fund total of over $492 billion.
These are still early days for the inference startup, but it has managed to secure systems integrator Colfax to carry its tsunAImi accelerator cards for edge servers along with its imAIgine SDK. Each of the cards has four of the runAI200 devices we described here, which Untether says can deliver 2 petaops of peak compute performance. In its own benchmarks, the company says this translates to 80k frames per second on ResNet-50 (batch size 1) and 12k queries per second on BERT.
The startup is focused solely on INT8, low-latency, server-based inference with small batch sizes in mind (batch 1 was at the heart of its design process). The company’s CEO, Arun Iyengar (you might recognize his name from leadership roles at Xilinx, AMD, and Altera), says they are going after NLP, recommendation engines, and vision systems on the applications side, with fintech at the top of their list of target markets. He was quick to point out that this is less about high frequency trading and more about broader portfolio balancing (asset management, risk allocation, etc.), where AI has real traction.
At the heart of the unique at-memory compute architecture is a memory bank: 385KB of SRAM with a 2D array of 512 processing elements. With 511 banks per chip, each device offers 200MB of memory, enough to run many networks in a single chip. And with the multi-chip partitioning capability of the imAIgine Software Development Kit, larger networks can be split apart to run on multiple devices, or even across multiple tsunAImi accelerator cards.
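The capacity figures above can be checked with a little arithmetic. The Python sketch below uses only the numbers quoted in this article (385KB per bank, 512 processing elements per bank, 511 banks per chip, four chips per card). The function names and the sizing rule are our own illustrative assumptions: the sketch pretends a network fits if its INT8 weights alone fit in SRAM, ignoring activations and any overhead, and it says nothing about how the imAIgine SDK actually partitions a network.

```python
import math

# Figures quoted in the article.
SRAM_PER_BANK_KB = 385   # SRAM per memory bank
PES_PER_BANK = 512       # processing elements per memory bank
BANKS_PER_CHIP = 511     # memory banks per runAI200 device
CHIPS_PER_CARD = 4       # runAI200 devices per tsunAImi card


def chip_sram_mb() -> float:
    """On-chip SRAM per device in MB (decimal units, 1 MB = 1000 KB)."""
    return SRAM_PER_BANK_KB * BANKS_PER_CHIP / 1000


def chip_processing_elements() -> int:
    """Total processing elements per device."""
    return PES_PER_BANK * BANKS_PER_CHIP


def chips_needed(weights_mb: float) -> int:
    """Hypothetical sizing rule: weights only, no activations or overhead."""
    return math.ceil(weights_mb / chip_sram_mb())


def cards_needed(weights_mb: float) -> int:
    """Cards required if each card contributes CHIPS_PER_CARD devices."""
    return math.ceil(chips_needed(weights_mb) / CHIPS_PER_CARD)


if __name__ == "__main__":
    print(f"SRAM per chip: {chip_sram_mb():.1f} MB")
    print(f"Processing elements per chip: {chip_processing_elements()}")
    # Illustrative (made-up) weight footprints in MB.
    for size_mb in (25, 340, 1000):
        print(size_mb, "MB ->", chips_needed(size_mb), "chip(s),",
              cards_needed(size_mb), "card(s)")
```

By this count the per-chip total comes to roughly 197MB, which is where the rounded 200MB figure comes from, and under the weights-only assumption anything much above that would have to be spread over more than one device.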
Iyengar also says their low-power approach would be a good fit for on-prem centers doing large-scale video aggregation (smart cities and retail operations, for example). He willingly admits that they’re starting with these use cases instead of coming out bold with ambitions to find a place among the hallowed hyperscalers, but says there’s enough of a market out there for low-power, high-performance devices that they’ll find their niches.
Even in the absence of any public customers for its early silicon, the company is attractive beyond just the funding and the uniqueness of the architecture. It has some pedigreed folks backing the engineering, including Alex Grbic, who heads software engineering and is well known for a long career at Altera. On the hardware engineering side, Untether’s Alex Michael, also of Altera, brings decades of IC design, product, and manufacturing experience to bear.
While the word from vendors is that there is explosive opportunity for custom inference devices in the datacenter and at the edge, it remains to be seen who the winners and losers will be in the inference startup game. From our view, the edge opportunity has more wiggle room than the large datacenters we tend to focus on here at TNP, and it will be a long, tough battle to unseat those high-value (high margin) customers from their CPU/GPU positions.