Nvidia Sees The Light On Silicon Photonics And Maybe Optical Switching
By virtue of its $6.9 billion acquisition of Mellanox Technologies, completed in early 2020, Nvidia became a seller of optical transceivers for Ethernet and InfiniBand. But seven years before that deal was done, Mellanox had acquired optical technology suppliers Kotura and IPtronics to become a supplier of these components itself, correctly perceiving how important optics would be for the future.
But when the networking company refused the advances of Marvell to acquire it in 2017 and got into a spat with activist investor Starboard Value, its optical transceiver dreams were dashed, the business was slashed, and Mellanox began sourcing lasers, photonic integrated circuits (PICs), and other components from Lumentum, from II-VI (the company that would later take the Coherent name), and others to build up its LinkX cable and transceiver business. These cables and transceivers can account for half of the cost of the network and well more than half of its power draw, so to say LinkX is important to Mellanox, and now to Nvidia, is a big understatement.
But the challenges of designing more advanced AI systems are so severe, and the need to eventually replace electrical circuits with optical ones is so acute, that Nvidia is shelling out $2 billion a pop to both Lumentum and Coherent to get their research and development moving in directions that are helpful to Big Green’s AI ambitions.
The investments in these two companies are not, like so many deals we see these days, Nvidia buying stock, and there is not yet an 8-K filing with the US Securities and Exchange Commission from any of the three companies to detail exactly what the investment is in either Lumentum or Coherent. We strongly suspect that the investments are for convertible notes, new shares to be issued at some future date, or some other kind of equity instrument. In addition to this $2 billion going into each company, Lumentum has a “multibillion [dollar] purchase commitment and future capacity access rights for advanced laser components” from Nvidia, and Coherent has a “multibillion dollar purchase commitment and future access and capacity rights for advanced laser and optical networking products.”
Both Lumentum and Coherent are deeply involved in the co-packaged optics (CPO) efforts Nvidia has undertaken with the impending Quantum-X InfiniBand and Spectrum-X Ethernet switches that Big Green revealed last March. As far as we know, Lumentum is supplying the lasers for the CPO modules for both of these switch families, but the demand might be so humongous as Nvidia looks ahead that it needs more than one source of lasers. The announcements above basically say as much, and the fact that these deals are non-exclusive means Nvidia absolutely wants competition – and the more the merrier to drive down prices.
This is why Nvidia championed Micron Technology’s re-entry into the HBM stacked memory business two years ago, when Samsung and SK Hynix owned that business. Micron supplied all of the HBM capacity on the “Hopper Ultra” H200 accelerators announced in November 2023, which helped Micron rake in some big bucks and earn a fast return on investment.
We wonder what the future “optical networking products” in the deal with Coherent might be, and we wonder further why the Lumentum release doesn’t say the exact same thing. (Perhaps it was supposed to?) No matter. They are interesting for similar and overlapping reasons.
One thing is certain: Nvidia cannot easily afford to acquire either Lumentum or Coherent with all of its other commitments. As this story is being written, Lumentum’s market capitalization is up 10.7X in the past year through the end of trading on Friday, to $50.1 billion. This is pretty good for a company that had $2.11 billion in sales and $251.6 million in net income for its trailing twelve months. Coherent had revenues of $6.3 billion in the trailing twelve months, net income of $331 million, and had a market cap of $48.5 billion (up 3.8X for the year) as of Friday afternoon last week. And more importantly, if Nvidia tried to vertically integrate by acquiring either company, it might set off a hullaballoo among the world’s antitrust authorities, among the telecom and service provider companies that are these firms’ big customers, and at whichever of the two companies it did not acquire.
There are a lot of things that Lumentum and Coherent can help Nvidia with. To begin with, just as CPO has been added to the Quantum-X InfiniBand and Spectrum-X Ethernet scale-out networks – specifically, to their switch ASICs – we think that it will eventually be necessary to add CPO to Nvidia’s GPU compute engines and to its NVSwitch, even after Nvidia puts a midplane in the rack to get rid of all of the copper cables currently used in its “Oberon” NVL72 racks. At some point, the bandwidth is going to have to go up on the GPUs and the physical space on the edge of the GPUs is not going to get larger – in fact, multichip sockets make the beachfront issue – the area of compute and cache compared to the circumference of the socket – worse, not better.
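The beachfront squeeze is simple geometry, and a back-of-envelope sketch shows why it only gets worse as dies grow. The die sizes below are illustrative assumptions, not any actual Nvidia part:

```python
# Why "beachfront" gets worse as compute scales: the area of compute and
# cache (and thus the bandwidth that compute demands) grows with the
# square of the linear dimension, while the edge available for I/O
# grows only linearly. Die sizes here are illustrative assumptions.

for side_mm in (20, 30, 40):       # assumed square die edge lengths, in mm
    area = side_mm ** 2            # mm^2 of compute and cache
    edge = 4 * side_mm             # mm of shoreline available for I/O
    ratio = area / edge            # mm^2 of compute per mm of edge
    print(side_mm, ratio)
```

Double the die edge and the compute area per millimeter of shoreline doubles too, which is exactly the ratio that pushes I/O off the beachfront and into the package – and, eventually, into optics.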
We understand the economic and technical reasons why Nvidia puts this off as long as possible, and have gone through these in many stories and webinars in recent years.
There is another way that Lumentum and Coherent are interesting: They both have optical circuit switches. And that means that Nvidia could, in theory and with some changes in the topology of its AI clusters, build a much larger NVSwitch memory domain – and one that was much more power efficient – if it had an OCS as the spine in its scale-up networks.
Lumentum’s R300 optical circuit switch is based on the same kind of micro-electro-mechanical systems (MEMS) mirror technology that Google employs in its “Palomar” MEMS devices, which are part of the “Apollo” OCS that has been the backbone of the clusters for the past four generations of Google’s TPU systems. (TPU v4 through TPU v7, to be precise. The prior TPU v1 through TPU v3 machines were hardwired, like Nvidia’s GB200 NVL72 and GB300 NVL72 systems are today.)
The optical circuit switch is not fast to change any particular link between one device and another – it takes on the order of tens of milliseconds to spin the mirrors to reconfigure the links between any two fiber optic lightpipes connecting any two devices. This is way too slow for a switched memory fabric where there is a lot of dynamic memory reconfiguration.
But, happily, for the spine of the memory fabric of an AI cluster – the top-most layer in the network – there is not a lot of reason to change it very often. In fact, Google uses a 3D torus network to gang up 9,216 of its “Ironwood” TPU v7p compute engines into a shared memory domain and, with the spinning of a few mirrors, it can cut that up into smaller chunks and sell smaller AI supercomputers to run smaller workloads.
The point is, you change the network configuration very infrequently, and the links through the spine of the network then run optically, with light bouncing directly from fiber to fiber off mirrors – no conversion from optical to electrical and back again, as happens in an Ethernet or InfiniBand switch (either in transceivers outside the switch box or in CPO inside the box). You can’t escape that conversion power draw if you don’t have an optical circuit switch.
But if you do have an OCS like the 300x300 port R300 from Lumentum – announced last March and sampling to a number of hyperscalers and cloud builders – you can cut the overall network power consumption of an AI cluster with 100,000 XPUs by 65 percent. (That’s what Lumentum claims.) Power, like time, is money. And so is latency. Lumentum says that the latency of OCS switching is 5X to 10X lower than that of electrical Ethernet switching. (Once it is set up, that is.) Here’s what the Lumentum OCS switch looks like:
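To put that 65 percent claim in watts, here is some illustrative arithmetic. The per-transceiver wattage and the number of spine-facing transceivers per XPU are our assumptions for the sketch, not Lumentum’s figures; only the 65 percent saving comes from its claim:

```python
# Illustrative arithmetic on Lumentum's claimed 65 percent network power
# saving for a 100,000-XPU cluster. Transceiver counts and wattages are
# our assumptions; only the 65 percent figure is Lumentum's claim.

XPUS = 100_000
TRANSCEIVERS_PER_XPU = 2       # assumed: spine-facing links per XPU
WATTS_PER_TRANSCEIVER = 15.0   # assumed: roughly an 800 Gb/sec pluggable

electrical_spine_watts = XPUS * TRANSCEIVERS_PER_XPU * WATTS_PER_TRANSCEIVER
ocs_spine_watts = electrical_spine_watts * (1 - 0.65)  # Lumentum's claim

print(f"{electrical_spine_watts / 1e6:.1f} MW for an electrical spine")
print(f"{ocs_spine_watts / 1e6:.1f} MW with an OCS spine")
```

Even with these rough assumptions, the saving is on the order of a couple of megawatts of continuous draw – which is real money at datacenter power prices, before you count the cooling it drags along.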
Coherent has just started shipping a version of an OCS based on liquid crystal technology, and has seven customers trialing it right now. This Datacenter Lightwave Cross Connect (DLX) switch comes in versions with 64x64 ports, 320x320 ports, or 512x512 ports. Here is what the DLX OCS looks like:
Nvidia might be hammering out supply agreements for lasers, but we strongly suspect that somewhere in the “Rubin Ultra” generation, when the new “Kyber” rack with the copper midplane comes out, Nvidia might switch to a torus or dragonfly interconnect topology (instead of the fully connected fat tree topology of the current NVSwitch memory fabric) and have OCS spines to link them all.
We think Nvidia wants multisourced lasers for CPO, but it also wants two suppliers of OCS gear for the longer run.