
With AI the biggest change in IT infrastructure since the dot-com boom, it was no surprise that the annual Cisco Live event last month in San Diego focused on AI – and particularly agentic AI – and on how the networking giant differentiates itself from other infrastructure vendors when it comes to the emerging technology.
Cisco rolled out a new networking architecture complete with devices purpose-built for AI and a strong emphasis on embedded security. Company executives boasted that Cisco comes to the AI era with the most complete stack, with its Silicon One networking chip being a foundational element. Key to that is Silicon One’s programmability, which Jeetu Patel, Cisco’s president and chief product officer, noted allows the chip to take on new workloads without having to tape out a new chip.
The Silicon One E4 addresses tough challenges around routing logic, scaling, load balancing across large clusters, and baking security into the chip itself, Patel said, adding that “it creates for a much more scalable way that you can accommodate for enterprise requirements for AI. This is an area that’s super strategic. We usually don’t talk about it as much because we don’t want to talk about all of the weeds in the infrastructure. We want to talk about what the application is. But the superpower comes from the way that you actually build out the entire stack and get that stack to integrate together.”
The person behind the Silicon One chip is Martin Lund, executive vice president of Cisco’s Common Hardware Group. The Next Platform spoke with Lund at the show about the chip and its role in the ever-evolving world of AI.
Jeffrey Burt: Why is it important for Cisco, when talking about AI, to have its own silicon and Silicon One?
Martin Lund: Silicon is the engine. This is how you create high-performance networking. You can’t just do it in software. You can’t meet the performance, you can’t meet the latency, you can’t meet the power requirement, so you have to build dedicated silicon. Cisco has been building ASICs for 40 years, so it’s not new. Everybody else that is doing anything in networking – high-performance networking – is using some form of silicon, whether they build it themselves, which very few do, or they buy something known as merchant switching silicon. Just to give you a sense for the performance improvements: in the past 20 years, the performance that you can get out of a chip has increased 10,000 times – four orders of magnitude in two decades.
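To put that growth figure in context, here is a quick back-of-the-envelope calculation (our arithmetic, not Lund’s; a sketch in Python for illustration):

```python
import math

# 10,000x growth over 20 years implies this compound annual rate
# and this doubling time (our back-of-the-envelope, not Lund's math).
growth_per_year = 10_000 ** (1 / 20)
doubling_time = math.log(2) / math.log(growth_per_year)

print(f"annual growth: {growth_per_year:.2f}x")    # ~1.58x, i.e. ~58% per year
print(f"doubling time: {doubling_time:.1f} years") # ~1.5 years
```

That works out to switch-chip throughput roughly doubling every year and a half, which tracks with the cadence Lund describes later in the conversation.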
JB: You’ve got a close partnership with Nvidia, you’ve got Intel doing chips, you’ve got cloud providers making their own chips. But using any of those isn’t the route for Cisco to go.
Martin Lund: Very few make their own switching silicon. Most vendors will do their own silicon for compute-specific resources. Google has a Tensor Processing Unit, which they use as an alternative to Nvidia in some cases. Networking silicon is complicated. It’s expensive to be in the game of making these and there is really only a handful of companies that make something like this.
For us, this is our core business. We are a networking company. What’s unique is that when we announced Silicon One in 2019, we said, ‘Hyperscalers, we will meet you where you are. We will sell you our Cisco systems with our Cisco software on them. We will sell you a box with our silicon in it and you can put your own software on it, like a white box. Or we’ll sell you just the silicon and you can build your own system. We’re fine with however you want to consume the technology. We’ll support it.’
It’s very much a partnership model and an open ecosystem approach and that’s really resonated. It also resonated with them that they needed alternative suppliers because there might only be one other that has what we have in terms of portfolio breadth and performance, both in the high end and the low end.
JB: So we’re talking about Broadcom?
Martin Lund: Yes we are.
JB: There’s been a lot of talk at Cisco Live about the Silicon One chip’s programmability. What does that mean for Cisco and your customers in the AI era?
Martin Lund: The programmability is sort of a way to deliver something. What you do with it is the most important part. There are many other attributes of the devices that we have that make them very good and competitive: the traffic management capabilities, the way we do buffering, the way we do failure management and load balancing and all that stuff. But a lot of that is tied to how we are able to program. If you remember, in the olden days, it was called a network processing unit, an NPU. Then people called it a DPU, or data processing unit, which was a NIC that has programming in it. Now we have GPUs and XPUs. This is essentially an NPU, but we don’t necessarily call it an NPU because that carries connotations from those days, when they were slow and had a lot of latency. But it’s programmable like that. It’s just done with a very unique architecture, so we don’t pay the penalties for being programmable like was done in the past. It’s a very novel architecture.
There are benefits of this to hyperscalers. There are benefits to service providers and there are benefits to enterprise customers, but in slightly different domains. In the era of AI, which has taken off at a pace we hadn’t foreseen when we started this, that flexibility shows up as benefits. Hyperscalers tend to have very, very large networks, very complicated, and they really need to load balance traffic very, very efficiently. We have the ability with our programming model to come up with novel and highly efficient load-balancing techniques. Their network runs better, faster. Service providers may have another pain point – for example, segment routing. Segment routing is one of those critical capabilities that is out there, and it wasn’t fully standardized when we built one of the other chips that goes into that network, so we software-upgraded it and now we support segment routing.
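Cisco has not published the load-balancing algorithms Silicon One runs, but flowlet switching is one well-known family of techniques a programmable data plane can implement. As a hedged sketch of the idea – with an illustrative gap threshold, hash, and path choice that are our assumptions, not Cisco’s implementation:

```python
import zlib

N_PATHS = 4
FLOWLET_GAP = 50e-6   # 50 microseconds of idle time starts a new flowlet
flowlets: dict[int, tuple[int, float]] = {}  # flow hash -> (path, last seen)

def pick_path(five_tuple: bytes, now: float) -> int:
    """Choose an uplink: sticky within a burst, rebalanced between bursts."""
    h = zlib.crc32(five_tuple)
    entry = flowlets.get(h)
    if entry is None or now - entry[1] > FLOWLET_GAP:
        # The idle gap is long enough that reordering is unlikely, so pick
        # a fresh path. A real chip would pick the least-loaded uplink.
        path = hash((h, now)) % N_PATHS
    else:
        path = entry[0]
    flowlets[h] = (path, now)
    return path

flow = b"10.0.0.1:5000->10.0.2.9:80/tcp"
p1 = pick_path(flow, now=0.0)
p2 = pick_path(flow, now=10e-6)  # same burst: stays on the same path
p3 = pick_path(flow, now=0.5)    # after a long gap: free to move
print(p1, p2, p3)
```

The appeal for large AI clusters is that traffic can be rebalanced mid-flow without reordering packets inside a burst – exactly the kind of behavior a programmable pipeline can change after the chip ships.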
For enterprises, take the Smart Switching technology that we have announced here, where we are fusing security into the network: we’re using the programmability to enable and accelerate the network functionality that we have with our Hypershield and with our other Smart Switch examples. There’s another example for AI: UEC, the Ultra Ethernet Consortium. It’s a spec. Again, we used our programmability to support it while the spec was being written. We were able to support all these features because of our programmability. The benefit depends on where you look: you get a better-performing network, you get longevity, and you get innovation velocity, all from the same architecture approach.
JB: There was talk during a Q&A [with journalists and analysts] about how this programmability also allows for enterprises to introduce new use cases without having to wait. Can you talk a little bit about that?
Martin Lund: At the end of the day, the network has to support many, many different applications. You have email running on it, you have video running on it, you’ve got voice, and now you’re going to have [AI] agents that are talking on the network, and if you have lots of them, you’re probably going to have way more demand on the network. Part of that … flexibility we’re building in is for something that’s not invented yet. We don’t know what it is yet, but we feel confident that we have a very good shot at supporting it. Future-proofing it. What are agents going to be? How do we know how many agents there will be and what QoS model they want to have? We don’t know yet, because they’re not built yet. It’s not invented. It’s future-proofing: something that I know I can upgrade so I can solve problems in the future. It’s like the anti-forklift upgrade. It’s investment protection, and it’s also getting a network where we have observability capabilities, so you can actually see what’s going on. You have good telemetry.
That’s one of the things that hyperscalers have been driving very, very hard as a requirement. They need to understand their networks, they need to see what’s going on, and a lot of those capabilities we have brought into our other product lines, so we have very, very good telemetry in these devices.
JB: The investment protection also becomes interesting because, as was discussed here, Nvidia is coming out with new GPUs every year, and enterprises can’t refresh as fast as they would need to in order to keep up with every new accelerator.
Martin Lund: Many of our customers want the solutions to run for seven to ten years, so this ability to meet future requirements and be malleable to those requirements is super important. That is a key advantage. There are other vendors that will claim they’re programmable because you use software to change behavior, but that’s not what we mean by programmable. We mean there’s actually code that runs in the chip that does it. Some other vendors’ approach, which they call programmable, is more configurable – known as ‘table-driven’ – meaning, yeah, you can change the configuration, but you can’t change the nature or the personality. We can change the personality of these devices.
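A minimal sketch of the distinction Lund is drawing – a generic match-action illustration in Python, not Silicon One’s actual programming model or toolchain:

```python
from dataclasses import dataclass
from typing import Callable

# In a merely configurable ("table-driven") device you can only edit
# table entries; in a programmable one you can also install new actions,
# i.e. new code paths the device never shipped with.

@dataclass
class Packet:
    dst: str
    ecn_marked: bool = False
    out_port: int = -1

def forward(port: int) -> Callable[[Packet], None]:
    def action(pkt: Packet) -> None:
        pkt.out_port = port
    return action

def mark_congestion(pkt: Packet) -> None:  # a behavior added after "tape-out"
    pkt.ecn_marked = True

# Configurable: change entries (which prefix maps to which port).
table: dict[str, list[Callable[[Packet], None]]] = {"10.0.0.0/8": [forward(1)]}

# Programmable: add a new behavior, not just a new entry.
table["10.0.0.0/8"].append(mark_congestion)

pkt = Packet(dst="10.0.0.0/8")
for act in table[pkt.dst]:
    act(pkt)
print(pkt)  # Packet(dst='10.0.0.0/8', ecn_marked=True, out_port=1)
```

In a table-driven chip, only the dictionary entries could change; the claim for a programmable chip is that the action functions themselves can be replaced in the field.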
JB: For the first two years after ChatGPT came out, it was really about training. There is always going to be training, but the emphasis now is shifting toward inferencing. For a company like Cisco, with Silicon One, what does that mean?
Martin Lund: The way I think of it is, training is training. This is like, you go to school, you learn something. Inference is when you get out of school and you actually go to work. There are going to be way more workers than there are kids in school, and they’re going to do productive work. What limits the applicability of AI as a useful tool, besides model performance, hallucinations, and all these other things? At the end of it, it comes down to the cost of running the machine. The figure of merit for that is cost per token. How many cents per token does it cost – or pico-cents, or whatever fraction – and how fast can I get it done? What throughput, and what dollars per token? Dollars per million tokens is maybe another way to think of it.
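To make that figure of merit concrete, here is the arithmetic with entirely hypothetical numbers of our choosing:

```python
# Hypothetical inference box: these figures are illustrative only.
machine_cost_per_hour = 98.0   # $/hour, all-in cost of the machine
tokens_per_second = 20_000     # sustained output token throughput

tokens_per_hour = tokens_per_second * 3_600
dollars_per_million_tokens = machine_cost_per_hour / tokens_per_hour * 1e6
print(f"${dollars_per_million_tokens:.2f} per million tokens")  # ~$1.36
```

Either lowering the hourly cost or raising the sustained throughput – which is where the network comes in – pushes that dollars-per-million-tokens number down.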
But as that comes down – because with all the silicon integration and optimization and better models, that price per token is dropping – that means significantly more workloads are coming in. There are other aspects to it, because as the models are changing to improve them, they will run the same question, and maybe instead of one time, they’ll run it twelve times through the same loop. You do much more looping. You do reinforcement learning. That takes more power. Training was important and still is important. Training needs huge datacenters. But productive work is where you actually create value, and that is going to be everywhere. It’s going to be in the cloud. It’s going to be in private datacenters. It’s going to be at the edge. It’s on the phone. That opportunity is massive.
JB: Does that change at all how you look at what you need to do as far as Silicon One?
Martin Lund: It does. It’s almost like a little bit of a waterfall model that we have. We push the edge in the hyperscaler space and go as fast as we can. That’s the race we’re on – it’s like a treadmill. Keep going. Then we take some of those technologies and we scale them down. The chip we have here is a 51.2 Tb/sec part. That might be overkill for some enterprises. They still need the same technology, but it’s just too much – they don’t need that bandwidth and they can’t cool it – so we’ll have a version of this that’s maybe half the size or a quarter of the size. We make it smaller so it’s a better fit. We still push at the frontier and then we waterfall it down for the rest. This waterfall is something that’s maybe half a year to a year behind. It’s not long.
JB: So we can expect in, say, a year or two that you will have different versions of that to address different workloads or at least sizes that different companies need?
Martin Lund: Yeah, and on the timeframe, you can probably be more optimistic than that.
JB: One of the questions that has been raised here is, given what you’re able to do with Silicon One, whether you’re entertaining thoughts of expanding to include GPUs or anything like that.
Martin Lund: The Silicon One architecture is a Cisco Silicon One architecture. It’s a switching architecture. We have other custom silicon that we have built, like NICs. We have other technologies, other ASICs that we have built. We probably wouldn’t call it Silicon One because that’s kind of a switch. Maybe we call it Silicon Two? I don’t know. In terms of the accelerator piece, it’s a hard thing. It’s not our core business to be an accelerator vendor. We don’t do our own server chips either. We get them from our partners.
The real issue is that in the compute market, the GPUs are the workhorse. It’s possible there will be applications where those are not a good fit and another specialized solution is needed, but for now I would say we are very focused on our business. It is a big enough swim lane that we have with networking and security, and we have good partners and good solutions out there, and it’s a very, very fast-moving space as well. Do we have the capabilities to build large, complex chips that are at the absolute frontier of chip development? Yes, we do. That might be one of those little-known secrets.
JB: It’s been talked about a lot at the show that Cisco and Silicon One aren’t just about switching but also security. From a chip perspective, what does that mean to you?
Martin Lund: The space of cybersecurity and how you can add value to it is vast. Where we can innovate together with the security architects and so forth, a lot of that is acceleration. A lot of security is based on sampling because you can’t keep up with all of it. But if you have silicon acceleration driving it, maybe you can do it in hardware: you can do the analytics on the chip, or part of the analytics on the chip, and offload the rest.
If you heard the commentary from Jeetu Patel, he said it’s all about distilling the data down because there’s too much of it. You can distill it by sampling it, or you can distill it by computing it and analyzing it right there on the chip and only then sending up the distilled information. Also, there’s encryption and other capabilities we have built into the devices that provide MACsec and IPsec acceleration and so forth.
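A hedged sketch of that sampling-versus-distillation trade-off – purely illustrative, not Cisco’s telemetry pipeline, with synthetic flow data we invented for the example:

```python
import random
from collections import Counter

random.seed(7)
# Synthetic traffic: (flow id, packet size in bytes), 100,000 packets.
packets = [(f"flow-{random.randrange(50)}", random.randrange(64, 1500))
           for _ in range(100_000)]

# Sampling: export 1 in 1,000 raw records upstream; cheap, but lossy.
sampled = packets[::1000]

# On-chip distillation: aggregate per-flow byte counts in the data path
# and export only the summary, keeping exact totals for every flow.
totals: Counter[str] = Counter()
for flow, nbytes in packets:
    totals[flow] += nbytes

print(f"{len(sampled)} sampled records (lossy) vs "
      f"{len(totals)} exact per-flow totals")
```

Both approaches export a small number of records, but the aggregated totals preserve complete information about every flow, which is the argument for doing the analytics in the chip.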
You can do anything with software, right? You just can’t do it fast enough. You don’t have enough servers to do it, so silicon is an accelerant. I have sort of a rule of thumb, which is probably still correct: it’s somewhere between a thousand and ten thousand times faster to do something optimized in silicon – like in transistors, custom silicon – than it is to run it on a general-purpose CPU. And faster usually means speed and power are on the same curve – you go faster, or you can run at the same performance and lower power. The cell phones that we have here are a good example of it. There are GPUs in them, there is all this stuff, but they’re still low power.
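Some rough sanity math on that rule of thumb, using the 51.2 Tb/sec chip discussed above – every number here is an assumption of ours for illustration, not a measured figure:

```python
# Assumptions: one CPU core forwards ~1 Gb/sec of packets in software,
# and a busy core draws ~5 W. Both are illustrative, not benchmarks.
asic_tbps = 51.2
core_gbps = 1.0

cores_needed = asic_tbps * 1_000 / core_gbps
print(f"~{cores_needed:,.0f} CPU cores to match one ASIC in software")

print(f"~{cores_needed * 5 / 1_000:,.0f} kW of CPU power")  # ~256 kW
```

Under those assumptions, matching one switch ASIC in software would take on the order of 51,000 cores and a quarter of a megawatt, which lands in the same ballpark as Lund’s thousand-to-ten-thousand-fold rule.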
JB: One of the things that is interesting about AI is the rate of innovation. Again, from a silicon perspective, what does that mean when you are looking at where you want to go with the chip? Is there guesswork involved?
Martin Lund: Guesswork is not usually what we use in chip making. We’re sort of the opposite. We’re fanatical that every transistor must work, otherwise the chip doesn’t work. But we have to make some bets, and we have to make bets that certain things will come true. They’re not here today, we say, but in two years, they will be. We make bets like that.
Very few people really understand how fast it’s moving in the entire stack – models, compute, software, accelerators, networking. Inside the networking stack, you have optics, you have these chips, you have the hardware systems coming together from a power management perspective. The whole thing is moving at lightning speed. It’s super exciting, but there are going to be some teams or companies that veer off the slopes and, sorry, they’re not around anymore.
You don’t think of it, but you build one of these things and your company depends on it, and you make a horrible, bad mistake, so now you have to do it again. The next time you see this thing will be a year or a year and a half later. Your competition has finished the race. Maybe that’s a little reason why I really also like the programmability, because I can sort of fix stuff if there’s an issue. But a lot of this is just hard. It’s just difficult, and we’re pushing the limits of physics, and there’s so much room for innovation. There’s room for errors, too, but I think that’s what makes it fun.