This year, about 45 percent of the revenues at Big Blue will come from software. But IBM wants to push this up to half of revenues and then onwards from there. IBM also wants to find the modern AI era analogs to its venerable WebSphere application server, which gave it big revenues and relatively easy profits from enterprise customers during the commercial Internet era.
Back in the late 1990s, IBM grabbed the Apache Web server and the Tomcat Java servlet container system and turned them into WebSphere, which it embedded into its own proprietary and Unix systems and which it also offered as an alternative to Oracle’s WebLogic middleware. WebSphere probably generated tens of billions of dollars in revenues over more than two and a half decades, although it is hard to say because it was embedded in so many ways with IBM’s Power Systems and System z mainframes. It was almost certainly wildly profitable, but it has been waning as that application-database part of the enterprise IT estate has settled into normal growth (something above GDP on average but nothing crazy) while the world has gone absolutely bonkers for GenAI.
These days, enterprise data is more of a torrent than a transaction, and data analytics systems have evolved beyond the batch-oriented statistical analysis tools that IBM got through its acquisition of Cognos in early 2008 and of SPSS in late 2009. Those acquisitions together cost $6.1 billion – about $9.5 billion in today’s US dollars – which is on par with the $11 billion that IBM is shelling out for data streaming pioneer Confluent to further build out its GenAI software stack while also helping enterprise customers with more traditional data gathering and analytics. (Yes, GenAI is not going to replace the IT software portfolio overnight. . . . )
IBM has always been a builder of platforms, and the wonder is not that IBM is buying Confluent, the commercial entity behind the open source Kafka streaming tool and its many augmentations and add-ons, but rather that it took Big Blue so long to get it done. Kafka is exactly the kind of thing that, in decades gone by, Big Blue would have invented and commercialized.
The Kafka stream processing platform and distributed event storage system was created at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao in a mix of Java and Scala in 2011. It was open sourced a year later, and the founders set about creating a platform and then created a company to support it.
The Kafka Streams streaming processor is analogous to Spark Streaming, Flink, and similar systems, and it stores its tabular overlay in a distributed RocksDB datastore. (RocksDB is the low-level storage engine that is at the heart of the CockroachDB clone of Google’s Spanner datastore, and is itself derived from the LevelDB key/value storage engine that Google open sourced and that was inspired by its BigTable and Spanner database service.) The RocksDB datastores underneath Kafka Streams are sharded and distributed across a cluster of servers. Samza is what happens when the YARN job scheduler and data replication features of Hadoop are mixed with Kafka and the KSQL distributed, real-time SQL engine that Confluent created from scratch and launched in the summer of 2017. KSQL is written entirely in Java and was also open sourced under an Apache 2.0 license like the other components of Kafka.
Confluent, the company that the three founders started in 2014, has done a good job building up and out a modern middleware platform for streaming data, but profits have been hard to come by as the requirements for streaming platforms keep growing fast and furious. The task is so difficult, in fact, that others have tried and failed. We thought Streamlio, which was created by techies from Twitter, Yahoo, and Google and which was launched in March 2018, might have been a contender. But Streamlio was acquired by Splunk in 2019, and Splunk was in turn acquired by Cisco Systems in September 2023. As far as we know, Streamlio is not being sold as a separate product.
The Apache Pulsar core of Streamlio lives on and is being commercialized by StreamNative. Redpanda, which was founded a year after Streamlio, is also a contender in this streaming space, and like StreamNative, claims performance and price/performance benefits over the Kafka platform from Confluent.
The big clouds have their own managed streaming services, too: AWS has Kinesis, Google has Cloud Pub/Sub, and Microsoft Azure has Event Hubs – and interestingly, the AWS and Azure services are based on Kafka. (As Meat Loaf correctly points out, two out of three ain’t bad.)
Confluent has 6,500 paying customers and an annualized revenue run rate that is just now cresting above $1 billion, and importantly for Big Blue, fewer than 5 percent of those customers are spending $1 million or more a year on Kafka, according to IBM’s chief financial officer, Jim Kavanaugh, who walked Wall Street through the acquisition this morning.
Kavanaugh added that 95 percent of the Fortune 500 are using IBM’s software but only 45 percent of these same companies are using the Kafka platform from Confluent. IBM’s global reach and its ability to weave new things into its own platforms give it a unique ability to double or triple the revenue streams from Confluent pretty easily, and then grow them from there.
Over the period where we have financial results, shown above, Confluent has booked $3.8 billion in cumulative revenues but racked up $2.2 billion in net operating losses, and it has never been profitable.
Confluent has been ramping its revenues and customer count pretty fast and looks like it might be more profitable in the coming years. IBM is stepping in before that happens, and in fact, to increase the odds that the Confluent products start delivering profits sooner rather than later. The operating income number shown with the red dashed line above is not precisely encouraging, as incremental revenues are not moving the needle all that much towards profitability at an operating level.
Still, IBM’s chief executive officer, Arvind Krishna, who managed the $34 billion acquisition of Red Hat in October 2018 before taking the helm at Big Blue and who also was keen on the $6.4 billion acquisition of HashiCorp in April 2024, said on the Wall Street call that Confluent would be accretive to earnings before taxes and all that other junk in EBITDA next year and accretive to cash flow in the second year.
The question is how long it will take for IBM to get back the $11 billion it is shelling out from its $14.9 billion cash hoard to get Confluent, which had a market capitalization of $8.09 billion at Friday’s market close. That calculates out to a 36 percent premium on the raw math, but Confluent has just shy of $2 billion in the bank and just a tad over $1 billion in debt. Call it $10 billion in net cash to do the deal. At the rate that Confluent is growing, it might take about four or five years to get the money back as revenue propping up IBM’s Software group. (It takes a lot longer to make enough profits to cover the cost.) It took from Q3 2019 to Q3 2025 to cover that $34 billion that Red Hat cost through additional revenues, but look at the story shift and pivot that IBM was able to execute. . . . Without Red Hat, IBM was in permanent and very slow decline, and now it owns commercial Kubernetes and, with HashiCorp, has Terraform and a slew of other tools to bolster modern systems deployment and automation. Kafka streaming will add to that software revenue stream (pun intended).
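For those who want to check the raw math, here is a minimal sketch in Python of the premium, the net cash outlay, and a crude payback estimate. The dollar figures are the rounded ones quoted above, and the first-year revenue of $1 billion is taken from Confluent’s annualized run rate. The function names and the growth rates fed into the payback calculation are our own assumptions for illustration, not numbers from IBM or Confluent.

```python
# All dollar figures are in billions and rounded as quoted in the text.
OFFER = 11.0        # what IBM is paying for Confluent
MARKET_CAP = 8.09   # Confluent market capitalization at Friday's close
CASH = 2.0          # "just shy of $2 billion" in the bank (rounded up)
DEBT = 1.0          # "a tad over $1 billion" in debt (rounded down)


def premium_pct(offer, market_cap):
    """Premium of the offer over market capitalization, in percent."""
    return (offer / market_cap - 1.0) * 100.0


def net_cost(offer, cash, debt):
    """Offer price less the net cash (cash minus debt) that comes with the deal."""
    return offer - (cash - debt)


def years_to_recoup(cost, first_year_revenue, growth, max_years=100):
    """Count the years until cumulative revenue reaches the cost.

    `growth` is an assumed constant annual growth rate (0.25 means 25 percent).
    Returns None if the cost is not covered within `max_years`.
    """
    total = 0.0
    revenue = first_year_revenue
    for year in range(1, max_years + 1):
        total += revenue
        if total >= cost:
            return year
        revenue *= 1.0 + growth
    return None


if __name__ == "__main__":
    cost = net_cost(OFFER, CASH, DEBT)
    print(f"premium: {premium_pct(OFFER, MARKET_CAP):.1f} percent")
    print(f"net cost: ${cost:.1f} billion")
    # Hypothetical growth rates, chosen only to show the sensitivity.
    for growth in (0.25, 0.50):
        years = years_to_recoup(cost, 1.0, growth)
        print(f"growth {growth:.0%}: {years} years to recoup")
```

Under these assumptions, a steady 25 percent growth rate takes six years to cover the $10 billion in cumulative revenue, and 50 percent growth takes five. That suggests the four to five year estimate leans on IBM accelerating Confluent’s sales rather than just riding the current curve.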
The total addressable market for the Kafka platform is huge, according to Confluent a month ago and now IBM:
This is particularly true in an agentic AI world, where software is going to be getting firehosed by countless other bits of software from all over the world all the time. IBM and Confluent have only shown the TAM for 2025, and we strongly suspect the one for 2029 will be perhaps 2X to 3X larger given how much agentic AI will be going on by then.
We have been warned that the data in the chart above is “for illustrative purposes only” and is not drawn to scale. We think this is cheating. If you are a public company, you draw to scale. Period.
In any event, that is how revenues for the core Kafka platform on premises (Confluent Platform), and the managed services on the cloud (Confluent Cloud) look generally. The Kafka Stream and Data Streaming Platform variants of the cloud-based managed service are shown above. Mostly.
Here is a chart that shows how Confluent Cloud revenues have changed over time:
And here is how the broader Confluent Subscription revenue (which constitutes the vast majority of revenues, excepting professional services) looks:
The Confluent Cloud revenue is one component of the Confluent Subscription revenue, and if you subtract the former from the latter, you get the on premises Confluent Platform revenues. IBM will no doubt want to boost sales of the on premises subscriptions, given its enterprise customer base, but is clearly interested in being hybrid and pushing the cloud, too.
With this acquisition, IBM has its Watson.X model and data governance application for GenAI, plus Red Hat Enterprise Linux and OpenShift to containerize and run AI training and AI inference clusters. It has Storage Scale (formerly GPFS) for high performance parallel storage to feed these clusters, and DS8000s for storage area networks underpinning databases and traditional application and middleware systems.
What IBM will not have is the X86 or Arm servers that run a lot of this stuff, but there is not much money to be made with these anyway. IBM is perfectly happy for customers to buy X86 servers from Lenovo, Dell, Hewlett Packard Enterprise, Supermicro, or Cisco Systems to run Kafka workloads. But, we may yet see a Power11 cluster tuned up with Kafka and GPFS for streaming and storage.
At this point, Confluent board members and shareholders controlling 62 percent of the shares of Confluent have given the IBM deal the nod. Asking the other shareholders to say “Yes” is a formality, but getting regulatory approval around the globe is not. That will probably take until the middle of 2026, according to Kavanaugh.