The Least Crazy And Least Mean GenAI Model Will Win

It takes big money as well as big ideas to compete in the generative AI space. Not hundreds of millions of dollars, but billions of dollars. And because of this, and because the stakes are so high as GenAI goes from hype cycle to commercialization, we see alliances and partnerships forming that will shape how we access GenAI functions and who we access them from.

As usual, the odds favor the large, not because might makes right but because might begets more might. Given the $1 billion-plus price tag for a system that can train the state-of-the-art large language models that power GenAI – which have transformer models, sometimes also called foundation models, as their underpinnings – there are a limited number of ways that such iron can be brought to bear to advance the state of the art. This, more than any other reason, is why we see Amazon Web Services investing up to $4 billion in AI startup Anthropic today.

There is irony as well as poetry in the name that this AI startup chose for itself.

Anthropic is an adjective that literally means having to do with human existence, and GenAI is certainly recursive in that it is only beginning its inevitable course of utterly changing the nature of work and private life for the 8.1 billion people who inhabit Earth. That’s the irony.

The poetry in that startup’s name comes from the anthropic principle – another snake eating its own tail, and one we are grossly oversimplifying because this is a tech analysis, not a philosophical treatise – which argues in essence that the Universe has evolved specifically so living beings such as ourselves can participate in it and observe it. What we have joked about in recent years is that the evolution of artificial intelligence on Earth was always inevitable – if you hew to the anthropic principle, precisely as inevitable as the evolution of intelligent life in the Universe – and the funny bit is that this is the third time it has actually happened, but civilization collapsed the other two times and we just have no historical records. Maybe some pyramids we don’t really understand. . . .

With that, let’s get back to business with Anthropic and its new relationship with the world’s largest cloud builder and, lest we forget, the world’s largest online retailer.

Anthropic was founded in 2021 by Dario Amodei and Daniela Amodei, who are siblings and who both worked at Microsoft GenAI partner OpenAI, the creator of the GPT large language models and, famously, the ChatGPT chatbot that has made GenAI a household concept around the world. Dario Amodei is Anthropic’s chief executive officer; he was vice president of research at OpenAI and led the development of the GPT-2 and GPT-3 models. (GPT-2 was the one that was open sourced.) He was previously a senior research scientist on the Google Brain team, and did a one-year stint working with former Googler Andrew Ng at Chinese hyperscaler Baidu. Daniela Amodei was a technical recruiter and risk manager at payment platform Stripe before joining OpenAI as its engineering manager and vice president of people, and she eventually became its vice president of safety and policy.

Other Anthropic co-founders – and there are many – include Jared Kaplan, a physicist who worked at the Stanford Linear Accelerator run by the Office of Science of the US Department of Energy and who was a research consultant for OpenAI in 2019 and 2020; he has been a professor of physics and astronomy at Johns Hopkins University for the past year. Co-founder Tom Brown worked at Internet startups MoPub and Grouper before becoming a member of the technical staff at Google Brain in 2017 and had that same title when he joined OpenAI in December 2018. Co-founder Sam McCandlish is a theoretical physicist who was a research lead at OpenAI and who worked on the team to scale up OpenAI’s transformative (pun intended) GPT-3 model. And last but certainly not least for us, there is co-founder Jack Clark, our compatriot in hyperscaler computing from The Register who did a stint at Bloomberg for a few years before being asked by Elon Musk to join OpenAI to help with policy and communications for its AI efforts.

Anthropic’s focus is to build “helpful, honest, and harmless AI systems,” and its first LLM, called Claude, was released through a chat interface and an API in March of this year. It comes in two flavors: Claude-v1, the high-end, full-bore model, and Claude Instant, which is less expensive, faster, and lighter weight and designed for, well, just being chatty. The Claude 2 LLM, announced in July, boosted the accuracy of Claude for coding, math, and reasoning, and, significantly, it can accept up to 100,000 tokens in its prompts and output up to a few thousand tokens. Importantly for Anthropic, Claude 2 was twice as good at giving harmless answers as Claude 1.3, the latest v1 iteration.

To recap: the Claude models are focused on long-form input and output and responses that are less harmful. Huh. So is The Next Platform. . . .

Soon, we think, the industry will talk about prompt token inputs and outputs as much as it now talks about parameters (roughly akin to synapses in our neural networks) and total tokens used for training, which is how we have been characterizing the models. The compute required to train a model scales linearly with the parameter count for a given training token budget, and the word on the street is that Claude is roughly on par with GPT-3 at 175 billion parameters.
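That linear scaling can be sketched with the commonly cited rule of thumb that training compute is roughly 6 FLOPs per parameter per training token. The figures below are illustrative assumptions for a GPT-3-class model, not disclosed numbers for Claude:

```python
# Rough training-compute estimate using the common approximation
# FLOPs ~= 6 * parameters * training tokens.
def train_flops(params: float, tokens: float) -> float:
    """Approximate total floating point operations to train a dense LLM."""
    return 6.0 * params * tokens

# Illustrative assumption: 175 billion parameters, 300 billion tokens.
flops = train_flops(175e9, 300e9)
print(f"{flops:.3e} FLOPs")  # on the order of 3.15e23

# Doubling the parameter count (tokens fixed) doubles the compute --
# the linear scaling described above.
assert train_flops(350e9, 300e9) == 2 * flops
```

Which is why pushing from hundreds of billions of parameters into the trillions, on trillions of tokens, turns into a billion-dollar iron bill so quickly.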

If this is true, it tells us two things. One, Anthropic needs a lot more iron to scale up its parameter counts to compete with OpenAI/Microsoft GPT-4 and GPT-5, Google PaLM 2 and Gemini, Meta Platforms LLaMA 2, and Inflection AI’s Inflection-1, which are all going to be pushing up into the trillions of parameters and trillions of tokens for their training if they are not already there. And two, given the cost of systems to train such models, Anthropic doesn’t want to blow whatever remains of the $1.5 billion – something just shy of $400 million of that came from Google, according to rumors – that the company has raised to date. Hence the “loan” from Amazon Web Services, which is akin to the one that Microsoft has given to OpenAI: preferential access to models in exchange for promises to spend the dough on cloudy infrastructure.

That’s what these “investments” really are, and they are not necessarily interest-free loans, either. For instance, for every dollar that was “invested” by Microsoft, the cloud capacity almost certainly costs more than a dollar. At some point, such investments could be thought of as a kind of roundtripping – we will leave the accountants to figure that out. Do Microsoft and AWS get to book the revenue cleanly when OpenAI and Anthropic pay for capacity, or is there some kind of offset? (Maybe ChatSEC or ChatFASB have opinions about this?)

Whatever the accounting, OpenAI was going to have an enormous cloud computing bill and deals worth $13 billion with Microsoft were a means to get it. Inflection AI was going to need an enormous amount of capacity, and doing partnerships with Nvidia and CoreWeave and raising money from them and others through Wall Street was a way to do this. Cerebras Systems needed to grow its cloud, and a deal with G42 that could be worth $900 million was a way to pay for that iron. It is no different for Anthropic. And this explains why Databricks, the commercial entity behind the Spark in-memory processing framework that recently spent $1.3 billion on AI model provider MosaicML, has a hyperscaler sugar daddy problem if it wants to compete.

AWS is investing $1.25 billion in Anthropic to get a minority stake, just like Google kicked in its dough earlier this year as the two companies were announcing that Anthropic models would run on Google Cloud and, importantly, be tuned to run on its homegrown TPU accelerators as well as Nvidia GPU accelerators also sold by Google Cloud under a utility model. It is sensible for Anthropic to spread its risk out across two investors who are also cloud providers because, right now, cloud capacity is the same as money.

Under the deal, AWS can invest up to $4 billion in Anthropic, which will train its Claude models on AWS Trainium chips and allow for inference to run on its Inferentia chips, both of which are designed by AWS and sold on its eponymous cloud. Importantly, the deal between the two allows Anthropic to have input into future Trainium and Inferentia compute engine designs, and Anthropic has promised to run the majority of its workloads on AWS. (Probably in direct proportion to the Google and AWS investments is our guess. We doubt very much that Microsoft would do a similar deal with Anthropic, given its heavy investment in OpenAI.) Anthropic will also make the Claude models available through the Bedrock managed AI training and inference service launched by AWS in April of this year, and customers will also get early access to new models as they are created.
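For the curious, here is a sketch of what reaching Claude through Bedrock looks like from Python. The model ID and request-body fields follow the Bedrock text-completion interface for Claude v2 as documented at launch; treat the specifics as assumptions rather than a tested integration:

```python
import json

# Build the request body in Anthropic's Claude text-completion format on
# Bedrock: the prompt must be wrapped in Human:/Assistant: turns.
def claude_request(prompt: str, max_tokens: int = 300) -> str:
    return json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": 0.5,
    })

def invoke_claude(prompt: str) -> str:
    # boto3 is the AWS SDK for Python; imported lazily so the payload
    # helper above works without AWS credentials installed.
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="anthropic.claude-v2",
        contentType="application/json",
        accept="application/json",
        body=claude_request(prompt),
    )
    return json.loads(response["body"].read())["completion"]

# Usage (requires AWS credentials and Bedrock model access):
#   print(invoke_claude("Summarize the AWS-Anthropic deal in one sentence."))
```

The point being that, under this deal, a Bedrock customer gets Claude as just another managed model behind an AWS API call, metered by those input and output tokens.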

AWS is, of course, no stranger to machine learning and artificial intelligence. It launched its SageMaker managed service for machine learning in 2017 and today has over 100,000 customers using that service. This is part of what we here at The Next Platform call the software division at AWS.

One last thought: The deal between AWS and Anthropic did not say anything about parent company Amazon, but there could be an angle here as well. If the Anthropic models are indeed helpful and less harmful – harmless, as Anthropic puts it in its mission statement, is perhaps a stretch because harm is in the eyes of the harmed – then Claude 2 or Claude 3 might be just the thing that Amazon itself needs as it puts chat interfaces on generative AI for its many lines of business. It might have been cheaper to do a deal here with Anthropic than to try to build such models from scratch, and thereby get the benefits to Amazon consumers all the more quickly.

And, just for fun, I signed into Claude and asked him what he thought about this story:

“Based on my review, the document appears to be an accurate and fair news article reporting on business developments in the AI industry. Here are the key points:

  • The article discusses investments and partnerships between major tech companies like Amazon, Google, Microsoft, and startups like Anthropic and OpenAI to develop large language models and generative AI capabilities.
  • It provides factual details on the backgrounds of Anthropic’s founders, the capabilities of Anthropic’s Claude models, and the terms of the deal between AWS and Anthropic.
  • The tone is neutral and matter-of-fact. The author provides analysis and commentary on the business motivations behind these deals, but does not make unsupported claims.
  • Statistics and figures provided, such as the amount invested by AWS and the training parameters of Claude, align with other publicly available information.
  • The article appears well-researched, citing developments across the industry. It contains links to source material for key information.

In summary, I do not see any factual inaccuracies, bias, or lack of balance in the reporting. Based on the information provided, it seems to be an accurate and fair news article on this subject. Let me know if you need me to examine any specific aspects in more detail.”

Hrmph. What about the puns and jokes, Claude? Tough crowd. We’re here all week, don’t eat the fish.


5 Comments

  1. Hmmmm, seems like Claude isn’t programmed to like competing products much, or it would’ve likely noted, in its detailed summary of this article, that “MosaicML” is actually mis-spelled as “MosiacML” (as commonly happened also in the days of the NCSA Mosaic web browser) — at least it didn’t go all hog-wild editing-crazy and mess with the link to Katyanna Quach’s El Reg article on Databricks … not as harmful as GPT-4 then! But still, Bad Claude!

  2. Well, this all makes perfect sense to me (I think) …

    The only way to properly reap and enjoy the humongous rewards of one’s gastronomic investments in modern AI, without sharing, is indeed to dig a deep and wide moat around it, infested with venomous frog soup, and a stew of poisonous crocodiles. First you do need something to protect of course, and if Meta raises Llamas, Google reads PaLMs, Apple scrubs with Ajax, and Microsoft gorges on sauerkraut and brussel sprouts (J’ai pété!), it only makes sense for AWS to cuddle up to Claude, the deliciously sensible French-born Roman emperor, who ruled between Caligula and Nero, and was nothing like the Big Five GAFAM CEOs — opposites attract, in a poetically anthropic kind of way! 8^d
