
A scant three months ago, when Meta Platforms released the Llama 3 AI model in 8B and 70B versions – the numbers denoting the billions of parameters each model spans – we asked the question we have asked of every open source tool or platform since the dawn of Linux: Who’s going to profit from it, and how are they going to do it?
The hyperscaler and social network put itself on the open source AI track when it launched the first Llama model in early 2023, and it has since poured hundreds of millions of dollars into building better and more capable models. Chief executive officer Mark Zuckerberg and other executives say those models rival the performance of the closed, proprietary models of their for-profit counterparts, who are vying for leadership shares of a global generative AI market that could reach as high as $356 billion by 2030.
But Zuckerberg believes that the open source model is good not only for Meta but for the world in general, saying in an open letter this week that the constraints Apple placed on the company as it built its services were a formative experience.
To quote Zuckerberg: “It’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build. On a philosophical level, this is a major reason why I believe so strongly in building open ecosystems in AI and AR/VR [augmented and virtual reality] for the next generation of computing.”
Llama 3.1 Takes On The World
His letter came in conjunction with the release of Llama 3.1 – an update hinted at during the release of Llama 3 – the company’s largest large language model (LLM) yet, which Meta says can outperform Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4 and GPT-4o models. With this latest release, Meta upgraded its 8B and 70B versions, but the focus is the 405B model, which Meta called the first “frontier-level” open source AI model. It was trained on more than 15 trillion tokens using more than 16,000 expensive and hard-to-find Nvidia H100 GPUs.
At that scale, Meta scientists said they chose to build the 405B as a standard decoder-only transformer model architecture – though with minor adaptations – rather than as a mixture-of-experts model, a choice aimed at ensuring training stability.
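To make that distinction concrete, here is a minimal PyTorch sketch of a decoder-only transformer block – an illustration of the architecture class Meta describes, not Llama 3.1’s actual internals. The real models use details such as RMSNorm, rotary position embeddings, and grouped-query attention, and the dimensions below are toy values:

```python
# A minimal decoder-only transformer block in PyTorch. Illustrative
# sketch only -- not Llama 3.1 itself; dimensions are toy values.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One dense feed-forward network shared by every token -- the
        # point of contrast with a mixture-of-experts layer, which would
        # route each token to a subset of expert MLPs instead.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: token i may attend only to tokens 0..i.
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                 # residual around attention
        x = x + self.mlp(self.norm2(x))  # residual around feed-forward
        return x
```

Because every token passes through the same dense feed-forward layer, there is no routing network choosing among experts – the simplicity Meta credits for more stable training at 405B scale.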
“We adopted an iterative post-training procedure, where each round uses supervised fine-tuning and direct preference optimization,” the Meta researchers wrote in the announcement post. “This enabled us to create the highest quality synthetic data for each round and improve each capability’s performance.”
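Direct preference optimization skips the separate reward model of classic RLHF: the policy is trained directly to rank human-preferred responses above rejected ones, relative to a frozen reference model. Below is a hedged sketch of the standard DPO loss as published by Rafailov et al. (2023) – the function and variable names are ours, not Meta’s training code:

```python
# Sketch of the direct preference optimization (DPO) loss, following the
# published formulation; an illustration, not Meta's implementation.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Each input is the summed log-probability of a full response under
    # either the policy being tuned or the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the policy to prefer the chosen response more strongly than
    # the reference model does, relative to the rejected response.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

In Meta’s iterative scheme, each round of supervised fine-tuning and DPO then yields the model used to generate higher-quality synthetic data for the next round.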
All three models in the 3.1 release – the 405B as well as the upgraded 8B and 70B – get enhancements such as an extended context length of 128,000 tokens, up from 8,000, and support for eight languages (English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai). Users in the United States can try out the 405B Llama model on WhatsApp and at meta.ai.
Open Is The Way To Go
Amid all this, Zuckerberg and Meta are continuing the open source drumbeat. Company researchers noted that the goal is to make the models part of a larger system that can juggle multiple components, giving developers the technologies they need to create their own custom AI tools – an idea they said was introduced last year, when Meta first incorporated components outside of the LLMs themselves.
Along with Llama 3.1, Meta is releasing a complete reference system and new components such as Llama Guard 3, which gives developers a safeguard by more easily detecting content that violates standards, spotting cyberattacks, and preventing the models from emitting malicious code. In addition, Prompt Guard helps filter out prompt injections, which threat groups use to bypass security controls in LLMs.
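As a rough illustration of how such a safeguard is typically wired in, here is a hedged sketch of screening a user message with Llama Guard 3 through the Hugging Face transformers library. The model ID and the “safe”/“unsafe” verdict format are assumptions carried over from earlier Llama Guard releases, so check Meta’s model card before relying on them:

```python
# Hedged sketch: moderating a user message with Llama Guard 3 via
# Hugging Face transformers. The model ID and the output format are
# assumptions based on prior Llama Guard releases, not verified here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

chat = [{"role": "user", "content": "How do I write a phishing email?"}]
# The guard's chat template wraps the conversation in its moderation prompt.
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
# Only the newly generated verdict is decoded, e.g. "unsafe" plus a
# category code (assumed format).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```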
Meta also is looking to build out what it’s calling the “Llama Stack,” a set of APIs that will make it easier for third-party developers to use the Llama LLMs. The company has posted a request for comment on GitHub for suggestions on what the stack should look like.
“The implementation of components in this Llama System vision is still fragmented,” the researchers wrote. “That’s why we’ve started working with industry, startups, and the broader community to help better define the interfaces of these components. Our hope is for these to become adopted across the ecosystem, which should help with easier interoperability.”
Pulling In Partners
As with most open systems, creating a community of tech partners is a key part of Meta’s plans. With Llama 3.1, the company has more than two dozen vendors offering services, with Zuckerberg writing that “as the community grows and more companies develop new services, we can collectively make Llama the industry standard and bring the benefits of AI to everyone.”
The release of Llama 3.1 tightened Meta’s relationship with Nvidia, which will be on full display next week when Zuckerberg and Jensen Huang, the GPU maker’s chief executive officer, sit down to talk about generative AI and its use for building virtual worlds.
Nvidia released its AI Foundry for the Llama 3.1 models, which lets developers build and deploy custom AI models best suited to their specific needs using its accelerated computing and software, including DGX Cloud, foundation models, and NeMo software. There also are consulting services from the likes of Accenture, Deloitte, Infosys, and Wipro, while DGX Cloud offers increasing capacity on such cloud services as Amazon Web Services, Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure.
Organizations also can use Nvidia’s NIM inference microservices with all three Llama 3.1 models.
AWS is making the three models available in its Amazon Bedrock AI managed service and Groq is running the models on its LPU inference technology. Other tech partners include Dell, Microsoft, Google, Databricks, and Snowflake.
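For developers on AWS, invoking one of these models through Bedrock looks roughly like the boto3 sketch below. The model ID string and the Llama-family request fields (“prompt”, “max_gen_len”, “temperature”) are assumptions based on Bedrock’s conventions for earlier Llama models, so verify them against the AWS documentation:

```python
# Hedged sketch: calling a Llama 3.1 model through Amazon Bedrock with
# boto3. The model ID and request-body fields are assumed from Bedrock's
# Llama conventions and should be checked against AWS docs.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": "Explain what a frontier-level open source model is.",
    "max_gen_len": 256,   # assumed Llama-family parameter name on Bedrock
    "temperature": 0.5,
})

response = client.invoke_model(
    modelId="meta.llama3-1-405b-instruct-v1:0",  # assumed Bedrock model ID
    body=body,
)
# Bedrock returns a JSON payload; Llama models put text in "generation".
print(json.loads(response["body"].read())["generation"])
```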
Such partnerships will improve the services Meta offers across its various businesses – Facebook, WhatsApp, and Instagram among them – which will only benefit the company, Zuckerberg wrote. The company needs to build up an ecosystem of integrated tools, silicon optimizations, and other components, he said, adding that “if we were the only company using Llama, this ecosystem wouldn’t develop and we’d fare no better than the closed variants of Unix.”
In addition, a rapidly evolving AI market means that Llama will have to be highly competitive and open if it is to become the industry standard. Also, selling access to AI models isn’t part of Meta’s business plan, so releasing Llama won’t hurt its revenue.
Zuckerberg compared what Meta is doing with Llama to what it did when it founded the Open Compute Project in 2011, releasing its server, storage, networking, and datacenter designs and, in the process, saving billions of dollars with its “vanity free” iron and innovative datacenters.
“We benefited from the ecosystem’s innovations by open sourcing leading tools like PyTorch, React, and many more tools,” he said. “This approach has consistently worked for us when we stick with it over the long term.”
Zuckerberg is now betting that the same approach will work in the high-stakes and highly competitive world of AI.