ARM is jumping into the data centre processor game, specifically for Meta. The move comes as tech companies seek affordable, energy-efficient ways to handle their growing AI demands. Meta, along with other major players, relies on pricey graphics processing units (GPUs) to power its AI workloads, but GPUs carry high costs and heavy energy consumption, and they often require liquid cooling systems.
Meta considers AI a key part of its strategy across platforms like Facebook, Instagram, and WhatsApp. CEO Mark Zuckerberg is clear about his vision: he wants Meta to lead in AI, stating that this year could see the rise of a highly personalized AI assistant that reaches a billion users. To achieve this, Meta is shifting from using GPUs to developing custom silicon chips tailored for their specific workloads and data centres.
CFO Susan Li emphasized Meta’s commitment to building custom silicon for workloads that off-the-shelf chips can’t handle as efficiently. The company has kicked off a long-term project called the Meta Training and Inference Accelerator (MTIA) to design an efficient architecture for its needs. Li said Meta began deploying MTIA in early 2024 for critical tasks such as core ranking and recommendations, and plans to lean on it more heavily through 2025, gradually replacing older GPU-based servers.
Efficiency is key for Meta as it rolls out MTIA. The company measures this by performance-per-watt, a metric that significantly affects overall costs. The MTIA chip, designed to fit into an Open Compute Project (OCP) module, draws about 35 watts but needs additional components such as a CPU and memory for full functionality.
The collaboration with ARM could let Meta move from its earlier, highly customized chips to a next-generation design built on general-purpose ARM processor cores. ARM is also gearing up to support scalable, power-efficient AI, having previously teamed up with Nvidia on the Grace Blackwell architecture. This year, Nvidia showcased an ARM-based chip designed for the robust performance needed to run large AI models.
Integration plays a major role here, with system-on-a-chip (SoC) devices melding different computing components into one unit. Nvidia’s Grace Blackwell is an example of this approach, and it may align well with Meta’s goals for MTIA as the company considers how to fuse its technology with ARM’s on a single chip.
Li’s comments about moving away from GPU servers and the goals of MTIA align neatly with the collaboration with ARM, potentially allowing Meta to scale its AI operations more effectively while decreasing its dependence on traditional GPU-based systems.
On ARM’s side, the company, owned by SoftBank, recently took a leading role in the Trump administration’s Stargate Project to enhance AI capabilities in the U.S. During ARM’s latest earnings call, CEO Rene Haas hailed Stargate as a significant infrastructure initiative, expressing enthusiasm for ARM being the go-to CPU for the project, particularly with its Grace architecture paired with Nvidia’s Blackwell GPUs. He also highlighted the Cristal intelligence partnership with OpenAI, which aims to enable seamless AI communication across all device types, from earbuds to data centres.