Meta Plans to Deploy Its Own AI Chips This Year to Reduce Dependence on Nvidia GPUs
-
Social media giant Meta plans to deploy a custom second-generation AI chip, codenamed "Artemis", in its data centers this year.
According to Reuters, the new chip will handle "inference" in Meta's data centers, that is, running already-trained AI models to produce outputs. The initiative is intended to reduce Meta's reliance on Nvidia chips and rein in the cost of AI workloads. Meta is also rolling out generative AI features across its services and is training an open-source model, Llama 3, which it aims to bring to the level of GPT-4.
Image credit: AI-generated image via Midjourney
Meta CEO Mark Zuckerberg recently announced plans to deploy 340,000 Nvidia H100 GPUs by the end of this year, for a total of roughly 600,000 GPUs for running and training AI systems. That makes Meta Nvidia's largest publicly disclosed customer after Microsoft. As models grow larger and more capable, however, AI workloads and their costs keep climbing. Beyond Meta, companies such as OpenAI and Microsoft are also trying to break this cost spiral with proprietary AI chips and more efficient models.
In May 2023, Meta unveiled its own chip family, the Meta Training and Inference Accelerator (MTIA), designed to speed up and reduce the cost of running neural networks. According to the official announcement, the first chip was not expected to go into broad use until 2025, although it was already being tested in Meta's data centers at the time. Reuters reports that Artemis is a more advanced version of MTIA.
By deploying its own AI processors, Meta intends to lessen its dependence on Nvidia chips and keep the cost of AI workloads under control. The company plans to put the Artemis chip into production this year, stating: "We believe our self-developed accelerators, combined with commercially available GPUs, offer the optimal performance and efficiency for Meta's specific workloads." The move should give Meta greater flexibility and autonomy while potentially lowering the cost of its AI workloads.