Microsoft and OpenAI Invest $100 Billion to Build 'Stargate' Supercomputer
On March 30, tech outlet The Information exclusively reported that Microsoft and OpenAI are developing a data center project that includes a supercomputer named 'Stargate.' The supercomputer will be equipped with millions of AI-specific chips and will primarily serve OpenAI's research and product development.
According to an insider who has spoken with OpenAI co-founder and CEO Sam Altman and reviewed Microsoft's cost estimates for the project, the total investment could reach $100 billion. As early as May 19, 2020, Microsoft officially announced that it had built a supercomputer within its Azure cloud specifically for OpenAI, equipped with 10,000 GPUs and 285,000 CPU cores, ranking among the five fastest computers in the world at the time. It played a crucial role in the later development of products such as ChatGPT and DALL·E 3.
The upcoming Stargate will be far more powerful and faster than the 2020 system, helping OpenAI develop the large models it needs on the path to AGI.
Brief Introduction to Stargate
According to The Information, the data center project is divided into five phases, and Microsoft and OpenAI are currently in the middle of phase 3. Phase 4 calls for Microsoft to build a supercomputer slightly smaller than Stargate, expected to be operational around 2026.
Phase 5 is the Stargate supercomputer itself, which will be equipped with millions of AI-specific chips and is expected to come online around 2028. One of the main tasks in phases 4 and 5 is the large-scale procurement of those chips. The facility would cover more than 400 acres and require roughly 5 gigawatts of electricity, an energy-intensive undertaking; nuclear power may therefore be used to supply it.
The investment in this project is roughly 100 times that of today's most advanced data centers. The primary reason for Microsoft's substantial expenditure is that OpenAI has been constrained by computing power limitations, which have kept it from fully unleashing its product and innovation capabilities.
For instance, in 2023 OpenAI had planned to launch an AI product called "Arrakis," but abandoned the plan due to computing power constraints.
Microsoft Once Built a Dedicated Supercomputer for OpenAI
Spending $100 billion on a super data center is not an impulsive move by Microsoft but a strategy the company has successfully executed before. At the "Build" developer conference in May 2020, Microsoft announced it had built a dedicated supercomputer for OpenAI within its Azure cloud services. The system was equipped with 10,000 GPUs and 285,000 CPU cores, with 400 gigabits per second of network connectivity for each GPU server, and was designed specifically for training AI models.
It's fair to say that without the assistance of this supercomputer, the globally sensational ChatGPT might not have come into existence.
Microsoft's current investment is 100 times larger than the 2020 outlay, primarily because OpenAI's technology is evolving rapidly, including groundbreaking text-to-video models such as Sora. Compared with the text models behind ChatGPT, video models require far more computing power and are harder to handle during pre-training and fine-tuning, because video data is high-dimensional information spanning width, height, time, and color channels.
Video therefore needs more complex data preprocessing steps, such as video decoding, frame extraction, frame resampling, and resizing; a sketch of such a pipeline follows below. Text data, in contrast, is relatively low-dimensional, involving mostly sequence processing such as tokenization, tagging, and word embedding.
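To make the contrast concrete, here is a minimal Python sketch of the video preprocessing pipeline just described (decode, extract frames, resample in time, resize in space) next to a toy text path. The opencv-python dependency, the placeholder file path, and all parameter values are illustrative assumptions rather than details from the report.

```python
# Minimal sketch of the video preprocessing steps named above:
# decode -> frame extraction -> temporal resampling -> spatial resizing.
import cv2                # assumes opencv-python is installed
import numpy as np

def preprocess_video(path, target_fps=4, size=(224, 224)):
    cap = cv2.VideoCapture(path)                 # decode the container/codec
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    step = max(1, round(src_fps / target_fps))   # keep every `step`-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()                   # extract one frame (H, W, 3)
        if not ok:
            break
        if idx % step == 0:                      # temporal resampling
            frames.append(cv2.resize(frame, size))  # spatial resizing
        idx += 1
    cap.release()
    # Stack into the 4-D layout the article describes:
    # (time, height, width, color channels).
    return np.stack(frames) if frames else np.empty((0, *size, 3), dtype=np.uint8)

def preprocess_text(s):
    # Text preprocessing is comparatively simple: a 1-D token sequence.
    return s.lower().split()  # stand-in for a real subword tokenizer

# clip = preprocess_video("input.mp4")   # "input.mp4" is a placeholder path
# tokens = preprocess_text("A cat riding a skateboard")
```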
In terms of computational power, the high dimensionality of video data means video models consume far more AI computing resources than text models during both training and inference. Video processing typically has to capture temporal change while preserving spatial detail, which demands substantial computing power. OpenAI's Chief Technology Officer, Mira Murati, has said that generating a 20-second 720p video with Sora takes several minutes.
If 100,000 people simultaneously used Sora to generate different kinds of videos, the AI computing power consumed would be astronomical; the rough calculation below gives a sense of the scale.
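As a back-of-envelope illustration of why video dwarfs text, the snippet below compares raw input sizes. The frame rate, prompt length, and user count are illustrative assumptions, not figures from the report.

```python
# Rough comparison of raw input sizes; every number here is an assumption.
frames = 20 * 30                           # a 20 s clip at an assumed 30 fps
video_values = frames * 720 * 1280 * 3     # time x height x width x RGB
text_values = 1_000                        # a generous ~1,000-token prompt

print(f"{video_values:,}")                 # 1,658,880,000 raw values per clip
print(f"{video_values // text_values:,}x") # ~1,658,880x more raw values than text

# Scaled to 100,000 concurrent users generating one clip each:
print(f"{video_values * 100_000:,}")       # ~166 trillion raw values in flight
```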
Thus AI computing power, like data, has become the water, electricity, and coal of the generative AI field: basic infrastructure on which every product's technological innovation and iteration depends.