Microsoft's AI Chips Elevated to Divine Status, Yet Nvidia Remains the King of AI Chips

baoshi.rao

At the recent Microsoft Ignite 2023 conference, Microsoft dropped multiple 'bombshells.'

First, two new chips were unveiled: one built on the Arm architecture and the other a self-developed AI chip. Additionally, Bing Chat was rebranded as Microsoft Copilot, completing the move of Microsoft's AI products under the Copilot name. Finally, Copilot Studio was introduced, allowing anyone to customize AI chatbots.

It's no exaggeration to say that this conference once again elevated Microsoft to divine status. After all, in the past few months, OpenAI has rolled out a series of updates and plans, recapturing the world's attention.

But if generative AI, or even the future of AI, is said to be concentrated solely on Microsoft and OpenAI, Nvidia might be the first to disagree. During the Ignite conference, Microsoft CEO Satya Nadella invited Nvidia founder and CEO Jensen Huang on stage and posed a question:

Where is the Future of AI Headed?

Jensen Huang said that generative AI is the most significant paradigm shift in computing in more than 40 years, surpassing PCs, mobile devices, and even the internet. The first wave of generative AI was triggered by OpenAI's GPT models, while the second wave is Microsoft's current Copilot model.

Image/ YouTube@Microsoft

The third and largest wave will be NVIDIA's Omniverse combined with generative AI to help digitize heavy industries. "The vast majority of the world's industries rely on heavy industry," explained Huang.

This is not the first time NVIDIA has highlighted the integration of Omniverse and generative AI.

At the SIGGRAPH graphics technology conference in August, NVIDIA extensively discussed the combination of generative AI and Omniverse, showcasing a 'PDF to Factory' demo. In simple terms, NVIDIA has moved the complex engineering task of "building a factory" into the digital world, using generative AI and graphics technology to transform 2D blueprints into 3D models. By adding lighting, textures, and extensive information, they ultimately create a "digital twin" version of the factory.


Virtual Factory, Image/NVIDIA

Meanwhile, Microsoft's newly released self-developed AI chips look competitive, and together with the threat from AMD they have given many people and companies hope that NVIDIA's dominance in computing power can be broken. In reality, NVIDIA GPUs still hold a significant advantage, whether it's the H100 released last year or the recently launched H200.

Microsoft's release of its self-developed AI chip, Maia 100, comes as no surprise. Firstly, there were earlier reports hinting at this development. Secondly, the world's largest cloud computing companies—Google and Amazon—have already introduced their own AI chips. Of course, another direct factor is Nvidia.

It's well known that Nvidia GPUs have effectively become the 'hardware standard' for large AI models. The H100 has turned into a strategic resource hoarded by all the tech giants, and even the A100, released back in 2020, still triggers a 'scramble.' Two problems follow. Production constraints keep Nvidia GPUs perpetually in short supply, and the enormous profits Nvidia earns from them, set against the costly AI model arms race, have sparked widespread talk that 'only Nvidia is making money.'

The problem is that tech giants haven't found a viable alternative to Nvidia, which makes self-developed AI chips look like the better option. But can a chip like Microsoft's Maia 100 truly replace Nvidia GPUs?


Maia 100, Image/Microsoft

According to Nadella, Microsoft's self-developed AI chip Maia 100 is built on TSMC's 5nm process, the same as NVIDIA's H100, and features an astonishing 105 billion transistors. Public data suggests this chip is currently the largest AI chip in existence.

Semiconductor research firm SemiAnalysis revealed that Maia 100 delivers 1600 TFLOPS in MXInt8 and reaches 3200 TFLOPS in MXFP4. Additionally, analysis indicates that the annual cost of developing Maia 100 is approximately $100 million.

If we only look at the numbers, Maia 100's computing power completely surpasses Google's TPUv5 and Amazon's Trainium/Inferentia2 chips. Even compared to NVIDIA's H100, the gap isn't significant.

However, MXInt8 and MXFP4 are new data formats: MXInt8 is expected to replace FP16/BF16, and MXFP4 is expected to replace FP8. In reality, no company has yet trained large models using them. At least for training, then, Maia 100's computing power figures can't be compared directly with those of other GPUs or AI chips.

Another noteworthy point is that Microsoft's Maia 100 has a memory bandwidth of 1.6TB/s, which still outperforms Amazon's Trainium/Inferentia2 but falls short of Google's TPUv5, not to mention NVIDIA's H100.
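To see why the bandwidth figure matters alongside the TFLOPS figures, it helps to divide one by the other. The sketch below uses only the numbers quoted above; the function and variable names are illustrative, and the result is a rough peak-rate ratio, not a measured benchmark. It estimates how many operations the chip would need to perform on each byte fetched from memory to keep its compute units fully busy.

```python
# Figures quoted in this article for Maia 100 (peak values, not measurements).
PEAK_TFLOPS = {"MXInt8": 1600, "MXFP4": 3200}
MEMORY_BANDWIDTH_TBPS = 1.6  # TB/s


def ops_per_byte(peak_tflops: float, bandwidth_tbps: float) -> float:
    """Operations per byte of memory traffic needed to sustain peak compute.

    Both inputs are in "tera" units, so the ratio is plain ops per byte.
    """
    return peak_tflops / bandwidth_tbps


def balance_points() -> dict[str, float]:
    """Compute the ops-per-byte ratio for each data format listed above."""
    return {
        fmt: ops_per_byte(tflops, MEMORY_BANDWIDTH_TBPS)
        for fmt, tflops in PEAK_TFLOPS.items()
    }


if __name__ == "__main__":
    for fmt, ratio in balance_points().items():
        print(f"{fmt}: about {ratio:.0f} operations per byte")
```

On these figures the ratio works out to roughly 1,000 operations per byte for MXInt8 and 2,000 for MXFP4. A workload that does less work per byte than that would be limited by memory rather than by compute, which is one way to read the emphasis on memory bandwidth in chip comparisons.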

In fact, Microsoft understands that "Rome wasn't built in a day" – Maia 100 still has a long way to go before it can replace Nvidia's H100 or newer models. Reports indicate that Maia 100 uses direct liquid cooling and currently only runs GPT-3.5 for GitHub Copilot, with plans to expand support for partial Azure cloud workloads next year.

Just a few days ago, Nvidia released the H200 GPU, which boosts large model training and inference performance by 60% to 90% solely through significant upgrades in memory bandwidth and capacity.

This might explain why, while announcing its self-developed Maia 100 AI chip, Microsoft still declared at the Ignite conference that it would continue collaborating with Nvidia to build the next-generation AI supercomputer and factory.

As the undisputed leader with the deepest moat, NVIDIA may not be paying much attention to Microsoft's in-house AI chips. It is clearly more concerned with how Omniverse can integrate generative AI to become the 'next big wave' after Copilot.

Many may have already forgotten NVIDIA's heavily promoted Omniverse, but they certainly remember the 'real and fake Jensen Huang' from two years ago.


In April 2021, NVIDIA once again held an online 'kitchen' keynote, presented as usual by Jensen Huang. For more than three months afterward, no one noticed what was truly remarkable about the presentation, until NVIDIA itself revealed the secret at SIGGRAPH in August that year:

The kitchen scene, leather jacket, oven... even Jensen Huang himself along with his movements and expressions were all 'fake,' or more precisely, a 'digital twin' of reality.

The Omniverse platform began surfacing at this time, coinciding with the peak popularity of the 'metaverse' concept. Some even considered it NVIDIA's version of the metaverse. However, one crucial difference between Omniverse and the metaverse is that NVIDIA aims to create a digital twin world whose core purpose is to influence the physical world.


Virtual railway, Image/NVIDIA

As mentioned at NVIDIA's GTC conference last year, Deutsche Bahn (German Railways) has built and operates a 'digital twin' of its rail network on Omniverse, encompassing 5,700 stations and spanning over 30,000 kilometers of track. Within this 'virtual railway,' it can train and validate AI models, continuously monitor the operation of railways and trains, and simulate various unexpected scenarios to assess their impact on operations.

The most direct practical value is that, based on testing and validation on Omniverse, it is possible to increase railway transport capacity and operational efficiency without constructing new tracks, while also reducing carbon emissions.

By testing and validating on 'digital twins,' results from the digital world can guide the real world, which is one of the core reasons Jensen Huang has always held Omniverse in such high regard. This is also why, even as the 'metaverse' concept has largely been abandoned, NVIDIA continues to 'promote' Omniverse at every GTC and SIGGRAPH conference, and now at this year's Microsoft Ignite as well.

Of course, digital twin technology is not without its flaws, and the biggest challenge currently may still be cost.

Two years ago, the digital twin version of 'Jensen Huang' appeared for only 14 seconds of that remarkably realistic presentation, yet producing it required a series of complex tasks and significant manpower and resources. In contrast, this year's SIGGRAPH demo, 'PDF to Factory,' relied heavily on generative AI to do the work.

Converting 2D to 3D, Image/NVIDIA


On the Omniverse platform, 2D blueprints can be transformed into complete 'digital twins' by conversing with various generative AI models. Two years ago this would have been hard to imagine, but today generative AI has proven its capabilities and potential to the world.

From this perspective, it's no surprise that standing next to Microsoft CEO Nadella, Jensen Huang stated, 'Copilot is important, but Omniverse + generative AI is even more crucial.'

Cover image from Microsoft Ignite Conference