Will Nvidia's Mastery of the 'Moore's Law' in the AI Era Widen the Gap Between Chinese and American AI Companies?
On March 18 local time, NVIDIA unveiled multiple chips and software products at its GTC 2024 conference.
Founder Jensen Huang stated: "General-purpose computing has lost momentum. Now we need larger AI models, larger GPUs, and more GPUs stacked together. This is not about reducing costs but about scaling up."
As the centerpiece of the conference, NVIDIA introduced the Blackwell GPU, offered as the B200 and the GB200; the latter integrates one Grace CPU with two B200 GPUs. The NVIDIA GB200 NVL72 rack-scale system is built from GB200 superchips, combined with NVIDIA BlueField-3 data processing units and fifth-generation NVLink interconnect technology. Compared with a system containing the same number of H100 Tensor Core GPUs, it delivers up to a 30-fold improvement in inference performance while cutting cost and energy consumption by up to 25 times.
In AI applications, NVIDIA introduced Project GR00T, a foundational model for robotics, along with significant updates to the Isaac robotics platform.
NVIDIA showed that its AI chips have increased computing power 1000-fold over the past 8 years, suggesting the formation of a Moore's Law for the AI era: rapidly growing computing power at rapidly falling cost. At GTC, NVIDIA released not only computing-power updates but also progress in applications.
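As a rough sanity check (my own arithmetic, not a figure from NVIDIA), a 1000-fold increase over 8 years implies an annual growth factor of about 2.37x, well above the roughly 1.41x per year implied by the classic two-year doubling of Moore's Law:

```python
# Back-of-the-envelope: annual growth factor implied by 1000x over 8 years.
ai_era_growth = 1000 ** (1 / 8)   # compound annual factor for the AI-era claim
moores_law_growth = 2 ** (1 / 2)  # classic ~2x every 2 years

print(f"AI-era claim: {ai_era_growth:.2f}x per year")   # ~2.37x
print(f"Moore's Law:  {moores_law_growth:.2f}x per year")  # ~1.41x
```

The comparison illustrates why the article treats this as a distinct, steeper curve rather than a continuation of the transistor-scaling trend.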
Blackwell is not just a chip but also a platform. NVIDIA's goal is to make it easy to train and perform real-time inference for AI models with up to 10 trillion parameters.
The smallest unit is the B200, which packs 208 billion transistors, is manufactured on a custom TSMC 4NP process, and uses a chiplet architecture: two GPU dies are connected by a 10TB/s chip-to-chip link to act as a single unified GPU. The GB200 superchip connects two B200 Tensor Core GPUs to an NVIDIA Grace CPU over a 900GB/s ultra-low-power NVLink chip-to-chip interconnect.
At a higher level, the NVIDIA GB200 NVL72 is a multi-node, liquid-cooled rack system containing 36 Grace Blackwell superchips, which include 72 Blackwell GPUs and 36 Grace CPUs. Supported by NVIDIA BlueField-3 data processing units, it enables cloud network acceleration, composable storage, zero-trust security, and GPU computing elasticity in hyperscale AI clouds.
This system can operate as a "single GPU," delivering 1.4 exaflops of AI performance and 30TB of fast memory. It is claimed that a single GB200 NVL72 can support models with up to 27 trillion parameters. The largest system is the DGX SuperPOD, where NVIDIA GB200 NVL72 serves as the building block. These systems are connected via NVIDIA Quantum InfiniBand networks and can scale to tens of thousands of GB200 superchips.
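A quick consistency check on the rack-level figure (my own division, and only an approximation, since the precision format and sparsity assumptions behind the quoted number are not stated): 1.4 exaflops spread across 72 GPUs works out to roughly 19-20 petaflops per Blackwell GPU, presumably measured at a low-precision inference format.

```python
# Divide the quoted rack-level AI performance across its 72 GPUs.
# Approximation only: the precision format behind "1.4 exaflops" is unstated.
rack_exaflops = 1.4
gpus_per_rack = 72
per_gpu_petaflops = rack_exaflops * 1000 / gpus_per_rack
print(f"{per_gpu_petaflops:.1f} petaflops per GPU")  # ~19.4
```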
Additionally, NVIDIA offers HGX B200 server boards, which link eight B200 GPUs via NVLink, supporting x86-based generative AI platforms. The HGX B200 supports network speeds of up to 400Gb/s through the NVIDIA Quantum-2 InfiniBand and Spectrum-X Ethernet network platforms.
GB200 will also be available to customers on NVIDIA DGX Cloud, an AI platform co-designed with leading cloud providers including AWS, Google Cloud, and Oracle Cloud; it gives enterprise developers dedicated access to the infrastructure and software needed to build and deploy advanced generative AI models. NVIDIA offered a concrete training example: a GPT-MoE-1.8T model (widely suspected to be GPT-4). Previously, a Hopper-generation run required 8,000 GPUs training for 90 days; with GB200, the same model can be trained with only 2,000 GPUs, using a quarter of the energy.
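Taking the cited training example at face value (and assuming, as the comparison implies, the same 90-day run for both generations), the GPU-time saving is straightforward to tally:

```python
# Hopper-generation run vs. GB200 run for the same GPT-MoE-1.8T model.
# Assumes both runs span 90 days, as the article's comparison implies.
hopper_gpu_days = 8_000 * 90   # 720,000 GPU-days
gb200_gpu_days = 2_000 * 90    # 180,000 GPU-days

gpu_time_ratio = hopper_gpu_days / gb200_gpu_days
print(f"{gpu_time_ratio:.0f}x fewer GPU-days, at ~1/4 the energy")  # 4x
```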
A system built from GB200 units delivers 30 times the inference performance of a system with the same number of NVIDIA H100 Tensor Core GPUs, while cutting cost and energy consumption by up to 25 times.
Supporting these AI chips and computing systems is a series of new technologies: the second-generation Transformer Engine (supporting double the computation and model size), fifth-generation NVLink (providing 1.8TB/s of bidirectional throughput per GPU), a RAS engine for reliability (letting AI systems run uninterrupted for weeks or even months), and Secure AI (protecting AI models and customer data).

On the software front, the Blackwell portfolio is backed by NVIDIA AI Enterprise, an end-to-end enterprise-grade AI operating system. NVIDIA AI Enterprise includes NVIDIA NIM inference microservices, along with AI frameworks, libraries, and tools that enterprises can deploy on NVIDIA-accelerated clouds, data centers, and workstations. The NIM microservices provide optimized inference for dozens of AI models from NVIDIA and its partners.
Taken together, NVIDIA's computing-power innovations add up to remarkable advances in both AI model training and inference.
In AI model training, more powerful chips and faster inter-chip communication let NVIDIA's computing infrastructure train larger models at relatively lower cost. GPT-4V and Sora point to the future of generative AI: multimodal models and large-scale visual models, including video. NVIDIA's progress makes larger, more multimodal, and more capable models possible.

In AI inference, the growing scale of models and the demand for real-time responses place severe demands on inference compute. NVIDIA's AI computing system achieves a 30-fold improvement in inference performance while cutting cost and energy consumption by up to 25 times. This makes real-time inference on large models practical and addresses earlier problems of energy efficiency and cost.
At the GTC conference, NVIDIA announced a series of new achievements in applications such as biomedical, industrial metaverse, robotics, and automotive sectors. Among these, robotics (embodied intelligence) is a key focus area.
NVIDIA introduced the Project GR00T foundation model for humanoid robots, along with significant updates to the Isaac robotics platform. Project GR00T is a general multimodal foundation model for humanoid robots, acting as their "brain" and enabling them to learn skills for solving a variety of tasks.
The Isaac Robotics Platform provides developers with new robot training simulators, Jetson Thor robot computers, generative AI foundation models, and CUDA-accelerated perception and manipulation libraries.
Customers of the Isaac robotics platform include leading humanoid robotics companies such as 1X, Agility Robotics, Apptronik, Boston Dynamics, Figure AI, and XPENG Robotics. NVIDIA has also entered industrial and logistics robotics. Isaac Manipulator provides state-of-the-art dexterity and modular AI capabilities for robotic arms, offering up to 80x acceleration in path planning and improving efficiency and throughput through zero-shot perception (boosting success rate and reliability). Its early ecosystem partners include Yaskawa Electric, PickNik Robotics, Solomon, READY Robotics, and Franka Robotics.
Isaac Perceptor delivers multi-camera, 3D surround-vision capabilities that are particularly useful for automated material-handling robots, helping companies like ArcBest and BYD reach new levels of automation in material-handling operations.
In development approach, NVIDIA differs markedly from companies like OpenAI. OpenAI, Anthropic, and Meta put AI models at their core and build platforms and ecosystems around them; NVIDIA centers on computing power and extends outward into software platforms and AI applications. In applications, it does not seek to monopolize the field but collaborates with partners across industries, aiming to build a vast ecosystem spanning both hardware and software.
NVIDIA's advancements in computing power have profoundly impacted AI startups.
For large-model startups such as OpenAI, this is clearly beneficial: they can train larger, more multimodal models faster and at lower cost, with the opportunity to further reduce API prices and expand their customer base. For AI application startups, NVIDIA has not only increased inference performance many times over but also reduced energy consumption and cost, allowing these companies to scale their businesses at an affordable cost. As AI computing power grows further, the operating costs of AI application companies may continue to fall.
For AI chip startups, NVIDIA's major updates bring significant pressure. NVIDIA offers a complete system, including computing chips, inter-chip communication technology, and networking chips that push past the memory wall. AI chip startups must find directions where they can establish genuine advantages, rather than being made irrelevant by one or two updates from a giant like NVIDIA.
Chinese AI startups, for various reasons, find it difficult to use the latest and most powerful NVIDIA AI chips. As substitutes, domestic AI chips still lag behind in computing power and energy efficiency. This may lead to a widening gap between companies focused on large models and their overseas counterparts in terms of model scale expansion and iteration speed. For Chinese AI application companies, there are still opportunities. They can utilize not only domestic foundational models but also advanced open-source models from overseas. China boasts world-class AI engineers and product managers who can develop products capable of competing globally. This enables AI application companies to expand into overseas markets while also having a vast domestic market as their foundation. The ByteDance or miHoYo of the AI era is likely to emerge from among them.