NVIDIA Unveils New Architecture, Set to Launch This Year with Potential Price of $200,000 per Unit
On Monday, March 18th, local time, NVIDIA announced its latest generation of AI-specific GPU chips and software for running AI models at its GTC developer conference in San Jose, California, aiming to further solidify its position as the preferred supplier in the AI field.
NVIDIA's new generation of AI GPU architecture is named Blackwell, and the first chip based on it, the GB200, is expected to launch later this year. Despite continued strong demand for older products such as the Hopper-based H100, NVIDIA is courting customers with even higher-performance chips to keep the orders coming.
Since OpenAI's chatbot ChatGPT launched at the end of 2022, the ensuing AI boom has driven NVIDIA's stock price up roughly fivefold and more than doubled its revenue. NVIDIA's high-performance GPUs are crucial for training and running large AI models, and tech giants like Microsoft and Meta have spent billions of dollars purchasing these chips.
NVIDIA CEO Jensen Huang stated at the conference, "The Hopper architecture is excellent, but we are pursuing even more powerful GPUs."
NVIDIA's stock slipped just over 1% in after-hours trading on Monday. To encourage customers to choose NVIDIA chips amid increasing competition, the company also introduced a paid software product called NIM, which simplifies AI deployment.
NVIDIA executives stated that the company is transforming from a single chip supplier into a platform provider similar to Microsoft or Apple, where other companies can develop software on its platform.
Jensen Huang put it plainly: "Blackwell is not just a chip; it represents a platform."
Manuvir Das, Vice President of NVIDIA's Enterprise Computing division, said in an interview: "The best-selling commercial product is still the GPU, and the software is designed to help users utilize GPUs in different ways." He added: "Of course, we are still committed to product innovation. But now, we have truly achieved a transformation and developed our own commercial software business."
NVIDIA's new software makes it easier to run programs on all NVIDIA GPUs, even older models that are more suitable for deploying AI applications rather than development. Das said, "If you're a developer with a highly anticipated model and want wider adoption, just deploy it on NIM. We promise compatibility with all NVIDIA GPUs to ensure the model reaches a broad user base."
Blackwell: The Successor to Hopper Architecture
NVIDIA updates its GPU architecture every two years, achieving significant performance leaps. Over the past year, many released AI models have been trained on the Hopper architecture announced in 2022, which includes chips like the H100.
According to NVIDIA, the GB200 chip based on the Blackwell architecture will bring a massive performance leap for AI companies, with AI computing power reaching 20 petaflops, far surpassing the H100's 4 petaflops. This computing capability will enable enterprises to train larger and more complex AI models.
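Taken at face value, the figures quoted above imply a straightforward ratio. A quick sketch using only the article's numbers (these are NVIDIA's marketing figures for AI compute, not independent benchmarks):

```python
# AI compute figures as quoted in the article (vendor-stated, not benchmarked).
h100_petaflops = 4.0    # Hopper H100, per the article
gb200_petaflops = 20.0  # Blackwell GB200, per the article

speedup = gb200_petaflops / h100_petaflops
print(f"GB200 offers {speedup:.0f}x the quoted AI compute of the H100")
# prints "GB200 offers 5x the quoted AI compute of the H100"
```

Note that such headline speedups often assume specific number formats and workloads, so real training gains will vary.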
The GB200 also integrates NVIDIA's Transformer Engine, technology designed specifically to accelerate Transformer-based AI models, the architecture at the core of popular chatbots such as ChatGPT. Blackwell GPUs are physically massive: each is essentially two separately manufactured dies, fabricated by TSMC, joined together into a single chip. NVIDIA also launched the accompanying GB200 NVL72 server, which houses 72 Blackwell GPUs along with other specialized components for AI model training.
Major cloud service providers, including Amazon, Google, Microsoft, and Oracle, will offer GB200 chip-based cloud services. Each GB200 system comprises two Blackwell architecture B200 GPUs and one ARM-based Grace CPU. NVIDIA revealed that Amazon Web Services (AWS) plans to establish a supercomputing cluster with 20,000 GB200 systems.
NVIDIA stated that Amazon's server cluster will be capable of deploying AI models with up to 27 trillion parameters, far surpassing the largest models known today; GPT-4, for example, is reported to have roughly 1.7 trillion parameters. Many AI researchers believe that ultra-large models with more parameters and data can demonstrate significantly enhanced capabilities.
Although NVIDIA has not disclosed the specific pricing for the GB200 chips and systems, analysts estimate that, based on the price range of the Hopper architecture H100 chips (approximately $25,000 to $40,000 per unit), the cost of an entire GB200 system could reach as high as $200,000.
NVIDIA's Inference Microservices
NVIDIA has announced the addition of a new product called NIM (NVIDIA Inference Microservices) to its enterprise software subscription service. NIM aims to simplify the process of using older GPU models for AI inference, enabling businesses to keep putting their installed base of hundreds of millions of NVIDIA GPUs to work. Compared to training new AI models, inference requires far fewer computational resources, so NIM lets enterprises run their own AI models conveniently rather than relying on computing services from companies like OpenAI.
As part of its strategy, NVIDIA encourages customers purchasing its servers to subscribe to NVIDIA Enterprise services, charging a licensing fee of $4,500 per GPU annually. Additionally, NVIDIA will collaborate with leading AI companies such as Microsoft and Hugging Face to ensure their AI models run smoothly on all compatible NVIDIA chips. Developers can use the NIM service to efficiently run models on their own servers or cloud-based NVIDIA servers without complex configurations.
"In the code that originally called OpenAI services, only one line of code needs to be replaced to connect it to the NIM service provided by NVIDIA," explained Das.
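Das's "one line" claim can be illustrated with a small sketch. The endpoint URL and model name below are hypothetical placeholders, and the request shape assumes NIM exposes an OpenAI-compatible chat-completions API; the point is that only the base URL (and model name) changes, while the calling code stays the same:

```python
# Hypothetical illustration of the "one line" swap Das describes:
# a client built for the OpenAI-style API only needs its base URL
# redirected to a NIM endpoint; the surrounding code is unchanged.

OPENAI_BASE_URL = "https://api.openai.com/v1"  # original hosted service
NIM_BASE_URL = "http://localhost:8000/v1"      # hypothetical local NIM endpoint


def build_request(base_url: str, model: str, prompt: str) -> dict:
    """Assemble a chat-completion request in the OpenAI-compatible format."""
    return {
        "url": f"{base_url}/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Before: the request targets OpenAI's hosted service.
req_openai = build_request(OPENAI_BASE_URL, "gpt-4", "Hello")

# After: the single changed line is the base URL (plus the served model name).
req_nim = build_request(NIM_BASE_URL, "example/llama-model", "Hello")
```

In practice the same idea applies to real client libraries that accept a configurable base URL: the application logic that builds prompts and reads responses does not need to change.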
NVIDIA stated that the NIM software can not only run on cloud servers but also enable smooth operation of AI applications on laptops equipped with NVIDIA GPUs, further expanding the application scenarios of the NIM service.