China Computing Power Development Index White Paper: AI Server Prices Continue to Rise with No Sign of Slowing Down

Posted in AI Insights by baoshi.rao

    Driven by the boom in AI large-model development, market demand for computing power has surged. AI servers, a key piece of computing infrastructure, excel at graphics rendering and massively parallel processing, allowing them to handle large volumes of data quickly and accurately, and their market value has grown accordingly. Recently, soaring demand, an ongoing shortage of core components such as GPUs (graphics processing units/accelerator chips), and rising GPU prices have together sent AI server prices sharply higher.

    Some companies revealed that the price of AI servers they purchased in June last year has increased nearly 20-fold in less than a year. During the same period, GPU prices have also continued to rise. For example, the market price of an A100 GPU has reached 150,000 yuan, up from 100,000 yuan two months ago—a 50% increase. The price increase for the A800 has been relatively smaller, currently around 95,000 yuan, compared to 89,000 yuan last month.

    AI server prices have fluctuated significantly in recent years, mainly due to continuous upgrades in configurations and the limited production capacity of core GPU components from major global manufacturers like NVIDIA and AMD. With the ongoing shortage of GPU supply, AI server prices are expected to continue rising in the future.

    Future AI server prices are expected to maintain an upward trend

    It is worth noting that the persistent shortage of high-end GPU chips has, to some extent, affected the shipment volume of AI servers for companies, thereby impacting the short-term performance of related server manufacturers listed on the stock market. However, in the long run, AI servers are widely regarded as promising in the industry, and listed companies are actively expanding their investments in this area.

    Since the rise of ChatGPT, major global tech companies have been actively embracing AIGC (generative AI) and focusing on developing large AI models. By one (incomplete) count, more than 30 large-model products have been announced in China since Baidu first unveiled "Wenxin Yiyan" (ERNIE Bot) on March 16.

    However, the implementation of large AI models requires massive amounts of data and powerful computing power to support training and inference processes. Huawei estimates that by 2030, compared to 2020, the demand for computing power driven by the AI boom will increase by 500 times.

    AI servers are designed to handle the massive data workloads of deep learning, including training and inference that require large memory capacity, high bandwidth, and overall system cache coherence. Compared to ordinary servers, AI servers are equipped with multiple high-performance accelerators (mostly GPUs), offering higher computational power, faster processing speeds, and larger storage capacity.

    Since the 2012 debut of AlexNet, widely regarded as the first modern deep-learning model, the depth and breadth of AI models have kept expanding. GPT-3, released in 2020 and one of the largest language models today, uses 175 billion parameters, over 100 times more than its predecessor. The Megatron-Turing NLG model co-developed by Microsoft and NVIDIA boasts 530 billion parameters.

    According to the "China Computing Power Development Index White Paper" compiled by the China Academy of Information and Communications Technology, the computational resources used for AI model training have surged over the past decade. The computational complexity of AI training has increased tenfold annually, making AI computing the dominant form of computing. Global intelligent computing power (converted to FP32) is expected to grow rapidly from 232 EFlops in 2021 to 52.5 ZFlops by 2030, with a compound annual growth rate (CAGR) exceeding 80% during this period.
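    The white paper's growth figures can be sanity-checked with a quick compound-growth calculation (a minimal sketch; the unit conversion 1 ZFlops = 1,000 EFlops is assumed):

```python
# Check the implied CAGR of global intelligent computing power:
# 232 EFlops in 2021 growing to 52.5 ZFlops (52,500 EFlops) by 2030.
start_eflops = 232.0
end_eflops = 52.5 * 1000            # 1 ZFlops = 1,000 EFlops
years = 2030 - 2021
cagr = (end_eflops / start_eflops) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # roughly 83%, consistent with "exceeding 80%"
```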

    TrendForce predicts that global AI server shipments will reach approximately 200,000 units by 2026.

    TrendForce notes that driven by emerging applications such as AIGC, autonomous driving, AIoT, and edge computing, many large cloud providers are increasing investments in AI-related infrastructure. It is estimated that AI servers equipped with GPGPUs (General Purpose GPUs) accounted for only 1% of the total server market in 2022 but are expected to grow to 200,000 units by 2026, with a CAGR of 10.8% from 2022 to 2026.
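    Working backward from TrendForce's 2026 figure and growth rate gives the implied 2022 baseline (a quick sketch using the standard compound-growth formula):

```python
# TrendForce: ~200,000 AI-server shipments in 2026 at a 10.8% CAGR from 2022.
# Back out the implied 2022 shipment base: base * (1 + cagr)^4 = 200,000.
shipments_2026 = 200_000
cagr = 0.108
base_2022 = shipments_2026 / (1 + cagr) ** 4
print(f"Implied 2022 shipments: about {base_2022:,.0f} units")
```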

    According to TrendForce statistics, in 2022, the procurement of AI servers was dominated by the four major North American cloud providers—Microsoft, Google, Meta, and AWS—which collectively contributed 66.2%. In China, ByteDance was the most active investor, accounting for 6.2%.

    The cost structure of different types of servers varies, with the proportion of chip costs increasing as server performance improves.

    Servers are essentially computers, with core hardware including CPUs, accelerator cards (primarily GPUs), memory, hard drives, network cards, power supplies, and motherboards. Among these, computing power chips such as CPUs and GPUs are the main cost components, with their proportion rising as server performance increases.

    Based on data from Huajing Intelligence Network and ARK, chip costs account for about 32% of the total cost in ordinary servers. However, in servers used for machine learning, which typically feature multiple high-performance GPUs, chip costs can quickly rise to 83%.
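    The shift in cost structure is easy to illustrate. The component prices below are hypothetical round numbers, chosen only to reproduce the cited 32% and 83% chip-cost shares:

```python
# Illustrative bill-of-materials comparison (all figures hypothetical).
def chip_share(components):
    """Fraction of total cost taken by computing chips (CPU + GPU)."""
    chips = components.get("cpu", 0) + components.get("gpu", 0)
    return chips / sum(components.values())

ordinary_server = {"cpu": 3200, "gpu": 0, "memory": 2500, "storage": 2000, "other": 2300}
ml_server = {"cpu": 5000, "gpu": 80000, "memory": 8000, "storage": 4000, "other": 5000}
print(f"Ordinary server chip share: {chip_share(ordinary_server):.0%}")  # 32%
print(f"ML server chip share: {chip_share(ml_server):.0%}")              # 83%
```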

    The difference between AI servers and ordinary servers lies in their hardware configuration and usage.

    As described above, AI servers carry multiple high-performance accelerators (mostly GPUs) in place of the CPU-centric configuration of ordinary servers, giving them the computational power, memory bandwidth, and storage capacity to support high-load, complex computing tasks.

    With the upgrade from GPUs like the A100 to the H100, AI servers are expected to see both increased volume and higher prices.

    According to NVIDIA's official website, the H100 achieves a 6x overall performance improvement over its predecessor A100 through three key advancements: 1) Fourth-generation Tensor Cores with FP8 precision Transformer Engines optimize large model training/inference; 2) Upgraded NVLink 4.0 enables 900GB/s GPU-to-GPU interconnect; 3) Enhanced memory bandwidth.
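    To put the 900GB/s NVLink figure in perspective, a back-of-envelope calculation (assuming FP16 weights at 2 bytes per parameter, which is an assumption, not an NVIDIA figure) shows how quickly a full copy of a GPT-3-scale model could move between two GPUs:

```python
# Time to transfer all weights of a 175B-parameter model (FP16) over
# a single 900 GB/s NVLink 4.0 link, ignoring protocol overhead.
params = 175e9
bytes_per_param = 2               # FP16
nvlink_bw = 900e9                 # bytes per second
transfer_s = params * bytes_per_param / nvlink_bw
print(f"Full-model transfer: {transfer_s:.2f} s")  # about 0.39 s
```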

    Analysts predict the higher-priced H100's rollout will drive both volume and price increases in the AI server market this year.

    The launch of ChatGPT, Baidu's ERNIE Bot, and other major models has set off an "AI model explosion." Google's recent PaLM 2 release competes with OpenAI's GPT-4, while Chinese firms such as Cloudwalk and Gravity Media have announced new model development efforts. This "100-model war" has triggered exponential growth in computing demand, igniting a global "computing arms race."

    AI development relies on data, algorithms, and computing power; progress in any one area fuels demand for the others. While data and model research are advancing rapidly, computing capacity lags behind. OpenAI's temporary suspension of ChatGPT Plus subscriptions in April 2023 highlighted this computing resource shortage.

    Projections show AI computing demand growing far faster than Moore's Law (doubling every 18 months), with global supercomputing power expected to reach 0.2 ZFLOPS by 2030 (a 34% CAGR). Huawei forecasts 500x growth in AI computing demand over the next decade. Two critical questions emerge: what is the optimal AI computing solution, and how far has China's "computing arms race" progressed?
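    The gap between those two growth rates is stark; a one-line comparison of Moore's-Law-style doubling against Huawei's 500x forecast:

```python
# Moore's Law (doubling every 18 months) compounded over 10 years,
# versus the ~500x AI computing demand growth Huawei forecasts.
months = 10 * 12
moore_factor = 2 ** (months / 18)
print(f"Moore's Law over a decade: ~{moore_factor:.0f}x (vs. forecast 500x)")
```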

    Chip capability directly determines high-performance training efficiency. Unlike CPU-based general computing, AI training and inference require GPU-, GPGPU-, or ASIC-based intelligent computing. GPUs lead other hardware in raw computational power; NVIDIA claims up to 249x CPU throughput for the A100 in AI inference. With models like the H100, GPUs have become the cornerstone of AI computing.

    CICC research notes that multi-GPU interconnect enhancements boost parallel computing capacity, increasing demand for more GPUs. As single GPUs struggle with deep learning demands, NVIDIA employs multi-GPU solutions. Industry analysts suggest high-end GPU quantities will determine model training scale and become a key metric for evaluating corporate AI capabilities.

    According to TrendForce data, measured by the processing power of NVIDIA's A100 GPU, the GPT-3.5 large model requires 20,000 GPUs to handle training data. There is also an industry consensus that the computing power threshold for developing competitive AI large models is 10,000 A100 chips.
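    The scale of these GPU counts can be roughly cross-checked with the common C ≈ 6·N·D estimate of training FLOPs (N = parameters, D = training tokens). The model size, token count, peak throughput, and utilization below are illustrative assumptions, not TrendForce's methodology:

```python
# Back-of-envelope training compute for a GPT-3-scale model.
n_params = 175e9                  # assumed parameter count
n_tokens = 300e9                  # assumed training tokens
total_flops = 6 * n_params * n_tokens            # ~3.15e23 FLOPs
effective = 312e12 * 0.4          # A100 BF16 peak ~312 TFLOPS at ~40% utilization
gpu_days = total_flops / effective / 86_400      # total A100-days required
print(f"~{gpu_days:,.0f} A100-days, i.e. ~{gpu_days / 1024:.0f} days on 1,024 GPUs")
```

Larger clusters shorten wall-clock time roughly linearly under these assumptions, which is why fleet size has become a proxy for training capability.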

    Currently, the global GPU market is dominated by three giants: NVIDIA, Intel, and AMD, with their Q4 market shares for discrete GPUs being 85%, 6%, and 9% respectively. NVIDIA leads in AI, cloud computing, and discrete GPUs, with the A100 and H100 achieving peak floating-point performance of 19.5 TFLOPS and 67 TFLOPS, respectively.

    In contrast, China's domestic GPU industry is still in its infancy, lagging significantly behind international manufacturers. However, with export restrictions on high-end GPUs, the China-specific A800 has seen its price surge by 100,000 yuan, creating urgent demand for domestic GPUs.

    In this context, localization has become imperative, and domestic GPU manufacturers have emerged rapidly in recent years. Leading domestic GPU developers include Cambricon, Jingjia Micro, and Huawei Ascend. Among them, Jingjia Micro is the first Chinese company to successfully develop a domestic GPU chip and achieve large-scale engineering applications. Industry experts note that its flagship product, the JH920, performs similarly to NVIDIA's GTX 1050 released in 2016, indicating a long road ahead in the mid-to-high-end and high-performance computing sectors.

    Regarding the computing power market, industry insiders believe that investing in GPUs is currently the most practical option. Since China's high-end GPU capability remains weak, the only viable path for domestic manufacturers is to stack larger numbers of lower-end GPUs and coordinate them through optimization and software synergy to approximate higher-end performance.
