Huawei's Wang Lei: Galaxy AI Network May Facilitate the Inclusive Development of AI Computing Power
With the rapid development of artificial intelligence, large models are being applied ever more widely around the world, and they are transforming one industry after another. But large models do not appear out of thin air or overnight. Training them is a complex systems engineering effort that demands more computing power and greater data transmission capacity. New network infrastructure will therefore play a pivotal role in the development of the large model industry.
During the recent Huawei Connect 2023 conference, Huawei unveiled the Galaxy AI Network large model, which integrates over 20 billion linguistic data points from Huawei's data communication domain and encapsulates the expertise of more than 30,000 Huawei network specialists. At the event, Wang Lei, President of Huawei's Data Communication Product Line, shared Huawei's insights and perspectives on the AI era. He also explained how the Galaxy AI Network uses its high-throughput capabilities to unleash the full potential of AI computing power, realizing the vision of "strengthening computing through networking."
In recent years, as digital infrastructure has increasingly supported government services, economic development, social welfare, and governance, the demand for AI computing power has surged exponentially. Recently, China's Ministry of Industry and Information Technology (MIIT) and five other departments jointly issued the High-Quality Development Action Plan for Computing Infrastructure, emphasizing the indispensable roles of computing power, transmission capacity, and storage capacity. Among these, the "transmission capacity," centered on network connectivity, is crucial for the AI industry's growth. Wang Lei noted that while large models are incredibly powerful, building a computing center for such models is prohibitively expensive. One customer estimated that a trillion-parameter model requires an initial investment of over 1 billion yuan, with annual maintenance costs in the tens of millions, making self-construction nearly impossible.
Consequently, customers are more concerned with another question: "Since self-construction is out of the question, how can we access large models as soon as possible?"
As Huawei's next-generation network infrastructure designed for the intelligent era, the Galaxy AI Network offers ultra-high throughput, long-term reliability, and elastic high concurrency, giving it a competitive edge. Wang Lei stated that the industry is moving toward a computing power leasing and service model, where large-scale computing power service centers will act as platforms serving diverse industries. This means computing power services will exist as public utilities, encompassing three key components: "computing power generation, computing power transmission, and computing power access."
First, in terms of computing power generation, larger AI computing centers are more likely to exhibit the phenomenon of "intelligent emergence." Computing power services must therefore be centralized, and computing centers must be built as clusters. Second, for computing power transmission, AI training requires continuous interaction with new data to improve intelligence. However, enterprises and individuals may be far from computing centers, so data must travel hundreds or thousands of kilometers for training and feedback. Lastly, for computing power access, no user should be constrained by access limitations, whether in urban or rural areas, and whether an enterprise or an individual.
Wang Lei explained that Huawei's Galaxy AI Network provides comprehensive solutions and a series of products to address these three challenges.
For computing power generation, the Galaxy AI Network offers the next-generation Galaxy Intelligent Computing Switch, featuring high-density 400GE and 800GE ports. With just two layers of switching, it enables a non-blocking cluster network supporting up to 18,000 cards, capable of training models with trillions of parameters. Additionally, reducing the number of network layers minimizes the need for extensive optical module interconnections, lowering both construction costs and network power consumption.
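The arithmetic behind a two-layer design can be sketched with a generic non-blocking leaf-spine model. This is a textbook approximation, not Huawei's published design: it assumes identical switches, one port per card, and leaves that split their ports evenly between cards and uplinks. The function names and port counts are illustrative assumptions.

```python
def two_tier_capacity(radix: int) -> int:
    """Max endpoints in a non-blocking two-tier leaf-spine fabric.

    Assumes identical switches with `radix` ports. Each leaf uses half of
    its ports for endpoints and the rest for uplinks, and each spine port
    connects to a distinct leaf, so there are at most `radix` leaves.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    downlinks_per_leaf = radix // 2
    max_leaves = radix
    return downlinks_per_leaf * max_leaves


def min_radix_for(endpoints: int) -> int:
    """Smallest even port count whose two-tier fabric fits `endpoints`."""
    radix = 2
    while two_tier_capacity(radix) < endpoints:
        radix += 2
    return radix


if __name__ == "__main__":
    # Hypothetical port counts, chosen only to show how capacity scales.
    for radix in (64, 128, 256):
        print(radix, two_tier_capacity(radix))
    print(min_radix_for(18_000))
```

Under this simplified model, capacity grows with the square of the port count, which is why high port density is what lets a fabric of this size avoid a third switching layer.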
Furthermore, the Galaxy AI Network supports the Network Scale Load Balance (NSLB) solution, which raises AI network throughput from 50% to 98% and improves AI training efficiency by 20%, achieving what Huawei calls computing power "overclocking." Combined with full-stack visualized operation and maintenance technologies, it enables real-time visualization of large model training paths and traffic loads. Integrated with Packet Event data-plane anomaly detection and DPFR (Data Plane Fast Recovery) technology, it achieves sub-millisecond fault convergence.
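Why load balancing affects throughput so strongly can be illustrated with a toy model. The sketch below is not NSLB, whose algorithm the article does not describe. It only contrasts arbitrary per-flow placement (a stand-in for hash-based path selection) with a globally planned, even placement, assuming equal-sized flows whose completion is gated by the busiest link. All names and numbers are illustrative assumptions.

```python
import math
import random
from collections import Counter


def worst_link_load(assignment: list[int]) -> int:
    """Number of flows on the busiest link."""
    return max(Counter(assignment).values())


def relative_throughput(assignment: list[int], links: int) -> float:
    """Best achievable busiest-link load divided by the actual one (1.0 is ideal)."""
    ideal = math.ceil(len(assignment) / links)
    return ideal / worst_link_load(assignment)


def random_placement(flows: int, links: int, rng: random.Random) -> list[int]:
    """Stand-in for per-flow hashing: each flow lands on an arbitrary link."""
    return [rng.randrange(links) for _ in range(flows)]


def planned_placement(flows: int, links: int) -> list[int]:
    """Globally planned placement: spread flows round-robin across links."""
    return [i % links for i in range(flows)]


if __name__ == "__main__":
    rng = random.Random(0)
    flows, links, trials = 16, 16, 1000
    mean_random = sum(
        relative_throughput(random_placement(flows, links, rng), links)
        for _ in range(trials)
    ) / trials
    planned = relative_throughput(planned_placement(flows, links), links)
    print(f"random placement, mean relative throughput: {mean_random:.2f}")
    print(f"planned placement, relative throughput: {planned:.2f}")
```

With a small number of large flows, as is typical of AI training traffic, arbitrary placement often stacks several flows on one link while others sit idle. Planned placement avoids that collision.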
In terms of computing power transmission, the Galaxy AI Network offers high throughput, elasticity, and high concurrency. Using Huawei's multi-path intelligent scheduling, flow-aware load balancing, and adaptive packet loss resistance technologies, it achieves "T-level data delivery within hours," that is, terabyte-scale data delivered in hours, with an eightfold improvement in forwarding capacity. For computing power access, Huawei introduced the Galaxy Gateway Router, which uses Fillp technology to mitigate network degradation. It can raise bandwidth utilization from 10% to 80% even at a 1% packet loss rate, ensuring smooth delivery of AI computing power even in remote areas with poor network quality.
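To put the utilization figures in perspective, here is a back-of-the-envelope calculation of transfer time. The 10% and 80% utilization values come from the article. The 1 TB data set and the 1 Gbit/s access link are assumptions chosen only for illustration, as is the function name.

```python
def transfer_hours(data_bytes: float, link_gbps: float, utilization: float) -> float:
    """Hours needed to move `data_bytes` over a link at a given utilization.

    `link_gbps` is the nominal link rate in Gbit/s and `utilization` is the
    fraction of that rate actually achieved (0 < utilization <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    effective_bits_per_second = link_gbps * 1e9 * utilization
    seconds = data_bytes * 8 / effective_bits_per_second
    return seconds / 3600


if __name__ == "__main__":
    one_terabyte = 1e12  # assumed data set size, in bytes
    for utilization in (0.10, 0.80):
        hours = transfer_hours(one_terabyte, link_gbps=1.0, utilization=utilization)
        print(f"utilization {utilization:.0%}: {hours:.1f} hours")
```

Under these assumptions, the same transfer drops from roughly 22 hours to under 3 hours, which is the difference between a day-long upload and delivery "within hours."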
"Currently, artificial intelligence is creating various 'miracles' across industries," said Wang Lei. For instance, in the pharmaceutical sector, developing an antibiotic typically requires an average of $1 billion and over a decade, with most time spent on drug screening and validation—a process likened to "finding a needle in a haystack." Huawei's Pangu Drug Molecule Model broke the 40-year stagnation in new antibiotic development. By analyzing 1.7 billion molecular chemical structures, it assisted Professor Liu Bing's team at Xi'an Jiaotong University in discovering the new antibiotic "Cinnamomumycin" in less than a month, now advancing to clinical trials.
During Huawei Connect 2023, Huawei, together with the China Academy of Information and Communications Technology and iFLYTEK Research Institute, released the Galaxy AI Network White Paper. The white paper outlines the network's application prospects in large-parameter AI computing scenarios, sets out Huawei's view of the relevant trends, architecture, and technical innovations, and serves as a reference for building high-performance AI training networks.
"The launch of Huawei's Galaxy AI Network marks a significant step in Huawei's commitment to driving intelligent transformation across industries. In the future, it will deliver superior, smarter AI network services globally," Wang Lei added.