Alibaba Cloud Launches 8th-Gen Enterprise Instance g8i with 7x AI Inference Performance Boost, Supporting 72B Large Language Models
On January 11th, Alibaba Cloud announced a compute upgrade for its 8th-generation enterprise general-purpose instance, ECS g8i, the first instance in China to feature the fifth-generation Intel Xeon Scalable processors (codenamed Emerald Rapids). Leveraging Alibaba Cloud's self-developed "Feitian + CIPU" architecture, the ECS g8i instance achieves up to an 85% improvement in overall machine performance and up to a 7x boost in AI inference performance, and can support large language models with up to 72B parameters, cutting the initial deployment cost of small and medium-sized models by 50%. The new instance also provides end-to-end security protection, giving enterprises robust privacy-enhanced computing support for building trusted AI applications.
Zhang Xiantao, General Manager of Alibaba Cloud's Elastic Computing Product Line, stated, "The strong performance of Alibaba Cloud's ECS g8i instance demonstrates that a CPU-centric computing system also has significant potential to accelerate AI inference. Public clouds are not only capable of handling ultra-large-scale AI models but also open new paths for accelerating the deployment of AI applications."
Li Yadong, General Manager of Intel China's Data Center and AI Group Xeon Customer Solutions Division, said: "The newly launched fifth-generation Intel Xeon Scalable processors feature built-in AI acceleration in every core and are fully capable of handling demanding AI workloads. Compared to the previous generation, they deliver up to a 29% improvement in AI training performance and up to a 42% boost in AI inference. We hope that, working with Alibaba Cloud's 8th-gen enterprise instance (ECS g8i), we can help developers achieve technological inclusivity and make AI technology ubiquitous."
General Computing Power Further Enhanced, Overall Machine Performance Improved by 85%
As an enterprise-grade general-purpose computing instance, ECS g8i has been comprehensively upgraded in computing, storage, networking, and security. Key specifications include an L3 cache enlarged to 320MB, memory speeds of up to 5600MT/s, an 85% improvement in overall machine performance, and a 25% boost in single-core performance. For storage, ESSD cloud disks deliver up to 1 million IOPS with full NVMe support and latency as low as 100 microseconds. In networking, the instance supports up to 30 million PPS and Alibaba Cloud's self-developed eRDMA acceleration technology, with latency as low as 8 microseconds. On security, ECS g8i supports trusted computing and encrypted computing, and is the world's first instance to adopt Intel TDX-based confidential VM technology for comprehensive protection.
In end-to-end application scenarios, ECS g8i improves MySQL performance by up to 60%, with Redis and Nginx gains of 40% and 24% respectively. It provides robust computing power for industries including gaming, live streaming, e-commerce, finance, healthcare, and enterprise services, and meets stringent performance requirements for database, big data, and AI inference workloads.
The instance also offers several hardware-native acceleration capabilities, including QAT and IAA accelerators. Through proprietary technology, Alibaba Cloud passes these accelerators through to VM instances at fine granularity, making hardware acceleration available even on smaller ECS g8i sizes. With the QAT encryption/decryption accelerator enabled, ECS g8i delivers up to 70x higher performance in compression/decompression scenarios and more than 4x faster encryption/decryption.
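As a quick sanity check before relying on these accelerators, the sketch below enumerates the PCI devices visible inside the guest and flags anything that looks like an Intel QAT endpoint. It is illustrative only; the device-ID list is our assumption, not an Alibaba Cloud or Intel specification.

```python
# Hedged sketch: look for PCI devices that appear to be Intel QAT endpoints
# inside the guest. The QAT device-ID set below is assumed/illustrative.
from pathlib import Path

INTEL_VENDOR = "0x8086"
QAT_DEVICE_IDS = {"0x4940", "0x4941", "0x4942", "0x4943"}  # assumed QAT IDs

pci_root = Path("/sys/bus/pci/devices")
devices = sorted(pci_root.iterdir()) if pci_root.exists() else []
for dev in devices:
    vendor = (dev / "vendor").read_text().strip()
    device = (dev / "device").read_text().strip()
    if vendor == INTEL_VENDOR and device in QAT_DEVICE_IDS:
        print(f"{dev.name}: looks like an Intel QAT device (id {device})")
```

If nothing is printed, the accelerator may simply not be exposed on that particular instance size, or the relevant driver stack may still need to be installed.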
Capable of Supporting 72B-Parameter Large Language Models
The technological revolution sparked by generative AI is driving fundamental changes in computing paradigms. Currently, AI model inference still faces multiple computing-power challenges: first-token latency is constrained by parallel-processing and floating-point throughput, while generation throughput is limited by memory bandwidth and network latency.
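To make the memory-bandwidth constraint concrete, here is a rough, illustrative calculation of an upper bound on single-socket decode throughput for a 72B-parameter model. Only the 5600MT/s memory speed comes from the spec above; the channel count and int8 weight format are our assumptions.

```python
# Illustrative back-of-the-envelope arithmetic, not a benchmark.
mem_speed_mts = 5600      # memory speed quoted for ECS g8i above
channels = 8              # assumed DDR5 channels per socket (illustrative)
bytes_per_transfer = 8    # 64-bit channel width
bandwidth_gbs = mem_speed_mts * 1e6 * channels * bytes_per_transfer / 1e9

model_params = 72e9       # 72B-parameter model
bytes_per_param = 1       # assuming int8 weights
weights_gb = model_params * bytes_per_param / 1e9

# During decoding, each generated token streams (roughly) all weights from
# memory once, so bandwidth / model size bounds the per-socket token rate.
print(f"~{bandwidth_gbs:.0f} GB/s / {weights_gb:.0f} GB "
      f"=> at most ~{bandwidth_gbs / weights_gb:.1f} tokens/s per socket")
```

The point is simply that once weights must stream from memory for every generated token, the bound scales with memory bandwidth rather than with peak FLOPS, which is why bandwidth and network latency dominate throughput.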
According to Alibaba Cloud, the ECS g8i instance has been extensively optimized for these challenges, including an upgrade of the built-in AI acceleration from the AVX-512 instruction set to Intel AMX (Advanced Matrix Extensions), allowing generative AI models to run faster. Compared with AVX-512, enabling AMX acceleration improves int8 matrix computation performance on ECS g8i instances by up to 7x.
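A minimal sketch of how one might confirm that AMX is exposed inside a g8i guest and actually exercised by the software stack. It assumes PyTorch and Intel Extension for PyTorch are installed on the instance (an assumption, not part of the announcement), and uses a bf16 layer for simplicity, whereas the 7x figure above refers to int8 matrix math.

```python
# Hedged sketch: check AMX CPU flags, then run a bf16 linear layer through
# Intel Extension for PyTorch so oneDNN can dispatch to AMX kernels.
import os
os.environ["ONEDNN_VERBOSE"] = "1"   # log which ISA/kernel oneDNN dispatches to

with open("/proc/cpuinfo") as f:
    cpu_flags = f.read()
for flag in ("amx_tile", "amx_int8", "amx_bf16"):
    print(flag, "present" if flag in cpu_flags else "missing")

import torch
import intel_extension_for_pytorch as ipex   # assumed installed on the instance

model = torch.nn.Linear(4096, 4096).eval()
model = ipex.optimize(model, dtype=torch.bfloat16)   # pick AMX-friendly kernels
x = torch.randn(32, 4096)
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    model(x)   # with ONEDNN_VERBOSE=1, AMX kernels appear as "avx512_core_amx*"
```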
With AMX AI acceleration, g8i instances respond more quickly when serving small and medium-parameter models. For AI workloads such as knowledge retrieval, Q&A systems, and summarization, the initial deployment cost is 50% lower than on A10 GPU cloud servers. Combined with Alibaba Cloud Spot instances, the cost advantage becomes even more pronounced, further reducing AI inference expenses.
Meanwhile, leveraging the self-developed eRDMA ultra-low-latency elastic network, Alibaba Cloud's g8i instance clusters combine ultra-low network latency with high elasticity. They can support distributed inference for large language models with up to 72B parameters, with performance scaling nearly linearly as the cluster grows. The instances can also serve ultra-large-parameter models at batch sizes above 32, running workloads such as text-to-image generation, AI code generation, virtual assistants, and creative-assistance tools. Taking Alibaba Cloud's open-source Qwen-72B model as an example, it runs efficiently on a cluster of g8i instances connected by eRDMA: with input text under 500 characters, first-token latency stays below 3 seconds, and the cluster generates 7 tokens per second.
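Plugging the quoted figures into a simple latency model gives a feel for end-to-end generation time. This is pure arithmetic on the numbers above, not a benchmark of the cluster itself.

```python
# Rough end-to-end estimate from the quoted figures: first-token latency under
# ~3 s (prompts below 500 characters) and ~7 tokens/s steady-state decoding.
def estimated_latency_s(output_tokens: int,
                        first_token_s: float = 3.0,
                        tokens_per_s: float = 7.0) -> float:
    """First token arrives after first_token_s, then decoding proceeds at tokens_per_s."""
    if output_tokens <= 0:
        return 0.0
    return first_token_s + (output_tokens - 1) / tokens_per_s

for n in (32, 128, 512):
    print(f"{n:4d} output tokens -> ~{estimated_latency_s(n):.1f} s end to end")
```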
Finally, on security, Alibaba Cloud has built end-to-end protection across its entire product line, securing data in storage, in transit, and in computation. At the lowest level, the CIPU-based security architecture uses a TPM security chip as the hardware root of trust, enabling trusted server boot and guaranteeing that the boot chain cannot be tampered with. At the virtualization level, it supports virtual trusted capabilities (vTPM), verifying core components when an instance starts. On top of trusted instances, it supports confidential computing across different platforms, isolating and encrypting memory data at runtime.
It is worth noting that the upgraded ECS g8i instances fully support Intel Trust Domain Extensions (TDX). Business applications can be deployed into a trusted execution environment (TEE) without modification, significantly lowering the technical barrier. With minimal performance overhead, TDX provides privacy-enhanced computing power for AI applications such as large models, safeguarding their data in the cloud.
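Before deploying a model into the TEE, it can be useful to verify from inside the guest that a TDX trust domain is actually active. The checks below are a hedged sketch that assumes a recent Linux kernel with TDX guest support; the exact flag and device names may vary by kernel version.

```python
# Hedged sketch: detect whether this VM is running as a TDX trust domain.
from pathlib import Path

def tdx_active() -> bool:
    cpuinfo = Path("/proc/cpuinfo").read_text()
    flag_ok = "tdx_guest" in cpuinfo               # CPU flag exposed in TDX guests
    dev_ok = Path("/dev/tdx_guest").exists()       # attestation device (if driver loaded)
    return flag_ok or dev_ok

if __name__ == "__main__":
    print("Running inside a TDX trust domain:", tdx_active())
```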
Taking the Qwen-Chat-7B model as an example, enabling TDX keeps model inference secure and trustworthy while protecting the confidentiality and integrity of the data. "Alibaba Cloud will continue to deepen its technological research and sustain product innovation, providing enterprises with more stable, powerful, secure, and elastic computing services and driving AI applications across all industries into a period of explosive growth," said Zhang Xiantao.