NVIDIA GPUs Are Outclassed! World's Leading AI Chip Upgraded with 4 Trillion Transistors and 900,000 Cores
Cerebras Systems has released its third-generation wafer-scale AI accelerator chip, the WSE-3 (Wafer Scale Engine 3), with even more insane specifications. Remarkably, it doubles performance while keeping power consumption and price unchanged.
The first-generation WSE-1 in 2019 was built on TSMC's 16nm process, with an area of 46,225 square millimeters, 1.2 trillion transistors, 400,000 AI cores, 18GB of SRAM cache, 9PB/s of memory bandwidth, 100Pb/s of interconnect bandwidth, and power consumption as high as 15 kilowatts. The second-generation WSE-2 in 2021 moved to TSMC's 7nm process while keeping the same 46,225 square millimeter size, and featured 2.6 trillion transistors, 850,000 cores, 40GB of cache, 20PB/s of memory bandwidth, and 220Pb/s of interconnect bandwidth.
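To see the generational scaling at a glance, here is a small Python sketch that tabulates the figures above and derives transistor density and per-core SRAM (the derived columns are our own back-of-envelope arithmetic, not Cerebras numbers, and the WSE-3 area is assumed unchanged):

```python
# Published specs for the three WSE generations (figures from the article above).
specs = [
    # name,    process, transistors, cores,   SRAM(GB), area(mm^2)
    ("WSE-1", "16nm",  1.2e12,      400_000, 18,       46_225),
    ("WSE-2", "7nm",   2.6e12,      850_000, 40,       46_225),
    ("WSE-3", "5nm",   4.0e12,      900_000, 44,       46_225),  # area assumed unchanged
]

for name, process, transistors, cores, sram_gb, area in specs:
    density = transistors / area / 1e6                 # million transistors per mm^2
    sram_per_core = sram_gb * 1024**3 / cores / 1024   # KiB of SRAM per core
    print(f"{name} ({process}): {density:5.1f} Mtransistors/mm^2, "
          f"~{sram_per_core:.0f} KiB SRAM per core")
```

The derived numbers show that core count and per-core SRAM grew only modestly from WSE-2 to WSE-3; most of the gain came from the process shrink.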
The current third-generation WSE-3 moves to TSMC's 5nm process. Cerebras doesn't call out the die size, but it should remain essentially unchanged: each chip already occupies an entire wafer, leaving no room for meaningful growth.
The transistor count has remarkably increased to 4 trillion, with the AI cores expanded to 900,000. On-chip cache capacity reaches 44GB, paired with external memory options of 1.5TB, 12TB, or 1,200TB. At first glance, the increases in core count and cache capacity seem modest, but performance takes a remarkable leap: peak AI compute reaches 125 PFLOPS, that is, 125 quadrillion (1.25 × 10^17) floating-point operations per second, comparable to top-tier supercomputers.
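To pin down that unit conversion (our arithmetic, using the standard SI prefix where 1 PFLOPS = 10^15 operations per second):

```python
# 125 PFLOPS expressed in plain floating-point operations per second.
peak_pflops = 125
ops_per_second = peak_pflops * 10**15
print(f"{ops_per_second:.2e} FLOP/s")  # 1.25e+17, i.e. 125 quadrillion per second
```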
It can train next-generation AI models up to ten times larger than GPT-4 and Gemini, storing up to 24 trillion parameters in a single logical memory space without partitioning or restructuring.
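As a sanity check of that claim against the 1,200TB external memory tier, here is a rough estimate (our assumption: a common mixed-precision training footprint of about 16 bytes per parameter for weights plus Adam optimizer state):

```python
# Rough memory footprint for a 24-trillion-parameter model during training.
params = 24e12            # 24 trillion parameters
bytes_per_param = 16      # assumed: fp32 master weights + Adam moments (rule of thumb)
needed_tb = params * bytes_per_param / 1e12
print(f"~{needed_tb:.0f} TB needed vs 1,200 TB available")  # ~384 TB, fits comfortably
```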
Cerebras says that training a 1-trillion-parameter model on the WSE-3 is as straightforward as training a 1-billion-parameter model on GPUs. A cluster of four systems can fine-tune a 70-billion-parameter model in a single day, and clusters scale to as many as 2,048 interconnected systems, enough to train Llama 70B from scratch in one day.
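The one-day figure also survives a back-of-envelope check using the standard estimate of roughly 6 × parameters × tokens for total training FLOPs (the token count and utilization below are illustrative assumptions, not Cerebras figures):

```python
# Back-of-envelope: can 2,048 systems train a 70B-parameter model in a day?
params = 70e9                # Llama-class 70B model
tokens = 2e12                # ~2 trillion training tokens (Llama 2 scale, assumed)
train_flops = 6 * params * tokens      # standard 6*N*D approximation

systems = 2048
peak_per_system = 125e15     # 125 PFLOPS per WSE-3
utilization = 0.4            # assumed sustained fraction of peak
cluster_flops = systems * peak_per_system * utilization

hours = train_flops / cluster_flops / 3600
print(f"~{hours:.1f} hours")           # ~2.3 hours, comfortably under a day
```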
The specific power consumption and price of the WSE-3 have not been disclosed, but since both are said to match the previous generation, the price should again be upwards of 2 million US dollars.