Intel’s Tailored Gaudi 3 AI Chips for China May Be Significantly Compromised in Performance

Intel, the global semiconductor giant, plans to launch two “special edition” AI chip products in China, according to a 24-page “Gaudi 3 AI Accelerator White Paper” published on its official website. The announcement, made on April 15, details two accelerator cards in different hardware form factors: the HL-328, an OAM-compatible mezzanine card, and the HL-388, a PCIe accelerator card. The former is slated for release on June 24 this year, while the latter is expected to debut on September 24.

The performance of the “China Special Edition” HL-328 chip may be significantly compromised, with estimates suggesting a reduction of about 92% compared to the international version of Gaudi 3. These performance estimates are based on parameters including core count, operating frequency, and Thermal Design Power (TDP).

The Gaudi 3 AI accelerator chip, Intel’s next-generation offering, was launched at the Intel Vision 2024 conference held in the United States on April 9. Fabricated using TSMC’s 5nm process, the chip promises a fourfold increase in BF16 AI computing power over its predecessor, Gaudi 2. Compared with Nvidia’s H100 GPU, Intel claims several improvements for Gaudi 3: a 40% increase in model training speed, a 50% boost in inference speed, a 50% uptick in average performance, and a 40% improvement in energy efficiency. These gains come at a cost significantly lower than that of the H100.

In response to escalating U.S. government export controls on semiconductors and AI chips, Intel released a “China Special Edition” of its 7nm Gaudi 2 AI chip in July 2023. The special edition offered performance comparable to the international version of Gaudi 2, but reduced the number of integrated Ethernet RDMA ports from 24 to 21 to align with U.S. chip export control regulations. However, sales of the Gaudi 2 China Special Edition were limited, and the product was eventually discontinued in the Chinese market because its performance exceeded export control limits.

Following this, Intel attempted to develop a variant of Gaudi 2, the Gaudi 2C AI chip, towards the end of last year. The company hoped to regain permission to sell to mainland China but was met with updated export restrictions announced by the U.S. Bureau of Industry and Security (BIS) in March this year.

With the launch of Gaudi 3 on April 9, Intel is making another attempt at special edition products. The aim is to use the 5nm Gaudi 3 AI chip to compete in the Chinese market, giving AI and cloud customers an alternative to Nvidia products.

The Chinese special edition of Gaudi 3 has the same 96MB of on-chip SRAM, 128GB of HBM2e high-bandwidth memory, and 3.7TB/s of memory bandwidth as the original version. It also retains the PCIe 5.0 x16 interface and the same decoding standards. However, U.S. export control rules require that the total processing performance (TPP) of such high-performance AI chips be below 4800 for export to China. This implies that the 16-bit performance of the Chinese special edition of Gaudi 3 cannot exceed 150 TFLOPS.
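The arithmetic behind that ceiling can be sketched as follows. This is a minimal illustration, assuming the conversion that the two figures above imply: TPP = 2 × throughput (in tera-operations per second) × bit width. The function name and the factor of 2 are assumptions inferred from the 4800 and 150 TFLOPS figures, not a restatement of the regulation’s text.

```python
# Assumed conversion, inferred from the article's figures
# (a TPP of 4800 corresponds to 150 TFLOPS at 16-bit):
#   TPP = 2 * throughput_tflops * bit_width
TPP_LIMIT = 4800  # export-control threshold cited in the article


def max_throughput_tflops(tpp_limit: float, bit_width: int) -> float:
    """Throughput (in TFLOPS) at which TPP reaches tpp_limit for a given bit width."""
    return tpp_limit / (2 * bit_width)


if __name__ == "__main__":
    # 16-bit (FP16/BF16) ceiling implied by a TPP limit of 4800
    print(max_throughput_tflops(TPP_LIMIT, 16))  # 150.0
```

Under the same assumption, an 8-bit format would allow twice the throughput before reaching the same TPP value, since the bit width enters the product directly.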

To meet U.S. export control requirements, the Chinese special edition of Gaudi 3 may need substantial cuts to its core count and operating frequency. This could reduce AI performance by about 92% compared to the international version of Gaudi 3, which achieves 1835 TFLOPS (FP16/BF16). Lower AI performance also implies a lower TDP: both the OAM card and the PCIe card are expected to have a TDP of 450 watts, well below that of the original versions.
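The roughly 92% figure follows directly from the two throughput numbers quoted in this article. A minimal sketch, assuming the special edition runs right at the 150 TFLOPS ceiling (the constant and function names are illustrative):

```python
INTERNATIONAL_TFLOPS = 1835  # international Gaudi 3, FP16/BF16 (from the article)
CHINA_CAP_TFLOPS = 150       # 16-bit ceiling cited in the article (assumed operating point)


def percent_reduction(original: float, reduced: float) -> float:
    """Percentage drop from original to reduced."""
    return (1 - reduced / original) * 100


if __name__ == "__main__":
    drop = percent_reduction(INTERNATIONAL_TFLOPS, CHINA_CAP_TFLOPS)
    print(round(drop, 1))  # 91.8
```

If the shipping part lands below the ceiling, the reduction would be correspondingly larger.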

Overall, Intel’s “special edition” Gaudi 3 products for the Chinese market may offer AI performance comparable to Nvidia’s “China Special Edition” AI accelerator card, the H20. The H20 delivers about 80% less overall performance than the H100, with an FP16/BF16 performance of 148 TFLOPS, slightly below the export control limit of 150 TFLOPS.

As of now, Nvidia’s H20 AI chip has been sent to customers in mainland China for sampling. However, leading domestic AI companies such as Baidu and Alibaba have shown a lukewarm response. Industry insiders have attributed this to the H20’s low performance and high price, which have dampened companies’ purchasing enthusiasm.
