Intel has publicly unveiled its Gaudi 3 accelerator for AI workloads. Since Nvidia's popular H100 and H200 GPUs for AI and HPC are faster than the new chip, Intel is staking Gaudi 3's success on cheaper pricing and a lower total cost of ownership (TCO).
Intel's Gaudi 3 packs 64 tensor processor cores (TPCs, wide SIMD vector processors), eight matrix multiplication engines (MMEs, each a 256×256 MAC structure with FP32 accumulators), and 96MB of on-die SRAM cache with 19.2 TB/s of bandwidth onto two chiplets.
In addition, Gaudi 3 incorporates 14 media engines, which can process H.265, H.264, JPEG, and VP9 to enable vision workloads, as well as 24 200 GbE networking ports. The processor is paired with 128GB of HBM2E memory spread across eight memory stacks, providing a whopping 3.67 TB/s of bandwidth.
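As a quick back-of-the-envelope check, the memory and networking figures quoted above imply a per-stack capacity and an aggregate Ethernet bandwidth. This sketch uses only the numbers cited in this article:

```python
# Sanity math on Gaudi 3's quoted memory and networking figures.
HBM_CAPACITY_GB = 128   # total HBM2E capacity quoted above
HBM_STACKS = 8          # number of memory stacks
NIC_PORTS = 24          # 200 GbE networking ports
PORT_SPEED_GBPS = 200   # per-port speed, gigabits per second

per_stack_gb = HBM_CAPACITY_GB / HBM_STACKS              # capacity per HBM2E stack
aggregate_net_tbps = NIC_PORTS * PORT_SPEED_GBPS / 1000  # total Ethernet bandwidth

print(per_stack_gb)        # 16.0 (GB per stack)
print(aggregate_net_tbps)  # 4.8 (Tb/s aggregate)
```

In other words, the 24 ports add up to 4.8 Tb/s of scale-out bandwidth per accelerator.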
Also Read: What is Intel Gaudi 3 AI Accelerator for AI Training and Inference?
Compared to Gaudi 2, which features 24 TPCs, two MMEs, and 96GB of HBM2E memory, Gaudi 3 is a huge advance. However, since Gaudi 3 only supports FP8 matrix operations in addition to BFloat16 matrix and vector operations (i.e., no more FP32, TF32, or FP16), it appears that Intel simplified both the TPCs and the MMEs.
In terms of performance, Intel claims that Gaudi 3 can deliver up to 28.7 BF16 vector TFLOPS and up to 1,856 BF16/FP8 matrix TFLOPS at around 600W TDP. On paper, that gives Gaudi 3 roughly half the FP8 matrix performance of Nvidia's H100 (1,856 vs 3,958 TFLOPS), slightly lower BF16 matrix performance (1,856 vs 1,979 TFLOPS), and far lower BF16 vector performance (28.7 vs 1,979 TFLOPS).
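The on-paper gaps above can be expressed as simple ratios. This sketch works only from the TFLOPS figures quoted in this article:

```python
# On-paper throughput ratios, Gaudi 3 vs H100, from the figures quoted above.
gaudi3 = {"fp8_matrix": 1856, "bf16_matrix": 1856, "bf16_vector": 28.7}
h100 = {"fp8_matrix": 3958, "bf16_matrix": 1979}

fp8_ratio = gaudi3["fp8_matrix"] / h100["fp8_matrix"]     # roughly half
bf16_ratio = gaudi3["bf16_matrix"] / h100["bf16_matrix"]  # only slightly lower

print(round(fp8_ratio, 2))   # 0.47
print(round(bf16_ratio, 2))  # 0.94
```

So in BF16 matrix math, where most training happens today, the on-paper gap is only about 6%; the FP8 gap is the larger one.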
Also Read: Intel Introduces Gaudi3 AI Chip to Challenge Nvidia and AMD
Gaudi 3's real-world performance will matter more than its paper specifications. It must contend with AMD's Instinct MI300 series and Nvidia's H100 and B100/B200 processors, and since so much depends on software and other factors, that remains to be seen. For now, Intel has shown slides asserting that Gaudi 3 can deliver a notable cost-performance advantage over Nvidia's H100.
Intel stated earlier in the year that an accelerator system built around eight Gaudi 3 processors on a baseboard will set you back $125,000, or around $15,625 per accelerator. By comparison, an Nvidia H100 card currently sells for about $30,678, indicating that Intel does intend to hold a significant price edge over its rival. It is unclear, though, whether Intel will be able to keep that lead given the potentially enormous performance gains of Nvidia's Blackwell-based B100/B200 GPUs.
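The per-accelerator pricing above follows directly from the quoted system price. A minimal check, using only the dollar figures cited in this article:

```python
# Per-accelerator cost implied by Intel's quoted eight-way system price,
# compared against the H100 card price cited in the article.
GAUDI3_8X_SYSTEM_USD = 125_000  # eight-accelerator baseboard price
H100_CARD_USD = 30_678          # current H100 card price cited above

per_gaudi3 = GAUDI3_8X_SYSTEM_USD / 8     # price per Gaudi 3
price_ratio = per_gaudi3 / H100_CARD_USD  # Gaudi 3 cost relative to H100

print(per_gaudi3)             # 15625.0
print(round(price_ratio, 2))  # 0.51
```

At these list prices, a Gaudi 3 costs about half as much as an H100, which is the basis of Intel's cost-performance pitch.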
Also Read: Intel Announces New AI Chips to Compete with Nvidia and AMD
“Demand for AI is leading to a massive transformation in the data center, and the industry is asking for choice in hardware, software, and developer tools,” said Justin Hotard, executive vice president of Intel and general manager of the Data Center and Artificial Intelligence Group. “With our launch of Xeon 6 with P-cores and Gaudi 3 AI accelerators, Intel is enabling an open ecosystem that allows our customers to implement all of their workloads with greater performance, efficiency, and security.”
Intel's Gaudi 3 AI accelerators will be available through IBM Cloud and the Intel Tiber Developer Cloud. Additionally, systems based on Intel's Xeon 6 and Gaudi 3 will be broadly available from Dell, HPE, and Supermicro in the fourth quarter; systems from Supermicro and Dell will arrive in December, while workstations from Supermicro will ship in October.
Also Read: How AI-Enabled AMD Ryzen PRO Processors will Revolutionize Mobile and Desktop