Intel today officially introduced its Gaudi 3 accelerator for AI workloads. The new accelerators are slower than Nvidia’s popular H100 and H200 GPUs for AI and HPC, so Intel is betting Gaudi 3’s success on a lower price and a lower total cost of ownership (TCO).
Intel’s Gaudi 3 processor uses two chiplets that together contain 64 tensor processor cores (TPCs, 256-bit vector processors), eight matrix multiplication engines (MMEs, each a 256×256 MAC structure with FP32 accumulators), and 96 MB of on-die SRAM cache with a throughput of 19.2 TB/s. Gaudi 3 also integrates 24 200 GbE network interfaces and 14 multimedia engines, which support H.265, H.264, JPEG, and VP9 to handle image and video processing. The processor is paired with 128 GB of HBM2E memory spread across eight stacks, offering a massive 3.67 TB/s of bandwidth.
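To put the memory figures in perspective, the short Python sketch below splits the quoted 3.67 TB/s and 128 GB evenly across the eight HBM2E stacks and compares HBM bandwidth with the 19.2 TB/s on-die SRAM. It assumes each stack uses the standard 1,024-bit HBM interface, which the article does not state, so the implied per-pin rate is an estimate rather than an Intel specification.

```python
# Back-of-the-envelope HBM2E math from the figures quoted above.
# ASSUMPTION: each HBM2E stack uses the standard 1,024-bit interface;
# the article itself does not state the interface width.

HBM_TOTAL_TBPS = 3.67    # aggregate HBM2E bandwidth (TB/s)
HBM_STACKS = 8           # number of HBM2E stacks
HBM_CAPACITY_GB = 128    # total HBM2E capacity (GB)
SRAM_TBPS = 19.2         # on-die SRAM bandwidth (TB/s)
BITS_PER_STACK = 1024    # assumed interface width per stack (bits)


def per_stack_bandwidth_gbs(total_tbps: float, stacks: int) -> float:
    """Split aggregate bandwidth (TB/s) evenly across stacks; returns GB/s."""
    return total_tbps * 1000 / stacks


def pin_rate_gbits(stack_gbs: float, width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by one stack's bandwidth (GB/s)."""
    return stack_gbs * 8 / width_bits


stack_bw = per_stack_bandwidth_gbs(HBM_TOTAL_TBPS, HBM_STACKS)
print(f"Per-stack bandwidth: {stack_bw:.0f} GB/s")
print(f"Capacity per stack:  {HBM_CAPACITY_GB / HBM_STACKS:.0f} GB")
print(f"Implied pin rate:    {pin_rate_gbits(stack_bw, BITS_PER_STACK):.2f} Gbit/s")
print(f"SRAM/HBM bandwidth:  {SRAM_TBPS / HBM_TOTAL_TBPS:.1f}x")
```

This works out to roughly 459 GB/s and 16 GB per stack, an implied data rate of about 3.6 Gbit/s per pin, and on-die SRAM bandwidth a little over five times that of the HBM2E pool.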
Gaudi 3 is a huge improvement over Gaudi 2, which has 24 TPCs, two MMEs, and 96 GB of HBM2E memory. However, Intel appears to have simplified both the TPC and the MME, as Gaudi 3 only supports FP8 matrix operations as well as BFloat16 matrix and vector operations (i.e., it no longer supports FP32, TF32, and FP16).
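The generational jump is easiest to see as simple ratios. The Python sketch below divides the Gaudi 3 unit counts by the Gaudi 2 figures quoted above; it compares unit counts only and ignores clock speeds and the narrower set of supported data types, so it should not be read as a performance estimate.

```python
# Generation-over-generation ratios from the unit counts quoted above.
# These are raw unit counts, not performance numbers.
GAUDI2 = {"TPCs": 24, "MMEs": 2, "HBM2E (GB)": 96}
GAUDI3 = {"TPCs": 64, "MMEs": 8, "HBM2E (GB)": 128}


def scaling(old: dict, new: dict) -> dict:
    """Return the new/old ratio for every spec present in both dicts."""
    return {spec: new[spec] / old[spec] for spec in old if spec in new}


for spec, ratio in scaling(GAUDI2, GAUDI3).items():
    print(f"{spec}: {ratio:.2f}x")
```

By this count, Gaudi 3 has about 2.7 times as many TPCs, four times as many MMEs, and a third more HBM2E capacity than its predecessor.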
In terms of performance, Intel claims that Gaudi 3 delivers up to 1,856 TFLOPS of BF16/FP8 matrix performance and up to 28.7 TFLOPS of BF16 vector performance at a TDP of around 600 W. Compared to Nvidia’s H100, at least on paper, Gaudi 3 offers slightly lower BF16 matrix performance (1,856 vs. 1,979 TFLOPS), roughly half the FP8 matrix performance (1,856 vs. 3,958 TFLOPS), and significantly lower BF16 vector performance (28.7 vs. 1,979 TFLOPS). Keep in mind that Nvidia quotes these H100 matrix figures with structured sparsity.
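The Python sketch below turns the paper specs above into percentages of the quoted H100 figures. It also adds a hypothetical sanity check: assuming the matrix throughput comes entirely from eight 256×256 MAC arrays (one per MME), each completing one multiply-accumulate (two FLOPs) per cycle, it back-calculates the clock that 1,856 TFLOPS would require. That clock is an illustration under those assumptions, not an Intel-published number.

```python
# Paper-spec comparison using only the TFLOPS figures quoted above.
GAUDI3_TFLOPS = {"BF16 matrix": 1856, "FP8 matrix": 1856, "BF16 vector": 28.7}
H100_TFLOPS = {"BF16 matrix": 1979, "FP8 matrix": 3958, "BF16 vector": 1979}


def percent_of_h100(metric: str) -> float:
    """Gaudi 3 throughput as a percentage of the quoted H100 figure."""
    return 100 * GAUDI3_TFLOPS[metric] / H100_TFLOPS[metric]


def implied_clock_ghz(tflops: float, engines: int = 8,
                      macs_per_engine: int = 256 * 256,
                      flops_per_mac: int = 2) -> float:
    """Clock (GHz) needed to reach `tflops` if every MAC in every engine
    completes one multiply-accumulate (2 FLOPs) per cycle.
    HYPOTHETICAL: this is not an Intel-published clock speed."""
    flops_per_cycle = engines * macs_per_engine * flops_per_mac
    return tflops * 1e12 / flops_per_cycle / 1e9


for metric in GAUDI3_TFLOPS:
    print(f"{metric}: Gaudi 3 = {percent_of_h100(metric):.1f}% of H100")

print(f"Implied MME clock: {implied_clock_ghz(1856):.2f} GHz")
```

On these numbers, Gaudi 3 reaches about 94% of H100’s quoted BF16 matrix rate, about 47% of its FP8 rate, and about 1.5% of the quoted vector figure, while the implied MME clock lands near 1.77 GHz.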
More crucial than the raw specs will be the Gaudi 3’s actual real-world performance. It has to compete with AMD’s Instinct MI300 processors, as well as Nvidia’s H100 and B100/B200 processors. And that’s something that remains to be seen, as a lot depends on software and other factors. For now, Intel has shown a few slides, claiming that the Gaudi 3 could offer a significant price advantage over the Nvidia H100.
Earlier this year, Intel indicated that a kit with eight Gaudi 3 accelerators on a baseboard would cost $125,000, or about $15,625 per accelerator. For comparison, the Nvidia H100 card currently sells for $30,678, so Intel is indeed planning a huge price advantage over its competitor. However, given the potentially large performance gains offered by Nvidia’s Blackwell-based B100/B200 GPUs, it remains to be seen whether the blue company can maintain that advantage over its rival.
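The pricing math can be sketched in a few lines of Python. It divides the kit price evenly across the eight accelerators (which ignores the cost of the baseboard itself) and computes list-price dollars per BF16 matrix TFLOPS using the same H100 figure as the comparison above. These are list-price, paper-spec ratios under those assumptions, not a TCO model.

```python
# Price arithmetic using the figures quoted in this article.
# ASSUMPTION: the kit price is split evenly across accelerators.
KIT_PRICE_USD = 125_000      # eight-accelerator Gaudi 3 kit
ACCELERATORS_PER_KIT = 8
H100_PRICE_USD = 30_678      # quoted H100 card price
GAUDI3_BF16_TFLOPS = 1_856   # Intel's BF16 matrix figure
H100_BF16_TFLOPS = 1_979     # H100 BF16 figure used in the comparison

gaudi3_unit = KIT_PRICE_USD / ACCELERATORS_PER_KIT


def dollars_per_tflops(price_usd: float, tflops: float) -> float:
    """List price divided by peak BF16 matrix TFLOPS."""
    return price_usd / tflops


print(f"Gaudi 3 per accelerator: ${gaudi3_unit:,.0f}")
print(f"H100 costs {H100_PRICE_USD / gaudi3_unit:.2f}x as much")
print(f"Gaudi 3: ${dollars_per_tflops(gaudi3_unit, GAUDI3_BF16_TFLOPS):.2f} per BF16 TFLOPS")
print(f"H100:    ${dollars_per_tflops(H100_PRICE_USD, H100_BF16_TFLOPS):.2f} per BF16 TFLOPS")
```

On paper, an H100 costs nearly twice as much as a Gaudi 3, which works out to roughly $8.42 versus $15.50 per peak BF16 TFLOPS; real-world throughput, software maturity, and system-level costs can shift that balance considerably.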
“The demand for AI is driving a massive transformation in the data center, and the industry is demanding choice in hardware, software, and development tools,” said Justin Hotard, executive vice president and general manager, Data Center and Artificial Intelligence Group, Intel. “With the introduction of Xeon 6 P-core processors and Gaudi 3 AI accelerators, Intel is enabling an open ecosystem that allows our customers to deploy all their workloads with greater performance, efficiency, and security.”
Intel Gaudi 3 AI accelerators will be available on IBM Cloud and Intel Tiber Developer Cloud. Additionally, Intel Xeon 6 and Gaudi 3-based systems will be generally available from Dell, HPE, and Supermicro in Q4, with Dell systems among the first to ship in October and the remaining machines following in December.