Intel's Gaudi 2 Challenges Nvidia in AI Accelerator Performance

Intel's Gaudi 2 has emerged as a robust contender against Nvidia in AI accelerators for training and inference. Research from Databricks suggests that Gaudi 2 delivers impressive performance, especially in large language model (LLM) inference, posing a significant challenge to Nvidia's dominant AI accelerators.

For LLM inference, Gaudi 2 matched the decoding latency of Nvidia H100 systems and surpassed the Nvidia A100. The research also found that Gaudi 2 achieves higher memory-bandwidth utilization during inference than either the H100 or the A100.
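Memory-bandwidth utilization matters because LLM decoding is typically memory-bound: each generated token streams roughly all of the model's weights from memory once. A rough roofline-style estimate of that utilization can be sketched as follows; every number here (model size, throughput, peak bandwidth) is an illustrative assumption, not a measured figure from the study.

```python
# Roofline-style estimate of memory-bandwidth utilization during LLM
# decoding. Assumption: decode is memory-bound, so bytes moved per token
# is approximately the full weight footprint of the model.

def bandwidth_utilization(params_billion: float,
                          bytes_per_param: int,
                          tokens_per_sec: float,
                          peak_bw_gb_s: float) -> float:
    """Fraction of peak memory bandwidth used while decoding."""
    model_bytes_gb = params_billion * bytes_per_param  # GB streamed per token
    achieved_gb_s = model_bytes_gb * tokens_per_sec
    return achieved_gb_s / peak_bw_gb_s

# Hypothetical example: a 70B-parameter model in BF16 (2 bytes/param)
# decoding at 15 tokens/s on an accelerator with 2450 GB/s peak bandwidth.
util = bandwidth_utilization(70, 2, 15.0, 2450.0)
print(f"{util:.0%}")  # ~86% of peak
```

Higher utilization means the chip converts more of its theoretical bandwidth into delivered tokens, which is the kind of advantage the research attributes to Gaudi 2.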

While Nvidia's top-end accelerators still lead in training, Gaudi 2 posted the second-fastest single-node LLM training performance, behind only the Nvidia H100, at more than 260 TFLOPS per chip. The study also reported that, based on public cloud pricing, Gaudi 2 offers the best dollar-per-performance for both training and inference compared with the Nvidia A100 and H100.
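The dollar-per-performance comparison amounts to dividing sustained throughput by the hourly rental price of each chip. A minimal sketch of that calculation follows; the per-chip prices and the A100/H100 throughputs below are hypothetical placeholders (only the ~260 TFLOPS figure comes from the study), so real cloud list prices would need to be substituted to reproduce the comparison.

```python
# Performance-per-dollar from public cloud pricing: sustained TFLOPS
# delivered per dollar of hourly rental cost. All prices and the A100/H100
# throughputs are assumed for illustration only.

def perf_per_dollar(tflops_per_chip: float, usd_per_chip_hour: float) -> float:
    """Sustained TFLOPS per dollar of hourly cost (higher is better)."""
    return tflops_per_chip / usd_per_chip_hour

accelerators = {
    "Gaudi 2": perf_per_dollar(260.0, 1.35),  # ~260 TFLOPS is from the study
    "A100":    perf_per_dollar(150.0, 2.00),  # hypothetical
    "H100":    perf_per_dollar(400.0, 3.50),  # hypothetical
}

for name, ppd in accelerators.items():
    print(f"{name}: {ppd:.0f} TFLOPS per $/hr")
print("best value:", max(accelerators, key=accelerators.get))
```

With these placeholder prices the cheaper chip wins on value despite lower absolute throughput, which is the shape of the study's conclusion.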

Intel's commitment to transparency is evident in its own published results for Gaudi 2 on the MLCommons MLPerf benchmarks for both training and inference. The Databricks research serves as additional, independent validation, reinforcing Intel's standing in the AI accelerator space.

Abhinav Venigalla, lead NLP architect at Databricks, acknowledged the impressive performance of Gaudi 2, especially in LLM inference. He anticipates further gains in training and inference performance with Gaudi 2's FP8 support, available in the latest software release.

Eitan Medina, COO at Habana Labs, an Intel company, emphasized the importance of such third-party reviews. He stated, "Since many people say that the Gaudi is kind of Intel’s best-kept secret, it’s actually important to have these sorts of publication reviews being made available so more and more customers know that Gaudi is a viable alternative."

Intel's competitive push with Gaudi began with its $2 billion acquisition of AI chip startup Habana Labs in 2019. The company has steadily improved the technology since, as reflected in benchmark results.

While industry-standard benchmarks like MLPerf play a role in assessing performance, Medina highlighted that customers often rely on their own testing to ensure compatibility with specific models and use cases. He emphasized the importance of a mature software stack to address concerns about benchmark optimization.

Looking ahead, Intel plans to launch the Gaudi 3 AI accelerator in 2024. Built on a 5 nm process, Gaudi 3 promises a significant step up, with four times the processing power and double the network bandwidth of Gaudi 2. Medina expects it to offer advantages in both performance per dollar and performance per watt.

Beyond Gaudi 3, Intel is working on future generations that integrate high-performance computing (HPC) and AI accelerator technology. The company also recognizes the continued value of its CPU technologies, recently announcing its 5th Gen Xeon processors with AI acceleration. Overall, Intel's strategy involves offering a range of solutions to cater to diverse AI workloads.