Cerebras Systems' recent launch of the CS-3, built around its third-generation wafer-scale chip, marks a significant shift in the AI hardware landscape and challenges Nvidia’s long-standing dominance with its GPUs. Leveraging Cerebras' flagship wafer-scale engine technology, the CS-3 introduces new possibilities for AI inference, the critical phase in which AI models are applied in real-world scenarios.
Until now, Nvidia's GPUs have led the market, particularly in training large language models (LLMs). However, the industry's shift in emphasis from training to inference, where trained models are deployed for real-time applications, could disrupt this status quo. As AI applications increasingly rely on inference, demand for speed and efficiency is escalating, and the AI inference market is projected to hit $90.6 billion by 2030.
Inference involves evaluating new data with a trained AI model, such as during a conversation with an LLM or in autonomous driving. Historically, GPUs have been preferred for their parallel computing capabilities, which are essential for training on massive datasets. Yet as inference workloads grow, the high power consumption, heat generation, and maintenance costs of GPUs become more burdensome.
Cerebras, founded in 2016, is aiming to reshape AI inference hardware with its Wafer-Scale Engine (WSE). The recently launched CS-3 is a leap forward: its wafer-scale chip features 4 trillion transistors and is 56 times larger than the biggest GPUs. With 3,000 times more on-chip memory than GPUs, this massive chip can process large workloads without extensive networking between chips, which improves both performance and efficiency.
The CS-3 excels in handling LLMs, processing up to 1,800 tokens per second for the Llama 3.1 8B model. At a cost of just 10 cents per million tokens, Cerebras offers a competitive alternative to Nvidia’s GPUs. The chip’s speed and efficiency are already gaining attention from industry leaders. Kim Branson of GlaxoSmithKline noted that the CS-3 has enhanced the company's drug discovery capabilities, while Denis Yarats of Perplexity highlighted its potential to transform search engines through lower latency. Russell d’Sa of LiveKit praised its role in advancing multimodal AI applications.
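To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 1,800 tokens per second and 10 cents per million tokens hold as steady rates, and it ignores queueing, time to first token, and any difference between input and output token pricing. The function names are illustrative only and are not part of any Cerebras API.

```python
# Figures quoted above for Llama 3.1 8B; treated here as steady-state rates (an assumption).
TOKENS_PER_SECOND = 1_800
USD_PER_MILLION_TOKENS = 0.10


def generation_seconds(num_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Seconds needed to generate num_tokens at a constant throughput."""
    return num_tokens / tokens_per_second


def generation_cost_usd(num_tokens: int, usd_per_million: float = USD_PER_MILLION_TOKENS) -> float:
    """Dollar cost of num_tokens at a flat per-million-token price."""
    return num_tokens / 1_000_000 * usd_per_million


# Hypothetical workload sizes: a chat reply, a long report, a batch job.
for n in (500, 10_000, 1_000_000):
    print(f"{n:>9,} tokens: {generation_seconds(n):8.2f} s, ${generation_cost_usd(n):.5f}")
```

Under these assumptions, a 500-token chat reply would take well under a second and cost a small fraction of a cent, which is why latency-sensitive products such as search and voice applications are among the early adopters.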
Despite Cerebras’ advancements, Nvidia remains a formidable competitor. Nvidia’s Hopper GPUs are well-established in both AI training and inference, supported by extensive cloud provider partnerships and a robust ecosystem. However, the landscape is becoming more competitive with startups like Groq entering the scene. Groq’s Tensor Streaming Processor (TSP) technology offers impressive performance, energy efficiency, and competitive pricing.
Although Cerebras and Groq are newer entrants, their cloud computing solutions are making their advanced technologies accessible. Cerebras Cloud offers flexible pricing, while Groq Cloud allows users to easily switch from other providers. These offerings provide a cost-effective and flexible way for enterprises to explore cutting-edge AI inference technologies.
The AI hardware market is evolving rapidly, with Cerebras and Groq presenting strong alternatives to Nvidia’s GPUs. As the industry transitions from AI training to inference, enterprises must consider the performance, efficiency, and cost-effectiveness of these emerging technologies to stay ahead in this dynamic field.