AI Chip Record: New Leader Beats NVIDIA
- The Cerebras WSE, a massive computer chip containing four billion transistors, has achieved record speeds in AI inference, surpassing NVIDIA in recent tests. The wafer-scale engine measures 8.5 inches per side.
- Naor Penso, Cerebras chief information security officer, revealed at Web Summit in Vancouver that the WSE chip reached 2,500 tokens per second on Llama 4.
- Inference, in this context, refers to an AI's ability to generate sentences, images, or videos based on user input.
Cerebras WSE Chip Beats NVIDIA in AI Inference
Updated May 28, 2025
The Cerebras WSE, a massive computer chip containing four billion transistors, has achieved record speeds in AI inference, surpassing NVIDIA in recent tests. The wafer-scale engine, measuring 8.5 inches per side, is designed to accelerate artificial intelligence operations.
Naor Penso, Cerebras chief information security officer, revealed at Web Summit in Vancouver that the WSE chip reached 2,500 tokens per second on Llama 4. This benchmark substantially exceeds NVIDIA’s reported 1,000 tokens per second.
Inference, in this context, refers to an AI’s ability to generate sentences, images, or videos based on user input. Tokens are the essential units of information processed, such as words or characters. Faster token processing translates to quicker results.
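The practical effect of these throughput figures is easy to see with a little arithmetic. The sketch below uses the token rates reported in this article; the 500-token response length is a hypothetical example chosen for illustration.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a given throughput."""
    return num_tokens / tokens_per_second

response_tokens = 500  # hypothetical response length

# Throughput figures as reported in the article
cerebras_seconds = generation_time(response_tokens, 2500)  # Cerebras WSE
nvidia_seconds = generation_time(response_tokens, 1000)    # NVIDIA

print(f"Cerebras WSE: {cerebras_seconds:.2f} s")  # 0.20 s
print(f"NVIDIA:       {nvidia_seconds:.2f} s")    # 0.50 s
```

At these rates, the same response finishes in less than half the time, a gap that compounds when an AI agent chains many such generations together.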
According to Penso, speed is increasingly vital as AI enters an “agentic age,” where AI systems handle complex, multi-step projects. These AI agents break down large tasks into numerous sub-tasks, demanding rapid communication and inference.
The WSE’s speed stems from its high transistor count and co-location of components, including 44 gigabytes of high-speed RAM, on a single chip. This design eliminates the need for off-chip data access, further boosting performance.
Artificial Analysis, an independent benchmarking firm, validated these claims, recording 2,522 tokens per second on Llama 4, compared to NVIDIA Blackwell’s 1,038 tokens per second.
“We’ve tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model,” said Micah Hill-Smith, CEO of Artificial Analysis.
Julie Shin, Cerebras chief marketing officer, emphasized that the WSE represents a significant advancement in chip technology, moving beyond traditional CPU and GPU architectures.
“This is not an incremental technology,” Shin said. “This is another leapfrog moment for chips.”
What’s next
Cerebras plans to continue refining the WSE chip to further enhance its AI inference capabilities, with potential impact on applications ranging from enterprise solutions to AI agents.
