AI Chip Record: New Leader Beats NVIDIA
- The Cerebras WSE, a massive computer chip containing four billion transistors, has achieved record speeds in AI inference, surpassing NVIDIA in recent tests. The wafer-scale engine measures 8.5 inches per side.
- Naor Penso, Cerebras chief information security officer, revealed at Web Summit in Vancouver that the WSE chip reached 2,500 tokens per second on Llama 4.
- Inference, in this context, refers to an AI's ability to generate sentences, images, or videos based on user input.
Cerebras WSE Chip Beats NVIDIA in AI Inference
Updated May 28, 2025
The Cerebras WSE, a massive computer chip containing four billion transistors, has achieved record speeds in AI inference, surpassing NVIDIA in recent tests. The wafer-scale engine, measuring 8.5 inches per side, is designed to accelerate artificial intelligence operations.
Naor Penso, Cerebras chief information security officer, revealed at Web Summit in Vancouver that the WSE chip reached 2,500 tokens per second on Llama 4. This benchmark substantially exceeds NVIDIA’s reported 1,000 tokens per second.
Inference, in this context, refers to an AI’s ability to generate sentences, images, or videos based on user input. Tokens are the essential units of information processed, such as words or characters. Faster token processing translates to quicker results.
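The practical effect of these throughput figures is easy to see with a little arithmetic. The sketch below uses the token rates reported in this article; the 500-token response length is a hypothetical example chosen for illustration.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a given throughput."""
    return num_tokens / tokens_per_second

response_tokens = 500  # hypothetical response length

# Throughput figures as reported in the article
cerebras_seconds = generation_time(response_tokens, 2500)  # Cerebras WSE
nvidia_seconds = generation_time(response_tokens, 1000)    # NVIDIA

print(f"Cerebras WSE: {cerebras_seconds:.2f} s")  # 0.20 s
print(f"NVIDIA:       {nvidia_seconds:.2f} s")    # 0.50 s
```

At these rates, the same response finishes in less than half the time, a gap that compounds when an AI agent chains many such generations together.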
According to Penso, speed is increasingly vital as AI enters an “agentic age,” where AI systems handle complex, multi-step projects. These AI agents break down large tasks into numerous sub-tasks, demanding rapid communication and inference.
The WSE’s speed stems from its high transistor count and co-location of components, including 44 gigabytes of high-speed RAM, on a single chip. This design eliminates the need for off-chip data access, further boosting performance.
Artificial Analysis, an independent benchmarking firm, validated these claims, recording 2,522 tokens per second on Llama 4, compared to NVIDIA Blackwell’s 1,038 tokens per second.
“We’ve tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model,” said Micah Hill-Smith, CEO of Artificial Analysis.
Julie Shin, Cerebras chief marketing officer, emphasized that the WSE represents a significant advancement in chip technology, moving beyond traditional CPU and GPU architectures.
“This is not an incremental technology,” Shin said. “This is another leapfrog moment for chips.”
What’s next
Cerebras plans to continue refining the WSE chip to further enhance its AI inference capabilities, with potential impact on applications ranging from enterprise solutions to AI agents.
