Nvidia Blackwell MLPerf: Training Benchmark Winner
Nvidia’s Blackwell GPUs seize the lead in the latest MLPerf training benchmarks, dominating large language model (LLM) pretraining and posting the fastest times across six critical machine learning tasks, including image generation and object detection. AMD’s Instinct MI325X GPU also delivered strong performance, matching Nvidia’s H200 in LLM fine-tuning and signaling significant movement in the competitive AI landscape. Efficient networking stands out as a critical factor: Nvidia’s largest submission connected 512 B200 GPUs, underscoring the need to minimize communication overhead for optimal training. The data also pointed to the growing importance of power efficiency. What innovations will the next benchmark round reveal?
Nvidia GPUs Dominate MLPerf Training, AMD Shows Strong AI Performance
Nvidia’s GPUs have once again demonstrated their dominance in the latest MLPerf benchmarks, securing top positions across various machine learning tasks. The results highlight Nvidia’s continued leadership in AI training, particularly in the demanding area of large language model (LLM) pretraining.
The MLPerf training benchmarks, overseen by the MLCommons consortium, evaluate AI performance across six industry-relevant tasks: content recommendation, LLM pretraining, LLM fine-tuning, object detection, image generation, and graph node classification. These benchmarks aim to provide a standardized and transparent measure of AI capabilities.
This round featured an updated LLM pretraining task using Meta’s Llama 3.1 405B, a model more than twice the size of GPT-3. This larger benchmark reflects the industry’s trend toward increasingly large and complex models.
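To make that size comparison concrete, here is a minimal arithmetic sketch; the parameter counts are the publicly stated figures for each model (405 billion and 175 billion):

```python
# Rough arithmetic behind the "more than twice the size of GPT-3" claim.
llama_31_params = 405e9  # Llama 3.1 405B parameter count
gpt3_params = 175e9      # GPT-3 parameter count (175B)

ratio = llama_31_params / gpt3_params
print(f"Llama 3.1 405B is ~{ratio:.1f}x the size of GPT-3")  # -> ~2.3x
```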
While Nvidia’s Blackwell GPUs achieved the fastest training times across all six benchmarks, AMD’s latest Instinct MI325X GPU showed competitive performance in LLM fine-tuning. The MI325X matched the performance of Nvidia’s H200 GPUs, a roughly 30 percent improvement over its predecessor, the Instinct MI300X.
Dave Salvator, director of accelerated computing products at Nvidia, noted the significance of this being the first large-scale deployment of Blackwell GPUs, suggesting further performance improvements are likely.
Google also participated, submitting its Trillium TPU for the image-generation task.

Efficient networking between GPUs becomes increasingly important as the scale of AI training grows. Nvidia’s largest system, connecting 512 B200 GPUs, highlights the critical role of minimizing communication overhead to maximize training efficiency.
For the LLM pretraining benchmark, performance scaled almost linearly with additional GPUs, reaching 90 percent of ideal performance. Salvator attributed this to the NVL72, which connects 36 Grace CPUs and 72 Blackwell GPUs with NVLink.
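For illustration, scaling efficiency is conventionally the observed speedup divided by the ideal linear speedup. The sketch below shows that calculation with hypothetical GPU counts and training times (the numbers are illustrative assumptions, not figures from any MLPerf submission):

```python
def scaling_efficiency(base_gpus: int, base_time: float,
                       scaled_gpus: int, scaled_time: float) -> float:
    """Observed speedup relative to ideal linear scaling (1.0 = perfectly linear)."""
    ideal_speedup = scaled_gpus / base_gpus     # e.g., 4x the GPUs -> ideally 4x faster
    observed_speedup = base_time / scaled_time  # how much faster it actually ran
    return observed_speedup / ideal_speedup

# Hypothetical example: quadrupling GPUs cuts training time from
# 400 minutes to 111 minutes (~3.6x faster), i.e., ~90% of ideal.
print(f"{scaling_efficiency(512, 400.0, 2048, 111.0):.0%}")  # -> 90%
```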
Kenneth Leach, principal AI and machine learning engineer at Hewlett Packard Enterprise, pointed to improvements in GPUs and networking as reasons for the reduction in the size of the largest submissions compared to previous rounds.
Notably, only Lenovo included a power measurement in its submission for this round. The energy it took to fine-tune an LLM on two Blackwell GPUs was 6.11 gigajoules, or 1,698 kilowatt-hours.
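As a quick sanity check on those units (1 kWh = 3.6 MJ), the reported gigajoule figure converts to kilowatt-hours as follows; the one-kilowatt-hour gap versus the reported 1,698 kWh is just rounding of the 6.11 GJ input:

```python
# Convert Lenovo's reported fine-tuning energy from gigajoules to kilowatt-hours.
energy_gj = 6.11          # reported energy in gigajoules
joules = energy_gj * 1e9  # GJ -> J
kwh = joules / 3.6e6      # J -> kWh (1 kWh = 3.6e6 J)
print(f"{energy_gj} GJ ≈ {kwh:,.0f} kWh")  # -> 6.11 GJ ≈ 1,697 kWh
```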
What’s Next
Future MLPerf benchmarks are anticipated to further emphasize power efficiency in AI training, encouraging more companies to submit power measurements alongside performance data. The ongoing advancements in GPU technology and networking will likely continue to drive improvements in AI training speed and efficiency.
