Revolutionizing AI: NVIDIA Blackwell Sets New Records in Its MLPerf Inference Debut, Redefining Generative AI Capabilities
Unlocking the Power of Generative AI: NVIDIA’s Leading Performance in MLPerf Inference
As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge; serving LLM-powered applications in real time is another altogether.
In the latest round of the MLPerf Inference industry benchmark, v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform, featuring its second-generation Transformer Engine and FP4 Tensor Cores, is set to redefine the landscape: on Llama 2 70B, MLPerf's largest LLM workload, Blackwell delivered up to 4x the performance of the NVIDIA H100 Tensor Core GPU.
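A key ingredient in those gains is FP4, a 4-bit floating-point format (E2M1) that can represent only 16 distinct values, so each weight takes half the memory and bandwidth of FP8. The sketch below is a simplified, illustrative round-to-nearest FP4 weight quantizer with a per-tensor scale; the function names are hypothetical, and this is an intuition aid, not NVIDIA's Transformer Engine implementation.

```python
import numpy as np

# The eight non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray):
    """Round each weight to the nearest FP4 value after per-tensor scaling (illustrative)."""
    scale = np.abs(x).max() / FP4_E2M1[-1]              # map the largest magnitude to 6.0
    grid = np.concatenate([-FP4_E2M1[::-1], FP4_E2M1])  # signed FP4 value grid
    idx = np.abs((x / scale)[..., None] - grid).argmin(axis=-1)
    return grid[idx], scale                             # dequantize as grid[idx] * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_fp4(weights)
print("max abs quantization error:", np.abs(weights - q * s).max())
```

Production stacks pair low-precision formats like this with finer-grained scaling and calibration to preserve accuracy; the point here is simply that 4-bit weights double effective memory bandwidth relative to 8-bit ones.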
The NVIDIA H200 Tensor Core GPU achieved outstanding results on every benchmark in the data center category, including Mixtral 8x7B, a mixture-of-experts (MoE) LLM with 46.7 billion parameters in total, of which 12.9 billion are active per token. MoE models are gaining popularity because they bring more versatility to LLM deployments: a single deployment can answer a wider variety of questions and perform a broader range of tasks.
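Those two parameter counts capture the essence of MoE: a router selects a small subset of expert subnetworks for each token, so only a fraction of the total weights does work on any one token. Below is a minimal, illustrative top-k MoE layer in NumPy; all names are hypothetical, and real models such as Mixtral use learned routers and fused GPU kernels.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; only those experts' weights are used."""
    logits = x @ gate_w                             # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]      # k best experts per token
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # softmax over the selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            out[t] += w[t, j] * (experts[e] @ x[t])  # each expert is a linear layer here
    return out

d, n_experts, tokens = 8, 8, 4
x = np.random.randn(tokens, d)
gate_w = np.random.randn(d, n_experts)
experts = [np.random.randn(d, d) for _ in range(n_experts)]
print(moe_forward(x, gate_w, experts).shape)  # (4, 8): each token used only 2 of 8 experts
```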
The continued growth of LLMs demands more compute to handle the rising volume of inference requests. Serving as many users as possible while meeting real-time latency requirements for state-of-the-art LLMs calls for multi-GPU computing. NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs based on the NVIDIA Hopper architecture, delivering significant benefits for real-time, cost-effective inference of large models at scale.
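To make the multi-GPU pattern concrete, the sketch below uses the open-source vLLM library as an illustrative stand-in (NVIDIA's MLPerf submissions use its own optimized stack, TensorRT-LLM). Setting tensor_parallel_size shards each layer's weights across eight GPUs, and interconnects like NVLink and NVSwitch carry the all-reduce traffic this sharding generates on every token.

```python
# Illustrative only: assumes 8 GPUs, the vllm package installed, and
# access to the Llama 2 70B chat weights.
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 splits every layer across eight GPUs; per-token
# partial results are combined over the GPU interconnect.
llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain NVLink in one sentence."], params)
print(outputs[0].outputs[0].text)
```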
In addition to NVIDIA's own submissions, ten NVIDIA partners submitted MLPerf inference results, highlighting the broad availability of NVIDIA platforms: ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology, and Supermicro.
Continuous Software Innovation
NVIDIA platforms gain performance and features every month through ongoing software development. In the latest round of inference tests, the NVIDIA Hopper architecture, NVIDIA Jetson platform, and NVIDIA Triton Inference Server all showed dramatic performance improvements.
NVIDIA H200 GPUs delivered up to 27 percent more AI inference performance than in the previous round, underscoring the added value customers gain over time from their investment in the NVIDIA platform. NVIDIA Triton Inference Server, included with NVIDIA AI Enterprise software, is a fully featured, open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform.
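To illustrate how an application talks to Triton regardless of the framework behind a model, here is a minimal client sketch using the tritonclient Python package. The model name and tensor names are hypothetical placeholders that would have to match the configuration of a model in your Triton model repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton server (HTTP endpoint, default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# "my_model", "INPUT0", and "OUTPUT0" are placeholders; they must match
# the model's configuration in the Triton model repository.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```

Because the request format is the same whether the backend is TensorRT, ONNX Runtime, or PyTorch, teams can stand up one serving layer instead of one per framework.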
Going to the Edge
Generative AI models deployed at the edge can transform sensor data such as images and video into actionable, real-time insights with strong contextual awareness. The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running all kinds of models locally, including LLMs, vision transformers, and Stable Diffusion.
In this round of MLPerf, the NVIDIA Jetson AGX Orin system-on-module delivered more than a 6.2x throughput improvement and a 2.4x latency improvement over the previous round on the GPT-J LLM workload. This general-purpose 6-billion-parameter model can seamlessly interface with human language, transforming generative AI at the edge.
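For a feel for the workload itself, the sketch below loads GPT-J-6B with the Hugging Face Transformers library and generates text. It illustrates the kind of local LLM inference the benchmark measures; the MLPerf submission itself runs on NVIDIA's optimized inference software.

```python
# Illustrative only: requires the transformers and accelerate packages
# and roughly 12 GB of memory for FP16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.float16, device_map="auto"
)

prompt = "Edge AI turns raw sensor data into"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```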
Proven Performance Leadership Across All Sectors
This round of MLPerf Inference demonstrated the versatility and leading performance of NVIDIA platforms across all benchmark workloads, from the data center to the edge, supercharging the most innovative AI-powered applications and services. NVIDIA H200 GPU-powered systems are available today from CoreWeave, the first cloud service provider to announce general availability, and from server makers ASUS, Dell Technologies, HPE, QCT, and Supermicro.
