Revolutionizing AI: NVIDIA Blackwell Sets New Records in Its MLPerf Inference Debut, Redefining Generative AI Capabilities
Unlocking the Power of Generative AI: NVIDIA’s Leading Performance in MLPerf Inference
As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge; serving LLM-powered applications in real time is another altogether.
In the latest round of the MLPerf Inference industry benchmark, v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform, featuring its second-generation Transformer Engine and FP4 Tensor Cores, is set to redefine the landscape: on Llama 2 70B, MLPerf's largest LLM workload, Blackwell delivered up to 4x the performance of the NVIDIA H100 Tensor Core GPU.
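A key ingredient in those gains is FP4, a 4-bit floating-point format (E2M1) that can represent only 16 distinct values, so each weight takes half the memory and bandwidth of FP8. The sketch below is a simplified, illustrative round-to-nearest FP4 weight quantizer with a per-tensor scale; the function names are hypothetical, and this is an intuition aid, not NVIDIA's Transformer Engine implementation.

```python
import numpy as np

# The eight non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray):
    """Round each weight to the nearest FP4 value after per-tensor scaling (illustrative)."""
    scale = np.abs(x).max() / FP4_E2M1[-1]              # map the largest magnitude to 6.0
    grid = np.concatenate([-FP4_E2M1[::-1], FP4_E2M1])  # signed FP4 value grid
    idx = np.abs((x / scale)[..., None] - grid).argmin(axis=-1)
    return grid[idx], scale                             # dequantize as grid[idx] * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_fp4(weights)
print("max abs quantization error:", np.abs(weights - q * s).max())
```

Production stacks pair low-precision formats like this with finer-grained scaling and calibration to preserve accuracy; the point here is simply that 4-bit weights double effective memory bandwidth relative to 8-bit ones.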
The NVIDIA H200 Tensor Core GPU achieved outstanding results on every benchmark in the data center category, including Mixtral 8x7B, a mixture-of-experts (MoE) LLM with 46.7 billion parameters in total, of which 12.9 billion are active per token. MoE models are gaining popularity because they bring more versatility to LLM deployments: a single deployment can answer a wider variety of questions and perform a broader range of tasks.
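Those two parameter counts capture the essence of MoE: a router selects a small subset of expert subnetworks for each token, so only a fraction of the total weights does work on any one token. Below is a minimal, illustrative top-k MoE layer in NumPy; all names are hypothetical, and real models such as Mixtral use learned routers and fused GPU kernels.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; only those experts' weights are used."""
    logits = x @ gate_w                             # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]      # k best experts per token
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # softmax over the selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            out[t] += w[t, j] * (experts[e] @ x[t])  # each expert is a linear layer here
    return out

d, n_experts, tokens = 8, 8, 4
x = np.random.randn(tokens, d)
gate_w = np.random.randn(d, n_experts)
experts = [np.random.randn(d, d) for _ in range(n_experts)]
print(moe_forward(x, gate_w, experts).shape)  # (4, 8): each token used only 2 of 8 experts
```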
The continued growth of LLMs demands more compute to handle the rising volume of inference requests. Serving as many users as possible while meeting real-time latency requirements for state-of-the-art LLMs calls for multi-GPU computing. NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs based on the NVIDIA Hopper architecture, delivering significant benefits for real-time, cost-effective inference of large models at scale.
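To make the multi-GPU pattern concrete, the sketch below uses the open-source vLLM library as an illustrative stand-in (NVIDIA's MLPerf submissions use its own optimized stack, TensorRT-LLM). Setting tensor_parallel_size shards each layer's weights across eight GPUs, and interconnects like NVLink and NVSwitch carry the all-reduce traffic this sharding generates on every token.

```python
# Illustrative only: assumes 8 GPUs, the vllm package installed, and
# access to the Llama 2 70B chat weights.
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 splits every layer across eight GPUs; per-token
# partial results are combined over the GPU interconnect.
llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain NVLink in one sentence."], params)
print(outputs[0].outputs[0].text)
```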
In addition to NVIDIA's own submissions, ten NVIDIA partners submitted MLPerf inference results, highlighting the broad availability of NVIDIA platforms: ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology, and Supermicro.
Continuous Software Innovation
NVIDIA platforms gain performance and features every month through ongoing software development. In the latest round of inference tests, the NVIDIA Hopper architecture, NVIDIA Jetson platform, and NVIDIA Triton Inference Server all showed dramatic performance improvements.
NVIDIA H200 GPUs delivered up to 27 percent more AI inference performance than in the previous round, underscoring the added value customers gain over time from their investment in the NVIDIA platform. NVIDIA Triton Inference Server, included with NVIDIA AI Enterprise software, is a fully featured, open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform.
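To illustrate how an application talks to Triton regardless of the framework behind a model, here is a minimal client sketch using the tritonclient Python package. The model name and tensor names are hypothetical placeholders that would have to match the configuration of a model in your Triton model repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton server (HTTP endpoint, default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# "my_model", "INPUT0", and "OUTPUT0" are placeholders; they must match
# the model's configuration in the Triton model repository.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```

Because the request format is the same whether the backend is TensorRT, ONNX Runtime, or PyTorch, teams can stand up one serving layer instead of one per framework.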
Going to the Edge
Generative AI models deployed at the edge can transform sensor data such as images and video into actionable, real-time insights with strong contextual awareness. The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running all kinds of models locally, including LLMs, vision transformers, and Stable Diffusion.
In this round of MLPerf, the NVIDIA Jetson AGX Orin system-on-module delivered more than a 6.2x throughput improvement and a 2.4x latency improvement over the previous round on the GPT-J LLM workload. This general-purpose 6-billion-parameter model can seamlessly interface with human language, transforming generative AI at the edge.
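For a feel for the workload itself, the sketch below loads GPT-J-6B with the Hugging Face Transformers library and generates text. It illustrates the kind of local LLM inference the benchmark measures; the MLPerf submission itself runs on NVIDIA's optimized inference software.

```python
# Illustrative only: requires the transformers and accelerate packages
# and roughly 12 GB of memory for FP16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.float16, device_map="auto"
)

prompt = "Edge AI turns raw sensor data into"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```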
Proven Performance Leadership Across All Sectors
This round of MLPerf Inference demonstrated the versatility and leading performance of NVIDIA platforms across all benchmark workloads, from the data center to the edge, supercharging the most innovative AI-powered applications and services. NVIDIA H200 GPU-powered systems are available today from CoreWeave, the first cloud service provider to announce general availability, and from server makers ASUS, Dell Technologies, HPE, QCT, and Supermicro.
