Nvidia Rubin’s Network Doubles Bandwidth

January 10, 2026 | Lisa Park | Tech

Earlier this week, Nvidia surprise-announced its new Vera Rubin architecture (no relation to the recently unveiled telescope) at the Consumer Electronics Show in Las Vegas. The new platform, set to reach customers later this year, is advertised to offer a ten-fold reduction in inference costs and a four-fold reduction in the number of GPUs needed to train certain models, as compared to Nvidia’s Blackwell architecture.

The usual suspect for improved performance is the GPU. Indeed, the new Rubin GPU boasts 50 quadrillion floating-point operations per second (petaFLOPS) of 4-bit computation, compared to 10 petaFLOPS on Blackwell, at least for transformer-based inference workloads like large language models.
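A back-of-envelope calculation shows what that throughput gap means in practice. The FLOPS figures below come from the article; the workload size is a hypothetical illustration, and real inference is also bound by memory and network, not just raw compute.

```python
# Compare how long a fixed amount of FP4 compute takes on each chip.
RUBIN_FP4_FLOPS = 50e15      # 50 petaFLOPS (FP4), per the article
BLACKWELL_FP4_FLOPS = 10e15  # 10 petaFLOPS (FP4), per the article

# Hypothetical inference step requiring 2 quadrillion operations.
work = 2e15

t_rubin = work / RUBIN_FP4_FLOPS          # seconds on Rubin
t_blackwell = work / BLACKWELL_FP4_FLOPS  # seconds on Blackwell

print(f"Rubin: {t_rubin * 1e3:.0f} ms, "
      f"Blackwell: {t_blackwell * 1e3:.0f} ms, "
      f"speedup: {t_blackwell / t_rubin:.0f}x")
# → Rubin: 40 ms, Blackwell: 200 ms, speedup: 5x
```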

However, focusing on just the GPU misses the bigger picture. There are a total of six new chips in Vera-Rubin-based computers: the Vera CPU, the Rubin GPU, and four distinct networking chips. To achieve the performance advantages, the components have to work in concert, says Gilad Shainer, senior vice president of networking at Nvidia.

“The same unit connected in a different way will deliver a fully different level of performance,” Shainer says. “That’s why we call it extreme co-design.”

Expanded “in-network compute”

AI workloads, both training and inference, run on large numbers of GPUs together. “Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” Shainer says. “Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to go across racks.”

To accommodate these hugely distributed tasks, as many GPUs as possible need to effectively work as one. This is the aim of the so-called scale-up network: the connection of GPUs within a single rack. Nvidia handles this connection with its NVLink networking chip. The new line includes the NVLink6 switch, with double the bandwidth of the previous version (3,600 gigabytes per second for GPU-to-GPU connections, compared to 1,800 GB/s for the NVLink5 switch).
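The quoted bandwidths translate directly into transfer time for a given payload. A minimal sketch, using the article's NVLink figures; the payload size is a hypothetical example, and real transfers add latency and protocol overhead not modeled here.

```python
# Estimate GPU-to-GPU transfer time at the quoted NVLink bandwidths.
NVLINK6_GB_PER_S = 3600  # GB/s per GPU-to-GPU connection (Rubin), per article
NVLINK5_GB_PER_S = 1800  # GB/s (Blackwell), per article

payload_gb = 72  # hypothetical payload, e.g. a slice of activations

t_nvlink6 = payload_gb / NVLINK6_GB_PER_S  # seconds
t_nvlink5 = payload_gb / NVLINK5_GB_PER_S  # seconds

print(f"NVLink6: {t_nvlink6 * 1e3:.0f} ms, NVLink5: {t_nvlink5 * 1e3:.0f} ms")
# → NVLink6: 20 ms, NVLink5: 40 ms
```

Halving transfer time matters most when GPUs sit idle waiting on data, which is exactly the regime distributed inference pushes toward.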

In addition to the bandwidth doubling, the scale-up chips also include double the number of SerDes (serializer/deserializer) blocks, which allow data to be sent across fewer wires, and an expanded set of calculations that can be done within the network itself.
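The idea behind in-network compute can be sketched with a toy all-reduce: instead of every GPU exchanging full vectors and summing them locally, a switch-resident reduction combines partial results in flight and multicasts one answer back. This toy model is an illustration of the general technique (as in SHARP-style collectives), not Nvidia's implementation.

```python
# Toy comparison: reducing gradients at the endpoints vs. in the switch.
def allreduce_at_endpoints(grads):
    """Each of N endpoints fetches the other N-1 vectors and sums locally."""
    total = [sum(vals) for vals in zip(*grads)]
    n, length = len(grads), len(grads[0])
    traffic = n * (n - 1) * length  # values crossing the wire
    return total, traffic

def allreduce_in_network(grads):
    """The switch sums incoming vectors once and multicasts the result:
    each endpoint sends one vector up and receives one back."""
    total = [sum(vals) for vals in zip(*grads)]
    n, length = len(grads), len(grads[0])
    traffic = 2 * n * length
    return total, traffic

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 GPUs, 2 values each
r_end, t_end = allreduce_at_endpoints(grads)
r_net, t_net = allreduce_in_network(grads)
print(r_end == r_net, t_end, t_net)  # → True 24 16: same result, less traffic
```

The saving grows with the number of GPUs: endpoint traffic scales quadratically while in-network traffic scales linearly, which is why pushing reductions into the switch pays off at rack scale.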

“The scale-up network is not really the network itself,” Shainer says. “It’s computing infrastructure, and some of the computing operations are
