NVIDIA Advances Real-World Robotics Through Sim-to-Real Breakthroughs

Robotics is transitioning from controlled demonstrations and scripted automation toward generalizable, reliable embodied autonomy in real-world settings. This shift involves moving away from rigid programming toward systems that can adapt to dynamic and unpredictable environments.

On May 28, 2026, NVIDIA Research detailed how simulation-to-real transfer—the process of training AI in a virtual environment before deploying it to physical hardware—is serving as the foundation for this evolution. Eight of the company’s 28 papers accepted at the International Conference on Robotics and Automation (ICRA) illustrate how this approach allows robots to perceive, reason, and act with greater reliability outside of laboratory conditions.

The research spans the full development stack, addressing challenges such as multi-arm coordination, cross-body policy generalization, the grasping of novel objects in cluttered spaces, and the creation of vision-language-action (VLA) models that reason before executing movements.

Coordinating Motion and Navigation

Traditional robot scheduling software typically handles tasks sequentially, managing one arm at a time. NVIDIA’s ScheduleStream framework instead uses GPU acceleration so that multiple robotic arms can plan and execute their movements in parallel.

When deployed on hardware such as the NVIDIA Jetson edge AI platform, ScheduleStream demonstrated a 3x speedup across multi-arm planning scenarios. The framework’s code is available on GitHub.

Navigation presents a different challenge, as software trained for one robot body often fails when transferred to a robot with a different shape or movement mechanism. The COMPASS policy framework addresses this by using imitation learning to build baseline navigation and residual reinforcement learning within NVIDIA Isaac Lab to create specialists for diverse robot embodiments.

Trained entirely in simulation without real-world robot data, COMPASS achieved a 4.5x improvement in average success rate compared to imitation learning baselines. In real-world trials involving humanoids and autonomous mobile robots, the system demonstrated approximately 80% success across 20 navigation attempts.
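
The residual idea behind this approach can be sketched in a few lines. This is an illustration only, not COMPASS code: the function names, the observation fields, the hand-written "specialist," and the clipping limit are all assumptions made for the example. A shared base policy proposes a velocity command, and an embodiment-specific residual adds a bounded correction on top of it.

```python
from typing import Callable, Dict, Tuple

Observation = Dict[str, float]
Action = Tuple[float, float]  # (linear velocity m/s, angular velocity rad/s)
Policy = Callable[[Observation], Action]


def base_policy(obs: Observation) -> Action:
    """Hypothetical shared policy: drive forward and turn toward the goal."""
    return (0.5, obs["heading_error"])


def make_residual(gain: float) -> Policy:
    """Hypothetical per-embodiment specialist (hand-written here, learned in practice)."""

    def residual(obs: Observation) -> Action:
        # Brake harder as the nearest obstacle gets closer.
        return (-gain / max(obs["obstacle_distance"], 0.1), 0.0)

    return residual


def clip(value: float, limit: float) -> float:
    """Bound a correction to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def combined_action(obs: Observation, residual: Policy, limit: float = 0.3) -> Action:
    """Return the base action plus a bounded residual correction."""
    base = base_policy(obs)
    delta = residual(obs)
    return (base[0] + clip(delta[0], limit), base[1] + clip(delta[1], limit))


obs = {"heading_error": 0.2, "obstacle_distance": 1.0}
print(combined_action(obs, make_residual(0.1)))  # small gain: mild correction
print(combined_action(obs, make_residual(5.0)))  # large gain: correction saturates at the limit
```

Bounding the residual is one common design choice: it keeps each specialist close to the shared baseline while still letting it adapt to a particular body.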

Adaptive Grasping and Manipulation

Most grasping systems run as a linear pipeline: identify an object, predict a grasp, plan a path, and execute it. Precision, however, often breaks down in the final few centimeters of movement.

Grasp-MPC addresses this by adaptively computing grasps and continuously correcting motion as the robot closes in on an object. To develop this policy, researchers generated 2 million simulated trajectories across 8,000 objects using the GraspGen dataset and the cuRobo CUDA-accelerated library.

The resulting system achieved a 75% overall success rate on real robots when grasping novel objects in cluttered environments, compared to a 41% success rate for the baseline.
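
To make "continuously correcting motion" concrete, here is a toy closed-loop controller in the model-predictive style. It is a sketch under stated assumptions, not the Grasp-MPC method: the plain distance cost, the fixed set of 1 cm candidate moves, and the `observe_target` callback are all hypothetical stand-ins. At every step the controller re-observes the target, scores each candidate move, executes only the best one, and then repeats.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]

STEP = 0.01  # candidate move length in metres (assumed for the example)

# Candidate one-step motions: stay put, or move 1 cm in one of 8 directions.
CANDIDATES: List[Point] = [(0.0, 0.0)] + [
    (STEP * math.cos(k * math.pi / 4), STEP * math.sin(k * math.pi / 4))
    for k in range(8)
]


def cost(gripper: Point, target: Point) -> float:
    """Hypothetical cost: straight-line distance to the current grasp target."""
    return math.dist(gripper, target)


def closed_loop_reach(
    gripper: Point,
    observe_target: Callable[[int], Point],
    max_steps: int = 200,
    tolerance: float = 0.01,
) -> Tuple[Point, int]:
    """Move toward a target that is re-estimated at every control step."""
    for step in range(max_steps):
        target = observe_target(step)  # fresh estimate each step
        if cost(gripper, target) <= tolerance:
            return gripper, step
        # Score every candidate move and execute only the best one.
        best = min(
            CANDIDATES,
            key=lambda m: cost((gripper[0] + m[0], gripper[1] + m[1]), target),
        )
        gripper = (gripper[0] + best[0], gripper[1] + best[1])
    return gripper, max_steps


def shifting_target(step: int) -> Point:
    """The object is nudged 5 cm sideways after the fifth control step."""
    return (0.10, 0.0) if step < 5 else (0.10, 0.05)


final, steps = closed_loop_reach((0.0, 0.0), shifting_target)
print(final, steps)
```

Because the target is re-read on every iteration, the controller recovers when the object moves mid-approach, which a plan computed once at the start would not do.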

For tasks that involve tangled, flexible material, such as clearing tree branches from power lines, researchers developed the Deformable Cluster Manipulation framework. Rather than focusing on a single graspable point, the system uses the entire arm to wrap around and sweep aside clusters of flexible material.

The researchers used biological growth equations to create synthetic trees of various shapes and sizes for training in NVIDIA Isaac simulation frameworks. The resulting policy deploys to real branches using zero-shot transfer, with potential applications in cable management and agricultural inspection.

Precision Assembly and Sequential Tasks

Precise assembly, such as inserting a peg into a hole or threading a nut onto a bolt, is difficult to master in simulation because real-world surfaces and sensors often deviate from idealized models.

The SPARR method handles this by splitting the task into two layers. A policy trained in Isaac Lab establishes the general strategy, while a second layer running on the physical hardware corrects for simulation discrepancies using the robot’s own camera, without requiring human guidance.

SPARR improved success rates by 38% and reduced cycle time by roughly 30% compared to zero-shot sim-to-real baselines. On National Institute of Standards and Technology (NIST) assembly tasks not seen during training, success rates improved by nearly 75%.

The Refinery framework focuses on multi-step assembly where the outcome of one step dictates the possibility of the next. By training across hundreds of simulated scenarios, Refinery learns to position components to facilitate subsequent steps, achieving 91% success in simulation and an 11% mean improvement over baselines in real-world results.

Vision-Language-Action Models

Visual clutter often confuses robot policies, as cameras capture irrelevant detail alongside target objects. The PEEK pipeline addresses this by using a vision language model to read the task instructions, direct the policy’s attention to the relevant objects, and fade out distractors.

For policies trained purely in simulation, the addition of PEEK resulted in a 41x improvement in real-world accuracy. For larger VLA models, the gains ranged from 2x to 3.5x.
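
The "fading out" step can be pictured as a simple image operation. The sketch below is not the PEEK implementation: it assumes a relevance mask is already available (in PEEK that guidance would come from the vision language model), and the function name, the grayscale image format, and the fade factor are hypothetical. Pixels marked relevant pass through unchanged, and everything else is dimmed.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities in [0, 1]
Mask = List[List[bool]]    # True where a pixel belongs to a task-relevant object


def fade_distractors(image: Image, relevant: Mask, fade: float = 0.2) -> Image:
    """Keep relevant pixels as-is and scale all other pixels by `fade`."""
    same_shape = len(image) == len(relevant) and all(
        len(row) == len(mask_row) for row, mask_row in zip(image, relevant)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same shape")
    return [
        [px if keep else px * fade for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, relevant)
    ]


image = [[1.0, 0.5], [0.8, 0.2]]
mask = [[True, False], [False, True]]
print(fade_distractors(image, mask, fade=0.5))  # [[1.0, 0.25], [0.4, 0.2]]
```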

To ensure robots execute the actions they reason through, the SEAL method was developed in collaboration with researchers from Carnegie Mellon University, the University of Utah, and the University of Sydney. SEAL targets the failure mode in which a model reasons correctly through a task but then executes a different action.

At runtime, SEAL generates several candidate action sequences and selects the one that most closely matches the intended outcome. This approach delivered up to 15% accuracy gains and increased robustness against shifted camera angles and scene clutter.
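
The generate-then-select pattern described here can be illustrated with a minimal example. The source does not describe how SEAL scores its candidates, so everything below is an assumption: the outcome model simply sums planar displacements, and the score is the distance between the predicted and intended end positions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
ActionSequence = List[Point]  # per-step (dx, dy) displacements


def predict_outcome(start: Point, actions: ActionSequence) -> Point:
    """Hypothetical outcome model: add up the displacements."""
    x, y = start
    for dx, dy in actions:
        x += dx
        y += dy
    return (x, y)


def select_best(
    start: Point, intended: Point, candidates: Sequence[ActionSequence]
) -> ActionSequence:
    """Pick the candidate whose predicted outcome is closest to the intended one."""
    if not candidates:
        raise ValueError("need at least one candidate action sequence")
    return min(
        candidates,
        key=lambda seq: math.dist(predict_outcome(start, seq), intended),
    )


candidates = [
    [(1.0, 0.0), (1.0, 0.0)],  # ends at (2, 0)
    [(0.0, 1.0), (0.0, 1.0)],  # ends at (0, 2)
    [(1.0, 0.0), (0.0, 1.0)],  # ends at (1, 1)
]
print(select_best((0.0, 0.0), (1.0, 1.0), candidates))  # [(1.0, 0.0), (0.0, 1.0)]
```

Selecting at runtime requires no retraining of the underlying model, at the cost of generating and scoring several candidates per decision.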

NVIDIA is supporting these advancements through large-scale open datasets, including the NVIDIA Physical AI Dataset and the NVIDIA Isaac GR00T X Embodiment Sim. These tools are being used by robotics teams at institutions including MIT, ETH Zurich, and the University of Texas at Austin to move research from simulation to physical systems.
