Revolutionizing the Road: How Video Generation and World Models are Transforming Autonomous Driving
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
cs.AI, 06 Nov 2024
Ao Fu, Yi Zhou, Tao Zhou, Yi Yang, Bojun Gao, Qun Li, Guobin Wu, Ling Shao
Southeast University; Nanjing University of Science and Technology; DiDi Chuxing; University of Chinese Academy of Sciences
World models and video generation are key technologies in autonomous driving, each playing an important role in improving the robustness and reliability of autonomous systems. World models, which simulate the dynamics of real-world environments, and video generation models, which produce realistic video sequences, are increasingly being integrated to improve situational awareness and decision-making. This paper explores the relationship between these two techniques, with a particular focus on how structural similarities in diffusion-based models contribute to more accurate and consistent simulations of driving scenarios. We highlight the lack of a universally accepted definition of world models by examining pioneering studies such as JEPA, Genie, and Sora, which demonstrate different approaches to world model design. These diverse interpretations reflect the field’s evolving understanding of how world models can be optimized for various autonomous driving tasks. The paper also discusses the main evaluation metrics used in this field, such as the Chamfer distance for 3D scene reconstruction and the Fréchet Inception Distance (FID) for assessing the quality of generated video content. By analyzing the interaction between video generation and world models, this study identifies important challenges and future research directions and highlights the potential of these technologies to jointly improve the performance of autonomous driving systems. The results presented in this paper aim to provide a comprehensive understanding of how the integration of video generation and world models can foster innovation in the development of safer and more reliable autonomous vehicles.
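To make one of the metrics named in the abstract concrete, the following is a minimal sketch of a symmetric Chamfer distance between two 3D point sets. The abstract does not say which variant the survey uses, so the form here (mean squared nearest-neighbour distance, summed over both directions) is an assumption, and the function name `chamfer_distance` and the sample points are purely illustrative.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed variant: mean squared nearest-neighbour distance from a to b,
    plus the same quantity from b to a. Other papers use unsquared
    distances or a different normalisation.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def sq_dist(p, q):
        # Squared Euclidean distance between two points.
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # For each source point, take the squared distance to its nearest
        # destination point, then average over the source set.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


# Hypothetical example: a reconstructed scene versus a reference scene.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(predicted, reference))
```

This brute-force version compares every pair of points, so it is only suitable for small sets; practical implementations typically rely on a spatial index or GPU batching for the nearest-neighbour search.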
