Explore generative AI for autonomy: video generation models as world simulators

Generative AI world model

Generative world models are reshaping deep learning in robotics by enabling AI to predict and interpret the dynamics of the real world. These models allow AI to form general representations, similar to human mental models, that improve its decision-making and understanding of what might happen next.

GAIA, the first generative world model for autonomy, excels in creating detailed simulations of the physical world. It generates minutes-long driving videos that closely adhere to specific prompts about scene features and the vehicle’s behavior. This leap in AI capability not only enhances the decision-making and safety of our AV technology but also addresses complex problems requiring the AI model to foresee and navigate nuanced real-world interactions.

GAIA-1 can generate diverse and lengthy driving scenes entirely from imagination.

Multimodal model

GAIA-1 utilizes video, text, and action inputs to produce realistic driving videos while providing precise control over ego-vehicle behavior and scene features. Its multimodal nature also enables GAIA-1 to generate videos from various prompt modalities and combinations.

What stands out most about GAIA-1 is that the videos it generates reflect the underlying rules of the physical world.

Prompt: Night-time driving on a snowy road in London.

GAIA’s capabilities

GAIA’s deep understanding of driving and language allows it to accurately interpret prompts and generate detailed driving videos. These videos encompass diverse traffic scenarios, specific types of motion, accurate depictions of time of day and weather conditions, and realistic interactions between vehicles and other road users.

This provides Wayve with a versatile and powerful synthetic tool to advance the training and validation of safer and more intelligent autonomous systems.

Predicting diverse futures

GAIA can generate multiple possible outcomes by utilizing past video context. For instance, when simulating driving through a roundabout, you can direct it to “go straight” or “turn right”.
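The idea of sampling several futures from one shared context can be illustrated with a deliberately tiny stand-in. The sketch below is not GAIA-1 and does not use any Wayve API: it assumes a toy model in which the "scene" is a single heading angle, the past video context is a list of past headings, and the prompt is a key into a hypothetical STEER_DEG table. All names (STEER_DEG, sample_future, sample_futures) are invented for illustration.

```python
import random

# Hypothetical per-step heading change (degrees) for each action prompt.
# These values are illustrative assumptions, not taken from GAIA-1.
STEER_DEG = {"go straight": 0.0, "turn right": 5.0}


def sample_future(context, action, steps=10, seed=0):
    """Sample one future heading trajectory from a toy stochastic model.

    context: list of past headings in degrees (stands in for past video frames).
    action:  key of STEER_DEG (stands in for an action or text prompt).
    """
    rng = random.Random(seed)  # a local generator keeps each sample reproducible
    heading = context[-1]      # continue from the last observed state
    future = []
    for _ in range(steps):
        # Next state = previous state + action-conditioned drift + random noise.
        heading += STEER_DEG[action] + rng.gauss(0.0, 1.0)
        future.append(heading)
    return future


def sample_futures(context, action, n=3, steps=10):
    """Draw n different futures for the same context by varying the seed."""
    return [sample_future(context, action, steps, seed=s) for s in range(n)]


past = [0.0, 0.0, 0.0]
straight = sample_futures(past, "go straight")
right = sample_futures(past, "turn right")
```

Here the same past context produces a different trajectory for each seed, while the action prompt shifts the whole set of outcomes in one direction. A real video world model works over image tokens rather than a single number, but the split between shared context, conditioning signal, and sampling noise is the same in spirit.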

Generating diverse traffic

GAIA can generate scenes with different levels of traffic, including pedestrians, cyclists, motorcyclists, and oncoming traffic.

Interacting with other road users

GAIA can analyze the car’s interaction with other moving elements in the scene and produce multiple potential outcomes.

Controlling generation with prompts

By providing GAIA with text descriptions or video prompts depicting weather conditions, time of day, and illumination, we can control various aspects of the environment.

A compilation of unmodified driving scenes generated by GAIA-1.

Read our research blogs


29 Sep 2023
GAIA-1: A Generative World Model for Autonomous Driving