17 June 2023  |  Research

Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy

GAIA-1 is a new generative AI model for autonomy that creates realistic driving videos by leveraging video, text and action inputs. It offers fine-grained control over ego-vehicle behaviour and scene features, making it ideal for research, simulation, and training purposes. The launch of GAIA-1 provides endless possibilities for R&D and innovation in the field of autonomy.

Autonomous driving has been a long-standing goal for the automotive industry, with the potential to revolutionise transportation and improve road safety. However, developing autonomous systems that can navigate complex real-world scenarios remains a significant challenge. This is where GAIA-1 comes in: a cutting-edge generative AI model designed specifically for autonomy.

GAIA-1 is a research model that leverages video, text and action inputs to generate realistic driving videos and offers fine-grained control over ego-vehicle behaviour and scene features. Its ability to manifest the generative rules of the real world represents a significant step towards embodied AI, where artificial systems can comprehend and reproduce the rules and behaviours of the world. The introduction of GAIA-1 provides boundless possibilities for innovation in the field of autonomy, enabling enhanced and accelerated training of autonomous driving technology.

Overview of GAIA-1

GAIA-1 (Generative Artificial Intelligence for Autonomy) is a multi-modal approach that leverages video, text and action inputs to generate realistic driving videos. By training on Wayve’s vast corpus of real-world UK urban driving data, our model learns to predict the subsequent frames in a video sequence, resulting in an autoregressive (AR) prediction capability without needing any labels. This resembles the approach seen in large language models (LLMs).
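The label-free, LLM-style training signal described above can be sketched in a few lines. This is a toy illustration, not GAIA-1's actual pipeline: we assume the video has already been tokenised into discrete codes (here stood in for by plain integers), and show how each next token serves as its own training target.

```python
def make_autoregressive_pairs(tokens, context_len):
    """Build (context, target) training pairs from an unlabelled token
    sequence. Each target is simply the next token in the stream, so
    the data supervises itself -- no manual labels required."""
    pairs = []
    for i in range(len(tokens) - context_len):
        context = tokens[i : i + context_len]
        target = tokens[i + context_len]
        pairs.append((context, target))
    return pairs

# Toy integer sequence standing in for discrete video codes.
codes = [3, 1, 4, 1, 5, 9, 2, 6]
pairs = make_autoregressive_pairs(codes, context_len=3)
# e.g. pairs[0] == ([3, 1, 4], 1)
```

A model trained to predict the `target` from each `context` learns an autoregressive generator: at inference time it can keep appending its own predictions to the context, which is how a few seconds of prompt can be rolled forward into a longer generated sequence.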

GAIA-1 is not just a standard generative video model. It is a true world model that learns to understand and disentangle the important concepts of driving, including cars, trucks, buses, pedestrians, cyclists, road layouts, buildings, and traffic lights. What sets our generative AI model apart is its ability to provide fine-grained control over both ego-vehicle behaviour and other essential scene features. Whether altering the ego-vehicle’s behaviour or modifying the overall scene dynamics, our model offers unparalleled flexibility, making it an invaluable tool for accelerating the development of our foundation models for autonomous driving.

The true marvel of GAIA-1 lies in its ability to manifest the generative rules that underpin the world we inhabit. Through extensive training on a diverse range of driving data, our model synthesises the inherent structure and patterns of the real world, enabling it to generate remarkably realistic and diverse driving scenes. This achievement represents a significant step towards realising embodied AI, where artificial systems can not only interact with the world but also comprehend and reproduce its rules and behaviours.



Videos of four different driving scenarios generated by GAIA-1

What is a world model?

From an early age, humans and other animals develop models of the world through observation and interaction. These models, which are based on accumulated knowledge of the world, allow us to navigate effectively in unfamiliar situations.

World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations. That’s why we’ve been researching prediction and world models for over five years. GAIA-1 builds on our work in future prediction, dreaming about driving, predicting in a bird’s eye view, and learning a world model.
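The “what if” thought experiment mentioned above can be made concrete with a deliberately tiny planner. Everything here is a hypothetical stand-in (the real system is far richer): the state is a single lane-offset number, the world model is a one-line transition function, and we exhaustively score every short action sequence inside the imagined rollout.

```python
from itertools import product

def plan(world_model, reward_fn, state,
         actions=("left", "straight", "right"), horizon=3):
    """Exhaustive 'what if' search: roll out every candidate action
    sequence inside the learned world model and return the sequence
    with the best predicted return. A toy sketch of model-based
    planning, not GAIA-1's actual planner."""
    best_return, best_seq = float("-inf"), None
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)   # imagined next state
            total += reward_fn(s)   # score the imagined outcome
        if total > best_return:
            best_return, best_seq = total, seq
    return best_seq

# Toy stand-ins: state is a lane offset; turns shift it, and we
# prefer staying centred in the lane.
shift = {"left": -1, "straight": 0, "right": 1}
model = lambda s, a: s + shift[a]
reward = lambda s: -abs(s)
```

Starting centred (`state=0`), the planner keeps going straight; starting offset to the right (`state=2`), it steers left until the imagined offset is corrected. The point is that the action is chosen by predicting consequences, not by reacting to the current frame alone.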

Emerging Capabilities of GAIA-1

To better understand the capabilities of this research model, it is helpful to see it in action.

Generation of long plausible futures

GAIA-1 can predict up to several minutes into the future based on a few seconds of video. When we generate driving scenes that far into the future, the first few seconds of the initial video prompt have progressively less influence on what the model generates next. This shows that GAIA-1 understands the rules that underpin the world we inhabit.

Below is a video demonstrating how GAIA-1 can render a long simulated driving scenario. What’s remarkable is that by the time we get to the last frame of the generated video, there’s nothing in the scene that was present in the first frame.


This video generated by GAIA-1 shows a realistic driving sequence. Notice the realistic depiction of road features and other concepts like parked and moving cars, as well as driving behaviours that are representative of what one would expect from a human driver in the real world.

Generation of multiple plausible futures

When we prompt GAIA-1 with a few seconds of starting context, it can imagine several possible futures. This shows that based on the same video prompt, GAIA-1 understands that many different things can happen in the future.
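Branching into several futures from one prompt falls out naturally when generation samples from the model's next-step distribution rather than always taking the single most likely continuation. The sketch below is purely illustrative: `toy_model` is an invented stand-in that returns a hand-written distribution over symbolic scene tokens, not anything from GAIA-1.

```python
import random

def sample_futures(next_token_dist, prompt, n_futures=4, length=5, seed=0):
    """Draw several rollouts from the same prompt by sampling the
    model's next-step distribution, so one context can branch into
    many plausible futures."""
    rng = random.Random(seed)
    futures = []
    for _ in range(n_futures):
        seq = list(prompt)
        for _ in range(length):
            probs = next_token_dist(seq)
            token = rng.choices(list(probs), weights=list(probs.values()))[0]
            seq.append(token)
        futures.append(seq[len(prompt):])  # keep only the imagined part
    return futures

# Hypothetical stand-in model: at a junction the scene may turn or
# continue; otherwise it tends to keep going straight.
def toy_model(seq):
    if seq[-1] == "approach_junction":
        return {"turn_left": 0.3, "go_straight": 0.5, "turn_right": 0.2}
    return {"go_straight": 0.8, "approach_junction": 0.2}

futures = sample_futures(toy_model, ["approach_junction"], n_futures=4)
```

Running this yields four different continuations of the same one-token prompt, which mirrors the behaviour in the video below: identical starting context, divergent imagined futures.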



This video shows that GAIA-1 can be prompted with a couple of seconds of the same starting context. Then it can unroll multiple possible futures. For example, turning left at the intersection (top left frame) or continuing straight (top right frame). Or even continuing straight along a different imagined road (bottom left frame). It can also imagine a pedestrian crossing the road ahead of us (bottom right frame).

Control the model to generate specific driving scenes

Finally, we can condition GAIA-1 on our actions using video and/or natural language prompts to generate a plausible future. This method allows us to control the scene, as well as the ego vehicle in the simulation, with action conditioning.

In the videos below, you can see how we used natural language to prompt different futures.


This video shows what GAIA-1 generated based on the prompt: “Going around a stopped bus.”


This video shows that we can inject a natural language prompt after three seconds. See what GAIA-1 generated based on the prompt: “It’s night, and we have turned on our headlights.”

As a true world model, GAIA-1 can even imagine what might happen in scenarios it has never been trained on. Here we show that the model can be forced to steer left or right against its better judgement. Its ability to predict driving scenes absent from its training data shows that it can extrapolate beyond that data, which could aid the safety evaluation of AI driving models. Safety assurance partly relies on testing driving behaviour in response to corner and edge case scenarios. Since these scenarios are rare and too dangerous to test in the real world, simulated testing provides a safer alternative.


These videos show what happens when we force the model to steer left or right and deviate from its lane onto the sidewalk. GAIA-1 would never have seen these incorrect behaviours in the expert driving dataset used to train it, indicating that it can extrapolate to driving concepts absent from the training data. This is helpful because it allows us to generate simulated data of incorrect driving, which we can use to evaluate our driving models.
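The "forcing" described above amounts to an action-conditioned rollout: rather than letting the model pick its own actions, we feed it a fixed action at every step and unroll the predicted states. A minimal sketch, with an invented one-number lane-offset state and toy dynamics standing in for the learned model:

```python
def rollout_conditioned(dynamics, state, actions):
    """Action-conditioned rollout: feed the model a fixed action at
    every step and collect the predicted states -- the mechanism
    behind 'forcing' the ego-vehicle to steer off its lane."""
    states = [state]
    for a in actions:
        state = dynamics(state, a)
        states.append(state)
    return states

# Toy dynamics: state is a lane offset; forcing repeated right turns
# drives the imagined ego-vehicle progressively off its lane.
shift = {"left": -1, "straight": 0, "right": 1}
dyn = lambda s, a: s + shift[a]
traj = rollout_conditioned(dyn, 0, ["right", "right", "right"])
# traj == [0, 1, 2, 3]
```

Because the imagined trajectory never has to appear in the training data, the same mechanism can surface out-of-distribution behaviours (like mounting the sidewalk) for use in safety evaluation.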

Conclusion

GAIA-1 is a game-changing generative AI research model that offers new opportunities for research, simulation, and training in the field of autonomy. With its ability to generate realistic and diverse driving scenes, GAIA-1 provides a unique opportunity for enhanced training of autonomous systems, enabling them to navigate complex real-world scenarios more effectively. Additionally, our model can serve as a powerful R&D tool for scenario exploration and testing of autonomous technologies. 

We are excited to explore how GAIA-1 will advance the development of Wayve’s foundation models for autonomous driving. By providing our team with a powerful new tool, GAIA-1 can help accelerate the development of Wayve’s autonomous driving technology and ultimately improve its performance and safety. 

If you want to learn more about GAIA-1, stay tuned for more results and insights in the coming months.


Video showing the diverse array of scenes generated by GAIA-1
