23 May 2023  |  Leadership

2023’s AI Breakthroughs Apply to AVs too, accelerating Wayve’s Foundation Model for Embodied AI

Alex Kendall, co-founder and CEO of Wayve, shares his thoughts on how recent advances in artificial intelligence will open new possibilities for autonomous driving and other applications of embodied AI.

Images generated by StableDiffusionXL showing a self-driving car surfing a wave through San Francisco, Amsterdam, New York, London, Paris and Tokyo.

Recent progress in artificial intelligence (AI) has taken the world by storm. Multiple breakthroughs in natural language understanding and generative AI have captured people’s imagination and created incredible commercial impact in only a few months. This has resulted in new products such as ChatGPT, which rose to over 100M users in two months, and GitHub Copilot, which is accelerating software development by 220%.

I’ve always believed that when AI becomes sufficiently robust, this technology will be more transformative than any that came before it. Seeing these breakthroughs gives me an even greater conviction that AI will be what unlocks autonomous vehicles (AVs) at scale. 

Great strides have been made, but there’s still more to come. Today’s AI narrative is focused on applications of large foundation models in software. But the frontiers of AI point to embodied intelligence—AI systems that can sense, act, learn and adapt to human behaviour in complex real-world environments. And that’s what we’re building at Wayve.

Our mission is to reimagine autonomous mobility through embodied intelligence. We’ve been pioneering deep learning for end-to-end autonomous driving since 2017, and we believe that embodied AI is the key to unlocking the full potential of AVs for everyone, everywhere. To get there, we are advancing the state of the art in embodied AI while building scalable and safe applications for these innovations with our commercial partners.

In this blog, I discuss recent breakthroughs in AI and their implications for self-driving. Let’s start by looking at what made these breakthroughs possible.

1. AI has shifted from task-specific to task-agnostic capabilities. Previous AI systems had to be trained on millions of examples of labelled data in a specific domain to solve a problem. For example, suppose you wanted to create an image-recognition system that could classify animal species. In that case, you would hand-label millions of images and train on them with supervised learning. Recent breakthroughs show that large foundation models can instead be trained on generic data with self-supervised learning to understand general-purpose concepts, and then be shown just a few examples (or prompts) to solve the task you’re interested in. This is a seismic change: these AIs move from simply recognising patterns to grasping net-new ideas from prompts, giving them a greater ability to adapt to new scenarios and improving the performance and safety of our technology.
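To make this shift concrete, here is a toy sketch of few-shot adaptation: a frozen, pretrained encoder (here just a fixed random projection standing in for a real foundation model) produces embeddings, and a new task is solved by comparing a query against a handful of labelled examples, with no retraining. The encoder, labels and feature vectors are all invented for illustration.

```python
import numpy as np

# Stand-in for a frozen foundation-model encoder: in practice this would be a
# large pretrained network; here it is a fixed random projection for illustration.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))

def embed(x):
    """Map a raw feature vector into the model's representation space."""
    z = np.tanh(x @ W)
    return z / np.linalg.norm(z)

# Few-shot "prompting": a handful of labelled examples per class, no retraining.
support = {
    "cat": [np.array([1.0, 0.9, 0.1, 0.0]), np.array([0.9, 1.0, 0.0, 0.1])],
    "dog": [np.array([0.1, 0.0, 1.0, 0.9]), np.array([0.0, 0.1, 0.9, 1.0])],
}
prototypes = {label: np.mean([embed(x) for x in xs], axis=0)
              for label, xs in support.items()}

def classify(x):
    """Assign the label whose prototype embedding is most similar (cosine)."""
    z = embed(x)
    return max(prototypes, key=lambda label: z @ prototypes[label])

print(classify(np.array([0.95, 0.95, 0.05, 0.05])))  # query near the "cat" examples
```

The point is the workflow, not the toy model: the expensive general-purpose training happens once, and each new task costs only a few labelled prompts.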

2. Multimodality has become a key feature of AI. AI can now learn to understand how to align representations between different modes of data. For example, foundation models can connect text, images, audio, video and robotics. This is powerful because it allows AI to address multimodal applications by seamlessly translating between modalities. But perhaps more importantly, it unlocks the ability to transfer knowledge from a more accessible modality (like text) to a less accessible one (like action). This is beneficial for self-driving cars because it offers new ways to train AI models with data that is more readily available than real-world driving data, which can be expensive and difficult to gather in vast quantities.

LLMs, like this example from ChatGPT, display complex and intelligent reasoning about driving despite never having been behind the wheel. In this illustrative example, the model connects braking or swerving for a ball that rolls onto the road in front of you, to being extra cautious when making that manoeuvre in a school zone, to recognising that the risk of children being present at midnight is lower than during school hours. These connections may be difficult to learn from video experience alone, but they are clearly expressed in language, presenting an opportunity to improve the performance and safety of an autonomous vehicle.
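A common recipe for the cross-modal alignment described above (the one popularised by models like CLIP; the blog does not describe Wayve’s internal training objective) is a symmetric contrastive loss: matched text/image pairs in a batch should score higher than every mismatched pair. A minimal numpy sketch with invented embeddings:

```python
import numpy as np

def info_nce(text_emb, image_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: matched pairs sit on the diagonal
    of the similarity matrix and should out-score all mismatched pairs."""
    # L2-normalise so the dot product is cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature          # (batch, batch) similarity matrix
    labels = np.arange(len(logits))

    def xent(l):
        # Cross-entropy of each row's softmax against the diagonal label.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average both directions: text -> image and image -> text retrieval.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: "aligned" pairs are near-copies; "random" pairs are unrelated.
rng = np.random.default_rng(1)
aligned = rng.standard_normal((4, 16))
loss_aligned = info_nce(aligned, aligned + 0.01 * rng.standard_normal((4, 16)))
loss_random = info_nce(aligned, rng.standard_normal((4, 16)))
print(loss_aligned < loss_random)  # aligned pairs incur the lower loss
```

Once two modalities share an embedding space like this, knowledge expressed in the cheap modality (text) becomes usable from the expensive one (action).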

3. AI exhibits emergent properties as it scales up, helping to address the alignment problem. As a neural network’s capacity (parameters), data and training compute increase, its performance improves. More importantly, new behaviours emerge at scale, often unexpectedly. This is an important result because it means scaling training is a key strategy for overcoming issues with the performance and alignment of our AI models. Using human feedback, we can train our driving models to produce better-aligned responses that achieve the intended objectives, ultimately leading to big improvements in performance and safety. Engineering for scale was a key ingredient in the leap from GPT-1 to GPT-4: multi-billion-parameter neural networks deliver the levels of performance and alignment behind AI’s impact today.

Interestingly, even as large language models keep growing, some in the AI community are advocating for leaner, more efficient systems and other techniques to unlock new capabilities. These optimisations can potentially allow smaller AI models to achieve similar capabilities and thus make AI more accessible. But in computer vision and robotics applications, we have yet to build models as large as today’s LLMs, so there’s still plenty of headroom to grow performance with scale.
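The scaling behaviour behind point 3 is often summarised as a power law: loss falls roughly as a power of model size, which is a straight line on log-log axes. The sketch below generates hypothetical losses from such a law (the sizes, losses and exponent are illustrative, not measurements) and recovers the exponent by a log-log linear fit:

```python
import numpy as np

# Hypothetical (illustrative, not measured) model sizes and eval losses
# following a power law: loss = a * N^(-alpha), as reported in scaling studies.
params = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = 5.0 * params ** -0.07

# On log-log axes the law is linear: log(loss) = log(a) - alpha * log(N),
# so a degree-1 least-squares fit recovers the exponent from its slope.
slope, intercept = np.polyfit(np.log(params), np.log(losses), 1)
alpha = -slope
print(round(alpha, 3))  # -> 0.07, the exponent used to generate the data
```

Fits like this are what let teams forecast how much extra capability another order of magnitude of scale should buy before paying for the training run.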

All of these AI breakthroughs apply to robotics and embodied AI. In fact, the process of fleet learning in our self-driving cars is directly analogous to reinforcement learning from human feedback (RLHF), which turned GPT-3 into ChatGPT.
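The first step of an RLHF-style pipeline is learning a reward model from pairwise human preferences (a Bradley-Terry model). Here is a hedged toy version of that step for a driving analogy: the features, preferences and "hidden" human weights are all invented, and real fleet-learning pipelines are far richer, but the logistic fit on preference pairs is the core mechanism.

```python
import numpy as np

# Toy Bradley-Terry reward model: learn a scalar reward over driving-state
# features from pairwise preferences ("trajectory A was better than B").
rng = np.random.default_rng(2)
w_true = np.array([2.0, -1.0, 0.5])           # hidden "human preference" weights
states = rng.standard_normal((200, 2, 3))     # 200 preference pairs (A, B)
prefs = (states[:, 0] @ w_true > states[:, 1] @ w_true).astype(float)

w = np.zeros(3)
for _ in range(500):                          # gradient ascent on log-likelihood
    diff = states[:, 0] @ w - states[:, 1] @ w
    p = 1.0 / (1.0 + np.exp(-diff))           # P(A preferred | current w)
    grad = ((prefs - p)[:, None] * (states[:, 0] - states[:, 1])).mean(axis=0)
    w += 0.5 * grad

agree = np.mean((states[:, 0] @ w > states[:, 1] @ w) == prefs.astype(bool))
print(f"preference agreement: {agree:.2f}")
```

In the full loop, the learned reward would then steer the driving policy toward behaviour humans prefer, just as ChatGPT's policy was steered by its reward model.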

While these problems may be solved similarly, there are some important differences. The autonomous driving problem is more challenging than language modelling, as it requires training on petabytes of video and other sensor data rather than gigabytes of text data and requires increased levels of safety for deployment on public roads. Despite this increased difficulty, an end-to-end machine learning approach is still the best way to solve this problem.

2023’s AI breakthroughs will unlock embodied AI, opening new possibilities for autonomous driving.

Now that we’ve explored what’s led to the recent breakthroughs in AI, let’s take a closer look at how these advancements are poised to unlock new possibilities for autonomous driving.

Generalisation: The emergent capabilities of foundation models can help overcome the long tail of driving scenarios and edge cases through general-purpose reasoning, making it possible to prompt a vehicle to drive in a scenario it has no prior experience of. Its foundational intelligence would allow it to reason about situations in a generalised way, unlocking the possibility of Level 5 autonomy.


In this video, you can see the diversity of driving scenarios that our AI Driver is learning to generalise to, including busy cities around the UK and weather conditions from bright sun to snowy winters.

Performance: It’s no surprise that the vast knowledge of these foundation models will drive improvements in driving performance. We are exploring ways of accelerating our roadmap by using other domain-agnostic data sources for pre-training models, such as text data. This data isn’t a replacement for on-road testing or data used for safety validation, but it could be used to augment our training data corpus which includes a diverse mixture of data such as on-road expert driving data, fleet data (supplied by our fleet partners), and simulated and re-simulated off-road data. Discovering new ways to pre-train foundation models and learn robustness through other data sources can reduce our fleet data requirements and enable us to train models faster. 

Understanding & Reasoning: Generative AI allows us to use natural language and generative techniques to interrogate and understand our AI models. Our innovative research is helping us shed light on how our AI models understand the world and how they reason to drive through it safely. We’re pioneering ways for our AI to answer questions in natural language, render a video of what it expects to happen next, or even reason about counterfactual changes to the scene.

We are developing self-supervised world models that can imagine what happens next in a driving scene based on conditioning from a short video prompt (top frame). Here’s an example where we prompt the model to imagine what happens at a busy intersection after the light turns green. It generates an entirely new video clip where it imagines that the car drives forward.
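The loop a world model runs at inference time is autoregressive rollout: condition on a short prompt of past frames, then repeatedly feed the model its own prediction. The sketch below uses a toy linear extrapolator as a stand-in for the learned network (real world models predict tokens or latents, not raw values, and this is not a description of Wayve's model):

```python
import numpy as np

def predict_next(frames):
    """Toy dynamics model: continue the motion seen in the last two frames.
    A learned world model would replace this with a neural network."""
    return frames[-1] + (frames[-1] - frames[-2])

# Short video "prompt": two observed frames (e.g. an object's 2D position).
prompt = [np.array([0.0, 0.0]), np.array([1.0, 0.5])]

rollout = list(prompt)
for _ in range(3):                 # imagine 3 future frames autoregressively
    rollout.append(predict_next(rollout))

print(rollout[-1])  # -> [4. 2.]: the imagined motion continues forward
```

Swapping the toy extrapolator for a learned model is what turns this loop into "imagining the car driving forward after the light turns green".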

Human-machine interaction: Aligning robotics with natural language allows us to give instructions to the autonomous vehicle conversationally. This opens up a whole host of possibilities: ‘backseat driving’ the car, personalising the driving experience or providing more flexibility in the service. It may also allow our AV safety operators to provide feedback in real time, fed back to our AI Driver as context tokens. We can then use this feedback to help the model better align with human expectations, improving trust and safety.

Our AI has generated this caption to describe how it navigates this edge-case scenario in our simulator. Our AI explains the scenario and its own driving behaviour, including commenting on the severity and outcome. This presents a transformative opportunity to improve and understand our driving performance through language.

Remote assistance: Aligning our language representations with the AI Driver will allow us to send text prompts to the model, explaining specific behaviours in much greater detail than sending space and time coordinates or driving commands. The richness of this additional data could help increase the speed and accuracy of assisting AVs remotely.

As we consider the potential for embodied intelligence to transform the world of autonomous driving, we must ask ourselves: where will lasting value be built? 

The technology is impressive, but it’s only one piece of the puzzle. To harness the power of AI and create a sustainable future for autonomous driving, we need to think beyond the breakthroughs themselves and consider the broader ecosystem in which it operates.

Companies that successfully deploy Embodied AI in a specific vertical will build long-lasting value. This requires:

  1. Being first to develop the breakaway foundation model in the category. Our teams are focused on pioneering the breakaway foundation model for autonomous driving. 
  2. Creating a data flywheel that can deliver feedback at scale to align the embodied AI with human preferences. Our partnerships with some of the industry’s largest commercial fleets provide the scale of data required to deliver trust and safety. 
  3. Proving out a deployment platform that is safe, so that end users and customers can realise the benefits of embodied AI at scale. Our first-generation AV platform is already delivering value in our partners’ commercial last-mile grocery operations.

At Wayve, our mission is to reimagine autonomous mobility through embodied intelligence. We’ve been pioneering Fleet Learning technology to safely and robustly deploy AI driving models on the road since 2017, making us well-positioned to take advantage of these latest AI breakthroughs. We believe that these advancements in AI will be what unlock autonomous driving at scale.

Are you interested in learning more about embodied AI and its potential applications? Join us at ICRA 2023 in London at the end of May for an exciting opportunity to explore this topic in more detail. We’ll also be presenting at CVPR 2023 in Vancouver in June.