Using natural language to enhance the learning and explainability of Wayve’s foundation models

Natural language for self-driving

Understanding the reasoning behind our AI models’ decisions is crucial to ensuring the development of safe self-driving capabilities. Wayve’s LINGO models introduce new capabilities that significantly improve the interpretability of our AI Driver.

Trained using vision and language as inputs, LINGO-2 can output driving behavior and explain the reasoning behind its actions. This innovation introduces a new way to interpret, explain, and train AI models.

The video above shows a LINGO-2 drive through Central London. The same deep learning model generates both the driving behavior and the textual predictions.

Vision-Language-Action Model

LINGO-2 uses images, driving data, and language to explain causal factors in the driving scene, accelerate training, and adapt driving behaviors to new environments.
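LINGO-2's actual architecture is not public; the sketch below is a hypothetical illustration of what a vision-language-action interface could look like, with every class, field, and rule invented for this example (here a trivial rule-based stub stands in for the learned model).

```python
from dataclasses import dataclass


@dataclass
class DrivingAction:
    steering: float      # normalized: -1 (full left) to 1 (full right)
    acceleration: float  # m/s^2; negative values mean braking


@dataclass
class VLAOutput:
    action: DrivingAction  # what the vehicle should do next
    commentary: str        # natural-language explanation of that action


class ToyVLAModel:
    """Toy stand-in for a vision-language-action model: a real system
    would consume camera frames, not a text scene description."""

    def predict(self, scene: str) -> VLAOutput:
        if "pedestrian" in scene:
            return VLAOutput(
                DrivingAction(steering=0.0, acceleration=-2.5),
                "Braking because a pedestrian is crossing ahead.",
            )
        return VLAOutput(
            DrivingAction(steering=0.0, acceleration=0.5),
            "Accelerating gently; the road ahead is clear.",
        )


out = ToyVLAModel().predict("pedestrian crossing at the zebra")
print(out.commentary)  # Braking because a pedestrian is crossing ahead.
```

The key design point the sketch captures is that the action and the explanation come from the same forward pass, so the commentary reflects the same internal state that produced the behavior.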

Language can also be used for model introspection, where we can ask a driving model about its driving decisions. This can open up new possibilities for interacting with autonomous driving systems through dialogue, where passengers can ask the technology what it’s doing and why.

Language enhances the interpretation, explanation, and training of driving models

While LINGO-2 is driving, it can respond to questions about the scene and what it is doing. The example below is taken from a virtual test in our neural simulator, Ghost Gym.

LINGO’s capabilities

LINGO can enhance the trustworthiness of assisted and autonomous driving systems by providing insight into the AI’s scene understanding, reasoning, and decision-making. Additionally, integrating language can enable more efficient learning for handling new or long-tail scenarios by incorporating descriptions of driving actions and causal reasoning into the model’s training.

Driving commentary

LINGO-2 provides continuous commentary in natural language explaining the model’s driving actions, helping users understand its focus and behavior.

Adapting driving through language

We can prompt LINGO-2 with constrained navigation commands (e.g., "pull over" or "turn right") and adapt the vehicle's behavior to aid model training.
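As a minimal sketch of the idea of conditioning behavior on a constrained command set, the snippet below maps commands to behavior parameters; the command list, parameter names, and values are all invented for illustration and do not reflect Wayve's interface.

```python
# Hypothetical mapping from a constrained navigation command to
# behavior parameters a planner could consume.
COMMANDS = {
    "pull over":  {"target_lane": "kerb",  "target_speed": 0.0},
    "turn right": {"target_lane": "right", "target_speed": 3.0},
    "continue":   {"target_lane": "keep",  "target_speed": None},
}


def condition_on_command(command: str) -> dict:
    """Look up behavior parameters for a constrained navigation command."""
    key = command.strip().lower().rstrip(".")
    if key not in COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    return COMMANDS[key]


print(condition_on_command("Pull over"))
# {'target_lane': 'kerb', 'target_speed': 0.0}
```

Constraining the command vocabulary is what makes this useful for training: each prompt maps to a well-defined, verifiable change in behavior.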

Visual question and answer (VQA)

LINGO-2 provides answers to questions about the scene and its driving behavior, showcasing its ability to understand the vehicle’s surroundings and navigate safely.

Referential segmentation

LINGO uses referential segmentation to visually “show and tell” its focus in the scene, strengthening the connection between language and vision tasks.
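To make "show and tell" concrete, here is a toy sketch of consuming a referential-segmentation output, modeled as a phrase paired with a binary pixel mask; the function and mask format are invented for illustration, not Wayve's representation.

```python
# Hypothetical referential-segmentation output: a phrase paired with a
# binary mask over the image grid (1 = pixels the phrase refers to).
def describe_referent(phrase: str, mask: list[list[int]]) -> str:
    """Summarize what fraction of the frame a phrase grounds to."""
    total = sum(len(row) for row in mask)
    hits = sum(sum(row) for row in mask)
    return f"{phrase!r} grounds to {100 * hits / total:.0f}% of the frame"


mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
print(describe_referent("the cyclist ahead", mask))
# 'the cyclist ahead' grounds to 44% of the frame
```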

Watch LINGO reason about night-time driving hazards.

Read our research blogs


LingoQA: Video Question Answering for Autonomous Driving (21 Dec 2023)
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving (13 Oct 2023)