21 December 2023  |  Research

Ghost Gym: A Neural Simulator for Autonomous Driving

Simulation is a powerful complement to real-world testing, and it plays a crucial role in Wayve’s workflow for building safe and reliable AI driving models. This blog introduces Wayve Ghost Gym, our closed-loop data-driven neural simulator. It uses state-of-the-art neural rendering to create photorealistic 4D worlds for generating thousands of simulated scenarios to train, test, and debug our end-to-end AI driving models.

In the image, the ego vehicle is shown in grey.

A data-driven approach to simulation

When testing autonomous vehicles (AVs) in complex urban environments such as Central London, no two drives are the same; even the same street presents fluctuating scenarios from one moment to the next. This is why developing AV technology and ensuring its safety demands a multifaceted approach to testing and validation.

Traditional on-road tests yield valuable insights but cannot offer controlled environments to A/B test our models, and they are operationally resource-intensive. Additionally, many “edge case” scenarios are too rare or unsafe to test on-road. Simulation helps solve these challenges, which makes it a powerful complement to real-world testing. With “off-road” evaluation, we can replay specific driving scenarios to debug and evaluate our models within a controlled, virtual environment. Furthermore, simulation supports large-scale synthetic data generation, allowing us to leverage thousands of diverse driving scenarios for comprehensive testing and training.

At Wayve, we continuously improve and expand our simulation techniques to enhance realism, controllability, diversity, and scalability. True to our DNA, we are exploring learning-based approaches to establish controllable and photorealistic simulators. To overcome the challenge of building high-quality simulations that accurately represent how our AI models drive in the real world, we have evolved our previous procedural-based simulation tools, such as Wayve Infinity, and incorporated advances in neural rendering.

This blog discusses how we’re using Ghost Gym, our closed-loop data-driven neural simulator to efficiently test and validate our end-to-end AI driving models in a safe, controlled virtual environment.

In the video, the ego vehicle is shown in grey.

Introducing Ghost Gym

Ghost Gym is a closed-loop neural simulator built with data-driven techniques to model the environment and the actors within it. We achieve this by using learnt neural rendering techniques that model the geometry, motion, and appearance of the scene, as well as how our robot would interact with the given environment. 

Traditional open-loop replay does not propagate the vehicle’s simulated actions to modify the state of the simulated environment, thus limiting its usefulness for testing new behaviours. A closed-loop replay system, on the other hand, can simulate how the environment would change given the vehicle’s revised driving behaviour. In this way, Wayve’s Ghost Gym unlocks far more powerful debugging and testing capabilities. This is especially important for end-to-end (e2e) AI models that perform prediction and planning in a single neural network.
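The distinction between open- and closed-loop replay can be made concrete with a short sketch. The functions below are illustrative stand-ins, not Wayve’s actual APIs: in open-loop replay the model’s actions never alter its inputs, while in closed loop each action updates the simulated state that produces the next frame.

```python
# Hypothetical sketch: open-loop vs closed-loop replay.
# `driving_model`, `render`, and `vehicle_dynamics` are illustrative stand-ins.

def open_loop_replay(logged_frames, driving_model):
    """Open loop: the model sees logged frames regardless of its actions."""
    actions = []
    for frame in logged_frames:
        actions.append(driving_model(frame))  # actions never feed back
    return actions

def closed_loop_replay(initial_state, driving_model, render, vehicle_dynamics, steps):
    """Closed loop: each action updates the simulated state, changing
    what the model sees at the next step."""
    state, actions = initial_state, []
    for _ in range(steps):
        frame = render(state)                     # renderer produces the view
        action = driving_model(frame)             # model chooses an action
        state = vehicle_dynamics(state, action)   # action feeds back into state
        actions.append(action)
    return actions
```

In the closed-loop version, a new driving policy can diverge from the logged trajectory and the simulator will keep producing consistent observations, which is exactly what open-loop replay cannot do.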

Since Wayve’s AV2.0 approach couples perception tightly with planning in an end-to-end AI model, our simulation must offer exceptionally high photorealism and accuracy.

Building Blocks of Ghost Gym

To build an effective closed-loop simulation, we must accurately model how our actions affect both the environment and the vehicle’s state. Our data-driven simulation approach allows us to reconstruct and render scenes with a high degree of photorealism and accuracy using a neural renderer. We also need a high-fidelity simulated robot and an accurate model of the vehicle dynamics to update the robot’s state based on the current actuation commands.

Neural Renderer

An explosion in research around Neural Radiance Fields (NeRFs) and other similar neural rendering techniques has led to a paradigm shift in the field of synthetic data generation. Each time Wayve’s AVs drive in London, we use their captured sensor data to learn a 4D representation of the world that we can render at different points in space and time. By learning this world representation from actual driving data, we capture scene properties with high accuracy and realism, narrowing the domain gap between the real world and our simulated one.
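Conceptually, a learned 4D scene representation is a function that maps a spatial position and a time to appearance properties. The toy sketch below, with a NeRF-style positional encoding, is only a conceptual stand-in for a trained neural renderer; the names and shapes are assumptions for illustration.

```python
import math

# Toy sketch of querying a learned 4D scene representation: a learned
# function of encoded (x, y, z, t) returns colour and density. The
# `field` argument stands in for a trained network.

def positional_encoding(x, num_freqs=4):
    """NeRF-style encoding: map a scalar coordinate to sin/cos features."""
    feats = []
    for k in range(num_freqs):
        feats.append(math.sin((2 ** k) * x))
        feats.append(math.cos((2 ** k) * x))
    return feats

def query_field(position, t, field):
    """position: (x, y, z); t: time; field: learned callable on the encoding."""
    encoded = []
    for coord in (*position, t):
        encoded.extend(positional_encoding(coord))
    return field(encoded)  # e.g. (rgb, density) at this point in space-time
```

A full renderer would integrate many such queries along camera rays to form an image; the point here is simply that space and time are both inputs, which is what makes the representation 4D.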

In contrast, traditional simulation techniques require intense hand-crafted assets and rules, which are difficult to make visually and physically realistic. In the same way that ML-based approaches for driving are eclipsing rule-based methods, learned scene representations are proving to be exceptionally performant and scalable. 

While the blistering pace of research continues, we’ve begun deriving value in production from current approaches. We can already simulate a wide variety of scenarios with high accuracy and realism, allowing us to validate our technology across a wide spectrum of scenarios and accelerate our development cycle.

A High-Fidelity Simulated Robot

An underrated but equally crucial aspect of closed-loop simulation is modelling the internal systems of the robot. In a real-world scenario, various processes within a robot operate asynchronously at different frequencies—from model inference and control mechanisms to drive-by-wire systems—presenting a complex symphony that must be meticulously replicated in our simulation. Following our data-driven approach, our system collects the raw sensor data and logs from actual drives and replays these asynchronous processes offline in order to simulate the robot as accurately as possible.

The above diagram shows a simplified view of this process with only four components: our end-to-end AI driving model, a single camera, a controller, and the drive-by-wire system.

In a live drive, as these sensors and components run, we log information about the timing of every event, enabling us to play these asynchronous processes back in a deterministic fashion. The modelling of these processes is essential to reproducing driving behaviour that most closely models what would happen in the real world, giving us high confidence in Ghost Gym’s simulations.
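One way to sketch this deterministic playback is an event queue ordered by logged timestamps: each component’s events fire in exactly the order they occurred on the road, so every offline run reproduces the same interleaving. The component names and handler interface below are hypothetical, used only to illustrate the idea.

```python
import heapq

# Illustrative sketch of deterministic replay of asynchronous components
# (model, camera, controller, drive-by-wire). Events logged during a live
# drive are popped strictly in timestamp order, so replay is repeatable.

def replay_events(logged_events, handlers):
    """logged_events: iterable of (timestamp, component, payload) tuples.
    handlers: dict mapping component name -> callback(timestamp, payload)."""
    queue = list(logged_events)
    heapq.heapify(queue)                    # order strictly by timestamp
    trace = []
    while queue:
        ts, component, payload = heapq.heappop(queue)
        handlers[component](ts, payload)    # fire in deterministic order
        trace.append((ts, component))
    return trace
```

Because the ordering comes from the log rather than from wall-clock scheduling, two replays of the same drive always interleave the components identically.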

An Accurate Model of Vehicle Dynamics

We start by generating synthetic sensor data and feeding it to our high-fidelity simulated robot to produce the desired actions. The final piece of this closed-loop simulation requires us to carefully specify how the simulated ego-vehicle’s actions interact with the environment to update the simulation state.

An accurate vehicle dynamics model captures the intricate physics of the vehicle’s movement, such as its response to control inputs, external forces, and environmental factors. This level of detail is essential for closed-loop simulation, where the robot’s action feeds back as an input, ensuring that the feedback loop realistically reflects what would happen in the real world.
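As a minimal illustration of this feedback step, a kinematic bicycle model shows how control inputs update the ego state each tick. This is a simplified stand-in, not Wayve’s dynamics model: a production model would also capture tyre forces, actuator lag, and other external factors the paragraph above mentions.

```python
import math

# Minimal kinematic bicycle model: a sketch of how acceleration and
# steering commands update the ego-vehicle state in the feedback loop.
# Parameters (wheelbase, timestep) are illustrative defaults.

def step_bicycle(state, accel, steer, wheelbase=2.7, dt=0.05):
    """state: (x, y, heading, speed); accel in m/s^2, steer in radians."""
    x, y, heading, v = state
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += v / wheelbase * math.tan(steer) * dt
    v = max(0.0, v + accel * dt)            # clamp: no reversing in this sketch
    return (x, y, heading, v)
```

Calling this function once per simulation tick with the latest actuation commands closes the loop: the updated state is what the neural renderer sees when producing the next frame.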

By aligning these three components (a neural renderer, a high-fidelity simulated robot, and an accurate model of the vehicle dynamics), Ghost Gym becomes a robust virtual testing ground that allows us to test our AI driving models with precise scene representation and accurate driving behaviour.

AV2.0 Development Cycle with Ghost Gym

Ghost Gym powers multiple steps of our AV2.0 development process. By being able to reproduce model failures offline, we can dive deeper into the state of each individual component to get better visibility into the reason for the failure. We can also subsequently create “unit tests” to catch regressions in the model or robotics stack before they hit the road. To expand the scope of these tests, we can incorporate a wide variety of settings and extensively evaluate our model in a safe and controlled environment. 

Reproducing On-road Model Behaviour Offline

Intervention data is one of our most valuable data signals. It comes from segments of AV testing where our human safety operator intervenes to correct the car’s behaviour with a manual driving correction. This human feedback aligns our model on how to drive like a safe and competent driver, and we can re-simulate interventions in Ghost Gym to ensure that future models do not make the same error. 

Below is an example of an intervention from on-road testing. Here, the AI driving model is overly cautious at an empty zebra crossing and slows down when no pedestrians are present. We can see this by looking at the plot of the model’s speed through the run. The plot is green when the vehicle is driving autonomously and red when the safety operator takes control of the vehicle.

In Ghost Gym, we can reproduce this same failure mode by creating a 4D representation of the given intervention. In the video below, we run the same AI driving model within Ghost Gym and compare it to the logged vehicle speeds from the actual run. In addition to the vehicle’s actual speed plotted in green and red, we can see the simulated robot’s speed in blue when replaying this in Ghost Gym. We can see that the model exhibits this same slowing behaviour offline, with the blue plot tracking closely with the green one, validating the fidelity of our simulator. We can also see what the model would have done after the intervention, by allowing the simulation to continue beyond the point of the correction.

Creating AV2.0 Unit Tests for Model Development

To stop the model from unnecessarily slowing down at empty zebra crossings, our model developers focused on addressing speed maintenance in subsequent model iterations. We then tested the next model iteration in Ghost Gym to check for better speed consistency in the same driving scenario.

In the unit test below, we can see the speed of the new model driving in Ghost Gym in blue and the logged speeds from the original model displayed in the green/red plot. The new model’s speed stays high throughout the segment, indicating that this model does not exhibit the same issues as the original model. This unit test in Ghost Gym quickly gave our researchers a concrete failure mode to work towards improving and can serve as a regression test for subsequent model releases.
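A speed-consistency check like this can be sketched as two small assertions over speed traces. The helpers and thresholds below are hypothetical illustrations of the idea, not Wayve’s actual test tooling.

```python
# Hypothetical sketch of a driving "unit test": compare a model's simulated
# speed trace against a floor (did it avoid the over-cautious slowdown?) and
# against the logged trace (does the re-simulation track the real drive?).

def min_speed_maintained(speed_trace, floor_mps):
    """Pass if the simulated vehicle never drops below a speed floor."""
    return min(speed_trace) >= floor_mps

def max_trace_gap(logged, simulated):
    """Largest per-step speed difference between logged and simulated runs."""
    return max(abs(a - b) for a, b in zip(logged, simulated))
```

Run over a library of re-simulated scenarios, checks of this shape act as regression tests: a new model release fails the suite if it reintroduces a slowdown or diverges too far from known-good behaviour.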

Scaling Up Insights

Above, we have provided an example of improving our driving model behaviour in a single scenario. While this is a helpful first step for debugging and development, we are developing a wide variety of such scenarios against which to run our models. This allows us to measure model performance holistically. 

By evaluating our models on a large controlled test set of scenarios, we can quantify and compare model performance across different scenarios at a scale where we can achieve statistical confidence that far outpaces what we could do with on-road testing. This allows us to validate our driving models on large-scale data offline, thus having high confidence in their performance and safety.

Here are additional examples of re-simulated interventions. These examples, similar to the one mentioned above, demonstrate our ability to reproduce model behaviour offline in Ghost Gym, allowing the creation of further unit tests. For simplicity, we are only showing the reproducibility speed graphs. Above each example, we describe the reason for the original intervention as well as show the replayed model behaviour.

Example: Zebra Crossing

Real-world Behaviour (green/red line): The model slows down excessively, resulting in a correction from the safety operator to increase the vehicle speed.

Replayed Behaviour (blue line): We are able to reproduce the reason for the failure and can see the model would have continued to slow down if not for the correction.

Example: Right turn

Real-world Behaviour (green/red line): The model slows down excessively after performing a right turn, resulting in a correction from the safety operator to increase the vehicle speed. 

Replayed Behaviour (blue line): We are able to reproduce the reason for the failure and can see the model would have sped up slightly if not for the correction.

Example: Winding Road

Real-world Behaviour (green/red line): The model is going slightly too fast to keep inside the lane safely on this winding road, resulting in a correction from the safety operator to reduce the vehicle speed and keep the vehicle in its lane.

Replayed Behaviour (blue line): We are able to reproduce the reason for the failure and can see the model would have overshot the lane line slightly if not for the correction.

Future work

At Wayve, we continue to push the boundaries and iterate on cutting-edge techniques while productionising solutions like Ghost Gym that improve our off-road measurement capabilities today. Ghost Gym has already proven an impactful tool for developing high-fidelity re-simulations that can speed up and enhance our model development iteration cycle. However, there are still many open research questions in the field of neural rendering, such as rendering efficiency and light source estimation.

One main area of research we are pursuing is the modelling of dynamic elements, such as pedestrians, motorbikes, cyclists, and other vehicles on the road. This continues to be an area of fervent research, both within and outside of Wayve. Accurately modelling the high-level and reactive behaviours of these agents affects the closed-loop performance of Ghost Gym and remains an open research problem. We are exploring how to use our world modelling capabilities from GAIA-1 to model these reactive behaviours. Generative world modelling can also be a powerful tool to create different plausible future scenarios from past context.

We are also exploring other approaches, for example, learning from diverse data sources and large-scale datasets to form strong priors that enhance our simulation’s predictive power. Finally, as our neural rendering technology advances, we will use it to generate synthetic data to improve, generalise, and robustify our driving models.

Stay tuned for future blogs on neural rendering, where we will focus on the technical aspects and advancements in our approach, and on synthetic data, which will focus on our approach for generating out-of-distribution synthetic data to accelerate our development process.