Dreaming about Driving
During day-dreams, humans are thought to simulate possible future experiences by recombining elements from the past. This is known as imagination. Imagination is not an exact replay of past experiences, but a simulation of something that is new and feasible given our model of how the world works. In neuroscience, this hypothesis is known as constructive episodic simulation (Schacter & Addis, 2007) and is believed to play an important role in learning new skills. A key piece of machinery in this process is our internal prediction model of the world. From a young age, humans learn an internal simulator (prediction model) of the world from real data. Our brains then use this simulator to learn to perform all sorts of tasks, from catching a ball to intricately knitting a sweater.
What about driving a car? Have a look at this scene.
Based on this still image, how would you accelerate and brake in this moment? It’s almost impossible to say! Is the car in front speeding up? Slowing down? Let’s add some temporal information: here is a short sequence of driving up to the scene.
Now our prediction models really kick in. We observe the motion of the other relevant agents in the scene, predict where they are headed and make an informed driving decision. Notice we do not need to worry about the cars in the distance; they do not affect our short-term decisions. What makes humans so good at driving in complicated urban scenes is our ability to predict motion, focus on what is relevant and ignore what is not.
In this blog post, we show how to use these ideas of prediction and imagination to teach a car to drive autonomously.
We trained a car to drive in its dreams.
A reinforcement learning system which was trained entirely in imagination drives a car in the real world.
The car is being driven by a model-based deep reinforcement learning system. This algorithm learns a prediction model from real-world data that was collected offline. The driving policy can then learn to drive in new scenarios imagined by the prediction model, rather than only in the scenarios that were recorded. The training pipeline we followed was based on World Models (Ha & Schmidhuber, 2018), using monocular camera input on our autonomous vehicle.
We trained our system with data collected on sunny, bright days during this summer’s heat wave in Cambridge, UK. Here are examples of random driving episodes we collected.
Examples of the real-world data used to train the prediction model.
Our model had never seen rain before.
Nevertheless, the system worked just fine during the recent summer rain storms.
Despite being trained with only sunny weather data, the same system can successfully drive in the rain (note: we tripped a safety limit on the steering motor torque while crossing the cattle grate, requiring intervention).
By focusing on what is relevant to the driving decision making, our system is not distracted by the reflections produced by puddles, or the droplets of water on the camera lens. In fact, we believe focusing solely on what is relevant for driving is what will make our approach transferable to new situations unseen during training.
Here is an overview of how we trained our system. First we train the prediction model on collected data.
This is a sequence modelling task. We use a variational autoencoder to encode the images into a low-dimensional state. We then train a probabilistic recurrent neural network, which forms the prediction model: it estimates a probability distribution over the next state given the current state and action. We train the encoder and prediction model on real-world data.
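To make the two components concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the system described in this post: the function names, the latent size of 4, the fixed variances and the simple arithmetic standing in for the neural networks are all hypothetical.

```python
import math
import random

LATENT_DIM = 4  # assumed size of the low-dimensional state


def encode(image):
    """Stand-in for the VAE encoder: image -> (mean, log-variance).

    A real encoder would be a neural network; averaging chunks of
    pixels is a placeholder so the example runs on its own.
    """
    chunk = len(image) // LATENT_DIM
    mean = [sum(image[i * chunk:(i + 1) * chunk]) / chunk for i in range(LATENT_DIM)]
    log_var = [-2.0] * LATENT_DIM  # fixed, assumed uncertainty
    return mean, log_var


def sample(mean, log_var, rng):
    """Reparameterisation: state = mean + std * standard normal noise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def predict_next(state, action, hidden):
    """Stand-in for the probabilistic recurrent prediction model.

    Returns the distribution over the next state (mean, log-variance)
    together with the updated recurrent memory.
    """
    new_hidden = [math.tanh(0.9 * h + 0.1 * s + 0.05 * action) for h, s in zip(hidden, state)]
    mean = [s + 0.1 * h for s, h in zip(state, new_hidden)]
    log_var = [-4.0] * LATENT_DIM
    return (mean, log_var), new_hidden


rng = random.Random(0)
image = [rng.random() for _ in range(64)]          # fake 64-pixel camera frame
state = sample(*encode(image), rng)                # encode, then sample a state
(next_mean, next_log_var), hidden = predict_next(state, action=0.2, hidden=[0.0] * LATENT_DIM)
next_state = sample(next_mean, next_log_var, rng)  # one imagined step
```

The point of the sketch is the interface: images are only needed to obtain the first state, after which the prediction model can be stepped forward using states and actions alone.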
Next we initialise a driving policy, and assess how it would perform using the prediction model in simulated experiences.
We can train over many simulated sequences, by imagining experiences. During training, we can also visualise the imagined sequences to observe the learned policy.
Imagining an untrained control policy driving off the road.
Imagining a fully trained control policy driving around a bend.
The first animation shows an untrained policy imagining itself driving. You can observe that the prediction model imagines that this untrained policy will veer off the road. In contrast, the second animation shows a fully trained policy that successfully imagines driving around a bend in the road.
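The contrast between the two policies can be reproduced in a toy setting. The sketch below assumes a deliberately simple stand-in for the learned prediction model, where the state is a single number (lateral offset from the lane centre) and the policy is a linear steering rule. All names and constants are hypothetical and chosen only for illustration.

```python
import random


def imagine_step(offset, steering, rng, dt=0.1, noise=0.02):
    """Toy stand-in for the learned prediction model (an assumption).

    The state is one number: lateral offset from the lane centre.
    """
    return offset + dt * steering + rng.gauss(0.0, noise)


def policy(offset, gain):
    """Linear steering policy; `gain` is its only parameter."""
    return -gain * offset


def imagined_rollout(gain, start_offset=0.3, steps=50, road_half_width=1.0, seed=0):
    """Roll the policy forward inside the model, without a real car.

    Returns the number of steps completed before leaving the road.
    """
    rng = random.Random(seed)
    offset = start_offset
    for t in range(steps):
        offset = imagine_step(offset, policy(offset, gain), rng)
        if abs(offset) > road_half_width:
            return t  # the model imagines the car leaving the road
    return steps


untrained = imagined_rollout(gain=-3.0)  # steers away from the lane centre
trained = imagined_rollout(gain=2.0)     # steers back towards the lane centre
print(untrained, trained)
```

With a negative gain the imagined offset grows at every step and the rollout ends early, while a positive gain keeps the imagined car on the road for the full sequence.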
We improve the policy by dreaming about driving, getting better and better over many imagined driving sequences.
Examples of the model dreaming about driving down a road during training.
Our algorithm dreams about driving, getting better with every dream.
Learning a data-driven simulator is both computationally and statistically more scalable than hand-engineering one. In our 'Learning to drive in a day' work, we iterated between exploration and optimisation using the limited on-board computation of a single robotic car. However, using a prediction model, we can dream to drive on a massively parallel server, independent of the robotic vehicle. Furthermore, traditional simulation approaches require people to hand-engineer individual situations to cover a wide variety of driving scenarios. Our approach of learning a prediction model from data automates the process of scenario generation, taking the human engineer out of the loop.
Typically there are deviations in appearance and behaviour between simulator solutions and the real world, which makes it challenging to directly leverage knowledge acquired in simulation. Our system does not have this limitation. Since we have trained directly on real-world data, there is close to no difference between the simulation and the real world.
Finally, since the learned simulator is differentiable, we can directly optimise a driving policy using gradient descent.
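As a rough illustration of what optimising through a differentiable simulator means, the sketch below uses a hypothetical one-dimensional model (lateral offset from the lane centre) and a linear steering policy with a single gain parameter. Because this toy model is so simple, the derivative of the rollout cost with respect to the gain is propagated by hand with the chain rule; a real system would rely on automatic differentiation in a deep learning framework. The model, the cost and the learning rate are all assumptions.

```python
def rollout_cost_and_grad(gain, start_offset=0.3, steps=50, dt=0.1):
    """Cost of an imagined rollout and its derivative w.r.t. the policy gain.

    Model (assumed, deterministic): offset' = offset + dt * steering
    Policy:                          steering = -gain * offset
    Cost:                            sum of squared offsets
    """
    offset, d_offset = start_offset, 0.0  # d_offset = d(offset)/d(gain)
    cost, grad = 0.0, 0.0
    for _ in range(steps):
        # Chain rule through one model step (uses the pre-update offset).
        d_offset = (1.0 - dt * gain) * d_offset - dt * offset
        offset = (1.0 - dt * gain) * offset
        cost += offset ** 2
        grad += 2.0 * offset * d_offset
    return cost, grad


gain = 0.0  # untrained policy: never steers
initial_cost, _ = rollout_cost_and_grad(gain)
for _ in range(200):
    _, grad = rollout_cost_and_grad(gain)
    gain -= 0.05 * grad  # plain gradient descent step
final_cost, _ = rollout_cost_and_grad(gain)
```

No sampling of random actions is needed here: the gradient says directly how the gain should change to reduce the imagined cost, which is the advantage a differentiable model offers over a black-box simulator.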
The framework outlined in this post requires powerful temporal prediction models. So far, computer vision research has only scratched the surface but has an exciting road ahead.
Here is an example of a model predicting how the appearance, semantics and geometry of a driving scene may evolve over 3 seconds.
Predicting appearance, semantic segmentation and depth of an urban road scene three seconds into the future.
Wayve is committed to developing richer and more robust temporal prediction models and believes this is key to building intelligent and safe autonomous vehicles.
Special thanks: We would like to thank StreetDrone for building us an awesome robotic vehicle, Admiral for insuring our vehicle trials and the Cambridge Polo Club for granting us access to their private land for our autonomous driving research.