11 December 2018  |  Research

Simulation training, real driving

Our autonomous car drove on real UK roads after learning to drive solely in simulation.


Our procedurally generated simulation environment.

Our algorithm, trained in simulation, driving in the real world.

Leveraging simulation is a powerful way to gain experience in situations that are expensive, dangerous or rare in the real world. For example, medical surgeons of the future will go through rigorous training not just on real patients, but on simulated ones. In simulation a mistake costs next to nothing and provides a valuable lesson, whereas on a real patient it can be the difference between life and death. Note the subtle but important distinction between a trainee surgeon actually training in a simulated surgical environment versus only being tested in one through examination.

Similarly, would you not want an autonomous car to have extensively learnt to deal with rare edge cases in simulation? Most self-driving car companies use simulation to validate their systems; we go a step further and train our car to drive in simulation, then transfer the learned knowledge to the real world.

Our agent learnt to drive in simulation, with no real world demonstrations.

It then drove on never-seen-before real roads.


While this is only a first step on relatively quiet roads with few other road agents, we believe the results are remarkable. For example, the simulated world we built for training has cartoon-like rendering and randomly generated procedural content that does not match the roads on which we tested in any meaningful way. Our testing environments include both urban and rural driving with different road types, lighting and weather conditions, far more variety than our simulated environment exhibits. Nonetheless, the learned policy was able to generalise to this extra variety in the real world.

Our end-to-end zero-shot framework combines image translation and behavioural cloning.

Instead of treating simulation and the real world as two distinct domains, we have designed a framework that combines both: a steering policy trained in simulation exhibits similar behaviour in the real world without ever seeing a real demonstration.
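To make the zero-shot setup concrete, here is a minimal sketch (in PyTorch, with dummy tensors; the shapes and names are our own illustration, not values from the paper) of what each domain contributes during training:

```python
import torch

# Illustrative sketch of what each domain contributes during training.
# Shapes and names are assumptions for the sketches in this post.
sim_images   = torch.rand(4, 3, 64, 64)   # rendered frames from the simulator
sim_steering = torch.rand(4, 1) * 2 - 1   # steering labels, free in simulation
real_images  = torch.rand(4, 3, 64, 64)   # real camera frames, with no labels at all

# The control decoder is only ever supervised with sim_steering; real images
# enter training purely through the shared latent space and the translation
# losses, so the policy never sees a real-world demonstration.
```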

Diagram of our model: a pair of convolutional variational auto-encoder style networks originally used for image translation (Unsupervised Image-to-Image Translation Networks, UNIT).

Looking in more detail at how we bridge both domains, our model is composed of a pair of convolutional variational auto-encoder style networks originally used for image translation (Unsupervised Image-to-Image Translation Networks, UNIT). The model translates images from one domain to another by first encoding an image, X, into a common latent space, Z, before decoding it into the other domain. The decoders are optimised to match the appearance of their respective domains using domain-specific discriminators, similar to CycleGAN, together with cycle-consistency reconstruction losses.
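The snippet below wires up a heavily simplified stand-in for this translator, continuing from the dummy tensors above. The layer sizes, the least-squares GAN term and the omission of UNIT's weight sharing and VAE prior are all simplifications for illustration, not the architecture from the paper:

```python
import torch.nn as nn
import torch.nn.functional as F

def conv_encoder():
    # Maps a 3-channel image to a latent feature map in the shared space Z.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
    )

def conv_decoder():
    # Maps a latent feature map back to an image in one domain.
    return nn.Sequential(
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
    )

def discriminator():
    # Patch-wise real/fake score for one domain.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, 4, stride=2, padding=1),
    )

enc_sim, enc_real = conv_encoder(), conv_encoder()
dec_sim, dec_real = conv_decoder(), conv_decoder()
disc_sim, disc_real = discriminator(), discriminator()

x_sim, x_real = sim_images, real_images  # dummy frames from the sketch above

# Encode each domain into the shared latent space Z.
z_sim, z_real = enc_sim(x_sim), enc_real(x_real)

# Reconstruct within-domain and translate across domains.
recon_sim, recon_real = dec_sim(z_sim), dec_real(z_real)
fake_real, fake_sim   = dec_real(z_sim), dec_sim(z_real)

# Cycle: translate, re-encode with the other encoder, decode back home.
cycle_sim  = dec_sim(enc_real(fake_real))
cycle_real = dec_real(enc_sim(fake_sim))

loss_recon = F.l1_loss(recon_sim, x_sim) + F.l1_loss(recon_real, x_real)
loss_cycle = F.l1_loss(cycle_sim, x_sim) + F.l1_loss(cycle_real, x_real)

# Adversarial terms: translated images should fool the target-domain
# discriminator (least-squares GAN objective used here as an example).
loss_gan = ((disc_real(fake_real) - 1) ** 2).mean() + \
           ((disc_sim(fake_sim) - 1) ** 2).mean()

loss_translation = loss_recon + loss_cycle + loss_gan
```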

Without any known alignment or correspondences between the two domains, the model is able to translate between them. Below is one such example, which captures the main layout of the scene despite the model never having seen anything close to an exact replica of a Cambridge street. Note that the translation of the car in front is inconsistent, as our simulated environment does not yet contain other dynamic agents.

To facilitate driving, we augmented the UNIT framework with the auxiliary task of predicting the steering control, c, of a vehicle as it drives down a simulated road. An additional decoder network maps the latent representation to the control vector and is supervised with simulated labels. Through joint training of the image translator and the controller, the model learns a control policy that can steer from images originating in the real domain, without ever needing explicit control labels from the real world. More details can be found in our paper.
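Continuing the sketch above, an auxiliary control head of this kind could sit on the shared latent and be trained jointly with the translator. The pooling, head architecture and unit loss weights below are assumptions; the final lines show the zero-shot deployment path on a real frame:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ControlDecoder(nn.Module):
    # Toy control head: pools the shared latent and regresses steering c.
    def __init__(self, latent_channels=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(latent_channels, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),   # single steering value in [-1, 1]
        )

    def forward(self, z):
        return self.head(z.mean(dim=(2, 3)))  # global-average-pool, then regress

control = ControlDecoder()

# Training: steering supervision comes only from the simulated domain.
z_sim = enc_sim(x_sim)
loss_control = F.mse_loss(control(z_sim), sim_steering)

# Joint objective: translation losses keep the latent domain-agnostic,
# the control loss makes it predictive of steering.
loss = loss_translation + loss_control

# Deployment (zero-shot): a real camera frame goes through the real-domain
# encoder into the same latent space, and the control decoder steers.
with torch.no_grad():
    steering_for_real_frame = control(enc_real(x_real))
```

Because both encoders map into the same latent space, the control head never needs to know which domain an image came from.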

Notice that the visual fidelity of the simulator is not of paramount importance when learning to drive. Our simulated world is cartoon-like and nowhere near as artistically detailed as some of the latest driving video games on the market — it does not need to be photorealistic. Content fidelity matters much more than photorealism. However, simulating other agent behaviour effectively is important and remains a big challenge.

Example of Wayve's simulation training showing virtual vehicles in an urban setting.

The real world is also significantly more diverse and complicated than what we are able to simulate. This is often referred to as the reality gap. In our Dreaming About Driving blog post, we showed that building a simulator directly from real-world data can help to bridge the reality gap. But therein lies another issue: a simulator built purely on collected driving data will lack the very edge cases we wish to overcome.

Going forward, we must leverage both real-world driving data and carefully designed edge-case simulation scenarios to build the best driving environment possible for training an autonomous agent.

 

Download full paper

 

With special thanks

We would like to thank StreetDrone for building us an awesome robotic vehicle and Admiral for insuring our vehicle trials.