22 October 2020 | Engineering
Scaling machine learning from garage to fleet with Microsoft Azure
Wayve’s Fleet Learning Loop has evolved from scripts running on servers in a ‘house-office’ bedroom to petabyte-scale services running in the cloud. In this article we’re excited to share how we collaborate with Microsoft as we build a unique machine learning platform for autonomous driving.
See more in the Microsoft Transform blog, in this statement from Wayve and Microsoft, or in the live discussion at Microsoft Ignite.

Image: Where it all began: outside Wayve’s house-office with our first autonomous vehicle.
In the beginning…
Our self-driving system is data-driven at every layer, using end-to-end deep learning, allowing for continuous optimisation without re-engineering. We are betting on a combination of training methods — Imitation Learning to get us started safely, and Reinforcement Learning for refinement of the policy — to develop artificial intelligence for autonomous driving.
Wayve started out in Cambridge with a single vehicle, a Renault Twizy we retrofitted for self-driving. In our first demo, we showed learning to drive in a day, using Reinforcement Learning running on the car: the model learned to lane-follow in real time from only a handful of feedback examples from a safety driver. This was exciting, as it was the first time we’d seen reinforcement learning reach the data-efficiency needed for real-world mobile robotics. However, it did not scale: it relied on online data, with all machine learning training running on an onboard computer. To scale to public roads, we needed more data.
We started data collection with our Twizy: each employee with a driving license took turns to go out and do a random drive around Cambridge. We even had a leaderboard showing the number of hours of data collected! Eventually we built up a dataset of around 40 hours of decent-quality driving data, after filtering. We stored this data on a server in our office, with 12 HDDs in RAID.
We quickly learned that most driving data is pretty boring: driving straight down the road, or stopped in traffic. Iterating sequentially over epochs of our dataset didn’t work. The solution was to develop a driving curriculum: targeted sampling of the data most interesting to our model. This approach added a whole new set of hyperparameters to optimise, which significantly impacted final performance. Changing one of these hyperparameters required a full rebuild of the dataset, a very time-consuming process that added a lot of friction to research.
To solve this, we abandoned the idea of epochs completely, instead relying on random access to our data at train time over a fixed number of steps. We built our own SSD object cache to handle this: a set of commodity SSDs hosted in a central cache server, with data-loading logic that partitioned the data across the disks based on a hash of each object’s key. This gave us the best of both worlds: flexibility for the researchers, and fast data access at training time.
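To make the idea concrete, here is a minimal sketch of a key-hashed object cache spread across a few SSD mount points. The class, mount paths and interface are illustrative rather than our actual implementation.

```python
import hashlib
from pathlib import Path

class ShardedObjectCache:
    """Illustrative key-hashed cache spread across several SSD mount points."""

    def __init__(self, mount_points):
        self.mounts = [Path(m) for m in mount_points]

    def _shard_for(self, key: str) -> Path:
        # A stable hash of the key decides which SSD the object lives on.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.mounts[int(digest, 16) % len(self.mounts)]

    def put(self, key: str, payload: bytes) -> None:
        path = self._shard_for(key) / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)

    def get(self, key: str) -> bytes:
        return (self._shard_for(key) / key).read_bytes()

# Random access at train time: sample keys in any order rather than iterating epochs, e.g.
# cache = ShardedObjectCache(["/mnt/ssd0", "/mnt/ssd1", "/mnt/ssd2"])
# frame = cache.get("drive_2019_03_01/cam_front/000123.jpg")
```

Because any process can locate an object from its key alone, the training job can change its sampling curriculum without rebuilding the dataset.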
This culminated in our demo of urban driving with conditional imitation learning, on the streets of Cambridge, which led to us securing our Series-A fundraise of $20M in 2019.
Scaling data collection to 5M images per second

With our Series-A funding secured, we relocated our company to London and built out extensive autonomy development facilities, with a dedicated vehicle ops team working full-time to support data collection and autonomous vehicle trials. Today, we collect three types of data:
- Expert policy driving data. Our team of professional chauffeur drivers collect over 12 petabytes of raw driving data per year.
- On-policy driving data. Our team of experienced autonomous safety drivers collect over 6 petabytes of raw driving data per year using our autonomous Jaguar I-PACE platform.
- Off-policy driving data. Beyond this, we also deploy our autonomous sensing platform on third-party external fleets, collecting hundreds of petabytes of driving data. This is only economically possible because of our lean, camera-first approach.
Wayve’s Fleet Learning Loop
This stream of data is the input to our Fleet Learning Loop: a continuous cycle of data collection, curation, training of models, re-simulation, and licensing models before deployment into the fleet. This loop is Wayve’s driving school – taking the brains developed by our researchers, then training and testing them at scale.

Real and synthetic data ingest
The first step is to get the data off the cars. In Cambridge, we plugged the car’s computer into our LAN and rsync’d (remote sync utility) the data off. This was slow though, and we quite often ran out of disk space on the car.
To scale this, we developed our own ingest station and installed accessible SSD caddies into the cars, allowing our ops team to quickly swap out disks after going on a driving/testing run. This would also enable ops in other cities away from our HQ. With a good supply of disks, running out of space wouldn’t be an issue.
For third-party external fleets, we ingest data using 4G connectivity and Microsoft Azure IoT services.
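As a rough illustration of that path (assuming the azure-iot-device and azure-storage-blob Python SDKs, with a hypothetical connection string and blob layout; this is a sketch, not our production ingest code), a device-side upload might look like:

```python
from azure.iot.device import IoTHubDeviceClient
from azure.storage.blob import BlobClient

# Hypothetical device connection string provisioned for one vehicle's ingest unit.
CONNECTION_STRING = "HostName=example.azure-devices.net;DeviceId=vehicle-001;SharedAccessKey=..."

def upload_drive_log(local_path: str, blob_name: str) -> None:
    device = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
    device.connect()

    # IoT Hub returns SAS credentials for the storage container linked to the hub.
    storage_info = device.get_storage_info_for_blob(blob_name)
    sas_url = "https://{hostName}/{containerName}/{blobName}{sasToken}".format(**storage_info)

    with open(local_path, "rb") as f:
        BlobClient.from_blob_url(sas_url).upload_blob(f, overwrite=True)

    # Tell IoT Hub the upload finished so it can raise a file-upload notification.
    device.notify_blob_upload_status(
        storage_info["correlationId"], True, 200, "drive log uploaded"
    )
    device.disconnect()
```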
We also continually collect synthetic data from our ‘virtual fleet’ – several instances of our in-house simulator running on Kubernetes.
Data curation and analysis
With the scale of data increasing by several orders of magnitude, our approach to cleaning and enriching the data for training would need to scale too.
In Cambridge, we relied on pandas (a data analysis library) to do our data pre-processing, with caching to speed things up. As for labels, we had our own basic UI to apply manual labels to the data to enable conditional imitation learning based on route commands.
As we started early experiments with our I-PACE fleet, it became clear we needed to scale up. We adopted Apache Beam to do the heavy lifting in our data pre-processing pipeline, and engaged third-party providers (HERE Maps and Scale AI) to automate our labelling.
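To give a flavour of the Beam side (with hypothetical file paths and with parsing, filtering and enrichment functions standing in for the real ones), a pipeline of this shape might look like:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_frame(line: str) -> dict:
    """Hypothetical parser: each input line is a JSON record describing one logged frame."""
    return json.loads(line)

def is_usable(frame: dict) -> bool:
    """Illustrative quality filter, e.g. drop frames with missing camera data."""
    return frame.get("cameras_ok", False)

def enrich(frame: dict) -> dict:
    """Placeholder enrichment, e.g. attaching map context or automated labels."""
    frame["driving_task"] = frame.get("route_command", "follow_lane")
    return frame

with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "Read logs" >> beam.io.ReadFromText("/data/drive-logs/*.jsonl")
        | "Parse" >> beam.Map(parse_frame)
        | "Filter" >> beam.Filter(is_usable)
        | "Enrich" >> beam.Map(enrich)
        | "Serialise" >> beam.Map(json.dumps)
        | "Write" >> beam.io.WriteToText("/data/curated/frames")
    )
```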
The end goal of this stage is to understand the distribution and diversity of our data, and to extract insights from it to guide research. We do this by enriching it with meta-labels about the static and dynamic environment, as well as the intent of the driver at the time (the current ‘driving task’). We apply this enrichment to autonomous driving data and to expert demonstration data. This gives us a fine-grained understanding of our performance, and where we need to improve.
By linking our expert data to real failure cases, we surface gaps and biases in our expert data curriculum. If we find a common failure case, we look for similar examples in our data and upweight them in training. If we don’t have enough examples, we redirect data collection efforts, or send more data for human labelling if need be. This is how we continually refine our data curriculum to improve performance and address the long-tail distribution of driving edge-cases.
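A minimal sketch of that upweighting step, assuming we already have embeddings for the failure cases and for candidate training examples (the boost factor, similarity threshold and sampler wiring are illustrative):

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def curriculum_weights(example_embs: np.ndarray,
                       failure_embs: np.ndarray,
                       boost: float = 5.0,
                       similarity_threshold: float = 0.8) -> np.ndarray:
    """Upweight training examples whose embeddings are close to known failure cases."""
    # Cosine similarity between every training example and every failure case.
    a = example_embs / np.linalg.norm(example_embs, axis=1, keepdims=True)
    b = failure_embs / np.linalg.norm(failure_embs, axis=1, keepdims=True)
    sim = a @ b.T                                    # (num_examples, num_failures)
    weights = np.ones(len(example_embs))
    weights[sim.max(axis=1) > similarity_threshold] *= boost
    return weights

# Hypothetical usage with dummy embeddings: feed the weights into a weighted sampler.
weights = curriculum_weights(np.random.randn(10_000, 64), np.random.randn(50, 64))
sampler = WeightedRandomSampler(torch.as_tensor(weights), num_samples=10_000, replacement=True)
```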
Re-simulation
It is extremely important to understand where our system fails and why, so that we can improve its reliability and safety. We need to know what the system perceives and to analyse its decision-making process; we do this using ‘re-simulation’.
During re-simulation, the driving model is placed in a simulated environment. We present it with the inputs (in the form of sensor data collected in the real world or synthetically generated) to see what it decides to do. We can then inspect internal information to analyse why it made its decision. For example, we could look at its perception outputs to see whether it ‘saw’ a pedestrian or obstacle on the road.
Re-simulation also allows us to perform a type of regression test on future driving models that we develop. By passing new models through the same re-simulation process, we can compare how they would perform in the same conditions. The process also allows us to produce a 3D synthetic twin of the environment and scenario that caused a failure. With that synthetic environment, we can generate additional synthetic data and run many variations of the regression test related to the scenario.
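To make the regression side of re-simulation concrete, here is a bare-bones sketch that replays logged frames through a candidate and a baseline model and compares both against the recorded driving. The interfaces and tolerance are illustrative, not our actual evaluation code.

```python
import torch

@torch.no_grad()
def resimulate(model: torch.nn.Module, frames: torch.Tensor) -> torch.Tensor:
    """Replay logged sensor frames through a driving model and collect its planned actions."""
    model.eval()
    return torch.stack([model(frame.unsqueeze(0)).squeeze(0) for frame in frames])

@torch.no_grad()
def compare_models(candidate, baseline, frames, recorded_actions, tolerance=0.1):
    """Regression check: does the candidate stay at least as close to the logged driving
    as the current baseline does on the same scenario?"""
    cand_err = (resimulate(candidate, frames) - recorded_actions).abs().mean()
    base_err = (resimulate(baseline, frames) - recorded_actions).abs().mean()
    return {
        "candidate_error": cand_err.item(),
        "baseline_error": base_err.item(),
        "regression": bool(cand_err > base_err + tolerance),
    }
```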
Production training
There are three stages of model development at Wayve:
- Feature development – fast iteration training and testing of individual features (e.g. traffic lights intelligence) on a smaller (dev) dataset;
- Product baseline feature integration – integrating all features into our end-to-end learned driving policy with multi-task learning, to form the latest-and-greatest model;
- Production training and continual learning of our autonomy product baseline.
Production training means training our product baseline with our full dataset over many epochs, and continually re-training as we collect new data through iterations of the fleet learning loop. This is where we spend most of our compute budget, so new models have to prove their worth through re-simulation and small-scale tests before we begin production training.
Currently, we train on collections of machines with up to 8x V100 GPUs and 612 GB of RAM each. We distribute the data in a way that provides a theoretical throughput of up to 400 Gbps; among the steps we are taking towards achieving that in practice, we are making our infrastructure more elastic so that it better fits our current and future needs.
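At this scale, training is spread across many GPUs and machines. As an illustrative skeleton of that kind of setup (using PyTorch DistributedDataParallel, with a placeholder model and dataset standing in for the driving policy and curated data; this is not our actual training code):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets LOCAL_RANK/RANK/WORLD_SIZE for each process (one process per GPU).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and dataset; real inputs would come from the curated data store.
    model = DDP(torch.nn.Linear(512, 8).cuda(), device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(100_000, 512), torch.randn(100_000, 8))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=256, sampler=sampler,
                        num_workers=8, pin_memory=True)

    optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
    for inputs, targets in loader:
        loss = torch.nn.functional.mse_loss(model(inputs.cuda()), targets.cuda())
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=8 train.py
```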
Driver licensing
In the early days, we tested our driving models on controlled off-road circuits. When a new model was ready for testing, it was put through some basic checks before being uploaded to the Twizy. While this was fine for off-road tests, we realised that for on-road tests, where the model interacts with a less controlled and more complex environment, safety would be a much bigger concern. We thought about how this is done for humans: we learn some rules and theory, do some supervised practice, and finally, when we’re ready for the road, take a series of specific tests to get a score (or series of scores); if we pass the threshold, we get a license. Could we not do the same for our driving models? This is how the concept of Driver Licensing came about at Wayve.
Once our driving models come out of training, we pass them through a series of practical tests, using both real and synthetic data, to gauge how suitable the ‘new driver’ is for the on-road testing phase. If it passes, the ‘new driver’ is issued a set of licenses specifying which types of on-road tests it is allowed to be taken on.
The driver licensing process also produces a swathe of metrics and other data on how well the ‘new driver’ performs against past drivers across various tasks. We can then show this in the form of a leaderboard or other A/B comparison views. These metrics also allow us to track our progress in the performance of the driving models we train.
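In spirit, the licensing gate is a set of thresholded checks over these metrics. Here is a toy sketch; the metric names, thresholds and license categories are all invented for illustration.

```python
from dataclasses import dataclass

# Illustrative license categories and pass criteria; the real tests and thresholds differ.
LICENSE_REQUIREMENTS = {
    "quiet_residential_roads": {"lane_keeping_score": 0.95,
                                "resim_collision_free_rate": 0.999},
    "busy_urban_roads": {"lane_keeping_score": 0.98,
                         "resim_collision_free_rate": 0.9999,
                         "traffic_light_compliance": 0.999},
}

@dataclass
class DriverReport:
    model_id: str
    metrics: dict

def issue_licenses(report: DriverReport) -> list:
    """Return the on-road test categories this 'new driver' is licensed for."""
    granted = []
    for license_name, requirements in LICENSE_REQUIREMENTS.items():
        if all(report.metrics.get(name, 0.0) >= threshold
               for name, threshold in requirements.items()):
            granted.append(license_name)
    return granted

# e.g. issue_licenses(DriverReport("model-1234", {"lane_keeping_score": 0.97,
#                                                 "resim_collision_free_rate": 0.9995}))
# -> ["quiet_residential_roads"]
```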
Closing the loop with autonomous driving
Once a model has passed through all stages of our Fleet Learning Loop, we are ready to deploy it for autonomous driving, collecting new driving experiences and training data. Here are some demonstrations of our technology driving on London’s roads using end-to-end deep learning, trained by our Fleet Learning Loop.
Here’s a video of our system autonomously driving through London traffic, roundabouts and past some roadworks.
This film illustrates our system driving through some residential streets in central London.
Teaming up with Microsoft
Given the scale of our ambition to build a Fleet Learning platform that trains on enormous and diverse driving data, it became immediately clear that we needed to move to the cloud.
We built a cost model comparing cloud vs on-premise for our storage and training (GPU-compute) workload, comparing the total cost of ownership over multiple years. We found that with the right architecture, cloud would actually be competitive with on-premise in terms of costs at data scales above 3 petabytes. More importantly, cloud would become more competitive the more data we have, with savings increasing at scale.
So we were going to the cloud, but which cloud? Across multiple facets, including cost, technology and strategic alignment, it was clear that Microsoft Azure was the best choice for Wayve.
On cost, Azure was the clear winner, especially due to their option of locally redundant storage (LRS), combined with reserved storage. LRS is a great option for us, as a loss of some of our data would not significantly affect our business, given we are collecting new data continually.
We adopted two tiers of storage for our data: archive and hot. We keep the unfiltered, full-size image data for all cameras in archive storage, whereas hot storage only stores data relevant to our latest training curriculum. For training image (hot) storage, we need very high throughput to keep our GPUs fed. Latency isn’t so much a concern, thanks to the pipelined data loading that PyTorch provides us. This allows us to use Azure Blob Storage, greatly simplifying our infrastructure.
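As an illustration of how this fits together (assuming the azure-storage-blob SDK, a hypothetical container layout and JPEG frames; not our actual data loader), a blob-backed PyTorch dataset could look like:

```python
import io
from azure.storage.blob import ContainerClient
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class BlobFrameDataset(Dataset):
    """Training frames read directly from Azure Blob Storage (the hot tier).
    The connection string, container name and blob naming scheme are illustrative."""

    def __init__(self, connection_string: str, container: str, blob_names: list):
        self.container = ContainerClient.from_connection_string(connection_string, container)
        self.blob_names = blob_names
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.blob_names)

    def __getitem__(self, idx):
        payload = self.container.download_blob(self.blob_names[idx]).readall()
        image = Image.open(io.BytesIO(payload)).convert("RGB")
        return self.to_tensor(image)

# Many parallel DataLoader workers hide per-request latency and keep the GPUs fed, e.g.
# DataLoader(BlobFrameDataset(conn_str, "training-frames", names),
#            batch_size=64, num_workers=16, prefetch_factor=4, pin_memory=True)
```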
We were particularly impressed with Microsoft’s innovative approach to deploying the bleeding edge of compute technology into their cloud offering, from Graphcore’s IPU to NVIDIA’s A100 GPUs. We use a mix of reserved instances and spot (pre-emptible) instances to provide cost effective GPU compute for production training. The reserved instances allow us to meet our base load without fear of eviction, while the spot instances are a great solution for bursty workloads.
Additionally, Azure IoT services provide much of the infrastructure needed to support the connected vehicle hardware we deploy across fleets.
Finally, we are inspired to collaborate with Microsoft because of:
- Their existing partnerships within the automotive industry, which we can leverage to commercialise our product,
- Their overall strategy of being a partner and enabler of deep tech, rather than competing directly in the space themselves (e.g. Google/Alphabet & Waymo, Amazon & Zoox),
- Their experience in machine learning at scale through services like Azure Machine Learning and partnerships with industry-leading companies.
Putting it all together
In our early days, we showed we could learn an urban driving policy with a small amount of data. We are continuing this research approach to slingshot models into our Fleet Learning Loop: Wayve’s driving school, running on Microsoft Azure. This is where we train and evaluate models on petabyte-scale real and synthetic data.
It’s our Fleet Learning Loop that will take us from proving an approach is viable with demo-level performance to human-level, product-worthy autonomy.
We’re thrilled to work with Microsoft to scale this technology. Read more in the Microsoft Transform blog, or see our discussion at Microsoft Ignite.