17 June 2024  |  Research

Introducing PRISM-1: Photorealistic reconstruction in static and dynamic scenes

This blog introduces PRISM-1, a model that reconstructs 4D scenes (3D in space, plus time) from video data. PRISM-1 provides a flexible scene representation that expands the variety of scenarios we can simulate with Ghost Gym.

Simulation plays a critical role in the development of autonomous driving. It provides a safe, repeatable, and cost-effective way to test and refine driving models. The challenge with real-world testing is that conditions can vary significantly between tests, even on the same stretch of road. Our R&D investments in state-of-the-art simulation technology enable controlled model evaluation and the exploration of “what-if” scenarios. 

In December 2023, we introduced Ghost Gym, a cutting-edge closed-loop neural simulator tailored for autonomous driving. Ghost Gym enables consistent testing conditions to evaluate and validate our driving models. This capability is critical for accelerating development cycles, allowing for rapid iteration and testing of algorithms. It also supports augmenting the scene by changing the vehicle’s viewpoint, position, or speed, creating novel scenarios in various settings and conditions to evaluate driving models, which is particularly important for safety.

Simulation not only enhances evaluation but also improves our training data by introducing new scenarios that may be underrepresented, such as expanding into new geographies or driving in uncommon weather conditions. It also enables us to efficiently test the adaptability of our driving models to new vehicle platforms or camera setups we may be exploring. 

A key aspect of an effective simulator is the realism of its environments.

Today, we are excited to showcase our new scene reconstruction approach, PRISM-1, which improves the realism of our simulation environments and powers the next generation of Ghost Gym scenarios.

Challenges with simulating dynamic urban scenes

Simulating urban environments for autonomous driving presents many challenges. Perhaps the hardest technical challenge, and the most important from a safety perspective, is accurately representing dynamic elements such as pedestrians, cyclists, and vehicles, which exhibit unpredictable behaviours and varied appearances. Pedestrians, in particular, have deformable and articulated forms, often wearing diverse clothing. Moreover, urban settings are marked by many dynamic conditions such as changing weather, groups of moving pedestrians, changing traffic lights, vehicle brake and indicator lights, and environmental factors like reflections and shadows. 

Traditionally, scene graphs are used to organise scene elements into hierarchical structures based on their spatial and semantic relationships: each node represents an object, with edges defining their interconnections. This method allows for modular manipulation of elements, enabling specific adjustments without altering the overall structure. Despite their utility, scene graphs struggle with dynamic interactions and complex environmental conditions, such as moving groups and variable lighting. These limitations show up in scenarios where perfect separation of elements is impractical, especially in dense urban settings, and the modular approach itself leads to error propagation and increased complexity as scenes become more intricate.
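To make the comparison concrete, below is a minimal sketch of the kind of scene-graph representation described above. All names are hypothetical; this is not Ghost Gym or PRISM-1 code.

```python
from dataclasses import dataclass, field

# A minimal, illustrative scene graph: each node is one object with a pose
# relative to its parent, and edges are the parent-child links. All names
# here are hypothetical, not Ghost Gym or PRISM-1 internals.

@dataclass
class SceneNode:
    name: str
    pose: tuple          # (x, y, z, yaw) relative to the parent node
    children: list = field(default_factory=list)

    def add(self, child: "SceneNode") -> "SceneNode":
        self.children.append(child)
        return child

root = SceneNode("world", (0.0, 0.0, 0.0, 0.0))
car = root.add(SceneNode("vehicle_1", (12.0, 3.5, 0.0, 1.57)))
car.add(SceneNode("brake_light_left", (-2.1, 0.7, 0.8, 0.0)))

# Moving vehicle_1 moves its brake light with it, without touching
# anything else in the hierarchy.
car.pose = (14.0, 3.5, 0.0, 1.57)
```

Editing one node’s pose moves that element without disturbing the rest of the hierarchy. That modularity is exactly what breaks down when scene elements cannot be cleanly separated.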

To overcome these challenges, we believe a shift towards more adaptive and flexible simulation is needed.

Building on our innovative end-to-end approach to autonomous driving, we are introducing a new approach to understand and represent complex environments. This approach greatly enhances simulations in diverse and dynamic environments, transforming how autonomous systems interact with and respond to complex urban landscapes.

Introducing PRISM-1

PRISM-1 represents a significant advance in 4D scene reconstruction.

It enables the resimulation of complex and dynamic scenes with minimal engineering effort, emphasising generalisation and scalability.

  1. Self-supervised scene disentanglement: PRISM-1 separates static and dynamic elements within scenes in a self-supervised manner (see the sketch after this list). This eliminates the need for explicit labels or predefined models, making it easy to generalise to different camera setups without additional sensors or explicit 3D information.
  2. Flexible framework: PRISM-1 efficiently handles a diverse range of elements commonly found in urban environments. It can model moving elements like vehicles (including brake lights, indicator lights, and even windscreen wipers), cyclists, pedestrians, and variably appearing objects such as traffic lights under different lighting conditions. PRISM-1 also captures transient scene elements like roadside debris, wind-swept leaves, and the fluctuating light and reflections encountered in tunnels.
  3. Scalable representation: PRISM-1 remains efficient even as scene complexity escalates. It not only minimises engineering effort but also curtails the propagation of errors that typically arise from conventional modelling techniques.
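As a rough illustration of point 1, here is a minimal sketch of self-supervised static/dynamic disentanglement in PyTorch. PRISM-1’s architecture has not been published, so every module and name below is an assumption; the point is only that a static field and a time-conditioned dynamic field can be composited and trained purely against observed pixels, with no static/dynamic labels.

```python
import torch
import torch.nn as nn

class Field(nn.Module):
    """Maps a (possibly time-conditioned) point to RGB + density."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 4),  # (r, g, b, density)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

static_field = Field(in_dim=3)   # static background: position only
dynamic_field = Field(in_dim=4)  # dynamic elements: position + time

def composite(points: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Blend static and dynamic predictions by their relative densities."""
    s = static_field(points)
    d = dynamic_field(torch.cat([points, t], dim=-1))
    w = torch.softmax(torch.stack([s[..., 3], d[..., 3]]), dim=0)
    return w[0].unsqueeze(-1) * s[..., :3] + w[1].unsqueeze(-1) * d[..., :3]

# Supervision is just rendered colour vs. observed pixels: nothing tells
# the model which parts of the scene are static or dynamic.
points = torch.randn(1024, 3)
times = torch.rand(1024, 1)
observed_rgb = torch.rand(1024, 3)
loss = nn.functional.mse_loss(composite(points, times), observed_rgb)
loss.backward()
```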

PRISM-1 provides a flexible representation that scales effectively across diverse scenarios we encounter daily on the road. Its capability to reconstruct detailed, high-fidelity scenes fulfils the rigorous demands of closed-loop simulation and model training in the dynamic world of autonomous driving. This enables quick iteration at scale, focusing on re-simulating driving scenes, while generative models like GAIA-1 are used to generate entirely new scenarios.

PRISM-1 achieves generalisation by incorporating inductive biases, integrating geometric cues (depth, surface normals, and optical flow) and semantic cues (such as semantic segmentation and features from a foundation model). Training relies solely on image-level 2D self-supervision, without explicit 3D labels, enabling PRISM-1 to generalise across arbitrary camera rigs without additional sensors.

PRISM-1 implicitly infers scene flow and maintains geometric consistency, ensuring an accurate understanding and representation of changes within the scene. This enhances the simulation’s reliability and effectiveness, which is critical for handling diverse and complex scenes.
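To make the idea of combining geometric and semantic cues under purely 2D supervision concrete, here is a hedged sketch of what such a composite loss could look like. The cue names, weights, and loss choices are our assumptions for illustration, not PRISM-1’s published objective.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(rendered: dict, targets: dict) -> torch.Tensor:
    """Compare every rendered cue against its image-space target.

    rendered/targets map cue names to per-pixel tensors; 'semantics'
    holds class logits (rendered) and class indices (target).
    """
    weights = {"rgb": 1.0, "depth": 0.5, "normals": 0.1,
               "flow": 0.1, "semantics": 0.2}
    loss = torch.zeros(())
    for cue, weight in weights.items():
        if cue == "semantics":
            loss = loss + weight * F.cross_entropy(rendered[cue], targets[cue])
        else:
            loss = loss + weight * F.l1_loss(rendered[cue], targets[cue])
    return loss
```

Because every term lives in image space, the same objective applies to any camera rig for which images can be rendered, which is what makes this style of supervision sensor-agnostic.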

Novel view synthesis

This section showcases PRISM-1’s capabilities under various conditions, focusing on deviations from the observed camera path. Specifically, we reconstruct scenes from a sparse set of images captured by a vehicle driving through the environment without relying on additional sensors. This reconstruction from partially seen or unseen viewpoints is essential when building a simulator for self-driving applications. A primary use of our simulation is to test new models in closed-loop scenarios that show divergent behaviours or involve safety-critical situations, which necessitate rendering views that deviate from the path observed in the original sensor data. 

We usually encounter dynamic scenes featuring moving vehicles, pedestrians, and cyclists, each captured from only a single viewpoint. Reconstructing these scenes from different perspectives at various arbitrary moments poses a significant challenge. PRISM-1 excels in identifying and tracking changes in the appearance of scene elements over time. Unlike traditional object-centric simulations that treat vehicles as rigid, static entities, our approach acknowledges their dynamic nature, capturing crucial behaviours like indicator use and sudden braking—key signal cues for drivers and autonomous systems. PRISM-1 can accurately simulate traffic light shifts and brake light activations, essential for realistic driving simulations. Additionally, it efficiently handles transient lighting artefacts and reflections, demonstrating its adaptability to varying lighting conditions and visual changes in the environment.

In the following examples, we demonstrate PRISM-1’s capability to reconstruct a scene from various viewpoints by changing the camera path in two ways. When we “freeze time”, the ego-vehicle remains fixed in time while we pan the camera from left to right to view the scene from different angles. When we “freeze position”, the ego-vehicle stays stationary in space, and we observe the surrounding world move back and forth in time.
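A minimal sketch of these two camera-path edits, assuming a hypothetical render_view function that queries the reconstructed 4D scene with a camera pose and a timestamp (none of these names are from PRISM-1’s actual API):

```python
import numpy as np

# A sketch of the two camera-path edits described above. render_view and
# the pose format are hypothetical stand-ins for querying a reconstructed
# 4D scene with a camera pose and a timestamp.

def render_view(pose: np.ndarray, t: float) -> np.ndarray:
    """Stub: a real renderer would return an image for (pose, time)."""
    return np.zeros((720, 1280, 3))

def freeze_time(t_fixed: float, lateral_offsets: np.ndarray) -> list:
    """Pan the camera left to right while the scene's clock stands still."""
    return [render_view(np.array([dx, 0.0, 0.0]), t_fixed)
            for dx in lateral_offsets]

def freeze_position(pose_fixed: np.ndarray, times: np.ndarray) -> list:
    """Hold one camera pose and play the surrounding world back and forth."""
    return [render_view(pose_fixed, float(t)) for t in times]

pan = freeze_time(t_fixed=2.0, lateral_offsets=np.linspace(-1.5, 1.5, 30))
replay = freeze_position(np.zeros(3), np.concatenate(
    [np.linspace(0.0, 4.0, 40), np.linspace(4.0, 0.0, 40)]))
```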

In the following examples, we reconstruct a highly dynamic scene featuring traffic lights, pedestrians, vehicles, and attributes such as brake lights and windscreen wipers.

It is virtually impossible to fully define the configuration of a busy urban street crowded with cars, pedestrians, rubbish, foxes, and more. PRISM-1 effectively handles these complex and unique scenarios, what the industry calls the ‘long tail’, without adding to the model’s complexity.

Reconstruction of complex driving scenarios with perception outputs

This section shows examples of scene reconstructions visualised alongside depth and 3D velocity magnitude rendered onto images. Note that these visualisations are reconstructions, not the real videos. As we are focusing on complex urban scenes, the following two examples highlight challenging aspects of the scene: traffic lights, cyclists, pedestrians, moving cars, different lighting conditions, and reflections.

Cyclists are particularly hard to reconstruct. In this example, we can see how PRISM-1 accurately reconstructs the cyclists, shown alongside the visualised depth.

In the following two examples, we show the 3D reconstruction blended with depth and 3D velocity outputs.
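As a rough illustration of how such visualisations can be produced, below is a minimal sketch that alpha-blends a colourised scalar field (depth or velocity magnitude) over a rendered image. The colour map and blend weight are arbitrary choices for this sketch, not our actual rendering pipeline.

```python
import numpy as np

def colourise(scalar: np.ndarray) -> np.ndarray:
    """Map a per-pixel scalar to a simple blue-to-red heatmap."""
    x = (scalar - scalar.min()) / (np.ptp(scalar) + 1e-8)
    return np.stack([x, np.zeros_like(x), 1.0 - x], axis=-1)

def blend(image: np.ndarray, scalar: np.ndarray,
          alpha: float = 0.5) -> np.ndarray:
    """Overlay the colourised scalar on the reconstructed image."""
    return (1.0 - alpha) * image + alpha * colourise(scalar)

image = np.random.rand(720, 1280, 3)      # rendered reconstruction
depth = np.random.rand(720, 1280) * 50.0  # depth in metres
speed = np.random.rand(720, 1280) * 15.0  # 3D velocity magnitude, m/s
depth_vis = blend(image, depth)
speed_vis = blend(image, speed)
```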

Future work

The above scenarios showcase PRISM-1’s capability to model complex dynamics in diverse and arbitrary driving scenes. It is important to note that all videos are resimulations that accurately reflect the real dynamics of other agents. Resimulation is crucial for simulating autonomous driving, and we continue to push the boundaries of what is possible and scalable.  

Counterfactual scenarios, and other ways of probing our system, are also important aspects of our simulation capabilities. In the following example, we reconstruct a scene with a pedestrian and then demonstrate that we can remove the pedestrian.
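Because the representation disentangles static and dynamic elements, a counterfactual like this can, in principle, be a simple edit to the scene’s dynamic components. The sketch below assumes a hypothetical per-agent decomposition; the Scene API and identifiers are illustrative, not PRISM-1’s.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    static_background: object = None
    agents: dict = field(default_factory=dict)  # agent id -> dynamic part

    def remove(self, agent_id: str) -> None:
        """Drop one agent; the static background renders unchanged."""
        self.agents.pop(agent_id, None)

scene = Scene(agents={"pedestrian_03": object(), "vehicle_12": object()})
scene.remove("pedestrian_03")  # resimulate the drive without the pedestrian
assert "pedestrian_03" not in scene.agents
```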

Stay tuned for future blogs where we will continue to highlight technological advancements in our end-to-end simulation engine.

Are you interested in learning more?

Join us at CVPR 2024 in Seattle to learn more about PRISM-1 and other R&D initiatives. To discover more about what sets Science at Wayve apart, read this leadership blog from our Chief Scientist, Jamie Shotton, and check out our open roles.

Access WayveScenes101 Dataset

Our work on PRISM-1 also led to the creation of the WayveScenes101 Dataset. This dataset focuses on complex dynamic scenes and enables iterative testing and continuous refinement of our 4D scene reconstruction method.

Now publicly available, the WayveScenes101 Dataset comprises 101 scenes from diverse driving environments in the UK and US, including urban, suburban, and highway settings under various weather and lighting conditions. We aim to inspire further innovation in scene reconstruction and novel view synthesis, leading to the development of more precise and resilient reconstruction models for autonomous driving.
