PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
Project Page | GitHub | Paper
PRIX (Plan from Raw Pixels) is an efficient end-to-end autonomous driving architecture that operates using only camera data. It eliminates the reliance on expensive LiDAR sensors and computationally intensive Bird's-Eye View (BEV) representations, making it a practical solution for real-world deployment on mass-market vehicles.
Model Description
The PRIX architecture leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories directly from raw pixel inputs. Key components include:
- Context-aware Recalibration Transformer (CaRT): A novel module that enhances multi-level visual features by modeling long-range dependencies across the spatial domain, without relying on explicit 3D geometry.
- Conditional Diffusion Planner: A planning head that treats trajectory prediction as a denoising process, using a vocabulary of trajectory anchors to refine noisy proposals into safe, feasible paths in just two denoising steps.
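To make the planner's mechanism concrete, here is a minimal sketch of anchor-based trajectory denoising. All names (`predict_noise`, `plan`, the anchor setup) are hypothetical stand-ins, not the PRIX implementation: the real denoiser is a learned network conditioned on CaRT features, whereas the toy one below simply pulls a noisy proposal toward its nearest anchor over two refinement steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anchor vocabulary: 20 anchor trajectories,
# each with 8 waypoints of (x, y).
NUM_ANCHORS, HORIZON = 20, 8
anchors = rng.normal(size=(NUM_ANCHORS, HORIZON, 2))

def predict_noise(noisy_traj, scene_features, t):
    """Toy stand-in for the learned denoiser.

    A real model would be a network conditioned on camera features;
    here we pretend the predicted noise is the offset from the
    nearest anchor trajectory.
    """
    dists = np.linalg.norm(noisy_traj - anchors, axis=(1, 2))
    return noisy_traj - anchors[dists.argmin()]

def plan(scene_features, steps=2, noise_scale=0.5):
    """Refine a noisy anchor proposal into a trajectory in `steps` updates."""
    # Initialize from a trajectory anchor perturbed by Gaussian noise.
    traj = anchors[0] + noise_scale * rng.normal(size=(HORIZON, 2))
    for t in reversed(range(steps)):
        eps = predict_noise(traj, scene_features, t)
        traj = traj - eps / steps  # simple Euler-style denoising update
    return traj
```

With this toy denoiser, each step halves the distance to the nearest anchor, illustrating why a small anchor vocabulary lets the planner converge in only two steps instead of a long diffusion chain.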
Performance
PRIX achieves state-of-the-art performance on major benchmarks while maintaining high efficiency:
- NavSim-v1: Reaches 87.8 PDMS with a real-time inference speed of 57 FPS on consumer-grade hardware.
- nuScenes: Achieves competitive trajectory prediction results (0.57 m average L2 error and a 0.07% collision rate).
Installation
The model is built on the NAVSIM framework. For setup and usage instructions, please refer to the official GitHub repository.
Citation
If you find this work useful, please cite the following paper:
@article{wozniak2026prix,
  title={PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving},
  author={Wozniak, Maciej and Liu, Lianhang and Cai, Yixi and Jensfelt, Patric},
  journal={IEEE Robotics and Automation Letters},
  year={2026},
  publisher={IEEE}
}