---
license: cc-by-nc-4.0
tags:
- diffusion
- inpainting
- multimodal
- autonomous-driving
- nuscenes
---

# 🐳 MObI: Multimodal Object Inpainting Using Diffusion Models

Pretrained weights for **MObI**, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.

📄 **Paper:** [arXiv:2501.03173](https://arxiv.org/abs/2501.03173)

💻 **Code:** [github.com/alexbuburuzan/MObI](https://github.com/alexbuburuzan/MObI)

**Venue:** CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025

## Overview

MObI extends [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example) to:

- Jointly inpaint **RGB camera, lidar depth, and lidar intensity**
- Insert objects from a **single reference image**
- Use **3D bounding box conditioning** for accurate spatial placement

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.

## Contents

| File | Description |
|------|-------------|
| `mobi_nuscenes_epoch28.ckpt` | MObI trained on nuScenes |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |

## Results (nuScenes)

| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|---------------|-------|---------|--------|-----------|-----------|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |

## Usage

See the [GitHub repository](https://github.com/alexbuburuzan/MObI) for installation, data preprocessing, inference, and training instructions.

```bash
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI
bash scripts/download_models.sh
bash scripts/realism_test_bench.sh
```

## Citation

```bibtex
@InProceedings{Buburuzan_2025_CVPR,
    author    = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
    title     = {MObI: Multimodal Object Inpainting Using Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1999-2009}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Note that this work builds on Paint-by-Example and BEVFusion, which have their own licenses.
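
## Comparing reference types

The nuScenes results table above lists absolute scores per reference type. As a rough reading aid, the sketch below computes how much each metric changes when moving from `id-ref` to `cross-domain-ref`. The numbers are copied from the table; the `RESULTS` dictionary, the `relative_change` helper, and the percent-change summary itself are illustrative assumptions, not part of the MObI codebase or a metric reported in the paper.

```python
# Scores copied from the "Results (nuScenes)" table in this model card.
# Lower is better for every metric except CLIP, where higher is better.
RESULTS = {
    "id-ref":           {"FID": 6.503, "LPIPS": 0.114, "CLIP": 84.9, "D-LPIPS": 0.130, "I-LPIPS": 0.147},
    "track-ref":        {"FID": 6.703, "LPIPS": 0.115, "CLIP": 83.5, "D-LPIPS": 0.129, "I-LPIPS": 0.149},
    "in-domain-ref":    {"FID": 8.947, "LPIPS": 0.127, "CLIP": 77.5, "D-LPIPS": 0.132, "I-LPIPS": 0.154},
    "cross-domain-ref": {"FID": 9.046, "LPIPS": 0.130, "CLIP": 76.0, "D-LPIPS": 0.132, "I-LPIPS": 0.153},
}


def relative_change(metric, baseline="id-ref", other="cross-domain-ref"):
    """Percent change of `metric` when moving from `baseline` to `other`.

    Hypothetical helper for illustration only.
    """
    base = RESULTS[baseline][metric]
    return 100.0 * (RESULTS[other][metric] - base) / base


if __name__ == "__main__":
    # Print one line per metric, e.g. the FID change from id-ref to cross-domain-ref.
    for metric in RESULTS["id-ref"]:
        print(f"{metric:8s} {relative_change(metric):+6.1f}%")
```

Read this way, the camera metrics (FID, LPIPS, CLIP) shift noticeably between the easiest and hardest reference settings, while the lidar metrics (D-LPIPS, I-LPIPS) move by only a few percent. This is a summary of the table, not an additional result.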