---
license: cc-by-nc-4.0
tags:
- diffusion
- inpainting
- multimodal
- autonomous-driving
- nuscenes
---

# MObI: Multimodal Object Inpainting Using Diffusion Models

Pretrained weights for **MObI**, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.

**Paper:** [arXiv:2501.03173](https://arxiv.org/abs/2501.03173)
**Code:** [github.com/alexbuburuzan/MObI](https://github.com/alexbuburuzan/MObI)
**Venue:** CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025

## Overview

MObI extends [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example) to:
- Jointly inpaint **RGB camera, lidar depth, and lidar intensity**
- Insert objects from a **single reference image**
- Use **3D bounding box conditioning** for accurate spatial placement

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.

## Contents

| File | Description |
|------|-------------|
| `mobi_nuscenes_epoch28.ckpt` | MObI trained on nuScenes |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |

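After downloading, it can help to confirm that both files from the table are in place before running inference. The following is a minimal sketch, assuming the files keep the relative paths listed above; the `checkpoints` root directory and the helper name `missing_checkpoints` are hypothetical and not part of the MObI repository.

```python
from pathlib import Path

# Relative paths copied from the table above.
EXPECTED_FILES = (
    "mobi_nuscenes_epoch28.ckpt",
    "autoencoders/range_autoencoder.ckpt",
)


def missing_checkpoints(root):
    """Return the expected checkpoint paths that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "checkpoints" is an assumed download location; adjust it to your setup.
    missing = missing_checkpoints("checkpoints")
    print("all files present" if not missing else f"missing: {missing}")
```
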
## Results (nuScenes)

| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|----------------|-------|---------|--------|-----------|-----------|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |

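To put the gap between reference types in perspective, the relative change of each metric can be computed directly from the table. The sketch below does this for the id-ref and cross-domain-ref rows; the dictionary and function names are illustrative only and not part of the MObI codebase.

```python
# Values copied from the id-ref and cross-domain-ref rows of the results table.
ID_REF = {"FID": 6.503, "LPIPS": 0.114, "CLIP": 84.9, "D-LPIPS": 0.130, "I-LPIPS": 0.147}
CROSS_DOMAIN_REF = {"FID": 9.046, "LPIPS": 0.130, "CLIP": 76.0, "D-LPIPS": 0.132, "I-LPIPS": 0.153}


def relative_change(baseline, other):
    """Relative change of each metric in `other` with respect to `baseline`."""
    return {name: (other[name] - baseline[name]) / baseline[name] for name in baseline}


if __name__ == "__main__":
    # Print each metric's change as a signed percentage.
    for name, change in relative_change(ID_REF, CROSS_DOMAIN_REF).items():
        print(f"{name}: {change:+.1%}")
```

With the values above, FID rises by roughly 39% from id-ref to cross-domain-ref, while D-LPIPS changes by under 2%.
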
## Usage

See the [GitHub repository](https://github.com/alexbuburuzan/MObI) for installation, data preprocessing, inference, and training instructions.

```bash
# Clone the repository and enter it
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI
# Download the pretrained weights
bash scripts/download_models.sh
# Run the realism test bench
bash scripts/realism_test_bench.sh
```

## Citation

```bibtex
@InProceedings{Buburuzan_2025_CVPR,
author = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
title = {MObI: Multimodal Object Inpainting Using Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2025},
pages = {1999-2009}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Note that this work builds on Paint-by-Example and BEVFusion, which have their own licenses.