---
license: cc-by-nc-4.0
tags:
- diffusion
- inpainting
- multimodal
- autonomous-driving
- nuscenes
---

# 🐳 MObI: Multimodal Object Inpainting Using Diffusion Models

Pretrained weights for **MObI**, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.

📄 **Paper:** [arXiv:2501.03173](https://arxiv.org/abs/2501.03173)
💻 **Code:** [github.com/alexbuburuzan/MObI](https://github.com/alexbuburuzan/MObI)
**Venue:** CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025

## Overview

MObI extends [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example) to:

- Jointly inpaint **RGB camera, lidar depth, and lidar intensity**
- Insert objects from a **single reference image**
- Use **3D bounding box conditioning** for accurate spatial placement

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.

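For intuition on the bounding-box conditioning: a 3D box is commonly parameterised as a center, dimensions, and heading yaw, and expanded to its eight corners before being projected into each sensor view. The helper below is an illustrative sketch of that geometry only, not code from the MObI repository:

```python
import math

def box_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a 3D box given its center, size, and heading.

    Illustrative only; the parameterisation used by MObI may differ.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (width / 2, -width / 2):
            for dz in (height / 2, -height / 2):
                # Rotate the offset around the z-axis, then translate to the center.
                corners.append((cx + dx * c - dy * s,
                                cy + dx * s + dy * c,
                                cz + dz))
    return corners
```

Each modality then rasterises or projects these corners with its own sensor model.
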
## Contents

| File | Description |
|------|-------------|
| `mobi_nuscenes_epoch28.ckpt` | MObI trained on nuScenes |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |

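The range-view autoencoder operates on lidar represented as a 2D range image. As background, a minimal spherical-projection sketch is shown below; the image resolution and vertical field of view are illustrative assumptions, not the settings used by the released checkpoint:

```python
import math

def project_to_range_image(points, width=1024, height=32,
                           fov_up_deg=10.0, fov_down_deg=-30.0):
    """Map 3D lidar points (x, y, z) to (row, col, range) range-image pixels.

    Illustrative spherical projection; parameters are assumptions, not MObI's.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    pixels = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        yaw = math.atan2(y, x)        # azimuth in [-pi, pi]
        pitch = math.asin(z / r)      # elevation
        col = int((0.5 * (1.0 - yaw / math.pi)) * width) % width
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        row = min(max(row, 0), height - 1)  # clamp points outside the fov
        pixels.append((row, col, r))
    return pixels
```

Depth and intensity channels of such an image are what a range-view autoencoder compresses and reconstructs.
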
## Results (nuScenes)

| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|------------------|-------|---------|--------|-----------|-----------|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |

## Usage

See the [GitHub repository](https://github.com/alexbuburuzan/MObI) for installation, data preprocessing, inference, and training instructions.

```bash
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI
bash scripts/download_models.sh
bash scripts/realism_test_bench.sh
```

## Citation

```bibtex
@InProceedings{Buburuzan_2025_CVPR,
    author    = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
    title     = {MObI: Multimodal Object Inpainting Using Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1999-2009}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Note that this work builds on Paint-by-Example and BEVFusion, which have their own licenses.