---
license: cc-by-nc-4.0
tags:
- diffusion
- inpainting
- multimodal
- autonomous-driving
- nuscenes
---

# 🐳 MObI: Multimodal Object Inpainting Using Diffusion Models

Pretrained weights for **MObI**, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.

📄 **Paper:** [arXiv:2501.03173](https://arxiv.org/abs/2501.03173)
💻 **Code:** [github.com/alexbuburuzan/MObI](https://github.com/alexbuburuzan/MObI)
**Venue:** CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025

## Overview

MObI extends [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example) to:

- Jointly inpaint **RGB camera, lidar depth, and lidar intensity**
- Insert objects from a **single reference image**
- Use **3D bounding box conditioning** for accurate spatial placement

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.

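For intuition on the bounding-box conditioning: a 3D box is commonly parameterised as a center, dimensions, and heading yaw, and expanded to its eight corners before being projected into each sensor view. The helper below is an illustrative sketch of that geometry only, not code from the MObI repository:

```python
import math

def box_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a 3D box given its center, size, and heading.

    Illustrative only; the parameterisation used by MObI may differ.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (width / 2, -width / 2):
            for dz in (height / 2, -height / 2):
                # Rotate the offset around the z-axis, then translate to the center.
                corners.append((cx + dx * c - dy * s,
                                cy + dx * s + dy * c,
                                cz + dz))
    return corners
```

Each modality then rasterises or projects these corners with its own sensor model.
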
## Contents

| File | Description |
|------|-------------|
| `mobi_nuscenes_epoch28.ckpt` | MObI trained on nuScenes |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |

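The range-view autoencoder operates on lidar represented as a 2D range image. As background, a minimal spherical-projection sketch is shown below; the image resolution and vertical field of view are illustrative assumptions, not the settings used by the released checkpoint:

```python
import math

def project_to_range_image(points, width=1024, height=32,
                           fov_up_deg=10.0, fov_down_deg=-30.0):
    """Map 3D lidar points (x, y, z) to (row, col, range) range-image pixels.

    Illustrative spherical projection; parameters are assumptions, not MObI's.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    pixels = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        yaw = math.atan2(y, x)        # azimuth in [-pi, pi]
        pitch = math.asin(z / r)      # elevation
        col = int((0.5 * (1.0 - yaw / math.pi)) * width) % width
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        row = min(max(row, 0), height - 1)  # clamp points outside the fov
        pixels.append((row, col, r))
    return pixels
```

Depth and intensity channels of such an image are what a range-view autoencoder compresses and reconstructs.
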
## Results (nuScenes)

| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|------------------|-------|---------|--------|-----------|-----------|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |

## Usage

See the [GitHub repository](https://github.com/alexbuburuzan/MObI) for installation, data preprocessing, inference, and training instructions.

```bash
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI
bash scripts/download_models.sh
bash scripts/realism_test_bench.sh
```

## Citation

```bibtex
@InProceedings{Buburuzan_2025_CVPR,
    author    = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
    title     = {MObI: Multimodal Object Inpainting Using Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1999-2009}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Note that this work builds on Paint-by-Example and BEVFusion, which have their own licenses.