---
license: cc-by-nc-4.0
tags:
- diffusion
- inpainting
- multimodal
- autonomous-driving
- nuscenes
---
# 🐳 MObI: Multimodal Object Inpainting Using Diffusion Models
Pretrained weights for **MObI**, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.
📄 **Paper:** [arXiv:2501.03173](https://arxiv.org/abs/2501.03173)
💻 **Code:** [github.com/alexbuburuzan/MObI](https://github.com/alexbuburuzan/MObI)
**Venue:** CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025
## Overview
MObI extends [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example) to:
- Jointly inpaint **RGB camera, lidar depth, and lidar intensity**
- Insert objects from a **single reference image**
- Use **3D bounding box conditioning** for accurate spatial placement

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.
## Contents
| File | Description |
|------|-------------|
| `mobi_nuscenes_epoch28.ckpt` | MObI trained on nuScenes |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |
## Results (nuScenes)
| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|---------------|-------|---------|--------|-----------|-----------|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |
## Usage
See the [GitHub repository](https://github.com/alexbuburuzan/MObI) for installation, data preprocessing, inference, and training instructions.
```bash
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI
bash scripts/download_models.sh
bash scripts/realism_test_bench.sh
```
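After the download step, it can help to confirm that both checkpoint files listed under Contents are actually on disk before launching an evaluation. The sketch below is not part of the repository: `check_mobi_checkpoints` is a hypothetical helper, and it assumes the files sit under a single directory with the same relative names as in the Contents table. Check `scripts/download_models.sh` for the real destination.

```bash
# Hypothetical helper, not shipped with MObI.
# ASSUMPTION: the checkpoint directory mirrors the relative file names
# from the Contents table; the real layout may differ.
check_mobi_checkpoints() {
    ckpt_dir="${1:-.}"
    missing=0
    for f in mobi_nuscenes_epoch28.ckpt autoencoders/range_autoencoder.ckpt; do
        if [ -f "$ckpt_dir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected file is present.
    [ "$missing" -eq 0 ]
}
```

Called as `check_mobi_checkpoints path/to/checkpoints`, where the path is a placeholder, it prints one line per file and returns a non-zero status if any file is absent, so it can gate a later command with `&&`.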
## Citation
```bibtex
@InProceedings{Buburuzan_2025_CVPR,
author = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
title = {MObI: Multimodal Object Inpainting Using Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2025},
pages = {1999-2009}
}
```
## License
Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). This work builds on Paint-by-Example and BEVFusion, each of which has its own license.