---
license: cc-by-nc-4.0
tags:
- diffusion
- inpainting
- multimodal
- autonomous-driving
- nuscenes
---

# 🐳 MObI: Multimodal Object Inpainting Using Diffusion Models

Pretrained weights for **MObI**, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.

- 📄 **Paper:** [arXiv:2501.03173](https://arxiv.org/abs/2501.03173)
- 💻 **Code:** [github.com/alexbuburuzan/MObI](https://github.com/alexbuburuzan/MObI)
- **Venue:** CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025

## Overview

MObI extends [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example) to:
- Jointly inpaint **RGB camera, lidar depth, and lidar intensity**
- Insert objects from a **single reference image**
- Use **3D bounding box conditioning** for accurate spatial placement

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.

## Contents

| File | Description |
|------|-------------|
| `mobi_nuscenes_epoch28.ckpt` | MObI checkpoint trained on nuScenes (epoch 28, per the filename) |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |

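## Results (nuScenes)

Arrows mark whether lower (↓) or higher (↑) is better. D-LPIPS and I-LPIPS denote LPIPS computed on lidar depth and lidar intensity, respectively. See the paper for the definition of each reference type.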

| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|---------------|-------|---------|--------|-----------|-----------|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |

## Usage

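See the [GitHub repository](https://github.com/alexbuburuzan/MObI) for installation, data preprocessing, inference, and training instructions. The commands below assume the environment and data have already been set up as described there.

```bash
# Clone the code and enter the repository
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI

# Download the pretrained weights
bash scripts/download_models.sh

# Run the realism test bench
bash scripts/realism_test_bench.sh
```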
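
After `scripts/download_models.sh` finishes, it can be useful to confirm that both files listed in the Contents table are present before starting a long evaluation run. The helper below is a minimal sketch, not part of the repository: the function name `check_mobi_weights` and its directory argument are assumptions, and it presumes the files keep the relative paths shown in the table. Point it at whichever directory the download script actually writes to.

```bash
# Sanity-check that the expected checkpoint files exist under a weights directory.
# NOTE: the function name and the directory argument are illustrative assumptions;
# only the two relative file paths come from the Contents table.
check_mobi_weights() {
    weights_dir="${1:-.}"
    missing=0
    for rel in \
        "mobi_nuscenes_epoch28.ckpt" \
        "autoencoders/range_autoencoder.ckpt"; do
        # -s is true only if the file exists and is non-empty
        if [ -s "${weights_dir}/${rel}" ]; then
            echo "found:   ${rel}"
        else
            echo "missing: ${rel}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if no file is absent or empty
    [ "${missing}" -eq 0 ]
}
```

Calling `check_mobi_weights <weights_dir>` prints one line per file and returns a non-zero status if either file is missing or empty, so it can gate a later command with `&&`.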

## Citation

```bibtex
@InProceedings{Buburuzan_2025_CVPR,
    author    = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
    title     = {MObI: Multimodal Object Inpainting Using Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1999--2009}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Note that this work builds on Paint-by-Example and BEVFusion, which have their own licenses.