---
datasets:
- facebook/boxer
license: cc-by-nc-4.0
pipeline_tag: object-detection
tags:
- 3d-object-detection
- open-world-detection
- 3d-vision
---

# Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

[Project Page](https://facebookresearch.github.io/boxer) | [Paper](https://huggingface.co/papers/2604.05212) | [Code](https://github.com/facebookresearch/boxer)

Boxer estimates static 3D bounding boxes (3DBBs) from 2D open-vocabulary object detections, posed images, and optional depth data. At its core is **BoxerNet**, a transformer-based network that lifts 2D bounding box (2DBB) proposals into 3D. Multi-view fusion and geometric filtering then produce globally consistent, de-duplicated 3DBBs in metric world space.

![Boxer System Architecture](https://github.com/facebookresearch/boxer/raw/main/docs/images/boxer_system.jpg)
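
To give a feel for what de-duplicating 3DBBs across views involves, here is a minimal sketch of greedy non-maximum suppression over 3D boxes. It is not Boxer's fusion or filtering code: it assumes axis-aligned boxes in a shared metric world frame, and the names `iou_3d`, `deduplicate`, and the `iou_thresh` parameter are hypothetical and used only for illustration.

```python
from typing import List, Sequence

# Hypothetical box layout for this sketch:
# [xmin, ymin, zmin, xmax, ymax, zmax] in metres, world frame.
Box = Sequence[float]


def volume(box: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    return (
        max(0.0, box[3] - box[0])
        * max(0.0, box[4] - box[1])
        * max(0.0, box[5] - box[2])
    )


def iou_3d(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        inter *= max(0.0, hi - lo)  # zero overlap on any axis gives zero volume
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0.0 else 0.0


def deduplicate(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by iou_thresh or more."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Two detections of the same object seen from different views would overlap heavily in world space, so only the higher-scoring one survives. Oriented boxes would need a different overlap test.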

## Installation

We recommend using [uv](https://docs.astral.sh/uv/) to manage the environment:

```bash
# Create virtual environment
uv venv boxer --python 3.12
source boxer/bin/activate

# Core dependencies for running Boxer
uv pip install 'torch>=2.0' numpy opencv-python tqdm dill
```

## Usage

After installing the dependencies and downloading the required checkpoints with the scripts provided in the repository, you can run BoxerNet on sample data. For example, to process a sample sequence in headless mode:

```bash
python run_boxer.py --input nym10_gen1 --max_n=90 --track
```

This estimates 3D bounding boxes and saves the results (CSV and visualization) to the `output/` directory.
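
A quick way to check what a run produced is to list the CSV files under `output/` together with their headers and row counts. The sketch below uses only the Python standard library and makes no assumption about file names or column names, since the CSV schema is not described here; the helper name `summarize_csvs` is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple


def summarize_csvs(output_dir: str = "output") -> Dict[str, Tuple[List[str], int]]:
    """Map each CSV under output_dir to (header columns, number of data rows)."""
    root = Path(output_dir)
    summary: Dict[str, Tuple[List[str], int]] = {}
    if not root.is_dir():
        return summary  # nothing to report if the run has not created the directory
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader, [])        # first row is treated as the header
            n_rows = sum(1 for _ in reader)  # remaining rows are data
        summary[path.relative_to(root).as_posix()] = (header, n_rows)
    return summary
```

Calling `summarize_csvs()` from the repository root after a run returns a dictionary you can print to see which columns each result file contains.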

## Citation

If you find Boxer useful in your research, please consider citing:

```bibtex
@article{boxer2026,
      title={Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D},
      author={Daniel DeTone and Tianwei Shen and Fan Zhang and Lingni Ma and Julian Straub and Richard Newcombe and Jakob Engel},
      year={2026},
}
```