 ---
+datasets:
+- facebook/boxer
 license: cc-by-nc-4.0
+pipeline_tag: object-detection
 tags:
 - 3d-object-detection
 - open-world-detection
 - 3d-vision
 ---
+# Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D
+[Project Page](https://facebookresearch.github.io/boxer) | [Paper](https://huggingface.co/papers/2604.05212) | [Code](https://github.com/facebookresearch/boxer)
+Boxer is an algorithm that estimates static 3D bounding boxes (3DBBs) from 2D open-vocabulary object detections, posed images, and optional depth data. At its core is **BoxerNet**, a transformer-based network that lifts 2D bounding box (2DBB) proposals into 3D. Multi-view fusion and geometric filtering then produce globally consistent, de-duplicated 3DBBs in metric world space.
+![Boxer System Architecture](https://github.com/facebookresearch/boxer/raw/main/docs/images/boxer_system.jpg)
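To build intuition for what "lifting" a 2D box into metric world space involves, here is a minimal geometric sketch. It is not BoxerNet, which is a learned transformer. It only back-projects the center of a 2D box through a pinhole camera using a known depth and camera pose. The function name `unproject_box_center`, the intrinsics `K`, the pose `T_world_cam`, and all numeric values are hypothetical and chosen for illustration.

```python
import numpy as np


def unproject_box_center(box_xyxy, depth_m, K, T_world_cam):
    """Lift the center of a 2D box to a 3D point in world coordinates.

    Hypothetical helper for illustration only; not part of the Boxer codebase.
    box_xyxy:    (x_min, y_min, x_max, y_max) in pixels
    depth_m:     z-depth of the box center in meters (assumed known)
    K:           3x3 pinhole intrinsics matrix
    T_world_cam: 4x4 camera-to-world rigid transform
    """
    x_min, y_min, x_max, y_max = box_xyxy
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)

    # Back-project the pixel: p_cam = depth * K^-1 [u, v, 1]^T
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = depth_m * ray

    # Move the point from the camera frame into the world frame.
    R = T_world_cam[:3, :3]
    t = T_world_cam[:3, 3]
    return R @ p_cam + t


# Assumed example values: a 640x480 camera shifted 1 m along the world x-axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T_world_cam = np.eye(4)
T_world_cam[:3, 3] = [1.0, 0.0, 0.0]

center = unproject_box_center((300, 220, 340, 260), 2.0, K, T_world_cam)
print(center)
```

A single back-projected point says nothing about the extent or orientation of the object, which is part of why a learned lifting step and multi-view fusion are needed to produce full 3DBBs.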
+## Installation
+We recommend using [uv](https://docs.astral.sh/uv/) to manage the environment:
+```bash
+# Create virtual environment
+uv venv boxer --python 3.12
+source boxer/bin/activate
+# Core dependencies for running Boxer
+uv pip install 'torch>=2.0' numpy opencv-python tqdm dill
+```
+## Usage
+After installing the dependencies and downloading the required checkpoints with the scripts provided in the repository, you can run BoxerNet in headless mode on a sample sequence:
+```bash
+python run_boxer.py --input nym10_gen1 --max_n=90 --track
+```
+This estimates 3D bounding boxes and saves the results (a CSV file and a visualization) to the `output/` directory.
+## Citation
+If you find Boxer useful in your research, please consider citing:
+```bibtex
+@article{boxer2026,
+      title={Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D},
+      author={Daniel DeTone and Tianwei Shen and Fan Zhang and Lingni Ma and Julian Straub and Richard Newcombe and Jakob Engel},
+      year={2026},
+}
+```