
Twin3D — CoreML exports

On-device LiDAR scene capture → URDF / MJCF / PLY, for iPhone.

Twin3D detecting a keyboard with an oriented 3D bounding box

This repository hosts the exported CoreML model packages that ship inside the Twin3D iOS app. They are format-conversions of Meta Boxer and Ultralytics YOLOE — no retraining, no weight modification — and inherit their licensing (CC-BY-NC 4.0 and AGPL-3.0 respectively).

Use them to avoid the Python export step when building the iOS app or when integrating with a non-iPhone capture stack via CoreML.

LiDAR point cloud with oriented bounding box on the left, MuJoCo simulation on the right

What's in this repo

File                                   | Size    | Precision | Source
BoxerNetModel.mlpackage                | ~384 MB | fp32      | facebook/boxer (boxernet_hw960in4x6d768-wssxpf9p.ckpt + dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth, inlined)
YoloEModel.mlpackage                   | ~18 MB  | fp32      | Ultralytics YOLOE-11S-seg
variants/BoxerNetModel_fp16.mlpackage  | ~192 MB | fp16      | same as fp32, cast to half precision

fp32 + MLComputeUnits.cpuAndGPU is the default configuration, and the one with the tightest parity against the PyTorch reference. See the parity section in the main repo for the measured bounds and the caveats around fp16 on the Neural Engine.

Using the models

In the Twin3D iOS app

Clone the Twin3D repo, then copy the exports directly into the app bundle:

git clone https://github.com/vilaksh01/Twin3D.git
cd Twin3D

# Fetch pre-built CoreML packages (fp32, default)
hf download suviz/Twin3D BoxerNetModel.mlpackage --local-dir Twin3D/
hf download suviz/Twin3D YoloEModel.mlpackage --local-dir Twin3D/

Open Twin3D.xcodeproj, plug in a LiDAR device, ⌘R. This skips the Python export step in the main README entirely.

Standalone (Swift + CoreML, no iOS app)

import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU  // default — see parity notes

// Xcode compiles each .mlpackage into a .mlmodelc inside the app bundle,
// so load the compiled resource:
let boxer = try MLModel(
    contentsOf: Bundle.main.url(forResource: "BoxerNetModel", withExtension: "mlmodelc")!,
    configuration: config
)
let yoloe = try MLModel(
    contentsOf: Bundle.main.url(forResource: "YoloEModel", withExtension: "mlmodelc")!,
    configuration: config
)

The input contract (shapes, dtypes, normalization, coordinate convention) is documented in Twin3D/BoxerNet.swift and Twin3D/YOLODetector.swift.

Sensor-agnostic / Python (ONNX)

For RealSense, ZED, OAK-D, Azure Kinect, or ROS bags, use the ONNX export path from the main repo — CoreML packages are iPhone-specific. See Adapting to other sensors.

Pipeline

iPhone LiDAR → YOLOE (2D detection) → BoxerNet (3D lifting) → URDF / MJCF / JSON / PLY

One tap captures one frame of RGB + LiDAR depth + camera intrinsics + gravity, runs YOLOE for 2D detection, lifts each box with BoxerNet, and writes the four scene files. All on-device.
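The one-tap flow is just a function over a single capture. A minimal sketch of the dataflow follows; every name here is an illustrative stand-in for the app's Swift classes, not its actual API:

```python
def run_pipeline(frame, detect_2d, lift_3d, writers):
    """One captured frame in, one scene export per format out.

    frame     - dict with "rgb", "depth", "intrinsics", "gravity"
    detect_2d - callable standing in for the YOLOE stage
    lift_3d   - callable standing in for the per-box BoxerNet stage
    writers   - mapping of format name to a scene-writer callable
    """
    boxes_2d = detect_2d(frame["rgb"])
    boxes_3d = [lift_3d(frame, box) for box in boxes_2d]
    return {fmt: write(boxes_3d) for fmt, write in writers.items()}
```

The structure makes the coupling explicit: only the lifting stage consumes depth, intrinsics, and gravity; the detector sees RGB alone.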

Colored LiDAR point cloud with an oriented bounding box around a detected keyboard

Model I/O summary

YOLOE (2D detection)

  • Input: 640 × 640 RGB, [0, 1] float
  • Output: N × (class, xmin, ymin, xmax, ymax, score)
  • Open vocabulary, frozen at export time (see VOCAB in the main repo's tools/export_yoloe.py)
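Given that row layout, post-filtering the raw output takes a few lines. A hedged numpy sketch: the 0.25 threshold and the plain (non-letterboxed) resize back to image pixels are assumptions here, not the app's documented values — the authoritative contract is in Twin3D/YOLODetector.swift:

```python
import numpy as np

def filter_detections(raw, score_thresh=0.25):
    """Keep rows (class, xmin, ymin, xmax, ymax, score) whose score
    clears the threshold. The threshold value is an assumption."""
    raw = np.asarray(raw, dtype=np.float32)
    return raw[raw[:, 5] >= score_thresh]

def boxes_to_image(dets, img_w, img_h, net=640):
    """Rescale box corners from the 640x640 network frame to source-image
    pixels, assuming a plain resize with no letterbox padding."""
    out = dets.copy()
    out[:, [1, 3]] *= img_w / net  # xmin, xmax
    out[:, [2, 4]] *= img_h / net  # ymin, ymax
    return out
```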

BoxerNet (3D lifting)

  • Inputs: 960 × 960 RGB, 60 × 60 depth patches (median per 16×16 tile), N × 6 Plücker rays per patch, fx fy cx cy intrinsics, gravity 3-vector, 2D boxes
  • Outputs per box: center ∈ ℝ³, size ∈ ℝ³, yaw, confidence
  • DINOv3 ViT-S/16+ backbone is inlined — the .pth is not needed at inference time.
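The two geometry inputs are easy to misread, so here is a minimal numpy sketch of one way to build them. The patch-center sampling and the camera pose (R, t) are assumptions — with the camera at the origin the Plücker moment vanishes — and the app's actual convention lives in Twin3D/BoxerNet.swift:

```python
import numpy as np

def depth_patches(depth, tile=16):
    """Median depth per tile: a 960x960 depth map becomes 60x60 patches."""
    h, w = depth.shape
    return np.median(depth.reshape(h // tile, tile, w // tile, tile),
                     axis=(1, 3))

def plucker_rays(fx, fy, cx, cy, size=960, tile=16,
                 R=np.eye(3), t=np.zeros(3)):
    """6-D Pluecker ray (direction, moment) through each patch center,
    expressed in the frame given by camera pose (R, t)."""
    centers = (np.arange(size // tile) + 0.5) * tile  # patch-center pixels
    u, v = np.meshgrid(centers, centers)
    # Back-project pixel centers through the pinhole intrinsics.
    d = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    d = d @ R.T
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    m = np.cross(np.broadcast_to(t, d.shape), d)  # moment = origin x dir
    return np.concatenate([d, m], axis=-1)        # shape (60, 60, 6)
```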

License and attribution

CC-BY-NC 4.0. Non-commercial research use only.

This repository redistributes CoreML format-conversions of:

  • Meta Boxer (CC-BY-NC 4.0) — the 3D lifter and the DINOv3 backbone. See facebook/boxer for the original PyTorch checkpoints. Cite the Boxer paper (below) if you use the model in research.
  • Ultralytics YOLOE (AGPL-3.0) — the 2D detector. Any network-served derivative inherits AGPL copyleft obligations; Ultralytics offers a commercial license for users who cannot comply.

No weight modifications have been made beyond format conversion (PyTorch → CoreML) and, for the fp16 variant, precision cast.

For a commercial deployment, replace both the 3D lifter and the 2D detector with permissively licensed equivalents — see the Roadmap in the main repo.

Citation

@article{boxer2026,
  title  = {Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D},
  author = {DeTone, Daniel and Shen, Tianwei and Zhang, Fan and Ma, Lingni
            and Straub, Julian and Newcombe, Richard and Engel, Jakob},
  year   = {2026},
  url    = {https://arxiv.org/abs/2604.05212}
}