OpenUrban3D (RAN): Annotation-Free Open-Vocabulary 3D Segmentation

Paper: OpenUrban3D (Wang et al., Sep 2025)

Architecture

  • 3D Backbone: MinkUNet (sparse 3D convolutions)
  • 2D Feature Extractor: ODISE (frozen)
  • Text Encoder: CLIP ViT-L/14 (frozen)
  • Training: Knowledge distillation (2D vision-language features → 3D backbone)
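The distillation step can be sketched as a per-point cosine loss between the frozen 2D vision-language features (projected onto the point cloud) and the 3D backbone's output features. This is a minimal illustration under assumed feature shapes, not the paper's exact objective; the function name is hypothetical.

```python
import numpy as np

def cosine_distillation_loss(feat_3d: np.ndarray, feat_2d: np.ndarray) -> float:
    """Mean cosine distance between per-point 3D backbone features and
    projected 2D vision-language features, both of shape (N, D)."""
    a = feat_3d / np.linalg.norm(feat_3d, axis=1, keepdims=True)
    b = feat_2d / np.linalg.norm(feat_2d, axis=1, keepdims=True)
    # 1 - cos similarity, averaged over all N points
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))
```

Minimizing this loss pushes the 3D backbone to reproduce the 2D teacher's feature space, which is what later allows text queries against the 3D features.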

Usage

from anima_ran.inference.zero_shot import ZeroShotSegmenter

# Load the distilled 3D backbone from a local checkpoint
segmenter = ZeroShotSegmenter(backbone_checkpoint="pytorch/ran_v1.pth")
segmenter.load()

# Segment a point cloud against an arbitrary list of class prompts
result = segmenter.segment(points, ["building", "vegetation", "road"])
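Under the hood, open-vocabulary segmentation of this kind labels each point with the prompt whose CLIP text embedding is most similar to the point's distilled feature. A minimal sketch of that assignment step (array shapes and the function name are assumptions, not the library's API):

```python
import numpy as np

def zero_shot_assign(point_feats: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """For each point feature (N, D), return the index of the most
    cosine-similar text embedding (K, D)."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    # (N, K) similarity matrix; argmax over prompts gives per-point labels
    return np.argmax(p @ t.T, axis=1)
```

Because the class list is only consulted at this final similarity step, new categories can be queried at inference time without retraining.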

Training Config

  • Optimizer: Adam, LR=1e-4
  • Epochs: 60, Batch size: 2
  • Voxel size: 0.2m
  • Hardware: 2x NVIDIA A6000
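The 0.2 m voxel size controls how raw points are quantized before entering the sparse MinkUNet. The sparse-convolution library performs this internally; the helper below is only an illustrative sketch of the quantization.

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 0.2) -> np.ndarray:
    """Quantize (N, 3) coordinates to integer voxel indices and drop
    duplicates, keeping the first point seen in each occupied voxel."""
    coords = np.floor(points / voxel_size).astype(np.int32)
    _, first = np.unique(coords, axis=0, return_index=True)
    return coords[np.sort(first)]
```

A coarser voxel size shrinks the sparse tensor (and memory use) at the cost of fine geometric detail, which is why a relatively large 0.2 m grid suits large-scale urban scenes.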

Citation

@article{wang2025openurban3d,
  title={OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds},
  author={Wang, Chongyu and Jing, Kunlei and Zhu, Jihua and Wang, Di},
  journal={arXiv preprint arXiv:2509.10842},
  year={2025}
}

ANIMA Project

Part of the ANIMA Wave-6 multi-agent robotics perception system.
