VIS-OCCANY — ANIMA 3D Occupancy Prediction Module

Part of the ANIMA Intelligence Compiler Suite by Robot Flow Labs.

Paper

OccAny: Generalized Unconstrained Urban 3D Occupancy Prediction (CVPR 2026)
Anh-Quan Cao, Tuan-Hung Vu — Valeo AI

Architecture

OccAny predicts dense 3D occupancy from RGB camera inputs without LiDAR supervision. The ANIMA implementation uses:

  • DINOv2-Small/14 frozen encoder (384-dim patch tokens)
  • 6-layer transformer decoder (384-dim, 6 heads) for geometry prediction
  • Prediction heads: global/local pointmaps, confidence, poses, SAM-style features
  • Novel-view rendering with TTVA (Test-Time View Augmentation)
  • CUDA-optimized trilinear voxelization (scatter_add, ~11x faster than the pure-Python path)
  • Multi-loss training: pointmap L1 + voxel BCE + feature distillation
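The trilinear voxelization step above can be sketched in pure PyTorch. This is an illustrative splat of points into a dense grid using `scatter_add_`; the function name, grid layout, and the assumption that coordinates are already in voxel units are ours, not the module's API (the CUDA kernel fuses the same logic).

```python
import torch

def trilinear_voxelize(points, grid_size):
    """Splat points into a (D, H, W) grid with trilinear weights via scatter_add.

    points: (N, 3) float coordinates in voxel units, roughly inside the grid.
    Each point distributes a total weight of 1 over the 8 surrounding voxels.
    """
    D = H = W = grid_size
    base = points.floor().long()      # lower corner of each point's cell
    frac = points - points.floor()    # fractional offset in [0, 1)
    grid = torch.zeros(D * H * W, dtype=points.dtype)
    # Visit the 8 corners of the enclosing cell
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = base + torch.tensor([dx, dy, dz])
                w = ((frac[:, 0] if dx else 1 - frac[:, 0])
                     * (frac[:, 1] if dy else 1 - frac[:, 1])
                     * (frac[:, 2] if dz else 1 - frac[:, 2]))
                # Drop corners that fall outside the grid
                ok = (corner >= 0).all(1) & (corner < grid_size).all(1)
                idx = (corner[ok, 0] * H + corner[ok, 1]) * W + corner[ok, 2]
                grid.scatter_add_(0, idx, w[ok])
    return grid.view(D, H, W)
```

Because `scatter_add_` handles duplicate indices atomically on GPU, the same pattern runs unchanged on CUDA tensors; the custom kernel mainly removes the 8-corner Python loop.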

Benchmarks (Paper Targets)

Benchmark                  Metric  Paper  ANIMA Target
SemanticKITTI (sequence)   IoU     25.91  >= 24.5
SemanticKITTI (monocular)  IoU     24.03  >= 22.5
Occ3D-nuScenes (surround)  IoU     34.15  >= 32.0

Exported Formats

Format          File                               Size    Use Case
PyTorch (.pth)  pytorch/vis_occany_v1.pth          108 MB  Training, fine-tuning
SafeTensors     pytorch/vis_occany_v1.safetensors  108 MB  Fast loading, safe
ONNX            onnx/vis_occany_v1.onnx            66 MB   Cross-platform inference
TensorRT FP32   tensorrt/vis_occany_v1_fp32.trt    67 MB   Full-precision inference
TensorRT FP16   tensorrt/vis_occany_v1_fp16.trt    35 MB   Edge deployment (Jetson/L4)
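For the ONNX export, a minimal inference sketch with ONNX Runtime looks like the following. The ImageNet normalization constants and the 518×518 input size are assumptions based on DINOv2 defaults (patch size 14); inspect the exported graph for the real input name and shape.

```python
import os
import numpy as np

# ImageNet statistics, as used by DINOv2 preprocessing (assumed here)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(rgb_u8):
    """HxWx3 uint8 image -> 1x3xHxW float32 batch (H, W multiples of 14)."""
    x = rgb_u8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD
    return x.transpose(2, 0, 1)[None]

if os.path.exists("onnx/vis_occany_v1.onnx"):
    import onnxruntime as ort
    sess = ort.InferenceSession("onnx/vis_occany_v1.onnx",
                                providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]  # check the graph for the actual input name/shape
    image = preprocess(np.zeros((518, 518, 3), dtype=np.uint8))
    outputs = sess.run(None, {inp.name: image})
```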

Usage

import torch
from safetensors.torch import load_file

# Load weights
state = load_file("pytorch/vis_occany_v1.safetensors")

# Or load into the full model
from anima_vis_occany.model.reconstruction import ReconstructionStage

model = ReconstructionStage(hidden_dim=384, decoder_depth=6, decoder_heads=6)
# Keep only the reconstruction-stage weights and strip their prefix
recon_state = {
    k.replace("reconstruction.", "", 1): v
    for k, v in state.items()
    if k.startswith("reconstruction.")
}
model.load_state_dict(recon_state)
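The .pth export loads the same way via torch.load; passing weights_only=True (PyTorch >= 1.13) restricts unpickling to tensors. Sketched here with a throwaway checkpoint, since the real file lives under pytorch/ in the repo:

```python
import os
import tempfile
import torch

# Demo with a throwaway checkpoint; substitute pytorch/vis_occany_v1.pth
path = os.path.join(tempfile.gettempdir(), "demo_ckpt.pth")
torch.save({"w": torch.ones(2)}, path)

# weights_only=True refuses arbitrary pickled objects (safer default)
state = torch.load(path, map_location="cpu", weights_only=True)
```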

Training

  • Hardware: NVIDIA L4 (23GB VRAM)
  • Framework: PyTorch 2.11 + CUDA 12.8
  • Precision: bf16 mixed precision (AMP)
  • Optimizer: AdamW, lr=3e-4, weight_decay=0.01
  • Scheduler: Cosine warmup (5% warmup steps)
  • Batch: 32 × 4 gradient accumulation = 128 effective
  • Data: KITTI cached voxels + DINOv2 features + point clouds (7,481 samples)
  • Config: See configs/training.toml
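The optimizer and schedule rows above translate into a few lines of PyTorch. This is a generic sketch, not the module's training loop: total_steps and the tiny placeholder model are ours, while lr, weight decay, and the 5% linear warmup into cosine decay follow the table.

```python
import math
import torch

def warmup_cosine(step, total_steps, warmup_frac=0.05):
    """LR multiplier: linear warmup over the first 5% of steps, then cosine decay to 0."""
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return step / warmup
    t = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * (1 + math.cos(math.pi * t))

model = torch.nn.Linear(4, 4)  # placeholder for the real reconstruction stage
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda s: warmup_cosine(s, total_steps=10_000))
```

Call `sched.step()` once per optimizer step; with 4-step gradient accumulation that means once per effective batch of 128, not per micro-batch.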

API Endpoints

Endpoint  Method  Description
/health   GET     Module health status
/ready    GET     Checkpoint readiness
/info     GET     Module metadata
/infer    POST    Run 3D occupancy inference
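A minimal client for the /infer endpoint might look like this. The {"images": [...]} payload layout, the base64 encoding, and the localhost:8000 port are all guesses; query GET /info for the module's actual input contract.

```python
import json
from urllib import request

def build_infer_payload(images_b64):
    """Encode a list of base64 camera frames as the JSON request body.
    The {"images": [...]} layout is assumed, not taken from the served schema."""
    return json.dumps({"images": images_b64}).encode()

def infer(images_b64, host="http://localhost:8000"):
    """POST the frames to /infer and decode the JSON response."""
    req = request.Request(f"{host}/infer", data=build_infer_payload(images_b64),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```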

Docker

docker compose -f docker-compose.serve.yml --profile serve up -d

License

Apache 2.0 — Robot Flow Labs / AIFLOW LABS LIMITED
