VIDAR: LCF3D Late-Cascade Fusion for 3D Object Detection

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving. Carlo Sgaravatti, Riccardo Pieroni, Matteo Corno, Sergio M. Savaresi, Luca Magri, Giacomo Boracchi (2026).

Paper on HuggingFace | GitHub

Architecture

VIDAR implements the LCF3D pipeline, a two-stage fusion approach that combines:

  1. Late Fusion: Match LiDAR 3D detections with RGB 2D detections to filter false positives
  2. Cascade Fusion: Generate 3D frustum proposals from unmatched RGB detections to recover missed objects
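
The two stages above hinge on associating LiDAR detections (projected to the image plane) with RGB 2D detections. A minimal sketch of that association, assuming a simple greedy 2D-IoU matching (the function names, threshold, and greedy strategy are illustrative, not the paper's exact criterion):

```python
def iou_2d(a, b):
    """IoU between two axis-aligned 2D boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def late_fusion(lidar_boxes_2d, rgb_boxes_2d, iou_thr=0.5):
    """Greedily match projected LiDAR boxes to RGB boxes.
    Matched LiDAR detections are kept (false-positive filtering);
    unmatched RGB detections seed the cascade's frustum proposals."""
    matched_lidar, used_rgb = [], set()
    for i, lb in enumerate(lidar_boxes_2d):
        best_j, best_iou = -1, iou_thr
        for j, rb in enumerate(rgb_boxes_2d):
            if j in used_rgb:
                continue
            iou = iou_2d(lb, rb)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched_lidar.append(i)
            used_rgb.add(best_j)
    unmatched_rgb = [j for j in range(len(rgb_boxes_2d)) if j not in used_rgb]
    return matched_lidar, unmatched_rgb
```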

The PointNet++ backbone processes raw 3D point clouds (16,384 points) and outputs 3D bounding boxes with class predictions for Car, Pedestrian, and Cyclist.

Model: PointNet++ (512,202 parameters)
Input: point cloud (B, N, 3), xyz coordinates
Output: 3D detections (B, K, 10), [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...]

Exported Formats

| Format | File | Size | Use Case |
|---|---|---|---|
| PyTorch (.pth) | pytorch/vidar_v1.pth | ~2 MB | Training, fine-tuning |
| SafeTensors | pytorch/vidar_v1.safetensors | ~2 MB | Fast safe loading |
| ONNX | onnx/vidar_v1.onnx | ~2 MB | Cross-platform inference |
| TorchScript | torchscript/vidar_v1.pt | ~2 MB | C++ deployment |
| TensorRT FP16 | tensorrt/vidar_v1_fp16.trt | ~4 MB | Edge (Jetson/L4) |
| TensorRT FP32 | tensorrt/vidar_v1_fp32.trt | ~6 MB | Full precision |

Usage

PyTorch

import torch
from anima_vidar.detectors.pointnet_pp import PointNetPPDetector

model = PointNetPPDetector(num_classes=3, input_dim=3)
state = torch.load("pytorch/vidar_v1.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Inference
points = torch.randn(1, 16384, 3)  # (batch, num_points, xyz)
detections = model(points)  # (1, 128, 10)
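
The raw detection tensor typically needs confidence thresholding before use. A minimal sketch, assuming the channel layout listed under Architecture (index 8 is the confidence score; `filter_detections` and the threshold value are illustrative):

```python
import numpy as np

def filter_detections(dets, conf_thr=0.5):
    """Keep rows of a (K, 10) detection array whose confidence exceeds
    the threshold. Channel 8 is confidence per the output layout
    [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...]."""
    return dets[dets[:, 8] > conf_thr]
```

In practice you would call this on `detections[0].detach().numpy()` for a batch of one, then read boxes and classes from the surviving rows.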

SafeTensors

from safetensors.torch import load_file
state = load_file("pytorch/vidar_v1.safetensors")
model.load_state_dict(state)

ONNX

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("onnx/vidar_v1.onnx")
points_numpy = np.random.randn(1, 16384, 3).astype(np.float32)
result = session.run(None, {"point_cloud": points_numpy})

Training Results

| Parameter | Value |
|---|---|
| Dataset | KITTI 3D (Chen split: 3,712 train / 3,769 val) |
| Hardware | 2x NVIDIA L4 (23 GB each) |
| Precision | bf16 mixed precision |
| Optimizer | AdamW (lr=3e-4, weight_decay=1e-4) |
| Scheduler | Cosine annealing with 5% linear warmup |
| Batch size | 76 (38 per GPU) |
| Best epoch | 6 / 16 (early stopped) |
| Best val_loss | 3.1681 |
| Train loss | 3.0544 (box: 2.521, cls: 0.422) |
| Training time | ~60 min |
| Config | configs/training_cuda.yaml |
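
The schedule listed above (cosine annealing with a 5% linear warmup) can be sketched as a per-step learning-rate function. This is an illustrative re-implementation, not the training code; `lr_at` and its defaults mirror the table's lr=3e-4 and 5% warmup:

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_frac=0.05):
    """Learning rate at a given step: linear warmup over the first
    warmup_frac of training, then cosine decay from base_lr to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```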

Target Metrics (Paper Table 1, KITTI Moderate)

| Metric | Paper (PointPillars) | Our Target (>=90% of paper) |
|---|---|---|
| AP_3D Car | 78.8 | >=70.9 |
| AP_3D Pedestrian | 52.4 | >=47.2 |
| AP_3D Cyclist | 63.1 | >=56.8 |
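
The target column is simply the paper's AP scaled to 90% and rounded to one decimal, which can be checked directly:

```python
# Paper AP values from the table above; targets are 90% of each,
# rounded to one decimal place.
paper_ap = {"Car": 78.8, "Pedestrian": 52.4, "Cyclist": 63.1}
targets = {cls: round(ap * 0.9, 1) for cls, ap in paper_ap.items()}
# -> {'Car': 70.9, 'Pedestrian': 47.2, 'Cyclist': 56.8}
```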

Sensor Stack

| Sensor | Model | Purpose |
|---|---|---|
| RGB Camera | ZED 2i | 2D detection + depth estimation |
| LiDAR | Unitree L2 | 3D point cloud (360°) |
| Platform | Unitree L2 robot | Mobile deployment |

License

Apache 2.0 (Robot Flow Labs / AIFLOW LABS LIMITED)

Training Report

See TRAINING_REPORT.md for detailed training curves and final metrics.

Model Info

  • Parameters: 512,202
  • Input shape: [1, 16384, 3]