VIDAR: LCF3D Late-Cascade Fusion for 3D Object Detection

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving. Carlo Sgaravatti, Riccardo Pieroni, Matteo Corno, Sergio M. Savaresi, Luca Magri, Giacomo Boracchi (2026).

Paper on HuggingFace | GitHub

Architecture

VIDAR implements the LCF3D pipeline, a two-stage fusion approach that combines:

  1. Late Fusion: Match LiDAR 3D detections with RGB 2D detections to filter false positives
  2. Cascade Fusion: Generate 3D frustum proposals from unmatched RGB detections to recover missed objects
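
The two stages above hinge on associating LiDAR detections (projected to the image plane) with RGB 2D detections. A minimal sketch of that association, assuming a simple greedy 2D-IoU matching (the function names, threshold, and greedy strategy are illustrative, not the paper's exact criterion):

```python
def iou_2d(a, b):
    """IoU between two axis-aligned 2D boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def late_fusion(lidar_boxes_2d, rgb_boxes_2d, iou_thr=0.5):
    """Greedily match projected LiDAR boxes to RGB boxes.
    Matched LiDAR detections are kept (false-positive filtering);
    unmatched RGB detections seed the cascade's frustum proposals."""
    matched_lidar, used_rgb = [], set()
    for i, lb in enumerate(lidar_boxes_2d):
        best_j, best_iou = -1, iou_thr
        for j, rb in enumerate(rgb_boxes_2d):
            if j in used_rgb:
                continue
            iou = iou_2d(lb, rb)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched_lidar.append(i)
            used_rgb.add(best_j)
    unmatched_rgb = [j for j in range(len(rgb_boxes_2d)) if j not in used_rgb]
    return matched_lidar, unmatched_rgb
```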

The PointNet++ backbone processes raw 3D point clouds (16,384 points) and outputs 3D bounding boxes with class predictions for Car, Pedestrian, and Cyclist.

Model: PointNet++ (512,202 parameters)
Input: point cloud (B, N, 3), xyz coordinates
Output: 3D detections (B, K, 10), [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...]

Exported Formats

| Format | File | Size | Use Case |
|---|---|---|---|
| PyTorch (.pth) | pytorch/vidar_v1.pth | ~2 MB | Training, fine-tuning |
| SafeTensors | pytorch/vidar_v1.safetensors | ~2 MB | Fast safe loading |
| ONNX | onnx/vidar_v1.onnx | ~2 MB | Cross-platform inference |
| TorchScript | torchscript/vidar_v1.pt | ~2 MB | C++ deployment |
| TensorRT FP16 | tensorrt/vidar_v1_fp16.trt | ~4 MB | Edge (Jetson/L4) |
| TensorRT FP32 | tensorrt/vidar_v1_fp32.trt | ~6 MB | Full precision |

Usage

PyTorch

import torch
from anima_vidar.detectors.pointnet_pp import PointNetPPDetector

model = PointNetPPDetector(num_classes=3, input_dim=3)
state = torch.load("pytorch/vidar_v1.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Inference
points = torch.randn(1, 16384, 3)  # (batch, num_points, xyz)
detections = model(points)  # (1, 128, 10)
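
The raw detection tensor typically needs confidence thresholding before use. A minimal sketch, assuming the channel layout listed under Architecture (index 8 is the confidence score; `filter_detections` and the threshold value are illustrative):

```python
import numpy as np

def filter_detections(dets, conf_thr=0.5):
    """Keep rows of a (K, 10) detection array whose confidence exceeds
    the threshold. Channel 8 is confidence per the output layout
    [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...]."""
    return dets[dets[:, 8] > conf_thr]
```

In practice you would call this on `detections[0].detach().numpy()` for a batch of one, then read boxes and classes from the surviving rows.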

SafeTensors

from safetensors.torch import load_file
state = load_file("pytorch/vidar_v1.safetensors")
model.load_state_dict(state)

ONNX

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("onnx/vidar_v1.onnx")
points_numpy = np.random.randn(1, 16384, 3).astype(np.float32)
result = session.run(None, {"point_cloud": points_numpy})

Training Results

| Parameter | Value |
|---|---|
| Dataset | KITTI 3D (Chen split: 3,712 train / 3,769 val) |
| Hardware | 2x NVIDIA L4 (23 GB each) |
| Precision | bf16 mixed precision |
| Optimizer | AdamW (lr=3e-4, weight_decay=1e-4) |
| Scheduler | Cosine annealing with 5% linear warmup |
| Batch size | 76 (38 per GPU) |
| Best epoch | 6 / 16 (early stopped) |
| Best val_loss | 3.1681 |
| Train loss | 3.0544 (box: 2.521, cls: 0.422) |
| Training time | ~60 min |
| Config | configs/training_cuda.yaml |
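
The schedule listed above (cosine annealing with a 5% linear warmup) can be sketched as a per-step learning-rate function. This is an illustrative re-implementation, not the training code; `lr_at` and its defaults mirror the table's lr=3e-4 and 5% warmup:

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_frac=0.05):
    """Learning rate at a given step: linear warmup over the first
    warmup_frac of training, then cosine decay from base_lr to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```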

Target Metrics (Paper Table 1, KITTI Moderate)

| Metric | Paper (PointPillars) | Our Target (>=90% of paper) |
|---|---|---|
| AP_3D Car | 78.8 | >=70.9 |
| AP_3D Pedestrian | 52.4 | >=47.2 |
| AP_3D Cyclist | 63.1 | >=56.8 |
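
The target column is simply the paper's AP scaled to 90% and rounded to one decimal, which can be checked directly:

```python
# Paper AP values from the table above; targets are 90% of each,
# rounded to one decimal place.
paper_ap = {"Car": 78.8, "Pedestrian": 52.4, "Cyclist": 63.1}
targets = {cls: round(ap * 0.9, 1) for cls, ap in paper_ap.items()}
# -> {'Car': 70.9, 'Pedestrian': 47.2, 'Cyclist': 56.8}
```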

Sensor Stack

| Sensor | Model | Purpose |
|---|---|---|
| RGB Camera | ZED 2i | 2D detection + depth estimation |
| LiDAR | Unitree L2 | 3D point cloud (360°) |
| Platform | Unitree L2 robot | Mobile deployment |

License

Apache 2.0 (Robot Flow Labs / AIFLOW LABS LIMITED)

Training Report

See TRAINING_REPORT.md for detailed training curves and final metrics.

Model Info

  • Parameters: 512,202
  • Input shape: [1, 16384, 3]