VIDAR: LCF3D Late-Cascade Fusion for 3D Object Detection
Part of the ANIMA Perception Suite by Robot Flow Labs.
Paper
LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving. Carlo Sgaravatti, Riccardo Pieroni, Matteo Corno, Sergio M. Savaresi, Luca Magri, Giacomo Boracchi (2026).
Architecture
VIDAR implements the LCF3D pipeline, a two-stage fusion approach that combines:
- Late Fusion: Match LiDAR 3D detections with RGB 2D detections to filter false positives
- Cascade Fusion: Generate 3D frustum proposals from unmatched RGB detections to recover missed objects
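The late-fusion matching step can be illustrated with a minimal sketch: project each LiDAR 3D detection into the image plane, greedily match it to an RGB 2D detection by IoU, keep matched LiDAR detections, and hand the unmatched RGB boxes to the cascade stage. This is an illustrative simplification (function names, greedy matching, and the 0.5 threshold are assumptions, not the paper's exact procedure):

```python
import numpy as np

def iou_2d(a, b):
    """IoU between two axis-aligned boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def late_fusion(lidar_boxes_2d, rgb_boxes_2d, iou_thr=0.5):
    """Keep LiDAR detections whose image projection overlaps an RGB box;
    return the RGB boxes left unmatched (candidates for the cascade stage)."""
    kept, used = [], set()
    for i, lb in enumerate(lidar_boxes_2d):
        best_j, best_iou = -1, iou_thr
        for j, rb in enumerate(rgb_boxes_2d):
            if j in used:
                continue
            iou = iou_2d(lb, rb)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            kept.append(i)       # LiDAR detection confirmed by RGB
            used.add(best_j)
    unmatched_rgb = [j for j in range(len(rgb_boxes_2d)) if j not in used]
    return kept, unmatched_rgb
```

A LiDAR box with no sufficiently overlapping RGB box is treated as a likely false positive; an RGB box with no LiDAR match is a likely missed object and seeds a frustum proposal.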
The PointNet++ backbone processes raw 3D point clouds (16,384 points) and outputs 3D bounding boxes with class predictions for Car, Pedestrian, and Cyclist.
Model: PointNet++ (512,202 parameters)
Input: Point cloud (B, N, 3), xyz coordinates
Output: 3D detections (B, K, 10), rows of [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...]
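The cascade stage's frustum proposals can be sketched in numpy: project camera-frame points through a pinhole intrinsics matrix and keep those that land inside an unmatched RGB 2D box. Names and the projection convention here are illustrative assumptions, not the actual LCF3D code:

```python
import numpy as np

def frustum_crop(points_cam, box_2d, K):
    """Keep 3D points (camera frame, (N, 3)) whose pinhole projection lands
    inside a 2D box [x1, y1, x2, y2]; K is the 3x3 camera intrinsics matrix.
    A simplified stand-in for LCF3D's frustum proposal generation."""
    x1, y1, x2, y2 = box_2d
    valid = points_cam[:, 2] > 1e-3          # only points in front of the camera
    uvw = points_cam @ K.T                   # project: (N, 3) -> (N, 3)
    w = np.clip(uvw[:, 2], 1e-3, None)
    u, v = uvw[:, 0] / w, uvw[:, 1] / w      # pixel coordinates
    mask = valid & (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return points_cam[mask]
```

The cropped frustum points can then be fed to the 3D detector to recover an object the LiDAR branch missed.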
Exported Formats
| Format | File | Size | Use Case |
|---|---|---|---|
| PyTorch (.pth) | pytorch/vidar_v1.pth | ~2MB | Training, fine-tuning |
| SafeTensors | pytorch/vidar_v1.safetensors | ~2MB | Fast, safe loading |
| ONNX | onnx/vidar_v1.onnx | ~2MB | Cross-platform inference |
| TorchScript | torchscript/vidar_v1.pt | ~2MB | C++ deployment |
| TensorRT FP16 | tensorrt/vidar_v1_fp16.trt | ~4MB | Edge (Jetson/L4) |
| TensorRT FP32 | tensorrt/vidar_v1_fp32.trt | ~6MB | Full precision |
Usage
PyTorch
```python
import torch
from anima_vidar.detectors.pointnet_pp import PointNetPPDetector

model = PointNetPPDetector(num_classes=3, input_dim=3)
state = torch.load("pytorch/vidar_v1.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Inference
points = torch.randn(1, 16384, 3)  # (batch, num_points, xyz)
with torch.no_grad():
    detections = model(points)     # (1, 128, 10)
```
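Each detection row follows the [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...] layout described above; a minimal post-processing sketch (the 0.5 confidence threshold is an illustrative choice, not a tuned value) is:

```python
import torch

CLASS_NAMES = ["Car", "Pedestrian", "Cyclist"]

def decode_detections(dets, conf_thr=0.5):
    """dets: (K, 10) rows of [cx, cy, cz, dx, dy, dz, yaw, class, confidence, ...].
    Returns a list of dicts for detections above the confidence threshold."""
    out = []
    for row in dets:
        cls_id, conf = int(row[7].item()), float(row[8].item())
        if conf < conf_thr:
            continue  # drop low-confidence boxes
        out.append({
            "center": row[0:3].tolist(),   # cx, cy, cz
            "size": row[3:6].tolist(),     # dx, dy, dz
            "yaw": float(row[6].item()),
            "label": CLASS_NAMES[cls_id],
            "score": conf,
        })
    return out
```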
SafeTensors
```python
from safetensors.torch import load_file

state = load_file("pytorch/vidar_v1.safetensors")
model.load_state_dict(state)
```
ONNX
```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("onnx/vidar_v1.onnx")
points_numpy = np.random.randn(1, 16384, 3).astype(np.float32)  # (batch, num_points, xyz)
result = session.run(None, {"point_cloud": points_numpy})
```
Training Results
| Parameter | Value |
|---|---|
| Dataset | KITTI 3D (Chen split: 3,712 train / 3,769 val) |
| Hardware | 2x NVIDIA L4 (23GB each) |
| Precision | bf16 mixed precision |
| Optimizer | AdamW (lr=3e-4, weight_decay=1e-4) |
| Scheduler | Cosine annealing with 5% linear warmup |
| Batch size | 76 (38/GPU) |
| Best epoch | 6 / 16 (early stopped) |
| Best val_loss | 3.1681 |
| Train loss | 3.0544 (box: 2.521, cls: 0.422) |
| Training time | ~60 min |
| Config | configs/training_cuda.yaml |
Target Metrics (Paper Table 1, KITTI Moderate)
| Metric | Paper (PointPillars) | Our Target (>=90% of paper) |
|---|---|---|
| AP_3D Car | 78.8 | >=70.9 |
| AP_3D Pedestrian | 52.4 | >=47.2 |
| AP_3D Cyclist | 63.1 | >=56.8 |
Sensor Stack
| Sensor | Model | Purpose |
|---|---|---|
| RGB Camera | ZED 2i | 2D detection + depth estimation |
| LiDAR | Unitree L2 | 3D point cloud (360 deg) |
| Platform | Unitree L2 robot | Mobile deployment |
License
Apache 2.0, Robot Flow Labs / AIFLOW LABS LIMITED
Training Report
See TRAINING_REPORT.md for detailed training curves and final metrics.
Model Info
- Parameters: 512,202
- Input shape: [1, 16384, 3]
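Since the model expects a fixed [1, 16384, 3] input, variable-size LiDAR clouds must be sampled or padded to exactly 16,384 points before inference. A minimal numpy sketch (duplicating points when the cloud is too small is one common convention; the actual preprocessing may differ):

```python
import numpy as np

def sample_points(points, n=16384, seed=0):
    """Sample or pad an (M, 3) cloud to exactly n points."""
    rng = np.random.default_rng(seed)
    m = points.shape[0]
    if m >= n:
        # subsample without replacement
        idx = rng.choice(m, size=n, replace=False)
    else:
        # keep all points, then duplicate randomly to pad up to n
        idx = np.concatenate([np.arange(m), rng.choice(m, size=n - m, replace=True)])
    return points[idx]
```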