Fast-dDrive

Fast-dDrive is a block-diffusion Vision-Language-Action (VLA) model for end-to-end autonomous driving, built on Qwen2.5-VL-3B. It pairs section-aware structured-diffusion training (SASD) with scaffold-aware speculative decoding (Scaffold Spec) and an optional shared-prefix multi-trajectory inference scaling scheme. It reaches state-of-the-art accuracy on the Waymo Open Dataset End-to-End Driving (WOD-E2E) benchmark while decoding at over 200 tokens/second on a single H100.

Quick start

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL = "Efficient-Large-Model/Fast_dDrive_3B"   # or your local clone

processor = AutoProcessor.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).cuda().eval()

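# input_ids, attention_mask, pixel_values, and image_grid_thw are not defined in
# this snippet. They are assumed to be the processor's outputs for your prompt and
# camera images, moved to the same device as the model.

# Scaffold Spec (paper canonical, threshold = 0.0)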
output_ids = model.scaffold_speculative_sample(
    input_ids=input_ids,
    attention_mask=attention_mask,
    pixel_values=pixel_values,
    image_grid_thw=image_grid_thw,
    confidence_threshold=0.0,
    block_size=32,
    max_new_tokens=512,
)

Inference paths

This release exposes three decoding paths as bound methods on the model:

| Method | Description | Threshold |
| --- | --- | --- |
| mdm_sample_deep_scaffold | Section Diffusion (SD): iterative MDM denoising over a pre-filled JSON scaffold | 0.9 |
| scaffold_speculative_sample | Scaffold Spec (SS): scaffold-aware self-speculative decoding (MDM draft + AR verify per block). Paper canonical. | 0.0 |
| scaffold_spec_with_ss_multi_traj | SS multi-rollout: shared-prefix N-rollout inference scaling on the trajectory section | 0.0 |

Important: scaffold_speculative_sample and its multi-traj variant must be run with confidence_threshold=0.0 to reproduce the paper numbers. Running at 0.9 silently degrades both ADE and throughput.
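One way to guard against running a path at the wrong threshold is a small dispatch wrapper. The sketch below is not part of the release: `decode` and `CANONICAL_THRESHOLDS` are hypothetical names, and it assumes that all three methods accept a `confidence_threshold` keyword (the quick start only shows this for `scaffold_speculative_sample`).

```python
# Thresholds copied from the table of inference paths above.
CANONICAL_THRESHOLDS = {
    "mdm_sample_deep_scaffold": 0.9,
    "scaffold_speculative_sample": 0.0,
    "scaffold_spec_with_ss_multi_traj": 0.0,
}


def decode(model, method_name, strict=True, **kwargs):
    """Hypothetical helper: call a decoding path with its canonical threshold.

    Assumes every path takes a `confidence_threshold` keyword argument.
    """
    if method_name not in CANONICAL_THRESHOLDS:
        raise ValueError(f"unknown decoding path: {method_name}")
    expected = CANONICAL_THRESHOLDS[method_name]
    # Fill in the canonical value when the caller did not pass one.
    threshold = kwargs.setdefault("confidence_threshold", expected)
    if strict and threshold != expected:
        raise ValueError(
            f"{method_name} expects confidence_threshold={expected}, got {threshold}"
        )
    # The paths are bound methods on the model, so look them up by name.
    return getattr(model, method_name)(**kwargs)
```

With `strict=True`, calling `decode(model, "scaffold_speculative_sample", confidence_threshold=0.9, ...)` raises instead of silently producing degraded results; pass `strict=False` to experiment with other thresholds on purpose.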

Headline results: WOD-E2E test set (single H100)

| Mode | RFS ↑ | ADE@3s ↓ | ADE@5s ↓ | TPS ↑ | Tok/Step ↑ |
| --- | --- | --- | --- | --- | --- |
| Scaffold Spec | 7.823 | 1.254 | 2.907 | 210.4 | 4.90 |
| + Inference scaling (N=4) | 7.827 | 1.240 | 2.821 | 114.7 | 2.76 |
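As a rough reading aid, the two throughput columns can be combined into an implied step rate. The sketch below assumes that TPS and Tok/Step count the same generated tokens, so that steps per second is simply TPS divided by Tok/Step; the model card does not state this relationship, so treat the result as an estimate.

```python
# Values copied from the WOD-E2E test-set results above.
RESULTS = {
    "Scaffold Spec": {"tps": 210.4, "tok_per_step": 4.90},
    "+ Inference scaling (N=4)": {"tps": 114.7, "tok_per_step": 2.76},
}


def steps_per_second(tps, tok_per_step):
    """Implied decoding steps per second, assuming TPS = steps/s * Tok/Step."""
    return tps / tok_per_step


rates = {mode: steps_per_second(**row) for mode, row in RESULTS.items()}
for mode, rate in rates.items():
    print(f"{mode}: {rate:.1f} steps/s")
# Scaffold Spec: 42.9 steps/s
# + Inference scaling (N=4): 41.6 steps/s
```

Under that assumption the step rate is nearly the same in both modes, which suggests the lower TPS with N=4 mostly tracks the lower Tok/Step rather than slower individual steps.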

On the WOD-E2E val set, Scaffold Spec runs at 1919 ms / sample (4.1× over the AR baseline); fused with SGLang, the same configuration drops to 665 ms / sample at 608.5 TPS, which is the 11.8× / 12× speedup over AR cited in the paper.
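The two speedup figures are consistent with each other, which can be checked with a few lines of arithmetic. The AR baseline latency is not given in this card; the sketch below back-computes it from the rounded 4.1× figure, so the derived values are approximate.

```python
# Figures quoted for the WOD-E2E val set.
scaffold_spec_ms = 1919.0      # Scaffold Spec latency per sample
scaffold_spec_speedup = 4.1    # speedup over the AR baseline (rounded in the source)
sglang_ms = 665.0              # Scaffold Spec fused with SGLang

# Implied AR baseline latency (an estimate, not a reported number).
ar_baseline_ms = scaffold_spec_ms * scaffold_spec_speedup

# Speedup of the SGLang configuration over that implied baseline.
sglang_speedup = ar_baseline_ms / sglang_ms

print(f"implied AR baseline: {ar_baseline_ms:.0f} ms / sample")  # about 7868
print(f"SGLang speedup over AR: {sglang_speedup:.1f}x")          # about 11.8
```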

Files

  • modeling.py: model definition (Fast_dDriveForConditionalGeneration)
  • configuration.py: config classes
  • section_utils.py: scaffold construction + section-aligned block index utilities
  • generation_utils.py: the three inference paths, attached to the model class on import
  • config.json, generation_config.json, preprocessor_config.json, chat_template.jinja, tokenizer files: standard HF artifacts
  • model-0000{1..4}-of-00004.safetensors: model weights (4 shards)
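When working from a local clone, a quick check that the listed files are present can save a failed load. This is a minimal sketch: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the tokenizer files are left out because the list above does not name them.

```python
from pathlib import Path

# File names taken from the list above; the shard pattern
# model-0000{1..4}-of-00004.safetensors is expanded to four names.
EXPECTED_FILES = [
    "modeling.py",
    "configuration.py",
    "section_utils.py",
    "generation_utils.py",
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "chat_template.jinja",
] + [f"model-{i:05d}-of-00004.safetensors" for i in range(1, 5)]


def missing_files(repo_dir):
    """Return the expected file names that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("path/to/local/clone")` returns an empty list when every listed file is in place.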

Citation

@misc{zhang2026fastddriveefficientblockdiffusionvlm,
      title={Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving},
      author={Kewei Zhang and Jin Wang and Sensen Gao and Chengyue Wu and Yulong Cao and Songyang Han and Boris Ivanovic and Langechuan Liu and Marco Pavone and Song Han and Daquan Zhou and Enze Xie},
      year={2026},
      eprint={2605.23163},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.23163},
}