Fast-dDrive
Fast-dDrive is a block-diffusion Vision-Language-Action (VLA) model for end-to-end autonomous driving, built on Qwen2.5-VL-3B. It pairs section-aware structured-diffusion training (SASD) with scaffold-aware speculative decoding (Scaffold Spec) and an optional shared-prefix multi-trajectory inference scaling scheme, and reaches SOTA accuracy on the Waymo Open Dataset End-to-End Driving (WOD-E2E) benchmark at over 200 tokens / second on a single H100.
Quick start
```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL = "Efficient-Large-Model/Fast_dDrive_3B"  # or your local clone

processor = AutoProcessor.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).cuda().eval()

# input_ids, attention_mask, pixel_values, and image_grid_thw are not defined
# in this snippet; they are assumed to be the tensors returned by `processor`
# for your prompt and camera images, moved to the model's device.

# Scaffold Spec (paper canonical, threshold = 0.0)
output_ids = model.scaffold_speculative_sample(
    input_ids=input_ids,
    attention_mask=attention_mask,
    pixel_values=pixel_values,
    image_grid_thw=image_grid_thw,
    confidence_threshold=0.0,
    block_size=32,
    max_new_tokens=512,
)
```
Inference paths
This release exposes three decoding paths as bound methods on the model:
| Method | Description | Threshold |
|---|---|---|
| `mdm_sample_deep_scaffold` | Section Diffusion (SD): iterative MDM denoising over a pre-filled JSON scaffold | 0.9 |
| `scaffold_speculative_sample` | Scaffold Spec (SS): scaffold-aware self-speculative decoding (MDM draft + AR verify per block). Paper canonical. | 0.0 |
| `scaffold_spec_with_ss_multi_traj` | SS multi-rollout: shared-prefix N-rollout inference scaling on the trajectory section | 0.0 |
Important: `scaffold_speculative_sample` and its multi-traj variant must be run with `confidence_threshold=0.0` to reproduce the paper numbers. Running at `0.9` silently degrades both ADE and throughput.
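One way to avoid running a path at the wrong threshold is to keep the documented values in one place and check them before calling the model. The sketch below is a hypothetical helper, not part of the release: `CANONICAL_THRESHOLDS` and `check_threshold` are names introduced here for illustration, and only the method names and threshold values come from the table above.

```python
# Hypothetical helper (not part of the release): documented thresholds from the
# "Inference paths" table, keyed by the bound-method name.
CANONICAL_THRESHOLDS = {
    "mdm_sample_deep_scaffold": 0.9,
    "scaffold_speculative_sample": 0.0,
    "scaffold_spec_with_ss_multi_traj": 0.0,
}


def check_threshold(method_name: str, confidence_threshold: float) -> float:
    """Return the threshold if it matches the documented value, else raise."""
    try:
        expected = CANONICAL_THRESHOLDS[method_name]
    except KeyError:
        raise ValueError(f"unknown inference path: {method_name!r}") from None
    if abs(confidence_threshold - expected) > 1e-9:
        raise ValueError(
            f"{method_name} is documented at confidence_threshold={expected}, "
            f"got {confidence_threshold}"
        )
    return confidence_threshold
```

For example, passing `confidence_threshold=check_threshold("scaffold_speculative_sample", 0.0)` to `model.scaffold_speculative_sample(...)` turns a silent degradation into an explicit error if the value is ever changed to `0.9`.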
Headline results: WOD-E2E test set (single H100)
| Mode | RFS ↑ | ADE@3s ↓ | ADE@5s ↓ | TPS ↑ | Tok/Step ↑ |
|---|---|---|---|---|---|
| Scaffold Spec | 7.823 | 1.254 | 2.907 | 210.4 | 4.90 |
| + Inference scaling (N=4) | 7.827 | 1.240 | 2.821 | 114.7 | 2.76 |
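To read the two rows as a trade-off, it helps to compute the relative change in each column. The sketch below uses only the numbers in the table; the `relative_change` helper and the dictionary layout are illustrative choices, not part of the release.

```python
# Values copied from the results table (WOD-E2E test set, single H100).
scaffold_spec = {"RFS": 7.823, "ADE@3s": 1.254, "ADE@5s": 2.907, "TPS": 210.4}
scaling_n4 = {"RFS": 7.827, "ADE@3s": 1.240, "ADE@5s": 2.821, "TPS": 114.7}


def relative_change(base: float, new: float) -> float:
    """Signed relative change from base to new, in percent."""
    return 100.0 * (new - base) / base


# Relative change of the N=4 row with respect to plain Scaffold Spec.
changes = {k: relative_change(scaffold_spec[k], scaling_n4[k]) for k in scaffold_spec}
for metric, pct in changes.items():
    print(f"{metric}: {pct:+.2f}%")
```

With these numbers, N=4 lowers ADE@5s by about 3% and ADE@3s by about 1%, leaves RFS essentially unchanged, and costs roughly 45% of throughput.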
On the WOD-E2E val set, Scaffold Spec runs at 1919 ms / sample (4.1× over the AR baseline); fused with SGLang, the same configuration drops to 665 ms / sample at 608.5 TPS, which is the 11.8× / 12× speedup over AR cited in the paper.
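The latency and speedup figures can be cross-checked with simple arithmetic. The sketch below assumes that the 4.1× figure and the fused figure are measured against the same AR baseline, and that TPS means tokens generated divided by per-sample latency; neither assumption is stated explicitly here, so the derived values are rough.

```python
# Figures quoted in the text (WOD-E2E val set).
scaffold_spec_ms = 1919.0   # Scaffold Spec latency per sample
speedup_vs_ar = 4.1         # Scaffold Spec speedup over the AR baseline
fused_ms = 665.0            # Scaffold Spec fused with SGLang
fused_tps = 608.5           # tokens per second in the fused setup

# Implied AR baseline latency (approximate: 4.1 is a rounded ratio).
ar_ms = scaffold_spec_ms * speedup_vs_ar

# Speedup of the fused configuration over that implied baseline.
fused_speedup = ar_ms / fused_ms

# Implied output length per sample, assuming TPS = tokens / latency.
tokens_per_sample = fused_tps * fused_ms / 1000.0

print(f"implied AR latency: {ar_ms:.0f} ms")
print(f"fused speedup: {fused_speedup:.1f}x")
print(f"implied tokens per sample: {tokens_per_sample:.0f}")
```

Under these assumptions the implied AR baseline is a little under 7.9 s per sample, and the fused speedup works out to about 11.8×, consistent with the quoted figure.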
Files
- `modeling.py`: model definition (`Fast_dDriveForConditionalGeneration`)
- `configuration.py`: config classes
- `section_utils.py`: scaffold construction + section-aligned block index utilities
- `generation_utils.py`: the three inference paths, attached to the model class on import
- `config.json`, `generation_config.json`, `preprocessor_config.json`, `chat_template.jinja`, tokenizer files: standard HF artifacts
- `model-0000{1..4}-of-00004.safetensors`: model weights (4 shards)
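When working from a local clone, a quick completeness check against this file list can catch a partial download before loading the model. The sketch below is a hypothetical helper, not part of the release: `expected_files` and `missing_files` are names introduced here, and the check covers only the files named explicitly above (the tokenizer files are not enumerated, so they are left out).

```python
from pathlib import Path

# Files named explicitly in the list above; tokenizer files are omitted
# because they are not enumerated there.
NAMED_FILES = [
    "modeling.py",
    "configuration.py",
    "section_utils.py",
    "generation_utils.py",
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "chat_template.jinja",
]


def expected_files(num_shards: int = 4) -> list[str]:
    """Expand model-0000{1..4}-of-00004.safetensors and add the named files."""
    shards = [
        f"model-{i:05d}-of-{num_shards:05d}.safetensors"
        for i in range(1, num_shards + 1)
    ]
    return NAMED_FILES + shards


def missing_files(clone_dir: str) -> list[str]:
    """Return the expected files that are absent from clone_dir."""
    root = Path(clone_dir)
    return [name for name in expected_files() if not (root / name).is_file()]
```

Calling `missing_files("path/to/your/clone")` returns an empty list when every named file and all four weight shards are present.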
Citation
```bibtex
@misc{zhang2026fastddriveefficientblockdiffusionvlm,
  title={Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving},
  author={Kewei Zhang and Jin Wang and Sensen Gao and Chengyue Wu and Yulong Cao and Songyang Han and Boris Ivanovic and Langechuan Liu and Marco Pavone and Song Han and Daquan Zhou and Enze Xie},
  year={2026},
  eprint={2605.23163},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.23163},
}
```