Add model card

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +38 -0
README.md ADDED
---
pipeline_tag: image-text-to-text
---

# BARD-VL

BARD (Bridging AutoRegressive and Diffusion) is a framework for converting pretrained autoregressive Vision-Language Models (VLMs) into decoding-efficient diffusion VLMs (dVLMs). By combining progressive supervised block merging with stage-wise distillation, BARD-VL achieves significant decoding-throughput speedups while maintaining strong multimodal capabilities.

[**Project Page**](https://fudan-generative-vision.github.io/Bard-VL) | [**Paper**](https://huggingface.co/papers/2604.16514) | [**GitHub**](https://github.com/fudan-generative-vision/Bard-VL)

## Model Description

BARD-VL addresses the inference bottleneck of token-by-token autoregressive decoding by enabling a parallel decoding paradigm: it transfers multimodal capabilities from an existing large-scale VLM (such as Qwen3-VL) to a large-block diffusion VLM. Experimental results show that BARD-VL establishes a new state of the art among open dVLMs of comparable scale while achieving up to 3× higher decoding throughput than the source model.

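To build intuition for where such a speedup could come from, here is a purely illustrative sketch of the forward-pass arithmetic — not the actual BARD-VL implementation, and the numbers below are assumptions, not measurements. An autoregressive model spends one forward pass per token, while a block-diffusion model spends a handful of denoising passes per block, independent of the block's width:

```python
import math

def ar_forward_passes(num_tokens: int) -> int:
    # Autoregressive decoding: one forward pass per generated token.
    return num_tokens

def block_diffusion_forward_passes(num_tokens: int, block_size: int,
                                   steps_per_block: int) -> int:
    # Block-diffusion decoding: each block of tokens is denoised jointly,
    # so the pass count scales with denoising steps per block, not with
    # how many tokens the block contains.
    return math.ceil(num_tokens / block_size) * steps_per_block

# Hypothetical workload: 256 tokens, blocks of 4, and an average of 2
# denoising passes per block (confidence-based early exit can finish a
# block before the step cap is reached).
print(ar_forward_passes(256))                     # 256
print(block_diffusion_forward_passes(256, 4, 2))  # 128
```

The 2× ratio here is only arithmetic; the reported speedup (up to 3×) depends on the model, workload, and decoding settings.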
## Inference

To use BARD-VL, follow the installation instructions in the [official GitHub repository](https://github.com/fudan-generative-vision/Bard-VL). You can then run inference for image or video understanding with the provided `inference.py` script:

```bash
python3 inference.py \
    --model_id fudan-generative-ai/Bard-VL-B4-Mask-4B-Instruct \
    --block_size 4 \
    --denoising_steps 4 \
    --confidence_threshold 0.6
```
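
A plausible reading of how `--block_size`, `--denoising_steps`, and `--confidence_threshold` interact is sketched below. This is a toy simulation — random numbers stand in for model confidences, and the function names are hypothetical, not BARD-VL's actual decoder: at each denoising step, every still-masked position in the block whose confidence clears the threshold is committed, and any stragglers are committed when the step cap is reached.

```python
import random

def decode_block(block_size: int, denoising_steps: int, threshold: float,
                 confidence_fn) -> int:
    # Toy confidence-thresholded denoising of one block. Returns the number
    # of steps actually used. `confidence_fn(position, step)` stands in for
    # the model's per-token confidence at that step.
    committed = [False] * block_size
    for step in range(denoising_steps):
        for i in range(block_size):
            if not committed[i] and confidence_fn(i, step) >= threshold:
                committed[i] = True
        if all(committed):
            return step + 1   # early exit: whole block finished
    return denoising_steps    # step cap reached; stragglers committed anyway

rng = random.Random(0)  # seeded stand-in for model confidences
steps_used = decode_block(block_size=4, denoising_steps=4, threshold=0.6,
                          confidence_fn=lambda i, s: rng.random())
```

A lower threshold commits more tokens per step, trading accuracy for throughput — hypothetically, this is the knob that `--confidence_threshold 0.6` controls in the script above.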

## Citation

If you find BARD-VL useful in your research, please cite the following paper:

```bibtex
@article{chen2026bard,
  title={BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation},
  author={Chen, Baoyou and Xia, Hanchen and Tu, Peng ...},
  journal={arXiv preprint arXiv:2604.16514},
  year={2026}
}
```

## Acknowledgements

This repository builds on top of [NVIDIA NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel).