Add model card and metadata

#3 · opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +42 -0
README.md ADDED
@@ -0,0 +1,42 @@
---
pipeline_tag: image-text-to-text
---

# BARD-VL

BARD (Bridging AutoRegressive and Diffusion) is a framework that converts a pretrained autoregressive Vision-Language Model (VLM) into a decoding-efficient diffusion VLM (dVLM) through progressive block merging and stage-wise distillation.

- **Paper:** [BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation](https://huggingface.co/papers/2604.16514)
- **Code:** [GitHub Repository](https://github.com/fudan-generative-vision/Bard-VL)
- **Project Page:** [https://fudan-generative-vision.github.io/Bard-VL](https://fudan-generative-vision.github.io/Bard-VL)

## Description

Autoregressive VLMs deliver strong multimodal capability, but their token-by-token decoding imposes a fundamental inference bottleneck. BARD-VL addresses this by converting a pretrained autoregressive VLM into a decoding-efficient dVLM with the same architecture. It establishes a new state of the art among open dVLMs of comparable scale while achieving up to a 3× decoding-throughput speedup over the source model.
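As a back-of-envelope illustration of where a speedup like this can come from (this is a sketch under assumptions, not a measurement from the paper): an autoregressive decoder needs one forward pass per generated token, while a block-diffusion decoder needs at most a configured number of denoising passes per multi-token block, and confidence-based early exit can reduce the effective steps per block well below that cap. The `~1.3` average-steps figure below is purely illustrative.

```python
import math

def ar_forward_passes(num_tokens: int) -> int:
    """An autoregressive decoder runs one forward pass per generated token."""
    return num_tokens

def block_diffusion_forward_passes(num_tokens: int, block_size: int,
                                   steps_per_block: float) -> int:
    """A block-diffusion decoder runs roughly `steps_per_block` denoising
    passes per `block_size`-token block; early exit makes the effective
    `steps_per_block` smaller than the configured cap."""
    num_blocks = math.ceil(num_tokens / block_size)
    return math.ceil(num_blocks * steps_per_block)

# Worst case (every 4-token block uses all 4 configured steps): no win.
# If early exit averages ~1.3 steps per block, passes drop roughly 3x:
#   ar_forward_passes(256)                          -> 256
#   block_diffusion_forward_passes(256, 4, 4.0)     -> 256
#   block_diffusion_forward_passes(256, 4, 1.3)     -> 84
```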

## Inference

The official repository provides an `inference.py` script for image and video understanding. To run inference, clone the repository and use the following command as a template:

```bash
python3 inference.py \
    --model_id fudan-generative-ai/Bard-VL-B4-Mask-8B-Instruct \
    --block_size 4 \
    --denoising_steps 4 \
    --confidence_threshold 0.6
```
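The three flags map onto the usual knobs of confidence-thresholded masked block decoding: tokens in a `--block_size` block start masked, each of up to `--denoising_steps` passes predicts all still-masked positions, and predictions whose confidence clears `--confidence_threshold` are committed. The toy sketch below illustrates that loop only; it is not the repository's `inference.py` logic, and `MASK`, the `predict(tokens, i)` callback, and the commit-at-least-one fallback are assumptions made for illustration.

```python
MASK = -1  # hypothetical mask-token id, for illustration only

def decode_block(predict, block_size=4, denoising_steps=4,
                 confidence_threshold=0.6):
    """Toy confidence-thresholded block denoising loop.

    `predict(tokens, i)` returns a (token_id, confidence) guess for the
    masked position `i` given the current partially-unmasked block.
    """
    tokens = [MASK] * block_size
    for _ in range(denoising_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break  # block finished early -> fewer forward passes
        guesses = {i: predict(tokens, i) for i in masked}
        # commit every position whose confidence clears the threshold
        accepted = [i for i, (_, conf) in guesses.items()
                    if conf >= confidence_threshold]
        if not accepted:
            # guarantee progress: commit the single most confident guess
            accepted = [max(guesses, key=lambda i: guesses[i][1])]
        for i in accepted:
            tokens[i] = guesses[i][0]
    # positions still masked after the step budget take their best guess
    for i, t in enumerate(tokens):
        if t == MASK:
            tokens[i] = predict(tokens, i)[0]
    return tokens
```

With a confident predictor the whole block commits in one pass; with a predictor that never clears the threshold, the fallback degrades gracefully to one committed token per step, which is the autoregressive-like worst case.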

## Citation

```bibtex
@article{chen2026bard,
  title={BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation},
  author={Chen, Baoyou and Xia, Hanchen and Tu, Peng and Shi, Haojun and Mu, Shan and Yuan, Weihao and Zhu, Siyu},
  journal={arXiv preprint arXiv:2604.16514},
  year={2026}
}
```

## Acknowledgements

This repository builds on top of [NVIDIA NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel).