---
tags:
  - diffusion
  - vision-language
  - document-recognition
  - qwen2.5-vl
  - block-diffusion
pipeline_tag: image-text-to-text
library_name: transformers
---

<div align="center">

<h1>PA-BDM: Prefix-Adaptive Block Diffusion for Efficient Document Recognition</h1>

**_Efficient Document Recognition with Prefix-Adaptive Block Diffusion_**

Mingxu Chai, Ziyu Shen, Chenyu Liu, Kaidi Zhang, Jiazheng Zhang, Dingwei Zhu, Zhiheng Xi, Ruoyu Chen, Jun Long, Jihua Kang, Tao Gui, Qi Zhang

[![arXiv](https://img.shields.io/badge/arXiv-PA--BDM-b31b1b.svg)](https://arxiv.org/pdf/2605.16861)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?logo=github)](https://github.com/SII-sc22mc/PA-BDM)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/MingxuChai/PA-BDM)

</div>

## 📰 News

- **[2026.05]** 🎉 We release **PA-BDM**, a prefix-adaptive block diffusion framework for efficient document recognition.

## 📄 Introduction

Document recognition converts document images containing text, formulas, tables, and complex layouts into structured, machine-readable formats. Autoregressive vision-language models achieve strong recognition quality, but they decode one token at a time, which is inefficient for long structured outputs. Block diffusion models offer a promising alternative by enabling semi-parallel generation and KV-cache reuse. Existing block diffusion approaches, however, often rely on a fixed block granularity, which limits decoding flexibility and may introduce instability in structure-sensitive recognition tasks.

**PA-BDM** addresses these limitations with a prefix-adaptive block diffusion framework. Instead of treating the block size as a fixed generation unit, PA-BDM uses it as a maximum candidate generation range and dynamically commits reliable prefixes during decoding. This design enables adaptive generation lengths, timely KV-cache reuse, and more stable recognition of structured document outputs.
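
The commit step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the denoiser yields one confidence score per position in the candidate block and that a prefix counts as reliable when every score in it clears a fixed threshold. The names `commit_prefix`, `threshold`, and `min_commit` are illustrative and do not come from the PA-BDM code; the criterion used in the paper may differ.

```python
from typing import Sequence


def commit_prefix(confidences: Sequence[float],
                  threshold: float = 0.9,
                  min_commit: int = 1) -> int:
    """Return how many leading tokens of a candidate block to commit.

    Assumption (not taken from the paper): a prefix is reliable when every
    confidence in it is >= threshold. min_commit forces progress even when
    the first token is uncertain.
    """
    if not confidences:
        return 0
    n = 0
    for c in confidences:
        if c < threshold:
            break  # the reliable prefix ends at the first low-confidence token
        n += 1
    # Commit at least min_commit tokens, but never more than the block holds.
    return min(len(confidences), max(n, min_commit))


# The block size (8 here) is only an upper bound on what gets committed.
block = [0.99, 0.97, 0.95, 0.40, 0.98, 0.91, 0.30, 0.88]
print(commit_prefix(block))  # 3
```

In this sketch the block size caps the candidate range, and the number of committed tokens varies from step to step with the confidence profile.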

## ✨ Highlights

- **Prefix-Adaptive Decoding:** Dynamically commits reliable prefixes within each candidate block, allowing the effective decoding length to adapt to local prediction confidence.

- **Efficient KV-cache Reuse:** Enables timely cache updates without waiting for an entire fixed block to be fully resolved.

- **Structure-sensitive Document Recognition:** Designed for document recognition tasks involving text, formulas, tables, and structured outputs.

- **Improved Efficiency-Accuracy Trade-off:** Achieves faster inference while maintaining strong recognition performance across document recognition benchmarks.
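
A toy decoding loop makes the first two highlights concrete. This is a sketch under stated assumptions, not the PA-BDM implementation: `propose` is a hypothetical stand-in for the denoiser that returns candidate tokens with per-position confidences, the plain list `cache` stands in for a KV cache, and the fixed-threshold rule is an illustrative choice.

```python
from typing import Callable, List, Tuple

Proposal = Tuple[List[str], List[float]]


def decode(propose: Callable[[int, int], Proposal],
           total_len: int,
           block_size: int = 4,
           threshold: float = 0.9) -> Tuple[List[str], List[int]]:
    """Decode total_len tokens, committing a variable-length prefix per step."""
    cache: List[str] = []    # stand-in for the KV cache of committed tokens
    commits: List[int] = []  # tokens committed at each step
    while len(cache) < total_len:
        width = min(block_size, total_len - len(cache))
        tokens, conf = propose(len(cache), width)
        n = 0
        while n < width and conf[n] >= threshold:
            n += 1
        # Force one token so the loop always advances. A real decoder would
        # likely refine an uncertain token further before committing it.
        n = max(n, 1)
        cache.extend(tokens[:n])  # the cache grows right away, not per full block
        commits.append(n)
    return cache, commits


# Hypothetical target string and confidences, used only to drive the toy loop.
TARGET = list("E=mc^2;x+y")
CONF = [0.99, 0.95, 0.60, 0.97, 0.98, 0.99, 0.93, 0.50, 0.96, 0.92]


def toy_propose(start: int, width: int) -> Proposal:
    return TARGET[start:start + width], CONF[start:start + width]


out, commits = decode(toy_propose, len(TARGET))
print("".join(out))  # E=mc^2;x+y
print(commits)       # [2, 1, 4, 1, 2]
```

The commit sizes differ from step to step, and the cache is extended after every step, so later proposals can condition on committed tokens without waiting for a full block to resolve.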

## 🚀 Usage

Installation and inference instructions are in the GitHub repository. The model weights and the paper are linked alongside it:

- GitHub: https://github.com/SII-sc22mc/PA-BDM
- Model: https://huggingface.co/MingxuChai/PA-BDM
- Paper: https://arxiv.org/pdf/2605.16861

## ❤️ Acknowledgements

This project builds upon prior work and open-source resources including Qwen2.5-VL, DiffusionVL, BD3LMs, and related diffusion language modeling frameworks. We thank the authors for their valuable contributions to the community.

## 📝 Citation

If you find our work useful, please cite our paper:

```bibtex
@misc{chai2026prefixadaptiveblockdiffusionefficient,
  title={Prefix-Adaptive Block Diffusion for Efficient Document Recognition},
  author={Mingxu Chai and Ziyu Shen and Chenyu Liu and Kaidi Zhang and Jiazheng Zhang and Dingwei Zhu and Zhiheng Xi and Ruoyu Chen and Jun Long and Jihua Kang and Tao Gui and Qi Zhang},
  year={2026},
  eprint={2605.16861},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.16861}
}
```