---
tags:
  - diffusion
  - vision-language
  - document-recognition
  - qwen2.5-vl
  - block-diffusion
pipeline_tag: image-text-to-text
library_name: transformers
---

<div align="center">

<h1>PA-BDM: Prefix-Adaptive Block Diffusion for Efficient Document Recognition</h1>

**_Efficient Document Recognition with Prefix-Adaptive Block Diffusion_**

Mingxu Chai, Ziyu Shen, Chenyu Liu, Kaidi Zhang, Jiazheng Zhang, Dingwei Zhu, Zhiheng Xi, Ruoyu Chen, Jun Long, Jihua Kang, Tao Gui, Qi Zhang

[![arXiv](https://img.shields.io/badge/arXiv-PA--BDM-b31b1b.svg)](https://arxiv.org/pdf/2605.16861)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?logo=github)](https://github.com/SII-sc22mc/PA-BDM)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/MingxuChai/PA-BDM)

</div>

## 📰 News

- **[2026.05]** 🎉 We release **PA-BDM**, a prefix-adaptive block diffusion framework for efficient document recognition.

## 📄 Introduction

Document recognition aims to convert document images containing text, formulas, tables, and complex layouts into structured machine-readable formats. While autoregressive vision-language models have achieved strong recognition quality, their sequential decoding process can be inefficient for long structured outputs. Block diffusion models provide a promising alternative by enabling semi-parallel generation and KV-cache reuse, but existing block diffusion approaches often rely on a fixed block granularity, which limits decoding flexibility and may introduce instability for structure-sensitive recognition tasks.

**PA-BDM** addresses these limitations with a prefix-adaptive block diffusion framework. Instead of treating the block size as a fixed generation unit, PA-BDM uses it as a maximum candidate generation range and dynamically commits reliable prefixes during decoding. This design enables adaptive generation lengths, timely KV-cache reuse, and more stable recognition of structured document outputs.
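
The prefix-commit idea above can be illustrated with a small toy sketch. This is not the released implementation: the confidence threshold, the `commit_prefix` and `decode` names, and the fixed confidence profile are all hypothetical, and the reliability criterion used by PA-BDM may differ. In a real model, the uncommitted positions would be re-denoised with the longer committed context rather than read from a fixed list.

```python
from typing import Callable, List


def commit_prefix(confidences: List[float],
                  threshold: float = 0.9,
                  min_commit: int = 1) -> int:
    """Return how many leading tokens of a candidate block to commit.

    Assumption (not from the paper): a token is "reliable" when its
    confidence is at least `threshold`. The prefix stops at the first
    unreliable token; at least `min_commit` tokens are committed so
    that decoding always makes progress.
    """
    if not confidences:
        return 0
    reliable = 0
    for conf in confidences:
        if conf < threshold:
            break
        reliable += 1
    return max(min_commit, reliable)


def decode(total_len: int,
           max_block: int,
           score_fn: Callable[[int, int], List[float]],
           threshold: float = 0.9) -> List[int]:
    """Toy decoding loop: the block size is only an upper bound.

    `score_fn(start, length)` is a hypothetical stand-in for one
    denoising pass that returns per-token confidences for the
    candidate block starting at `start`.
    """
    committed = 0
    commits_per_step = []
    while committed < total_len:
        block_len = min(max_block, total_len - committed)
        confidences = score_fn(committed, block_len)
        k = commit_prefix(confidences, threshold)
        committed += k  # committed tokens could enter the KV cache here
        commits_per_step.append(k)
    return commits_per_step


# Fixed, made-up confidence profile used only for illustration.
PROFILE = [0.99, 0.95, 0.5, 0.97, 0.98, 0.96, 0.4, 0.99]


def toy_scores(start: int, length: int) -> List[float]:
    return PROFILE[start:start + length]


if __name__ == "__main__":
    print(decode(total_len=8, max_block=4, score_fn=toy_scores))
```

With this profile and a maximum block size of 4, the number of tokens committed per step varies with local confidence instead of always equaling the block size, which is the behavior the paragraph above describes.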

## ✨ Highlights

- **Prefix-Adaptive Decoding:** Dynamically commits reliable prefixes within each candidate block, allowing the effective decoding length to adapt to local prediction confidence.

- **Efficient KV-cache Reuse:** Enables timely cache updates without waiting for an entire fixed block to be fully resolved.

- **Structure-sensitive Document Recognition:** Designed for document recognition tasks involving text, formulas, tables, and structured outputs.

- **Improved Efficiency-Accuracy Trade-off:** Achieves faster inference while maintaining strong recognition performance across document recognition benchmarks.

## 🚀 Usage

Please refer to the repository for installation and inference instructions:

- GitHub: https://github.com/SII-sc22mc/PA-BDM
- Model: https://huggingface.co/MingxuChai/PA-BDM
- Paper: https://arxiv.org/pdf/2605.16861

## ❤️ Acknowledgements

This project builds upon prior work and open-source resources including Qwen2.5-VL, DiffusionVL, BD3LMs, and related diffusion language modeling frameworks. We thank the authors for their valuable contributions to the community.

## 📝 Citation

If you find our work useful, please cite our paper:

```bibtex
@misc{chai2026prefixadaptiveblockdiffusionefficient,
  title={Prefix-Adaptive Block Diffusion for Efficient Document Recognition},
  author={Mingxu Chai and Ziyu Shen and Chenyu Liu and Kaidi Zhang and Jiazheng Zhang and Dingwei Zhu and Zhiheng Xi and Ruoyu Chen and Jun Long and Jihua Kang and Tao Gui and Qi Zhang},
  year={2026},
  eprint={2605.16861},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.16861}
}
```