ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment
Hao Yang1, Yifan Ji1, Zhipeng Xu1, Zhenghao Liu1, Yukun Yan2, Zulong Chen3, Shuo Wang2, Yu Gu1, Ge Yu1
1Northeastern University, 2Tsinghua University, 3Alibaba Group
Overview
Reasoning-Guided Alignment (ReAlign) is a method that enhances visual document retrieval by leveraging the reasoning capability of Vision-Language Models (VLMs) to provide fine-grained visual document descriptions as supervision signals for training. By identifying query-related regions on a page and generating query-aware descriptions, ReAlign helps the retriever focus on critical visual cues within complex layouts.
This repository contains the visual document retriever based on Qwen2.5-VL-7B-Instruct.
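At inference time, dense visual document retrievers like this one typically encode the query and each page image into embeddings and rank pages by similarity. The minimal sketch below illustrates that scoring step with cosine similarity; the function name and toy vectors are illustrative only, not part of the released API.

```python
import numpy as np

def rank_pages(query_emb: np.ndarray, page_embs: np.ndarray) -> np.ndarray:
    """Rank document pages by cosine similarity to the query embedding.

    query_emb: (d,) query embedding
    page_embs: (n, d) matrix of page embeddings
    Returns page indices sorted from most to least similar.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = page_embs / np.linalg.norm(page_embs, axis=1, keepdims=True)
    scores = p @ q  # cosine similarity for each page
    return np.argsort(-scores)

# Toy example with three hypothetical page embeddings
query = np.array([1.0, 0.0])
pages = np.array([[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]])
print(rank_pages(query, pages).tolist())  # → [1, 2, 0]
```

In practice the embeddings would come from the ReAlign retriever itself; see the GitHub repository linked below for the actual encoding pipeline.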
The paper is available at ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment.
Our work has been accepted at SIGIR 2026 🎉🎉🎉!
Collections
We have made the following resources available in the 🤗 ReAlign collection.
| Resource | Description | Link |
|---|---|---|
| ReAlign-Phi3v | The visual document retriever based on Phi-3-vision-128k-instruct | 🤗 ReAlign-Phi3v |
| ReAlign-Qwen | The visual document retriever based on Qwen2.5-VL-7B-Instruct | 🤗 ReAlign-Qwen |
| Training Data | The data used to train the ReAlign retriever | 🤗 ReAlign-Trainset |
Setup
For detailed training instructions and data preparation, please refer to the official GitHub repository: ReAlign.
Citation
@article{yang2026realign,
title={ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment},
author={Yang, Hao and Ji, Yifan and Xu, Zhipeng and Liu, Zhenghao and Yan, Yukun and Chen, Zulong and Wang, Shuo and Gu, Yu and Yu, Ge},
year={2026},
url={https://arxiv.org/abs/2604.07419},
}
Contact
If you have questions, suggestions, or bug reports, please email: yanghao123@mails.neu.edu.cn