ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment


Hao Yang1, Yifan Ji1, Zhipeng Xu1, Zhenghao Liu1, Yukun Yan2, Zulong Chen3, Shuo Wang2, Yu Gu1, Ge Yu1

1Northeastern University, 2Tsinghua University, 3Alibaba Group

Overview

Reasoning-Guided Alignment (ReAlign) is a method that enhances visual document retrieval by leveraging the reasoning capability of Vision-Language Models (VLMs) to provide fine-grained visual document descriptions as supervision signals for training. By identifying query-related regions on a page and generating query-aware descriptions, ReAlign helps the retriever focus on critical visual cues within complex layouts.

This repository contains the visual document retriever based on Qwen2.5-VL-7B-Instruct.
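As a rough illustration of what a dense retriever does at query time, the sketch below ranks candidate pages by cosine similarity between a query embedding and page embeddings. It assumes a single-vector-per-page setup, which this card does not confirm. The vectors are toy placeholders rather than outputs of ReAlign-Qwen, and the helper names `cosine` and `rank_pages` are hypothetical. For the actual embedding extraction code, refer to the official GitHub repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pages(query_vec, page_vecs, top_k=3):
    """Return the top_k (page_id, score) pairs, highest similarity first."""
    scored = [(page_id, cosine(query_vec, vec)) for page_id, vec in page_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): a real retriever would produce these
# from the query text and the page images.
query = [1.0, 0.0, 1.0]
pages = {
    "page_1": [0.9, 0.1, 0.8],
    "page_2": [0.0, 1.0, 0.0],
    "page_3": [0.5, 0.5, 0.5],
}

ranking = rank_pages(query, pages, top_k=2)
print(ranking)
```

In this toy setup, `page_1` points in nearly the same direction as the query and ranks first, while `page_2` is orthogonal to it and falls outside the top two.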

The paper, "ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment", is available at https://arxiv.org/abs/2604.07419.

Our work has been accepted to SIGIR 2026 🎉🎉🎉!

[Figure: overview of the ReAlign method]

Collections

We have made the following resources available in the 🤗 ReAlign collection.

| Resource | Description | Link |
| --- | --- | --- |
| ReAlign-Phi3v | The visual document retriever based on Phi-3-vision-128k-instruct | 🤗 ReAlign-Phi3v |
| ReAlign-Qwen | The visual document retriever based on Qwen2.5-VL-7B-Instruct | 🤗 ReAlign-Qwen |
| Training Data | The data used to train the ReAlign retriever | 🤗 ReAlign-Trainset |

Setup

For detailed training instructions and data preparation, please refer to the official GitHub repository: ReAlign.

Citation

@article{yang2026realign,
      title={ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment},
      author={Yang, Hao and Ji, Yifan and Xu, Zhipeng and Liu, Zhenghao and Yan, Yukun and Chen, Zulong and Wang, Shuo and Gu, Yu and Yu, Ge},
      year={2026},
      url={https://arxiv.org/abs/2604.07419}, 
}

Contact

If you have questions, suggestions, or bug reports, please email yanghao123@mails.neu.edu.cn.
