---
base_model: colqwen2.5-base
library_name: peft
---

# RegionRet

RegionRet is a LoRA adapter for region-level vision-language retrieval, fine-tuned from ColQwen2.5-Base using Parameter-Efficient Fine-Tuning (PEFT).

## Model Details

- **Model Type:** LoRA Adapter (PEFT)
- **Base Model:** ColQwen2.5-Base
- **Task Type:** Feature Extraction
- **Framework:** PEFT 0.14.0

### LoRA Configuration

- **Rank (r):** 32
- **LoRA Alpha:** 32
- **LoRA Dropout:** 0.1
- **Target Modules:** MLP projections (down_proj, gate_proj, up_proj) and attention projections (k_proj, q_proj, v_proj, o_proj), plus custom_text_proj

### Model Architecture

- **Processor:** ColQwen2_5_Processor
- **Max Visual Tokens:** 1536
- **Attention:** Flash Attention 2
- **Precision:** bfloat16

## Uses

For usage instructions, please refer to [https://github.com/Aeryn666/RegionRAG](https://github.com/Aeryn666/RegionRAG).

## Training Details

### Training Data

- VisRAG-Ret-Train-In-domain-data
- Visual-CoT (DocVQA, TextCap, TextVQA, InfographicsVQA)

### Training Configuration

- **Loss Function:** RegionContraLoss (global_tau=0.02, local_tau=0.25, local_coef=0.01)
- **Epochs:** 5
- **Batch Size:** 80 per device
- **Learning Rate:** 2e-4
- **Precision:** bfloat16
- **Gradient Checkpointing:** Enabled

## Limitations

- Requires the ColQwen2.5-Base base model to function
- Optimized for region-level vision-language retrieval tasks
- A GPU with bfloat16 and Flash Attention 2 support is recommended

## Citation

If you use this model, please cite:

```bibtex
@misc{li2025regionragregionlevelretrievalaugmentedgeneration,
  title={RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding},
  author={Yinglu Li and Zhiying Lu and Zhihang Liu and Yiwei Sun and Chuanbin Liu and Hongtao Xie},
  year={2025},
  eprint={2510.27261},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.27261},
}
```

## License

Please refer to the license of the base model ColQwen2.5.
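
## Example: Reading the LoRA Configuration

As a rough aid to interpreting the LoRA Configuration values listed earlier in this card (rank 32, alpha 32), the sketch below computes the scaling factor applied to the low-rank update and the number of parameters one adapted linear layer adds. It assumes the standard LoRA scaling of alpha / r, not the rank-stabilized variant. The helper names and the 2048 x 2048 layer shape are hypothetical, chosen only for illustration, and are not taken from the base model.

```python
# Values taken from the LoRA Configuration section of this card.
LORA_RANK = 32
LORA_ALPHA = 32


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update B @ A (standard LoRA: alpha / r)."""
    return alpha / rank


def lora_added_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)

# Hypothetical layer shape for illustration only; not read from the base model.
EXAMPLE_IN, EXAMPLE_OUT = 2048, 2048
added = lora_added_params(EXAMPLE_IN, EXAMPLE_OUT, LORA_RANK)
full = EXAMPLE_IN * EXAMPLE_OUT

print(f"scaling factor: {scaling}")
print(f"added parameters: {added} (full weight has {full})")
```

With alpha equal to the rank, the scaling factor is 1, so the low-rank update is added to the frozen weight without rescaling. Actual parameter counts depend on the real shapes of the targeted projections in the base model.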