---
base_model: colqwen2.5-base
library_name: peft
---

# RegionRet

RegionRet is a LoRA adapter for region-level vision-language retrieval, fine-tuned from ColQwen2.5-Base with Parameter-Efficient Fine-Tuning (PEFT).

## Model Details

- **Model Type:** LoRA Adapter (PEFT)
- **Base Model:** ColQwen2.5-Base
- **Task Type:** Feature Extraction
- **Framework:** PEFT 0.14.0

### LoRA Configuration

- **Rank (r):** 32
- **LoRA Alpha:** 32
- **LoRA Dropout:** 0.1
- **Target Modules:** MLP projections (down_proj, gate_proj, up_proj) and attention projections (k_proj, q_proj, v_proj, o_proj), plus custom_text_proj
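
As a rough illustration, the settings above can be written out as a plain dictionary. The key names follow PEFT's `LoraConfig` fields, and listing the target modules as plain strings is an assumption (the released `adapter_config.json` may express them differently, for example as a single pattern), so treat that file as the authoritative source.

```python
# Illustrative only: values are copied from this card, key names follow
# PEFT's LoraConfig. The module list format is an assumption.
lora_settings = {
    "r": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": [
        "down_proj", "gate_proj", "up_proj",      # MLP projections
        "k_proj", "q_proj", "v_proj", "o_proj",   # attention projections
        "custom_text_proj",                       # listed separately in the card
    ],
    "task_type": "FEATURE_EXTRACTION",
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_settings["lora_alpha"] / lora_settings["r"]
print(f"LoRA scaling factor: {scaling}")
```

With rank and alpha both set to 32, the standard LoRA scaling factor is 1, so the low-rank update is added to the frozen weights without extra rescaling.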

### Model Architecture

- **Processor:** ColQwen2_5_Processor
- **Max Visual Tokens:** 1536
- **Attention:** Flash Attention 2
- **Precision:** bfloat16
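
To give a feel for what a 1536-token visual budget means in pixels, here is a small sketch. It assumes Qwen2.5-VL-style patching, where one visual token covers a 28x28 pixel area (14-pixel patches merged 2x2); that figure is not stated in this card, and the helper names are hypothetical.

```python
# Assumption (not from this card): one visual token covers 28x28 pixels.
PIXELS_PER_TOKEN_SIDE = 28
MAX_VISUAL_TOKENS = 1536  # from the card


def max_pixels(max_visual_tokens: int, side: int = PIXELS_PER_TOKEN_SIDE) -> int:
    """Pixel budget implied by a visual-token budget."""
    return max_visual_tokens * side * side


def approx_visual_tokens(width: int, height: int, side: int = PIXELS_PER_TOKEN_SIDE) -> int:
    """Rough token count for an image whose sides are multiples of `side`."""
    return (width // side) * (height // side)


budget = max_pixels(MAX_VISUAL_TOKENS)
tokens = approx_visual_tokens(1344, 896)
print(budget, tokens, tokens <= MAX_VISUAL_TOKENS)
```

Under this assumption, a 1344x896 image would use the full budget, and larger inputs would need to be downscaled by the processor.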

## Uses

For usage instructions and code, see the RegionRAG repository: [https://github.com/Aeryn666/RegionRAG](https://github.com/Aeryn666/RegionRAG).


## Training Details

### Training Data

- VisRAG-Ret-Train-In-domain-data
- Visual-CoT (DocVQA, TextCap, TextVQA, InfographicsVQA)

### Training Configuration

- **Loss Function:** RegionContraLoss (global_tau=0.02, local_tau=0.25, local_coef=0.01)
- **Epochs:** 5
- **Batch Size:** 80 per device
- **Learning Rate:** 2e-4
- **Precision:** bfloat16
- **Gradient Checkpointing:** Enabled
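
`RegionContraLoss` itself is defined in the RegionRAG repository and is not reproduced here. The sketch below only illustrates how two temperatures and a mixing coefficient could interact, assuming both terms are in-batch InfoNCE losses that are summed with the local term weighted by `local_coef`. The functions `info_nce` and `combined_loss` are hypothetical and are not the authors' implementation.

```python
import math


def info_nce(sim, tau):
    """Mean cross-entropy over the rows of a square similarity matrix.

    sim[i][i] is treated as the positive pair for row i; tau is the temperature.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / tau for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)


def combined_loss(global_sim, local_sim,
                  global_tau=0.02, local_tau=0.25, local_coef=0.01):
    # Hypothetical combination: global term plus a down-weighted local term.
    return info_nce(global_sim, global_tau) + local_coef * info_nce(local_sim, local_tau)


# Toy similarities: query i matches document i.
sim = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(sim, 0.02), info_nce(sim, 0.25), combined_loss(sim, sim))
```

In this toy setting, the lower temperature (0.02) sharpens the softmax so that well-separated pairs contribute almost no loss, while the higher temperature (0.25) keeps a softer penalty on the same similarities.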

## Limitations

- Requires the ColQwen2.5-Base model; the adapter cannot be used on its own
- Optimized for region-level vision-language retrieval tasks
- A GPU with bfloat16 and Flash Attention 2 support is recommended

## Citation

If you use this model, please cite:

```bibtex
@misc{li2025regionragregionlevelretrievalaugmentedgeneration,
      title={RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding}, 
      author={Yinglu Li and Zhiying Lu and Zhihang Liu and Yiwei Sun and Chuanbin Liu and Hongtao Xie},
      year={2025},
      eprint={2510.27261},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.27261}, 
}
```

## License

Please refer to the license of the base model, ColQwen2.5-Base.