Qwen2.5-VL-7B-Instruct_DPE_v3

Qwen2.5-VL-7B-Instruct_DPE_v3 is the final iteration of the DPE evolution for the 7B model, completing three full iterative cycles.

🌟 Model Overview

DPE (Diagnostic-driven Progressive Evolution) breaks the multimodal long-tail bottleneck by steering targeted data generation. v3 achieves the highest average score among all iterations for this base model.

v3 Key Features:

Optimized Performance: Significant improvements in OCR and logical reasoning.
Superior OCR: Achieves a major leap in CharXiv (+4.11%).

📊 Evaluation Results

Category	Benchmark	Base Model	DPE_v3 (Ours)	Improvement
STEM	MMMU	53.11	56.44	+3.33
	RealWorldQA	68.63	70.46	+1.83
Visual Math	MathVerse	43.12	45.10	+1.98
OCR	CharXiv (RQ)	36.80	40.91	+4.11
Overall	Average	57.29	59.29	+2.00

📑 Citation

@misc{jia2026blindspotsgainsdiagnosticdriven,
      title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models}, 
      author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
      year={2026},
      eprint={2602.22859},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.22859}, 
}

Downloads last month: 6

Safetensors

Model size

8B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for hongruijia/Qwen2.5-VL-7B-Instruct_DPE_v3

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Paper • 2602.22859 • Published Feb 26 • 151