Qwen3_VL_8B_Instruct_DPE_v3

Qwen3_VL_8B_Instruct_DPE_v3 is an enhanced version of the Qwen3-VL-8B-Instruct base model, evolved through the DPE (Diagnostic-driven Progressive Evolution) framework after three complete iterations.

🌟 Model Overview

DPE is a self-evolving training framework for Large Multimodal Models (LMMs). Inspired by the "diagnose-and-correct" mechanism in educational psychology, DPE moves beyond indiscriminate data expansion. It prioritizes the diagnosis of capability gaps to steer targeted data generation and mixture optimization, effectively breaking the multimodal long-tail bottleneck.

v3 Version Key Features:

Iterative Evolution: This version has undergone 3 full cycles of diagnosis, targeted data synthesis, and reinforced training.
Focused Capability Gains: Significant improvements in complex multimodal reasoning, particularly in Mathematics (MathVision), OCR & Chart Analysis (CharXiv), and University-level Knowledge (MMMU).
Stability & Precision: DPE_v3 mitigates capability regression commonly seen in fine-tuning, offering more stable and logically consistent outputs compared to the base model.

📊 Evaluation Results

Performance of Qwen3_VL_8B_Instruct_DPE_v3 compared to the base model across 11 benchmarks:

Category	Benchmark	Base Model	DPE_v3 (Ours)	Improvement
STEM	MMMU	65.44	69.11	+3.67
	MMVet	67.29	72.80	+5.51
	MMStar	61.27	72.13	+10.86
Visual Math	MathVerse	53.22	57.18	+3.96
	MathVision	51.97	53.88	+1.91
OCR	CharXiv (RQ)	47.20	48.10	+0.90
Specialized	BLINK	68.54	69.22	+0.68
Overall	Average	65.64	68.04	+2.40

📑 Citation

If you find this model or the DPE framework helpful, please cite our paper:

@misc{jia2026blindspotsgainsdiagnosticdriven,
      title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models}, 
      author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
      year={2026},
      eprint={2602.22859},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.22859 }, 
}

📜 License

This model is licensed under the Qwen Research License.

Downloads last month: 21

Safetensors

Model size

9B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for hongruijia/Qwen3_VL_8B_Instruct_DPE_v3

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Paper • 2602.22859 • Published Feb 26 • 151