YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Qwen3_VL_8B_Instruct_DPE_v3

Paper | GitHub

Qwen3_VL_8B_Instruct_DPE_v3 is an enhanced version of the Qwen3-VL-8B-Instruct base model, evolved through the DPE (Diagnostic-driven Progressive Evolution) framework after three complete iterations.

🌟 Model Overview

DPE is a self-evolving training framework for Large Multimodal Models (LMMs). Inspired by the "diagnose-and-correct" mechanism in educational psychology, DPE moves beyond indiscriminate data expansion. It prioritizes the diagnosis of capability gaps to steer targeted data generation and mixture optimization, effectively breaking the multimodal long-tail bottleneck.

v3 Version Key Features:

  • Iterative Evolution: This version has undergone 3 full cycles of diagnosis, targeted data synthesis, and reinforced training.
  • Focused Capability Gains: Significant improvements in complex multimodal reasoning, particularly in Mathematics (MathVision), OCR & Chart Analysis (CharXiv), and University-level Knowledge (MMMU).
  • Stability & Precision: DPE_v3 mitigates capability regression commonly seen in fine-tuning, offering more stable and logically consistent outputs compared to the base model.

πŸ“Š Evaluation Results

Performance of Qwen3_VL_8B_Instruct_DPE_v3 compared to the base model across 11 benchmarks:

Category Benchmark Base Model DPE_v3 (Ours) Improvement
STEM MMMU 65.44 69.11 +3.67
MMVet 67.29 72.80 +5.51
MMStar 61.27 72.13 +10.86
Visual Math MathVerse 53.22 57.18 +3.96
MathVision 51.97 53.88 +1.91
OCR CharXiv (RQ) 47.20 48.10 +0.90
Specialized BLINK 68.54 69.22 +0.68
Overall Average 65.64 68.04 +2.40

πŸ“‘ Citation

If you find this model or the DPE framework helpful, please cite our paper:

@misc{jia2026blindspotsgainsdiagnosticdriven,
      title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models}, 
      author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
      year={2026},
      eprint={2602.22859},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.22859 }, 
}

πŸ“œ License

This model is licensed under the Qwen Research License.

Downloads last month
21
Safetensors
Model size
9B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Paper for hongruijia/Qwen3_VL_8B_Instruct_DPE_v3