Qwen3_VL_8B_Instruct_DPE_v3
Qwen3_VL_8B_Instruct_DPE_v3 is an enhanced version of the Qwen3-VL-8B-Instruct base model, produced by three complete iterations of the DPE (Diagnostic-driven Progressive Evolution) framework.
Model Overview
DPE is a self-evolving training framework for Large Multimodal Models (LMMs). Inspired by the "diagnose-and-correct" mechanism in educational psychology, DPE moves beyond indiscriminate data expansion. It prioritizes the diagnosis of capability gaps to steer targeted data generation and mixture optimization, effectively breaking the multimodal long-tail bottleneck.
Key features of v3:
- Iterative Evolution: This version has undergone 3 full cycles of diagnosis, targeted data synthesis, and reinforced training.
- Focused Capability Gains: Gains over the base model in complex multimodal reasoning, including Mathematics (MathVision, +1.91), OCR & Chart Analysis (CharXiv RQ, +0.90), and University-level Knowledge (MMMU, +3.67).
- Stability & Precision: DPE_v3 mitigates the capability regression commonly seen in fine-tuning and gives more stable, logically consistent outputs than the base model.
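The diagnose-then-target idea described above can be illustrated with a toy sketch. This is not the DPE implementation: the function names `mixture_weights` and `allocate_budget`, the category names, the accuracy values, and the rule of weighting each category in proportion to its error rate are all assumptions made for illustration only.

```python
from typing import Dict


def mixture_weights(accuracy: Dict[str, float], floor: float = 0.05) -> Dict[str, float]:
    """Map per-category accuracy (0..1) to sampling weights that favor weak categories.

    Hypothetical rule: weight is proportional to the error rate (1 - accuracy).
    The floor keeps already-strong categories from vanishing from the mixture.
    """
    gaps = {name: max(1.0 - acc, floor) for name, acc in accuracy.items()}
    total = sum(gaps.values())
    return {name: gap / total for name, gap in gaps.items()}


def allocate_budget(weights: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split a synthetic-sample budget across categories according to the weights.

    Rounding means the counts may not sum exactly to the budget in general.
    """
    return {name: int(round(w * budget)) for name, w in weights.items()}


# Made-up diagnosis result for one iteration (not measured values).
diagnosis = {"math": 0.5, "charts": 0.75, "knowledge": 0.75}
weights = mixture_weights(diagnosis)
plan = allocate_budget(weights, budget=1000)
print(weights)
print(plan)
```

In this sketch the weakest category receives the largest share of newly generated data. Repeating the diagnosis after each training round would shift the mixture as gaps close.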
Evaluation Results
Performance of Qwen3_VL_8B_Instruct_DPE_v3 compared to the base model across 11 benchmarks:
| Category | Benchmark | Base Model | DPE_v3 (Ours) | Improvement |
|---|---|---|---|---|
| STEM | MMMU | 65.44 | 69.11 | +3.67 |
| | MMVet | 67.29 | 72.80 | +5.51 |
| | MMStar | 61.27 | 72.13 | +10.86 |
| Visual Math | MathVerse | 53.22 | 57.18 | +3.96 |
| | MathVision | 51.97 | 53.88 | +1.91 |
| OCR | CharXiv (RQ) | 47.20 | 48.10 | +0.90 |
| Specialized | BLINK | 68.54 | 69.22 | +0.68 |
| Overall | Average | 65.64 | 68.04 | +2.40 |
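As a quick sanity check, the Improvement column can be recomputed from the two score columns. The snippet below is a minimal sketch using only the seven benchmark rows listed in the table. The variable names are illustrative. Note that the mean over these seven rows is not expected to match the Overall row, which presumably averages all 11 benchmarks mentioned above.

```python
# (benchmark, base score, DPE_v3 score), copied from the table rows above.
SCORES = [
    ("MMMU", 65.44, 69.11),
    ("MMVet", 67.29, 72.80),
    ("MMStar", 61.27, 72.13),
    ("MathVerse", 53.22, 57.18),
    ("MathVision", 51.97, 53.88),
    ("CharXiv (RQ)", 47.20, 48.10),
    ("BLINK", 68.54, 69.22),
]

# Per-benchmark gain, rounded to two decimals like the table.
gains = {name: round(new - base, 2) for name, base, new in SCORES}

# Mean over the listed rows only (seven of the benchmarks).
listed_base_mean = round(sum(base for _, base, _ in SCORES) / len(SCORES), 2)
listed_dpe_mean = round(sum(new for _, _, new in SCORES) / len(SCORES), 2)

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
print(f"Mean over listed rows: {listed_base_mean} -> {listed_dpe_mean}")
```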
Citation
If you find this model or the DPE framework helpful, please cite our paper:
@misc{jia2026blindspotsgainsdiagnosticdriven,
title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models},
author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
year={2026},
eprint={2602.22859},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.22859},
}
License
This model is licensed under the Qwen Research License.