From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Paper • 2602.22859
Qwen3_VL_8B_Instruct_DPE_v1 is the first-iteration model evolved from Qwen3-VL-8B-Instruct using the DPE (Diagnostic-driven Progressive Evolution) framework.
DPE is a self-evolving training framework for Large Multimodal Models (LMMs). It prioritizes the diagnosis of capability gaps to steer targeted data generation and mixture optimization. This version represents the first full cycle of the DPE pipeline.
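As a rough illustration of how a diagnosis could steer the data mixture, the sketch below turns per-skill accuracy into sampling weights. It is not the DPE implementation: the weighting rule (weight proportional to error rate), the `mixture_weights` function, the `temperature` parameter, and the skill names are all assumptions made for this example.

```python
def mixture_weights(accuracy_by_skill, temperature=1.0):
    """Map per-skill diagnostic accuracy to data-mixture weights.

    Hypothetical rule (not taken from the DPE paper): each skill's weight is
    proportional to its error rate raised to 1/temperature, so weaker skills
    receive a larger share of the next round of generated data.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Capability gap per skill: 1 - accuracy, clipped at zero.
    gaps = {
        skill: max(0.0, 1.0 - acc) ** (1.0 / temperature)
        for skill, acc in accuracy_by_skill.items()
    }
    total = sum(gaps.values())

    # If no gaps were diagnosed, fall back to a uniform mixture.
    if total == 0:
        n = len(gaps)
        return {skill: 1.0 / n for skill in gaps}

    return {skill: gap / total for skill, gap in gaps.items()}


# Made-up diagnostic results, for illustration only.
diagnosis = {"chart_reading": 0.80, "geometry": 0.50, "ocr": 0.70}
weights = mixture_weights(diagnosis)

for skill, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{skill}: {weight:.2f}")
```

With these made-up numbers the weakest skill, `geometry`, receives the largest share of the mixture. A lower `temperature` would concentrate the mixture further on the largest gaps.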
v1 Key Features:
| Category | Benchmark | Base Model | DPE_v1 (Ours) | Improvement |
|---|---|---|---|---|
| STEM | MMMU | 65.44 | 68.11 | +2.67 |
| | MMVet | 67.29 | 70.92 | +3.63 |
| | MMStar | 61.27 | 71.40 | +10.13 |
| Visual Math | MathVerse | 53.22 | 55.99 | +2.77 |
| | MathVision | 51.97 | 52.04 | +0.07 |
| Overall | Average | 65.64 | 67.48 | +1.84 |
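The Improvement column is the DPE_v1 score minus the base-model score. The short check below recomputes it from the values in the table; the `scores` dictionary and the `improvement` helper are names chosen for this example only.

```python
# (base model score, DPE_v1 score) per benchmark, copied from the table above.
scores = {
    "MMMU": (65.44, 68.11),
    "MMVet": (67.29, 70.92),
    "MMStar": (61.27, 71.40),
    "MathVerse": (53.22, 55.99),
    "MathVision": (51.97, 52.04),
    "Average": (65.64, 67.48),
}


def improvement(base, evolved):
    """Absolute gain in points, rounded to two decimals like the table."""
    return round(evolved - base, 2)


deltas = {name: improvement(base, evolved) for name, (base, evolved) in scores.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

The recomputed deltas match the table, with MMStar showing the largest gain and MathVision the smallest.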
@misc{jia2026blindspotsgainsdiagnosticdriven,
  title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models},
  author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
  year={2026},
  eprint={2602.22859},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.22859},
}