Paper: From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models (arXiv:2602.22859)
Qwen2.5-VL-7B-Instruct_DPE_v3 is the final iteration of the DPE evolution for the 7B model, completing three full iterative cycles.
DPE (Diagnostic-driven Progressive Evolution) targets the multimodal long-tail bottleneck: it diagnoses the model's blind spots and uses them to steer targeted data generation. Of the three iterations for this base model, v3 achieves the highest average score.
v3 Key Features:
| Category | Benchmark | Base Model | DPE_v3 (Ours) | Improvement |
|---|---|---|---|---|
| STEM | MMMU | 53.11 | 56.44 | +3.33 |
| | RealWorldQA | 68.63 | 70.46 | +1.83 |
| Visual Math | MathVerse | 43.12 | 45.10 | +1.98 |
| OCR | CharXiv (RQ) | 36.80 | 40.91 | +4.11 |
| Overall | Average | 57.29 | 59.29 | +2.00 |
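
The Improvement column is the DPE_v3 score minus the base model score. The sketch below recomputes it from the values in the table as a quick consistency check. The dictionary layout and the `improvement` helper are illustrative names, not part of the released code. The Average row is used as reported; it is not the mean of the four rows shown here, so it presumably covers additional benchmarks.

```python
# Scores copied from the table above: benchmark -> (base model, DPE_v3).
scores = {
    "MMMU": (53.11, 56.44),
    "RealWorldQA": (68.63, 70.46),
    "MathVerse": (43.12, 45.10),
    "CharXiv (RQ)": (36.80, 40.91),
    "Average": (57.29, 59.29),
}


def improvement(base: float, tuned: float) -> float:
    """Absolute gain in points, rounded to two decimals like the table."""
    return round(tuned - base, 2)


# Recompute the Improvement column for every row.
deltas = {name: improvement(base, tuned) for name, (base, tuned) in scores.items()}

for name, delta in deltas.items():
    print(f"{name:<14} {delta:+.2f}")
```

The recomputed gains can be compared against the Improvement column row by row.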
@misc{jia2026blindspotsgainsdiagnosticdriven,
title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models},
author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
year={2026},
eprint={2602.22859},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.22859},
}