UCSC-VLAA/VLM-CapCurriculum-Qwen3-VL-8B-Staged
Image-Text-to-Text • 9B
Staged post-training along the perception → reasoning capability axis. This collection contains the models, datasets, and paper (ICML 2026).
Note Primary release: Qwen3-VL-8B + staged training
Note Qwen2.5-VL-7B + staged training
Note InternVL3-8B + staged training
Note InternVL3.5-8B + staged training
Note Stage-1: synthesised and filtered multiple-choice questions (MCQs) built from DOCCI (annotated with pass_rate)
Note Stage-2: ORZ-Math-13k textual reasoning (annotated with pass_rate)
Note Stage-3: visual reasoning mix of CLEVR-Math, GeoQA170K, Math PUMA, and ArxivQA (annotated with pass_rate)
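The three dataset notes describe an ordered curriculum: perception first, then textual reasoning, then visual reasoning, with every sample carrying a pass_rate. Below is a minimal sketch of how such a staged schedule could be assembled from records, under stated assumptions. The names `Sample`, `STAGES`, and `build_curriculum` are hypothetical. The sketch assumes that pass_rate is the fraction of sampled rollouts judged correct. It also assumes that samples with pass_rate of exactly 0 or 1 are dropped as uninformative. That filter is a common data-curation heuristic and is not confirmed by this release. No Hub dataset IDs are loaded; records are built in memory.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    source: str        # originating dataset, e.g. "DOCCI" or "ORZ-Math-13k"
    prompt: str
    pass_rate: float   # ASSUMPTION: fraction of sampled rollouts judged correct


# Hypothetical stage plan mirroring the collection notes, in training order.
STAGES = [
    ("stage1_perception", {"DOCCI"}),
    ("stage2_text_reasoning", {"ORZ-Math-13k"}),
    ("stage3_visual_reasoning", {"CLEVR-Math", "GeoQA170K", "Math PUMA", "ArxivQA"}),
]


def build_curriculum(samples, low=0.0, high=1.0):
    """Group samples into ordered stages, keeping only low < pass_rate < high.

    The default thresholds (drop always-solved and never-solved items) are an
    illustrative assumption, not values taken from the release.
    """
    plan = []
    for name, sources in STAGES:
        kept = [s for s in samples if s.source in sources and low < s.pass_rate < high]
        plan.append((name, kept))
    return plan


if __name__ == "__main__":
    demo = [
        Sample("DOCCI", "q1", 0.5),
        Sample("DOCCI", "q2", 1.0),          # always solved: filtered out
        Sample("ORZ-Math-13k", "q3", 0.25),
        Sample("GeoQA170K", "q4", 0.0),      # never solved: filtered out
        Sample("ArxivQA", "q5", 0.75),
    ]
    for stage, items in build_curriculum(demo):
        print(stage, [s.prompt for s in items])
```

Keeping the stage order in a list, rather than a dict keyed by stage name, makes the perception → reasoning ordering explicit. Changing `low` and `high` shows how pass_rate could steer difficulty within each stage.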