Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
Paper • 2603.25562 • Published • 13
Note Tricks to make On-Policy Distillation better
Note Informative tokens come from two regions: positions with high student entropy, and positions with low student entropy plus high teacher–student divergence, where the student is overconfident and wrong
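The two regions in the note above can be sketched as a per-position mask. This is only an illustrative sketch, not the paper's implementation: the function name `informative_mask`, the thresholds `h_high`, `h_low`, and `d_high`, and the choice of KL(student || teacher) as the divergence are all assumptions made here for the example, since the note does not specify them.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero."""
    return sum(x * math.log(x / max(y, eps)) for x, y in zip(p, q) if x > 0)

def informative_mask(student, teacher, h_high=1.0, h_low=0.3, d_high=0.5):
    """Flag positions that fall in either region described in the note.

    student, teacher: lists of per-position probability vectors.
    h_high, h_low, d_high: hypothetical thresholds, chosen for illustration.
    """
    mask = []
    for s, t in zip(student, teacher):
        h = entropy(s)
        d = kl(s, t)  # assumption: divergence measured as KL(student || teacher)
        uncertain = h >= h_high                           # region 1: high student entropy
        overconfident_wrong = h <= h_low and d >= d_high  # region 2: confident but far from teacher
        mask.append(uncertain or overconfident_wrong)
    return mask

# Toy example: three positions over a four-token vocabulary.
student = [
    [0.25, 0.25, 0.25, 0.25],  # uncertain student
    [0.97, 0.01, 0.01, 0.01],  # confident student, disagrees with teacher
    [0.97, 0.01, 0.01, 0.01],  # confident student, agrees with teacher
]
teacher = [
    [0.70, 0.10, 0.10, 0.10],
    [0.01, 0.97, 0.01, 0.01],
    [0.97, 0.01, 0.01, 0.01],
]
mask = informative_mask(student, teacher)
print(mask)  # [True, True, False]
```

In this toy case the third position is skipped: the student is confident and already matches the teacher, so it falls in neither region.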