Model weights for the paper "Distilling to Hybrid Attention Models via KL-Guided Layer Selection" (https://arxiv.org/abs/2512.20569).