Hybrid-Distillation
Model weights for "Distilling to Hybrid Attention Models via KL-Guided Layer Selection" (https://arxiv.org/abs/2512.20569).

yanhong-li/llama3_3b_gdn_v4_hybrid_0_125_ar_mutihop_stage2 4B • Updated Jan 25
yanhong-li/llama3_3b_gdn_v4_hybrid_0_125_ar_stage2 4B • Updated Jan 25
yanhong-li/llama3_3b_gdn_v4_hybrid_0_125_ga_s2_stage2 4B • Updated Jan 25
yanhong-li/llama3_3b_gdn_v4_hybrid_0_125_mse_stage2 4B • Updated Jan 25