Model weights for "Distilling to Hybrid Attention Models via KL-Guided Layer Selection" (https://arxiv.org/abs/2512.20569).
Yanhong Li (yanhong-li)
Models (113 total), including:
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_uniform_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_smart_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_ppl_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_mse_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_ga_s2_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_ar_multihop_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_33_ar_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_25_uniform_stage2 (8B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_25_uniform_stage1 (9B)
yanhong-li/qwen2_7b_gdn_v4_hybrid_0_25_smart_stage2 (8B)
Datasets: none public yet.