Qwen3.5-0.8B Coding Distilled

Item                  Value
Student model         unsloth/Qwen3.5-0.8B
Teacher model         unsloth/Qwen3.5-4B
Data                  CodeAlpaca-20k (12,000 samples)
Distillation method   KL divergence + cross-entropy (α=0.4, T=3.0)
Training time         2.54 h
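The distillation objective listed above (KL divergence plus cross-entropy, with α=0.4 and T=3.0) can be illustrated for a single token position. This is a minimal sketch, not the training code used for this model. It rests on three assumptions the card does not state: α weights the KL term and (1 - α) the cross-entropy term, the KL is taken from teacher to student, and the KL term is scaled by T² as is common in knowledge distillation. The function names `log_softmax` and `distillation_loss` are hypothetical, and only the Python standard library is used.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, target,
                      alpha=0.4, temperature=3.0):
    """Combined loss for one token position.

    Assumptions (not confirmed by the model card):
      - alpha weights the KL term, (1 - alpha) the cross-entropy term
      - KL direction is KL(teacher || student)
      - the KL term is multiplied by temperature ** 2
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) over the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Ordinary cross-entropy against the ground-truth token (no temperature).
    ce = -log_softmax(student_logits)[target]

    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


if __name__ == "__main__":
    student = [1.0, 2.0, 0.5]   # toy logits over a 3-token vocabulary
    teacher = [0.5, 3.0, 0.2]
    print(distillation_loss(student, teacher, target=1))
```

When the student logits equal the teacher logits, the KL term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for any implementation of this kind of loss.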
Model size: 0.9B params (Safetensors)
Tensor types: F32, F16

Repository: KARAGE-KUN/Qwen3.5-0.8b-distill-4b