SlimPJ Balcony-LLaMA (Early Exit)

Base model: melhoushi/slimpj_h1152_d23_nh9_dhqk128_dhv128_ffnmult8_gbs76_tpp20.0_29117steps
Architecture: NestedLlamaForCausalLM (Balcony)
Exit layers: 4, 7, 10
Training method: DPP KL distillation (teacher = full model)

Notes

  • Base model weights are frozen; only the exit heads are trained
  • Exit heads are trained to match the full model's output distribution by minimizing KL divergence
  • The model is not instruction-tuned
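
The distillation objective in the notes above can be sketched as follows. This is a minimal illustration in PyTorch, not the training code used for this model: the function name `exit_kl_loss`, the temperature parameter, the KL direction (teacher to exit), and the per-token averaging are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def exit_kl_loss(exit_logits, teacher_logits, temperature=1.0):
    """KL(teacher || exit head), averaged over all token positions.

    Hypothetical helper: the name, the temperature, and the KL direction
    are assumptions for illustration, not taken from the model card.
    Both inputs have shape (batch, seq_len, vocab).
    """
    vocab = exit_logits.size(-1)
    # The teacher is the frozen full model, so no gradient flows into it.
    teacher_logp = F.log_softmax(teacher_logits.detach() / temperature, dim=-1)
    exit_logp = F.log_softmax(exit_logits / temperature, dim=-1)
    # F.kl_div takes the student's log-probs as `input` and the teacher's
    # as `target`; "batchmean" divides by the number of rows (tokens here).
    loss = F.kl_div(
        exit_logp.reshape(-1, vocab),
        teacher_logp.reshape(-1, vocab),
        log_target=True,
        reduction="batchmean",
    )
    return loss * temperature**2


# Toy usage: random logits stand in for the outputs of exit layers 4, 7, 10.
torch.manual_seed(0)
teacher = torch.randn(2, 5, 32)
exits = {layer: torch.randn(2, 5, 32, requires_grad=True) for layer in (4, 7, 10)}
total = sum(exit_kl_loss(logits, teacher) for logits in exits.values())
total.backward()  # gradients reach only the exit-head logits
```

Summing the per-exit losses with equal weight is also an assumption; a real setup might weight the exits differently.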

Usage

This model must be loaded with the NestedLlamaForCausalLM class.
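
A minimal loading sketch follows. It assumes that NestedLlamaForCausalLM exposes the usual Hugging Face `from_pretrained` interface and accepts `torch_dtype`; neither is confirmed here. The class is not imported in the sketch because its import path is not given in this card, so the hypothetical helper `load_balcony` takes the class as an argument.

```python
import torch


def load_balcony(model_cls, repo_id, dtype=torch.bfloat16):
    """Load a checkpoint with an explicitly supplied model class.

    Hypothetical helper. Assumes `model_cls` (NestedLlamaForCausalLM in
    practice) provides a transformers-style `from_pretrained` classmethod.
    """
    model = model_cls.from_pretrained(repo_id, torch_dtype=dtype)
    model.eval()  # inference mode: disables dropout
    return model


# Intended use, once NestedLlamaForCausalLM has been imported from the code
# that defines the Balcony architecture (import path not given in this card):
#   model = load_balcony(NestedLlamaForCausalLM, "<this model's repo id>")
```

BF16 is used as the default dtype because the published weights are stored in that format.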

Model size: 0.3B params (Safetensors)
Tensor type: BF16