# SlimPJ Balcony-LLaMA (Early Exit)
- Base model: `melhoushi/slimpj_h1152_d23_nh9_dhqk128_dhv128_ffnmult8_gbs76_tpp20.0_29117steps`
- Architecture: `NestedLlamaForCausalLM` (Balcony)
- Exit layers: 4, 7, 10
- Training method: DPP KL distillation (teacher = full model)
## Notes
- Base model weights are frozen.
- Exit heads are trained via KL divergence against the full model (teacher).
- The model is not instruction-tuned.
## Usage
This model must be loaded with the custom `NestedLlamaForCausalLM` class; the standard `transformers` Llama classes will not expose the early-exit heads.
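A minimal loading sketch. This assumes the Balcony code providing `NestedLlamaForCausalLM` is installed locally (the class is not part of the standard `transformers` library); the import path `nested_llama`, the `exit_layer` keyword, and the `<this-model-id>` placeholder are all assumptions, not a confirmed API.

```python
from transformers import AutoTokenizer

# Assumed import path for the Balcony implementation; adjust to wherever
# NestedLlamaForCausalLM lives in your checkout.
from nested_llama import NestedLlamaForCausalLM

model_id = "<this-model-id>"  # placeholder: the repo id of this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = NestedLlamaForCausalLM.from_pretrained(model_id)

# Pick one of the trained exit layers (4, 7, or 10) to trade quality
# for speed. The `exit_layer` keyword is an assumption about the API.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, exit_layer=7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the model is not instruction-tuned, prompt it as a plain language model (text continuation) rather than with chat-style instructions.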