Train LoRA
Hi, I have a question: was your LoRA fine-tuned on klein-9b, not klein-base-9b?
Hello there, it was trained on the Klein Base 9B model.
Thank you for your reply. So the LoRA is trained on Klein Base 9B, but at inference it is loaded directly into Klein 9B? Wouldn't this cause quality degradation in some cases?
Training a LoRA on the base model and running it on the distilled model works under the assumption that distillation preserves the internal representation space — if the activation distributions stay close enough, the LoRA's learned corrections remain meaningful. In practice we've seen this hold on 9B, likely because the 9B distillation is less aggressive.

There will be some quality loss compared to 28-step base inference regardless — 4 steps means larger ODE integration errors and some trajectory detail gets lost, that's unavoidable. However, on 4B distilled the representation gap appears large enough that the LoRA breaks down entirely, which is a problem we are actively trying to solve.
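To make the transfer assumption concrete, here is a minimal numerical sketch (all dimensions, magnitudes, and weights are hypothetical toy values, not Klein's actual parameters): a low-rank correction `delta_W = B @ A` learned against a base weight matrix still produces nearly the same output when applied to a slightly drifted "distilled" copy of that matrix, as long as the drift is small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # hypothetical hidden size and LoRA rank

# Toy base-model weight and a "distilled" copy that drifted slightly (1% scale).
W_base = rng.standard_normal((d, d)) / np.sqrt(d)
drift = 0.01 * rng.standard_normal((d, d)) / np.sqrt(d)
W_distilled = W_base + drift

# LoRA learned against the base model: low-rank correction delta_W = B @ A.
A = rng.standard_normal((r, d)) / np.sqrt(d)
B = rng.standard_normal((d, r)) / np.sqrt(r)
delta_W = B @ A

x = rng.standard_normal(d)

# Apply the same LoRA to both models and compare the outputs.
y_base = (W_base + delta_W) @ x
y_dist = (W_distilled + delta_W) @ x

cos = float(np.dot(y_base, y_dist) /
            (np.linalg.norm(y_base) * np.linalg.norm(y_dist)))
print(cos)  # close to 1.0 when the representation drift is small
```

If you scale `drift` up (the analogue of a more aggressive distillation, as on 4B), the cosine similarity drops and the LoRA's corrections stop pointing in a meaningful direction.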
Every example in the model card was generated with the distilled version.
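The 4-step vs 28-step tradeoff mentioned above can be illustrated with a generic toy ODE (this is just explicit Euler on `dy/dt = -y`, not the sampler Klein actually uses): fewer integration steps means a larger deviation from the true trajectory.

```python
import math

def euler_error(steps):
    # Integrate dy/dt = -y from t=0 to t=1 with explicit Euler.
    # The exact solution at t=1 is e^-1.
    y, h = 1.0, 1.0 / steps
    for _ in range(steps):
        y += h * (-y)
    return abs(y - math.exp(-1))

print(euler_error(4), euler_error(28))  # 4 steps leaves a much larger error
```

Distillation trains the model to compensate for exactly this kind of coarse-step error, but some trajectory detail is still lost relative to the many-step base schedule.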
Thank you.