Train LoRA
Hi, I have a question: was your LoRA fine-tuned on klein-9b, not klein-base-9b?
Hello there, it was trained on the Klein Base 9B model.
Thank you for your reply. So the LoRA is trained on Klein Base 9B, but at inference it is loaded directly into Klein 9B? Wouldn't this cause quality degradation in some cases?
Training a LoRA on the base model and running it on the distilled model works under the assumption that distillation preserves the internal representation space — if the activation distributions stay close enough, the LoRA's learned corrections remain meaningful. In practice we've seen this hold on 9B, likely because the 9B distillation is less aggressive.

There will be some quality loss compared to 28-step base inference regardless — 4 steps means larger ODE integration errors and some trajectory detail gets lost, that's unavoidable. However, on 4B distilled the representation gap appears large enough that the LoRA breaks down entirely, which is a problem we are actively trying to solve.
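To make the transfer assumption concrete, here is a minimal numerical sketch (all dimensions, magnitudes, and weights are hypothetical toy values, not Klein's actual parameters): a low-rank correction `delta_W = B @ A` learned against a base weight matrix still produces nearly the same output when applied to a slightly drifted "distilled" copy of that matrix, as long as the drift is small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # hypothetical hidden size and LoRA rank

# Toy base-model weight and a "distilled" copy that drifted slightly (1% scale).
W_base = rng.standard_normal((d, d)) / np.sqrt(d)
drift = 0.01 * rng.standard_normal((d, d)) / np.sqrt(d)
W_distilled = W_base + drift

# LoRA learned against the base model: low-rank correction delta_W = B @ A.
A = rng.standard_normal((r, d)) / np.sqrt(d)
B = rng.standard_normal((d, r)) / np.sqrt(r)
delta_W = B @ A

x = rng.standard_normal(d)

# Apply the same LoRA to both models and compare the outputs.
y_base = (W_base + delta_W) @ x
y_dist = (W_distilled + delta_W) @ x

cos = float(np.dot(y_base, y_dist) /
            (np.linalg.norm(y_base) * np.linalg.norm(y_dist)))
print(cos)  # close to 1.0 when the representation drift is small
```

If you scale `drift` up (the analogue of a more aggressive distillation, as on 4B), the cosine similarity drops and the LoRA's corrections stop pointing in a meaningful direction.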
Every example in the model card was generated with the distilled version.
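The 4-step vs 28-step tradeoff mentioned above can be illustrated with a generic toy ODE (this is just explicit Euler on `dy/dt = -y`, not the sampler Klein actually uses): fewer integration steps means a larger deviation from the true trajectory.

```python
import math

def euler_error(steps):
    # Integrate dy/dt = -y from t=0 to t=1 with explicit Euler.
    # The exact solution at t=1 is e^-1.
    y, h = 1.0, 1.0 / steps
    for _ in range(steps):
        y += h * (-y)
    return abs(y - math.exp(-1))

print(euler_error(4), euler_error(28))  # 4 steps leaves a much larger error
```

Distillation trains the model to compensate for exactly this kind of coarse-step error, but some trajectory detail is still lost relative to the many-step base schedule.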
Thank you.