Finetuning `chandra`
#1
by johnlockejrr - opened
Hi! I saw you finetuned chandra with GRPO. I was thinking of trying to finetune it with unsloth + LoRA and SFT, or of adding it to LLaMA-Factory, keeping in mind it is a Qwen3VLForConditionalGeneration model. How did you do it? There's no info on finetuning chandra from the authors. Thank you!
Hi, thanks for your query; I haven't gotten around to adding the model cards yet. I used GRPOTrainer from trl, with vllm for generations. I've been thinking about moving over to other frameworks, as the trl support is not amazing. Thanks for the ideas on some alternatives!
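For anyone following along: GRPOTrainer optimizes against scalar rewards returned by user-supplied reward functions, which receive the generated completions plus any extra dataset columns as keyword arguments. The post doesn't say which reward was used, so this is only a hypothetical sketch — a character-error-rate-style reward that would suit an OCR model, assuming plain-string completions and an assumed `reference` dataset column:

```python
# Hypothetical CER-based reward for GRPO on OCR-style outputs.
# trl's GRPOTrainer calls reward functions with the completions and
# any extra dataset columns as kwargs; "reference" is an assumption.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ocr_reward(completions, reference, **kwargs):
    """Reward in [0, 1]: 1.0 for an exact match, decreasing with CER."""
    rewards = []
    for pred, ref in zip(completions, reference):
        cer = levenshtein(pred, ref) / max(len(ref), 1)
        rewards.append(max(0.0, 1.0 - cer))
    return rewards
```

Such a function would be passed via `GRPOTrainer(reward_funcs=ocr_reward, ...)`; the exact reward shaping used for chandra is not stated in this thread.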
I just successfully finetuned it with unsloth + LoRA + SFT. Still distilling the method, but it works: the model learns very well and fast.
https://github.com/johnlockejrr/chandra_finetune
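See the repo for the actual code; as a rough sketch of the usual shape of SFT data for a Qwen-VL-family model, each OCR sample is typically wrapped as a chat-style conversation pairing the page image with the target transcription. The instruction text and field names below are assumptions following the common Qwen2/3-VL message convention, not necessarily what the repo does:

```python
# Hypothetical converter from (image, transcription) pairs to the
# chat-message format commonly consumed by SFT trainers for
# Qwen-VL-family models. The prompt wording is an assumption.

INSTRUCTION = "Transcribe the text in this image."

def to_conversation(sample: dict) -> dict:
    """Wrap one OCR sample as a user/assistant message pair."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": sample["image"]},
                    {"type": "text", "text": INSTRUCTION},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["text"]}],
            },
        ]
    }
```

Mapping a dataset through a converter like this, then training with a LoRA-wrapped model and an SFT trainer, is the standard unsloth workflow for vision-language models.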