Query about training details

#1
by Tangchiu - opened

Hi Disya,

I found your model magnum-qwen3-4b wonderful! I’m looking into the training recipe of this model to better understand its behavioral alignment.

Would you mind sharing some insights here, or perhaps providing an email address for a more detailed technical discussion?

Thanks for your contribution to the community!

Owner

Hi, I've stopped working on this, but the people on this server will be happy to help you: https://discord.gg/yceBv2a5

Thanks so much! But this link seems invalid or expired...

Owner

https://discord.gg/dV593hce9d (BeaverAI)
Yes, Discord invite links expire after 7 days; this one is permanent.

Owner

GreenerPastures/Basically-Human-4B is a base model fine-tuned for instruction following and creative writing; I chose it because stock Qwen3-4B would have performed terribly (at that time, Qwen3-4b-2507-instructed didn't exist yet). Disya/magnum-qwen3-4b, as stated in its card, is an SFT fine-tune of GreenerPastures/Basically-Human-4B. To replicate the classic Magnum models, I followed anthracite-org/magnum-v4-12b as the basis, with the same datasets, 2 epochs, lr 0.00005, a batch size of 4, and gradient accumulation of 4 (I think it was 4, I don't remember).
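For readers trying to reproduce this, the hyperparameters above can be collected into a config sketch. This is an illustration only, not the author's actual training script: the dict keys and structure are assumptions, and the accumulation value of 4 was stated with uncertainty in the thread.

```python
# Hypothetical SFT config assembled from the hyperparameters in the reply
# above; key names are illustrative, not from the author's real script.
sft_config = {
    "base_model": "GreenerPastures/Basically-Human-4B",   # starting checkpoint
    "recipe_reference": "anthracite-org/magnum-v4-12b",   # source of datasets/recipe
    "num_epochs": 2,
    "learning_rate": 5e-5,               # "lr 0.00005"
    "per_device_batch_size": 4,
    "gradient_accumulation_steps": 4,    # author was unsure; 4 as stated
}

# Effective batch size = per-device batch size x accumulation steps,
# i.e. how many samples contribute to each optimizer step.
effective_batch = (
    sft_config["per_device_batch_size"]
    * sft_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```

With these numbers, each optimizer update averages gradients over 16 samples, which is the figure to match if you change batch size or accumulation independently.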

Thanks so much! It really helps me a lot!
