Query about training details

#1
by Tangchiu - opened

Hi Disya,

I found your model magnum-qwen3-4b wonderful! I’m looking into the training recipe of this model to better understand its behavioral alignment.

Would you mind sharing some insights here, or perhaps providing an email address for a more detailed technical discussion?

Thanks for your contribution to the community!

Owner

Hi, I've stopped working on this, but the people on this server will be happy to help you: https://discord.gg/yceBv2a5

Thanks so much! But this link seems invalid or expired...

Owner

https://discord.gg/dV593hce9d (BeaverAI)
Yes, Discord invite links expire after 7 days; this one is permanent.

Owner

GreenerPastures/Basically-Human-4B is a base model fine-tuned for instruction following and creative writing; I chose it because stock Qwen3-4B would have performed terribly (at that time, Qwen3-4b-2507-instructed didn't exist yet). Disya/magnum-qwen3-4b, as stated in its card, is an SFT fine-tune of GreenerPastures/Basically-Human-4B. To replicate the classic Magnum models, I followed anthracite-org/magnum-v4-12b as the basis, with the same datasets, 2 epochs, lr 0.00005, a batch size of 4, and gradient accumulation of 4 (I think it was 4, I don't remember).
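For readers trying to reproduce this, the hyperparameters above can be collected into a config sketch. This is an illustration only, not the author's actual training script: the dict keys and structure are assumptions, and the accumulation value of 4 was stated with uncertainty in the thread.

```python
# Hypothetical SFT config assembled from the hyperparameters in the reply
# above; key names are illustrative, not from the author's real script.
sft_config = {
    "base_model": "GreenerPastures/Basically-Human-4B",   # starting checkpoint
    "recipe_reference": "anthracite-org/magnum-v4-12b",   # source of datasets/recipe
    "num_epochs": 2,
    "learning_rate": 5e-5,               # "lr 0.00005"
    "per_device_batch_size": 4,
    "gradient_accumulation_steps": 4,    # author was unsure; 4 as stated
}

# Effective batch size = per-device batch size x accumulation steps,
# i.e. how many samples contribute to each optimizer step.
effective_batch = (
    sft_config["per_device_batch_size"]
    * sft_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```

With these numbers, each optimizer update averages gradients over 16 samples, which is the figure to match if you change batch size or accumulation independently.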

Thanks so much! It really helps me a lot!
