AI training

#53
by Hfhfj - opened

I really liked your idea. If I manage to collect more data, would you be able to train it even further?

I'm just learning AI stuff (going through the fast.ai Jupyter book). What data do models like these get trained on?

I really liked your idea. If I manage to collect more data, would you be able to train it even further?

Hi! Thanks a lot 🙌 If you’re able to collect more high-quality data, then absolutely — we can definitely continue training it further!!

I'm just learning AI stuff (going through the fast.ai Jupyter book). What data do models like these get trained on?

You can check out the Unsloth library and their introduction documents to get started. They also provide beginner-friendly quick start guides and reference notebooks that you can run for free on Colab or Kaggle 👍

Thanks so much for the pointer! I'll check it out!

Hfhfj changed discussion status to closed
Hfhfj changed discussion status to open

As I understand it, you have completely used this dataset: https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered/viewer/default/train?p=1
If so, please let me know what format you need the data in (maybe the same format as the dataset linked above would work). I don't know how big it will turn out to be: probably similar in size to that dataset, maybe 3-4 times larger, possibly much larger. Also, where should I send it, or should I just upload it to Hugging Face? And will it be a problem that roughly 1/3 of it will be in Russian and the rest in English?
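
On the format question, here is a minimal sketch of one common way to package mixed Russian/English examples: JSON Lines, one record per line, saved as UTF-8. The field names (`problem`, `thinking`, `solution`, `lang`) and the file name are assumptions for illustration only; the linked dataset's actual columns may differ, so they should be checked against its viewer before preparing anything.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line; ensure_ascii=False keeps Cyrillic readable."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records: field names are illustrative, not the dataset's real schema.
records = [
    {"problem": "What is 2 + 2?", "thinking": "Add the two numbers.",
     "solution": "4", "lang": "en"},
    {"problem": "Сколько будет 3 + 5?", "thinking": "Складываем числа.",
     "solution": "8", "lang": "ru"},
]

if __name__ == "__main__":
    out = write_jsonl(records, "train.jsonl")
    print(f"wrote {len(read_jsonl(out))} records to {out}")
```

A `lang` field like this makes it easy to filter or rebalance the Russian and English portions later, whichever way the mix ends up being used.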

And one more question: how many compute units do you have? Would it be enough for a full training of Qwen 3.5 or Qwen 3.6 with 125B or 250B parameters (I don't remember the exact figures)? I know it requires a lot of power, but...
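
For a rough sense of scale on the compute question, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 16 bytes per parameter for full training with Adam in mixed precision (2 for half-precision weights, 2 for gradients, 12 for fp32 master weights and the two optimizer moments), and it ignores activations and framework overhead. The 80 GB per-GPU figure is also an assumption, and the parameter counts are simply the ones mentioned above, not confirmed model sizes.

```python
import math


def full_training_memory_gb(params_billion, bytes_per_param=16):
    """Approximate memory for weights, gradients, and Adam state, in GB.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    Activations and overhead are NOT included.
    """
    return params_billion * bytes_per_param


def gpus_needed(params_billion, gpu_memory_gb=80, bytes_per_param=16):
    """Lower bound on GPU count if that state were sharded evenly across GPUs."""
    return math.ceil(full_training_memory_gb(params_billion, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for size in (125, 250):  # parameter counts quoted in the post, in billions
        print(f"{size}B params: ~{full_training_memory_gb(size)} GB, "
              f"at least {gpus_needed(size)} GPUs with 80 GB each")
```

Under these assumptions, even the smaller figure lands in the range of a couple of terabytes of accelerator memory, which is why parameter-efficient methods such as LoRA are the usual route for fine-tuning on limited hardware.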

Ham.. where are you?
