dataset?
What dataset did you use to fine-tune for that better performance?
@Roman1111111 Could you run a quick light test? I’m a bit curious to see how it performs. The improvements look pretty impressive on popular benchmarks, and it doesn’t seem to hurt performance elsewhere. Though honestly, achieving that kind of gain usually takes some serious reinforcement learning.
Sorry, the dataset is private and optimized for chat, agentic, and difficult tasks, which is why it improved significantly on these benchmarks.
Could you tell me the topics and which AI teacher models you used? 🙏❤️
I did run a quick test: out of 10 questions, GRM 2.6 scored highest with an average grade of 8.4 from a Gemini 3.1 Pro critique, versus 8.2 for Qwen3.6 and 7.8 for Qwen3.5.
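For context, a head-to-head like that is usually just an LLM-as-judge loop. A rough sketch is below; the endpoint, model names, and grading prompt are placeholders, not the harness actually used here:

```python
# Rough LLM-as-judge sketch (placeholder endpoint/model names, not the actual test setup).
import re
from statistics import mean
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

def grade(question: str, answer: str, judge_model: str = "judge-model") -> float:
    """Ask a judge model to score an answer from 1 to 10 and parse the number."""
    prompt = (
        f"Question:\n{question}\n\nAnswer:\n{answer}\n\n"
        "Rate the answer from 1 to 10 for correctness and helpfulness. "
        "Reply with only the number."
    )
    reply = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    ).choices[0].message.content
    match = re.search(r"\d+(\.\d+)?", reply or "")
    return float(match.group()) if match else 0.0

def average_score(qa_pairs: list[tuple[str, str]]) -> float:
    """Average judge score over a small question set (e.g. 10 questions)."""
    return mean(grade(q, a) for q, a in qa_pairs)
```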
I used GLM-5.1 to generate most of the data
Approximately what size?
I can't reveal that.
Just one last thing: did you use the full 1 million samples (glm5.1-1000000x), or a curated subset?
They probably used some internal tooling (which I don't have access to) to make it, because just the 1M GLM-5.1 samples wouldn't be enough for that big of a performance jump.
I mean, the performance jump isn't really big, but it's a good jump for Qwen3.6 27B (Alibaba has already squeezed the maximum out of 27B parameters). What dataset size (from frontier teacher models) is ideal so there's no risk of degradation? I'm planning to use similar datasets on the same model.
You'd be better off with an MoE model, and you'd see good improvements from grokking (look that up), so you don't need big datasets, just quality datasets.
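On the "quality over quantity" point, a minimal cleanup pass before SFT often looks roughly like the sketch below. This is purely a generic illustration with arbitrary thresholds, not the filtering used for this model:

```python
# Generic pre-SFT cleanup sketch: exact dedup plus simple length bounds.
# The thresholds here are arbitrary placeholders.
import hashlib

def clean_samples(samples: list[str], min_chars: int = 200, max_chars: int = 8000) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for text in samples:
        text = text.strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # drop trivially short or overly long samples
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # drop exact duplicates
        seen.add(digest)
        kept.append(text)
    return kept
```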
The GLM-5.1 data was personally collected and verified; I will not share the GRM-2.5-Plus training data. We have no plans to publish the dataset publicly. Thank you.
This is GRM-2.6-Plus, not GRM-2.5-Plus.
OK, thank you, but I was just curious about the size, since it's important for me to minimize the risk of catastrophic forgetting when I try a similar SFT run.
Catastrophic forgetting happens when something in the training data conflicts with the pretraining data, and it doesn't depend on size. I'd recommend RL on the GLM-5.1 dataset for maximum performance.
Now I understand. So if my dataset is diverse, size doesn't matter. Thanks!
Yes, diverse and high quality.
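For anyone planning a similar SFT run, one common way to keep the mix diverse is rehearsal-style data mixing: interleave the new task data with general-domain "replay" samples. A rough illustration follows; the 20% ratio is an arbitrary placeholder, not a recommendation from this thread:

```python
# Rough sketch of rehearsal-style data mixing for SFT: blend new task samples
# with general-domain samples so the training mix stays diverse.
import random

def mix_datasets(task_samples: list[str], general_samples: list[str],
                 replay_fraction: float = 0.2, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    n_replay = int(len(task_samples) * replay_fraction)
    replay = rng.sample(general_samples, min(n_replay, len(general_samples)))
    mixed = task_samples + replay
    rng.shuffle(mixed)
    return mixed
```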