dataset?

#1
by Roman1111111 - opened

What dataset did you use to fine-tune for the improved performance?

@Roman1111111 Could you run a quick light test? I’m a bit curious to see how it performs. The improvements look pretty impressive on popular benchmarks, and it doesn’t seem to hurt performance elsewhere. Though honestly, achieving that kind of gain usually takes some serious reinforcement learning.

Orion LLM Labs org

Sorry, the dataset is private and optimized for chat, agent, and difficult tasks, which is why it improved significantly in these benchmarks.

Could you tell me which topics and which AI teacher models you used? 🙏❤️


I did. Out of 10 questions, GRM 2.6 scored highest with an average grade of 8.4 from the Gemini 3.1 Pro critique, Qwen3.6 scored 8.2, and Qwen3.5 scored 7.8.
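For anyone curious, here's a minimal sketch of what such an LLM-as-judge comparison could look like in code. The judge call is stubbed out with a toy heuristic so it runs standalone; the model names, question set, and scoring helper are all illustrative assumptions, not the actual setup used above.

```python
# Minimal LLM-as-judge comparison sketch. `judge_score` is a stub; in a
# real run you would replace it with a call to your critique model
# (e.g. a frontier model returning a 0-10 grade per answer).
from statistics import mean

def judge_score(question: str, answer: str) -> float:
    """Stub judge: toy heuristic so the sketch runs end to end.
    A real implementation would query a critique model instead."""
    return min(10.0, len(answer) / 10)

def grade_model(questions: list[str], answers: list[str]) -> float:
    """Average judge score over a small fixed question set, like the
    10-question test described above."""
    return round(mean(judge_score(q, a) for q, a in zip(questions, answers)), 1)

questions = ["Explain X.", "Solve Y."]
answers_by_model = {
    "model_a": ["A fairly detailed explanation of X...", "Step-by-step for Y..."],
    "model_b": ["Short X.", "Short Y."],
}
scores = {m: grade_model(questions, a) for m, a in answers_by_model.items()}
print(scores)
```

The key design point is using one fixed judge and one fixed question set across all models, so the averages are at least comparable to each other.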


I used GLM-5.1 to generate most of the data

What approximate size?

Orion LLM Labs org


I can't reveal that.

Just one last thing: did you use the full 1 million samples (glm5.1-1000000x), or a curated subset?

Orion LLM Labs org

They probably used some internal tooling (which I don't have access to) to make it.

Orion LLM Labs org

Because JUST the 1M GLM-5.1 samples wouldn't be enough for that big of a performance jump.

I mean, the performance jump isn't really big, but it's a good jump for Qwen3.6 27B (Alibaba has already squeezed the maximum out of 27B parameters). What dataset size (from frontier teacher models) is ideal so there's no risk of degradation? I'm planning to use similar datasets on the same model.

Orion LLM Labs org

You'd be better off with an MoE model, and you'd see good improvements from grokking (look it up), so you don't need big datasets, just quality datasets.
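To make the "quality over quantity" point concrete, here's a minimal curation sketch: exact deduplication plus a crude length filter. Real pipelines use semantic dedup and model-based quality scoring; the thresholds and sample data here are illustrative assumptions only.

```python
# Toy "quality over quantity" curation: exact dedup + length filter.
import hashlib

def curate(samples: list[str], min_len: int = 20, max_len: int = 4000) -> list[str]:
    """Keep samples within a length band, dropping exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for s in samples:
        text = s.strip()
        if not (min_len <= len(text) <= max_len):
            continue  # drop trivially short or overlong samples
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            continue  # drop exact duplicates
        seen.add(digest)
        kept.append(text)
    return kept

raw = [
    "short",  # too short, filtered out
    "A sufficiently detailed training sample about topic A.",
    "A sufficiently detailed training sample about topic A.",  # duplicate
    "Another detailed, distinct training sample about topic B.",
]
print(len(curate(raw)))  # 2 of 4 survive: one too short, one duplicate
```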

Orion LLM Labs org

The GLM-5.1 data was personally collected and verified; I will not share the GRM-2.5-Plus training data. We have no plans to publish the dataset publicly. Thank you.


This is GRM-2.6-Plus, not GRM-2.5-Plus.

Okay, thank you anyway, but I was just curious about the size, since it matters for minimizing the risk of catastrophic forgetting when I try to implement a similar SFT run.

Orion LLM Labs org

Catastrophic forgetting happens when something in the training data conflicts with the pretraining data; it doesn't depend on size. I'd recommend RL on the GLM-5.1 dataset for maximum performance.
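One common mitigation worth knowing about here is replay: mixing a small fraction of general-domain data back into each SFT batch so the new data doesn't overwrite prior behavior. A minimal sketch (the 10% ratio, batch size, and data names are illustrative assumptions, not a recommendation from this thread):

```python
# Replay-mixing sketch for SFT: each batch contains ~replay_frac samples
# drawn from general-domain data alongside the new fine-tuning data.
import random

def mixed_batches(new_data, replay_data, batch_size=8, replay_frac=0.1, seed=0):
    """Yield batches where roughly replay_frac of samples come from replay_data."""
    rng = random.Random(seed)
    n_replay = max(1, int(batch_size * replay_frac))
    n_new = batch_size - n_replay
    for i in range(0, len(new_data) - n_new + 1, n_new):
        batch = new_data[i : i + n_new] + rng.sample(replay_data, n_replay)
        rng.shuffle(batch)  # avoid a fixed new/replay ordering within the batch
        yield batch

new = [f"sft_{i}" for i in range(32)]        # stand-in for new SFT samples
old = [f"general_{i}" for i in range(100)]   # stand-in for general-domain data
batch = next(mixed_batches(new, old))
print(len(batch), sum(s.startswith("general_") for s in batch))
```

Whether replay, a lower learning rate, or RL (as suggested above) works best depends on how much the new data conflicts with pretraining behavior.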

Now I understand: so if my dataset is diverse, size doesn't matter. Thanks!

Orion LLM Labs org

Yes, diverse and high quality.
