dataset?
What dataset did you use to fine-tune for that better performance?
@Roman1111111 Could you run a quick light test? I’m a bit curious to see how it performs. The improvements look pretty impressive on popular benchmarks, and it doesn’t seem to hurt performance elsewhere. Though honestly, achieving that kind of gain usually takes some serious reinforcement learning.
Sorry, the dataset is private and optimized for chat, agentic, and difficult tasks, which is why it improved significantly on these benchmarks.
Could you tell me the topics and which AI teacher models you used? 🙏❤️
I did run a quick test: out of 10 questions, GRM 2.6 scored highest with an average grade of 8.4 from a Gemini 3.1 Pro critique, versus 8.2 for Qwen3.6 and 7.8 for Qwen3.5.
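For context, a head-to-head like that is usually just an LLM-as-judge loop. A rough sketch is below; the endpoint, model names, and grading prompt are placeholders, not the harness actually used here:

```python
# Rough LLM-as-judge sketch (placeholder endpoint/model names, not the actual test setup).
import re
from statistics import mean
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

def grade(question: str, answer: str, judge_model: str = "judge-model") -> float:
    """Ask a judge model to score an answer from 1 to 10 and parse the number."""
    prompt = (
        f"Question:\n{question}\n\nAnswer:\n{answer}\n\n"
        "Rate the answer from 1 to 10 for correctness and helpfulness. "
        "Reply with only the number."
    )
    reply = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    ).choices[0].message.content
    match = re.search(r"\d+(\.\d+)?", reply or "")
    return float(match.group()) if match else 0.0

def average_score(qa_pairs: list[tuple[str, str]]) -> float:
    """Average judge score over a small question set (e.g. 10 questions)."""
    return mean(grade(q, a) for q, a in qa_pairs)
```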
I used GLM-5.1 to generate most of the data
Approximately what size?
I can't reveal that.
Just one last thing: did you use the full 1 million samples (glm5.1-1000000x), or a curated subset?
They probably used some internal tooling (which I don't have access to) to make it, because just the 1M GLM-5.1 samples wouldn't be enough for that big of a performance jump.
I mean, the performance jump isn't really big, but it's a good jump for Qwen3.6 27B (Alibaba has already squeezed the maximum out of 27B parameters). What dataset size (from frontier teacher models) is ideal so there's no risk of degradation? I'm planning to use similar datasets on the same model.
You'd be better off with an MoE model, and you'd see good improvements from grokking (look that up), so you don't need big datasets, just quality datasets.
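On the "quality over quantity" point, a minimal cleanup pass before SFT often looks roughly like the sketch below. This is purely a generic illustration with arbitrary thresholds, not the filtering used for this model:

```python
# Generic pre-SFT cleanup sketch: exact dedup plus simple length bounds.
# The thresholds here are arbitrary placeholders.
import hashlib

def clean_samples(samples: list[str], min_chars: int = 200, max_chars: int = 8000) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for text in samples:
        text = text.strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # drop trivially short or overly long samples
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # drop exact duplicates
        seen.add(digest)
        kept.append(text)
    return kept
```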
The GLM-5.1 data was personally collected and verified; I will not share the GRM-2.5-Plus training data. We have no plans to publish the dataset publicly. Thank you.
This is GRM-2.6-Plus, not GRM-2.5-Plus.
OK, thank you, but I was just curious about the size, since it's important for me to minimize the risk of catastrophic forgetting when I try a similar SFT run.
Catastrophic forgetting happens when something in the training data conflicts with the pretraining data, and it doesn't depend on size. I'd recommend RL on the GLM-5.1 dataset for maximum performance.
Now I understand. So if my dataset is diverse, size doesn't matter. Thanks!
Yes, diverse and high quality.
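For anyone planning a similar SFT run, one common way to keep the mix diverse is rehearsal-style data mixing: interleave the new task data with general-domain "replay" samples. A rough illustration follows; the 20% ratio is an arbitrary placeholder, not a recommendation from this thread:

```python
# Rough sketch of rehearsal-style data mixing for SFT: blend new task samples
# with general-domain samples so the training mix stays diverse.
import random

def mix_datasets(task_samples: list[str], general_samples: list[str],
                 replay_fraction: float = 0.2, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    n_replay = int(len(task_samples) * replay_fraction)
    replay = rng.sample(general_samples, min(n_replay, len(general_samples)))
    mixed = task_samples + replay
    rng.shuffle(mixed)
    return mixed
```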