benchmarks

#3
by ale-volpe - opened

hello, nice work. Are there any benchmarks?

TeichAI org

The model doesn't fit on my GPU, so to benchmark it I'd have to use a cloud provider for the compute (aka spend more money I don't have lol).
If anyone in the community is able to run these benchmarks that would be appreciated, otherwise let me wait for my next paycheck before running the benchmarks ;)

How do I run the benchmark? I can do it if you want.


It would be much appreciated if you could benchmark the base GLM 4.7 Flash model against this one.

https://github.com/EleutherAI/lm-evaluation-harness

TeichAI org

Yes, the lm-evaluation-harness (lm_eval) is the way I do it as well.
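For anyone offering to help: a minimal lm-evaluation-harness run looks roughly like this. The repo ids and task list below are placeholders, not necessarily the models or benchmarks used here; substitute the actual Hub ids and whatever tasks you want to compare.

```shell
# Install the harness (hypothetical invocation sketch, not the exact
# commands used for the published numbers).
pip install lm-eval

# Evaluate the base model on a task suite.
lm_eval --model hf \
  --model_args pretrained=<base-model-repo>,dtype=bfloat16 \
  --tasks gsm8k,mmlu \
  --batch_size auto \
  --output_path results/base

# Same tasks against the distilled model, then diff the two result files.
lm_eval --model hf \
  --model_args pretrained=<distilled-model-repo>,dtype=bfloat16 \
  --tasks gsm8k,mmlu \
  --batch_size auto \
  --output_path results/distill
```

Running both models with identical tasks, batch settings, and dtype keeps the comparison apples-to-apples.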

TeichAI org

Benchmarks are live.

armand0e changed discussion status to closed

I know this is closed, but I think it's really cool that such a small dataset gave several of the benchmarks a nice little bump!

armand0e changed discussion status to open
TeichAI org

Yep! This is exactly what we're trying to show with our distills.

Many people see a small dataset and immediately dismiss the model or the distillation process, but it's important to remember that our goal isn't to distill massive amounts of knowledge from teacher models. Instead of that daunting task, we are only distilling the chain of thought (CoT). The performance shifts we're seeing across benchmarks show just how important a well-formulated CoT is when it comes to language models that are, at their core, predicting the next token.

So yes, it's very interesting to see which areas certain CoTs show improvements and regressions in. Instead of trying to compete with the large AI labs by making the "smartest" model, we are distilling the step-by-step process that the teacher model (in this case Claude Opus 4.5) takes each time it is presented with a new task. This approach effectively unlocks the model's existing pre-training, which is exactly why Google and OpenAI don't reveal their raw reasoning traces πŸ˜‰

Well, this is why CoT was so transformative for distillation.
