Will you test on benchmarks?

#1
by Roman1111111 - opened

Could you please evaluate it on benchmarks?

TeichAI org

Way ahead of you, brother. Just wait until you see these numbers ;)

Thanks bro 😊❤️

TeichAI org

All up. Check out those massive relative gains on IFEval and ARC.

1 epoch, did you tweak the learning rate?

TeichAI org

> 1 epoch, did you tweak the learning rate?

In my experience one epoch is just better than doing 2 or 3. Don't know if he did LR curves.

TeichAI org

No tweak on LR as of now.

Just test on GPQA and MMLU, I wanna see how exactly this model performs.

TeichAI org

> Just test on GPQA and MMLU, I wanna see how exactly this model performs.

When I get home I'll see if I can do that for you. :D

Do it. I want to see your real benchmark

TeichAI org

Don't know why the other one was "fake" but ok.

TeichAI org

> Don't know why the other one was "fake" but ok.

Yeah, the relative gain/loss on those benchmarks should still hold. I think they just want it done the official way (using generate_until requests for the whole thing, few-shot, etc.).

TeichAI org

> Just test on GPQA and MMLU, I wanna see how exactly this model performs.

So MMLU is 15,908 questions. Pretty sure I'm going to pass on that benchmark. But I'm working on others now.
