will you test on benchmarks?
could you please evaluate it on benchmarks
Way ahead of you brother. Just wait until you see these numbers ;)
thanks bro ❤️
all up. check out those massive relative gains on IFEval and ARC
1 epoch, did you tweak the learning rate?
In my experience one epoch is just better than doing 2 or 3. Don't know if he did LR curves
No tweak on LR as of now.
Just test on GPQA and MMLU, I wanna see how exactly this model performs
When I get home I'll see if I can do that for you. :D
Do it. I want to see your real benchmark
Don't know why the other one was "fake" but ok.
Yea the relative gain/loss on those benchmarks should still hold. I think they just want it done the official way (using generate_until requests for the whole thing, few-shot, etc)
So MMLU is 15,908 questions. Pretty sure I'm going to pass on that benchmark. But I'm working on others now