Add evaluation results for HLE, GPQA, AIME, HMMT, SWE-Bench, and Terminal-Bench

#4
by SaylorTwift HF Staff - opened
No description provided.

There are some metadata errors, please fix them so I can merge. Thanks!

Hey! Those errors come from the HLE dataset and have no impact on the model card, so you can merge as is.
Congrats on the release, btw!

bigeagle changed pull request status to merged

cool, merged!

Also, it would be great to have those results at release so that the community can immediately compare the model to other models on the leaderboards. Would it be possible to do that for future releases? 🤗
