Leaderboard - 90%+

#81
by sheebz - opened

Congrats to those who have gotten recent high scores, but I have to say I'm a little skeptical. Getting 90% on L1, L2, and L3 would be pretty difficult, as at least 10 percent of the questions are arguably impossible. Then, looking at the code referenced in the paper, I don't see the kind of tools that would be required to do well. I have to wonder if there was possibly test leakage or something like that.

Can you share how on earth I can submit my submission? It keeps saying my account is not authorized to submit.
As for your opinion, I agree. Not for any other reason, simply because when I worked with coding agents on building this agent, the AI kept suggesting things that were definitely overfitting, almost straight-up cheating. Different LLMs all behaved the same: they kept trying that unless I explicitly asked them whether that wasn't basically cheating (and they mostly agreed). And people who want a really high score on this were probably doing a lot of that overfitting...
