Add SWE-bench Pro evaluation result (50.7%)
#101
by SaylorTwift HF Staff - opened
No description provided.
What method do you use for reasoning and evaluation? My score is only around 30%.
We report the result from the model's tech report.