Add ScreenSpot-Pro evaluation result (Holo2-235B-A22B)
#2
by merve HF Staff - opened
No description provided.
Hello @merve
This PR seems to be a way to automate the reporting of benchmark numbers. I see YAML errors in the commit, can you please elaborate on how to fix them?
Hey @marc-thibault-h, the validation error is gone now. It was caused by the dataset's eval.yaml not yet being merged to main, so this file couldn't be validated against it. If you can merge this (and the other PRs), that would be great! It would also be nice to add ScreenSpot-Pro evals for your new model. Sorry, I should have clarified this in the PR description!
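For context, evaluation results like this are declared in the model card's YAML front matter using the Hub's `model-index` schema, which is what the validation above checks. A minimal sketch of such an entry is below; the task type, dataset id, and score are placeholders for illustration, not the actual numbers from this PR:

```yaml
# Illustrative model-index entry for a model card's YAML front matter.
# Task type, dataset id, and metric value below are placeholders,
# not the real ScreenSpot-Pro numbers from this PR.
model-index:
- name: Holo2-235B-A22B
  results:
  - task:
      type: image-text-to-text   # assumed task type
    dataset:
      name: ScreenSpot-Pro
      type: screenspot-pro       # assumed dataset identifier
    metrics:
    - type: accuracy
      value: 0.0                 # placeholder; use the reported score
```

Validation fails if a referenced dataset or its eval configuration isn't resolvable, which matches the eval.yaml-not-merged issue described above.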
Closing this one in favor of the agentic one; they seem duplicated otherwise!
merve changed pull request status to closed