Add ScreenSpot-Pro evaluation result (Holo2-235B-A22B)

#2
by merve HF Staff - opened
H company org
No description provided.

Hello @merve
This PR seems to be a way to automate the reporting of benchmark numbers. I see YAML errors in the commit, can you please elaborate on how to fix them?

This comment has been hidden (marked as Resolved)
edited Mar 18

hey @marc-thibault-h the validation error is gone now. it happened because the dataset's eval.yaml had not been merged to main yet, so this file could not be validated. if you can merge this (and the other PRs) it would be great! it would also be nice to add ScreenSpot-Pro evals for your new model too 🙌🏻 sorry, I should have clarified this in the PR description!


closing this one in favor of the agentic one, they seem to be duplicates otherwise!

merve changed pull request status to closed
