Add ScreenSpot-Pro evaluation result (Holo2-235B-A22B)
#2
by merve HF Staff - opened
No description provided.
Hello @merve
This PR seems to be a way to automate the reporting of benchmark numbers. I see YAML errors in the commit, can you please elaborate on how to fix them?
Hey @marc-thibault-h, the validation error is gone now. It was caused by the dataset's eval.yaml not yet being merged to main, so this file couldn't be validated against it. If you can merge this (and the other PRs), that would be great! It would also be nice to add ScreenSpot-Pro evals for your new model. Sorry, I should have clarified this in the PR description!
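For context, evaluation results like this are declared in the model card's YAML front matter using the Hub's `model-index` schema, which is what the validation above checks. A minimal sketch of such an entry is below; the task type, dataset id, and score are placeholders for illustration, not the actual numbers from this PR:

```yaml
# Illustrative model-index entry for a model card's YAML front matter.
# Task type, dataset id, and metric value below are placeholders,
# not the real ScreenSpot-Pro numbers from this PR.
model-index:
- name: Holo2-235B-A22B
  results:
  - task:
      type: image-text-to-text   # assumed task type
    dataset:
      name: ScreenSpot-Pro
      type: screenspot-pro       # assumed dataset identifier
    metrics:
    - type: accuracy
      value: 0.0                 # placeholder; use the reported score
```

Validation fails if a referenced dataset or its eval configuration isn't resolvable, which matches the eval.yaml-not-merged issue described above.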
Closing this one in favor of the agentic one; they seem duplicated otherwise!
merve changed pull request status to closed