Submission Format
A submission is a single JSON file (.json) containing a top-level array of 500 prediction objects — one per question.
Schema
[
{"example_id": 1, "predicted_answer": "A"},
{"example_id": 2, "predicted_answer": "C"},
{"example_id": 500, "predicted_answer": "B"}
]
Required keys (per object):
- example_id: integer in [1, 500], matching example_id in annotations_public.json.
- predicted_answer: single uppercase letter that appears in that question's options dict.
Important: questions have between 4 and 10 options. The valid answer letters for any given question are exactly the keys of its options dict. Most are A-F; Event Ordering questions can extend to A-J. A letter outside the question's option set is rejected.
Optional keys (ignored, but useful for your own debugging): raw_response, confidence, tokens, etc.
Rules
- Top-level must be a JSON array (not an object).
- The submission must cover exactly 500 unique example_ids, one per question.
- Duplicate example_ids are rejected.
- Letters must be uppercase (whitespace is trimmed).
- File extension must be .json.
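The rules above can be checked locally before uploading. The following is a minimal sketch, not the leaderboard's actual validator: the function name `validate_submission` and the `options_by_id` lookup (example_id mapped to the set of valid letters, which you would build yourself from annotations_public.json) are assumptions made for illustration.

```python
def validate_submission(preds, options_by_id):
    """Return a list of problems found; an empty list means all checks passed.

    preds         -- the parsed submission (expected: a list of dicts)
    options_by_id -- hypothetical lookup: example_id -> set of valid letters
    """
    if not isinstance(preds, list):
        return ["top-level value must be a JSON array"]

    errors = []
    seen = set()
    for i, obj in enumerate(preds):
        if not isinstance(obj, dict):
            errors.append(f"entry {i}: not an object")
            continue
        ex_id = obj.get("example_id")
        answer = obj.get("predicted_answer")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(ex_id, int) or isinstance(ex_id, bool) or ex_id not in options_by_id:
            errors.append(f"entry {i}: bad example_id {ex_id!r}")
            continue
        if ex_id in seen:
            errors.append(f"entry {i}: duplicate example_id {ex_id}")
        seen.add(ex_id)
        # Whitespace is trimmed, but case is not normalized.
        letter = answer.strip() if isinstance(answer, str) else None
        if letter not in options_by_id[ex_id]:
            errors.append(f"entry {i}: answer {answer!r} not in options for {ex_id}")

    # Every question must be covered exactly once.
    missing = set(options_by_id) - seen
    if missing:
        errors.append(f"{len(missing)} example_ids missing from submission")
    return errors
```

Because the valid letters come from each question's own option set, an answer such as "G" passes for a ten-option Event Ordering question but fails for a four-option one.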
Converting from existing eval-script output
The reference inference scripts in the EgoMemReason GitHub repo write a list of records with a pred field. A short script converts that output to the submission format:
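import json

# Load the eval-script output: a list of records with a "pred" field.
with open("results_my_model.json") as f:
    src = json.load(f)
# Keep only the two keys the leaderboard requires.
sub = [{"example_id": r["example_id"], "predicted_answer": r["pred"]} for r in src]
with open("submission.json", "w") as f:
    json.dump(sub, f)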
How submissions are scored
Accuracy (%) for each of the six query_type splits:
- Cumulative State Tracking (100 Qs)
- Temporal Counting (100 Qs)
- Event Ordering (100 Qs)
- Event Linking (100 Qs)
- Spatial Preference (50 Qs)
- Activity Pattern (50 Qs)
plus Overall accuracy on all 500. All seven values appear on the leaderboard; ranking is by Overall descending.
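To estimate these numbers on your own held-out labels, the per-split and overall accuracies can be computed as in the sketch below. This is an illustration, not the leaderboard's scoring code: the function name `score` and the shape of `gold` (example_id mapped to a `(query_type, correct_letter)` pair) are assumptions.

```python
from collections import defaultdict


def score(preds, gold):
    """Compute accuracy (%) for each query_type split and overall.

    preds -- list of {"example_id", "predicted_answer"} dicts
    gold  -- hypothetical lookup: example_id -> (query_type, correct_letter)
    """
    answers = {p["example_id"]: p["predicted_answer"].strip() for p in preds}
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex_id, (query_type, letter) in gold.items():
        hit = answers.get(ex_id) == letter  # a missing prediction counts as wrong
        for key in (query_type, "Overall"):
            total[key] += 1
            correct[key] += hit
    return {key: 100.0 * correct[key] / total[key] for key in total}
```

Since Overall is computed over all 500 questions rather than as a mean of the six split accuracies, the two 50-question splits carry half the weight of each 100-question split.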
Submission limits
- 5 submissions per HF user per 24-hour window.
- The 24-hour window is rolling, not midnight-aligned.
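The difference between a rolling and a midnight-aligned window can be sketched as follows. This is only an illustration of the rule as stated, not the Space's implementation: `can_submit` is a hypothetical helper, and treating a submission made exactly 24 hours ago as outside the window is an assumption.

```python
from datetime import datetime, timedelta

MAX_SUBMISSIONS = 5
WINDOW = timedelta(hours=24)


def can_submit(previous, now):
    """Return True if a new submission at `now` stays within the cap.

    previous -- datetimes of this user's earlier submissions
    """
    # Count only submissions made within the 24 hours before `now`.
    recent = [t for t in previous if now - t < WINDOW]
    return len(recent) < MAX_SUBMISSIONS


# Five submissions between 22:00 and 22:04 on 1 January.
base = datetime(2025, 1, 1, 22, 0)
earlier = [base + timedelta(minutes=m) for m in range(5)]

# Shortly after midnight the cap still applies, because the window is rolling.
blocked = can_submit(earlier, datetime(2025, 1, 2, 0, 30))
# Once the first submission is more than 24 hours old, one slot frees up.
allowed = can_submit(earlier, datetime(2025, 1, 2, 22, 0, 30))
```

Here `blocked` is False and `allowed` is True: slots free up one at a time as each earlier submission ages past 24 hours, not all at once at midnight.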
Selected submission
Submit as many times as you like under the cap. In the Manage my submissions tab you can mark one of your past submissions as your selected entry. The default leaderboard view shows only each team's selected entry; the "Show all submissions" toggle reveals all.
Required metadata fields
When you submit you must fill in:
| Field | Required | Notes |
|---|---|---|
| team_name | yes | Team or affiliation |
| method_name | yes | Short title displayed on the leaderboard |
| uses_external_data | yes (yes/no) | Did you train / finetune on anything beyond EgoLife? |
| uses_video_frames | yes | one of frames-only · video-only · frames+audio · captions-only · other |
| model_size | no | e.g. 8B, 32B, API |
| method_description | no | Free-form description |
| project_url | no | Project page |
| publication_url | no | arXiv / OpenReview link |