# Submission Format

A submission is a single JSON file (`.json`) containing a top-level array of 500 prediction objects — one per question.
## Schema

```json
[
  {"example_id": 1, "predicted_answer": "A"},
  {"example_id": 2, "predicted_answer": "C"},
  {"example_id": 500, "predicted_answer": "B"}
]
```
**Required keys (per object):**

- `example_id` — integer in `[1, 500]`, matching `example_id` in `annotations_public.json`.
- `predicted_answer` — single uppercase letter that appears in that question's `options` dict.

**Important:** questions have **between 4 and 10 options**. The valid answer letters for any given question are exactly the keys of its `options` dict. Most are A-F; Event Ordering questions can extend to A-J. A letter outside the question's option set is rejected.

**Optional keys (ignored, but useful for your own debugging):** `raw_response`, `confidence`, `tokens`, etc.
## Rules

1. The top level must be a JSON array (not an object).
2. The submission must cover **exactly 500 unique `example_id`s**, one per question.
3. Duplicate `example_id`s are rejected.
4. Letters must be uppercase (whitespace is trimmed).
5. The file extension must be `.json`.
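The rules above can be checked locally before submitting. The sketch below is not the official validator; `valid_letters_by_id` is a hypothetical helper mapping each `example_id` to its question's option letters, which you would build from `annotations_public.json`.

```python
def validate_submission(sub, valid_letters_by_id):
    """Return None if `sub` passes the format rules, else an error string.

    `valid_letters_by_id`: dict mapping example_id -> set of option letters
    (hypothetical structure, built from `annotations_public.json`).
    """
    if not isinstance(sub, list):
        return "top level must be a JSON array"
    ids = [r.get("example_id") for r in sub]
    if len(ids) != len(set(ids)):
        return "duplicate example_id"
    if set(ids) != set(valid_letters_by_id):
        return "must cover exactly the expected example_ids"
    for r in sub:
        letter = str(r.get("predicted_answer", "")).strip()
        if letter not in valid_letters_by_id[r["example_id"]]:
            return f"invalid letter {letter!r} for example_id {r['example_id']}"
    return None

# Tiny demo with 3 questions instead of 500:
letters = {1: {"A", "B", "C", "D"}, 2: {"A", "B", "C", "D", "E"}, 3: {"A", "B"}}
ok = [{"example_id": 1, "predicted_answer": "A"},
      {"example_id": 2, "predicted_answer": "E"},
      {"example_id": 3, "predicted_answer": "B"}]
print(validate_submission(ok, letters))  # None means valid
```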
## Converting from existing eval-script output

The reference inference scripts in the [EgoMemReason GitHub repo](https://github.com/Ziyang412/EgoMemReason) write a list of records with a `pred` field. A short script to convert:

```python
import json

# Load the eval-script output and keep only the two required keys.
with open("results_my_model.json") as f:
    src = json.load(f)
sub = [{"example_id": r["example_id"], "predicted_answer": r["pred"]} for r in src]
with open("submission.json", "w") as f:
    json.dump(sub, f)
```
## How submissions are scored

Accuracy (%) is computed for each of the six `query_type` splits:

- Cumulative State Tracking (100 Qs)
- Temporal Counting (100 Qs)
- Event Ordering (100 Qs)
- Event Linking (100 Qs)
- Spatial Preference (50 Qs)
- Activity Pattern (50 Qs)

plus **Overall** accuracy on all 500. All seven values appear on the leaderboard; ranking is by Overall, descending.
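The scoring above can be reproduced locally with a minimal sketch. This is not the official scorer; it assumes simple dict inputs (predictions keyed by `example_id`, gold answers as `(query_type, letter)` pairs), which you would build from your submission and `annotations_public.json`.

```python
from collections import defaultdict

def score(preds, gold):
    """Per-split and Overall accuracy in percent.

    preds: dict example_id -> predicted letter
    gold:  dict example_id -> (query_type, correct letter)
    (assumed structures; not the official scorer)
    """
    hit = defaultdict(int)
    total = defaultdict(int)
    for qid, (qtype, answer) in gold.items():
        total[qtype] += 1
        hit[qtype] += preds.get(qid) == answer  # bool counts as 0/1
    acc = {t: 100 * hit[t] / total[t] for t in total}
    acc["Overall"] = 100 * sum(hit.values()) / sum(total.values())
    return acc

# Tiny demo with 3 questions instead of 500:
gold = {1: ("Temporal Counting", "A"),
        2: ("Temporal Counting", "B"),
        3: ("Event Ordering", "C")}
preds = {1: "A", 2: "C", 3: "C"}
print(score(preds, gold))
```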
## Submission limits

- **5 submissions per HF user per 24-hour window.**
- The 24-hour window is rolling, not midnight-aligned.

## Selected submission

Submit as many times as you like under the cap. In the **Manage my submissions** tab you can mark **one** of your past submissions as your *selected* entry. The default leaderboard view shows only each team's selected entry; the "Show all submissions" toggle reveals all.
## Required metadata fields

When you submit you must fill in:

| Field | Required | Notes |
|---|---|---|
| `team_name` | yes | Team or affiliation |
| `method_name` | yes | Short title displayed on the leaderboard |
| `uses_external_data` | yes | `yes`/`no`: did you train or fine-tune on anything beyond EgoLife? |
| `uses_video_frames` | yes | One of `frames-only` · `video-only` · `frames+audio` · `captions-only` · `other` |
| `model_size` | no | e.g. `8B`, `32B`, `API` |
| `method_description` | no | Free-form description |
| `project_url` | no | Project page |
| `publication_url` | no | arXiv / OpenReview link |