# Submission Format

A submission is a single JSON file (`.json`) containing a top-level array of 500 prediction objects — one per question.
## Schema

```json
[
  {"example_id": 1, "predicted_answer": "A"},
  {"example_id": 2, "predicted_answer": "C"},
  {"example_id": 500, "predicted_answer": "B"}
]
```
**Required keys (per object):**

- `example_id` — integer in `[1, 500]`, matching `example_id` in `annotations_public.json`.
- `predicted_answer` — single uppercase letter that appears in that question's `options` dict.

**Important:** questions have **between 4 and 10 options**. The valid answer letters for any given question are exactly the keys of its `options` dict. Most are A-F; Event Ordering questions can extend to A-J. A letter outside the question's option set is rejected.

**Optional keys (ignored, but useful for your own debugging):** `raw_response`, `confidence`, `tokens`, etc.
## Rules

1. The top level must be a JSON array (not an object).
2. The submission must cover **exactly 500 unique `example_id`s**, one per question.
3. Duplicate `example_id`s are rejected.
4. Letters must be uppercase (whitespace is trimmed).
5. The file extension must be `.json`.
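The rules above can be checked locally before submitting. The sketch below is not the official validator; `valid_letters_by_id` is a hypothetical helper mapping each `example_id` to its question's option letters, which you would build from `annotations_public.json`.

```python
def validate_submission(sub, valid_letters_by_id):
    """Return None if `sub` passes the format rules, else an error string.

    `valid_letters_by_id`: dict mapping example_id -> set of option letters
    (hypothetical structure, built from `annotations_public.json`).
    """
    if not isinstance(sub, list):
        return "top level must be a JSON array"
    ids = [r.get("example_id") for r in sub]
    if len(ids) != len(set(ids)):
        return "duplicate example_id"
    if set(ids) != set(valid_letters_by_id):
        return "must cover exactly the expected example_ids"
    for r in sub:
        letter = str(r.get("predicted_answer", "")).strip()
        if letter not in valid_letters_by_id[r["example_id"]]:
            return f"invalid letter {letter!r} for example_id {r['example_id']}"
    return None

# Tiny demo with 3 questions instead of 500:
letters = {1: {"A", "B", "C", "D"}, 2: {"A", "B", "C", "D", "E"}, 3: {"A", "B"}}
ok = [{"example_id": 1, "predicted_answer": "A"},
      {"example_id": 2, "predicted_answer": "E"},
      {"example_id": 3, "predicted_answer": "B"}]
print(validate_submission(ok, letters))  # None means valid
```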
## Converting from existing eval-script output

The reference inference scripts in the [EgoMemReason GitHub repo](https://github.com/Ziyang412/EgoMemReason) write a list of records with a `pred` field. A short script to convert:

```python
import json

# Load the eval-script output and keep only the two required keys.
with open("results_my_model.json") as f:
    src = json.load(f)
sub = [{"example_id": r["example_id"], "predicted_answer": r["pred"]} for r in src]
with open("submission.json", "w") as f:
    json.dump(sub, f)
```
## How submissions are scored

Accuracy (%) is computed for each of the six `query_type` splits:

- Cumulative State Tracking (100 Qs)
- Temporal Counting (100 Qs)
- Event Ordering (100 Qs)
- Event Linking (100 Qs)
- Spatial Preference (50 Qs)
- Activity Pattern (50 Qs)

plus **Overall** accuracy on all 500. All seven values appear on the leaderboard; ranking is by Overall, descending.
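The scoring above can be reproduced locally with a minimal sketch. This is not the official scorer; it assumes simple dict inputs (predictions keyed by `example_id`, gold answers as `(query_type, letter)` pairs), which you would build from your submission and `annotations_public.json`.

```python
from collections import defaultdict

def score(preds, gold):
    """Per-split and Overall accuracy in percent.

    preds: dict example_id -> predicted letter
    gold:  dict example_id -> (query_type, correct letter)
    (assumed structures; not the official scorer)
    """
    hit = defaultdict(int)
    total = defaultdict(int)
    for qid, (qtype, answer) in gold.items():
        total[qtype] += 1
        hit[qtype] += preds.get(qid) == answer  # bool counts as 0/1
    acc = {t: 100 * hit[t] / total[t] for t in total}
    acc["Overall"] = 100 * sum(hit.values()) / sum(total.values())
    return acc

# Tiny demo with 3 questions instead of 500:
gold = {1: ("Temporal Counting", "A"),
        2: ("Temporal Counting", "B"),
        3: ("Event Ordering", "C")}
preds = {1: "A", 2: "C", 3: "C"}
print(score(preds, gold))
```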
## Submission limits

- **5 submissions per HF user per 24-hour window.**
- The 24-hour window is rolling, not midnight-aligned.

## Selected submission

Submit as many times as you like under the cap. In the **Manage my submissions** tab you can mark **one** of your past submissions as your *selected* entry. The default leaderboard view shows only each team's selected entry; the "Show all submissions" toggle reveals all.
## Required metadata fields

When you submit you must fill in:

| Field | Required | Notes |
|---|---|---|
| `team_name` | yes | Team or affiliation |
| `method_name` | yes | Short title displayed on the leaderboard |
| `uses_external_data` | yes | `yes`/`no`: did you train or fine-tune on anything beyond EgoLife? |
| `uses_video_frames` | yes | One of `frames-only` · `video-only` · `frames+audio` · `captions-only` · `other` |
| `model_size` | no | e.g. `8B`, `32B`, `API` |
| `method_description` | no | Free-form description |
| `project_url` | no | Project page |
| `publication_url` | no | arXiv / OpenReview link |