temporal-twins-anon committed · Commit 8287b76 · verified · 1 Parent(s): 2c3d57f

Delete metadata/CROISSANT_VALIDATION_NOTES.md

metadata/CROISSANT_VALIDATION_NOTES.md DELETED
@@ -1,61 +0,0 @@
- # Temporal Twins Croissant Validation Notes
-
- ## 1. How to Validate
-
- Use the official MLCommons Croissant tooling after the dataset release files are hosted.
-
- 1. Confirm the hosted dataset and code repository URLs in `metadata/temporal_twins_croissant.json` are correct for the current release.
- 2. Validate the file with the official Croissant validator from the MLCommons Croissant project. If you use the web validator, upload the final JSON-LD file or point it at the hosted Croissant URL.
- 3. As a local smoke check, you can also load the JSON-LD with a JSON parser before running the full validator:
-
- ```bash
- python3 - <<'PY'
- import json
- from pathlib import Path
-
- # Parse the Croissant JSON-LD; json.load raises an error on malformed JSON.
- path = Path("metadata/temporal_twins_croissant.json")
- with path.open(encoding="utf-8") as f:
-     json.load(f)
- print("JSON parse OK")
- PY
- ```
-
- 4. After JSON parsing succeeds, run the official Croissant validation step and confirm the record sets, fields, and distribution references resolve correctly.
-
- ## 2. Hosted URLs and Remaining Placeholders
-
- Dataset-side URLs now resolve to:
-
- - Dataset URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins`
- - Croissant metadata URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json`
- - Croissant metadata browser page: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/blob/main/metadata/temporal_twins_croissant.json`
- - Data URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/data`
- - Results URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/results`
- - Configs URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/configs`
- - Metadata URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/metadata`
- - Release landing URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins`
-
- Code repository URL:
-
- - `https://huggingface.co/temporal-twins-benchmark/temporal-twins-code`
-
- Paper URL status:
-
- - Not available during double-blind review; to be added after publication.
-
- ## 3. Release Checklist
-
- - Dataset URL is accessible to reviewers.
- - Croissant file validates with the official MLCommons Croissant validator.
- - Distribution URLs resolve to the intended hosted artifacts.
- - Record-set columns match the actual hosted files.
- - RAI fields are present.
- - Dataset license is present (`CC-BY-4.0`).
- - Code repository license is present (`Apache-2.0`).
-
- ## 4. Packaging Notes
-
- - The Croissant file describes four dataset slices: `oracle_calib`, `easy`, `medium`, and `hard`.
- - It assumes deterministic release seeds `0, 1, 2, 3, 4`.
- - It assumes paper-suite configuration `num_users=350`, `simulation_days=45`, `fast_mode=false`, and `n_checkpoints=8`.
- - The `matched_prefix_examples` record set uses the release-facing column name `matched_local_event_idx`.
- - If the final hosted matched-pairs files keep the internal pipeline column name `eval_local_event_idx` instead, either rename that column in the export or update the Croissant metadata so the record-set field names match the hosted files exactly.
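
The column-name mismatch described in the last packaging note can be caught mechanically before release. The following is a minimal sketch, not the project's own tooling. It assumes the Croissant JSON-LD lists record sets under `recordSet` with per-field `name` entries, and that the matched-pairs export is CSV with a header row. The helper names, the `user_id` column, and the in-memory sample data are hypothetical and only illustrate the comparison.

```python
import csv
import io
import json


def croissant_field_names(metadata: dict, record_set_name: str) -> list[str]:
    """Return the field names declared for one record set in a Croissant JSON-LD dict."""
    for record_set in metadata.get("recordSet", []):
        if record_set.get("name") == record_set_name:
            # Some files qualify names as "<record_set>/<field>"; keep only the last part.
            return [field["name"].split("/")[-1] for field in record_set.get("field", [])]
    raise KeyError(f"record set not found: {record_set_name}")


def csv_header(text: str) -> list[str]:
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(text)))


def compare_columns(declared: list[str], hosted: list[str]) -> dict[str, list[str]]:
    """Report names declared in the metadata but absent from the file, and vice versa."""
    return {
        "missing_in_file": sorted(set(declared) - set(hosted)),
        "missing_in_metadata": sorted(set(hosted) - set(declared)),
    }


# Hypothetical inputs: replace with the real JSON-LD and an exported matched-pairs file.
example_metadata = {
    "recordSet": [
        {
            "name": "matched_prefix_examples",
            "field": [{"name": "user_id"}, {"name": "matched_local_event_idx"}],
        }
    ]
}
example_csv = "user_id,eval_local_event_idx\n1,7\n"

report = compare_columns(
    croissant_field_names(example_metadata, "matched_prefix_examples"),
    csv_header(example_csv),
)
print(json.dumps(report, indent=2))
```

With the sample data, the report lists `matched_local_event_idx` as missing from the file and `eval_local_event_idx` as missing from the metadata, which is the situation the note warns about. An empty report on both sides would indicate that the declared field names and the exported header agree.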