Delete metadata/CROISSANT_VALIDATION_NOTES.md
metadata/CROISSANT_VALIDATION_NOTES.md
DELETED
@@ -1,61 +0,0 @@
# Temporal Twins Croissant Validation Notes

## 1. How to Validate

Use the official MLCommons Croissant tooling after the dataset release files are hosted.

1. Confirm the hosted dataset and code repository URLs in `metadata/temporal_twins_croissant.json` are correct for the current release.
2. Validate the file with the official Croissant validator from the MLCommons Croissant project. If you use the web validator, upload the final JSON-LD file or point it at the hosted Croissant URL.
3. As a local smoke check, you can also load the JSON-LD with a JSON parser before running the full validator:

```bash
python3 - <<'PY'
import json
from pathlib import Path

# Smoke check only: this confirms the file is well-formed JSON,
# not that it is valid Croissant metadata.
path = Path("metadata/temporal_twins_croissant.json")
with path.open(encoding="utf-8") as f:
    json.load(f)  # raises json.JSONDecodeError on malformed input
print("JSON parse OK")
PY
```

4. After JSON parsing succeeds, run the official Croissant validation step and confirm that the record sets, fields, and distribution references all resolve correctly.
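
Before running the official validator, the reference check in step 4 can be approximated locally. The following is a minimal sketch, not a replacement for the official tooling. It assumes the Croissant 1.0 JSON-LD layout, in which each record-set field points at a distribution entry through `source.fileObject` or `source.fileSet`. The function name `unresolved_sources`, the `easy-csv` and `missing-csv` identifiers, and the `label` field are hypothetical and used only for illustration.

```python
def unresolved_sources(metadata: dict) -> list[str]:
    """Return record-set fields whose source matches no distribution entry."""
    # Collect every identifier a field could legally point at.
    known = set()
    for entry in metadata.get("distribution", []):
        for key in ("@id", "name"):
            if entry.get(key):
                known.add(entry[key])

    problems = []
    for record_set in metadata.get("recordSet", []):
        rs_name = record_set.get("name") or record_set.get("@id") or "<unnamed>"
        for field in record_set.get("field", []):
            source = field.get("source")
            if not isinstance(source, dict):
                continue  # nothing to check for this field
            ref = source.get("fileObject") or source.get("fileSet")
            if ref is None:
                continue  # e.g. a field sourced from another record set
            target = ref.get("@id") if isinstance(ref, dict) else ref
            if target not in known:
                field_name = field.get("name", "<unnamed>")
                problems.append(f"{rs_name}.{field_name} -> {target}")
    return problems


# Hypothetical in-memory metadata; identifiers are illustrative only.
example = {
    "distribution": [{"@type": "cr:FileObject", "@id": "easy-csv"}],
    "recordSet": [
        {
            "name": "matched_prefix_examples",
            "field": [
                {
                    "name": "matched_local_event_idx",
                    "source": {"fileObject": {"@id": "easy-csv"}},
                },
                {
                    "name": "label",
                    "source": {"fileObject": {"@id": "missing-csv"}},
                },
            ],
        }
    ],
}
print(unresolved_sources(example))
# ['matched_prefix_examples.label -> missing-csv']
```

To run it against the real file, load `metadata/temporal_twins_croissant.json` with `json.load` and pass the resulting dictionary to the function. An empty list only means the references are internally consistent; the official validator remains the authority.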

## 2. Hosted URLs and Remaining Placeholders

Dataset-side URLs now resolve to:

- Dataset URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins`
- Croissant metadata URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json`
- Croissant metadata browser page: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/blob/main/metadata/temporal_twins_croissant.json`
- Data URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/data`
- Results URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/results`
- Configs URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/configs`
- Metadata URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/tree/main/metadata`
- Release landing URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins`

Code repository URL:

- `https://huggingface.co/temporal-twins-benchmark/temporal-twins-code`

Paper URL status:

- Not available during double-blind review; to be added after publication.
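
The hosted URLs listed in this section can be spot-checked with a short script. This is a rough sketch using only the Python standard library. It assumes that a `HEAD` request answered with status 200 (after redirects) is a good enough signal that a page is reachable, which may not hold for every host. The helper names `http_status` and `failing_urls` are hypothetical, and only a subset of the URLs above is included.

```python
import urllib.error
import urllib.request

BASE = "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins"
HOSTED_URLS = [
    BASE,
    BASE + "/raw/main/metadata/temporal_twins_croissant.json",
    "https://huggingface.co/temporal-twins-benchmark/temporal-twins-code",
]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, timeout, refused connection, and similar


def failing_urls(urls, status_of=http_status):
    """Return (url, status) pairs for every URL that did not answer 200."""
    results = [(url, status_of(url)) for url in urls]
    return [(url, status) for url, status in results if status != 200]


if __name__ == "__main__":
    for url, status in failing_urls(HOSTED_URLS):
        print(f"FAILED ({status}): {url}")
```

The `status_of` parameter exists so the reporting logic can be exercised without network access, for example by passing a dictionary's `get` method in place of the real request.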

## 3. Release Checklist

- Dataset URL is accessible to reviewers.
- Croissant file validates with the official MLCommons Croissant validator.
- Distribution URLs resolve to the intended hosted artifacts.
- Record-set columns match the actual hosted files.
- RAI fields are present.
- Dataset license is present (`CC-BY-4.0`).
- Code repository license is present (`Apache-2.0`).
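
Two of the checklist items above, the dataset license and the RAI fields, lend themselves to a quick presence check on the parsed JSON-LD. The sketch below assumes that the license is stored under a top-level `license` key and that RAI properties are serialized with a `rai:` prefix at the top level; if the actual file uses a different layout, the key names need adjusting. The function name `checklist_gaps` and the sample values are hypothetical.

```python
def checklist_gaps(metadata: dict) -> list[str]:
    """Return human-readable gaps for the license and RAI checklist items."""
    gaps = []
    # Presence check only: the value itself is not validated here.
    if not metadata.get("license"):
        gaps.append("dataset license is missing")
    # Assumption: RAI properties appear as top-level keys prefixed with "rai:".
    rai_keys = [key for key in metadata if key.startswith("rai:")]
    if not rai_keys:
        gaps.append("no rai: fields found")
    return gaps


# Hypothetical metadata fragments, for illustration only.
complete = {"license": "CC-BY-4.0", "rai:dataCollection": "simulated"}
incomplete = {"name": "temporal-twins"}

print(checklist_gaps(complete))    # []
print(checklist_gaps(incomplete))  # ['dataset license is missing', 'no rai: fields found']
```

This only confirms that the fields exist. Whether their content is accurate still needs a manual read, and the code repository license has to be checked in the code repository itself.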

## 4. Packaging Notes

- The Croissant file describes four dataset slices: `oracle_calib`, `easy`, `medium`, and `hard`.
- It assumes deterministic release seeds `0, 1, 2, 3, 4`.
- It assumes paper-suite configuration `num_users=350`, `simulation_days=45`, `fast_mode=false`, and `n_checkpoints=8`.
- The `matched_prefix_examples` record set uses the release-facing column name `matched_local_event_idx`.
- If the final hosted matched-pairs files keep the internal pipeline column name `eval_local_event_idx` instead, either rename that column in the export or update the Croissant metadata so the record-set field names match the hosted files exactly.
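
The column-name caveat in the last two notes can be checked mechanically. The following sketch compares a file header against the field names declared for a record set and shows the rename option. It works on plain lists of column names, so reading the header from the hosted file and the field names from the Croissant metadata is left to the caller. The helper names and the `user_id` column are hypothetical.

```python
RELEASE_COLUMN = "matched_local_event_idx"
INTERNAL_COLUMN = "eval_local_event_idx"


def release_header(header: list[str]) -> list[str]:
    """Rename the internal column to the release-facing name; keep the rest."""
    if RELEASE_COLUMN in header and INTERNAL_COLUMN in header:
        raise ValueError("header contains both the internal and release names")
    return [RELEASE_COLUMN if col == INTERNAL_COLUMN else col for col in header]


def column_mismatches(header: list[str], croissant_fields: list[str]) -> dict:
    """Report columns present on only one side of the comparison."""
    header_set, field_set = set(header), set(croissant_fields)
    return {
        "missing_from_file": sorted(field_set - header_set),
        "missing_from_metadata": sorted(header_set - field_set),
    }


# Hypothetical header that still uses the internal pipeline name.
exported = ["user_id", "eval_local_event_idx"]
declared = ["user_id", "matched_local_event_idx"]

print(column_mismatches(exported, declared))
# {'missing_from_file': ['matched_local_event_idx'], 'missing_from_metadata': ['eval_local_event_idx']}
print(column_mismatches(release_header(exported), declared))
# {'missing_from_file': [], 'missing_from_metadata': []}
```

Renaming in the export keeps the Croissant file unchanged; the alternative named in the notes, updating the metadata, would instead change `declared` to use `eval_local_event_idx`.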