| # Push logs — chronological |
|
|
| Persisted output of every metadata-stub push run. Each file is raw stdout |
| from `scripts/push_metadata_stubs.py`, kept verbatim so we have a paper |
| trail of which slugs went up when and what errored. |
|
|
| ## Session: 2026-04-19 |
|
|
| ### `push_batch_initial_10.log` |
| |
| First sanity run. 10 diverse stubs pushed to validate the renderer on a |
| variety of sources (NEMAR HBN releases, OpenNeuro, MEG, iEEG, various |
| pathologies). |
| |
| Slugs: `ds000117`, `ds000246`, `ds000247`, `EEG2025r1`, `ds003800`, |
| `ds002799`, `EEG2025r10`, `ds004551`, `ds004598`, `ds003061`. |
| |
| Pushed: **10**. Failures: 0. Wall clock: ~3m13s (serial, ~20 s/push). |
| |
| ### `push_bulk_parallel_day1.log` |
|
|
| Option C: every remaining slug, parallelized with 12 workers. |
|
|
| - Attempted: 670 (736 total − 10 initial − 56 already on HF from prior runs). |
| - **Pushed: 233** (18:02:37 → 18:09:02, ~1.7 s per push of wall-clock time across 12 threads). |
| - Hit HF's org-level **rate limit of 300 dataset repo creations per day**. |
| - Remaining: ~437 (queued for tomorrow). |
|
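The day-1 figures above can be cross-checked with a few lines of arithmetic. This is only a sketch that replays the numbers quoted in this log; the variable names are made up for illustration and do not come from `scripts/push_metadata_stubs.py`.

```python
from datetime import datetime

# All figures are copied from the bullets above; names are illustrative only.
TOTAL_IN_CSV = 736   # datasets listed in the CSV
INITIAL_BATCH = 10   # pushed by the initial sanity run
ALREADY_ON_HF = 56   # mirrored by prior runs
PUSHED_BULK = 233    # pushed before the rate limit hit

attempted = TOTAL_IN_CSV - INITIAL_BATCH - ALREADY_ON_HF
remaining = attempted - PUSHED_BULK
live = INITIAL_BATCH + ALREADY_ON_HF + PUSHED_BULK

# Wall-clock throughput of the bulk run (same-day timestamps from the log).
start = datetime.strptime("18:02:37", "%H:%M:%S")
end = datetime.strptime("18:09:02", "%H:%M:%S")
elapsed_s = (end - start).total_seconds()
seconds_per_push = elapsed_s / PUSHED_BULK

print(attempted, remaining, live)             # 670 437 299
print(elapsed_s, round(seconds_per_push, 2))  # 385.0 1.65
```

The 299 live repos include the 56 that predate this session, which is why the count is higher than the 243 pushes logged on day 1.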
|
| Script is idempotent — resume with the same command: |
|
|
| ```bash |
| python scripts/push_metadata_stubs.py --all --skip-existing --workers 12 |
| ``` |
|
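For readers who have not opened the script, the following is a rough sketch of what a `--skip-existing` filter could look like. It is not the script's actual code: the helper names `pending_slugs` and `fetch_existing_repo_ids` are hypothetical, and the `EEGDash/<slug>` repo naming is assumed from the repo listing in this log. `HfApi.list_datasets(author=...)` is a real `huggingface_hub` call, imported lazily here so the filter itself runs offline.

```python
def pending_slugs(all_slugs, existing_repo_ids, org="EEGDash"):
    """Return the slugs whose `<org>/<slug>` repo is not on the Hub yet, in input order."""
    existing = set(existing_repo_ids)
    return [slug for slug in all_slugs if f"{org}/{slug}" not in existing]


def fetch_existing_repo_ids(org="EEGDash"):
    """List dataset repo ids under `org` (needs network access and huggingface_hub)."""
    from huggingface_hub import HfApi  # lazy import: only needed for the live lookup

    return [info.id for info in HfApi().list_datasets(author=org)]


# Offline demonstration with slugs from the first batch; `already_up` is made-up state.
slugs = ["ds000117", "ds000246", "EEG2025r1"]
already_up = ["EEGDash/ds000117", "EEGDash/EEG2025r1"]
todo = pending_slugs(slugs, already_up)
print(todo)  # ['ds000246']
```

Because the filter depends only on what is already on the Hub, rerunning the same command after a quota failure picks up where the previous run stopped.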
|
| ## Totals after day 1 |
|
|
| - `EEGDash/*` dataset repos live: **299** (plus the `catalog` Space). |
| - Remaining to push: **~437**. |
| - HF rate-limit window resets 24 h after the first repo creation of the day. |
|
|
| ## Session: 2026-04-20 |
|
|
| ### `push_bulk_parallel_day2.log` |
| |
| Second bulk run, kicked off after the 24 h reset. Same command, same 12 workers. |
| |
| - Attempted: 435 remaining (after day 1's 299 + a probe push). |
| - **Pushed: 299**. Hit the 300/day quota again (probe earlier in day burned one slot). |
| - Wall clock: ~6 min (12 workers, ~1.2 s per push of wall-clock time). |
| |
| ## Totals after day 2 |
| |
| - `EEGDash/*` dataset repos live: **600**. |
| - Remaining: **~136**. |
| - A recurring cron (`10 */6 * * *`, job id kept in session state) retries the |
| same command until all are pushed; `--skip-existing` makes each attempt |
| idempotent. |
| |
| ## Session: 2026-04-20 (evening) — queued for tomorrow |
| |
| Hit the 300/day HF quota again mid-afternoon. All pending work is queued via |
| the existing recurring cron (job `586420c6`, `10 */6 * * *`), which fires at |
| 00:10, 06:10, 12:10, and 18:10 local. The first fire after HF's window |
| refreshes picks up everything via `--skip-existing`. |
| |
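The fire times listed above follow directly from the cron expression `10 */6 * * *`: minute 10 of every sixth hour. As a sanity check, here is a small sketch that expands that one schedule by hand. It is not a general cron parser, and the 19:00 starting point is an arbitrary illustrative value, not a timestamp from the logs.

```python
from datetime import datetime, timedelta

FIRE_HOURS = (0, 6, 12, 18)  # "*/6" in the hour field
FIRE_MINUTE = 10             # "10" in the minute field


def next_fires(after, count=4):
    """Return the next `count` fire times strictly after `after` (naive local time)."""
    fires = []
    day = after.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(fires) < count:
        for hour in FIRE_HOURS:
            candidate = day.replace(hour=hour, minute=FIRE_MINUTE)
            if candidate > after and len(fires) < count:
                fires.append(candidate)
        day += timedelta(days=1)
    return fires


# Hypothetical "now": the evening of day 2, after the 18:10 fire.
upcoming = next_fires(datetime(2026, 4, 20, 19, 0))
print([t.strftime("%m-%d %H:%M") for t in upcoming])
# ['04-21 00:10', '04-21 06:10', '04-21 12:10', '04-21 18:10']
```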
| ### Pending when quota reopens (~24 h from the first 429 of day 2) |
| |
| 1. **~136 remaining metadata stubs** — the cron will push them automatically |
| (`python scripts/push_metadata_stubs.py --all --skip-existing --workers 12`). |
| 2. **Push the org card** at `org-readme/README.md`. Two ways: |
| - Create `EEGDash/README` as a **Space** (amazon-style — lets us host |
| images alongside the card): |
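| ```python |
| from huggingface_hub import HfApi |
| api = HfApi()  # picks up the locally stored HF token |
| # The org card is served from a static Space named `<org>/README`. |
| api.create_repo("EEGDash/README", repo_type="space", space_sdk="static", exist_ok=True) |
| # Upload the drafted card as that Space's README.md. |
| api.upload_file( |
|     repo_id="EEGDash/README", |
|     repo_type="space", |
|     path_or_fileobj="org-readme/README.md", |
|     path_in_repo="README.md", |
| ) |
| ``` |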
| - Or just paste the markdown from `org-readme/README.md` into the |
| description field at https://huggingface.co/organizations/EEGDash/settings. |
| 3. **Empty-commit the Space** once stubs are done so the `on 🤗` cache |
| refreshes. |
| |
| ### Reference: current state |
|
|
| | Item | Value | |
| |---|---| |
| | Total datasets in CSV | 736 | |
| | Mirrored to HF | 600 | |
| | Remaining | ~136 | |
| | Org card drafted | `org-readme/README.md` (pushed to Space repo) | |
| | Org card published | **no** (blocked on quota) | |
| | HF rate limit | 300 repo-creations / 24 h org-wide | |
| | First 429 day 2 | 2026-04-20 ~18:34 UTC | |
| | Earliest clean window | ~2026-04-21 18:30 UTC | |
|
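The "earliest clean window" row is the first 429 plus 24 hours. A minimal sketch of that calculation follows, assuming the quota behaves as a 24 h window anchored at the first 429. That is this log's working assumption, not documented Hugging Face behaviour, and `quota_is_open` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Timestamp from the table above (approximate, as logged).
first_429 = datetime(2026, 4, 20, 18, 34, tzinfo=timezone.utc)

# Assumption: the window reopens exactly 24 h after the first 429.
window_reopens = first_429 + timedelta(hours=24)


def quota_is_open(now):
    """True once `now` (timezone-aware) has reached the assumed reopening time."""
    return now >= window_reopens


print(window_reopens.isoformat())  # 2026-04-21T18:34:00+00:00
print(quota_is_open(datetime(2026, 4, 21, 12, 0, tzinfo=timezone.utc)))  # False
print(quota_is_open(datetime(2026, 4, 21, 19, 0, tzinfo=timezone.utc)))  # True
```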
|
| ## Session: 2026-04-21 — closed ✅ |
|
|
| ### `push_bulk_parallel_day3.log` |
| |
| Cron `586420c6` fired at 23:41 local, after the 24 h window had cleared. |
| |
| - Attempted: 136 remaining. |
| - **Pushed: 136 / 136** (zero 429s, zero failures). |
| - Wall clock: ~16 s total (12 workers, <200 ms per push of wall-clock time). |
| |
| ## Totals after day 3 (final) |
| |
| - `EEGDash/*` dataset repos live: **736 — complete coverage of the CSV**. |
| - The 600 repos pushed in earlier runs were skipped via `--skip-existing`. |
| - Recurring cron `586420c6` deleted after verification. |
| |
| | day | attempted | pushed | 429s | cumulative | |
| |---|---|---|---|---| |
| | 2026-04-19 | 10 + 670 | 10 + 233 | 0 + 437 | 299 | |
| | 2026-04-20 | 435 | 299 | 136 | 600 | |
| | 2026-04-21 | 136 | 136 | 0 | **736** | |
| |