# Push logs — chronological
Persisted output of every metadata-stub push run. Each file is raw stdout
from `scripts/push_metadata_stubs.py`, kept verbatim so we have a paper
trail of which slugs went up when and what errored.
## Session: 2026-04-19
### `push_batch_initial_10.log`
First sanity run. 10 diverse stubs pushed to validate the renderer on a
variety of sources (NEMAR HBN releases, OpenNeuro, MEG, iEEG, various
pathologies).
Slugs: `ds000117`, `ds000246`, `ds000247`, `EEG2025r1`, `ds003800`,
`ds002799`, `EEG2025r10`, `ds004551`, `ds004598`, `ds003061`.
Pushed: **10**. Failures: 0. Wall clock: ~3m13s (serial, ~20 s/push).
### `push_bulk_parallel_day1.log`
Option C: every remaining slug, parallelized with 12 workers.
- Attempted: 670 (736 total − 10 initial − 56 already on HF from prior runs).
- **Pushed: 233** (18:02:37 → 18:09:02, i.e. 6 min 25 s: ~1.7 s/push wall clock across 12 threads, or ~20 s per push per thread).
- Hit HF's org-level **rate limit of 300 dataset repo creations per day**.
- Remaining: ~437 (queued for tomorrow).
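The script's internals aren't reproduced in these logs, but the behavior above (12 parallel workers that stop once the quota runs out) can be sketched with `concurrent.futures`. This is a minimal sketch under assumptions: `push` stands in for whatever per-slug upload `scripts/push_metadata_stubs.py` actually performs, and `RateLimited` is a hypothetical exception that `push` is assumed to raise on an HTTP 429.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Event
from typing import Callable, Iterable


class RateLimited(Exception):
    """Hypothetical: assumed to be raised by `push` when the Hub answers HTTP 429."""


def push_all(
    slugs: Iterable[str],
    push: Callable[[str], None],  # hypothetical per-slug upload function
    workers: int = 12,
) -> tuple[list[str], list[str], list[str]]:
    """Push slugs in parallel and return (pushed, deferred, failed)."""
    quota_hit = Event()  # shared flag: set once any worker sees a 429

    def task(slug: str) -> str:
        if quota_hit.is_set():
            return "deferred"  # quota exhausted: leave this slug for the next run
        try:
            push(slug)
            return "pushed"
        except RateLimited:
            quota_hit.set()  # stop the other workers from spending more requests
            return "deferred"
        except Exception:
            return "failed"  # any other error: record it and keep going

    results: dict[str, list[str]] = {"pushed": [], "deferred": [], "failed": []}
    slug_list = list(slugs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so each list stays in slug order.
        for slug, status in zip(slug_list, pool.map(task, slug_list)):
            results[status].append(slug)
    return results["pushed"], results["deferred"], results["failed"]
```

Once one worker sees a 429, the shared `Event` makes every still-queued task return immediately as deferred instead of spending more requests against an exhausted quota; in this sketch the deferred slugs are exactly what a later `--skip-existing` run would pick up.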
Script is idempotent — resume with the same command:
```bash
python scripts/push_metadata_stubs.py --all --skip-existing --workers 12
```
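Resuming is safe because the set of slugs to push is recomputed from what already exists on the Hub. A minimal sketch of that filter, assuming each slug maps to a dataset repo named `EEGDash/<slug>` and that the existing IDs come from something like `[d.id for d in HfApi().list_datasets(author="EEGDash")]`; `pending_slugs` is a hypothetical helper, not a function from the script:

```python
from typing import Iterable


def pending_slugs(
    all_slugs: Iterable[str],
    existing_repo_ids: Iterable[str],
    org: str = "EEGDash",
) -> list[str]:
    """Return slugs with no `<org>/<slug>` dataset repo yet, in input order."""
    prefix = f"{org}/"
    # Strip the org prefix so repo IDs compare directly against slugs.
    existing = {rid[len(prefix):] for rid in existing_repo_ids if rid.startswith(prefix)}
    return [slug for slug in all_slugs if slug not in existing]
```

Because already-pushed slugs drop out of the work list on every run, repeating the command never re-creates a repo and only spends quota on what is still missing.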
## Totals after day 1
- `EEGDash/*` dataset repos live: **299** (plus the `catalog` Space).
- Remaining to push: **~437**.
- HF rate-limit window resets 24 h after the first repo creation of the day.
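To estimate when pushing can resume, the quota can be modeled as a rolling 24 h window in which each creation frees its slot 24 h later. That model is an assumption on our part, not documented Hub behavior; `quota_status` is a hypothetical helper that takes the timestamps of recent repo creations.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

QUOTA = 300                   # repo creations allowed per window (observed above)
WINDOW = timedelta(hours=24)  # assumed rolling-window length


def quota_status(
    creations: Sequence[datetime], now: datetime
) -> tuple[int, Optional[datetime]]:
    """Return (slots_left, next_slot_frees_at); the second item is None while slots remain."""
    # Only creations from the last 24 h count against the quota.
    in_window = sorted(t for t in creations if now - WINDOW < t <= now)
    slots_left = max(QUOTA - len(in_window), 0)
    if slots_left > 0:
        return slots_left, None
    # Exhausted: one slot opens once enough of the oldest creations age out.
    return 0, in_window[len(in_window) - QUOTA] + WINDOW
```

Feeding it the creation timestamps from a push log gives, under this model, the earliest time a retry can succeed.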
## Session: 2026-04-20
### `push_bulk_parallel_day2.log`
Second bulk run, kicked off after the 24h reset. Same command, same 12 workers.
- Attempted: 435 remaining (after day 1's 299 + a probe push).
- **Pushed: 299**. Hit the 300/day quota again (probe earlier in day burned one slot).
- Wall clock: ~6 min (12 workers; ~1.2 s/push wall clock, or ~14 s per push per thread).
## Totals after day 2
- `EEGDash/*` dataset repos live: **600**.
- Remaining: **~136**.
- A recurring cron (`10 */6 * * *`, job id kept in session state) retries the
same command until all are pushed; `--skip-existing` makes each attempt
idempotent.
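To check when the next retry will land relative to a quota reset, the fire times of `10 */6 * * *` can be enumerated in a few lines. A small sketch for this one pattern only (not a general cron parser; naive local datetimes, no DST handling); `next_fires` is a hypothetical helper:

```python
from datetime import datetime, timedelta


def next_fires(
    after: datetime, count: int = 4, minute: int = 10, hour_step: int = 6
) -> list[datetime]:
    """Next `count` fire times of the cron entry `<minute> */<hour_step> * * *` strictly after `after`."""
    fires: list[datetime] = []
    hour = after.replace(minute=0, second=0, microsecond=0)  # top of the current hour
    while len(fires) < count:
        candidate = hour.replace(minute=minute)
        # `*/6` in the hour field matches hours 0, 6, 12 and 18 (hour % 6 == 0).
        if candidate.hour % hour_step == 0 and candidate > after:
            fires.append(candidate)
        hour += timedelta(hours=1)
    return fires
```

Starting from 2026-04-20 18:34, this yields 2026-04-21 00:10, 06:10, 12:10 and 18:10.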
## Session: 2026-04-20 (evening) — queued for tomorrow
Hit the 300/day HF quota again mid-afternoon. All pending work is queued via
the existing recurring cron (job `586420c6`, `10 */6 * * *`) — next fires
at 00:10, 06:10, 12:10, 18:10 local. The first fire after HF's window
refreshes picks up everything via `--skip-existing`.
### Pending when quota reopens (~24 h from the first 429 of day 2)
1. **~136 remaining metadata stubs** — the cron will push them automatically
(`python scripts/push_metadata_stubs.py --all --skip-existing --workers 12`).
2. **Push the org card** at `org-readme/README.md`. Two ways:
   - Create `EEGDash/README` as a **Space** (amazon-style: lets us host
     images alongside the card):
     ```python
     from huggingface_hub import HfApi

     api = HfApi()
     # The Hub renders the README.md of the `<org>/README` Space as the org card.
     api.create_repo("EEGDash/README", repo_type="space", space_sdk="static", exist_ok=True)
     api.upload_file(
         repo_id="EEGDash/README",
         repo_type="space",
         path_or_fileobj="org-readme/README.md",  # local file, relative to the working directory
         path_in_repo="README.md",
     )
     ```
   - Or just paste the markdown from `org-readme/README.md` into the
     description field at https://huggingface.co/organizations/EEGDash/settings.
3. **Empty-commit the Space** once stubs are done so the `on 🤗` cache
refreshes.
### Reference: current state
| Item | Value |
|---|---|
| Total datasets in CSV | 736 |
| Mirrored to HF | 600 |
| Remaining | ~136 |
| Org card drafted | `org-readme/README.md` (pushed to Space repo) |
| Org card published | **no** (blocked on quota) |
| HF rate limit | 300 repo-creations / 24 h org-wide |
| First 429 day 2 | 2026-04-20 ~18:34 UTC |
| Earliest clean window | ~2026-04-21 18:30 UTC |
## Session: 2026-04-21 — closed ✅
### `push_bulk_parallel_day3.log`
Cron `586420c6` fired at 23:41 local. 24 h window had cleared.
- Attempted: 136 remaining.
- **Pushed: 136 / 136** (zero 429s, zero failures).
- Wall clock: ~16 s total (12 workers; <200 ms/push wall clock, or ~1.4 s per push per thread).
## Totals after day 3 (final)
- `EEGDash/*` dataset repos live: **736 — complete coverage of the CSV**.
- 600 slugs skipped by `--skip-existing` (already pushed in earlier runs).
- Recurring cron `586420c6` deleted after verification.
| Day | Attempted | Pushed | Deferred (429) | Cumulative live |
|---|---|---|---|---|
| 2026-04-19 | 10 + 670 | 10 + 233 | 0 + 437 | 299 |
| 2026-04-20 | 435 | 299 | 136 | 600 |
| 2026-04-21 | 136 | 136 | 0 | **736** |
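The final verification can be sketched as a two-way comparison between the catalog CSV and the org's dataset repos. Assumptions: the CSV has a `dataset` column holding the slug (the real column name may differ), repo IDs come from `HfApi().list_datasets(author="EEGDash")`, and `coverage_report` is a hypothetical helper. Spaces such as `catalog` are not returned by `list_datasets`, so they don't show up as extras.

```python
import csv
from typing import Iterable, TextIO


def coverage_report(
    csv_file: TextIO,
    repo_ids: Iterable[str],
    org: str = "EEGDash",
    column: str = "dataset",  # assumed column name in the catalog CSV
) -> tuple[list[str], list[str]]:
    """Return (missing, extra): CSV slugs with no `<org>/<slug>` repo, and org repos not in the CSV."""
    # Short rows yield None for missing fields, so normalise to "" before stripping.
    wanted = {
        (row.get(column) or "").strip()
        for row in csv.DictReader(csv_file)
        if (row.get(column) or "").strip()
    }
    prefix = f"{org}/"
    live = {rid[len(prefix):] for rid in repo_ids if rid.startswith(prefix)}
    return sorted(wanted - live), sorted(live - wanted)
```

Coverage is complete when both lists come back empty.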