# Push logs — chronological

Persisted output of every metadata-stub push run. Each file is raw stdout
from `scripts/push_metadata_stubs.py`, kept verbatim so we have a paper
trail of which slugs went up when and what errored.

## Session: 2026-04-19

### `push_batch_initial_10.log`

First sanity run. 10 diverse stubs pushed to validate the renderer on a
variety of sources (NEMAR HBN releases, OpenNeuro, MEG, iEEG, various
pathologies).

Slugs: `ds000117`, `ds000246`, `ds000247`, `EEG2025r1`, `ds003800`,
`ds002799`, `EEG2025r10`, `ds004551`, `ds004598`, `ds003061`.

Pushed: **10**. Failures: 0. Wall clock: ~3m13s (serial, ~20 s/push).

### `push_bulk_parallel_day1.log`

Option C: every remaining slug, parallelized with 12 workers.

- Attempted: 670 (736 total − 10 initial − 56 already on HF from prior runs).
- **Pushed: 233** (18:02:37 → 18:09:02, ~1.7 s/push in aggregate across 12 threads).
- Hit HF's org-level **rate limit of 300 dataset repo creations per day**.
- Remaining: ~437 (queued for tomorrow).
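
The throughput figure can be rechecked from the timestamps and counts in the list above. The snippet below is only a worked calculation; the per-thread number assumes all 12 workers were busy for the whole run.

```python
from datetime import datetime

# Timestamps, push count, and worker count come from the log summary above.
start = datetime.strptime("18:02:37", "%H:%M:%S")
end = datetime.strptime("18:09:02", "%H:%M:%S")
pushed = 233
workers = 12

elapsed_s = (end - start).total_seconds()
aggregate_s_per_push = elapsed_s / pushed  # wall clock divided by pushes
# Assumption: every worker was busy throughout, so one thread sees ~12x that.
per_thread_s_per_push = aggregate_s_per_push * workers

print(f"{elapsed_s:.0f} s elapsed")                        # 385 s elapsed
print(f"{aggregate_s_per_push:.2f} s/push aggregate")      # 1.65 s/push aggregate
print(f"{per_thread_s_per_push:.1f} s/push per thread")    # 19.8 s/push per thread
```

The per-thread figure lands close to the ~20 s/push seen in the serial sanity run.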

Script is idempotent — resume with the same command:

```bash
python scripts/push_metadata_stubs.py --all --skip-existing --workers 12
```
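
Why rerunning the same command is safe can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual internals of `scripts/push_metadata_stubs.py`: `plan_push`, `push_all`, and `FakeHub` are hypothetical names, and the hub is an in-memory stand-in with a made-up quota of 4 instead of a real Hugging Face client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def plan_push(all_slugs, existing):
    """Skip-existing step: keep only slugs that are not mirrored yet."""
    done = set(existing)
    return [s for s in all_slugs if s not in done]


def push_all(slugs, push_one, workers=12):
    """Call push_one(slug) from a thread pool; return (pushed, failed) slugs."""
    pushed, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(push_one, s): s for s in slugs}
        for fut in as_completed(futures):
            try:
                fut.result()
                pushed.append(futures[fut])
            except RuntimeError:  # stand-in for an HTTP 429 from the hub
                failed.append(futures[fut])
    return pushed, failed


class FakeHub:
    """Hypothetical in-memory hub with a per-window creation quota."""

    def __init__(self, quota):
        self.quota, self.created, self.repos = quota, 0, set()
        self._lock = threading.Lock()

    def create(self, slug):
        with self._lock:
            if self.created >= self.quota:
                raise RuntimeError("429 Too Many Requests")
            self.created += 1
            self.repos.add(slug)

    def reset_window(self):
        self.created = 0


all_slugs = [f"ds{i:06d}" for i in range(10)]
hub = FakeHub(quota=4)
history = []
while True:
    todo = plan_push(all_slugs, hub.repos)
    if not todo:
        break
    pushed, failed = push_all(todo, hub.create)
    history.append((len(pushed), len(failed)))
    hub.reset_window()  # pretend a new quota window has opened

print(history)  # [(4, 6), (4, 2), (2, 0)]
```

Each pass only attempts what is still missing, so a run cut short by the quota never needs special recovery logic.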

## Totals after day 1

- `EEGDash/*` dataset repos live: **299** (plus the `catalog` Space).
- Remaining to push: **~437**.
- HF rate-limit window resets 24 h after the first repo creation of the day.

## Session: 2026-04-20

### `push_bulk_parallel_day2.log`

Second bulk run, kicked off after the 24h reset. Same command, same 12 workers.

- Attempted: 435 remaining (after day 1's 299 + a probe push).
- **Pushed: 299**. Hit the 300/day quota again (probe earlier in day burned one slot).
- Wall clock: ~6 min (12 workers, ~1.2 s/push in aggregate).

## Totals after day 2

- `EEGDash/*` dataset repos live: **600**.
- Remaining: **~136**.
- A recurring cron (`10 */6 * * *`, job id kept in session state) retries the
  same command until all are pushed; `--skip-existing` makes each attempt
  idempotent.

## Session: 2026-04-20 (evening) — queued for tomorrow

Hit the 300/day HF quota again mid-afternoon. All pending work is queued via
the existing recurring cron (job `586420c6`, `10 */6 * * *`) — next fires
at 00:10, 06:10, 12:10, 18:10 local. The first fire after HF's window
refreshes picks up everything via `--skip-existing`.
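
The fire times follow directly from the schedule `10 */6 * * *` (minute 10 of every sixth hour). A minimal sketch of that calculation, assuming naive local datetimes with no DST handling; `next_fires` is a hypothetical helper and the `now` value is only an illustration:

```python
from datetime import datetime, timedelta


def next_fires(now, count=4):
    """Next `count` fire times strictly after `now` for `10 */6 * * *`."""
    # Snap to minute 10 of the current 6-hour slot (hours 0, 6, 12, 18).
    slot = now.replace(hour=now.hour - now.hour % 6, minute=10,
                       second=0, microsecond=0)
    if slot <= now:
        slot += timedelta(hours=6)
    # 24 is a multiple of 6, so stepping by 6 h stays on the schedule.
    return [slot + timedelta(hours=6 * i) for i in range(count)]


now = datetime(2026, 4, 20, 18, 34)  # illustrative local time
for fire in next_fires(now):
    print(fire.strftime("%Y-%m-%d %H:%M"))
# 2026-04-21 00:10
# 2026-04-21 06:10
# 2026-04-21 12:10
# 2026-04-21 18:10
```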

### Pending when quota reopens (~24 h from the first 429 of day 2)

1. **~136 remaining metadata stubs** — the cron will push them automatically
   (`python scripts/push_metadata_stubs.py --all --skip-existing --workers 12`).
2. **Push the org card** at `org-readme/README.md`. Two ways:
   - Create `EEGDash/README` as a **Space** (amazon-style — lets us host
     images alongside the card):
     ```python
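     from huggingface_hub import HfApi
     api = HfApi()
     # A static Space can hold images next to the card; exist_ok makes reruns safe.
     api.create_repo("EEGDash/README", repo_type="space", space_sdk="static", exist_ok=True)
     # Upload the drafted org card as the Space's README.md.
     api.upload_file(
         repo_id="EEGDash/README",
         repo_type="space",
         path_or_fileobj="org-readme/README.md",
         path_in_repo="README.md",
     )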
     ```
   - Or just paste the markdown from `org-readme/README.md` into the
     description field at https://huggingface.co/organizations/EEGDash/settings.
3. **Empty-commit the Space** once stubs are done so the `on 🤗` cache
   refreshes.

### Reference: current state

| Item | Value |
|---|---|
| Total datasets in CSV | 736 |
| Mirrored to HF | 600 |
| Remaining | ~136 |
| Org card drafted | `org-readme/README.md` (pushed to Space repo) |
| Org card published | **no** (blocked on quota) |
| HF rate limit | 300 repo-creations / 24 h org-wide |
| First 429 day 2 | 2026-04-20 ~18:34 UTC |
| Earliest clean window | ~2026-04-21 18:30 UTC |
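
The last two rows are related by simple date arithmetic. The sketch below assumes the quota behaves as a rolling 24 h window measured from the first 429, which is how these notes treat it and not a documented guarantee from HF:

```python
from datetime import datetime, timedelta, timezone

# First 429 of day 2, taken from the table above (approximate).
first_429 = datetime(2026, 4, 20, 18, 34, tzinfo=timezone.utc)
window = timedelta(hours=24)  # assumed rolling-window length

reopens = first_429 + window
print(reopens.isoformat())  # 2026-04-21T18:34:00+00:00
```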

## Session: 2026-04-21 — closed ✅

### `push_bulk_parallel_day3.log`

Cron `586420c6` fired at 23:41 local. 24 h window had cleared.

- Attempted: 136 remaining.
- **Pushed: 136 / 136** (zero 429s, zero failures).
- Wall clock: ~16 s total (12 workers, <200 ms per push in aggregate).

## Totals after day 3 (final)

- `EEGDash/*` dataset repos live: **736 — complete coverage of the CSV**.
- The 600 repos from earlier runs were skipped by `--skip-existing`.
- Recurring cron `586420c6` deleted after verification.

| day | attempted | pushed | not pushed (429) | cumulative |
|---|---|---|---|---|
| 2026-04-19 | 10 + 670 | 10 + 233 | 0 + 437 | 299 |
| 2026-04-20 | 435 | 299 | 136 | 600 |
| 2026-04-21 | 136 | 136 | 0 | **736** |
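
As a quick cross-check of the table, the slugs left over after each day should equal the CSV total minus the cumulative count. A small calculation using only the numbers reported above:

```python
TOTAL = 736  # datasets in the CSV

# (day, cumulative repos live) as reported in the table above.
rows = [
    ("2026-04-19", 299),
    ("2026-04-20", 600),
    ("2026-04-21", 736),
]

remaining = []
for day, cumulative in rows:
    left = TOTAL - cumulative
    remaining.append(left)
    print(f"{day}: {left} left, {cumulative / TOTAL:.1%} mirrored")
# 2026-04-19: 437 left, 40.6% mirrored
# 2026-04-20: 136 left, 81.5% mirrored
# 2026-04-21: 0 left, 100.0% mirrored
```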