| # Deploying the EEGDash Space and datasets |
|
|
| One-time setup, per-push workflow, and how the dataset mirrors are kept in sync. |
|
|
| ## 1. Create the org (one-time) |
|
|
| 1. Sign in at <https://huggingface.co>. |
| 2. Create org → handle **`EEGDash`**, display name *EEG-DaSh*, link |
| `https://eegdash.org` and `https://github.com/eegdash/EEGDash`, upload |
| `docs/source/_static/eegdash_image_only.svg` as the logo. |
| 3. Add maintainers. |
| 4. Generate a **write** access token (Settings → Access Tokens) and export it as |
| `HF_TOKEN` locally and in CI. |
|
|
| ## 2. Create the Space |
|
|
| ```bash |
| huggingface-cli login # paste the write token |
| huggingface-cli repo create \ |
| --type space --space_sdk gradio EEGDash/catalog |
| ``` |
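
The same step can be scripted, which is handy in CI where the command may be re-run. The sketch below assumes the `huggingface_hub` Python package is installed and that a write token is available in `HF_TOKEN`; the helper name `ensure_space` is hypothetical and not part of this repo.

```python
def ensure_space(api, repo_id="EEGDash/catalog"):
    """Create the Gradio Space if it is missing; do nothing if it already exists.

    `api` is expected to be a `huggingface_hub.HfApi` instance (an assumption
    of this sketch; any object exposing `create_repo` works).
    """
    return api.create_repo(
        repo_id=repo_id,
        repo_type="space",   # create a Space rather than a model or dataset repo
        space_sdk="gradio",  # matches the SDK chosen in the CLI command above
        exist_ok=True,       # makes the call safe to repeat
    )


# Usage (needs network access and a write token in HF_TOKEN):
#   from huggingface_hub import HfApi
#   ensure_space(HfApi())
```

Passing `exist_ok=True` keeps the call idempotent, so a repeated run does not fail once the Space exists.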
|
|
| ## 3. Push the Space |
|
|
| From the repo root: |
|
|
| ```bash |
| cd huggingface-space |
| |
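| git init -b main |
| git remote add origin https://huggingface.co/spaces/EEGDash/catalog |
| git add README.md app.py requirements.txt dataset_summary.csv |
| git commit -m "Initial Space: searchable EEGDash catalog" |
| # The Space created in step 2 already contains an initial commit, so the |
| # first push from a fresh local repo has to overwrite it. |
| git push --force origin main |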
| ``` |
|
|
| The Space then builds and becomes available at <https://huggingface.co/spaces/EEGDash/catalog>. |
|
|
| ### Keeping the catalog fresh |
|
|
| `dataset_summary.csv` in this folder is a snapshot of |
| `eegdash/dataset/dataset_summary.csv`. Refresh it whenever the source changes: |
|
|
| ```bash |
| cp ../eegdash/dataset/dataset_summary.csv dataset_summary.csv |
| git add dataset_summary.csv |
| git commit -m "Refresh catalog snapshot" |
| git push |
| ``` |
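
If the refresh is scripted, it helps to copy and commit only when the snapshot has actually changed. The following is a minimal sketch using only the standard library; the function name `refresh_snapshot` is hypothetical, and the paths in the usage comment are the ones from the commands above.

```python
import filecmp
import shutil
from pathlib import Path


def refresh_snapshot(source, snapshot):
    """Copy `source` over `snapshot` when their contents differ.

    Returns True if the snapshot was (re)written, False if it was already
    up to date.
    """
    source, snapshot = Path(source), Path(snapshot)
    # shallow=False compares file contents, not just size and mtime.
    if snapshot.exists() and filecmp.cmp(source, snapshot, shallow=False):
        return False
    shutil.copyfile(source, snapshot)
    return True


# Usage, run from the huggingface-space folder:
#   changed = refresh_snapshot("../eegdash/dataset/dataset_summary.csv",
#                              "dataset_summary.csv")
#   # run git add / commit / push only when `changed` is True
```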
|
|
| A GitHub Action that runs on pushes to `develop` can automate this. The |
| intended location for it is `.github/workflows/sync-hf-space.yml` (add when ready). |
|
|
| ## 4. Mirror datasets to `EEGDash/<slug>` |
|
|
| The mirrored dataset repos are what populate the `on 🤗` column in the catalog. |
| Push one or more datasets with the helper script at `scripts/push_to_hf.py`: |
|
|
| ```bash |
| # Single dataset |
| python scripts/push_to_hf.py --dataset ds002718 |
| |
| # Batch, skipping anything already on the Hub, capped at 5 GB |
| python scripts/push_to_hf.py \ |
| --from-csv eegdash/dataset/dataset_summary.csv \ |
| --max-size-gb 5 \ |
| --skip-existing |
| ``` |
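
The batch flags amount to a filter over the rows of the summary CSV. The sketch below illustrates that selection logic only; it is not the code in `scripts/push_to_hf.py`. The column names `dataset` and `size_gb`, the sample rows other than `ds002718`, and the reading of `--max-size-gb` as a per-dataset cap are all assumptions.

```python
import csv
import io


def select_datasets(rows, existing, max_size_gb=None, skip_existing=False):
    """Return the slugs that a batch run would push, in CSV order."""
    selected = []
    for row in rows:
        slug = row["dataset"]  # hypothetical column name
        if skip_existing and slug in existing:
            continue  # already mirrored on the Hub
        if max_size_gb is not None and float(row["size_gb"]) > max_size_gb:
            continue  # over the size cap (assumed to apply per dataset)
        selected.append(slug)
    return selected


# Made-up sample data standing in for dataset_summary.csv.
SAMPLE_CSV = "dataset,size_gb\nds002718,1.5\nds-large,7.0\nds-done,0.2\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

to_push = select_datasets(
    rows, existing={"ds-done"}, max_size_gb=5, skip_existing=True
)
print(to_push)
```

With the sample data, only `ds002718` is selected: `ds-large` exceeds the cap and `ds-done` is treated as already present.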
|
|
| Under the hood this calls `EEGDashDataset(...).push_to_hub("EEGDash/<slug>")`, |
| a method that comes from braindecode's `HubDatasetMixin`. The resulting repo |
| is laid out as follows: |
|
|
| ``` |
| EEGDash/<slug>/ |
| ├── README.md # Dataset card with load snippets |
| ├── format_info.json # Version + compression metadata |
| └── sourcedata/braindecode/ |
| ├── dataset_description.json # BIDS-compliant |
| ├── participants.tsv # BIDS-compliant |
| ├── dataset.zarr/ # blosc-compressed windowed data |
| └── sub-<label>/eeg/ |
| ├── *_events.tsv |
| ├── *_channels.tsv |
| └── *_eeg.json |
| ``` |
|
|
| Users then load it with: |
|
|
| ```python |
| from braindecode.datasets import BaseConcatDataset |
| ds = BaseConcatDataset.pull_from_hub("EEGDash/ds002718") |
| ``` |
|
|
| ## 5. Verify |
|
|
| - Space renders: <https://huggingface.co/spaces/EEGDash/catalog>. |
| - Org page shows the Space card + dataset repos: <https://huggingface.co/EEGDash>. |
| - At least one dataset loadable end-to-end via `pull_from_hub`. |
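
The first two checks can also be run from a script. This is a rough sketch assuming `huggingface_hub` is installed; `verify_deployment` is a hypothetical helper, and `api` is expected to be an `HfApi` instance.

```python
def verify_deployment(api, expected_dataset, org="EEGDash", space="EEGDash/catalog"):
    """Check that the Space exists and that `expected_dataset` is mirrored.

    Returns True when the dataset repo is listed under the org.
    """
    # space_info raises if the Space is missing or not visible to the caller.
    api.space_info(space)
    mirrored = {info.id for info in api.list_datasets(author=org)}
    return expected_dataset in mirrored


# Usage (needs network access):
#   from huggingface_hub import HfApi
#   verify_deployment(HfApi(), "EEGDash/ds002718")
```

This does not replace the end-to-end `pull_from_hub` check, which confirms that the data itself can be loaded.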
|
|
| ## Troubleshooting |
|
|
| | Symptom | Likely cause and fix | |
| |---|---| |
| | `on 🤗` column empty for everything | The Space has no outbound network access or is being rate-limited. The repo list is cached once per process, so restart or redeploy the Space to retry. | |
| | `push_to_hub` fails with `ImportError` | The Hub extras are missing. Run `pip install "braindecode[hub]"` (pulls in `zarr` + `huggingface_hub`). | |
| | Repo exists but Space doesn't flag it | `HfApi().list_datasets(author="EEGDash", limit=500)` returns at most 500 repos. Raise the limit in `app.py::_hf_repos` if the org grows beyond that. | |
| | `dataset_summary.csv` out of sync | Re-run the refresh in step 3 ("Keeping the catalog fresh") or add the sync workflow. | |
|
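
Two rows above depend on how the Space looks up the org's dataset repos: once per process, with a cap on the number of results. The sketch below shows one way such a lookup could be written; it is not the actual `app.py::_hf_repos`. The factory name `make_repo_lookup` is hypothetical, and `list_datasets` is assumed to behave like `HfApi().list_datasets`.

```python
from functools import lru_cache


def make_repo_lookup(list_datasets, org="EEGDash", limit=None):
    """Build a zero-argument function that returns the org's dataset repo ids.

    The result is cached for the lifetime of the process, which is why a
    failed or stale lookup persists until a restart or an explicit
    `cache_clear()`. `limit=None` asks for all repos instead of capping.
    """

    @lru_cache(maxsize=1)
    def hf_repos():
        return frozenset(info.id for info in list_datasets(author=org, limit=limit))

    return hf_repos


# Usage (needs network access):
#   from huggingface_hub import HfApi
#   hf_repos = make_repo_lookup(HfApi().list_datasets)
#   "EEGDash/ds002718" in hf_repos()
#   hf_repos.cache_clear()  # force a fresh lookup without restarting
```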
|