# Deploying the EEGDash Space and datasets
One-time setup, per-push workflow, and how the dataset mirrors are kept in sync.
## 1. Create the org (one-time)
1. Sign in at <https://huggingface.co>.
2. Create org → handle **`EEGDash`**, display name *EEG-DaSh*, link
`https://eegdash.org` and `https://github.com/eegdash/EEGDash`, upload
`docs/source/_static/eegdash_image_only.svg` as the logo.
3. Add maintainers.
4. Generate a **write** access token (Settings → Access Tokens) and export it as
`HF_TOKEN` locally and in CI.
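
Before moving on, it can help to confirm that the exported token is actually visible to Python. The following is a minimal sketch, not part of the repo: `require_hf_token` and `check_identity` are hypothetical helper names, and `check_identity` assumes `huggingface_hub` is installed and the network is reachable.

```python
import os


def require_hf_token(env=os.environ):
    """Return the token stored in HF_TOKEN, or raise if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export the write token first.")
    return token


def check_identity(token):
    """Ask the Hub which account the token belongs to (needs network access)."""
    # Imported lazily so the env check above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    return HfApi().whoami(token=token)["name"]
```

Calling `check_identity(require_hf_token())` should return the account name tied to the token; a missing variable fails early with a readable message instead of a later authentication error.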
## 2. Create the Space
```bash
huggingface-cli login  # paste the write token
huggingface-cli repo create \
  --type space --space_sdk gradio EEGDash/catalog
```
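For scripted setups (CI, for example), the same step can probably be done from Python. This is a sketch assuming `huggingface_hub` is installed; `create_space` and `space_url` are hypothetical helper names, while `create_repo` and its `repo_type`, `space_sdk`, and `exist_ok` parameters come from `huggingface_hub`.

```python
import os


def space_url(repo_id):
    """Build the public URL of a Space from its `org/name` id."""
    return f"https://huggingface.co/spaces/{repo_id}"


def create_space(repo_id="EEGDash/catalog", sdk="gradio"):
    """Create the Space if it does not exist yet (needs network and a write token)."""
    # Imported lazily; assumes huggingface_hub is installed.
    from huggingface_hub import create_repo

    return create_repo(
        repo_id,
        repo_type="space",
        space_sdk=sdk,
        exist_ok=True,  # do not fail when the Space already exists
        token=os.environ.get("HF_TOKEN"),  # None falls back to the stored login
    )
```

With `exist_ok=True` the call is safe to repeat, which suits a CI job that runs on every push.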
## 3. Push the Space
From the repo root:
```bash
cd huggingface-space
git init -b main
git remote add origin https://huggingface.co/spaces/EEGDash/catalog
git add README.md app.py requirements.txt dataset_summary.csv
git commit -m "Initial Space: searchable EEGDash catalog"
git push origin main
```
The Space then builds and is served at <https://huggingface.co/spaces/EEGDash/catalog>.
### Keeping the catalog fresh
`dataset_summary.csv` in this folder is a snapshot of
`eegdash/dataset/dataset_summary.csv`. Refresh it whenever the source changes:
```bash
cp ../eegdash/dataset/dataset_summary.csv dataset_summary.csv
git add dataset_summary.csv
git commit -m "Refresh catalog snapshot"
git push
```
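To avoid empty "refresh" commits, the copy can be made conditional on the file contents having changed. The sketch below uses only the standard library; `file_digest` and `refresh_snapshot` are hypothetical names, not functions that exist in the repo.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def refresh_snapshot(source, snapshot):
    """Copy `source` over `snapshot` only when their contents differ.

    Returns True when a copy happened, False when the snapshot was already current.
    """
    source, snapshot = Path(source), Path(snapshot)
    if snapshot.exists() and file_digest(source) == file_digest(snapshot):
        return False
    shutil.copyfile(source, snapshot)
    return True
```

Run from `huggingface-space`, `refresh_snapshot("../eegdash/dataset/dataset_summary.csv", "dataset_summary.csv")` returns `True` only when there is something new to commit.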
A GitHub Action that runs on pushes to `develop` can automate this; see the
stub in `.github/workflows/sync-hf-space.yml` (add it when ready).
## 4. Mirror datasets to `EEGDash/<slug>`
This is what powers the `on 🤗` column. Push one or more datasets with the helper
script at `scripts/push_to_hf.py`:
```bash
# Single dataset
python scripts/push_to_hf.py --dataset ds002718

# Batch, skipping anything already on the Hub, capped at 5 GB
python scripts/push_to_hf.py \
  --from-csv eegdash/dataset/dataset_summary.csv \
  --max-size-gb 5 \
  --skip-existing
```
Under the hood this calls `EEGDashDataset(...).push_to_hub("EEGDash/<slug>")`,
a method provided by the `HubDatasetMixin` that braindecode's dataset classes
inherit from. The resulting repo is laid out as follows:
```
EEGDash/<slug>/
├── README.md                      # Dataset card with load snippets
├── format_info.json               # Version + compression metadata
└── sourcedata/braindecode/
    ├── dataset_description.json   # BIDS-compliant
    ├── participants.tsv           # BIDS-compliant
    ├── dataset.zarr/              # blosc-compressed windowed data
    └── sub-<label>/eeg/
        ├── *_events.tsv
        ├── *_channels.tsv
        └── *_eeg.json
```
Users then load it with:
```python
from braindecode.datasets import BaseConcatDataset
ds = BaseConcatDataset.pull_from_hub("EEGDash/ds002718")
```
## 5. Verify
- Space renders: <https://huggingface.co/spaces/EEGDash/catalog>.
- Org page shows the Space card + dataset repos: <https://huggingface.co/EEGDash>.
- At least one dataset loadable end-to-end via `pull_from_hub`.
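
The org-page check can also be scripted. This is a rough sketch assuming `huggingface_hub` is installed; `missing_mirrors` and `list_org_datasets` are hypothetical helper names, and only `list_org_datasets` touches the network.

```python
def missing_mirrors(expected_slugs, hub_repo_ids, org="EEGDash"):
    """Return the slugs whose `org/slug` repo is absent from `hub_repo_ids`."""
    present = set(hub_repo_ids)
    return [slug for slug in expected_slugs if f"{org}/{slug}" not in present]


def list_org_datasets(org="EEGDash"):
    """List the ids of all dataset repos owned by `org` (needs network access)."""
    # Imported lazily; assumes huggingface_hub is installed.
    from huggingface_hub import HfApi

    return [info.id for info in HfApi().list_datasets(author=org)]
```

For example, `missing_mirrors(["ds002718"], list_org_datasets())` returns an empty list once that dataset has been mirrored.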
## Troubleshooting
| Symptom | Likely cause |
|---|---|
| `on 🤗` column empty for everything | The Space has no outbound network access or is being rate-limited. The lookup is cached once per process, so redeploy the Space to retry. |
| `push_to_hub` fails with `ImportError` | The optional Hub dependencies are missing; install them with `pip install braindecode[hub]` (pulls in `zarr` + `huggingface_hub`). |
| Repo exists but Space doesn't flag it | `HfApi().list_datasets(author="EEGDash", limit=500)` caps at 500 results; raise the limit in `app.py::_hf_repos` if the org grows beyond that. |
| `dataset_summary.csv` out of sync | Re-run the refresh from step 3 ("Keeping the catalog fresh") or add the workflow stub. |