# Deploying the EEGDash Space and datasets

One-time setup, per-push workflow, and how the dataset mirrors are kept in sync.

## 1. Create the org (one-time)

1. Sign in at <https://huggingface.co>.
2. Create org → handle **`EEGDash`**, display name *EEG-DaSh*, link
   `https://eegdash.org` and `https://github.com/eegdash/EEGDash`, upload
   `docs/source/_static/eegdash_image_only.svg` as the logo.
3. Add maintainers.
4. Generate a **write** access token (Settings → Access Tokens) and export it as
   `HF_TOKEN` locally and in CI.

## 2. Create the Space

```bash
huggingface-cli login            # paste the write token
huggingface-cli repo create \
    --type space --space_sdk gradio EEGDash/catalog
```

## 3. Push the Space

From the repo root:

```bash
cd huggingface-space

git init -b main
git remote add origin https://huggingface.co/spaces/EEGDash/catalog
git add README.md app.py requirements.txt dataset_summary.csv
git commit -m "Initial Space: searchable EEGDash catalog"
git push origin main
```

Once the push completes, the Space builds and is served at <https://huggingface.co/spaces/EEGDash/catalog>.

### Keeping the catalog fresh

`dataset_summary.csv` in this folder is a snapshot of
`eegdash/dataset/dataset_summary.csv`. Refresh it whenever the source changes:

```bash
cp ../eegdash/dataset/dataset_summary.csv dataset_summary.csv
git add dataset_summary.csv
git commit -m "Refresh catalog snapshot"
git push
```
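The refresh can also be scripted so that a commit is made only when the source actually changed. A minimal sketch, assuming it runs from `huggingface-space/`; the helper name `refresh_snapshot` is hypothetical and not part of the repo:

```python
import shutil
from pathlib import Path


def refresh_snapshot(source: Path, snapshot: Path) -> bool:
    """Copy `source` over `snapshot` if their contents differ.

    Returns True when the snapshot was written, False when it was already current.
    """
    # The summary CSV is small, so comparing the raw bytes in memory is fine.
    if snapshot.exists() and source.read_bytes() == snapshot.read_bytes():
        return False  # identical content, nothing to commit
    shutil.copyfile(source, snapshot)
    return True
```

Calling `refresh_snapshot(Path("../eegdash/dataset/dataset_summary.csv"), Path("dataset_summary.csv"))` and running the `git add` / `git commit` / `git push` steps only when it returns `True` avoids empty commits.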

A GitHub Action that runs on pushes to `develop` can automate this; see the
stub in `.github/workflows/sync-hf-space.yml` (to be added when ready).

## 4. Mirror datasets to `EEGDash/<slug>`

This is what powers the `on 🤗` column. Push one or more datasets with the helper
script at `scripts/push_to_hf.py`:

```bash
# Single dataset
python scripts/push_to_hf.py --dataset ds002718

# Batch, skipping anything already on the Hub, capped at 5 GB
python scripts/push_to_hf.py \
    --from-csv eegdash/dataset/dataset_summary.csv \
    --max-size-gb 5 \
    --skip-existing
```
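The batch flags amount to a filter over the summary CSV. The sketch below illustrates that selection logic only; it is not the code in `scripts/push_to_hf.py`. The column names `dataset` and `size_bytes`, the helper `select_datasets`, and the sample rows are all assumptions made for the example.

```python
import csv
import io

GB = 1024**3  # assumption: the sketch treats one "GB" as 2**30 bytes


def select_datasets(rows, max_size_gb=None, existing=frozenset()):
    """Return dataset ids that pass the size cap and are not already mirrored."""
    selected = []
    for row in rows:
        slug = row["dataset"]
        if slug in existing:
            continue  # mimics --skip-existing
        if max_size_gb is not None and int(row["size_bytes"]) > max_size_gb * GB:
            continue  # mimics --max-size-gb
        selected.append(slug)
    return selected


# Tiny in-memory stand-in for dataset_summary.csv (ids and sizes are made up).
sample = io.StringIO(
    "dataset,size_bytes\n"
    "ds000001,1073741824\n"  # 1 GB
    "ds000002,8589934592\n"  # 8 GB, over the cap
    "ds000003,2147483648\n"  # 2 GB, but treated as already on the Hub
)
rows = list(csv.DictReader(sample))
print(select_datasets(rows, max_size_gb=5, existing={"ds000003"}))
```

With the sample rows, only `ds000001` is selected: `ds000002` exceeds the 5 GB cap and `ds000003` is skipped as existing.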

Under the hood this calls `EEGDashDataset(...).push_to_hub("EEGDash/<slug>")`,
a method that comes from braindecode's `HubDatasetMixin`. The resulting repo is
laid out as follows:

```
EEGDash/<slug>/
├── README.md                        # Dataset card with load snippets
├── format_info.json                 # Version + compression metadata
└── sourcedata/braindecode/
    ├── dataset_description.json     # BIDS-compliant
    ├── participants.tsv             # BIDS-compliant
    ├── dataset.zarr/                # blosc-compressed windowed data
    └── sub-<label>/eeg/
        ├── *_events.tsv
        ├── *_channels.tsv
        └── *_eeg.json
```
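A quick sanity check on a local copy of a pushed repo is to confirm that the fixed entries from the tree above are present. A minimal sketch; the helper `missing_entries` is hypothetical, and it checks only the fixed paths, not the per-subject sidecar files:

```python
from pathlib import Path

# Fixed paths taken from the layout above, relative to the repo root.
EXPECTED = [
    "README.md",
    "format_info.json",
    "sourcedata/braindecode/dataset_description.json",
    "sourcedata/braindecode/participants.tsv",
    "sourcedata/braindecode/dataset.zarr",
]


def missing_entries(repo_root):
    """Return the expected paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list means every fixed entry was found; anything else names what is missing.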

Users then load it with:

```python
from braindecode.datasets import BaseConcatDataset
ds = BaseConcatDataset.pull_from_hub("EEGDash/ds002718")
```

## 5. Verify

- Space renders: <https://huggingface.co/spaces/EEGDash/catalog>.
- Org page shows the Space card + dataset repos: <https://huggingface.co/EEGDash>.
- At least one dataset loadable end-to-end via `pull_from_hub`.

## Troubleshooting

| Symptom | Likely cause and fix |
|---|---|
| `on 🤗` column empty for everything | The Space has no outbound network access or is being rate-limited. The lookup is cached once per process, so redeploy the Space to retry. |
| `push_to_hub` fails with `ImportError` | The hub extras are missing. Run `pip install "braindecode[hub]"` (pulls in `zarr` + `huggingface_hub`). |
| Repo exists but Space doesn't flag it | `HfApi().list_datasets(author="EEGDash", limit=500)` returns at most 500 repos. Raise the limit in `app.py::_hf_repos` if the org grows beyond that. |
| `dataset_summary.csv` out of sync | The snapshot was not refreshed after the source changed. Re-run the refresh under step 3 or add the workflow stub. |
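
The first and third rows both come down to how the Space looks up the org's dataset repos. The sketch below shows one way such a lookup could be cached per process with a configurable limit. It is an illustration, not the actual `app.py::_hf_repos`; the names `fetch_repo_ids` and `cached_repo_ids` and the `lister` parameter are hypothetical, the latter added so the function can be exercised without network access.

```python
from functools import lru_cache


def _default_lister(author, limit):
    # Imported lazily so the sketch can be exercised without huggingface_hub.
    from huggingface_hub import HfApi

    return HfApi().list_datasets(author=author, limit=limit)


def fetch_repo_ids(author="EEGDash", limit=500, lister=_default_lister):
    """Return the dataset repo ids for `author`, or an empty set on failure."""
    try:
        # Iterating inside the try block also catches errors raised lazily
        # while the listing is being consumed.
        return frozenset(info.id for info in lister(author, limit))
    except Exception:
        # No outbound network or rate limiting: every lookup will miss.
        return frozenset()


@lru_cache(maxsize=1)
def cached_repo_ids():
    """Process-wide cache: computed once, including a failed (empty) result."""
    return fetch_repo_ids()
```

Because the cached result also covers the failure case, an empty set sticks until the process restarts, which matches the "redeploy to retry" advice in the table.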