
Deploying the EEGDash Space and datasets

One-time setup, per-push workflow, and how the dataset mirrors are kept in sync.

1. Create the org (one-time)

  1. Sign in at https://huggingface.co.
  2. Create an organization with handle EEGDash and display name EEG-DaSh, link https://eegdash.org and https://github.com/eegdash/EEGDash, and upload docs/source/_static/eegdash_image_only.svg as the logo.
  3. Add maintainers.
  4. Generate a write access token (Settings → Access Tokens) and export it as HF_TOKEN locally and in CI.
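A CI job can fail early when the token is missing, before any push is attempted. The sketch below uses a hypothetical helper (require_hf_token is not part of the EEGDash repo). huggingface_hub reads HF_TOKEN from the environment on its own, so this helper only checks that the variable is set. It does not check that the token is valid.

```python
import os


def require_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is unset or blank.

    Hypothetical helper. `env` defaults to os.environ and is a parameter
    only so the check can be tested without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export a write token before pushing.")
    return token
```

To confirm that the token actually works, huggingface_hub.whoami() returns the account it belongs to.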

2. Create the Space

```bash
huggingface-cli login            # paste the write token
huggingface-cli repo create \
    --type space --space_sdk gradio EEGDash/catalog
```
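To script this step (in CI, for example), you can create the same Space from Python with huggingface_hub's HfApi.create_repo. The sketch below assumes huggingface_hub is installed and that a write token is available through HF_TOKEN or huggingface-cli login. create_catalog_space is a hypothetical helper name. The api parameter exists only so the call can be exercised with a stub client.

```python
def create_catalog_space(repo_id="EEGDash/catalog", api=None):
    """Create the Gradio Space if it does not exist yet (hypothetical helper)."""
    if api is None:
        # Imported lazily so the helper can be tested with a stub client.
        from huggingface_hub import HfApi

        api = HfApi()  # resolves HF_TOKEN or the token saved by `huggingface-cli login`
    return api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="gradio",
        exist_ok=True,  # re-running does nothing instead of raising an error
    )
```

exist_ok=True makes the call idempotent, so it can run on every deploy.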

3. Push the Space

From the repo root:

```bash
cd huggingface-space

git init -b main
git remote add origin https://huggingface.co/spaces/EEGDash/catalog
git add README.md app.py requirements.txt dataset_summary.csv
git commit -m "Initial Space: searchable EEGDash catalog"
git push origin main
```

The Space then builds and is served at https://huggingface.co/spaces/EEGDash/catalog.
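Instead of the git workflow, HfApi.upload_folder can push the same four files in one commit without a local git repo. This sketch makes the same assumptions as the create_repo example. push_space and the commit message are hypothetical. The function checks that every file exists before uploading, so a Space missing app.py or the CSV never gets pushed.

```python
from pathlib import Path

# The four files the git workflow commits explicitly.
SPACE_FILES = ["README.md", "app.py", "requirements.txt", "dataset_summary.csv"]


def push_space(folder="huggingface-space", repo_id="EEGDash/catalog", api=None):
    """Upload the Space files in one commit (hypothetical helper)."""
    folder = Path(folder)
    missing = [name for name in SPACE_FILES if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing Space files: {missing}")
    if api is None:
        from huggingface_hub import HfApi  # lazy import keeps the helper testable

        api = HfApi()
    return api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_id,
        repo_type="space",
        allow_patterns=SPACE_FILES,  # upload only these files, like the explicit git add
        commit_message="Update EEGDash catalog Space",
    )
```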

Keeping the catalog fresh

dataset_summary.csv in this folder is a snapshot of eegdash/dataset/dataset_summary.csv. Refresh it whenever the source changes:

```bash
cp ../eegdash/dataset/dataset_summary.csv dataset_summary.csv
git add dataset_summary.csv
git commit -m "Refresh catalog snapshot"
git push
```
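If the refresh is scripted (for example by the planned workflow), copying only when the contents differ avoids empty commits. This is a minimal sketch. refresh_snapshot is a hypothetical helper, and its default paths assume it runs from huggingface-space/, like the commands above.

```python
import shutil
from pathlib import Path


def refresh_snapshot(src="../eegdash/dataset/dataset_summary.csv", dst="dataset_summary.csv"):
    """Copy src over dst only when their bytes differ; return True if a copy happened."""
    src, dst = Path(src), Path(dst)
    if dst.exists() and dst.read_bytes() == src.read_bytes():
        return False  # snapshot already current, nothing to commit
    shutil.copyfile(src, dst)
    return True
```

Run git add, commit and push only when the function returns True.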

A GitHub Actions workflow triggered on pushes to develop can automate this. See the stub for .github/workflows/sync-hf-space.yml (add it when ready).

4. Mirror datasets to EEGDash/<slug>

These mirrors populate the "on 🤗" column in the Space. Push one or more datasets with the helper script scripts/push_to_hf.py:

```bash
# Single dataset
python scripts/push_to_hf.py --dataset ds002718

# Batch, skipping anything already on the Hub, capped at 5 GB
python scripts/push_to_hf.py \
    --from-csv eegdash/dataset/dataset_summary.csv \
    --max-size-gb 5 \
    --skip-existing
```
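The batch flags combine roughly as follows. This is an illustrative sketch, not the code in scripts/push_to_hf.py. It makes three assumptions. The column names dataset and size_gb are hypothetical placeholders for whatever dataset_summary.csv actually contains. --max-size-gb is read as a per-dataset cap. existing stands for the set of slugs already under EEGDash on the Hub.

```python
import csv


def select_datasets(csv_path, existing, max_size_gb=None, id_col="dataset", size_col="size_gb"):
    """Return dataset IDs to push, in CSV order (hypothetical sketch).

    ASSUMPTION: id_col and size_col are placeholder column names; check the
    real header of dataset_summary.csv before relying on this.
    """
    selected = []
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            ds_id = row[id_col].strip()
            if ds_id in existing:
                continue  # what --skip-existing is meant to do
            if max_size_gb is not None and float(row[size_col]) > max_size_gb:
                continue  # --max-size-gb, treated here as a per-dataset cap
            selected.append(ds_id)
    return selected
```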

Under the hood, the script calls EEGDashDataset(...).push_to_hub("EEGDash/<slug>"). push_to_hub is provided by braindecode's HubDatasetMixin, which EEGDashDataset inherits through braindecode. The resulting repo has this layout:

```text
EEGDash/<slug>/
├── README.md                        # Dataset card with load snippets
├── format_info.json                 # Version + compression metadata
└── sourcedata/braindecode/
    ├── dataset_description.json     # BIDS-compliant
    ├── participants.tsv             # BIDS-compliant
    ├── dataset.zarr/                # blosc-compressed windowed data
    └── sub-<label>/eeg/
        ├── *_events.tsv
        ├── *_channels.tsv
        └── *_eeg.json
```
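To spot-check that a push produced this layout, compare the repo's file list against the fixed entries above. You can get the list from HfApi().list_repo_files("EEGDash/ds002718", repo_type="dataset"). This is a minimal sketch. missing_layout_entries is a hypothetical helper, and it checks only the top-level files and the Zarr store, not the per-subject sidecars.

```python
EXPECTED_FILES = [
    "README.md",
    "format_info.json",
    "sourcedata/braindecode/dataset_description.json",
    "sourcedata/braindecode/participants.tsv",
]
ZARR_PREFIX = "sourcedata/braindecode/dataset.zarr/"


def missing_layout_entries(repo_files):
    """Return the expected entries absent from a flat list of repo paths."""
    files = set(repo_files)
    missing = [path for path in EXPECTED_FILES if path not in files]
    # dataset.zarr/ is a directory, so require at least one file underneath it.
    if not any(f.startswith(ZARR_PREFIX) for f in files):
        missing.append(ZARR_PREFIX)
    return missing
```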

Users then load it with:

```python
from braindecode.datasets import BaseConcatDataset

ds = BaseConcatDataset.pull_from_hub("EEGDash/ds002718")
```

5. Verify

Troubleshooting

| Symptom | Likely cause / fix |
|---|---|
| "on 🤗" column empty for every dataset | The Space has no outbound network access, or the Hub rate-limited it. The Space caches the repo list once per process, so restart or redeploy it to retry. |
| push_to_hub fails with ImportError | The Hub extras are missing. Run pip install "braindecode[hub]" (pulls in zarr and huggingface_hub). |
| Repo exists but the Space doesn't flag it | HfApi().list_datasets(author="EEGDash", limit=500) stops at 500 results. Raise the limit in app.py::_hf_repos if the org grows beyond that. |
| dataset_summary.csv out of sync | Re-run the refresh from step 3 ("Keeping the catalog fresh"), or add the workflow stub. |
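A repo can be missing from the "on 🤗" column because the listing stops at 500 results. You can avoid a hard cap by letting huggingface_hub paginate: HfApi.list_datasets with limit=None yields every matching repo. This is a minimal sketch. hf_dataset_slugs is hypothetical and is not the current app.py::_hf_repos. It leaves out the per-process caching described above (functools.lru_cache would be one option) so the function stays easy to test.

```python
def hf_dataset_slugs(author="EEGDash", api=None):
    """Return the slugs of all dataset repos under `author` (hypothetical helper)."""
    if api is None:
        from huggingface_hub import HfApi  # lazy import keeps the helper testable

        api = HfApi()
    prefix = author + "/"
    # limit=None: no cap, huggingface_hub follows pagination until exhausted.
    return {
        info.id[len(prefix):]
        for info in api.list_datasets(author=author, limit=None)
        if info.id.startswith(prefix)
    }
```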