bruAristimunha

Initial Space: searchable EEGDash catalog

e0a7464 about 1 month ago

Deploying the EEGDash Space and datasets

One-time setup, per-push workflow, and how the dataset mirrors are kept in sync.

1. Create the org (one-time)

Sign in at https://huggingface.co.
Create org → handle EEGDash, display name EEG-DaSh, link https://eegdash.org and https://github.com/eegdash/EEGDash, upload docs/source/_static/eegdash_image_only.svg as the logo.
Add maintainers.
Generate a write access token (Settings → Access Tokens) and export it as HF_TOKEN locally and in CI.
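
Before moving on, it can help to confirm that the token is actually visible to the process that will push. The following is a minimal sketch, assuming the token is exported as HF_TOKEN as described above; `require_hf_token` and `whoami` are hypothetical helper names, not part of EEGDash.

```python
import os


def require_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export the write token first")
    return token


def whoami(token):
    """Ask the Hub which account the token belongs to (needs network access)."""
    # Imported lazily so the environment check above works without the library.
    from huggingface_hub import HfApi

    return HfApi(token=token).whoami()["name"]
```

Calling `whoami(require_hf_token())` locally and in CI should return the account that owns the token; an exception at this point usually means the variable was not exported or the token was revoked.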

2. Create the Space

huggingface-cli login            # paste the write token
huggingface-cli repo create \
    --type space --space_sdk gradio EEGDash/catalog
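
CLI flag names have changed between `huggingface_hub` releases, so the same step can also be done from Python. This is a sketch rather than the project's own tooling: it assumes `huggingface_hub` is installed, and `space_repo_kwargs` and `create_space` are hypothetical helper names.

```python
def space_repo_kwargs(repo_id="EEGDash/catalog"):
    """Arguments for creating the catalog Space as a Gradio Space."""
    return {
        "repo_id": repo_id,
        "repo_type": "space",
        "space_sdk": "gradio",
        "exist_ok": True,  # do not fail if the Space was already created
    }


def create_space(token=None, repo_id="EEGDash/catalog"):
    """Create the Space on the Hub (needs network access and a write token)."""
    # Imported lazily so the argument helper is usable without the library.
    from huggingface_hub import HfApi

    return HfApi(token=token).create_repo(**space_repo_kwargs(repo_id))
```

With `exist_ok=True` the call can be repeated safely, which is convenient if this step ever moves into CI.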

3. Push the Space

From the repo root:

cd huggingface-space

git init -b main
git remote add origin https://huggingface.co/spaces/EEGDash/catalog
git add README.md app.py requirements.txt dataset_summary.csv
git commit -m "Initial Space: searchable EEGDash catalog"
git push origin main

Once the push completes, the Space builds and is served at https://huggingface.co/spaces/EEGDash/catalog.

Keeping the catalog fresh

dataset_summary.csv in this folder is a snapshot of eegdash/dataset/dataset_summary.csv. Refresh it whenever the source changes:

cp ../eegdash/dataset/dataset_summary.csv dataset_summary.csv
git add dataset_summary.csv
git commit -m "Refresh catalog snapshot"
git push

A GitHub Action that runs on pushes to develop can automate this; see the stub in .github/workflows/sync-hf-space.yml (add it when ready).
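
Whether it runs by hand or from a workflow, the refresh only needs to produce a commit when the source file has actually changed. A minimal sketch of that check follows; `refresh_snapshot` is a hypothetical helper, and the paths passed to it are the ones used in the commands above.

```python
import filecmp
import shutil
from pathlib import Path


def refresh_snapshot(source, snapshot):
    """Copy source over snapshot when they differ; return True if a copy happened."""
    source, snapshot = Path(source), Path(snapshot)
    # shallow=False compares file contents, not just size and modification time.
    if snapshot.exists() and filecmp.cmp(source, snapshot, shallow=False):
        return False  # identical content, nothing to commit
    shutil.copyfile(source, snapshot)
    return True
```

For example, `refresh_snapshot("../eegdash/dataset/dataset_summary.csv", "dataset_summary.csv")` returns `True` only when the snapshot was rewritten, so the `git commit` and `git push` steps can be skipped otherwise.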

4. Mirror datasets to `EEGDash/<slug>`

These mirrors are what power the `on 🤗` column in the catalog. Push one or more datasets with the helper script at scripts/push_to_hf.py:

# Single dataset
python scripts/push_to_hf.py --dataset ds002718

# Batch, skipping anything already on the Hub, capped at 5 GB
python scripts/push_to_hf.py \
    --from-csv eegdash/dataset/dataset_summary.csv \
    --max-size-gb 5 \
    --skip-existing

Under the hood, the script calls EEGDashDataset(...).push_to_hub("EEGDash/<slug>"), a method provided by braindecode's HubDatasetMixin, which EEGDashDataset inherits. The resulting repo is laid out as follows:

EEGDash/<slug>/
├── README.md                        # Dataset card with load snippets
├── format_info.json                 # Version + compression metadata
└── sourcedata/braindecode/
    ├── dataset_description.json     # BIDS-compliant
    ├── participants.tsv             # BIDS-compliant
    ├── dataset.zarr/                # blosc-compressed windowed data
    └── sub-<label>/eeg/
        ├── *_events.tsv
        ├── *_channels.tsv
        └── *_eeg.json
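
After a push, the layout above can be checked without downloading any data by listing the files in the repo. The sketch below assumes the fixed paths shown in the tree; `missing_layout_entries` and `check_dataset_repo` are hypothetical helpers, and the per-subject sidecar files are not checked because their names depend on the dataset.

```python
REQUIRED_FILES = (
    "README.md",
    "format_info.json",
    "sourcedata/braindecode/dataset_description.json",
    "sourcedata/braindecode/participants.tsv",
)
ZARR_PREFIX = "sourcedata/braindecode/dataset.zarr/"


def missing_layout_entries(repo_files):
    """Return the expected entries that are absent from a list of repo file paths."""
    files = set(repo_files)
    missing = [path for path in REQUIRED_FILES if path not in files]
    # The zarr store is a directory, so look for any file stored under it.
    if not any(path.startswith(ZARR_PREFIX) for path in files):
        missing.append(ZARR_PREFIX)
    return missing


def check_dataset_repo(repo_id, token=None):
    """List a dataset repo on the Hub and report missing entries (needs network)."""
    from huggingface_hub import HfApi

    files = HfApi(token=token).list_repo_files(repo_id, repo_type="dataset")
    return missing_layout_entries(files)
```

An empty list from `check_dataset_repo("EEGDash/ds002718")` would mean every fixed entry in the tree is present.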

Users then load it with:

from braindecode.datasets import BaseConcatDataset
ds = BaseConcatDataset.pull_from_hub("EEGDash/ds002718")

5. Verify

Space renders: https://huggingface.co/spaces/EEGDash/catalog.
Org page shows the Space card + dataset repos: https://huggingface.co/EEGDash.
At least one dataset loads end-to-end via pull_from_hub.
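
The first two checks can also be scripted. The sketch below takes an `HfApi`-like object as an argument, so it can be exercised with a stub; `verify_deployment` is a hypothetical helper, and treating the `RUNNING` stage as "renders" is an assumption about what counts as healthy.

```python
def verify_deployment(api, space_id="EEGDash/catalog", org="EEGDash"):
    """Report the Space runtime stage and the dataset repos owned by the org."""
    runtime = api.get_space_runtime(space_id)
    dataset_ids = sorted(info.id for info in api.list_datasets(author=org))
    return {
        "space_stage": runtime.stage,
        "space_running": runtime.stage == "RUNNING",
        "dataset_repos": dataset_ids,
    }


def load_one(repo_id):
    """Third check: pull one mirrored dataset (downloads data, needs braindecode)."""
    from braindecode.datasets import BaseConcatDataset

    return BaseConcatDataset.pull_from_hub(repo_id)
```

A typical call would be `verify_deployment(HfApi())` with `HfApi` imported from `huggingface_hub`. `load_one` is kept separate because it downloads data and is better run on a small dataset.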

Troubleshooting

| Symptom | Likely cause and fix |
| --- | --- |
| `on 🤗` column empty for everything | The Space has no outbound network access or is being rate-limited. The lookup is cached once per process, so redeploy the Space to retry. |
| `push_to_hub` fails with `ImportError` | The Hub extras are missing. Run `pip install braindecode[hub]` (pulls in `zarr` + `huggingface_hub`). |
| Repo exists but Space doesn't flag it | `HfApi().list_datasets(author="EEGDash", limit=500)` caps at 500 results. Raise the limit in `app.py::_hf_repos` if the org grows beyond that. |
| `dataset_summary.csv` out of sync | Re-run the refresh from "Keeping the catalog fresh" or add the workflow stub. |
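
For the 500-repo cap, one option is to stop passing a fixed limit, since `list_datasets` accepts `limit=None`. The sketch below is not the code in `app.py::_hf_repos`, which is not shown here; `list_org_dataset_ids` and `cached_org_dataset_ids` are hypothetical names.

```python
from functools import lru_cache


def list_org_dataset_ids(api, org="EEGDash", limit=None):
    """Collect dataset repo ids for an org; limit=None requests every result."""
    return frozenset(info.id for info in api.list_datasets(author=org, limit=limit))


@lru_cache(maxsize=1)
def cached_org_dataset_ids():
    """Look the ids up once per process (needs network access)."""
    from huggingface_hub import HfApi

    return list_org_dataset_ids(HfApi())
```

`lru_cache` stores return values only, so a lookup that raises is attempted again on the next call. A lookup that swallows the error and returns an empty set would still be cached until the process restarts.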