	# Deploying the EEGDash Space and datasets

	One-time setup, per-push workflow, and how the dataset mirrors are kept in sync.

	## 1. Create the org (one-time)

	1. Sign in at <https://huggingface.co>.
	2. Create an organization with handle `EEGDash` and display name EEG-DaSh, link
	`https://eegdash.org` and `https://github.com/eegdash/EEGDash`, and upload
	`docs/source/_static/eegdash_image_only.svg` as the logo.
	3. Add maintainers.
	4. Generate a write access token (Settings → Access Tokens) and export it as
	`HF_TOKEN` locally and in CI.

	## 2. Create the Space

	```bash
	huggingface-cli login # paste the write token
	huggingface-cli repo create \
	  --type space --space_sdk gradio EEGDash/catalog
	```

	## 3. Push the Space

	From the repo root:

	```bash
	cd huggingface-space

	git init -b main
	git remote add origin https://huggingface.co/spaces/EEGDash/catalog
	git add README.md app.py requirements.txt dataset_summary.csv
	git commit -m "Initial Space: searchable EEGDash catalog"
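	git push origin main  # if the Hub rejects this because the new Space already has an auto-generated commit, retry with --force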
	```

	The Space then builds and is served at <https://huggingface.co/spaces/EEGDash/catalog>.

	### Keeping the catalog fresh

	`dataset_summary.csv` in this folder is a snapshot of
	`eegdash/dataset/dataset_summary.csv`. Refresh it whenever the source changes:

	```bash
	cp ../eegdash/dataset/dataset_summary.csv dataset_summary.csv
	git add dataset_summary.csv
	git commit -m "Refresh catalog snapshot"
	git push
	```

	A GitHub Action triggered on pushes to `develop` can automate this. The intended
	location for that workflow is `.github/workflows/sync-hf-space.yml` (add it when ready).
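
Until that workflow exists, the refresh can be made idempotent with a small helper. The following is a minimal sketch, not part of the repository: the function names `file_digest` and `refresh_snapshot` are hypothetical, and it copies the file only when the contents actually differ, so a scheduled job would not produce empty commits.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def refresh_snapshot(source: Path, snapshot: Path) -> bool:
    """Copy `source` over `snapshot` when their contents differ.

    Returns True if the snapshot was written, False if it was already
    identical to the source.
    """
    if snapshot.exists() and file_digest(source) == file_digest(snapshot):
        return False
    shutil.copyfile(source, snapshot)
    return True
```

Run from the Space folder, `refresh_snapshot(Path("../eegdash/dataset/dataset_summary.csv"), Path("dataset_summary.csv"))` returns `True` only when a commit and push are needed.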

	## 4. Mirror datasets to `EEGDash/<slug>`

	This is what powers the `on 🤗` column. Push one or more datasets with the helper
	script at `scripts/push_to_hf.py`:

	```bash
	# Single dataset
	python scripts/push_to_hf.py --dataset ds002718

	# Batch, skipping anything already on the Hub, capped at 5 GB
	python scripts/push_to_hf.py \
	  --from-csv eegdash/dataset/dataset_summary.csv \
	  --max-size-gb 5 \
	  --skip-existing
	```
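
To make the batch flags concrete, here is a rough sketch of the selection they imply. It is not the code in `scripts/push_to_hf.py`. It assumes the size cap applies per dataset, and the column names `dataset` and `size_bytes`, the function `select_for_push`, and the sample rows (other than the `ds002718` slug) are all made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, Mapping, Set

BYTES_PER_GB = 1024 ** 3


def select_for_push(
    rows: Iterable[Mapping[str, str]],
    existing: Set[str],
    max_size_gb: float,
    org: str = "EEGDash",
) -> Iterator[str]:
    """Yield dataset slugs that are under the size cap and not yet on the Hub."""
    max_bytes = max_size_gb * BYTES_PER_GB
    for row in rows:
        slug = row["dataset"]          # hypothetical column name
        size = int(row["size_bytes"])  # hypothetical column name
        if f"{org}/{slug}" in existing:
            continue  # mirrors --skip-existing
        if size > max_bytes:
            continue  # mirrors --max-size-gb (assumed to be a per-dataset cap)
        yield slug


# Made-up sample rows; the real CSV may use different columns and units.
SAMPLE_CSV = """dataset,size_bytes
ds002718,4000000000
example_large,9000000000
example_done,1000000
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
selected = list(
    select_for_push(rows, existing={"EEGDash/example_done"}, max_size_gb=5)
)
print(selected)  # ['ds002718']
```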

	Under the hood, `push_to_hf.py` calls `EEGDashDataset(...).push_to_hub("EEGDash/<slug>")`,
	a method that comes from the `HubDatasetMixin` that braindecode datasets inherit from.
	The resulting repo is laid out as follows:

	```
	EEGDash/<slug>/
	├── README.md                      # Dataset card with load snippets
	├── format_info.json               # Version + compression metadata
	└── sourcedata/braindecode/
	    ├── dataset_description.json   # BIDS-compliant
	    ├── participants.tsv           # BIDS-compliant
	    ├── dataset.zarr/              # blosc-compressed windowed data
	    └── sub-<label>/eeg/
	        ├── *_events.tsv
	        ├── *_channels.tsv
	        └── *_eeg.json
```
	```
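
A pushed repo can be checked against this layout from its file listing alone. The sketch below is an assumption about what counts as "required", not something the helper script does; `REQUIRED_FILES`, `REQUIRED_PREFIXES`, and `missing_layout_entries` are hypothetical names.

```python
from typing import Iterable, List

# Top-level entries taken from the layout above.
REQUIRED_FILES = (
    "README.md",
    "format_info.json",
    "sourcedata/braindecode/dataset_description.json",
    "sourcedata/braindecode/participants.tsv",
)
# Directories are detected by prefix because a listing contains only files.
REQUIRED_PREFIXES = ("sourcedata/braindecode/dataset.zarr/",)


def missing_layout_entries(files: Iterable[str]) -> List[str]:
    """Return the expected files or directories absent from a repo file listing."""
    present = set(files)
    missing = [path for path in REQUIRED_FILES if path not in present]
    missing += [
        prefix
        for prefix in REQUIRED_PREFIXES
        if not any(name.startswith(prefix) for name in present)
    ]
    return missing
```

With `huggingface_hub` installed, `HfApi().list_repo_files("EEGDash/ds002718", repo_type="dataset")` returns a listing that can be passed straight to `missing_layout_entries`; an empty result means the top-level layout is in place.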

	Users then load it with:

	```python
	from braindecode.datasets import BaseConcatDataset
	ds = BaseConcatDataset.pull_from_hub("EEGDash/ds002718")
	```

	## 5. Verify

	- Space renders: <https://huggingface.co/spaces/EEGDash/catalog>.
	- Org page shows the Space card + dataset repos: <https://huggingface.co/EEGDash>.
	- At least one dataset loads end-to-end via `pull_from_hub`.
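
The first two checks can be scripted. This is a minimal sketch under stated assumptions: `verify_deployment` is a hypothetical helper, and it expects an object that behaves like `huggingface_hub.HfApi` (it uses `get_space_runtime` and `list_datasets`). It does not cover the end-to-end load, which still needs a real `pull_from_hub` call.

```python
from typing import Any, Dict


def verify_deployment(
    api: Any,
    org: str = "EEGDash",
    space_id: str = "EEGDash/catalog",
) -> Dict[str, bool]:
    """Check that the Space is running and the org has at least one dataset repo."""
    results: Dict[str, bool] = {}

    try:
        # The runtime stage compares equal to "RUNNING" once the build has finished.
        results["space_running"] = api.get_space_runtime(space_id).stage == "RUNNING"
    except Exception:
        results["space_running"] = False

    try:
        repo_ids = [info.id for info in api.list_datasets(author=org)]
    except Exception:
        repo_ids = []
    results["has_dataset_repos"] = bool(repo_ids)

    return results
```

Typical use would be `verify_deployment(HfApi())` after `from huggingface_hub import HfApi`; any `False` value points at the corresponding item in the list above.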

	## Troubleshooting

	| Symptom | Likely cause and fix |
	|---|---|
	| `on 🤗` column empty for everything | The Space has no outbound network access or is being rate-limited. The lookup is cached once per process, so restart or redeploy the Space to retry. |
	| `push_to_hub` fails with `ImportError` | The Hub extras are missing. Run `pip install "braindecode[hub]"` (pulls in `zarr` + `huggingface_hub`). |
	| Repo exists but Space doesn't flag it | `HfApi().list_datasets(author="EEGDash", limit=500)` caps at 500 results. Raise the limit in `app.py::_hf_repos` if the org grows beyond that. |
	| `dataset_summary.csv` out of sync | Re-run the refresh commands from step 3 ("Keeping the catalog fresh") or add the sync workflow. |
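
The first and third rows both come from how the repo list is fetched and cached. One possible alternative, sketched here as an assumption rather than a description of `app.py`, is a time-limited cache that never stores a failed lookup and that lists datasets without a fixed cap. `RepoCache` and `fetch_org_datasets` are hypothetical names; `limit=None` asks `list_datasets` to return every result.

```python
import time
from typing import Any, Callable, Optional, Set


class RepoCache:
    """Cache Hub repo ids for `ttl` seconds; failed fetches are never cached."""

    def __init__(
        self,
        fetch: Callable[[], Set[str]],
        ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[Set[str]] = None
        self._expires = 0.0

    def get(self) -> Set[str]:
        now = self._clock()
        if self._value is not None and now < self._expires:
            return self._value
        try:
            value = set(self._fetch())
        except Exception:
            # Network error or rate limit: serve stale data if there is any,
            # and try again on the next call instead of caching the failure.
            return self._value if self._value is not None else set()
        self._value = value
        self._expires = now + self._ttl
        return value


def fetch_org_datasets(api: Any, org: str = "EEGDash") -> Set[str]:
    """List every dataset repo id under `org` using an HfApi-like object."""
    return {info.id for info in api.list_datasets(author=org, limit=None)}
```

Wired together as `RepoCache(lambda: fetch_org_datasets(HfApi()))`, a transient outage would leave the `on 🤗` column empty only until the next successful lookup, without a redeploy.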