stable-audio-3-lab / README.md

owenisas

Clean up optimization status metadata

b493d6c verified 2 days ago


	---
	title: Stable Audio 3 Lab
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 6.3.0
	app_file: app.py
	python_version: "3.10"
	suggested_hardware: zero-a10g
	pinned: false
	license: mit
	hf_oauth: true
	hf_oauth_scopes:
	- gated-repos
	---

	# Stable Audio 3 Lab

	Gradio Space for testing Stability AI's Stable Audio 3 collections:

	- Standard collection: `stabilityai/stable-audio-3-small-music`, `stabilityai/stable-audio-3-small-sfx`, `stabilityai/stable-audio-3-medium`
	- Extra collection generation checkpoints: `small-music-base`, `small-sfx-base`, `medium-base`
	- Extra collection autoencoders: `SAME-S`, `SAME-L`

	The optimized repo (`stabilityai/stable-audio-3-optimized`) currently ships MLX and TensorRT assets rather than a generic `model_config.json` + `model.safetensors` checkpoint. This Space lists it in Coverage, but does not run it through the PyTorch `stable_audio_3` path.
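As a rough illustration of the distinction above, a loader could decide whether a repo fits the PyTorch path by checking its file listing for both files. The helper name and the sample listings below are hypothetical; only `model_config.json` and `model.safetensors` come from this README.

```python
from typing import Iterable

# File names taken from the README; everything else here is illustrative.
REQUIRED_FILES = {"model_config.json", "model.safetensors"}


def has_pytorch_checkpoint(repo_files: Iterable[str]) -> bool:
    """Return True if the listing contains both files at the repo root."""
    return REQUIRED_FILES.issubset(set(repo_files))


# Hypothetical listings, not the real contents of any repo.
standard_like = ["README.md", "model_config.json", "model.safetensors"]
optimized_like = ["README.md", "mlx/weights.npz", "tensorrt/model.engine"]

print(has_pytorch_checkpoint(standard_like))   # True
print(has_pytorch_checkpoint(optimized_like))  # False
```

In a real check the listing could come from `huggingface_hub.list_repo_files`; whether this Space does so is not stated here.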

	## Access

	This Space requires Hugging Face authentication. Users can either sign in with
	Hugging Face OAuth or paste a Hugging Face access token into the password field.
	A pasted token is used only for the request it accompanies and is never returned
	in run metadata.

	The post-trained Stable Audio 3 checkpoints are gated on Hugging Face, so each
	user must:

	1. Authenticate, either by signing in with Hugging Face or by supplying a read token from their own Hugging Face account.
	2. Accept the terms on each gated model page from that same account.
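The token handling described in this section could look roughly like the sketch below. It is not the Space's actual code: the function names are hypothetical, and the rule that a pasted token takes priority over the OAuth token is an assumption made for the example.

```python
from typing import Optional


def resolve_hf_token(oauth_token: Optional[str], pasted_token: Optional[str]) -> str:
    """Pick the token for one request; raise if the user supplied neither."""
    # Assumption: a pasted token takes priority over the OAuth session token.
    for candidate in (pasted_token, oauth_token):
        if candidate and candidate.strip():
            return candidate.strip()
    raise PermissionError("Sign in with Hugging Face or paste a read token.")


def build_run_metadata(model_id: str, token: str, elapsed_s: float) -> dict:
    """Build the metadata shown to the user; the token itself is left out."""
    return {
        "model_id": model_id,
        "authenticated": bool(token),
        "elapsed_s": round(elapsed_s, 3),
    }


token = resolve_hf_token(oauth_token=None, pasted_token="  hf_example_token  ")
metadata = build_run_metadata("stabilityai/stable-audio-3-small-music", token, 1.23456)
print(metadata)
```

In a Gradio app the OAuth token would typically arrive through a `gr.OAuthToken`-typed handler parameter; the plain strings above keep the sketch free of Gradio imports.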

	Base checkpoints are not gated, but they are intended mainly for fine-tuning and may not sound as polished.

	## Hardware

	- ZeroGPU is enabled through the `spaces.GPU` decorator on generation and autoencoder actions.
	- Small models can run on CPU, but GPU is still preferred.
	- Medium and Medium Base are GPU-first.
	- `SAME-L` is GPU-first; `SAME-S` can be used for CPU autoencoder round trips.

	The Space is configured with `suggested_hardware: zero-a10g`.
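A minimal sketch of how a generation action might be wrapped with `spaces.GPU` follows. The function body is a placeholder, the `GPU_FIRST` set and `generate` signature are hypothetical, and the no-op fallback for machines without the `spaces` package is an assumption added so the sketch runs locally.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Local fallback (assumption): run the function unchanged without ZeroGPU.
    def gpu(fn):
        return fn


# Labels follow the Hardware list above; the set itself is hypothetical.
GPU_FIRST = {"medium", "medium-base", "SAME-L"}


@gpu
def generate(prompt: str, model_name: str) -> dict:
    """Placeholder action: a real one would load the model and render audio."""
    return {
        "prompt": prompt,
        "model": model_name,
        "gpu_first": model_name in GPU_FIRST,
    }


print(generate("rain on a tin roof", "medium"))
print(generate("rain on a tin roof", "SAME-S"))
```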

	## Runtime note

	The upstream `stable-audio-3` Python package is vendored in this Space from
	Stability AI's public MIT-licensed repository because its package metadata pins
	Torch 2.7.1. ZeroGPU currently provides Torch 2.8.0, so installing the upstream
	package through normal dependency resolution would downgrade Torch and break the
	ZeroGPU runtime.
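The version conflict can be made concrete with a small check. This is only an illustration of the reasoning: the helper names are hypothetical, and the version strings are the ones quoted in this note rather than values read from any package metadata.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.7.1' or '2.7.1+cu126' into a comparable tuple of ints."""
    public = text.split("+", 1)[0]  # drop a local build suffix if present
    return tuple(int(part) for part in public.split("."))


def would_downgrade(pinned: str, provided: str) -> bool:
    """True if honoring the pin replaces the provided version with an older one."""
    return parse_version(pinned) < parse_version(provided)


# Versions quoted in this README.
UPSTREAM_PIN = "2.7.1"
ZEROGPU_TORCH = "2.8.0"

print(would_downgrade(UPSTREAM_PIN, ZEROGPU_TORCH))  # True
```

Vendoring the package source avoids running that pin through the dependency resolver at all.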

	## Optimization notes

	- Repeated runs with the same selected model reuse the loaded model inside the
	ZeroGPU worker when the worker stays warm. Run metadata includes `cache_hit`
	and `load_elapsed_s` so this is visible.
	- Successful gated-repo access checks are cached briefly inside the worker per
	token digest and repo ID to avoid a Hugging Face `HEAD` request on every
	generation.
	- The `stable-audio-3-optimized` repo currently provides MLX, ONNX, and
	TensorRT assets. This Space keeps the portable PyTorch path because the
	TensorRT engines are prebuilt for `sm_90` (Hopper), while the current ZeroGPU
	host is a Blackwell GPU, and engines built for one GPU architecture generally
	do not load on another. MLX runs only on Apple silicon.
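The two caches described in the notes above can be sketched as follows. This is a simplified stand-in, not the Space's implementation: the class names, the SHA-256 digest, the 300-second TTL (the README only says "briefly"), and the injectable clock are all assumptions; `cache_hit` and `load_elapsed_s` are the metadata keys named above.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ModelCache:
    """Keep the most recently loaded model while the worker stays warm."""

    def __init__(self) -> None:
        self._model_id: Optional[str] = None
        self._model: Any = None

    def get(self, model_id: str, loader: Callable[[str], Any]) -> Tuple[Any, dict]:
        start = time.perf_counter()
        cache_hit = self._model_id == model_id
        if not cache_hit:
            self._model = loader(model_id)  # slow path: load from scratch
            self._model_id = model_id
        elapsed = round(time.perf_counter() - start, 3)
        return self._model, {"cache_hit": cache_hit, "load_elapsed_s": elapsed}


class AccessCache:
    """Remember successful gated-repo checks per (token digest, repo ID)."""

    def __init__(self, ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s  # assumed value
        self._clock = clock
        self._expiry: Dict[Tuple[str, str], float] = {}

    @staticmethod
    def _digest(token: str) -> str:
        # Store a digest so the raw token is never used as a cache key.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def is_fresh(self, token: str, repo_id: str) -> bool:
        expires = self._expiry.get((self._digest(token), repo_id))
        return expires is not None and self._clock() < expires

    def record_success(self, token: str, repo_id: str) -> None:
        self._expiry[(self._digest(token), repo_id)] = self._clock() + self._ttl_s


cache = ModelCache()
_, first = cache.get("small-music", loader=lambda model_id: object())
_, second = cache.get("small-music", loader=lambda model_id: object())
print(first["cache_hit"], second["cache_hit"])  # False True
```

A caller would consult `is_fresh` before issuing the `HEAD` request and call `record_success` only after the request succeeds, so failed checks are always retried.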