multimodalart

Resolve model paths absolute; drop persistent-storage assumption

14a5337 10 days ago

	---
	title: Scenema Audio
	emoji: 🎙️
	colorFrom: pink
	colorTo: red
	sdk: gradio
	sdk_version: 6.14.0
	python_version: '3.12'
	app_file: app.py
	pinned: false
	hardware: zero-a10g
	short_description: Zero-shot expressive voice cloning and speech generation
	---

	# Scenema Audio (ZeroGPU)

	Gradio wrapper around [ScenemaAI/scenema-audio](https://github.com/ScenemaAI/scenema-audio).

	Zero-shot expressive voice cloning and speech generation with emotion, pacing,
	and breath control, built on an audio diffusion transformer extracted from
	[LTX 2.3](https://github.com/Lightricks/LTX-2).

	## Cold start

	First request downloads ~38 GB of model weights:
	- `scenema-audio-transformer-int8.safetensors` (~4.9 GB)
	- `scenema-audio-pipeline.safetensors` (~6.7 GB)
	- `google/gemma-3-12b-it` (~24 GB, gated — requires `HF_TOKEN` secret)
	- SeedVC + BigVGAN + Whisper checkpoints (~3 GB)
	- MelBandRoFormer (~436 MB)

	Set `HF_TOKEN` in the Space secrets to a token whose account has been granted access to `google/gemma-3-12b-it`.
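
A minimal pre-flight sketch, not taken from `app.py`, of how the two cold-start requirements above could be checked before any download begins: that `HF_TOKEN` is present, and that the target disk has room for the ~38 GB of weights. The function names `require_hf_token` and `has_room_for_weights` are hypothetical, and the 38 GB default is simply the figure quoted above.

```python
import os
import shutil

# Approximate total from the list above; treat it as a rough lower bound.
REQUIRED_GB = 38


def require_hf_token(env=os.environ):
    """Return the HF_TOKEN value, or raise if it is missing or blank.

    Hypothetical helper: the gated Gemma checkpoint cannot be fetched
    without a token, so failing early gives a clearer error message.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to the Space secrets with access "
            "to google/gemma-3-12b-it"
        )
    return token


def has_room_for_weights(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has enough free space.

    Uses decimal gigabytes (1 GB = 1e9 bytes) to match the sizes above.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9
```

Calling `require_hf_token()` and `has_room_for_weights()` at startup, with `path` pointing at whatever directory the weights are written to (an assumption here, since the download location is not stated above), would surface a missing secret or a full disk before the first multi-gigabyte transfer starts.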

	## License

	- Model weights: LTX-2 Community License Agreement
	- Code: MIT