---
title: Scenema Audio
emoji: 🎙️
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: Zero-shot expressive voice cloning and speech generation
---

Scenema Audio (ZeroGPU)

Gradio wrapper around ScenemaAI/scenema-audio.

Zero-shot expressive voice cloning and speech generation with emotion, pacing, and breath control, built on an audio diffusion transformer extracted from LTX 2.3.

Cold start

The first request downloads about 38 GB of model weights:

scenema-audio-transformer-int8.safetensors (~4.9 GB)
scenema-audio-pipeline.safetensors (~6.7 GB)
google/gemma-3-12b-it (~24 GB, gated — requires HF_TOKEN secret)
SeedVC + BigVGAN + Whisper checkpoints (~3 GB)
MelBandRoFormer (~436 MB)
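
Because the cold start is dominated by the download, it can help to confirm that the disk has room before the first request. The sketch below is only an illustration: it sums the rounded per-component sizes listed above (which add up to roughly 39 GB, close to the quoted total) and compares the result with the free space on disk. The names `WEIGHTS_GB`, `total_download_gb`, `has_room`, and the 2 GB headroom are hypothetical and not part of this Space's code.

```python
import shutil

# Approximate sizes in GB, copied from the list above (rounded figures).
WEIGHTS_GB = {
    "scenema-audio-transformer-int8.safetensors": 4.9,
    "scenema-audio-pipeline.safetensors": 6.7,
    "google/gemma-3-12b-it": 24.0,
    "SeedVC + BigVGAN + Whisper checkpoints": 3.0,
    "MelBandRoFormer": 0.436,
}


def total_download_gb(sizes=WEIGHTS_GB):
    """Sum the approximate per-component sizes in GB."""
    return sum(sizes.values())


def has_room(path=".", needed_gb=None, headroom_gb=2.0):
    """Return True if `path` has space for the weights plus some headroom."""
    if needed_gb is None:
        needed_gb = total_download_gb()
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb + headroom_gb


if __name__ == "__main__":
    print(f"approximate download: {total_download_gb():.1f} GB")
    print("enough free space:", has_room())
```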

In the Space secrets, set HF_TOKEN to a token from an account that has been granted access to google/gemma-3-12b-it.
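
As a rough sketch of how the gated download might be guarded, the snippet below reads `HF_TOKEN` from the environment (Space secrets are exposed to the app as environment variables) and fails early with a readable message if it is missing. The helper names `require_hf_token` and `fetch_gemma` are hypothetical, and this is not necessarily how `app.py` loads the model; `snapshot_download` is the standard `huggingface_hub` call for fetching a whole repository.

```python
import os


def require_hf_token(env=os.environ):
    """Return HF_TOKEN, or fail early with a readable message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to the Space secrets with access "
            "to google/gemma-3-12b-it."
        )
    return token


def fetch_gemma(token, repo_id="google/gemma-3-12b-it"):
    """Download the gated repo into the local Hugging Face cache."""
    # Imported lazily so the token check works without the library installed.
    from huggingface_hub import snapshot_download

    # Returns the local directory that holds the downloaded snapshot.
    return snapshot_download(repo_id=repo_id, token=token)
```

At startup one would call `fetch_gemma(require_hf_token())`, so a missing secret surfaces before the ~24 GB download is attempted. A token that is set but lacks access to the gated repository would still fail inside `snapshot_download`.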

License

Model weights: LTX-2 Community License Agreement
Code: MIT