---
title: Scenema Audio
emoji: 🎙️
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: Zero-shot expressive voice cloning and speech generation
---

# Scenema Audio (ZeroGPU)

Gradio wrapper around [ScenemaAI/scenema-audio](https://github.com/ScenemaAI/scenema-audio).
Zero-shot expressive voice cloning and speech generation with emotion, pacing,
and breath control, built on an audio diffusion transformer extracted from
[LTX 2.3](https://github.com/Lightricks/LTX-2).

## Cold start

The first request downloads ~38 GB of model weights:

- `scenema-audio-transformer-int8.safetensors` (~4.9 GB)
- `scenema-audio-pipeline.safetensors` (~6.7 GB)
- `google/gemma-3-12b-it` (~24 GB, **gated**; requires the `HF_TOKEN` secret)
- SeedVC + BigVGAN + Whisper checkpoints (~3 GB)
- MelBandRoFormer (~436 MB)

Set `HF_TOKEN` in the Space secrets to a token from an account that has been granted access to `google/gemma-3-12b-it`.
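
A startup pre-flight check can surface two cold-start problems, a missing `HF_TOKEN` and too little disk space for the weights, before the long download begins. This is a minimal sketch under stated assumptions, not the Space's actual `app.py`: the names `APPROX_WEIGHTS_GB`, `require_hf_token`, `has_room_for_weights` and the 2 GB safety margin are hypothetical, and the sizes are the approximate figures from the list above. It relies on Space secrets being exposed to the running app as environment variables.

```python
import os
import shutil

# Hypothetical table of approximate download sizes in GB, copied from the
# Cold start list. These are rough figures; real checkpoints may differ.
APPROX_WEIGHTS_GB = {
    "scenema-audio-transformer-int8.safetensors": 4.9,
    "scenema-audio-pipeline.safetensors": 6.7,
    "google/gemma-3-12b-it": 24.0,
    "SeedVC + BigVGAN + Whisper": 3.0,
    "MelBandRoFormer": 0.436,
}

GATED_REPO = "google/gemma-3-12b-it"


def total_download_gb(sizes: dict[str, float] = APPROX_WEIGHTS_GB) -> float:
    """Sum the approximate per-component download sizes (GB)."""
    return sum(sizes.values())


def require_hf_token(env=os.environ) -> str:
    """Return HF_TOKEN from the environment, or fail with an actionable message.

    This only checks that a token is present; it does not verify that the
    token's account has actually been granted access to the gated repo.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret for an account "
            f"with access to {GATED_REPO}"
        )
    return token


def has_room_for_weights(path: str = ".", margin_gb: float = 2.0) -> bool:
    """Check whether the filesystem holding `path` can fit all weights plus a margin."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> decimal GB
    return free_gb >= total_download_gb() + margin_gb
```

With these figures `total_download_gb()` comes to roughly 39 GB. Once `require_hf_token()` succeeds, the token can be passed as `token=` to `huggingface_hub.snapshot_download(repo_id=GATED_REPO, ...)` when fetching the gated model.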

## License

- **Model weights:** LTX-2 Community License Agreement
- **Code:** MIT