---
title: Scenema Audio
emoji: 🎙️
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: Zero-shot expressive voice cloning and speech generation
---

# Scenema Audio (ZeroGPU)

Gradio wrapper around [ScenemaAI/scenema-audio](https://github.com/ScenemaAI/scenema-audio).

Zero-shot expressive voice cloning and speech generation with emotion, pacing, and breath control, built on an audio diffusion transformer extracted from [LTX 2.3](https://github.com/Lightricks/LTX-2).

## Cold start

First request downloads ~38 GB of model weights:

- `scenema-audio-transformer-int8.safetensors` (~4.9 GB)
- `scenema-audio-pipeline.safetensors` (~6.7 GB)
- `google/gemma-3-12b-it` (~24 GB, **gated**, requires `HF_TOKEN` secret)
- SeedVC + BigVGAN + Whisper checkpoints (~3 GB)
- MelBandRoFormer (~436 MB)

Set `HF_TOKEN` in the Space secrets with access to `google/gemma-3-12b-it`.

## License

- **Model weights:** LTX-2 Community License Agreement
- **Code:** MIT
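
## Cold start pre-flight check (sketch)

The cold-start requirements above (a gated model that needs `HF_TOKEN`, plus tens of gigabytes of weights) could be checked before the first request. The following is a minimal sketch, not part of the actual `app.py`. The `preflight` helper, the `COMPONENT_SIZES_GB` table, and the idea of comparing against free disk space are all assumptions made for illustration. The sizes are the approximate figures listed under Cold start.

```python
import os
import shutil

# Approximate download sizes in GB, copied from the "Cold start" list.
# These are rough figures, not exact file sizes.
COMPONENT_SIZES_GB = {
    "scenema-audio-transformer-int8.safetensors": 4.9,
    "scenema-audio-pipeline.safetensors": 6.7,
    "google/gemma-3-12b-it": 24.0,
    "SeedVC + BigVGAN + Whisper checkpoints": 3.0,
    "MelBandRoFormer": 0.436,
}


def preflight(env, free_bytes):
    """Return a list of human-readable problems (hypothetical helper).

    env        -- mapping of environment variables, e.g. os.environ
    free_bytes -- free disk space in bytes where weights would be cached
    """
    problems = []

    # The Gemma checkpoint is gated, so a token with access is required.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; google/gemma-3-12b-it is gated.")

    # Assumption: all weights are cached on the same disk.
    required_bytes = int(sum(COMPONENT_SIZES_GB.values()) * 1e9)
    if free_bytes < required_bytes:
        problems.append(
            f"Need about {required_bytes / 1e9:.1f} GB free, "
            f"found {free_bytes / 1e9:.1f} GB."
        )
    return problems


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    for problem in preflight(os.environ, free):
        print("WARNING:", problem)
```

An empty list means both checks passed. Whether the token actually has access to the gated repository can only be confirmed by attempting the download.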