metadata
title: Scenema Audio
emoji: 🎙️
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: Zero-shot expressive voice cloning and speech generation
Scenema Audio (ZeroGPU)
Gradio wrapper around ScenemaAI/scenema-audio.
Zero-shot expressive voice cloning and speech generation with emotion, pacing, and breath control, built on an audio diffusion transformer extracted from LTX 2.3.
Cold start
First request downloads ~38 GB of model weights:
- scenema-audio-transformer-int8.safetensors (~4.9 GB)
- scenema-audio-pipeline.safetensors (~6.7 GB)
- google/gemma-3-12b-it (~24 GB, gated; requires the HF_TOKEN secret)
- SeedVC + BigVGAN + Whisper checkpoints (~3 GB)
- MelBandRoFormer (~436 MB)
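As a rough illustration of the download budget, the sketch below adds up the approximate sizes listed above and checks whether a directory has enough free space for them. It is not part of this Space's code: the names WEIGHT_SIZES_GB, total_download_gb, and has_room_for_weights are hypothetical, and the sizes are the rounded figures from the list, so the total (about 39 GB) only approximately matches the ~38 GB stated above.

```python
import shutil

# Approximate sizes in GB, copied from the list above (rounded figures).
WEIGHT_SIZES_GB = {
    "scenema-audio-transformer-int8.safetensors": 4.9,
    "scenema-audio-pipeline.safetensors": 6.7,
    "google/gemma-3-12b-it": 24.0,
    "SeedVC + BigVGAN + Whisper checkpoints": 3.0,
    "MelBandRoFormer": 0.436,  # ~436 MB, taking 1 GB = 1000 MB
}


def total_download_gb(sizes=WEIGHT_SIZES_GB):
    """Sum the approximate component sizes, in GB."""
    return sum(sizes.values())


def has_room_for_weights(path=".", sizes=WEIGHT_SIZES_GB):
    """Return True if `path` has at least the summed size free on disk."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= total_download_gb(sizes)


if __name__ == "__main__":
    print(f"Estimated cold-start download: {total_download_gb():.1f} GB")
    print("Enough free space here:", has_room_for_weights("."))
```

The estimate ignores any temporary files or cache overhead, so treat it as a lower bound on the space a cold start needs.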
In the Space secrets, set HF_TOKEN to a token that has been granted access to google/gemma-3-12b-it.
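Because the gated download fails without a valid token, it can help to check for the secret before starting the ~24 GB fetch. The following is a minimal sketch of such a check, assuming the secret is exposed to the app as the HF_TOKEN environment variable; the helper name require_hf_token is hypothetical and is not taken from this Space's app.py.

```python
import os

GATED_REPO = "google/gemma-3-12b-it"


def require_hf_token(env=None):
    """Return the HF_TOKEN value, or raise with a hint if it is missing.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it to the Space secrets with access "
            f"to {GATED_REPO} before the first request."
        )
    return token


if __name__ == "__main__":
    try:
        require_hf_token()
        print("HF_TOKEN found")
    except RuntimeError as err:
        print(err)
```

This only confirms that a token is present, not that it has been granted access to the gated repository. The returned value could then be passed as the `token` argument of a download call such as `huggingface_hub.snapshot_download`.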
License
- Model weights: LTX-2 Community License Agreement
- Code: MIT