---
title: Scenema Audio
emoji: 🎙️
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: Zero-shot expressive voice cloning and speech generation
---

# Scenema Audio (ZeroGPU)

Gradio wrapper around [ScenemaAI/scenema-audio](https://github.com/ScenemaAI/scenema-audio).

Zero-shot expressive voice cloning and speech generation with emotion, pacing,
and breath control, built on an audio diffusion transformer extracted from
[LTX 2.3](https://github.com/Lightricks/LTX-2).

## Cold start

The first request downloads ~38 GB of model weights:
- `scenema-audio-transformer-int8.safetensors` (~4.9 GB)
- `scenema-audio-pipeline.safetensors` (~6.7 GB)
- `google/gemma-3-12b-it` (~24 GB, **gated** — requires `HF_TOKEN` secret)
- SeedVC + BigVGAN + Whisper checkpoints (~3 GB)
- MelBandRoFormer (~436 MB)
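
Because the cold start pulls this much data, it can help to confirm that enough disk space is free before triggering the first request. The sketch below is illustrative and not part of `app.py`: the function names `total_weights_gb` and `has_room` and the 10% headroom factor are assumptions, and the sizes are the approximate figures from the list above. The rounded figures sum to about 39 GB, slightly above the ~38 GB headline, so treat both as estimates.

```python
import shutil

# Approximate download sizes in GB, copied from the list above.
WEIGHT_SIZES_GB = {
    "scenema-audio-transformer-int8.safetensors": 4.9,
    "scenema-audio-pipeline.safetensors": 6.7,
    "google/gemma-3-12b-it": 24.0,
    "SeedVC + BigVGAN + Whisper checkpoints": 3.0,
    "MelBandRoFormer": 0.436,
}


def total_weights_gb(sizes=WEIGHT_SIZES_GB):
    """Sum the approximate sizes of all weight downloads, in GB."""
    return sum(sizes.values())


def has_room(free_bytes, needed_gb, headroom=1.1):
    """Return True if free_bytes covers needed_gb plus a safety margin.

    headroom=1.1 (10% extra) is an arbitrary assumption, not a measured value.
    """
    return free_bytes >= needed_gb * headroom * 1e9


if __name__ == "__main__":
    needed = total_weights_gb()
    free = shutil.disk_usage(".").free  # free bytes on the current volume
    print(f"Need about {needed:.1f} GB, {free / 1e9:.1f} GB free")
    if not has_room(free, needed):
        print("Warning: the weights may not fit on this volume.")
```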

Set `HF_TOKEN` in the Space secrets to a Hugging Face token whose account has been granted access to `google/gemma-3-12b-it`.
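
A missing or empty `HF_TOKEN` would otherwise only surface partway through the cold start, when the gated download is refused. The sketch below shows one way to fail early with a clear message. It is a hedged illustration, not the Space's actual startup code: `resolve_hf_token` and `prefetch_gated_weights` are hypothetical helper names, and only `snapshot_download` comes from the `huggingface_hub` library.

```python
import os

GATED_REPO = "google/gemma-3-12b-it"


def resolve_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise a descriptive error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            f"HF_TOKEN is not set. Add it to the Space secrets with a token "
            f"that has access to {GATED_REPO}."
        )
    return token


def prefetch_gated_weights(token, repo_id=GATED_REPO):
    """Download the gated repository into the local Hugging Face cache.

    Imported lazily so the token check above works even when
    huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, token=token)
```

Calling `prefetch_gated_weights(resolve_hf_token())` at startup would either begin the download or stop immediately with the message above. A token that is set but lacks access to the gated repository still fails at download time.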

## License

- **Model weights:** LTX-2 Community License Agreement
- **Code:** MIT