---
title: VideoVoice Dramabox
emoji: 🎭
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.7.1
app_file: app.py
python_version: "3.10"
pinned: true
short_description: Resemble Dramabox - directable speech for VideoVoice
---
<!--
ZeroGPU is enabled from the Space Settings UI (not via frontmatter).
This Space serves Resemble's Dramabox "directable speech engine" via
POST /api/tools/dramabox. The dub pipeline is reachable but rejects
voice_mode != "dramabox" (server.py), and the frontend never routes
dub requests here.
IMPORTANT: sdk_version is pinned to 5.7.1 to match the upstream
ResembleAI/Dramabox Space. Reasons:
- gradio 6.x bundles pydantic >= 2.11
- pydantic 2.11+ emits bool-shorthand `additionalProperties: True`
that crashes gradio_client schema parsing
- Dramabox needs pydantic 2.10.6 (per upstream requirements.txt)
- That pydantic is incompatible with gradio 6.x
Bumping this to match the other Spaces (6.12.0) breaks the build.
-->
# VideoVoice: Dramabox
Resemble AI's directable speech engine, mounted as a VideoVoice tool tab.
**Endpoint:** `POST /api/tools/dramabox`

**Frontend:** [/app/dramabox](https://videovoice.app/app/dramabox)
## What's different from the other Spaces
This Space is a **tools-only** Space:
- The `/api/tools/dramabox` endpoint runs Resemble Dramabox on a scene prompt
  (quoted dialogue plus stage directions outside the quotes).
- The other pipeline endpoints (dub, voice-clone, subtitles, audio-cleanup) remain
  mounted and reachable, but the frontend never routes their traffic here. The dub
  endpoint, for example, rejects any `voice_mode` other than `"dramabox"`.
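A minimal client sketch for the endpoint, using only the Python standard library. The JSON body shape (a single `prompt` field) and the base URL are assumptions, not something this README documents; check the request schema in the backend before relying on them. The function only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the real Space URL.
BASE_URL = "https://example-space.hf.space"


def build_dramabox_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/tools/dramabox.

    Assumption: the server accepts a JSON body with a "prompt" field.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/tools/dramabox",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_dramabox_request('A tired old sailor, "We made it." He laughs softly. "Hahaha."')
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```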
## Prompt grammar
```
<speaker description>, "<dialogue>" <action> "<more dialogue>"
```
- Text inside quotes is **spoken**, e.g. `"Hello, how are you?"`, including phonetic sounds like `"Hahaha"` or `"Mmmmm"`.
- Text outside quotes is a **stage direction**, e.g. `She sighs deeply.` or `He clears his throat.`
- Do not put onomatopoeia (`Sigh`, `Ahem`, `Gasp`) inside quotes: the model will
  speak them literally. Write them as stage directions instead.
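The grammar above can be sketched as a small prompt builder plus a lint check for quoted onomatopoeia. This is an illustrative helper, not part of the Space: the names `build_prompt` and `quoted_sound_words`, the `"say"`/`"act"` segment kinds, and the short word list are all hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative, not exhaustive: sound words that would be spoken literally if quoted.
QUOTED_SOUND_WORDS = {"sigh", "ahem", "gasp"}


def build_prompt(speaker: str, segments: List[Tuple[str, str]]) -> str:
    """Assemble '<speaker description>, "<dialogue>" <action> "<more dialogue>"'.

    segments holds (kind, text) pairs: kind "say" is spoken and gets wrapped in
    double quotes; kind "act" is a stage direction and stays unquoted.
    """
    parts = [speaker.strip().rstrip(",") + ","]
    for kind, text in segments:
        text = text.strip()
        if kind == "say":
            if '"' in text:
                raise ValueError("dialogue must not contain double quotes")
            parts.append(f'"{text}"')
        elif kind == "act":
            parts.append(text)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return " ".join(parts)


def quoted_sound_words(prompt: str) -> List[str]:
    """Return sound words found inside quoted dialogue."""
    hits = []
    for quoted in re.findall(r'"([^"]*)"', prompt):
        for word in re.findall(r"[A-Za-z]+", quoted):
            if word.lower() in QUOTED_SOUND_WORDS:
                hits.append(word)
    return hits


prompt = build_prompt(
    "A weary detective, low gravelly voice",
    [("say", "I've seen this before."), ("act", "He sighs deeply."), ("say", "Hahaha. Not again.")],
)
print(prompt)
print(quoted_sound_words(prompt))
```

Phonetic laughter such as `Hahaha` passes the check because only the listed sound words are flagged, while the sigh lives outside the quotes as a stage direction.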
## Setup notes
Space Secrets (required unless marked optional):
- `TTS_ENGINE=dramabox`
- `HF_TOKEN` (same as the other VideoVoice Spaces; used for model downloads)
- `LTX_DTYPE=bf16` (optional; matches the upstream default)
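A sketch of how a worker might read these secrets and fail fast at startup. It assumes the values arrive as environment variables (how Space Secrets are normally exposed) and that a missing `LTX_DTYPE` should fall back to `bf16`; the `DramaboxSettings` class and `load_settings` function are hypothetical, not the backend's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DramaboxSettings:
    # Hypothetical container for the Space Secrets listed above.
    tts_engine: str
    hf_token: str
    ltx_dtype: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> DramaboxSettings:
    """Read the secrets from env (default: os.environ) and validate them."""
    env = os.environ if env is None else env
    engine = env.get("TTS_ENGINE", "")
    if engine != "dramabox":
        raise RuntimeError(f"TTS_ENGINE must be 'dramabox', got {engine!r}")
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required for model downloads")
    # Assumption: bf16 is used when LTX_DTYPE is unset, mirroring the upstream default.
    return DramaboxSettings(
        tts_engine=engine,
        hf_token=token,
        ltx_dtype=env.get("LTX_DTYPE", "bf16"),
    )
```

Passing a plain dict instead of `os.environ` keeps the check easy to unit-test.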
Required vendored source (committed to the backend (BE) repo and deployed via [deploy.sh](https://github.com/Video-Voice/VideoVoice-be/blob/main/deploy.sh)):
- `dramabox_src/`: a copy of [ResembleAI/Dramabox `src/`](https://huggingface.co/spaces/ResembleAI/Dramabox/tree/main/src). The `tools_api/dramabox.py` worker adds this directory to `sys.path` lazily, on the first request.
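The lazy `sys.path` step can be sketched as a thread-safe helper that touches `sys.path` only on first use and caches the imported module afterwards. This is an assumption about the pattern, not the worker's actual code; `lazy_import` is a hypothetical name, and the entry module inside `dramabox_src/` is not named here.

```python
import importlib
import sys
import threading
from pathlib import Path
from types import ModuleType
from typing import Dict

# Guards the one-time sys.path mutation when several requests arrive at once.
_lock = threading.Lock()
_loaded: Dict[str, ModuleType] = {}


def lazy_import(module_name: str, src_dir: Path) -> ModuleType:
    """Import module_name from src_dir, touching sys.path only on the first call."""
    with _lock:
        cached = _loaded.get(module_name)
        if cached is not None:
            return cached
        path = str(src_dir.resolve())
        if path not in sys.path:
            # Front of sys.path so the vendored copy wins over any installed package.
            sys.path.insert(0, path)
            # Drop stale finder caches in case src_dir was populated after startup.
            importlib.invalidate_caches()
        module = importlib.import_module(module_name)
        _loaded[module_name] = module
        return module
```

Deferring the import keeps Space startup fast and avoids loading the model code until a Dramabox request actually arrives; a handler would call `lazy_import(<entry module>, <path to dramabox_src>)` where both arguments depend on the real layout.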
## Acknowledgements
Built on [Resemble AI's Dramabox](https://huggingface.co/spaces/ResembleAI/Dramabox). All generated audio is invisibly watermarked with Resemble PerTh.