---
title: VideoVoice Dramabox
emoji: π
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.7.1
app_file: app.py
python_version: "3.10"
pinned: true
short_description: Resemble Dramabox - directable speech for VideoVoice
---

<!--
ZeroGPU is enabled from the Space Settings UI (not via frontmatter).

This Space serves Resemble's Dramabox "directable speech engine" via
POST /api/tools/dramabox. The dub pipeline is reachable, but it rejects any
request whose voice_mode is not "dramabox" (see server.py), and the frontend
never routes dub requests here.

IMPORTANT: sdk_version is pinned to 5.7.1 to match the upstream
ResembleAI/Dramabox Space. Reasons:
- gradio 6.x bundles pydantic >= 2.11
- pydantic 2.11+ emits bool-shorthand `additionalProperties: True`,
  which crashes gradio_client schema parsing
- Dramabox needs pydantic 2.10.6 (per upstream requirements.txt)
- pydantic 2.10.6 is incompatible with gradio 6.x

Bumping sdk_version to match the other Spaces (6.12.0) breaks the build.
-->

# VideoVoice - Dramabox

Resemble AI's directable speech engine, mounted as a VideoVoice tool tab.

**Endpoint:** `POST /api/tools/dramabox`

**Frontend:** [/app/dramabox](https://videovoice.app/app/dramabox)

## What's different from the other Spaces

This Space is a **tools-only** Space:

- The `/api/tools/dramabox` endpoint runs Resemble Dramabox against a scene prompt
  (quoted dialogue, with stage directions outside the quotes).
- The other pipeline endpoints (dub, voice-clone, subtitles, audio-cleanup) are
  still mounted and reachable, but the frontend never routes traffic to this Space for them.

## Prompt grammar

```
<speaker description>, "<dialogue>" <action> "<more dialogue>"
```

- Text inside quotes is **spoken**: `"Hello, how are you?"`, or phonetic sounds such as `"Hahaha"` and `"Mmmmm"`.
- Text outside quotes is a **stage direction**: `She sighs deeply.`, `He clears his throat.`
- Avoid writing onomatopoeia (`Sigh`, `Ahem`, `Gasp`) inside quotes, because the model will
  speak the words literally. Describe the sound as a stage direction instead.
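
The quoting rules above can be checked mechanically before a prompt is sent. The following is a minimal sketch, not part of the Space's code: `split_prompt` and `find_literal_sound_words` are hypothetical helper names, the word list is taken only from the three examples above, and the sketch assumes straight double quotes that are balanced. Text outside quotes, including the leading speaker description, is labelled as a direction.

```python
import re
from typing import List, Tuple

# Words from the README's examples that would be read aloud if quoted.
# This set is illustrative only; extend it as needed.
LITERAL_SOUND_WORDS = {"sigh", "ahem", "gasp"}


def split_prompt(prompt: str) -> List[Tuple[str, str]]:
    """Split a scene prompt into ("spoken", text) and ("direction", text) segments."""
    segments: List[Tuple[str, str]] = []
    # With a capturing group, re.split keeps the quoted text at odd indices
    # and the text between quotes at even indices.
    parts = re.split(r'"([^"]*)"', prompt)
    for index, part in enumerate(parts):
        if index % 2 == 1:
            kind, text = "spoken", part.strip()
        else:
            # Drop the separating comma after the speaker description.
            kind, text = "direction", part.strip(" ,")
        if text:
            segments.append((kind, text))
    return segments


def find_literal_sound_words(prompt: str) -> List[str]:
    """Return quoted words that the model would speak literally."""
    hits: List[str] = []
    for kind, text in split_prompt(prompt):
        if kind != "spoken":
            continue
        for word in re.findall(r"[A-Za-z]+", text):
            if word.lower() in LITERAL_SOUND_WORDS:
                hits.append(word)
    return hits


example = 'A tired woman, "Hello, how are you?" She sighs deeply. "I missed you."'
segments = split_prompt(example)
warnings = find_literal_sound_words('A man, "Sigh. I am so tired." He yawns.')
```

Here `segments` alternates between direction and spoken entries, and `warnings` flags the quoted `Sigh`, which would be better written outside the quotes as `He sighs.`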

## Setup notes

Space Secrets:

- `TTS_ENGINE=dramabox` (required)
- `HF_TOKEN` (required; same as the other VideoVoice Spaces, used for model downloads)
- `LTX_DTYPE=bf16` (optional; matches the upstream default)
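
Space Secrets are exposed to the app as environment variables, so the settings above can be validated once at startup. This is a sketch under assumptions: `load_dramabox_settings` is a hypothetical helper rather than a function from this repo, and the only default it applies is the `bf16` value that the list above describes as the upstream default.

```python
from typing import Dict, Mapping


def load_dramabox_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Validate the Space Secrets listed above and return them as a dict."""
    engine = env.get("TTS_ENGINE", "")
    if engine != "dramabox":
        raise RuntimeError(f"TTS_ENGINE must be 'dramabox', got {engine!r}")

    token = env.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is required for model downloads")

    return {
        "tts_engine": engine,
        "hf_token": token,
        # Optional secret: fall back to the documented upstream default.
        "ltx_dtype": env.get("LTX_DTYPE", "bf16"),
    }


# Example with an explicit mapping; in the Space, pass os.environ instead.
settings = load_dramabox_settings({"TTS_ENGINE": "dramabox", "HF_TOKEN": "hf_example"})
```

Passing a mapping rather than reading `os.environ` inside the function keeps the check easy to exercise in tests without touching the real environment.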

Required vendored source (committed to the backend repo and deployed via [deploy.sh](https://github.com/Video-Voice/VideoVoice-be/blob/main/deploy.sh)):

- `dramabox_src/`: a copy of [ResembleAI/Dramabox `src/`](https://huggingface.co/spaces/ResembleAI/Dramabox/tree/main/src). The `tools_api/dramabox.py` worker adds this directory to `sys.path` lazily, on the first request.
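
The lazy `sys.path` step can be pictured with the sketch below. It is not the code in `tools_api/dramabox.py`, which is not shown here: `ensure_on_sys_path` is a hypothetical name, and inserting at the front of `sys.path` (rather than appending) is an assumption.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(vendored_dir: Path) -> bool:
    """Add vendored_dir to sys.path once; return True if this call added it."""
    resolved = str(vendored_dir.resolve())
    if resolved in sys.path:
        # Already importable, so later requests skip the work.
        return False
    # Assumption: vendored modules should take priority over installed ones.
    sys.path.insert(0, resolved)
    return True


# A request handler would call this just before importing the vendored package.
first_call = ensure_on_sys_path(Path("dramabox_src"))
second_call = ensure_on_sys_path(Path("dramabox_src"))
```

Deferring the call until the first request means the other endpoints in the same process start without paying for the vendored package's imports.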

## Acknowledgements

Built on [Resemble AI's Dramabox](https://huggingface.co/spaces/ResembleAI/Dramabox). All generated audio is invisibly watermarked with Resemble PerTh.