Rafii / videovoice-dramabox / README.md

Last commit ee10bf8 by github-actions[bot] (8 days ago): deploy: switch to dramabox requirements @ 93b2b51

	---
	title: VideoVoice Dramabox
	emoji: 🎭
	colorFrom: red
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.7.1
	app_file: app.py
	python_version: "3.10"
	pinned: true
	short_description: Resemble Dramabox — directable speech for VideoVoice
	---

	<!--
	ZeroGPU is enabled from the Space Settings UI (not via frontmatter).
	This Space serves Resemble's Dramabox "directable speech engine" via
	POST /api/tools/dramabox. The dub pipeline is reachable but rejects
	voice_mode != "dramabox" (server.py), and the frontend never routes
	dub requests here.

	IMPORTANT: sdk_version is pinned to 5.7.1 to match the upstream
	ResembleAI/Dramabox Space. Reasons:
	- gradio 6.x pulls in pydantic >= 2.11.
	- pydantic 2.11+ emits the bool shorthand `additionalProperties: True`,
	which crashes gradio_client schema parsing.
	- Dramabox needs pydantic 2.10.6 (per the upstream requirements.txt),
	and that version is incompatible with gradio 6.x.
	Bumping this to match the other Spaces (6.12.0) breaks the build.
	-->

	# VideoVoice — Dramabox

	Resemble AI's directable speech engine, mounted as a VideoVoice tool tab.

	- Endpoint: `POST /api/tools/dramabox`
	- Frontend: [/app/dramabox](https://videovoice.app/app/dramabox)
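
A minimal sketch of calling this endpoint with Python's standard library. The request body is an assumption: the JSON field name `prompt`, the helper `build_dramabox_request`, and the placeholder host are hypothetical, so check `server.py` for the real schema and the backend's base URL. The sketch only builds the request, because this README does not describe the response format.

```python
import json
import urllib.request

ENDPOINT_PATH = "/api/tools/dramabox"


def build_dramabox_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the Dramabox tool.

    Hypothetical payload: the field name "prompt" is an assumption.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host; substitute the backend's real base URL.
    req = build_dramabox_request(
        "https://example.invalid",
        'An exhausted sailor, "We made it." He exhales slowly. "Barely."',
    )
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req), then reading the response body.
```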

	## What's different from the other Spaces

	This is a tools-only Space:
	- The `/api/tools/dramabox` endpoint runs Resemble Dramabox on a scene prompt
	(quoted dialogue plus stage directions outside the quotes).
	- The other pipeline endpoints (dub, voice-clone, subtitles, audio-cleanup) remain
	reachable (the dub endpoint, for example, rejects any `voice_mode` other than `"dramabox"`),
	but the frontend never routes traffic for them here.
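
The dub endpoint's guard (per `server.py`, any `voice_mode` other than `"dramabox"` is rejected) can be sketched as a small check a handler runs before starting the pipeline. The helper name `voice_mode_error` and the message text are hypothetical; the real check in `server.py` may be structured differently.

```python
from typing import Optional

# The only voice_mode this Space's dub pipeline accepts.
ALLOWED_VOICE_MODE = "dramabox"


def voice_mode_error(voice_mode: str) -> Optional[str]:
    """Return an error message for an unsupported voice_mode, else None.

    Hypothetical helper: a handler would turn a non-None result into an
    error response instead of running the dub pipeline.
    """
    if voice_mode != ALLOWED_VOICE_MODE:
        return f'voice_mode {voice_mode!r} is not served by this Space; use "dramabox"'
    return None
```

Returning a message rather than raising keeps the sketch agnostic about the web framework; a FastAPI or Gradio handler can map it to whatever error shape it uses.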

	## Prompt grammar

	```
	<speaker description>, "<dialogue>" <action> "<more dialogue>"
	```

	- Text inside quotes is spoken, including phonetic sounds: `"Hello, how are you?"`, `"Hahaha"`, `"Mmmmm"`.
	- Text outside quotes is a stage direction: `She sighs deeply.`, `He clears his throat.`
	- Do not write onomatopoeia such as `Sigh`, `Ahem`, or `Gasp` inside quotes; the model will
	speak them literally. Describe the action outside the quotes instead.
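
A small sketch that assembles prompts in this grammar and flags the onomatopoeia mistake before a request is sent. The helpers `build_prompt` and `lint_prompt` are hypothetical, and `STAGE_WORDS` only covers the three examples above; it is an illustrative assumption, not a list the model publishes.

```python
import re

# Words that describe a sound rather than spell it out. Illustrative only.
STAGE_WORDS = {"sigh", "ahem", "gasp"}

QUOTED = re.compile(r'"([^"]*)"')


def build_prompt(speaker: str, *segments: tuple[str, str]) -> str:
    """Join a speaker description with ("say", text) / ("do", text) segments."""
    parts = []
    for kind, text in segments:
        if kind == "say":
            parts.append(f'"{text}"')  # spoken: wrap in quotes
        elif kind == "do":
            parts.append(text)  # stage direction: leave unquoted
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return f"{speaker}, " + " ".join(parts)


def lint_prompt(prompt: str) -> list[str]:
    """Return stage-direction words that appear inside quoted dialogue."""
    found = []
    for dialogue in QUOTED.findall(prompt):
        for word in re.findall(r"[A-Za-z]+", dialogue):
            if word.lower() in STAGE_WORDS:
                found.append(word)
    return found
```

For example, `build_prompt("A nervous young man", ("say", "Is anyone there?"), ("do", "He clears his throat."), ("say", "Hello?"))` yields `A nervous young man, "Is anyone there?" He clears his throat. "Hello?"`, while `lint_prompt('A tired woman, "Sigh. Fine, I will go."')` returns `["Sigh"]`. Phonetic sounds such as `"Hahaha"` pass the check, matching the first rule above.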

	## Setup notes

	Space Secrets (`TTS_ENGINE` and `HF_TOKEN` are required; `LTX_DTYPE` is optional):
	- `TTS_ENGINE=dramabox`
	- `HF_TOKEN` (the same token as the other VideoVoice Spaces; used for model downloads)
	- `LTX_DTYPE=bf16` (optional; matches the upstream default)
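
One way to validate these secrets once, for example at startup, sketched with the standard library. `DramaboxSettings`, `settings_problems`, and `load_settings` are hypothetical names, not the backend's actual code; only the variable names and the `bf16` default come from the list above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DramaboxSettings:
    tts_engine: str
    hf_token: str
    ltx_dtype: str


def settings_problems(env: Mapping[str, str]) -> list[str]:
    """List everything wrong with the required Space Secrets (empty if OK)."""
    problems = []
    if env.get("TTS_ENGINE") != "dramabox":
        problems.append('TTS_ENGINE must be set to "dramabox"')
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is missing (needed for model downloads)")
    return problems


def load_settings(env: Mapping[str, str] = os.environ) -> DramaboxSettings:
    """Validate the environment and return typed settings, or raise."""
    problems = settings_problems(env)
    if problems:
        raise RuntimeError("; ".join(problems))
    # LTX_DTYPE is optional; "bf16" mirrors the upstream default.
    return DramaboxSettings(env["TTS_ENGINE"], env["HF_TOKEN"], env.get("LTX_DTYPE", "bf16"))
```

Collecting every problem before raising means a misconfigured Space reports all missing secrets in one log line instead of one per restart.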

	Required vendored source (committed to the backend repo and deployed via [deploy.sh](https://github.com/Video-Voice/VideoVoice-be/blob/main/deploy.sh)):
	- `dramabox_src/`: a copy of [ResembleAI/Dramabox `src/`](https://huggingface.co/spaces/ResembleAI/Dramabox/tree/main/src). The `tools_api/dramabox.py` worker adds this directory to `sys.path` lazily, on the first request.
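
The lazy `sys.path` step might look like the sketch below. `ensure_on_path` is a hypothetical helper, and the location of `dramabox_src/` relative to the worker is an assumption the caller passes in. Calling it from the request handler rather than at import time means nothing from `dramabox_src` is imported until the first request; a plausible (assumed) motive is keeping the heavy upstream imports out of Space startup.

```python
import sys
from pathlib import Path


def ensure_on_path(src_dir: Path) -> bool:
    """Prepend src_dir to sys.path if it is not already there.

    Returns True if the path was added, False if it was already present,
    so calling it on every request is cheap and idempotent.
    """
    path = str(src_dir.resolve())
    if path in sys.path:
        return False
    # Prepend so vendored upstream modules win over same-named installed ones.
    sys.path.insert(0, path)
    return True
```

In a worker, a call such as `ensure_on_path(repo_root / "dramabox_src")` (with `repo_root` a hypothetical path to the deployed repo) would come first in the handler, followed by the upstream imports; their module names depend on the vendored `src/` tree, so none are shown.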

	## Acknowledgements

	Built on [Resemble AI's Dramabox](https://huggingface.co/spaces/ResembleAI/Dramabox). All generated audio is invisibly watermarked with Resemble PerTh.