---
title: Chatterbox Voice Studio
emoji: "🎙"
colorFrom: yellow
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Voice cloning studio for the Chatterbox TTS family.
---

# Chatterbox Voice Studio

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md.svg)](https://huggingface.co/spaces/techfreakworm/chatterbox-voice-studio)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)

A multi-platform, browser-based voice cloning studio for the Chatterbox TTS family
(English, Turbo, Multilingual). Runs locally on macOS (MPS), Linux (CUDA/CPU),
and Windows (CUDA/CPU). Deploys to Hugging Face Spaces (Docker SDK, Free CPU
by default; paid GPU tiers supported).

## Quick start (local)

### macOS / Linux

    ./scripts/start.sh

Prereqs: Python 3.11+ and Node.js 20+. If missing, the script will tell you
the one-line install command for your platform (`brew install python@3.11`
on macOS, `apt install python3.11 python3.11-venv` on Debian/Ubuntu).

### Windows

    scripts\start.bat

If Python 3.11 or Node.js LTS isn't installed, the script will detect that
and offer to install them via `winget` (built into Windows 10 1809+ and
Windows 11). Accept the prompt and re-run `scripts\start.bat` after install
finishes so the new PATH takes effect.

On either platform, the start script creates a venv, installs Python and Node
deps, builds the SPA, and opens the studio at http://127.0.0.1:7860.

## Hugging Face Spaces

This repo's `Dockerfile` is what HF Spaces uses to build the image.
On Free CPU it runs as-is, but generation will be slow (30–90s per clip).

To get GPU on Spaces, switch the Space hardware to a paid tier (T4 small,
A10G, L4). ZeroGPU is **not** available on Docker Spaces; it is currently
restricted to the Gradio SDK.

## Environment variables

| Var | Default | Purpose |
|---|---|---|
| `CHATTERBOX_DEVICE` | (auto) | Force `cuda` / `mps` / `cpu`. |
| `HF_HOME` | `/tmp/hf` | Hugging Face cache. |
| `CORS_ORIGINS` | `http://localhost:5173,...` | Comma-separated list of allowed CORS origins. |
| `PYTORCH_ENABLE_MPS_FALLBACK` | `1` (mac) | CPU fallback for unimplemented MPS ops. |
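
As a rough illustration of how these variables might be consumed, here is a
minimal Python sketch. The helper names (`resolve_device`,
`parse_cors_origins`) and the auto-detection order (CUDA, then MPS, then CPU)
are assumptions made for this example, not the project's actual code. The
full default for `CORS_ORIGINS` is elided in the table above, so the sketch
falls back to an empty list instead of guessing it.

```python
from typing import List, Mapping

VALID_DEVICES = ("cuda", "mps", "cpu")


def resolve_device(env: Mapping[str, str], cuda_available: bool, mps_available: bool) -> str:
    """Pick a device string; hypothetical helper, not the project's API."""
    forced = env.get("CHATTERBOX_DEVICE", "").strip().lower()
    if forced:
        # An explicit override wins, but only if it names a known device.
        if forced not in VALID_DEVICES:
            raise ValueError(f"CHATTERBOX_DEVICE must be one of {VALID_DEVICES}, got {forced!r}")
        return forced
    # Assumed auto-detection order: CUDA first, then MPS, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def parse_cors_origins(env: Mapping[str, str], default: str = "") -> List[str]:
    """Split the comma-separated CORS_ORIGINS value, dropping empty entries."""
    raw = env.get("CORS_ORIGINS", default)
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

In a real process the flags would typically come from
`torch.cuda.is_available()` and `torch.backends.mps.is_available()`, with
`os.environ` passed as `env`. Taking them as arguments keeps the sketch
runnable without PyTorch installed.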

## Models

| ID | Source | Languages | Tags |
|---|---|---|---|
| `chatterbox-en` | `chatterbox.tts.ChatterboxTTS` | English | (none) |
| `chatterbox-turbo` | `chatterbox.tts_turbo.ChatterboxTurboTTS` | English | `[laugh]` `[cough]` `[chuckle]` |
| `chatterbox-mtl` | `chatterbox.mtl_tts.ChatterboxMultilingualTTS` | 23 langs | (TBD) |
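
One way to read the Source column is as a dotted import path that can be
resolved lazily, so a model's module is only imported when that model is
requested. The sketch below shows that idea using the IDs and paths from the
table. The names `MODEL_SOURCES`, `split_source`, and `load_model_class` are
hypothetical, and whether the studio resolves its adapters this way is an
assumption.

```python
import importlib
from typing import Any, Dict, Tuple

# IDs and dotted paths copied from the table above.
MODEL_SOURCES: Dict[str, str] = {
    "chatterbox-en": "chatterbox.tts.ChatterboxTTS",
    "chatterbox-turbo": "chatterbox.tts_turbo.ChatterboxTurboTTS",
    "chatterbox-mtl": "chatterbox.mtl_tts.ChatterboxMultilingualTTS",
}


def split_source(dotted: str) -> Tuple[str, str]:
    """Split 'pkg.module.ClassName' into ('pkg.module', 'ClassName')."""
    module_path, _, class_name = dotted.rpartition(".")
    if not module_path or not class_name:
        raise ValueError(f"not a dotted class path: {dotted!r}")
    return module_path, class_name


def load_model_class(model_id: str) -> Any:
    """Import the module for a model ID on demand and return its class."""
    if model_id not in MODEL_SOURCES:
        raise KeyError(f"unknown model id: {model_id!r}")
    module_path, class_name = split_source(MODEL_SOURCES[model_id])
    module = importlib.import_module(module_path)  # import happens only here
    return getattr(module, class_name)
```

Calling `load_model_class("chatterbox-turbo")` requires the `chatterbox`
package to be installed; how the returned class is then instantiated is
outside the scope of this sketch.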

## Development

Backend tests:

    .venv/bin/pytest

Frontend tests:

    cd web && npm run test

Frontend dev server (with API proxy):

    cd web && npm run dev    # http://localhost:5173

## Smoke test

With the server running:

    scripts/smoke.sh

## Architecture

- **Backend:** FastAPI + uvicorn. Three Chatterbox model adapters behind a
  swap-on-demand registry. Server is **stateless**; nothing user-visible
  persists across server restarts.
- **Frontend:** React + Vite + Tailwind + shadcn/ui. Voice library and
  generation history live in the browser via IndexedDB (Dexie).
- **One-click:** `scripts/start.sh` (mac/linux) or `scripts/start.bat`
  (windows) handles venv, install, build, serve, and opens Chrome.
- **HF Spaces:** Dockerfile multi-stage build: the Node stage builds the SPA,
  and the Python stage runs uvicorn with the bundled static files.
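
The "swap-on-demand registry" mentioned for the backend can be pictured as a
holder that keeps at most one model loaded and replaces it when a different
ID is requested. The following is a minimal sketch under that assumption;
`SwapOnDemandRegistry` and its loader callables are hypothetical names, and
the real registry may differ (for example in locking or device cleanup).

```python
from typing import Any, Callable, Dict, Optional


class SwapOnDemandRegistry:
    """Keeps a single active model; loads another only when asked for it."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders
        self._active_id: Optional[str] = None
        self._active_model: Any = None

    @property
    def active_id(self) -> Optional[str]:
        return self._active_id

    def get(self, model_id: str) -> Any:
        if model_id not in self._loaders:
            raise KeyError(f"unknown model id: {model_id!r}")
        if model_id != self._active_id:
            # Drop the previous model first so its memory can be reclaimed
            # before the next one is loaded.
            self._active_model = None
            self._active_id = None
            self._active_model = self._loaders[model_id]()
            self._active_id = model_id
        return self._active_model
```

Repeated calls with the same ID reuse the loaded instance, while switching
IDs pays the load cost again. That trade favors low memory use over fast
switching, which suits a single-GPU or CPU-only host.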