Kuroki Tomoko qwen3-tts-1.7b finetune. A wild Tomoko appears! sound sample

A single-voice English text-to-speech model, fine-tuned from Qwen3-TTS-12Hz-1.7B-base on about four minutes of the English dub voice of Kuroki Tomoko from Watamote. I trained to epoch 40 but found that epoch 20 best captured the Tomoko nuances, so this model is the epoch 20 checkpoint.

install:

git clone https://github.com/andimarafioti/faster-qwen3-tts.git
cd faster-qwen3-tts
# make sure you have uv installed
# make sure you have sox installed
uv venv --python 3.12
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
uv pip install faster-qwen3-tts
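
The install steps assume uv and sox are already on your PATH. A minimal sketch of a preflight check, using only the Python standard library (the helper name `missing_tools` is made up for this example and is not part of faster-qwen3-tts):

```python
import shutil


def missing_tools(tools=("uv", "sox"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment; by default it is shutil.which.
    """
    return [name for name in tools if which(name) is None]
```

Calling `missing_tools()` before running the commands above returns an empty list when both tools are available, or the names of whichever ones still need installing.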

test:

python examples/openai_server.py --ref-text "$(cat tomoko88.txt)" --ref-audio tomoko88.wav --language English --port 8880
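
Once the server is listening on port 8880, a request along the following lines should return audio. This is a sketch under assumptions, not the server's documented interface: it assumes the example server mirrors OpenAI's speech endpoint at `/v1/audio/speech` and accepts (or ignores) the `model`, `voice`, and `response_format` fields. The values `tts-1` and `tomoko` are placeholders, and the helper names are made up for this example. Check `examples/openai_server.py` for the routes and fields it actually handles.

```python
import json
import urllib.request


def build_speech_request(text, host="localhost", port=8880,
                         voice="tomoko", model="tts-1"):
    """Build (but do not send) a POST request for the speech endpoint.

    ASSUMPTION: the path and JSON fields follow OpenAI's speech API;
    `voice` and `model` are placeholder values for illustration.
    """
    url = f"http://{host}:{port}/v1/audio/speech"
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="tomoko_test.wav", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path

# Example (requires the server from the command above to be running):
# synthesize("A wild Tomoko appears!")
```

Building the request separately from sending it makes it easy to inspect the URL and payload if the server rejects the call.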

build docker container:

# use the Dockerfile provided here
docker build -t faster-qwen3-tts:latest .

run docker container:

# use the docker-compose.yml provided here
docker compose up -d