Kuroki Tomoko qwen3-tts-1.7b finetune.
A single-voice English text-to-speech model fine-tuned from Qwen3-TTS-12Hz-1.7B-base on about four minutes of audio of the English dub voice of Watamote's Kuroki Tomoko character. I trained for 40 epochs but found that the epoch 20 checkpoint was best at capturing the Tomoko nuances, so this model is the epoch 20 checkpoint.
install:
git clone https://github.com/andimarafioti/faster-qwen3-tts.git
cd faster-qwen3-tts
# make sure you have uv installed
# make sure you have sox installed
uv venv --python 3.12
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
uv pip install faster-qwen3-tts
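The install steps above assume uv and sox are already on your PATH. A small sketch of a check you could run first is below; the helper name check_prereqs is made up for this example and is not part of faster-qwen3-tts.

```shell
# Hypothetical helper (not part of faster-qwen3-tts): report which of the
# named tools are on PATH, and return non-zero if any is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The two tools the install comments call out.
check_prereqs uv sox || echo "install the missing tools before continuing"
```

If either tool is reported missing, install it with your system's package manager before creating the virtual environment.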
test:
python examples/openai_server.py --ref-text "$(cat tomoko88.txt)" --ref-audio tomoko88.wav --language English --port 8880
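Once the server is running, a request along the following lines may work as a quick smoke test. This is a sketch under assumptions: it assumes the example server mirrors the OpenAI speech route (POST /v1/audio/speech) and accepts the same JSON field names. The "model" and "voice" values are placeholders borrowed from the OpenAI API, and the server may ignore or reject them. The helper names speech_payload and request_speech are made up for this example.

```shell
# Hypothetical helper: build an OpenAI-style speech request body.
# $1 = text to synthesize. This simple sketch does no JSON escaping, so the
# text must not contain double quotes or backslashes.
speech_payload() {
    printf '{"model": "tts-1", "input": "%s", "voice": "alloy", "response_format": "wav"}' "$1"
}

# Hypothetical helper: send the request to the local server started above.
# Assumption: the server listens on port 8880 and serves /v1/audio/speech.
# $1 = text to synthesize, $2 = output file path.
request_speech() {
    curl -sS "http://localhost:8880/v1/audio/speech" \
        -H "Content-Type: application/json" \
        -d "$(speech_payload "$1")" \
        --output "$2"
}

# Example call (requires the server to be running):
# request_speech "Hello from the fine-tuned voice." out.wav
```

If the route or field names differ, check examples/openai_server.py in the cloned repository for the paths and parameters it actually defines.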
build docker container:
# use the Dockerfile provided here
docker build -t faster-qwen3-tts:latest .
run docker container:
# use the docker-compose.yml provided here
docker compose up -d