---
license: openrail
language:
- en
- ko
- ja
- ar
- bg
- cs
- da
- de
- el
- es
- et
- fi
- fr
- hi
- hr
- hu
- id
- it
- lt
- lv
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sv
- tr
- uk
- vi
pipeline_tag: text-to-speech
tags:
- text-to-speech
- speech-synthesis
- tts
- onnx
- multilingual
- on-device
library_name: supertonic
---
# Supertonic 3 | Lightning Fast, On-Device, Accurate TTS

<p align="center">
<a href="https://huggingface.co/spaces/Supertone/supertonic-3"><img src="https://img.shields.io/badge/Demo-Hugging_Face-yellow?style=for-the-badge" alt="Demo"></a>
<a href="https://github.com/supertone-inc/supertonic"><img src="https://img.shields.io/badge/Code-GitHub-black?style=for-the-badge&logo=github" alt="Code"></a>
<a href="https://pypi.org/project/supertonic/"><img src="https://img.shields.io/badge/Python-SDK-blue?style=for-the-badge&logo=python" alt="Python SDK"></a>
</p>

**Supertonic** is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis.

**Supertonic 3** expands the open-weight release from 5 to **31 languages**, improves reading stability, and reduces repeat/skip failures.
## Quick Start
Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.
```bash
pip install supertonic
```
```python
from supertonic import TTS
tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "A gentle breeze moved through the open window while everyone listened to the story."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
```
## What's New in Supertonic 3
- **31 languages**: expanded from the 5-language Supertonic 2 release.
- **More stable reading**: fewer repeat and skip failures, especially on very short and very long utterances.
- **Higher speaker similarity**: improved similarity across the shared-language set compared with Supertonic 2.
- **Expression tags**: supports simple tags such as `<laugh>`, `<breath>`, and `<sigh>`.
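
As a minimal sketch of how expression tags might be placed in input text and checked before synthesis: only `<laugh>`, `<breath>`, and `<sigh>` are named above, and the full tag set is not listed here, so `KNOWN_TAGS` and the helper `find_unknown_tags` are illustrative assumptions rather than part of the SDK.

```python
import re

# Tags named in this model card. Assumption: the SDK may accept others,
# so this set is likely incomplete.
KNOWN_TAGS = {"laugh", "breath", "sigh"}

# Matches simple lowercase tags such as <laugh>.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_unknown_tags(text: str) -> list[str]:
    """Return tag names in `text` that are not in KNOWN_TAGS, in order of appearance."""
    return [name for name in TAG_PATTERN.findall(text) if name not in KNOWN_TAGS]


text = "I can't believe it worked <laugh> let me catch my breath <breath> and try again."
unknown = find_unknown_tags(text)
if unknown:
    print(f"Tags not named in the model card: {unknown}")
else:
    print("All tags are among those named in the model card.")
```

The checked string can then be passed to `tts.synthesize(text, voice_style=style, lang="en")` exactly as in the Quick Start.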
## Custom Voices and Audio Samples
The open-weight package includes fixed preset voice styles for immediate local inference. If you want to hear how Supertonic 3 performs with zero-shot custom voice styles, visit the [Audio Sample Demo](https://supertonic3.github.io/) to compare reference audio and generated speech across several use cases. To create your own Supertonic 3 voice-style JSON from reference audio, use [Supertonic Voice Builder](https://supertonic.supertone.ai/voice-builder); purchased Voice Builder styles include downloadable embeddings for both Supertonic 2 and Supertonic 3.
Here are a few reference/generated pairs from the audio sample demo:
**Call center, English**
Text: Good morning, thank you for calling. How can I help you today?
| Reference voice | Supertonic 3 output |
|---|---|
| <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/nora_reference.wav"></audio> | <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/nora_supertonic3.wav"></audio> |
**Character voice, Japanese**
Text: ใตใตใฃใ้ๅฑใใฆใใจใใใชใฎใใกใใใฉใใ้ใณ็ธๆใ่ฆใคใใใใโช
| Reference voice | Supertonic 3 output |
|---|---|
| <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/moka_reference.wav"></audio> | <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/moka_supertonic3.wav"></audio> |
**Elder character voice, Korean**
Text: ํผ์ ๋ ๋๊ธฐ์ ๊ธธ์ด ํํ๊ตฌ๋. ์ด ๋ก์ ๊ฒ์ ๊ฐ์ ธ๊ฐ๊ฑฐ๋ผ. ์ธ์ ๊ฐ ์ด๋ ์ด ๋ค ์ด๋ฆ์ ๋ถ๋ฅด๋๋ผ๋, ๋ถ๋ ๋น์ ์์ง ๋ง๊ฑฐ๋ผ.
| Reference voice | Supertonic 3 output |
|---|---|
| <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/alphonse_reference.wav"></audio> | <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/alphonse_supertonic3.wav"></audio> |
**Audiobook, English**
Text: I was not afraid of silence. I had lived with it long enough to know that, sometimes, it speaks more honestly than people do.
| Reference voice | Supertonic 3 output |
|---|---|
| <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/luna_reference.wav"></audio> | <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/luna_supertonic3.wav"></audio> |
**Audiobook, Japanese**
Text: ใใฎๆใใญใณใใณใฎ้งใฏใใคใซใชใไฝใๅใใใใฆใใใ็งใฏใใ ใฎ่จชๅ่
ใ ใจๆใฃใฆใใใใใใผใ ใบใฎ็ฎใฏใใงใซๅฅใฎ็ต่ซใซใใฉใ็ใใฆใใใ
| Reference voice | Supertonic 3 output |
|---|---|
| <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/watson_reference.wav"></audio> | <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/watson_supertonic3.wav"></audio> |
**News, English**
Text: Here's a story worth paying attention to. Supertone has released Supertonic 3, its on-device TTS model. This version expands support to thirty-one languages and improves reading stability.
| Reference voice | Supertonic 3 output |
|---|---|
| <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/keld_reference.wav"></audio> | <audio controls preload="metadata" src="https://huggingface.co/Supertone/supertonic-3/resolve/main/audio_samples/keld_supertonic3.wav"></audio> |
## Performance Highlights
Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.
### Reading Accuracy
<p align="center">
<img src="img/metrics/s3_vs_measured_wer_range_voxcpm2.png" alt="Supertonic 3 reading accuracy compared with measured model ranges and VoxCPM2">
</p>

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.
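
For readers unfamiliar with the two metrics, the sketch below shows the standard definitions: word error rate (WER) and character error rate (CER) are both an edit distance divided by the reference length, computed over words and characters respectively. This is not the evaluation pipeline behind the figure; the transcription model and text normalization used there are not described here, and the function names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces; suited to scripts without word spacing."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One dropped word out of five reference words.
print(wer("thank you for calling today", "thank you for calling"))
```

A skipped word counts as a deletion and a repeated word as an insertion, so repeat and skip failures raise both metrics.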
### Supertonic 2 to Supertonic 3
<p align="center">
<img src="img/metrics/supertonic2_vs_3_comparison.png" alt="Supertonic 2 and Supertonic 3 comparison">
</p>

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.
### Runtime Footprint
<p align="center">
<img src="img/metrics/runtime_cpu_gpu_latency_memory.png" alt="Supertonic CPU runtime compared with GPU baselines">
</p>

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on an A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
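
One common way to check speed on your own hardware is the real-time factor: wall-clock synthesis time divided by the duration of the audio produced. The sketch below is an assumption about how you might measure it, not the method used for the figure above. `real_time_factor` is a hypothetical helper that accepts any callable returning `(wav, duration)`, the same shape as the Quick Start `synthesize` call, and it assumes `duration` converts to a float number of seconds.

```python
import time


def real_time_factor(synthesize_fn, text: str) -> float:
    """Wall-clock synthesis time divided by the duration of the generated audio.

    `synthesize_fn` takes a text string and returns (wav, duration_seconds).
    Values below 1.0 mean synthesis is faster than real time.
    """
    start = time.perf_counter()
    _, duration = synthesize_fn(text)
    elapsed = time.perf_counter() - start
    return elapsed / float(duration)


# With the SDK installed, the Quick Start objects could be plugged in like this:
#
#   from supertonic import TTS
#   tts = TTS(auto_download=True)
#   style = tts.get_voice_style(voice_name="M1")
#   rtf = real_time_factor(
#       lambda t: tts.synthesize(t, voice_style=style, lang="en"),
#       "A gentle breeze moved through the open window.",
#   )
#   print(f"Real-time factor: {rtf:.3f}")
```

The first call may include model loading or warm-up cost, so timing a second call usually gives a more representative number.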
### Model Size
<p align="center">
<img src="img/metrics/model_size_comparison.png" alt="Model size comparison">
</p>

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
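
To put the parameter count in concrete terms, the following back-of-the-envelope sketch converts parameters to raw weight size. The numeric precision of the ONNX assets is not stated here, so the byte widths are assumptions, and real files also carry graph metadata on top of the weights.

```python
def weight_size_mb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


PARAMS_SUPERTONIC3 = 99e6  # "about 99M parameters", from the text above

# Assumed precisions; the model card does not say which one the assets use.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_size_mb(PARAMS_SUPERTONIC3, nbytes):.0f} MB of weights")

# Parameter-count ratio against both ends of the 0.7B to 2B comparison range.
for larger in (0.7e9, 2e9):
    ratio = larger / PARAMS_SUPERTONIC3
    print(f"{larger / 1e9:.1f}B parameters is ~{ratio:.1f}x Supertonic 3")
```

Under these assumptions the weights come to roughly 396 MB at float32, 198 MB at float16, and 99 MB at int8, and the compared systems are about 7 to 20 times larger by parameter count.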
## Supported Languages
| Code | Language | Code | Language | Code | Language | Code | Language |
|------|----------|------|----------|------|----------|------|----------|
| `en` | English | `ko` | Korean | `ja` | Japanese | `ar` | Arabic |
| `bg` | Bulgarian | `cs` | Czech | `da` | Danish | `de` | German |
| `el` | Greek | `es` | Spanish | `et` | Estonian | `fi` | Finnish |
| `fr` | French | `hi` | Hindi | `hr` | Croatian | `hu` | Hungarian |
| `id` | Indonesian | `it` | Italian | `lt` | Lithuanian | `lv` | Latvian |
| `nl` | Dutch | `pl` | Polish | `pt` | Portuguese | `ro` | Romanian |
| `ru` | Russian | `sk` | Slovak | `sl` | Slovenian | `sv` | Swedish |
| `tr` | Turkish | `uk` | Ukrainian | `vi` | Vietnamese | | |
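
The codes in the table are the values passed as `lang` in the Quick Start. As a small sketch, they can be validated before calling the SDK. `SUPPORTED_LANGS` is copied from the table above and `check_lang` is a hypothetical helper; the SDK may perform its own validation.

```python
# Language codes copied from the Supported Languages table (31 entries).
SUPPORTED_LANGS = {
    "en", "ko", "ja", "ar", "bg", "cs", "da", "de",
    "el", "es", "et", "fi", "fr", "hi", "hr", "hu",
    "id", "it", "lt", "lv", "nl", "pl", "pt", "ro",
    "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def check_lang(code: str) -> str:
    """Normalize a language code and raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGS:
        raise ValueError(
            f"Unsupported language code {code!r}; expected one of {sorted(SUPPORTED_LANGS)}"
        )
    return normalized


print(check_lang("KO"))      # normalized to lowercase
print(len(SUPPORTED_LANGS))  # number of codes in the table
```

The returned code can be passed straight through, for example `tts.synthesize(text, voice_style=style, lang=check_lang("ko"))`.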
## License
This project's sample code is released under the MIT License. See the [GitHub repository](https://github.com/supertone-inc/supertonic) for details.
The accompanying model is released under the OpenRAIL-M License. See the [LICENSE](https://huggingface.co/Supertone/supertonic-3/blob/main/LICENSE) file in this repository for details.
This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the [PyTorch license](https://docs.pytorch.org/FBGEMM/general/License.html) for details.
Copyright (c) 2026 Supertone Inc.