How to use Supertone/supertonic-3 with Supertonic:
    from supertonic import TTS

    tts = TTS(auto_download=True)
    style = tts.get_voice_style(voice_name="M1")
    text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
    wav, duration = tts.synthesize(text, voice_style=style)
    tts.save_audio(wav, "output.wav")
Add zero-shot audio samples
- .gitattributes +1 -0
- README.md +11 -0
- audio_samples/alphonse_reference.wav +3 -0
- audio_samples/alphonse_supertonic3.wav +3 -0
- audio_samples/keld_reference.wav +3 -0
- audio_samples/keld_supertonic3.wav +3 -0
- audio_samples/luna_reference.wav +3 -0
- audio_samples/luna_supertonic3.wav +3 -0
- audio_samples/moka_reference.wav +3 -0
- audio_samples/moka_supertonic3.wav +3 -0
- audio_samples/nora_reference.wav +3 -0
- audio_samples/nora_supertonic3.wav +3 -0
- audio_samples/watson_reference.wav +3 -0
- audio_samples/watson_supertonic3.wav +3 -0
.gitattributes
CHANGED
@@ -1,2 +1,3 @@
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.png filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
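To illustrate what the added `*.wav` rule means for the files in this commit, here is a minimal Python sketch that checks a path against the three patterns in the updated `.gitattributes`. The function name `is_lfs_tracked` is hypothetical, and `fnmatch` on the basename only approximates Git's attribute matching, which follows gitignore-style rules; the approximation is adequate for simple patterns like these that contain no slashes.

```python
from fnmatch import fnmatch

# Patterns taken from the updated .gitattributes in this commit.
LFS_PATTERNS = ["*.onnx", "*.png", "*.wav"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file's basename against each pattern.

    This is an illustrative helper, not Git's own matching logic.
    """
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in LFS_PATTERNS)


print(is_lfs_tracked("audio_samples/nora_reference.wav"))
print(is_lfs_tracked("README.md"))
```

Under this approximation, every file under `audio_samples/` added in this commit matches the new rule, while `README.md` does not.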
README.md
CHANGED
|
@@ -89,6 +89,17 @@ print(f"Generated {duration:.2f}s of audio")

 The open-weight package includes fixed preset voice styles for immediate local inference. If you want to hear how Supertonic 3 performs with zero-shot custom voice styles, visit the [Audio Sample Demo](https://supertonic3.github.io/) to compare reference audio and generated speech across several use cases. To create your own Supertonic 3 voice-style JSON from reference audio, use [Supertonic Voice Builder](https://supertonic.supertone.ai/voice-builder); purchased Voice Builder styles include downloadable embeddings for both Supertonic 2 and Supertonic 3.

+Here are a few reference/generated pairs from the audio sample demo:
+
+| Use case | Text | Reference voice | Supertonic 3 output |
+|----------|------|-----------------|---------------------|
+| Call center, English | Good morning, thank you for calling. How can I help you today? | <audio controls preload="none" src="audio_samples/nora_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/nora_supertonic3.wav"></audio> |
+| Character voice, Japanese | [Japanese text garbled in extraction] | <audio controls preload="none" src="audio_samples/moka_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/moka_supertonic3.wav"></audio> |
+| Elder character voice, Korean | [Korean text garbled in extraction] | <audio controls preload="none" src="audio_samples/alphonse_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/alphonse_supertonic3.wav"></audio> |
+| Audiobook, English | I was not afraid of silence. I had lived with it long enough to know that, sometimes, it speaks more honestly than people do. | <audio controls preload="none" src="audio_samples/luna_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/luna_supertonic3.wav"></audio> |
+| Audiobook, Japanese | [Japanese text garbled in extraction] | <audio controls preload="none" src="audio_samples/watson_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/watson_supertonic3.wav"></audio> |
+| News, English | Here's a story worth paying attention to. Supertone has released Supertonic 3, its on-device TTS model. This version expands support to thirty-one languages and improves reading stability. | <audio controls preload="none" src="audio_samples/keld_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/keld_supertonic3.wav"></audio> |
+
 ## Performance Highlights

 Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.
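Each table row added to the README pairs `audio_samples/<name>_reference.wav` with `audio_samples/<name>_supertonic3.wav`. As a rough sketch of that naming convention (not necessarily how the table was actually produced), the following Python builds one row from a use case, a text, and a speaker name. The helpers `audio_tag` and `table_row` are hypothetical names introduced only for illustration.

```python
def audio_tag(path: str) -> str:
    """Return the HTML audio element used in the README table cells."""
    return f'<audio controls preload="none" src="{path}"></audio>'


def table_row(use_case: str, text: str, speaker: str) -> str:
    """Build one markdown table row for a reference/generated pair."""
    reference = audio_tag(f"audio_samples/{speaker}_reference.wav")
    generated = audio_tag(f"audio_samples/{speaker}_supertonic3.wav")
    return f"| {use_case} | {text} | {reference} | {generated} |"


row = table_row(
    "Call center, English",
    "Good morning, thank you for calling. How can I help you today?",
    "nora",
)
print(row)
```

Generating rows this way keeps the two file names in each pair consistent, which matters because the README refers to the audio files by relative path.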
audio_samples/alphonse_reference.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09e613442124ee0cb31146431bbe0b3acd727569a55fc0ae4b4082b5de27c9f3
+size 1152044
audio_samples/alphonse_supertonic3.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f3dacbfa5695d6416fb46a3ea72e202e16a467a4cb5ead9425073ef83596811c
+size 970796
audio_samples/keld_reference.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8d7f3c2b15ee6ebafe88307658bf0787d3ca19f6347fd9e936e7bff786d1f70a
+size 1152044
audio_samples/keld_supertonic3.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:818c2b25bbf3e568ae8d4aeea5441605ed3e124ab28d746928972066902b35d9
+size 1148972
audio_samples/luna_reference.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1fcec63c22ad76711ec9835a6ca21cb84227045e3aad808f841bcbec455a9470
+size 1152044
audio_samples/luna_supertonic3.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:34527bf7ed968ebc5024f985d48ccf06c5c4418fb2d7fc7e3fb7a873e367c273
+size 712748
audio_samples/moka_reference.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ad50e1f13c18aa6af2e0f518714675c03ed98f0e85653a642527f56c2d181a56
+size 480044
audio_samples/moka_supertonic3.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bec7d6a5261240935d8436f3ada17aa50774878e8e6ad1696d131e4ec4ecd5f3
+size 479276
audio_samples/nora_reference.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a59576e7ffe84093439924f14d7070fb8174f8306c27049c00e63d747434a772
+size 264644
audio_samples/nora_supertonic3.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6d11415a428949df69191361b9dc5f567aaf6eccd89c13013c2fbb381b353321
+size 374828
audio_samples/watson_reference.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5449908651219298eab47a32a04b96291baceea00e772ea9cb16853a93127c30
+size 2304044
audio_samples/watson_supertonic3.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e8ffb0051f00e3f6476694604a3d21f6e87b3570b9b73fc2fdb9560c9e6f1c67
+size 1019948
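The `.wav` entries in this commit are Git LFS pointer files rather than the audio itself: each holds a `version` line, an `oid` line with a SHA-256 digest, and a `size` line in bytes. The sketch below parses one such pointer, using the text of `audio_samples/watson_supertonic3.wav` as input. The `LfsPointer` type and `parse_lfs_pointer` function are hypothetical names for illustration, and the parser assumes exactly the three keys seen in this commit.

```python
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str   # spec URL from the "version" line
    oid_algo: str  # hash algorithm, e.g. "sha256"
    oid: str       # hex digest of the real file content
    size: int      # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the space-separated key/value lines of a pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e8ffb0051f00e3f6476694604a3d21f6e87b3570b9b73fc2fdb9560c9e6f1c67
size 1019948
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer.oid_algo, pointer.size)
```

After downloading the real file, comparing its byte length and SHA-256 digest against `pointer.size` and `pointer.oid` is one way to confirm that the audio was fetched intact.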