Commit f6ad992 by juheon2 · 1 parent: feb60dc

Add zero-shot audio samples

.gitattributes CHANGED
@@ -1,2 +1,3 @@
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.png filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
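The `.gitattributes` change sends `*.wav` files through the LFS filter alongside `*.onnx` and `*.png`. To get a rough local check of which paths that rule covers, you can use the minimal Python sketch below. `is_lfs_tracked` is a hypothetical helper that only approximates git's matching for slash-free glob patterns. For the authoritative answer, `git check-attr filter -- <path>` asks git directly.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Slash-free patterns that this commit's .gitattributes routes through LFS.
LFS_PATTERNS = ("*.onnx", "*.png", "*.wav")


def is_lfs_tracked(path: str, patterns: tuple[str, ...] = LFS_PATTERNS) -> bool:
    """Approximate git's matching for slash-free patterns such as "*.wav".

    Without a slash, a .gitattributes pattern is matched against the basename
    at any depth, so "*.wav" also covers "audio_samples/luna_reference.wav".
    Patterns containing "/" or "**" are not handled by this sketch.
    """
    name = basename(path)
    # fnmatchcase never folds case (fnmatch.fnmatch may, on Windows).
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    # "model.onnx" is a hypothetical path used only for illustration.
    for p in ("audio_samples/nora_reference.wav", "model.onnx", "README.md"):
        print(p, is_lfs_tracked(p))
```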
README.md CHANGED
@@ -89,6 +89,17 @@ print(f"Generated {duration:.2f}s of audio")
 
 The open-weight package includes fixed preset voice styles for immediate local inference. If you want to hear how Supertonic 3 performs with zero-shot custom voice styles, visit the [Audio Sample Demo](https://supertonic3.github.io/) to compare reference audio and generated speech across several use cases. To create your own Supertonic 3 voice-style JSON from reference audio, use [Supertonic Voice Builder](https://supertonic.supertone.ai/voice-builder); purchased Voice Builder styles include downloadable embeddings for both Supertonic 2 and Supertonic 3.
 
+Here are a few reference/generated pairs from the audio sample demo:
+
+| Use case | Text | Reference voice | Supertonic 3 output |
+|----------|------|-----------------|---------------------|
+| Call center, English | Good morning, thank you for calling. How can I help you today? | <audio controls preload="none" src="audio_samples/nora_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/nora_supertonic3.wav"></audio> |
+| Character voice, Japanese | ふふっ、退屈してたところなの。ちょうどいい遊び相手、見つけたかも♪ | <audio controls preload="none" src="audio_samples/moka_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/moka_supertonic3.wav"></audio> |
+| Elder character voice, Korean | 혼자 떠나기엔 길이 험하구나. 이 낡은 검을 가져가거라. 언젠가 어둠이 네 이름을 부르더라도, 부디 빛을 잊지 말거라. | <audio controls preload="none" src="audio_samples/alphonse_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/alphonse_supertonic3.wav"></audio> |
+| Audiobook, English | I was not afraid of silence. I had lived with it long enough to know that, sometimes, it speaks more honestly than people do. | <audio controls preload="none" src="audio_samples/luna_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/luna_supertonic3.wav"></audio> |
+| Audiobook, Japanese | その朝、ロンドンの霧はいつになく低く垂れこめていた。私はただの訪問者だと思っていたが、ホームズの目はすでに別の結論にたどり着いていた。 | <audio controls preload="none" src="audio_samples/watson_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/watson_supertonic3.wav"></audio> |
+| News, English | Here’s a story worth paying attention to. Supertone has released Supertonic 3, its on-device TTS model. This version expands support to thirty-one languages and improves reading stability. | <audio controls preload="none" src="audio_samples/keld_reference.wav"></audio> | <audio controls preload="none" src="audio_samples/keld_supertonic3.wav"></audio> |
+
 ## Performance Highlights
 
 Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.

English translations of the non-English sample texts:
- Character voice, Japanese: "Heh heh, I was just getting bored. Looks like I may have found the perfect playmate ♪"
- Elder character voice, Korean: "The road is too rough to travel alone. Take this old sword. Even if the darkness someday calls your name, please do not forget the light."
- Audiobook, Japanese: "That morning, the London fog hung unusually low. I had thought this was just a visitor, but Holmes's eyes had already reached a different conclusion."
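The new table embeds each sample with an inline `<audio>` tag. Each tag's `src` is a repository-relative path under `audio_samples/`. To confirm that every referenced file was actually added, you can extract the `src` values from the README and check them on disk. `audio_sources` and `missing_sources` below are hypothetical helper names, not part of the repository. The regular expression assumes the double-quoted `src="..."` form used in these rows; it is not a general HTML parser.

```python
import re
from pathlib import Path

# Captures the src attribute of an <audio> tag, e.g.
# <audio controls preload="none" src="audio_samples/nora_reference.wav"></audio>
AUDIO_SRC = re.compile(r'<audio\b[^>]*\bsrc="([^"]+)"')


def audio_sources(markdown: str) -> list[str]:
    """Return every <audio src="..."> path, in document order."""
    return AUDIO_SRC.findall(markdown)


def missing_sources(readme: Path) -> list[str]:
    """List referenced audio paths that do not exist relative to the README."""
    root = readme.parent
    text = readme.read_text(encoding="utf-8")
    return [src for src in audio_sources(text) if not (root / src).is_file()]
```

Run from the repository root, `missing_sources(Path("README.md"))` should return an empty list for this commit, because all twelve referenced files are added here. In a clone where `git lfs pull` has not run, the `.wav` paths exist as small pointer files. This check therefore confirms that the files are present, not that their content is correct.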
audio_samples/alphonse_reference.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09e613442124ee0cb31146431bbe0b3acd727569a55fc0ae4b4082b5de27c9f3
+size 1152044
audio_samples/alphonse_supertonic3.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f3dacbfa5695d6416fb46a3ea72e202e16a467a4cb5ead9425073ef83596811c
+size 970796
audio_samples/keld_reference.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8d7f3c2b15ee6ebafe88307658bf0787d3ca19f6347fd9e936e7bff786d1f70a
+size 1152044
audio_samples/keld_supertonic3.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:818c2b25bbf3e568ae8d4aeea5441605ed3e124ab28d746928972066902b35d9
+size 1148972
audio_samples/luna_reference.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1fcec63c22ad76711ec9835a6ca21cb84227045e3aad808f841bcbec455a9470
+size 1152044
audio_samples/luna_supertonic3.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:34527bf7ed968ebc5024f985d48ccf06c5c4418fb2d7fc7e3fb7a873e367c273
+size 712748
audio_samples/moka_reference.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ad50e1f13c18aa6af2e0f518714675c03ed98f0e85653a642527f56c2d181a56
+size 480044
audio_samples/moka_supertonic3.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bec7d6a5261240935d8436f3ada17aa50774878e8e6ad1696d131e4ec4ecd5f3
+size 479276
audio_samples/nora_reference.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a59576e7ffe84093439924f14d7070fb8174f8306c27049c00e63d747434a772
+size 264644
audio_samples/nora_supertonic3.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6d11415a428949df69191361b9dc5f567aaf6eccd89c13013c2fbb381b353321
+size 374828
audio_samples/watson_reference.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5449908651219298eab47a32a04b96291baceea00e772ea9cb16853a93127c30
+size 2304044
audio_samples/watson_supertonic3.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e8ffb0051f00e3f6476694604a3d21f6e87b3570b9b73fc2fdb9560c9e6f1c67
+size 1019948
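Each added `.wav` appears in this commit only as a Git LFS pointer with three parts:
- a `version` line naming the spec,
- an `oid` holding the SHA-256 of the real file,
- a `size` in bytes.

After the real audio has been fetched, you can check it against its pointer. The minimal sketch below assumes the three-field v1 layout shown above. `parse_pointer`, `matches_pointer`, and `verify_file` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def parse_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer: one "key value" pair per line."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(data: bytes, pointer: dict[str, str]) -> bool:
    """Check content against the pointer's byte size and sha256 oid."""
    algo, _, digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if len(data) != int(pointer["size"]):
        return False
    return hashlib.sha256(data).hexdigest() == digest


def verify_file(path: Path, pointer: dict[str, str]) -> bool:
    # Reads the whole file into memory; fine for samples of a few MB at most.
    return matches_pointer(path.read_bytes(), pointer)
```

For example, strip the leading `+` from the three pointer lines for `audio_samples/nora_reference.wav` and pass them to `parse_pointer`. Then call `verify_file(Path("audio_samples/nora_reference.wav"), pointer)`. It should return `True` only after the real 264,644-byte file has replaced the pointer in the working tree.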