How to use Supertone/supertonic-3 with Supertonic:
    from supertonic import TTS

    tts = TTS(auto_download=True)
    style = tts.get_voice_style(voice_name="M1")
    text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
    wav, duration = tts.synthesize(text, voice_style=style)
    tts.save_audio(wav, "output.wav")
Supports voice cloning?
Hi, nice work!
Does this model support voice cloning?
Hi, thanks for your interest!
The open-weight Supertonic model does not support voice cloning directly. It includes a fixed set of pre-defined voice styles.
For Supertonic 2, we previously provided a Voice Builder service where users could purchase zero-shot voice cloning embeddings:
https://supertonic.supertone.ai/voice-builder
Supertonic 3 is not supported in Voice Builder yet. If we add Voice Builder or voice cloning support for Supertonic 3 in the future, we’ll share an update.
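Since the open-weight model works from preset voice styles, one way to compare them is to render the same sentence once per preset. The sketch below is only an illustration: it assumes the `TTS` object exposes `get_voice_style`, `synthesize`, and `save_audio` exactly as in the usage snippet on the model page. The helper name `render_presets` is hypothetical, and "M1" is the only preset name confirmed in this thread; any other name passed in is an assumption to check against the package.

```python
from pathlib import Path


def render_presets(tts, text, voice_names, out_dir="."):
    """Synthesize `text` once per preset voice and save one WAV per voice.

    `tts` is assumed to behave like supertonic's TTS object from the
    model-page snippet (get_voice_style / synthesize / save_audio).
    `render_presets` itself is a hypothetical helper, not part of the package.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    results = {}
    for name in voice_names:
        # Look up the preset style by name, e.g. "M1".
        style = tts.get_voice_style(voice_name=name)
        # synthesize returns the waveform and a duration value.
        wav, duration = tts.synthesize(text, voice_style=style)
        path = out / f"{name}.wav"
        tts.save_audio(wav, str(path))
        results[name] = (path, duration)
    return results
```

With the real package this would presumably be called as `render_presets(TTS(auto_download=True), "Hello there.", ["M1"])`, which writes `M1.wav` to the current directory.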
Good news — Voice Builder support for Supertonic 3 is now available.
The open-weight model itself still includes fixed preset voice styles and does not perform voice cloning directly from audio inside the model package. However, you can now use our Voice Builder service to create a custom Supertonic 3 voice style from a reference voice.
Also, if you previously purchased a Voice Builder style for Supertonic 2, we are providing the corresponding Supertonic 3 version free of charge.
You can access Voice Builder here:
https://supertonic.supertone.ai/voice-builder
Thanks again for your interest!
So you locked the most interesting feature behind a paywall? Is there no intention of sharing a local version?
Hi @blallo27,
I understand the concern. Local voice creation is a very reasonable thing to ask for, especially since Supertonic itself is designed to run on-device.
For this release, though, the open-weight part of Supertonic 3 is the local inference model with the provided preset voice styles. Custom voice creation is currently handled through Voice Builder as a separate product. The resulting voice style can be used with Supertonic 3, but the extraction pipeline itself is not included in the open-weight release.
So the direct answer is: we do not currently plan to release a local voice-cloning / voice-style extraction pipeline.
I know that may not be the answer everyone wants, but I wanted to be clear about the current scope rather than give a vague maybe. We’ll keep listening to feedback here as we think about future releases.
Thanks for raising it.