---
base_model: kenpath/svara-tts-v1
models:
- kenpath/svara-tts-v1
- canopylabs/3b-hi-ft-research_release
- mlx-community/svara-tts-v1-4bit
- mlx-community/svara-tts-v1-8bit
license: apache-2.0
language:
- hi
- bn
- mr
- te
- kn
- bho
- mag
- hne
- mai
- as
- brx
- doi
- gu
- ml
- pa
- ta
- ne
- sa
- en
tags:
- text-to-speech
- speech-synthesis
- multilingual
- indic
- orpheus
- snac
- mlx
- mlx-audio
task_categories:
- text-to-speech
pipeline_tag: text-to-speech
pretty_name: Svara-TTS v1 (MLX, bfloat16)
datasets:
- SYSPIN
- RASA
- IndicTTS
- SPICOR
library_name: mlx
---

# Svara-TTS v1 (MLX, bfloat16)

> **Parent model:** [`kenpath/svara-tts-v1`](https://huggingface.co/kenpath/svara-tts-v1), which hosts the full upstream weights, model card, training data, and evaluation. All credit for the model itself goes to the [Kenpath](https://huggingface.co/kenpath) team. This repo contains only an MLX-format conversion for inference on Apple Silicon.
>
> **Orpheus base:** [`canopylabs/3b-hi-ft-research_release`](https://huggingface.co/canopylabs/3b-hi-ft-research_release), Canopy Labs' Orpheus Hindi research release, from which Svara was fine-tuned.

Full-precision (bfloat16) MLX port of [`kenpath/svara-tts-v1`](https://huggingface.co/kenpath/svara-tts-v1), an autoregressive multilingual text-to-speech model for 19 Indian languages in the Orpheus / SNAC family. It keeps the same numerical precision as upstream and is repackaged in MLX-native format (~6.6 GB of sharded safetensors).

For a smaller memory footprint, use the 4-bit or 8-bit quantized variants listed under "Other Quants".

Built for [mlx-audio](https://github.com/Blaizzy/mlx-audio) on Apple Silicon.

## Usage

Requires `mlx-audio` with the TTS extras:

```bash
pip install "mlx-audio[tts]"
```

### Python

```python
import numpy as np
import soundfile as sf
import mlx.core as mx
from mlx_audio.tts.utils import load_model

# Load the MLX weights by Hub repo ID.
model = load_model("mlx-community/svara-tts-v1")

# generate() is iterated here; collect the audio from each result it yields.
chunks = []
for result in model.generate(
    text="नमस्ते, आप कैसे हैं? मैं ठीक हूँ।",
    voice="Hindi (Female)",
    temperature=0.75,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_tokens=1200,
):
    chunks.append(result.audio)

# Join the pieces and write a WAV file at the model's sample rate (24 kHz).
audio = mx.concatenate(chunks, axis=0)
sf.write("hello_hi.wav", np.asarray(audio), model.sample_rate)
```

### CLI

```bash
mlx_audio.tts.generate \
  --model mlx-community/svara-tts-v1 \
  --text "नमस्ते, आप कैसे हैं?" \
  --voice "Hindi (Female)" \
  --temperature 0.75 \
  --top_p 0.9
```

## Voices

Use a string of the form `"<Language Name> (<Gender>)"`:

| Language         | Voices                                                 |
|------------------|--------------------------------------------------------|
| Hindi            | `Hindi (Male)`, `Hindi (Female)`                       |
| Bengali          | `Bengali (Male)`, `Bengali (Female)`                   |
| Marathi          | `Marathi (Male)`, `Marathi (Female)`                   |
| Telugu           | `Telugu (Male)`, `Telugu (Female)`                     |
| Kannada          | `Kannada (Male)`, `Kannada (Female)`                   |
| Tamil            | `Tamil (Male)`, `Tamil (Female)`                       |
| Malayalam        | `Malayalam (Male)`, `Malayalam (Female)`               |
| Gujarati         | `Gujarati (Male)`, `Gujarati (Female)`                 |
| Punjabi          | `Punjabi (Male)`, `Punjabi (Female)`                   |
| Assamese         | `Assamese (Male)`, `Assamese (Female)`                 |
| Bhojpuri         | `Bhojpuri (Male)`, `Bhojpuri (Female)`                 |
| Magahi           | `Magahi (Male)`, `Magahi (Female)`                     |
| Maithili         | `Maithili (Male)`, `Maithili (Female)`                 |
| Chhattisgarhi    | `Chhattisgarhi (Male)`, `Chhattisgarhi (Female)`       |
| Bodo             | `Bodo (Male)`, `Bodo (Female)`                         |
| Dogri            | `Dogri (Male)`, `Dogri (Female)`                       |
| Nepali           | `Nepali (Male)`, `Nepali (Female)`                     |
| Sanskrit         | `Sanskrit (Male)`, `Sanskrit (Female)`                 |
| English (Indian) | `English (Indian) (Male)`, `English (Indian) (Female)` |

Total: **38 voices** across 19 languages.

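
Because every voice name follows the same pattern, the full list can be generated rather than typed by hand. The following is a minimal sketch based only on the table above; `LANGUAGES`, `GENDERS`, `voice_name`, and `ALL_VOICES` are illustrative names and are not part of `mlx-audio`.

```python
# Illustrative helper (not part of mlx-audio): build and validate voice
# strings of the form "<Language Name> (<Gender>)" from the Voices table.
LANGUAGES = (
    "Hindi", "Bengali", "Marathi", "Telugu", "Kannada", "Tamil",
    "Malayalam", "Gujarati", "Punjabi", "Assamese", "Bhojpuri", "Magahi",
    "Maithili", "Chhattisgarhi", "Bodo", "Dogri", "Nepali", "Sanskrit",
    "English (Indian)",
)
GENDERS = ("Male", "Female")


def voice_name(language: str, gender: str) -> str:
    """Return the voice string for a language/gender pair, or raise ValueError."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if gender not in GENDERS:
        raise ValueError(f"unknown gender: {gender!r}")
    return f"{language} ({gender})"


# Every language paired with every gender, in table order.
ALL_VOICES = [voice_name(lang, g) for lang in LANGUAGES for g in GENDERS]
```

Validating the name before calling the model catches typos such as `"Hindi (female)"` early, without relying on how the model handles an unrecognized voice string.
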
## Sampling Recommendations

The upstream `svara-tts-inference` repo uses these defaults; they're a good starting point:

| Parameter            | Value     |
|----------------------|-----------|
| `temperature`        | 0.75      |
| `top_p`              | 0.9       |
| `top_k`              | 40        |
| `repetition_penalty` | 1.1       |
| `max_tokens`         | 1200–2048 |

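
One way to keep these values consistent across calls is to store them once and override only what changes. This is a small sketch, not something `mlx-audio` provides; `SAMPLING_DEFAULTS` and `sampling_kwargs` are hypothetical names, and the parameter names are the keyword arguments used in the Python example in the Usage section.

```python
# Recommended defaults from the table above, kept in one place so that
# individual calls only state what they change.
SAMPLING_DEFAULTS = {
    "temperature": 0.75,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_tokens": 1200,  # lower end of the recommended 1200-2048 range
}


def sampling_kwargs(**overrides):
    """Return a new dict of the defaults with per-call overrides applied."""
    unknown = set(overrides) - set(SAMPLING_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sampling parameters: {sorted(unknown)}")
    return {**SAMPLING_DEFAULTS, **overrides}
```

The result can be unpacked into the generation call, for example `model.generate(text=..., voice=..., **sampling_kwargs(max_tokens=2048))`.
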
## Architecture

- **Backbone:** Llama-3.2-3B (fine-tuned from [`canopylabs/3b-hi-ft-research_release`](https://huggingface.co/canopylabs/3b-hi-ft-research_release), Canopy's Orpheus Hindi base).
- **Codec:** [SNAC 24 kHz](https://huggingface.co/hubertsiuzdak/snac_24khz), a 3-level hierarchical RVQ that emits 7 codes per frame (1 + 2 + 4 across the three levels), with each frame covering roughly 85 ms of audio. Loaded automatically by `mlx-audio`.
- **Output:** 24 kHz mono PCM.

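
The codec layout also gives a rough upper bound on how much audio a given `max_tokens` budget can produce. The sketch below is an estimate under stated assumptions, not a measurement. It assumes every generated token is an audio code, and that one 7-code frame spans 2048 output samples; that figure is taken from the SNAC 24 kHz configuration rather than from this card. `max_audio_seconds` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 24_000  # from the card: 24 kHz output
CODES_PER_FRAME = 7      # from the card: 7 codes per frame

# Assumption (from the SNAC 24 kHz configuration, not from this card):
# one 7-code frame spans 2048 output samples.
SAMPLES_PER_FRAME = 2048


def max_audio_seconds(max_tokens: int,
                      samples_per_frame: int = SAMPLES_PER_FRAME) -> float:
    """Upper bound on audio length if every generated token is an audio code."""
    full_frames = max_tokens // CODES_PER_FRAME  # incomplete frames are dropped
    return full_frames * samples_per_frame / SAMPLE_RATE_HZ
```

Under these assumptions, `max_audio_seconds(1200)` is about 14.6 s and `max_audio_seconds(2048)` is about 24.9 s, so longer passages would likely need a larger token budget or several calls.
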
## Other Quants

- 8-bit MLX: [`mlx-community/svara-tts-v1-8bit`](https://huggingface.co/mlx-community/svara-tts-v1-8bit) (~3.5 GB)
- 4-bit MLX: [`mlx-community/svara-tts-v1-4bit`](https://huggingface.co/mlx-community/svara-tts-v1-4bit) (~1.9 GB)

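
Choosing between the three variants comes down to how much memory is available for the weights. The sketch below shows one way to make that choice from the approximate sizes quoted in this card. `pick_variant` and the 1 GB `headroom_gb` default are illustrative assumptions, not recommendations from the upstream authors, and real memory use during inference will be higher than the weight size alone.

```python
# Approximate weight sizes quoted in this card, in GB, largest first.
VARIANTS = [
    ("mlx-community/svara-tts-v1", 6.6),       # bfloat16
    ("mlx-community/svara-tts-v1-8bit", 3.5),
    ("mlx-community/svara-tts-v1-4bit", 1.9),
]


def pick_variant(budget_gb: float, headroom_gb: float = 1.0):
    """Return the largest repo whose weights plus headroom fit the budget, else None."""
    for repo_id, size_gb in VARIANTS:
        if size_gb + headroom_gb <= budget_gb:
            return repo_id
    return None
```

The returned repo ID can be passed to `load_model` in place of the bfloat16 repo used in the Usage section.
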
## License

Apache 2.0. See the [base model card](https://huggingface.co/kenpath/svara-tts-v1) for full details.