OmniVoice (Mæstræa Mirror)
Multi-Lingual TTS & Voice Cloning — 600+ Languages
Original Model by k2-fsa (Next-gen Kaldi) · Apache 2.0
This is a mirror of the OmniVoice model weights for use with Mæstræa AI Workstation. All credits go to the original authors.
What's in This Repo
| Path | Description | Size |
|---|---|---|
| model.safetensors | Main OmniVoice model | ~3 GB |
| audio_tokenizer/model.safetensors | Audio tokenizer | ~260 MB |
| tokenizer.json | Text tokenizer | ~17 MB |
| config.json | Model configuration | < 1 KB |
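If you download the files by hand, a quick check that nothing is missing can save a confusing load error later. The sketch below uses only the four relative paths listed in the table; the function name `missing_files` and the local directory you pass in are illustrative assumptions, not part of OmniVoice or Mæstræa.

```python
from pathlib import Path

# Relative paths taken from the table above.
EXPECTED_FILES = [
    "model.safetensors",
    "audio_tokenizer/model.safetensors",
    "tokenizer.json",
    "config.json",
]


def missing_files(model_dir):
    """Return the expected files that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means all four files are present. The check looks only at file presence, not at sizes or checksums.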
What OmniVoice Does
OmniVoice is a multi-lingual TTS and voice cloning model supporting 600+ languages with near real-time inference (real-time factor, RTF, of ~0.025). It supports three modes:
- Auto Voice — Generate speech from text with a default voice
- Voice Cloning — Clone any voice from a 3–15s reference audio sample
- Voice Design — Describe the desired voice characteristics in text
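To put the RTF figure quoted above in concrete terms: assuming it follows the usual definition (synthesis time divided by the duration of the audio produced), an RTF of 0.025 means one minute of speech takes about 1.5 seconds to generate, roughly 40 times faster than playback. The helper names below are illustrative, and real throughput depends on hardware and mode.

```python
def synthesis_time(audio_seconds, rtf=0.025):
    """Estimated compute time (seconds) to generate audio_seconds of speech.

    Assumes RTF = synthesis time / audio duration.
    """
    return audio_seconds * rtf


def speedup(rtf=0.025):
    """How many times faster than real-time playback a given RTF is."""
    return 1.0 / rtf


if __name__ == "__main__":
    print(f"60 s of audio: ~{synthesis_time(60):.2f} s of compute")
    print(f"Speedup over real time: ~{speedup():.0f}x")
```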
Key Features
- 600+ language support
- Near real-time inference
- Long-form text auto-chunking for constant VRAM usage
- ~3–8 GB VRAM depending on mode
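The auto-chunking feature can be pictured with a small sketch. This README does not describe how OmniVoice actually splits text, so the code below is only a generic illustration of the idea: pack whole sentences into bounded chunks and synthesize them one at a time, so peak memory tracks the chunk size and not the total text length. The function name `chunk_text`, the `max_chars` limit, and the punctuation rule (which covers only `.`, `!`, and `?`) are all assumptions.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be passed to the synthesizer separately and the resulting audio segments concatenated.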
Usage with Mæstræa
These models are automatically downloaded by the Mæstræa AI Workstation backend. They can also be loaded manually:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the mirrored weights and the text tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("AEmotionStudio/omnivoice-models")
tokenizer = AutoTokenizer.from_pretrained("AEmotionStudio/omnivoice-models")
```
License
Apache 2.0 — same as the original OmniVoice release.
Credits
- Model: k2-fsa/OmniVoice
- Paper: See original repo for citation
- Mirror by: AEmotionStudio