OmniVoice (Mæstræa Mirror)

Multi-Lingual TTS & Voice Cloning — 600+ Languages

Original Model by k2-fsa (Next-gen Kaldi) · Apache 2.0

This is a mirror of the OmniVoice model weights for use with Mæstræa AI Workstation. All credits go to the original authors.

What's in This Repo

Path	Description	Size
`model.safetensors`	Main OmniVoice model	~3 GB
`audio_tokenizer/model.safetensors`	Audio tokenizer	~260 MB
`tokenizer.json`	Text tokenizer	~17 MB
`config.json`	Model configuration	< 1 KB
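
For readers who want to fetch only these files by hand, the sketch below uses `snapshot_download` from `huggingface_hub` with an allow-list. The repo id is the one used in the loading snippet in this card. The `FILES` table and the helper names `approx_total_gb` and `download_omnivoice` are illustrative, and the sizes are the approximate figures from the table above, not measured values.

```python
# Approximate sizes in MB, copied from the table above (not measured).
FILES = {
    "model.safetensors": 3000,
    "audio_tokenizer/model.safetensors": 260,
    "tokenizer.json": 17,
    "config.json": 0.001,  # listed as "< 1 KB"; treated as an upper bound
}


def approx_total_gb(files=FILES):
    """Rough total download size in GB (using 1 GB = 1000 MB)."""
    return sum(files.values()) / 1000


def download_omnivoice(repo_id="AEmotionStudio/omnivoice-models", local_dir=None):
    """Fetch only the listed files; needs `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=list(FILES),  # skip anything not in the table
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(f"Expect roughly {approx_total_gb():.1f} GB of downloads")
```

Calling `download_omnivoice()` returns the local folder containing the fetched files. The Mæstræa backend may store them elsewhere.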

What OmniVoice Does

OmniVoice is a multi-lingual TTS and voice cloning model supporting 600+ languages with fast inference (RTF ~0.025, i.e. roughly 40x faster than real time). It supports three modes:

Auto Voice — Generate speech from text with a default voice
Voice Cloning — Clone any voice from a 3–15s reference audio sample
Voice Design — Describe the desired voice characteristics in text
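
The real-time factor (RTF) quoted above is conventionally defined as synthesis time divided by the duration of the audio produced, so lower is faster. The short worked example below assumes that standard definition applies to the figure here. The source does not state the hardware or mode behind it, and the function names are illustrative.

```python
def synthesis_seconds(audio_seconds: float, rtf: float = 0.025) -> float:
    """Estimated compute time for a clip, given RTF = compute_time / audio_duration."""
    return audio_seconds * rtf


def speedup_over_real_time(rtf: float = 0.025) -> float:
    """How many seconds of audio are produced per second of compute."""
    return 1.0 / rtf


# One minute of speech at the quoted RTF.
print(f"{synthesis_seconds(60.0):.2f} s of compute for 60 s of audio")
print(f"about {speedup_over_real_time():.0f}x faster than real time")
```

Actual throughput will vary with GPU, precision, and which of the three modes is used.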

Key Features

600+ language support
Fast inference (RTF ~0.025)
Long-form text auto-chunking for constant VRAM usage
~3–8 GB VRAM depending on mode
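
To illustrate why chunking keeps memory bounded on long inputs, here is a minimal sketch of sentence-based chunking: the text is split at sentence boundaries and packed into pieces under a character budget, so each synthesis call sees a bounded input. This is not OmniVoice's actual algorithm. `chunk_text` and `max_chars` are hypothetical names, and the regex handles only a few punctuation marks.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after . ! ? when followed by whitespace, or directly after CJK
    # sentence-final punctuation (which is usually not followed by a space).
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    sentences = [s.strip() for s in parts if s and s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In a pipeline of this shape, each chunk would be synthesized on its own and the resulting audio segments concatenated, so peak memory depends on `max_chars` rather than on total document length.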

Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend. They can also be loaded manually:

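from transformers import AutoModelForCausalLM, AutoTokenizer

# Both calls download from the Hugging Face Hub on first use, then read from the local cache.
model = AutoModelForCausalLM.from_pretrained("AEmotionStudio/omnivoice-models")
# Text tokenizer only; the audio tokenizer lives under audio_tokenizer/ in the same repo.
tokenizer = AutoTokenizer.from_pretrained("AEmotionStudio/omnivoice-models")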

License

Apache 2.0 — same as the original OmniVoice release.

Credits

Model: k2-fsa/OmniVoice
Paper: See original repo for citation
Mirror by: AEmotionStudio

Model tree for AEmotionStudio/omnivoice-models

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Finetuned: k2-fsa/OmniVoice
Finetuned (8): this model