OmniVoice (Mæstræa Mirror)

Multi-Lingual TTS & Voice Cloning — 600+ Languages

Original Model by k2-fsa (Next-gen Kaldi) · Apache 2.0

This is a mirror of the OmniVoice model weights for use with Mæstræa AI Workstation. All credits go to the original authors.

What's in This Repo

| Path | Description | Size |
|---|---|---|
| model.safetensors | Main OmniVoice model | ~3 GB |
| audio_tokenizer/model.safetensors | Audio tokenizer | ~260 MB |
| tokenizer.json | Text tokenizer | ~17 MB |
| config.json | Model configuration | < 1 KB |
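
If the files are downloaded by hand, it can help to confirm that a local copy is complete before pointing a loader at it. The following is a minimal sketch that only checks for the four paths listed above; `missing_files` and `EXPECTED_FILES` are hypothetical names used for illustration and are not part of Mæstræa or OmniVoice.

```python
from pathlib import Path

# Relative paths taken from the file listing above.
EXPECTED_FILES = [
    "model.safetensors",
    "audio_tokenizer/model.safetensors",
    "tokenizer.json",
    "config.json",
]


def missing_files(root):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means all four files are present. The check does not verify file sizes or contents.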

What OmniVoice Does

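OmniVoice is a multi-lingual TTS and voice cloning model supporting 600+ languages with near real-time inference. Its real-time factor (RTF, compute time divided by audio duration) is ~0.025, which works out to roughly 0.25 s of compute per 10 s of generated audio. It supports three modes: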

  • Auto Voice — Generate speech from text with a default voice
  • Voice Cloning — Clone any voice from a 3–15s reference audio sample
  • Voice Design — Describe the desired voice characteristics in text
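
Because Voice Cloning expects a 3–15s reference sample, a clip can be checked against that window before it is submitted. The sketch below assumes the reference is an uncompressed PCM WAV file and uses only the standard library; `clip_duration_seconds` and `is_usable_reference` are hypothetical helpers, not functions from OmniVoice or Mæstræa.

```python
import wave

MIN_SECONDS = 3.0   # lower bound from the Voice Cloning description
MAX_SECONDS = 15.0  # upper bound from the Voice Cloning description


def clip_duration_seconds(path):
    """Duration of a PCM WAV file, from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path):
    """True if the clip falls inside the 3-15 s window (hypothetical helper)."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 or FLAC would need a different reader, since the `wave` module only handles WAV.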

Key Features

  • 600+ language support
  • Near real-time inference
  • Long-form text auto-chunking for constant VRAM usage
  • ~3–8 GB VRAM depending on mode
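
The idea behind auto-chunking is that long text is synthesized piece by piece, so peak memory depends on the chunk size rather than on the total length of the input. This document does not describe how OmniVoice or Mæstræa actually split text, so the following is only an illustrative sketch: it packs whole sentences into chunks of at most `max_chars` characters. The function name, the 200-character default, and the punctuation-based sentence split are all assumptions.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace (an assumption that
    # does not hold for every language or script).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the audio concatenated. A model covering 600+ languages would need language-aware sentence splitting in place of the simple regular expression used here.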

Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend. They can also be loaded manually:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face repo that hosts this mirror
REPO_ID = "AEmotionStudio/omnivoice-models"

# Downloads the files on first use, then reuses the local cache
model = AutoModelForCausalLM.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

License

Apache 2.0 — same as the original OmniVoice release.

Credits

Original model: k2-fsa/OmniVoice by k2-fsa (Next-gen Kaldi). The model tree on the hosting page lists it as a fine-tune of Qwen/Qwen3-0.6B.
