
Kokoro-82M TTS Model

High-quality text-to-speech model with 82M parameters and 24 voices (American/British English).

Model Variants

| Variant | File | Size | Quality | Recommended For |
|---------|------|------|---------|-----------------|
| FP32 | kokoro-v1.0.onnx | 310MB | Baseline | Development/reference |
| FP16 | kokoro-v1.0.fp16.onnx | 169MB | Near-identical | Default for deployment |
| INT8 | kokoro-v1.0.int8.onnx | 88MB | Slight degradation | Mobile/edge devices |

FP16 is the default: it offers the best balance of quality and size, with near-identical output at a 45% smaller file than FP32.

Download Model Files

```bash
# FP16 model (recommended default - 169MB)
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx

# Voice embeddings (required)
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
mv voices-v1.0.bin voices.bin

# Misaki dictionaries (for MisakiDictionary backend - default)
mkdir -p misaki
wget -O misaki/us_gold.json https://raw.githubusercontent.com/hexgrad/misaki/refs/heads/main/misaki/resources/en/us_gold.json
wget -O misaki/us_silver.json https://raw.githubusercontent.com/hexgrad/misaki/refs/heads/main/misaki/resources/en/us_silver.json

# Optional: INT8 for mobile (88MB, 72% smaller)
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.int8.onnx

# Optional: FP32 for reference (310MB)
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
```

Switching Variants

To use a different variant, update model_metadata.json:

```json
{
  "execution_template": {
    "type": "SimpleMode",
    "model_file": "kokoro-v1.0.fp16.onnx"  // or .int8.onnx or .onnx
  }
}
```
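If you prefer to script the switch, the field can be patched programmatically. A minimal Python sketch, assuming the model_metadata.json layout shown above (`set_model_variant` is a hypothetical helper, not part of xybrid):

```python
import json

def set_model_variant(metadata_path: str, model_file: str) -> None:
    """Point execution_template.model_file at a different ONNX variant."""
    with open(metadata_path) as f:
        meta = json.load(f)
    meta["execution_template"]["model_file"] = model_file
    with open(metadata_path, "w") as f:
        json.dump(meta, f, indent=2)

# e.g. set_model_variant("model_metadata.json", "kokoro-v1.0.int8.onnx")
```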

Phonemization Backends

Kokoro requires text-to-phoneme conversion before audio synthesis. Xybrid supports multiple backends:

MisakiDictionary (Default)

Zero external dependencies - uses dictionary lookup for phonemization.

```json
{
  "preprocessing": [{
    "type": "Phonemize",
    "backend": "MisakiDictionary",
    "tokens_file": "tokens.txt"
  }]
}
```

Pros:

  • No system dependencies - works on mobile/embedded
  • Fast dictionary lookup
  • Self-contained deployment

Cons:

  • May not handle unusual words/names perfectly
  • Falls back to basic phonemization for out-of-vocabulary words

Required files (included):

  • misaki/us_gold.json (2.9MB) - High-confidence dictionary
  • misaki/us_silver.json (3.0MB) - Extended vocabulary
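The gold-then-silver lookup order can be sketched as follows. This is an illustration of the fallback strategy only, with toy entries standing in for the real dictionary files (the actual misaki JSON schema and phoneme alphabet may differ):

```python
# Toy stand-ins for misaki/us_gold.json and misaki/us_silver.json;
# entries are illustrative, not taken from the real dictionaries.
GOLD = {"hello": "həlˈoʊ", "world": "wˈɜɹld"}
SILVER = {"xylophone": "zˈaɪləfˌoʊn"}

def phonemize_word(word, gold=GOLD, silver=SILVER):
    """Check the high-confidence dictionary first, then the extended one.
    Return None for out-of-vocabulary words so the caller can apply a
    basic letter-to-sound fallback."""
    w = word.lower()
    if w in gold:
        return gold[w]
    if w in silver:
        return silver[w]
    return None
```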

EspeakNG (Alternative)

Uses the espeak-ng system command for phonemization.

```json
{
  "preprocessing": [{
    "type": "Phonemize",
    "backend": "EspeakNG",
    "language": "en-us",
    "tokens_file": "tokens.txt"
  }]
}
```

Pros:

  • Higher quality phonemization
  • Better handling of unusual words, numbers, abbreviations

Cons:

  • Requires espeak-ng installed on the system
  • Not suitable for mobile/embedded deployment

Installation:

```bash
# macOS
brew install espeak-ng

# Ubuntu/Debian
apt-get install espeak-ng

# Windows
# Download from https://github.com/espeak-ng/espeak-ng/releases
```
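Once installed, espeak-ng can be invoked as a subprocess to produce phonemes. A minimal sketch, not xybrid's actual internal call (`-q` suppresses audio output; `--ipa` prints IPA phonemes to stdout):

```python
import shutil
import subprocess

def build_espeak_cmd(text: str, lang: str = "en-us") -> list[str]:
    # -q: quiet (no audio); --ipa: emit IPA phonemes; -v: voice/language
    return ["espeak-ng", "-q", "--ipa", "-v", lang, text]

def espeak_phonemes(text: str, lang: str = "en-us") -> str:
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng is not installed on this system")
    result = subprocess.run(build_espeak_cmd(text, lang),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```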

Switching Backends

Update model_metadata.json to change the phonemization backend:

```json
{
  "preprocessing": [{
    "type": "Phonemize",
    "backend": "MisakiDictionary",  // or "EspeakNG"
    "tokens_file": "tokens.txt"
  }]
}
```
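This edit can also be scripted. A minimal Python sketch, assuming the preprocessing layout shown above (`set_phonemize_backend` is a hypothetical helper, not part of xybrid):

```python
import json

def set_phonemize_backend(metadata_path: str, backend: str) -> None:
    """Switch the backend of the Phonemize preprocessing step."""
    with open(metadata_path) as f:
        meta = json.load(f)
    for step in meta["preprocessing"]:
        if step["type"] == "Phonemize":
            step["backend"] = backend
            if backend == "EspeakNG":
                # EspeakNG additionally takes a language field
                step.setdefault("language", "en-us")
    with open(metadata_path, "w") as f:
        json.dump(meta, f, indent=2)
```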

Available Voices

| Voice | Description |
|-------|-------------|
| af_bella | American Female - Bella |
| af_nicole | American Female - Nicole |
| af_sarah | American Female - Sarah |
| af_sky | American Female - Sky |
| am_adam | American Male - Adam |
| am_michael | American Male - Michael |
| bf_emma | British Female - Emma |
| bf_isabella | British Female - Isabella |
| bm_george | British Male - George |
| bm_lewis | British Male - Lewis |

Voice naming convention: {region}{gender}_{name}

  • Region: a = American, b = British
  • Gender: f = Female, m = Male
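The convention is simple enough to decode mechanically; a small illustrative parser (not part of xybrid):

```python
REGIONS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}

def parse_voice_id(voice_id: str) -> tuple[str, str, str]:
    """Decode a voice ID like 'af_bella' into (region, gender, name)."""
    prefix, name = voice_id.split("_", 1)
    return REGIONS[prefix[0]], GENDERS[prefix[1]], name.capitalize()
```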

Usage

Run the model through the xybrid execution system:

```rust
use xybrid_core::template_executor::TemplateExecutor;

let executor = TemplateExecutor::from_metadata_file("test_models/kokoro-82m/model_metadata.json")?;
let audio = executor.run_text("Hello, world!")?;
```

License

Apache-2.0

Source

Model files are distributed via the kokoro-onnx releases: https://github.com/thewh1teagle/kokoro-onnx