Kokoro-82M CoreML
3-stage CoreML pipeline for Kokoro-82M text-to-speech, optimized for Apple Neural Engine. Requires iOS 18+ / macOS 15+.
Pipeline
| Stage | Model | Input | Output | Size |
|---|---|---|---|---|
| 1. Duration | `duration.mlmodelc` | Phoneme tokens + voice + speed | Durations, prosody features, text encoding | 39 MB |
| 2. Prosody | `prosody.mlmodelc` | Aligned prosody features + style | F0 (pitch) + noise predictions | 17 MB |
| 3. Decoder | `decoder_*.mlmodelc` | Aligned text + F0 + noise + style | 24 kHz audio waveform | 107 MB |
Between stages 1 and 2, the Swift inference code builds an alignment matrix from the predicted durations.
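The alignment step can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Swift implementation from this repository. It assumes the durations are non-negative integer frame counts per phoneme token and that the matrix is a 0/1 token-by-frame layout; the function names `build_alignment` and `align` are hypothetical.

```python
def build_alignment(durations):
    """Return a tokens x frames 0/1 matrix from per-token frame counts.

    Assumption: each duration is a non-negative integer number of frames.
    """
    total_frames = sum(durations)
    matrix = [[0.0] * total_frames for _ in durations]
    frame = 0
    for token, count in enumerate(durations):
        # Token `token` owns the next `count` consecutive frames.
        for f in range(frame, frame + count):
            matrix[token][f] = 1.0
        frame += count
    return matrix


def align(features, alignment):
    """Expand per-token feature vectors to per-frame vectors.

    Equivalent to transpose(alignment) @ features for a 0/1 alignment.
    """
    total_frames = len(alignment[0]) if alignment else 0
    dim = len(features[0]) if features else 0
    frames = [[0.0] * dim for _ in range(total_frames)]
    for t, row in enumerate(alignment):
        for f, weight in enumerate(row):
            if weight:
                for d in range(dim):
                    frames[f][d] += weight * features[t][d]
    return frames


# Three tokens lasting 2, 1, and 3 frames give a 3 x 6 matrix.
alignment = build_alignment([2, 1, 3])
per_frame = align([[10.0], [20.0], [30.0]], alignment)
```

With these inputs, each token's feature vector is repeated once per frame it covers, which is the form the stage 2 and stage 3 inputs ("aligned prosody features", "aligned text") suggest.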
Decoder Buckets
| Bucket | Max Frames | Max Audio |
|---|---|---|
| `decoder_5s` | 200 | 5.0s |
| `decoder_10s` | 400 | 10.0s |
| `decoder_15s` | 600 | 15.0s |
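One plausible way to use the buckets is to pick the smallest decoder whose frame limit covers the utterance. The Python sketch below illustrates that rule; the selection policy, the error raised for inputs over 600 frames, and the function names are assumptions, not the repository's behavior. The frame rate of 40 frames per second follows from the table (200 frames for 5.0 s), which gives 600 samples per frame at 24 kHz.

```python
# Bucket names and frame limits are taken from the table above.
DECODER_BUCKETS = [("decoder_5s", 200), ("decoder_10s", 400), ("decoder_15s", 600)]
SAMPLE_RATE = 24_000        # Hz, the decoder's stated output rate
FRAMES_PER_SECOND = 40      # derived from the table: 200 frames / 5.0 s


def choose_decoder(total_frames):
    """Return the name of the smallest bucket that fits total_frames.

    Assumption: inputs longer than the largest bucket are rejected here;
    the real pipeline may split or truncate them instead.
    """
    for name, max_frames in DECODER_BUCKETS:
        if total_frames <= max_frames:
            return name
    raise ValueError(f"{total_frames} frames exceeds the largest bucket (600)")


def frames_to_samples(frames):
    """Convert a frame count to audio samples at 24 kHz."""
    return frames * SAMPLE_RATE // FRAMES_PER_SECOND


bucket = choose_decoder(250)          # needs more than 200 frames
samples = frames_to_samples(250)      # valid (unpadded) audio length
```

Because a bucket has a fixed size, output beyond `frames_to_samples(total_frames)` would be padding and can be trimmed under this assumption.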
Voices
54 preset voices across 10 languages: English (US/UK), Spanish, French, Hindi, Italian, Japanese, Korean, Portuguese, Chinese.
Usage
Conversion
    WARNING: Defaulting repo_id to hexgrad/Kokoro-82M. Pass repo_id='hexgrad/Kokoro-82M' to suppress this warning.
    Loaded Kokoro-82M (81.8M params)

    3-Stage Verification:
    Reference audio: 42000 samples
    3-stage audio: 117600 samples
    Diff: max=0.9962, mean=0.0493
    Duration diff: max=12.0
    PASS: 3-stage pipeline matches reference

    === Converting Duration Model ===
    Phoneme buckets: [16, 32, 64, 128]
    Tracing...
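The log lists phoneme buckets of 16, 32, 64, and 128, which suggests the duration model is traced at a few fixed input lengths. A minimal Python sketch of padding a token sequence to the smallest fitting bucket follows. The pad id `0`, the attention mask, and the function name `pad_to_bucket` are hypothetical; the actual conversion script may handle padding differently.

```python
PHONEME_BUCKETS = [16, 32, 64, 128]  # from the conversion log
PAD_ID = 0                           # assumption: 0 is the padding token


def pad_to_bucket(token_ids):
    """Pad token_ids to the smallest bucket length that fits.

    Returns (padded_ids, mask), where mask is 1 for real tokens and 0 for padding.
    Assumption: sequences longer than 128 tokens are rejected.
    """
    n = len(token_ids)
    for size in PHONEME_BUCKETS:
        if n <= size:
            pad = size - n
            return list(token_ids) + [PAD_ID] * pad, [1] * n + [0] * pad
    raise ValueError(f"{n} tokens exceeds the largest bucket (128)")


padded, mask = pad_to_bucket([12, 45, 7, 88, 3])
```

Returning the mask alongside the padded ids lets later stages ignore the padded positions, for example when summing predicted durations.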
License
- Model weights: Apache-2.0 (hexgrad/Kokoro-82M)
- CoreML conversion + Swift inference: Apache-2.0
- Dictionaries and G2P: Apache-2.0
- Guide: soniqo.audio/guides/kokoro
- Docs: soniqo.audio
- GitHub: soniqo/speech-swift