Kokoro-82M CoreML

3-stage CoreML pipeline for Kokoro-82M text-to-speech, optimized for Apple Neural Engine. Requires iOS 18+ / macOS 15+.

Pipeline

Stage	Model	Input	Output	Size
1. Duration	`duration.mlmodelc`	Phoneme tokens + voice + speed	Durations, prosody features, text encoding	39 MB
2. Prosody	`prosody.mlmodelc`	Aligned prosody features + style	F0 (pitch) + noise predictions	17 MB
3. Decoder	`decoder_*.mlmodelc`	Aligned text + F0 + noise + style	24 kHz audio waveform	107 MB

Swift builds an alignment matrix between stages 1 and 2 from predicted durations.
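
The alignment step can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `buildAlignment`, the `[[Float]]` layout (one row per token, one column per frame), and the integer frame-count input are all assumptions. Each token's row gets ones over the frames it spans, so multiplying per-token features by this matrix repeats each token's features for its predicted number of frames.

```swift
/// Builds a [tokens x frames] 0/1 alignment matrix from per-token durations.
/// Hypothetical helper for illustration; durations are frame counts per token.
func buildAlignment(durations: [Int]) -> [[Float]] {
    // Negative durations are treated as zero-length tokens.
    let clamped = durations.map { max(0, $0) }
    let totalFrames = clamped.reduce(0, +)
    var matrix = [[Float]](
        repeating: [Float](repeating: 0, count: totalFrames),
        count: clamped.count
    )
    var start = 0
    for (token, length) in clamped.enumerated() {
        // Token `token` owns frames start..<(start + length).
        for frame in start..<(start + length) {
            matrix[token][frame] = 1
        }
        start += length
    }
    return matrix
}

let alignment = buildAlignment(durations: [2, 1, 3])
```

With durations `[2, 1, 3]`, the result has 3 rows and 6 columns: token 0 covers frames 0 and 1, token 1 covers frame 2, and token 2 covers frames 3 through 5.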

Decoder Buckets

Bucket	Max Frames	Max Audio
`decoder_5s`	200	5.0s
`decoder_10s`	400	10.0s
`decoder_15s`	600	15.0s
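
Choosing a bucket plausibly amounts to picking the smallest decoder whose frame capacity covers the utterance. The sketch below assumes that selection rule; the `DecoderBucket` type and `selectBucket` function are hypothetical names, and only the bucket names and frame limits come from the table. The table also implies 40 frames per second (200 frames for 5.0 s), which at 24 kHz works out to 600 samples per frame.

```swift
/// One decoder variant and its frame capacity (values from the bucket table).
struct DecoderBucket {
    let name: String
    let maxFrames: Int
}

// Sorted by capacity, smallest first.
let decoderBuckets = [
    DecoderBucket(name: "decoder_5s", maxFrames: 200),
    DecoderBucket(name: "decoder_10s", maxFrames: 400),
    DecoderBucket(name: "decoder_15s", maxFrames: 600),
]

/// Returns the smallest bucket that fits `frameCount`, or nil if the
/// utterance exceeds the largest bucket. How to handle that case
/// (for example, splitting the text) is left to the caller.
func selectBucket(frameCount: Int) -> DecoderBucket? {
    return decoderBuckets.first { frameCount <= $0.maxFrames }
}

// Derived from the table: 200 frames / 5.0 s = 40 frames per second.
let framesPerSecond = 40
let sampleRate = 24_000
let samplesPerFrame = sampleRate / framesPerSecond
```

Picking the smallest fitting bucket keeps padding, and therefore wasted decoder work, to a minimum for short utterances.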

Voices

54 preset voices across 10 languages, counting US and UK English separately: English (US/UK), Spanish, French, Hindi, Italian, Japanese, Korean, Portuguese, Chinese.

Usage

Conversion

WARNING: Defaulting repo_id to hexgrad/Kokoro-82M. Pass repo_id='hexgrad/Kokoro-82M' to suppress this warning.
Loaded Kokoro-82M (81.8M params)

3-Stage Verification:
Reference audio: 42000 samples
3-stage audio: 117600 samples
Diff: max=0.9962, mean=0.0493
Duration diff: max=12.0
PASS: 3-stage pipeline matches reference

=== Converting Duration Model ===
Phoneme buckets: [16, 32, 64, 128]
Tracing...

License

Model weights: Apache-2.0 (hexgrad/Kokoro-82M)
CoreML conversion + Swift inference: Apache-2.0
Dictionaries and G2P: Apache-2.0
