Kokoro 82M LiteRT Runtime Preview
This repository packages the current Kokoro 82M LiteRT/TFLite runtime used by the Reachy edge robot-agent project.
It is converted from hexgrad/Kokoro-82M and contains the accepted frontend bucket,
which maps tokenized text to decoder inputs, and the accepted merged
decoder/vocoder graph.
Runtime Shape
```text
text
  -> Kokoro KPipeline G2P/tokenization
  -> frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
  -> kokoro_decoder_source_stft_merged.tflite + KokoroSourceStft
  -> WAV bytes
```
The runtime still uses the kokoro Python package for KPipeline.g2p() and
KPipeline.en_tokenize(). It must not instantiate Kokoro KModel in the
request path. Neural inference is served by the LiteRT frontend bucket and the
LiteRT decoder/vocoder.
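The request path above can be sketched as a chain of injected stages. This is a minimal illustration, not the project's implementation: `synthesize`, `float_to_wav_bytes`, and the stage callables are hypothetical names, the stage signatures are simplified assumptions (the real `KPipeline.g2p()` and `KPipeline.en_tokenize()` return richer objects), and 16-bit PCM is an assumed output encoding. The point it shows is that only tokenization happens in Python while both neural stages are LiteRT calls, with no `KModel` anywhere.

```python
import io
import struct
import wave
from typing import Callable, List, Sequence

SAMPLE_RATE = 24_000  # mono 24 kHz, as stated for the smoke-test output


def float_to_wav_bytes(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1.0, 1.0] as mono 16-bit PCM WAV bytes (assumed format)."""
    # Clip first so out-of-range samples saturate instead of wrapping around.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def synthesize(
    text: str,
    g2p: Callable[[str], str],                          # hypothetical stand-in for KPipeline.g2p()
    tokenize: Callable[[str], List[int]],               # hypothetical stand-in for KPipeline.en_tokenize()
    run_frontend: Callable[[List[int]], object],        # LiteRT frontend bucket
    run_decoder: Callable[[object], Sequence[float]],   # merged decoder/vocoder graph
) -> bytes:
    """Python does G2P/tokenization only; all neural inference goes through LiteRT stages."""
    phonemes = g2p(text)
    token_ids = tokenize(phonemes)
    decoder_inputs = run_frontend(token_ids)
    audio = run_decoder(decoder_inputs)
    return float_to_wav_bytes(audio)
```

Injecting the stages keeps the request path testable without model files and makes it easy to assert that nothing in it constructs a PyTorch model.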
Included Artifacts
kokoro_litert_manifest.json
config.json
voices/af_heart.npz
frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
kokoro_decoder_source_stft_merged.tflite
custom_ops/kokoro_source_stft_custom_op_native.cc
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
reports/kokoro_bucketed_frontend_litert_parity_report.json
reports/kokoro_decoder_source_stft_merged_probe.json
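Before wiring up the runtime, it can help to confirm that a downloaded copy of this package is complete. The sketch below only checks that each file listed above exists under a package directory; `missing_artifacts` and `REQUIRED_ARTIFACTS` are hypothetical names, and the check does not validate file contents.

```python
from pathlib import Path
from typing import List

# Relative paths exactly as listed under "Included Artifacts".
REQUIRED_ARTIFACTS = (
    "kokoro_litert_manifest.json",
    "config.json",
    "voices/af_heart.npz",
    "frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite",
    "kokoro_decoder_source_stft_merged.tflite",
    "custom_ops/kokoro_source_stft_custom_op_native.cc",
    "custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so",
    "custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so",
    "reports/kokoro_bucketed_frontend_litert_parity_report.json",
    "reports/kokoro_decoder_source_stft_merged_probe.json",
)


def missing_artifacts(package_dir: str) -> List[str]:
    """Return the listed artifacts that are not regular files under package_dir."""
    root = Path(package_dir)
    return [rel for rel in REQUIRED_ARTIFACTS if not (root / rel).is_file()]
```

An empty result means every listed file is present.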
The current frontend bucket has input length T=48, with at most 128 decoder frames
and 256 F0/noise frames. Longer or multi-segment text must be deterministically
chunked and repacked into this bucket before inference.
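One way to do the deterministic chunking is sketched below, under stated assumptions: `split_tokens` and `pad_to_bucket` are hypothetical helpers, `PAD_ID = 0` is an assumed pad id (check `config.json` for the real vocabulary), and `break_ids` stands for whatever token ids you treat as safe cut points, such as a word-boundary id. Each chunk is cut after the last boundary inside a 48-token window (or hard-cut at 48 if there is none), then right-padded to the bucket length with a 1/0 validity mask.

```python
from typing import FrozenSet, List, Sequence, Tuple

BUCKET_T = 48  # frontend bucket length stated in this card
PAD_ID = 0     # ASSUMPTION: pad token id; confirm against config.json


def split_tokens(token_ids: Sequence[int], max_len: int = BUCKET_T,
                 break_ids: FrozenSet[int] = frozenset()) -> List[List[int]]:
    """Deterministically split token ids into consecutive chunks of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    chunks: List[List[int]] = []
    start, n = 0, len(token_ids)
    while start < n:
        end = min(start + max_len, n)
        if end < n:
            # Scan backwards for a boundary token so words are not split mid-way.
            for i in range(end - 1, start, -1):
                if token_ids[i] in break_ids:
                    end = i + 1
                    break
        chunks.append(list(token_ids[start:end]))
        start = end
    return chunks


def pad_to_bucket(chunk: Sequence[int], bucket_t: int = BUCKET_T,
                  pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Right-pad one chunk to bucket_t; mask is 1 for real tokens and 0 for padding."""
    if len(chunk) > bucket_t:
        raise ValueError(f"chunk of length {len(chunk)} exceeds bucket {bucket_t}")
    n_pad = bucket_t - len(chunk)
    return list(chunk) + [pad_id] * n_pad, [1] * len(chunk) + [0] * n_pad
```

Typical use is `[pad_to_bucket(c) for c in split_tokens(ids, break_ids=frozenset({SPACE_ID}))]`, where `SPACE_ID` is a hypothetical boundary id. If the frontend expects boundary tokens inside the 48 slots, lower `max_len` accordingly. The sketch also does not enforce the 128-decoder-frame limit; a chunk whose predicted durations exceed it would need to be split further.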
Jetson / ARM64 Status
The package includes custom op builds for local Linux x86-64 development and Jetson/Linux aarch64 deployment:
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
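A deployment script has to pick the matching build at runtime. The sketch below maps Python's `platform.system()` and `platform.machine()` to the two directories shipped here; `custom_op_path` and `ARCH_DIRS` are hypothetical names. How the resulting `.so` is then registered with the LiteRT interpreter is not described in this card and is not shown.

```python
import platform
from pathlib import Path
from typing import Optional

SO_NAME = "kokoro_source_stft_custom_op_native.so"
# Linux platform.machine() values mapped to the directories shipped in this package.
ARCH_DIRS = {"x86_64": "linux-x86_64", "aarch64": "linux-aarch64"}


def custom_op_path(package_dir: str, system: Optional[str] = None,
                   machine: Optional[str] = None) -> Path:
    """Return the prebuilt custom-op library for this host (or for the given overrides)."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system != "Linux":
        raise RuntimeError(f"no prebuilt custom op for {system}; only Linux builds are shipped")
    if machine not in ARCH_DIRS:
        raise RuntimeError(f"no prebuilt custom op for machine {machine!r}")
    return Path(package_dir) / "custom_ops" / ARCH_DIRS[machine] / SO_NAME
```

The `system` and `machine` overrides make it possible to resolve the Jetson path from an x86-64 development host, for example when staging files for deployment.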
The aarch64 binary was cross-compiled from
custom_ops/kokoro_source_stft_custom_op_native.cc with:
```bash
aarch64-linux-gnu-g++ -std=c++17 -O2 -fPIC \
  -fno-math-errno \
  -fno-trapping-math \
  -ffp-contract=fast \
  -static-libstdc++ \
  -static-libgcc \
  -Wl,--exclude-libs,ALL \
  -shared \
  custom_ops/kokoro_source_stft_custom_op_native.cc \
  -o custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
```
The expected aarch64 SHA-256 is recorded in kokoro_litert_manifest.json under
decoder_vocoder.custom_op.linux_aarch64_sha256. Loading this build and
benchmarking synthesis on an actual Jetson target device have not been done yet.
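Before loading the aarch64 library on a device, its hash can be compared against the manifest entry. This is a sketch: `verify_aarch64_custom_op` and its helpers are hypothetical names, and it assumes the dotted key names nested JSON objects (`decoder_vocoder` → `custom_op` → `linux_aarch64_sha256`), falling back to a literal flat key in case the manifest stores it that way.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, BinaryIO, Mapping

MANIFEST_KEY = "decoder_vocoder.custom_op.linux_aarch64_sha256"
AARCH64_SO = "custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so"


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream incrementally so large files are not read into memory at once."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def lookup_dotted(manifest: Mapping[str, Any], dotted_key: str) -> Any:
    """ASSUMPTION: dotted key means nested objects; a literal flat key is also accepted."""
    if dotted_key in manifest:
        return manifest[dotted_key]
    node: Any = manifest
    for part in dotted_key.split("."):
        node = node[part]
    return node


def verify_aarch64_custom_op(package_dir: str) -> bool:
    """Compare the shipped aarch64 .so against the SHA-256 recorded in the manifest."""
    root = Path(package_dir)
    manifest = json.loads((root / "kokoro_litert_manifest.json").read_text(encoding="utf-8"))
    expected = str(lookup_dotted(manifest, MANIFEST_KEY)).strip().lower()
    with open(root / AARCH64_SO, "rb") as f:
        return sha256_stream(f) == expected
```

On the device itself, `sha256sum custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so` gives the same digest for a manual comparison.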
Validation
Frontend bucket acceptance is recorded in:
reports/kokoro_bucketed_frontend_litert_parity_report.json
The local acceptance result for this package:
passed: true
bucket: T=48
max observed frontend float abs error: 0.000812530517578125
pred_dur exact: true
alignment exact: true
valid_frames exact: true
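The acceptance result combines a float tolerance check with exact matches on integer outputs. The sketch below reproduces that shape of check on plain Python lists; `check_parity`, the output key names passed in, and the default `atol=1e-3` are hypothetical, since the card reports the observed error but not the tolerance used, and this does not read the report file's actual schema.

```python
from typing import Any, Dict, Iterable, Mapping


def check_parity(reference: Mapping[str, Any], candidate: Mapping[str, Any],
                 float_keys: Iterable[str], exact_keys: Iterable[str],
                 atol: float = 1e-3) -> Dict[str, Any]:
    """Compare reference vs. LiteRT outputs: max abs error on floats, equality elsewhere.

    float_keys must name flat lists of numbers; exact_keys may hold any comparable
    values (for example nested lists for an alignment matrix).
    """
    worst = 0.0
    for key in float_keys:
        ref, cand = reference[key], candidate[key]
        if len(ref) != len(cand):
            raise ValueError(f"length mismatch for {key}: {len(ref)} vs {len(cand)}")
        worst = max(worst, max((abs(r - c) for r, c in zip(ref, cand)), default=0.0))
    exact = {f"{key}_exact": reference[key] == candidate[key] for key in exact_keys}
    return {"passed": worst <= atol and all(exact.values()), "max_abs_error": worst, **exact}
```

Keeping integer outputs such as predicted durations on an exact-match check, rather than a tolerance, means any rounding difference that changes the frame count is reported as a failure.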
Decoder/vocoder acceptance is recorded in:
reports/kokoro_decoder_source_stft_merged_probe.json
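The merged decoder/vocoder runs as a single interpreter graph whose stages are
connected through the KokoroSourceStft custom op. That op runs on the CPU, so it
remains a CPU custom-op island in the graph, where a GPU delegate would have to
hand execution back to the CPU, unless it is reimplemented as a GPU-capable
custom kernel or delegate.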
Minimal Local Smoke
From the root of the Reachy robot-agent repository, run:
```bash
PYTHONPATH=src uv run --extra tts --extra kokoro-frontend \
  python scripts/kokoro_litert_runtime_smoke.py \
  --text "Hi Will." \
  --output /tmp/robot-kokoro-litert/runtime_smoke.wav
```
Expected output is a mono 24 kHz WAV file.
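The output format can be checked with Python's standard `wave` module. This is a sketch with hypothetical names (`wav_info`, `check_smoke_output`); it assumes the smoke script writes PCM-encoded WAV, because `wave` cannot read IEEE-float WAV files.

```python
import wave
from typing import List, NamedTuple


class WavInfo(NamedTuple):
    channels: int
    sample_rate: int
    sample_width_bytes: int
    n_frames: int

    @property
    def duration_s(self) -> float:
        return self.n_frames / self.sample_rate if self.sample_rate else 0.0


def wav_info(path_or_file) -> WavInfo:
    """Read the header of a PCM WAV file (path or binary file object)."""
    with wave.open(path_or_file, "rb") as wf:
        return WavInfo(wf.getnchannels(), wf.getframerate(), wf.getsampwidth(), wf.getnframes())


def check_smoke_output(path_or_file, expected_rate: int = 24_000) -> List[str]:
    """Return a list of problems; an empty list means mono, 24 kHz, and non-empty."""
    info = wav_info(path_or_file)
    problems: List[str] = []
    if info.channels != 1:
        problems.append(f"expected mono, got {info.channels} channels")
    if info.sample_rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {info.sample_rate} Hz")
    if info.n_frames == 0:
        problems.append("file contains no audio frames")
    return problems
```

For example, `check_smoke_output("/tmp/robot-kokoro-litert/runtime_smoke.wav")` should return an empty list after a successful smoke run.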
License
The upstream Kokoro model card lists hexgrad/Kokoro-82M under Apache-2.0. This
converted runtime package is distributed under Apache-2.0 as a derived runtime
form. See LICENSE and NOTICE.