Kokoro 82M LiteRT Runtime Preview

This repository packages the current Kokoro 82M LiteRT/TFLite runtime used by the Reachy edge robot-agent project.

It is sourced from hexgrad/Kokoro-82M and contains the accepted text-to-decoder-input frontend bucket plus the accepted merged decoder/vocoder graph.

Runtime Shape

text
  -> Kokoro KPipeline G2P/tokenization
  -> frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
  -> kokoro_decoder_source_stft_merged.tflite + KokoroSourceStft
  -> WAV bytes

The runtime still uses the kokoro Python package for KPipeline.g2p() and KPipeline.en_tokenize(). It must not instantiate Kokoro KModel in the request path. Neural inference is served by the LiteRT frontend bucket and the LiteRT decoder/vocoder.

Included Artifacts

kokoro_litert_manifest.json
config.json
voices/af_heart.npz
frontend/kokoro_full_frontend_masked_b48_f128_f0256.tflite
kokoro_decoder_source_stft_merged.tflite
custom_ops/kokoro_source_stft_custom_op_native.cc
custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
reports/kokoro_bucketed_frontend_litert_parity_report.json
reports/kokoro_decoder_source_stft_merged_probe.json

The current frontend bucket is T=48, with a maximum of 128 decoder frames and 256 F0/noise frames. Text that exceeds the bucket, or that spans multiple segments, must be deterministically chunked and repacked before inference.
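
The chunk-and-repack requirement can be illustrated with a minimal sketch. It is not the runtime's actual chunker. The function name, the padding ID of 0, and the idea that the bucket consumes a padded ID list plus a 0/1 mask are assumptions made for illustration. The sketch also ignores the 128-frame and 256-frame limits and does not try to split on phrase boundaries, both of which a real chunker would likely need to handle.

```python
from typing import List, Tuple

BUCKET_T = 48  # token capacity of the frontend bucket named in this README


def chunk_and_pad(token_ids: List[int], bucket_t: int = BUCKET_T,
                  pad_id: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split token_ids into consecutive chunks of at most bucket_t tokens.

    Returns (padded_ids, mask) pairs, where mask is 1 for real tokens and
    0 for padding. pad_id=0 is an assumption, not a documented value.
    The split is purely positional, so the same input always yields the
    same chunks.
    """
    if bucket_t <= 0:
        raise ValueError("bucket_t must be positive")
    chunks = []
    for start in range(0, len(token_ids), bucket_t):
        real = token_ids[start:start + bucket_t]
        pad = bucket_t - len(real)
        chunks.append((real + [pad_id] * pad, [1] * len(real) + [0] * pad))
    return chunks
```

Each returned pair has exactly `bucket_t` entries, so every chunk fits the fixed input shape of a single bucket. The mask lets a caller discard outputs that correspond to padding.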

Jetson / ARM64 Status

The package includes custom op builds for local Linux x86-64 development and Jetson/Linux aarch64 deployment:

custom_ops/linux-x86_64/kokoro_source_stft_custom_op_native.so
custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so
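
A loader needs to pick the binary that matches the host. The sketch below shows one way to do that with the standard library. The function name and the architecture alias table are hypothetical; only the directory and file names come from this package.

```python
import platform
from pathlib import Path
from typing import Optional

SO_NAME = "kokoro_source_stft_custom_op_native.so"

# Maps platform.machine() values to the directories shipped in this package.
# The alias entries (amd64, arm64) are an assumption for convenience.
_ARCH_DIRS = {
    "x86_64": "linux-x86_64",
    "amd64": "linux-x86_64",
    "aarch64": "linux-aarch64",
    "arm64": "linux-aarch64",
}


def custom_op_path(package_root: str, machine: Optional[str] = None,
                   system: Optional[str] = None) -> Path:
    """Return the custom op path for the given (or current) host."""
    machine = (machine or platform.machine()).lower()
    system = system or platform.system()
    if system != "Linux":
        raise RuntimeError(f"no custom op build for system {system!r}")
    try:
        arch_dir = _ARCH_DIRS[machine]
    except KeyError:
        raise RuntimeError(f"no custom op build for machine {machine!r}") from None
    return Path(package_root) / "custom_ops" / arch_dir / SO_NAME
```

Failing loudly on an unsupported host is deliberate: silently falling back to the wrong binary would only surface later as a load error.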

The aarch64 binary was cross-compiled from custom_ops/kokoro_source_stft_custom_op_native.cc with:

aarch64-linux-gnu-g++ -std=c++17 -O2 -fPIC \
  -fno-math-errno \
  -fno-trapping-math \
  -ffp-contract=fast \
  -static-libstdc++ \
  -static-libgcc \
  -Wl,--exclude-libs,ALL \
  -shared \
  custom_ops/kokoro_source_stft_custom_op_native.cc \
  -o custom_ops/linux-aarch64/kokoro_source_stft_custom_op_native.so

The expected aarch64 SHA-256 is recorded in kokoro_litert_manifest.json under decoder_vocoder.custom_op.linux_aarch64_sha256. Loading the binary on a Jetson target device and benchmarking synthesis there are still required.
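
Before loading the aarch64 binary on a device, its hash can be compared against the manifest. The sketch below assumes the manifest stores the digest as a plain hex string at the key path named above; the helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_aarch64_custom_op(package_root) -> bool:
    """Compare the shipped aarch64 .so against the manifest digest."""
    root = Path(package_root)
    manifest = json.loads((root / "kokoro_litert_manifest.json").read_text())
    # Key path taken from this README; the hex-string format is assumed.
    expected = manifest["decoder_vocoder"]["custom_op"]["linux_aarch64_sha256"]
    so_path = (root / "custom_ops" / "linux-aarch64"
               / "kokoro_source_stft_custom_op_native.so")
    return sha256_of(so_path) == expected.strip().lower()
```

A mismatch means the file differs from the one the manifest describes, for example after a rebuild with a different toolchain or a corrupted download.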

Validation

Frontend bucket acceptance is recorded in:

reports/kokoro_bucketed_frontend_litert_parity_report.json

The local acceptance result for this package:

passed: true
bucket: T=48
max observed frontend float abs error: 0.000812530517578125
pred_dur exact: true
alignment exact: true
valid_frames exact: true
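
For readers unfamiliar with these metrics, the sketch below shows how a maximum absolute error and an exact-match flag are typically computed over flattened outputs. It is not the code that produced the report. The function names and the tolerance parameter are hypothetical, and this README does not state the acceptance threshold that was used.

```python
from typing import Dict, Sequence


def max_abs_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def parity_summary(ref_float: Sequence[float], lite_float: Sequence[float],
                   ref_int: Sequence[int], lite_int: Sequence[int],
                   tolerance: float) -> Dict[str, object]:
    """Float outputs are compared within a tolerance; integer outputs
    (such as predicted durations) must match exactly."""
    err = max_abs_error(ref_float, lite_float)
    exact = list(ref_int) == list(lite_int)
    return {"max_abs_error": err, "exact": exact,
            "passed": exact and err <= tolerance}
```

Integer-valued outputs are held to exact equality because a single off-by-one duration shifts every later frame, whereas small float deviations are generally tolerable.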

Decoder/vocoder acceptance is recorded in:

reports/kokoro_decoder_source_stft_merged_probe.json

The merged decoder is a one-interpreter graph connected through the KokoroSourceStft custom op. The custom op remains a CPU custom-op island unless implemented as a GPU-capable custom kernel or delegate.

Minimal Local Smoke Test

In the Reachy robot-agent repo:

PYTHONPATH=src uv run --extra tts --extra kokoro-frontend \
  python scripts/kokoro_litert_runtime_smoke.py \
  --text "Hi Will." \
  --output /tmp/robot-kokoro-litert/runtime_smoke.wav

Expected output is a mono 24 kHz WAV file.
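
The output format can be checked with Python's standard wave module. This is a minimal sketch with a hypothetical function name, and it assumes the file is PCM-encoded, which is the format the wave module reads.

```python
import wave


def check_wav(path, expected_rate: int = 24000, expected_channels: int = 1) -> float:
    """Validate channel count and sample rate; return duration in seconds."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    if channels != expected_channels:
        raise ValueError(f"expected {expected_channels} channel(s), got {channels}")
    if rate != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
    if frames == 0:
        raise ValueError("WAV file contains no audio frames")
    return frames / rate
```

For example, `check_wav("/tmp/robot-kokoro-litert/runtime_smoke.wav")` would return the clip length in seconds, or raise if the file is not mono 24 kHz.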

License

The upstream Kokoro model card lists hexgrad/Kokoro-82M under Apache-2.0. This converted runtime package is distributed under Apache-2.0 as a derived runtime form. See LICENSE and NOTICE.
