Pocket-TTS ExecuTorch Models
GPU-accelerated Pocket-TTS models exported to ExecuTorch PTE format for mobile inference.
Model Pipeline
Text → [Tokenizer] → Tokens
↓
text_conditioner.pte → Text Embeddings (B, T, 1024)
↓
flow_lm_main_bundled.pte → Conditioning (B, 1024) + EOS logit
↓
flow_net.pte (ODE steps) → Audio Codes (B, 32)
↓
mimi_decoder.pte → Audio Samples (Float32, 24kHz)
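The last stage of the pipeline yields Float32 samples at 24 kHz. As a minimal sketch of saving that output, the helper below writes the samples to a WAV file using only the standard library. The name `write_wav`, the mono channel layout, and the 16-bit PCM encoding are assumptions made for illustration; the model card does not specify an output container.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # sample rate stated in the pipeline diagram


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] to a 16-bit PCM WAV file.

    Assumptions (not from the model card): mono audio, 16-bit output.
    Values outside [-1, 1] are clipped. Returns the number of samples written.
    """
    frames = bytearray()
    count = 0
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
        count += 1
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)  # assumed mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return count
```

With a 1-D tensor of decoded samples, a call such as `write_wav("out.wav", audio.flatten().tolist())` should be enough, provided the decoder output is already scaled to [-1, 1].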
Files
| File | Size | Description |
|---|---|---|
| text_conditioner.pte | 16 MB | Phoneme tokens → text embeddings |
| flow_lm_main_bundled.pte | 96 MB | Bundled backbone (forward_0, forward_32, forward_64, forward_128) |
| flow_net.pte | 37 MB | Flow matching ODE step |
| mimi_encoder.pte | 69 MB | Voice reference encoder |
| mimi_decoder.pte | 39 MB | Audio codes → waveform |
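Before loading anything, it can help to confirm that all five files were downloaded. The sketch below copies the file names and approximate sizes from the table; the helper names `missing_models` and `total_size_mb` are hypothetical and not part of this repository.

```python
from pathlib import Path

# File names and approximate sizes in MB, copied from the table above.
EXPECTED_FILES = {
    "text_conditioner.pte": 16,
    "flow_lm_main_bundled.pte": 96,
    "flow_net.pte": 37,
    "mimi_encoder.pte": 69,
    "mimi_decoder.pte": 39,
}


def missing_models(model_dir):
    """Return the expected .pte files that are absent from model_dir, sorted by name."""
    root = Path(model_dir)
    return sorted(name for name in EXPECTED_FILES if not (root / name).is_file())


def total_size_mb():
    """Approximate storage needed for the full set, in MB."""
    return sum(EXPECTED_FILES.values())
```

Summing the listed sizes gives roughly 257 MB for the full set, which is worth checking against the free storage on a mobile device.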
Usage (Python)
from executorch.runtime import Runtime
import torch
runtime = Runtime.get()
# Load models
tc = runtime.load_program("text_conditioner.pte").load_method("forward")
backbone = runtime.load_program("flow_lm_main_bundled.pte")
bb_0 = backbone.load_method("forward_0")
bb_32 = backbone.load_method("forward_32")
flow = runtime.load_program("flow_net.pte").load_method("forward")
decoder = runtime.load_program("mimi_decoder.pte").load_method("forward")
# Inference
tokens = torch.randint(0, 100, (1, 20), dtype=torch.int64)
text_emb = tc.execute([tokens])[0] # (1, 20, 1024)
# Backbone step 0
seq = torch.randn(1, 1, 32) # Initial audio latent
k_cache = torch.zeros(6, 1, 512, 16, 64)
v_cache = torch.zeros(6, 1, 512, 16, 64)
conditioning, eos, k_new, v_new = bb_0.execute([seq, k_cache, v_cache])
# Flow step (ODE)
c = conditioning
s, t, x = torch.tensor([[0.0]]), torch.tensor([[1.0]]), torch.randn(1, 32)
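flow_dir = flow.execute([c, s, t, x])[0]
final_codes = x + flow_dir # Single step from s=0 to t=1; assumes flow_net returns the update direction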
# Decode to audio
audio = decoder.execute([final_codes])[0] # (samples,)
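The snippet above evaluates flow_net once, from s = 0 to t = 1, while the pipeline describes "ODE steps". A multi-step version might look like the sketch below. It is written against a hypothetical callable `flow_fn(s, t, x)` standing in for `flow.execute([c, s, t, x])[0]`, and it operates on plain lists so it can be run without the models. The update rule (direction scaled by the interval length) is an assumption about how the exported network is meant to be integrated, not something this model card states.

```python
def integrate_flow(flow_fn, x0, num_steps=4):
    """Integrate from s=0 to t=1 in num_steps equal intervals.

    flow_fn(s, t, x) is a hypothetical stand-in for
    flow.execute([c, s, t, x])[0]; x is a list of floats here.
    """
    x = list(x0)
    for i in range(num_steps):
        s = i / num_steps
        t = (i + 1) / num_steps
        direction = flow_fn(s, t, x)
        # Assumed update rule: move along the direction by the interval length.
        x = [xi + di * (t - s) for xi, di in zip(x, direction)]
    return x
```

With `num_steps=1` this reduces to a single update `x + direction`. More steps cost one flow_net call each, so the step count trades latency for integration accuracy.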
Android Integration
// In PocketTtsVulkanEngine.kt
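// Module.load expects a String path, so convert the resolved File
val module = Module.load(context.filesDir.resolve("pocket/pte/flow_lm_main_bundled.pte").absolutePath)
// The bundled program exposes forward_0/forward_32/forward_64/forward_128, so call the method by name
val output = module.execute("forward_0", EValue.from(inputTensor))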
Requirements
- ExecuTorch 0.6.0+
- Android: org.pytorch:executorch-android:0.6.0
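On the Python side, the minimum version can be checked at startup. This is a small sketch using `importlib.metadata`; it assumes the runtime was installed from the `executorch` PyPI distribution, and the helper names are hypothetical. Pre-release suffixes such as `rc1` are ignored, so a release candidate of 0.6.0 is treated as meeting the minimum.

```python
from importlib import metadata

MIN_VERSION = (0, 6, 0)  # from the requirements list above


def parse_version(text):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or '+cpu'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_text, minimum=MIN_VERSION):
    """True if version_text is at least the minimum version."""
    return parse_version(version_text) >= minimum


def installed_executorch_ok():
    """True if an installed 'executorch' distribution meets MIN_VERSION."""
    try:
        return meets_minimum(metadata.version("executorch"))
    except metadata.PackageNotFoundError:
        return False
```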
Credits
Based on Pocket-TTS by Kyutai. Export patterns inspired by Kokoro ExecuTorch.
License
Apache 2.0