Pocket-TTS ExecuTorch Models

GPU-accelerated Pocket-TTS models exported to ExecuTorch PTE format for mobile inference.

Model Pipeline

Text → [Tokenizer] → Tokens
         ↓
    text_conditioner.pte → Text Embeddings (B, T, 1024)
         ↓
    flow_lm_main_bundled.pte → Conditioning (B, 1024) + EOS logit
         ↓
    flow_net.pte (ODE steps) → Audio Codes (B, 32)
         ↓
    mimi_decoder.pte → Audio Samples (float32, 24 kHz)
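When wiring the stages together, it can help to assert the tensor shape coming out of each one. The sketch below only encodes the shapes listed in the diagram above; the helper names (`expected_shapes`, `check_stage`) are hypothetical and not part of this repository or of ExecuTorch.

```python
# Shapes taken from the pipeline diagram; helper names are illustrative only.
EMBED_DIM = 1024  # width of text embeddings and of the conditioning vector
LATENT_DIM = 32   # width of one audio-code frame


def expected_shapes(batch, num_tokens):
    """Return the output shape the diagram lists for each stage."""
    return {
        "text_conditioner.pte": (batch, num_tokens, EMBED_DIM),
        "flow_lm_main_bundled.pte": (batch, EMBED_DIM),
        "flow_net.pte": (batch, LATENT_DIM),
    }


def check_stage(stage, actual_shape, batch, num_tokens):
    """Raise ValueError if a stage's output shape differs from the diagram."""
    expected = expected_shapes(batch, num_tokens)[stage]
    if tuple(actual_shape) != expected:
        raise ValueError(f"{stage}: expected {expected}, got {tuple(actual_shape)}")
    return True
```

For example, `check_stage("text_conditioner.pte", text_emb.shape, 1, 20)` would pass for a `(1, 20, 1024)` embedding tensor.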

Files

File                      Size   Description
text_conditioner.pte      16 MB  Phoneme tokens → text embeddings
flow_lm_main_bundled.pte  96 MB  Bundled backbone (methods forward_0, forward_32, forward_64, forward_128)
flow_net.pte              37 MB  Flow matching ODE step
mimi_encoder.pte          69 MB  Voice reference encoder
mimi_decoder.pte          39 MB  Audio codes → waveform
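Before loading anything, a quick check that every file listed above is present can save a confusing runtime error. This is a minimal sketch using only the standard library; the function name `missing_models` and the directory layout (all files in one folder) are assumptions.

```python
from pathlib import Path

# File names as listed in the Files section.
EXPECTED_FILES = [
    "text_conditioner.pte",
    "flow_lm_main_bundled.pte",
    "flow_net.pte",
    "mimi_encoder.pte",
    "mimi_decoder.pte",
]


def missing_models(model_dir):
    """Return the expected .pte files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all five files were found; otherwise the list names the ones still to be downloaded.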

Usage (Python)

from executorch.runtime import Runtime
import torch

runtime = Runtime.get()

# Load models
tc = runtime.load_program("text_conditioner.pte").load_method("forward")
backbone = runtime.load_program("flow_lm_main_bundled.pte")
bb_0 = backbone.load_method("forward_0")
bb_32 = backbone.load_method("forward_32")
flow = runtime.load_program("flow_net.pte").load_method("forward")
decoder = runtime.load_program("mimi_decoder.pte").load_method("forward")

# Inference
tokens = torch.randint(0, 100, (1, 20), dtype=torch.int64)
text_emb = tc.execute([tokens])[0]  # (1, 20, 1024)

# Backbone step 0
seq = torch.randn(1, 1, 32)  # Initial audio latent
k_cache = torch.zeros(6, 1, 512, 16, 64)
v_cache = torch.zeros(6, 1, 512, 16, 64)
conditioning, eos, k_new, v_new = bb_0.execute([seq, k_cache, v_cache])

# Flow step (ODE)
c = conditioning
s, t, x = torch.tensor([[0.0]]), torch.tensor([[1.0]]), torch.randn(1, 32)
flow_dir = flow.execute([c, s, t, x])[0]

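# Decode to audio
# Assumption: one Euler step from s to t turns the noise sample into final codes;
# split [0, 1] into several smaller steps if a single step is too coarse.
final_codes = x + (t - s) * flow_dir
audio = decoder.execute([final_codes])[0]  # (samples,)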
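The flow step above is called once with `s = 0` and `t = 1`. If the network's output is treated as a velocity over the interval `[s, t]` (an assumption, the export does not document this), multiple ODE steps could be sketched as a fixed-step Euler loop. The `velocity` callable below is a stand-in for a wrapper around `flow.execute([c, s, t, x])[0]`, and plain lists are used instead of tensors so the sketch runs without ExecuTorch.

```python
def euler_integrate(velocity, conditioning, x0, num_steps=4):
    """Integrate from t=0 to t=1 in num_steps equal Euler steps.

    velocity(conditioning, s, t, x) is a hypothetical wrapper around the
    flow_net call; it must return a list the same length as x.
    """
    x = list(x0)
    for i in range(num_steps):
        s = i / num_steps        # start of this sub-interval
        t = (i + 1) / num_steps  # end of this sub-interval
        v = velocity(conditioning, s, t, x)
        # Euler update: advance x along v by the interval length.
        x = [xi + (t - s) * vi for xi, vi in zip(x, v)]
    return x


# Stand-in velocity field for illustration: constant drift toward `c`.
def constant_field(c, s, t, x):
    return list(c)
```

With the stand-in field, starting from zeros and integrating to `t = 1` lands on the conditioning vector itself, which makes the loop easy to sanity-check before swapping in the real model call.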

Android Integration

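// In PocketTtsVulkanEngine.kt
val modelFile = context.filesDir.resolve("pocket/pte/flow_lm_main_bundled.pte")
val module = Module.load(modelFile.absolutePath)  // Module.load expects a String path
// The bundled file exposes forward_0, forward_32, ... rather than a plain "forward"
val output = module.execute("forward_0", EValue.from(inputTensor))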

Requirements

  • ExecuTorch 0.6.0+
  • Android: org.pytorch:executorch-android:0.6.0
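On the Python side, the minimum version can be checked at startup. This is a rough sketch using `importlib.metadata`; it compares only the numeric major.minor.patch part, so pre-release and local build suffixes are ignored, and the helper names are illustrative.

```python
import re
from importlib import metadata

MIN_VERSION = (0, 6, 0)  # from the Requirements list


def parse_version(text):
    """Extract (major, minor, patch) from a version string such as '0.6.0+cpu'."""
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    while len(numbers) < 3:
        numbers.append(0)  # pad short versions like '1.0'
    return tuple(numbers)


def meets_minimum(version_text, minimum=MIN_VERSION):
    """True if version_text is at least the required version."""
    return parse_version(version_text) >= minimum


def installed_executorch_ok():
    """Check the installed executorch package, if any."""
    try:
        return meets_minimum(metadata.version("executorch"))
    except metadata.PackageNotFoundError:
        return False
```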

Credits

Based on Pocket-TTS by Kyutai. Export patterns inspired by Kokoro ExecuTorch.

License

Apache 2.0
