ailuntz committed
Commit cf7f321 · verified · 1 parent: 2616113

Clarify MLX tokenizer subset scope

Files changed (1): README.md (+3 −0)
README.md CHANGED
@@ -52,6 +52,7 @@ This repository is the MLX export used by `mlx-community/MiMo-V2.5-ASR-MLX`.
 - Default precision is `fp32`.
 - This export keeps the encoder and RVQ path used by MiMo ASR.
 - Decoder and vocoder weights are omitted here because they are not used in the ASR pipeline.
+- The published MLX weights are therefore an ASR-focused inference subset, not a byte-for-byte mirror of the full official tokenizer release.
 
 ## Introduction
 
@@ -67,6 +68,8 @@ Existing audio language models typically rely on task-specific fine-tuning to ac
 
 MiMo-Audio-Tokenizer is a 1.2B-parameter Transformer operating at 25 Hz. It employs an eight-layer RVQ stack to generate 200 tokens per second. By jointly optimizing semantic and reconstruction objectives, we train MiMo-Audio-Tokenizer from scratch on a 10-million-hour corpus, achieving superior reconstruction quality and facilitating downstream language modeling.
 
+For clarity: the official Xiaomi release above describes the full tokenizer stack. This MLX repository publishes the encoder/RVQ subset used by `MiMo-V2.5-ASR`, which is why the Hugging Face file summary for this repo is about `0.64B` parameters instead of the full `1.2B`.
+
 <p align="center">
 <img width="95%" src="https://github.com/XiaomiMiMo/MiMo-Audio/blob/main/assets/tokenizer.png?raw=true">
 </p>
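The figures quoted in the diff are internally consistent, and the arithmetic can be checked directly. A minimal sketch (the 25 Hz frame rate and eight RVQ codebooks come from the README paragraph; the `0.64B`/`1.2B` split comes from the commit's added note):

```python
# Sanity-check the token rate quoted in the README:
# 25 frames per second x 8 RVQ codebooks per frame.
frame_rate_hz = 25
rvq_codebooks = 8
tokens_per_second = frame_rate_hz * rvq_codebooks
print(tokens_per_second)  # 200, matching "200 tokens per second"

# Rough share of the full 1.2B-parameter model kept in the ASR subset.
subset_params_b = 0.64
full_params_b = 1.2
print(round(subset_params_b / full_params_b, 2))  # ~0.53
```

So the published subset carries roughly half the full model's parameters, consistent with dropping the decoder and vocoder.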
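The "encoder/RVQ subset" scoping described in the commit can be expressed as a simple filter over checkpoint keys. A hypothetical sketch; the parameter names below are invented for illustration and the real export's key names may differ:

```python
# Hypothetical weight names; the actual export's keys may differ.
all_keys = [
    "encoder.layers.0.attention.wq",
    "quantizer.rvq.codebooks.0",
    "decoder.layers.0.attention.wq",
    "vocoder.upsample.0.weight",
]

# Keep only the encoder/RVQ path used by the ASR pipeline;
# decoder and vocoder weights are dropped from the export.
kept = [k for k in all_keys if k.split(".")[0] in {"encoder", "quantizer"}]
print(kept)  # encoder and quantizer keys only
```

This mirrors the commit's point: the MLX repository is a filtered inference subset, not a full mirror of the official release.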