Clarify MLX tokenizer subset scope
README.md
CHANGED
@@ -52,6 +52,7 @@ This repository is the MLX export used by `mlx-community/MiMo-V2.5-ASR-MLX`.
 - Default precision is `fp32`.
 - This export keeps the encoder and RVQ path used by MiMo ASR.
 - Decoder and vocoder weights are omitted here because they are not used in the ASR pipeline.
+- The published MLX weights are therefore an ASR-focused inference subset, not a byte-for-byte mirror of the full official tokenizer release.
 
 ## Introduction
 
@@ -67,6 +68,8 @@ Existing audio language models typically rely on task-specific fine-tuning to ac
 
 MiMo-Audio-Tokenizer is a 1.2B-parameter Transformer operating at 25 Hz. It employs an eight-layer RVQ stack to generate 200 tokens per second. By jointly optimizing semantic and reconstruction objectives, we train MiMo-Audio-Tokenizer from scratch on a 10-million-hour corpus, achieving superior reconstruction quality and facilitating downstream language modeling.
 
+For clarity: the official Xiaomi release above describes the full tokenizer stack. This MLX repository publishes the encoder/RVQ subset used by `MiMo-V2.5-ASR`, which is why the Hugging Face file summary for this repo is about `0.64B` parameters instead of the full `1.2B`.
+
 <p align="center">
 <img width="95%" src="https://github.com/XiaomiMiMo/MiMo-Audio/blob/main/assets/tokenizer.png?raw=true">
 </p>
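
The figures quoted in this change can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not part of the repository: the function names are hypothetical, and it assumes each RVQ layer emits one token per 25 Hz frame, which is consistent with the stated 200 tokens per second. The parameter counts are the approximate `0.64B` and `1.2B` values from the README text.

```python
# Illustrative sanity check of the numbers quoted in the README text.
# Assumption: every RVQ layer emits exactly one token per frame.

FRAME_RATE_HZ = 25        # tokenizer frame rate stated in the README
RVQ_LAYERS = 8            # eight-layer RVQ stack stated in the README
FULL_PARAMS_B = 1.2       # full official tokenizer, billions of parameters
SUBSET_PARAMS_B = 0.64    # approximate size of this MLX export, billions


def tokens_per_second(frame_rate_hz: int, rvq_layers: int) -> int:
    """Tokens emitted per second when each layer yields one token per frame."""
    return frame_rate_hz * rvq_layers


def subset_fraction(subset_b: float, full_b: float) -> float:
    """Share of the full parameter count retained in the exported subset."""
    return subset_b / full_b


if __name__ == "__main__":
    print(tokens_per_second(FRAME_RATE_HZ, RVQ_LAYERS))  # 200
    print(f"{subset_fraction(SUBSET_PARAMS_B, FULL_PARAMS_B):.0%}")  # 53%
```

Under these assumptions the encoder/RVQ subset holds a little over half of the full tokenizer's parameters, which matches the explanation that the decoder and vocoder weights were dropped.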