ailuntz committed on
Commit 1ef520d · verified · 1 Parent(s): 2bff574

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +50 -4
README.md CHANGED
@@ -52,6 +52,52 @@ Current variant: `4bit`
  This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for Apple silicon. The original model description below is preserved from the official release, and the MLX-specific material in this page is added as an incremental note for local MLX deployment.

  ## Introduction

  **MiMo-V2.5-ASR** is a state-of-the-art end-to-end automatic speech recognition (ASR) model developed by the Xiaomi MiMo team. It is built to deliver accurate and robust transcription across Mandarin Chinese and English, multiple Chinese dialects, code-switched speech, song lyrics, knowledge-intensive content, noisy acoustic environments, and multi-speaker conversations. MiMo-V2.5-ASR achieves state-of-the-art results on a wide range of public benchmarks.
@@ -106,10 +152,9 @@ The following repositories are MLX conversions derived from the official release
  MLX conversion notes:

  - Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
- - One-line MLX loading: `fromPretrained("mlx-community/MiMo-V2.5-ASR-MLX")` auto-resolves the tokenizer mirror
  - Tokenizer resolution: automatic via `mlx-community/MiMo-Audio-Tokenizer`
  - Conversion date: `2026-05-12`
- - Local validation runtime: `mlx-audio-swift`
  - Recommended default: `MiMo-V2.5-ASR-MLX`

  Example downloads:
@@ -121,9 +166,10 @@ hf download mlx-community/MiMo-V2.5-ASR-MLX-8bit --local-dir ./models/MiMo-V2.5-

  ## Validation

- Local smoke validation was run with `mlx-audio-swift` on `Tests/media/intention.wav`.

- - Output: `Intention.`

  ## Getting Started

  This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for Apple silicon. The original model description below is preserved from the official release, and the MLX-specific material in this page is added as an incremental note for local MLX deployment.

+ ## MLX Usage
+
+ Current MLX usage is documented in the GitHub forks below:
+
+ - [ailuntx/MiMo-V2.5-ASR](https://github.com/ailuntx/MiMo-V2.5-ASR)
+ - [ailuntx/MiMo-Audio-Tokenizer](https://github.com/ailuntx/MiMo-Audio-Tokenizer)
+
+ Install the current MLX path:
+
+ ```bash
+ pip install git+https://github.com/ailuntx/mlx-audio@feat/mimo-v25-asr
+ ```
+
+ Download the MLX checkpoints:
+
+ ```bash
+ hf download mlx-community/MiMo-Audio-Tokenizer --local-dir ./models/MiMo-Audio-Tokenizer
+ hf download mlx-community/MiMo-V2.5-ASR-MLX --local-dir ./models/MiMo-V2.5-ASR-MLX
+ ```
+
+ Run transcription from the helper script in `ailuntx/MiMo-V2.5-ASR`:
+
+ ```bash
+ git clone https://github.com/ailuntx/MiMo-V2.5-ASR.git
+ cd MiMo-V2.5-ASR
+ python run_mimo_asr_mlx.py \
+ --model ./models/MiMo-V2.5-ASR-MLX \
+ --audio path/to/audio.wav
+ ```
+
+ Python:
+
+ ```python
+ from mlx_audio.stt import load
+
+ model = load("./models/MiMo-V2.5-ASR-MLX")
+ result = model.generate("path/to/audio.wav", language="en")
+ print(result.text)
+ ```
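
The Python snippet added in this commit transcribes a single file. As a minimal sketch of looping the same call over a folder, the helper below relies only on the usage visible in the diff: `generate(path, language=...)` returning an object with a `.text` attribute. The name `transcribe_folder` is hypothetical and not part of `mlx-audio`; the model is passed in as an argument so the function can be exercised with a stand-in object when MLX is not installed.

```python
from pathlib import Path


def transcribe_folder(model, folder, language="en"):
    """Return {filename: text} for every .wav file directly inside `folder`.

    `model` is assumed to expose generate(path, language=...) returning an
    object with a `.text` attribute, as in the README snippet in this diff.
    """
    results = {}
    # Sort so the output order is stable across runs and platforms.
    for wav in sorted(Path(folder).glob("*.wav")):
        result = model.generate(str(wav), language=language)
        results[wav.name] = result.text
    return results
```

With the MiMo support branch installed, the intended call would presumably be `transcribe_folder(load("./models/MiMo-V2.5-ASR-MLX"), "path/to/audio")`, with `load` imported from `mlx_audio.stt` as in the diff.
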
+
+ Notes:
+
+ - `mlx-community/MiMo-V2.5-ASR-MLX` resolves `mlx-community/MiMo-Audio-Tokenizer` through `mlx_manifest.json`.
+ - The current install path depends on the MiMo support branch in `ailuntx/mlx-audio`.
+ - The usage section here will be simplified once MiMo lands in upstream `mlx-audio` and `mlx-audio-swift`.
+
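
The first note says the tokenizer is resolved through `mlx_manifest.json`. The manifest schema is not shown in this diff, so the sketch below makes no assumption about its keys: it only loads the file from a local checkpoint directory and prints the top-level entries so the tokenizer reference can be located by eye. `read_manifest` is a hypothetical helper, and placing the manifest at the root of the checkpoint directory is an assumption.

```python
import json
from pathlib import Path


def read_manifest(model_dir, name="mlx_manifest.json"):
    """Load the manifest from a local checkpoint directory, or return None if absent."""
    path = Path(model_dir) / name
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Path taken from the download command in the README diff.
    manifest = read_manifest("./models/MiMo-V2.5-ASR-MLX")
    if manifest is None:
        print("no mlx_manifest.json found")
    elif isinstance(manifest, dict):
        for key, value in manifest.items():
            print(f"{key}: {value!r}")
    else:
        print(manifest)
```
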
  ## Introduction

  **MiMo-V2.5-ASR** is a state-of-the-art end-to-end automatic speech recognition (ASR) model developed by the Xiaomi MiMo team. It is built to deliver accurate and robust transcription across Mandarin Chinese and English, multiple Chinese dialects, code-switched speech, song lyrics, knowledge-intensive content, noisy acoustic environments, and multi-speaker conversations. MiMo-V2.5-ASR achieves state-of-the-art results on a wide range of public benchmarks.
 
  MLX conversion notes:

  - Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
  - Tokenizer resolution: automatic via `mlx-community/MiMo-Audio-Tokenizer`
  - Conversion date: `2026-05-12`
+ - Local validation runtimes: `mlx-audio` and `mlx-audio-swift`
  - Recommended default: `MiMo-V2.5-ASR-MLX`

  Example downloads:
 
  ## Validation

+ Local smoke validation was run with `mlx-audio` and `mlx-audio-swift`.

+ - `intention.wav` -> `Intention.`
+ - `conversational_a.wav` -> expected coffee / Kaldi paragraph
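
A smoke check like the `intention.wav` -> `Intention.` pair above can be automated by comparing normalized transcripts. This is only a sketch, not the validation procedure used for this commit: `normalize` and `smoke_check` are hypothetical names, the normalization (lowercasing, stripping ASCII punctuation, collapsing whitespace) is an assumption about what counts as a match, and the transcription step is passed in as a callable so no particular runtime is required.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def smoke_check(transcribe, cases):
    """Return the (audio, expected, actual) triples whose normalized text differs.

    `transcribe` is any callable mapping an audio path to a transcript string;
    `cases` maps audio paths to their expected transcripts.
    """
    failures = []
    for audio, expected in cases.items():
        actual = transcribe(audio)
        if normalize(actual) != normalize(expected):
            failures.append((audio, expected, actual))
    return failures
```

An empty return value means every case matched after normalization. The `conversational_a.wav` case is left out here because its expected paragraph is not reproduced in the README.
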

  ## Getting Started