Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -52,6 +52,52 @@ Current variant: `4bit`
|
|
| 52 |
|
| 53 |
This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for Apple silicon. The original model description below is preserved from the official release, and the MLX-specific material in this page is added as an incremental note for local MLX deployment.
|
| 54 |
|
| 55 |
## Introduction
|
| 56 |
|
| 57 |
**MiMo-V2.5-ASR** is a state-of-the-art end-to-end automatic speech recognition (ASR) model developed by the Xiaomi MiMo team. It is built to deliver accurate and robust transcription across Mandarin Chinese and English, multiple Chinese dialects, code-switched speech, song lyrics, knowledge-intensive content, noisy acoustic environments, and multi-speaker conversations. MiMo-V2.5-ASR achieves state-of-the-art results on a wide range of public benchmarks.
|
|
@@ -106,10 +152,9 @@ The following repositories are MLX conversions derived from the official release
|
|
| 106 |
MLX conversion notes:
|
| 107 |
|
| 108 |
- Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
|
| 109 |
-
- One-line MLX loading: `fromPretrained("mlx-community/MiMo-V2.5-ASR-MLX")` auto-resolves the tokenizer mirror
|
| 110 |
- Tokenizer resolution: automatic via `mlx-community/MiMo-Audio-Tokenizer`
|
| 111 |
- Conversion date: `2026-05-12`
|
| 112 |
-
- Local validation
|
| 113 |
- Recommended default: `MiMo-V2.5-ASR-MLX`
|
| 114 |
|
| 115 |
Example downloads:
|
|
@@ -121,9 +166,10 @@ hf download mlx-community/MiMo-V2.5-ASR-MLX-8bit --local-dir ./models/MiMo-V2.5-
|
|
| 121 |
|
| 122 |
## Validation
|
| 123 |
|
| 124 |
-
Local smoke validation was run with `mlx-audio
|
| 125 |
|
| 126 |
-
-
|
|
|
|
| 127 |
|
| 128 |
## Getting Started
|
| 129 |
|
|
|
|
| 52 |
|
| 53 |
This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for Apple silicon. The original model description below is preserved from the official release, and the MLX-specific material in this page is added as an incremental note for local MLX deployment.
|
| 54 |
|
| 55 |
+
## MLX Usage
|
| 56 |
+
|
| 57 |
+
Current MLX usage is documented in the GitHub forks below:
|
| 58 |
+
|
| 59 |
+
- [ailuntx/MiMo-V2.5-ASR](https://github.com/ailuntx/MiMo-V2.5-ASR)
|
| 60 |
+
- [ailuntx/MiMo-Audio-Tokenizer](https://github.com/ailuntx/MiMo-Audio-Tokenizer)
|
| 61 |
+
|
| 62 |
+
Install the current MLX path:
|
| 63 |
+
|
| 64 |
+
```bash
|
| 65 |
+
pip install git+https://github.com/ailuntx/mlx-audio@feat/mimo-v25-asr
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
Download the MLX checkpoints:
|
| 69 |
+
|
| 70 |
+
```bash
|
| 71 |
+
hf download mlx-community/MiMo-Audio-Tokenizer --local-dir ./models/MiMo-Audio-Tokenizer
|
| 72 |
+
hf download mlx-community/MiMo-V2.5-ASR-MLX --local-dir ./models/MiMo-V2.5-ASR-MLX
|
| 73 |
+
```
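The same two downloads can also be scripted from Python. The sketch below is a minimal example that assumes `huggingface_hub` is installed and uses its `snapshot_download` function; the helper names `plan_downloads` and `download_all` are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def plan_downloads(repo_ids: Iterable[str], root: str = "./models") -> Dict[str, Path]:
    """Map each repo id to a local directory named after the repo,
    mirroring the --local-dir layout of the hf download commands above."""
    return {repo_id: Path(root) / repo_id.split("/")[-1] for repo_id in repo_ids}


def download_all(plan: Dict[str, Path], downloader: Optional[Callable] = None) -> List[str]:
    """Download every repo in the plan and return the local paths reported by the downloader."""
    if downloader is None:
        # Imported lazily so the planning step works without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    return [
        str(downloader(repo_id=repo_id, local_dir=str(local_dir)))
        for repo_id, local_dir in plan.items()
    ]


REPOS = [
    "mlx-community/MiMo-Audio-Tokenizer",
    "mlx-community/MiMo-V2.5-ASR-MLX",
]
```

Calling `download_all(plan_downloads(REPOS))` should fetch both repositories into `./models/`, matching the directory names used in the commands above. The `downloader` argument exists only so the helper can be exercised without network access.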
|
| 74 |
+
|
| 75 |
+
Run transcription from the helper script in `ailuntx/MiMo-V2.5-ASR`:
|
| 76 |
+
|
| 77 |
+
```bash
|
| 78 |
+
git clone https://github.com/ailuntx/MiMo-V2.5-ASR.git
|
| 79 |
+
cd MiMo-V2.5-ASR
|
| 80 |
+
python run_mimo_asr_mlx.py \
|
| 81 |
+
--model ./models/MiMo-V2.5-ASR-MLX \
|
| 82 |
+
--audio path/to/audio.wav
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
Python:
|
| 86 |
+
|
| 87 |
+
```python
|
| 88 |
+
from mlx_audio.stt import load
|
| 89 |
+
|
| 90 |
+
model = load("./models/MiMo-V2.5-ASR-MLX")
|
| 91 |
+
result = model.generate("path/to/audio.wav", language="en")
|
| 92 |
+
print(result.text)
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
Notes:
|
| 96 |
+
|
| 97 |
+
- `mlx-community/MiMo-V2.5-ASR-MLX` resolves `mlx-community/MiMo-Audio-Tokenizer` through `mlx_manifest.json`.
|
| 98 |
+
- The current install path depends on the MiMo support branch in `ailuntx/mlx-audio`.
|
| 99 |
+
- The usage section here will be simplified once MiMo lands in upstream `mlx-audio` and `mlx-audio-swift`.
|
| 100 |
+
|
| 101 |
## Introduction
|
| 102 |
|
| 103 |
**MiMo-V2.5-ASR** is a state-of-the-art end-to-end automatic speech recognition (ASR) model developed by the Xiaomi MiMo team. It is built to deliver accurate and robust transcription across Mandarin Chinese and English, multiple Chinese dialects, code-switched speech, song lyrics, knowledge-intensive content, noisy acoustic environments, and multi-speaker conversations. MiMo-V2.5-ASR achieves state-of-the-art results on a wide range of public benchmarks.
|
|
|
|
| 152 |
MLX conversion notes:
|
| 153 |
|
| 154 |
- Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
|
|
|
|
| 155 |
- Tokenizer resolution: automatic via `mlx-community/MiMo-Audio-Tokenizer`
|
| 156 |
- Conversion date: `2026-05-12`
|
| 157 |
+
- Local validation runtimes: `mlx-audio` and `mlx-audio-swift`
|
| 158 |
- Recommended default: `MiMo-V2.5-ASR-MLX`
|
| 159 |
|
| 160 |
Example downloads:
|
|
|
|
| 166 |
|
| 167 |
## Validation
|
| 168 |
|
| 169 |
+
Local smoke validation was run with `mlx-audio` and `mlx-audio-swift`.
|
| 170 |
|
| 171 |
+
- `intention.wav` -> `Intention.`
|
| 172 |
+
- `conversational_a.wav` -> expected coffee / Kaldi paragraph
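A smoke check like the one above can be sketched as a small script. This is an illustrative example, not the validation code that was actually run: the helpers `normalize`, `run_smoke_checks`, and `main` are hypothetical, and the model calls inside `main` assume the `load` / `generate` usage shown in the Python snippet of this README. Only `intention.wav` is included because its expected text is stated here; the reference text for `conversational_a.wav` is not given on this page.

```python
import string
from typing import Callable, Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def run_smoke_checks(
    transcribe: Callable[[str], str],
    expected: Dict[str, str],
) -> List[Tuple[str, bool, str]]:
    """Transcribe each file and compare it with its reference after normalization."""
    results = []
    for audio_path, reference in expected.items():
        hypothesis = transcribe(audio_path)
        passed = normalize(hypothesis) == normalize(reference)
        results.append((audio_path, passed, hypothesis))
    return results


def main() -> None:
    # Assumption: this import and the generate() call mirror the README's Python example.
    from mlx_audio.stt import load

    model = load("./models/MiMo-V2.5-ASR-MLX")
    cases = {"intention.wav": "Intention."}
    checks = run_smoke_checks(lambda p: model.generate(p, language="en").text, cases)
    for path, ok, text in checks:
        print(f"{'PASS' if ok else 'FAIL'} {path}: {text!r}")
```

Running `main()` requires the MiMo support branch of `mlx-audio` and the downloaded checkpoint. Comparing normalized text keeps the check tolerant of casing and trailing punctuation, which is usually enough for a smoke test but is not a substitute for a word error rate evaluation.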
|
| 173 |
|
| 174 |
## Getting Started
|
| 175 |
|