Refine README to follow official model card structure
README.md
CHANGED
@@ -19,11 +19,7 @@ language:

 Current variant: `4bit`

-MLX conversion of `XiaomiMiMo/MiMo-V2.5-ASR` for local inference on Apple silicon.
-
-## Overview
-
-MiMo-V2.5-ASR is an end-to-end speech recognition model from the Xiaomi MiMo team. The official release focuses on robust transcription across Mandarin Chinese, English, Chinese dialects, code-switched speech, lyrics, noisy recordings, meetings, and knowledge-intensive content. This repository keeps the original model scope and packages it as an MLX-ready variant built from the official release.

 Official resources:

@@ -33,7 +29,39 @@ Official resources:
 - Blog: `mimo.xiaomi.com/mimo-v2-5-asr`
 - Code: `XiaomiMiMo/MiMo-V2.5-ASR`

-##

 | Variant | Precision | Size | Local smoke time | Smoke result |
 | --- | --- | ---: | ---: | --- |
@@ -43,11 +71,28 @@ Official resources:
 | `MiMo-V2.5-ASR-MLX-bf16` | bf16 | 15 GB | - | dense reference export |
 | `MiMo-V2.5-ASR-MLX-fp32` | fp32 | 30 GB | - | dense reference export |

-##

-
-
--
-
-
-
@@ -19,11 +19,7 @@ language:

 Current variant: `4bit`

+This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for local inference on Apple silicon. The original model, tokenizer, benchmark claims, demo, and project materials remain with the Xiaomi MiMo team. The MLX-specific notes in this repository are added as an incremental deployment layer on top of the official release.

 Official resources:

@@ -33,7 +29,39 @@ Official resources:
 - Blog: `mimo.xiaomi.com/mimo-v2-5-asr`
 - Code: `XiaomiMiMo/MiMo-V2.5-ASR`

+## Introduction
+
+**MiMo-V2.5-ASR** is an end-to-end automatic speech recognition model developed by the Xiaomi MiMo team. It is designed for robust transcription across Mandarin Chinese and English, Chinese dialects, code-switched speech, lyrics, noisy recordings, meetings, and knowledge-intensive content.
+
+The official release highlights the following capabilities:
+
+- Native support for Chinese dialects including Wu, Cantonese, Hokkien, and Sichuanese.
+- Seamless Chinese-English code-switching transcription without language tags.
+- Lyrics transcription for Chinese and English songs.
+- Robust recognition under heavy noise and far-field capture.
+- Accurate transcription for multi-speaker and overlapping conversations.
+- Strong performance on complex English meeting-style benchmarks.
+- Reliable handling of terminology, names, places, and other knowledge-dense material.
+- Native punctuation generation without a separate post-processing stage.
+
+## Results
+
+For benchmark charts, qualitative examples, and the original project presentation, please refer to the official model page and blog:
+
+- Official model card: `XiaomiMiMo/MiMo-V2.5-ASR`
+- Official blog: `mimo.xiaomi.com/mimo-v2-5-asr`
+
+## MLX Conversion
+
+This repository packages the official release as an MLX-ready model family for Apple silicon. The conversion was built from the official model weights together with `XiaomiMiMo/MiMo-Audio-Tokenizer`.
+
+- Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
+- Required tokenizer: `XiaomiMiMo/MiMo-Audio-Tokenizer`
+- Conversion date: `2026-05-12`
+- Runtime used for validation: `mlx-audio-swift`
+- Recommended default: `MiMo-V2.5-ASR-MLX`
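
The list above names two upstream inputs, the base model and the audio tokenizer. As a rough sketch of how someone reproducing the setup might gather both, the snippet below maps each identifier to a local directory and downloads it. It assumes both identifiers resolve to Hugging Face Hub repositories and that the `huggingface_hub` package is installed. The helper names `plan_downloads` and `fetch_all` are hypothetical, and the conversion step itself is not shown because the README does not describe it.

```python
from pathlib import Path

# Repository identifiers taken from the README list above.
BASE_MODEL = "XiaomiMiMo/MiMo-V2.5-ASR"
TOKENIZER = "XiaomiMiMo/MiMo-Audio-Tokenizer"


def plan_downloads(root):
    """Map each required repository to a local target directory (no network access)."""
    root = Path(root)
    return {repo: root / repo.split("/")[-1] for repo in (BASE_MODEL, TOKENIZER)}


def fetch_all(root):
    """Download both repositories (assumption: they are hosted on the Hugging Face Hub)."""
    # Imported lazily so that planning works even without the package installed.
    from huggingface_hub import snapshot_download

    for repo, target in plan_downloads(root).items():
        snapshot_download(repo_id=repo, local_dir=str(target))
```

Calling `plan_downloads("models")` only builds the mapping, which makes it easy to check the layout before `fetch_all("models")` starts any large transfer.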
+
+## Variant Summary

 | Variant | Precision | Size | Local smoke time | Smoke result |
 | --- | --- | ---: | ---: | --- |
@@ -43,11 +71,28 @@ Official resources:
 | `MiMo-V2.5-ASR-MLX-bf16` | bf16 | 15 GB | - | dense reference export |
 | `MiMo-V2.5-ASR-MLX-fp32` | fp32 | 30 GB | - | dense reference export |
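
The two dense rows above are consistent with each other: fp32 stores 4 bytes per weight and bf16 stores 2, and 30 GB is exactly twice 15 GB. The sketch below works through that arithmetic, assuming the listed sizes are dominated by weights and that 1 GB means 10^9 bytes. The function names are illustrative only, and the quantized estimate is a lower bound that ignores quantization scales and any layers kept at higher precision. It is not a measured size for any variant.

```python
# Bytes stored per weight for the two dense precisions in the table.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}
# Sizes in GB as listed in the table rows above.
SIZE_GB = {"fp32": 30, "bf16": 15}


def implied_params_billion(precision):
    """Parameter count (in billions) implied by a dense export's size."""
    return SIZE_GB[precision] / BYTES_PER_PARAM[precision]


def estimated_size_gb(params_billion, bits_per_weight):
    """Weight-only size estimate in GB; ignores scales, biases and metadata."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for precision in SIZE_GB:
        print(precision, implied_params_billion(precision))
    print("4-bit lower bound (GB):", estimated_size_gb(7.5, 4))
```

Both dense rows imply roughly 7.5 billion weights, which suggests the table sizes are internally consistent.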

+## Validation

+Local smoke validation was run with `mlx-audio-swift` on `Tests/media/intention.wav`.
+
+- Output: `Intention.`
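
A smoke check like the one above comes down to comparing a transcript with a known reference string. The following is a minimal sketch of such a comparison, assuming the transcript is already available as text. `normalize` and `smoke_passes` are hypothetical helpers and are not part of `mlx-audio-swift`. The comparison ignores case, ASCII punctuation and extra whitespace, so it confirms that the words match without requiring identical formatting.

```python
import string

# Reference transcript reported in the validation notes above.
EXPECTED = "Intention."


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


def smoke_passes(transcript, expected=EXPECTED):
    """Return True when the transcript matches the reference after normalization."""
    return normalize(transcript) == normalize(expected)
```

A stricter check could compare the raw strings instead, which would also exercise the model's native punctuation output.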
+
+## Citation
+
+If you use the original model, please cite the official project:
+
+```bibtex
+@misc{coreteam2026mimov25asr,
+title={MiMo-V2.5-ASR: Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios},
+author={LLM-Core-Team Xiaomi},
+year={2026},
+url={https://github.com/XiaomiMiMo/MiMo-V2.5-ASR},
+}
+```
+
+## Contact
+
+For questions about the original model, please refer to the official project channels:
+
+- `mimo@xiaomi.com`
+- `XiaomiMiMo/MiMo-V2.5-ASR`