Commit 8d7680f by ailuntz · verified · 1 parent: 08fe294

Refine README to follow official model card structure

Files changed (1)
  1. README.md +58 -13
README.md CHANGED
@@ -19,11 +19,7 @@ language:
 
  Current variant: `4bit` (default entry)
 
- MLX conversion of `XiaomiMiMo/MiMo-V2.5-ASR` for local inference on Apple silicon.
-
- ## Overview
-
- MiMo-V2.5-ASR is an end-to-end speech recognition model from the Xiaomi MiMo team. The official release focuses on robust transcription across Mandarin Chinese, English, Chinese dialects, code-switched speech, lyrics, noisy recordings, meetings, and knowledge-intensive content. This repository keeps the original model scope and packages it as an MLX-ready variant built from the official release.
+ This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for local inference on Apple silicon. The original model, tokenizer, benchmark claims, demo, and project materials remain with the Xiaomi MiMo team. The MLX-specific notes in this repository are added as an incremental deployment layer on top of the official release.
 
  Official resources:
 
@@ -33,7 +29,39 @@ Official resources:
  - Blog: `mimo.xiaomi.com/mimo-v2-5-asr`
  - Code: `XiaomiMiMo/MiMo-V2.5-ASR`
 
- ## MLX Variants
+ ## Introduction
+
+ **MiMo-V2.5-ASR** is an end-to-end automatic speech recognition model developed by the Xiaomi MiMo team. It is designed for robust transcription across Mandarin Chinese and English, Chinese dialects, code-switched speech, lyrics, noisy recordings, meetings, and knowledge-intensive content.
+
+ The official release highlights the following capabilities:
+
+ - Native support for Chinese dialects including Wu, Cantonese, Hokkien, and Sichuanese.
+ - Seamless Chinese-English code-switching transcription without language tags.
+ - Lyrics transcription for Chinese and English songs.
+ - Robust recognition under heavy noise and far-field capture.
+ - Accurate transcription for multi-speaker and overlapping conversations.
+ - Strong performance on complex English meeting-style benchmarks.
+ - Reliable handling of terminology, names, places, and other knowledge-dense material.
+ - Native punctuation generation without a separate post-processing stage.
+
+ ## Results
+
+ For benchmark charts, qualitative examples, and the original project presentation, please refer to the official model page and blog:
+
+ - Official model card: `XiaomiMiMo/MiMo-V2.5-ASR`
+ - Official blog: `mimo.xiaomi.com/mimo-v2-5-asr`
+
+ ## MLX Conversion
+
+ This repository packages the official release as an MLX-ready model family for Apple silicon. The conversion was built from the official model weights together with `XiaomiMiMo/MiMo-Audio-Tokenizer`.
+
+ - Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
+ - Required tokenizer: `XiaomiMiMo/MiMo-Audio-Tokenizer`
+ - Conversion date: `2026-05-12`
+ - Runtime used for validation: `mlx-audio-swift`
+ - Recommended default: `MiMo-V2.5-ASR-MLX`
+
+ ## Variant Summary
 
  | Variant | Precision | Size | Local smoke time | Smoke result |
  | --- | --- | ---: | ---: | --- |
@@ -43,11 +71,28 @@ Official resources:
  | `MiMo-V2.5-ASR-MLX-bf16` | bf16 | 15 GB | - | dense reference export |
  | `MiMo-V2.5-ASR-MLX-fp32` | fp32 | 30 GB | - | dense reference export |
 
- ## Notes
+ ## Validation
 
- - Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
- - Required tokenizer: `XiaomiMiMo/MiMo-Audio-Tokenizer`
- - Conversion date: `2026-05-12`
- - Local validation: `mlx-audio-swift` on `Tests/media/intention.wav`
- - Recommended default: `MiMo-V2.5-ASR-MLX`
- - This repository is a community MLX conversion. For benchmark tables, demos, and the original project description, see the official release.
+ Local smoke validation was run with `mlx-audio-swift` on `Tests/media/intention.wav`.
+
+ - Output: `Intention.`
+
+ ## Citation
+
+ If you use the original model, please cite the official project:
+
+ ```bibtex
+ @misc{coreteam2026mimov25asr,
+ title={MiMo-V2.5-ASR: Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios},
+ author={LLM-Core-Team Xiaomi},
+ year={2026},
+ url={https://github.com/XiaomiMiMo/MiMo-V2.5-ASR},
+ }
+ ```
+
+ ## Contact
+
+ For questions about the original model, please refer to the official project channels:
+
+ - `mimo@xiaomi.com`
+ - `XiaomiMiMo/MiMo-V2.5-ASR`
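The bf16 (15 GB) and fp32 (30 GB) rows in the variant table differ by exactly the 2x ratio of bytes per weight. The sketch below reproduces that arithmetic. It assumes the parameter count can be backed out from the 30 GB fp32 export and treats "GB" as 10^9 bytes. For the 4-bit case, it also assumes MLX-style group-wise affine quantization with a 16-bit scale and bias per group of 64 weights. This diff does not show the actual quantization settings, so that is an assumption. The helper names `bytes_per_weight`, `estimate_size_gb`, and `approx_params` are hypothetical and used only for illustration.

```python
from typing import Optional


def bytes_per_weight(bits: int, group_size: Optional[int] = None, scale_bits: int = 16) -> float:
    """Storage cost of one weight in bytes (hypothetical helper).

    Dense formats (fp32, bf16): pass group_size=None.
    Group-wise affine quantization (assumed MLX-style): every `group_size`
    weights share one scale and one bias, each assumed to be `scale_bits` wide.
    """
    if group_size is None:
        return bits / 8
    overhead_bits = 2 * scale_bits / group_size  # scale + bias amortized over the group
    return (bits + overhead_bits) / 8


def estimate_size_gb(num_params: float, bits: int, group_size: Optional[int] = None) -> float:
    """Approximate weight storage in GB (10**9 bytes)."""
    return num_params * bytes_per_weight(bits, group_size) / 1e9


# Assumption: back out the parameter count from the 30 GB fp32 export (4 bytes/weight).
approx_params = 30e9 / bytes_per_weight(32)

if __name__ == "__main__":
    for name, bits, group in [("fp32", 32, None), ("bf16", 16, None), ("4-bit, group 64", 4, 64)]:
        print(f"{name:>16}: ~{estimate_size_gb(approx_params, bits, group):.1f} GB")
```

This prints about 30.0, 15.0, and 4.2 GB. The 4-bit figure only covers the case where every tensor is quantized under the assumed settings. Tensors kept at higher precision, or a different group size, would change it, so it should not be read as the size of `MiMo-V2.5-ASR-MLX`.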
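The Validation section records a single raw transcript, `Intention.`, for `Tests/media/intention.wav`. The README does not describe any scoring procedure. The sketch below shows one common way such a smoke result could be checked automatically: normalize both strings (lowercase, strip ASCII punctuation), then compute word error rate as the word-level Levenshtein distance divided by the number of reference words. The reference string `"intention"` and the helpers `normalize` and `word_error_rate` are hypothetical.

```python
import string
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    Note: string.punctuation is ASCII-only, so full-width Chinese
    punctuation is not removed here.
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference; the README only records the model output.
    print(word_error_rate("intention", "Intention."))  # 0.0
```

Whitespace splitting does not segment Chinese text into words. For Mandarin or dialect transcripts, a character-level variant (character error rate) would be the usual choice.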