mlx-community/MiMo-V2.5-ASR-MLX

@@ -19,11 +19,7 @@ language:
 Current variant: `4bit` (default entry)
-MLX conversion of `XiaomiMiMo/MiMo-V2.5-ASR` for local inference on Apple silicon.
-## Overview
-MiMo-V2.5-ASR is an end-to-end speech recognition model from the Xiaomi MiMo team. The official release focuses on robust transcription across Mandarin Chinese, English, Chinese dialects, code-switched speech, lyrics, noisy recordings, meetings, and knowledge-intensive content. This repository keeps the original model scope and packages it as an MLX-ready variant built from the official release.
 Official resources:
@@ -33,7 +29,39 @@ Official resources:
 - Blog: `mimo.xiaomi.com/mimo-v2-5-asr`
 - Code: `XiaomiMiMo/MiMo-V2.5-ASR`
-## MLX Variants
 | Variant | Precision | Size | Local smoke time | Smoke result |
 | --- | --- | ---: | ---: | --- |
@@ -43,11 +71,28 @@ Official resources:
 | `MiMo-V2.5-ASR-MLX-bf16` | bf16 | 15 GB | - | dense reference export |
 | `MiMo-V2.5-ASR-MLX-fp32` | fp32 | 30 GB | - | dense reference export |
-## Notes
-- Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
-- Required tokenizer: `XiaomiMiMo/MiMo-Audio-Tokenizer`
-- Conversion date: `2026-05-12`
-- Local validation: `mlx-audio-swift` on `Tests/media/intention.wav`
-- Recommended default: `MiMo-V2.5-ASR-MLX`
-- This repository is a community MLX conversion. For benchmark tables, demos, and the original project description, see the official release.

 Current variant: `4bit` (default entry)
+This repository is a community MLX conversion of the official `XiaomiMiMo/MiMo-V2.5-ASR` release for local inference on Apple silicon. The original model, tokenizer, benchmark claims, demo, and project materials remain with the Xiaomi MiMo team. The MLX-specific notes here cover only conversion and deployment on top of the official release.
 Official resources:
 - Blog: `mimo.xiaomi.com/mimo-v2-5-asr`
 - Code: `XiaomiMiMo/MiMo-V2.5-ASR`
+## Introduction
+**MiMo-V2.5-ASR** is an end-to-end automatic speech recognition model developed by the Xiaomi MiMo team. It is designed for robust transcription across Mandarin Chinese and English, Chinese dialects, code-switched speech, lyrics, noisy recordings, meetings, and knowledge-intensive content.
+The official release highlights the following capabilities:
+- Native support for Chinese dialects including Wu, Cantonese, Hokkien, and Sichuanese.
+- Seamless Chinese-English code-switching transcription without language tags.
+- Lyrics transcription for Chinese and English songs.
+- Robust recognition under heavy noise and far-field capture.
+- Accurate transcription for multi-speaker and overlapping conversations.
+- Strong performance on complex English meeting-style benchmarks.
+- Reliable handling of terminology, names, places, and other knowledge-dense material.
+- Native punctuation generation without a separate post-processing stage.
+## Results
+For benchmark charts, qualitative examples, and the original project presentation, please refer to the official model page and blog:
+- Official model card: `XiaomiMiMo/MiMo-V2.5-ASR`
+- Official blog: `mimo.xiaomi.com/mimo-v2-5-asr`
+## MLX Conversion
+This repository packages the official release as an MLX-ready model family for Apple silicon. The conversion was built from the official model weights together with `XiaomiMiMo/MiMo-Audio-Tokenizer`.
+- Base model: `XiaomiMiMo/MiMo-V2.5-ASR`
+- Required tokenizer: `XiaomiMiMo/MiMo-Audio-Tokenizer`
+- Conversion date: `2026-05-12`
+- Runtime used for validation: `mlx-audio-swift`
+- Recommended default: `MiMo-V2.5-ASR-MLX`
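A minimal sketch (not part of the committed README) of fetching the converted weights together with the required tokenizer. It assumes the `huggingface_hub` Python package is installed and that this repository is published as `mlx-community/MiMo-V2.5-ASR-MLX`, an id taken from the page header rather than the README text. The helper names `required_repos` and `fetch_all` are hypothetical, and `mlx-audio-swift` may handle downloads on its own.

```python
from pathlib import Path

# Assumption: the MLX repo id comes from the page header; the tokenizer id
# comes from the "Required tokenizer" line of the README.
MLX_REPO = "mlx-community/MiMo-V2.5-ASR-MLX"
TOKENIZER_REPO = "XiaomiMiMo/MiMo-Audio-Tokenizer"


def required_repos() -> list[str]:
    """Return the repositories needed for local inference."""
    return [MLX_REPO, TOKENIZER_REPO]


def fetch_all(cache_root: str = "models") -> dict[str, Path]:
    """Download each repository into its own folder under cache_root."""
    # Imported lazily so the module loads even without the dependency.
    from huggingface_hub import snapshot_download

    paths: dict[str, Path] = {}
    for repo_id in required_repos():
        # Hypothetical folder layout: "owner__name" under cache_root.
        target = Path(cache_root) / repo_id.replace("/", "__")
        paths[repo_id] = Path(snapshot_download(repo_id=repo_id, local_dir=target))
    return paths
```

Calling `fetch_all()` needs network access and enough disk space for the chosen variant; the returned mapping gives the local folder for each repository.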
+## Variant Summary
 | Variant | Precision | Size | Local smoke time | Smoke result |
 | --- | --- | ---: | ---: | --- |
 | `MiMo-V2.5-ASR-MLX-bf16` | bf16 | 15 GB | - | dense reference export |
 | `MiMo-V2.5-ASR-MLX-fp32` | fp32 | 30 GB | - | dense reference export |
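The two dense sizes above are consistent with simple bytes-per-parameter arithmetic: 30 GB at 4 bytes and 15 GB at 2 bytes both imply roughly 7.5 billion parameters. The sketch below (not part of the committed README) makes that check explicit. It assumes decimal gigabytes and ignores non-weight files. The 4-bit figure it produces is only a rough lower bound, since quantized exports also store scales and may keep some layers at higher precision.

```python
# Sizes as listed in the variant table (GB) and the storage cost per parameter.
LISTED_SIZE_GB = {"fp32": 30.0, "bf16": 15.0}
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0}


def implied_params_billion(precision: str) -> float:
    """Parameter count (in billions) implied by a listed dense export size."""
    return LISTED_SIZE_GB[precision] / BYTES_PER_PARAM[precision]


def estimated_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Weight-only size estimate in GB for a given bit width."""
    return params_billion * bits_per_param / 8.0


params = implied_params_billion("bf16")
print(f"implied parameters: {params} B")
print(f"weight-only 4-bit estimate: {estimated_size_gb(params, 4)} GB")
```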
+## Validation
+Local smoke validation was run with `mlx-audio-swift` on `Tests/media/intention.wav`.
+- Output: `Intention.`
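A smoke check like this one can be automated by comparing the transcript against the expected text after light normalization. The sketch below (not part of the committed README) is one way to do that in Python; the function names are hypothetical, and it assumes the transcript has already been produced by the runtime and captured as a string.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


def smoke_passes(transcript: str, expected: str = "Intention.") -> bool:
    """True when the transcript matches the expected text after normalization."""
    return normalize(transcript) == normalize(expected)
```

Normalizing before comparison keeps the check from failing on casing or trailing punctuation differences between variants, while still catching a wrong word.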
+## Citation
+If you use the original model, please cite the official project:
+```bibtex
+@misc{coreteam2026mimov25asr,
+      title={MiMo-V2.5-ASR: Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios},
+      author={LLM-Core-Team Xiaomi},
+      year={2026},
+      url={https://github.com/XiaomiMiMo/MiMo-V2.5-ASR},
+}
+```
+## Contact
+For questions about the original model, please refer to the official project channels:
+- `mimo@xiaomi.com`
+- `XiaomiMiMo/MiMo-V2.5-ASR`