Reza2kn committed on
Commit f90b5a2 · verified · 1 Parent(s): ff6fdb6

Add README

Files changed (1)
  1. README.md +146 -0
README.md ADDED
@@ -0,0 +1,146 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ - ja
+ - ko
+ - multilingual
+ library_name: coremltools
+ tags:
+ - coreml
+ - ane
+ - apple-neural-engine
+ - automatic-speech-recognition
+ - asr
+ - speech-recognition
+ - robust-asr
+ - quantized
+ - int4
+ - 4bit
+ - lut
+ - palettize
+ - on-device
+ - apple-silicon
+ - ios
+ - macos
+ - qwen3
+ - qwen3-asr
+ - mega-asr
+ - anemll
+ pipeline_tag: automatic-speech-recognition
+ base_model: zhifeixie/Mega-ASR
+ base_model_relation: quantized
+ ---
+
+ # Mega-ASR: CoreML LUT-4 (Apple Neural Engine)
+
+ CoreML LUT-4 (4-bit lookup-table palettized) export of the LLM portion of
+ [zhifeixie/Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (Qwen3-ASR-1.7B
+ base), produced via [ANEMLL](https://github.com/Anemll/Anemll), the Apple
+ Neural Engine reference converter, with `--chunk 4 --lut 4 --context-length 512`.
+
+ The resulting `.mlpackage` is a stateful CoreML model with native ANE
+ attention layouts, in-model KV cache state, and a 16-way split LM head for
+ efficient ANE residency.
+
+ ## What's in this repo
+
+ | File | Size | Role |
+ | --- | ---: | --- |
+ | `coreml/mega-asr-llm_lut4.mlpackage/` | **974 MB** | Qwen3 1.7B LLM, ANE-targeted, LUT-4 palettized weights, stateful KV cache |
+ | `onnx/audio_encoder_fp32.onnx` | 1.27 GB | 24-layer Whisper-style audio encoder (ONNX fp32, run via onnxruntime; CoreML port pending) |
+ | `tokenizer/*` | - | Original Qwen3-ASR tokenizer (`<\|audio_pad\|>`, `<asr_text>`, etc.) |
+ | `examples/*.wav` | ~3 MB | 8 noisy benchmark clips from Voices-in-the-Wild-Bench |
+
+ ## Model I/O
+
+ The `mega-asr-llm_lut4.mlpackage` follows ANEMLL's stateful step-decoder layout:
+
+ **Inputs** (single-token step):
+
+ | name | shape | dtype |
+ | --- | --- | --- |
+ | `input_ids` | `(1, 1)` | int32 |
+ | `position_ids` | `(1,)` | int32 |
+ | `causal_mask` | `(1, 1, 1, 512)` | float16 |
+ | `current_pos` | `(1,)` | int32 |
+ | `update_mask` | `(1, 1, 512, 1)` | float16 |
+
+ **Outputs**: `logits1` … `logits16`, each `(1, 1, 9496)` float16. Concatenate
+ them along the last axis to get the 151936-dim vocabulary logits (16 × 9496).
+
+ **State**: `model_model_kv_cache_0`, shape `(56, 8, 512, 128)` float16 (28
+ layers × 2 (K/V) × 8 KV heads × 512 max context × 128 head dim). Create it with
+ `model.make_state()` and pass it to every `predict()` call.
+
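The shape arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch, not part of the repo: the constants are copied from the I/O description, while `global_token_id` is a hypothetical helper that assumes the 16 logits chunks are contiguous vocabulary slices in order (which is what concatenating along the last axis implies).

```python
from math import prod

# Values copied from the Model I/O description above.
NUM_LOGIT_CHUNKS = 16
CHUNK_SIZE = 9496
KV_STATE_SHAPE = (56, 8, 512, 128)  # (layers * 2, KV heads, context, head dim)
FP16_BYTES = 2

vocab_size = NUM_LOGIT_CHUNKS * CHUNK_SIZE
kv_state_bytes = prod(KV_STATE_SHAPE) * FP16_BYTES


def global_token_id(chunk_number, local_index):
    """Map (1-based `logitsN` number, index inside that chunk) to a vocabulary id.

    Hypothetical helper: assumes chunks are contiguous vocabulary slices in order.
    """
    if not 1 <= chunk_number <= NUM_LOGIT_CHUNKS:
        raise ValueError("chunk_number must be in 1..16")
    if not 0 <= local_index < CHUNK_SIZE:
        raise ValueError("local_index is outside the chunk")
    return (chunk_number - 1) * CHUNK_SIZE + local_index


print(vocab_size)              # 151936
print(kv_state_bytes / 2**20)  # 56.0 (MiB of fp16 KV cache state)
```

Under these numbers the KV cache state alone occupies 56 MiB, independent of how many of the 512 positions are actually filled.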
+ ## Quick run (Python)
+
+ ```python
+ import coremltools as ct
+ import numpy as np
+
+ m = ct.models.MLModel("coreml/mega-asr-llm_lut4.mlpackage",
+                       compute_units=ct.ComputeUnit.CPU_AND_NE)
+ state = m.make_state()  # fresh KV cache; reuse it across steps
+
+ pos = 0  # position of the token fed in this step
+ # Assumed additive-mask convention: 0 = attend, -inf = ignore. Only cache
+ # slots <= pos hold real entries, so later slots are masked out.
+ causal_mask = np.where(np.arange(512) <= pos, 0.0, -np.inf).astype(np.float16)
+
+ out = m.predict({
+     "input_ids": np.array([[40]], dtype=np.int32),  # token 'I'
+     "position_ids": np.array([pos], dtype=np.int32),
+     "causal_mask": causal_mask.reshape(1, 1, 1, 512),
+     "current_pos": np.array([pos], dtype=np.int32),
+     "update_mask": np.zeros((1, 1, 512, 1), dtype=np.float16),
+ }, state=state)
+ # Join the 16 LM-head chunks into one 151936-entry logits vector.
+ all_logits = np.concatenate([out[f"logits{i}"][0, 0] for i in range(1, 17)])
+ ```
+
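The snippet above runs a single step. A multi-token greedy loop could be organised roughly as below. This is a sketch under stated assumptions, not ANEMLL's or this repo's code: `step_inputs` and `greedy_decode` are hypothetical names, `step_fn` stands in for a wrapper around `m.predict(..., state=state)` that returns the concatenated logits, and the mask is assumed to be additive (0 to attend, -inf to ignore).

```python
NEG_INF = float("-inf")
CONTEXT_LENGTH = 512


def step_inputs(token_id, pos, context_length=CONTEXT_LENGTH):
    """Build plain-Python inputs for one decode step at position `pos`."""
    if not 0 <= pos < context_length:
        raise ValueError("pos is outside the fixed context window")
    return {
        "input_ids": [[token_id]],
        "position_ids": [pos],
        "current_pos": [pos],
        # Assumed additive mask: 0.0 = attend, -inf = ignore slots after `pos`.
        "causal_mask": [0.0 if i <= pos else NEG_INF for i in range(context_length)],
        # All zeros, mirroring the quick-run snippet.
        "update_mask": [0.0] * context_length,
    }


def greedy_decode(step_fn, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy loop; `step_fn(inputs)` is a hypothetical wrapper returning flat logits."""
    if not prompt_ids:
        raise ValueError("prompt_ids must not be empty")
    pos = 0
    for token_id in prompt_ids:  # prefill one token per step
        logits = step_fn(step_inputs(token_id, pos))
        pos += 1
    generated = []
    while len(generated) < max_new_tokens:
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        generated.append(next_id)
        if pos >= CONTEXT_LENGTH:
            break  # KV cache is full
        logits = step_fn(step_inputs(next_id, pos))
        pos += 1
    return generated
```

With the real model, `step_fn` would wrap each value in a NumPy array using the shapes and dtypes from the Model I/O table, call `predict()` with the same `state` object on every step, and concatenate `logits1` … `logits16` before returning.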
+ ## ASR limitation (current)
+
+ This conversion exports the **standard text-LLM interface** (`input_ids` →
+ internal `embed_tokens` → forward). End-to-end ASR requires scattering
+ **audio embeddings** at `<|audio_pad|>` placeholder positions, which means
+ the model needs to accept `input_embeddings` *instead of* `input_ids`.
+
+ That requires forking ANEMLL's `qwen_model.py` to expose pre-embedded
+ `hidden_states` as the entry point, then re-running the conversion. (See
+ [`aoiandroid/Qwen3-ASR-1.7B-CoreML`](https://huggingface.co/aoiandroid/Qwen3-ASR-1.7B-CoreML)
+ for a prior community attempt at the same pattern; their decoder is named
+ `qwen3_asr_decoder_f32_anemll_int8-mixed.mlpackage` and pairs with a
+ separately stored `qwen3_asr_embeddings.bin`.)
+
+ Until the `input_embeddings` variant lands, this artifact is usable as:
+ - A standalone Qwen3 1.7B CoreML LLM (e.g. text-only chat with the same
+   prompt format the base model expects).
+ - A starting point for building an ANE-targeted Mega-ASR pipeline by
+   re-converting with the embedding bypass.
+
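To make the missing step concrete, here is a minimal sketch of the scatter that an `input_embeddings` entry point would need upstream. It is not code from this repo or from ANEMLL: `scatter_audio_embeddings` is a hypothetical helper, the embeddings are plain Python lists, and the `<|audio_pad|>` token id must come from the tokenizer (the value used in the usage note is made up).

```python
def scatter_audio_embeddings(input_ids, text_embeds, audio_embeds, audio_pad_id):
    """Return per-position embeddings with audio frames placed in the pad slots.

    input_ids:    token ids for the whole prompt
    text_embeds:  one embedding row per token (the embed_tokens lookup)
    audio_embeds: one embedding row per audio frame, in order
    audio_pad_id: id of the <|audio_pad|> placeholder (read it from the tokenizer)
    """
    if len(input_ids) != len(text_embeds):
        raise ValueError("need exactly one text embedding per token")
    slots = [i for i, tok in enumerate(input_ids) if tok == audio_pad_id]
    if len(slots) != len(audio_embeds):
        raise ValueError("placeholder count must match the number of audio frames")
    merged = [list(row) for row in text_embeds]  # copy so the inputs stay untouched
    for slot, frame in zip(slots, audio_embeds):
        merged[slot] = list(frame)
    return merged
```

For example, with a made-up placeholder id of 99, the ids `[1, 99, 99, 2]` and two audio frames yield the text embeddings at positions 0 and 3 and the audio frames at positions 1 and 2. The merged rows are what the re-converted decoder would consume in place of `input_ids`.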
+ ## Conversion details
+
+ ```bash
+ # After cloning ANEMLL (https://github.com/Anemll/Anemll):
+ python -m anemll.ane_converter.qwen_converter \
+     --model /path/to/Qwen3-ASR-1.7B-llm-only \
+     --prefix mega-asr-llm --lut 4 \
+     --context-length 512 --batch-size 64 --chunk 4 \
+     --output /path/to/out
+ ```
+
+ The Qwen3-ASR-1.7B LLM weights were first extracted from `zhifeixie/Mega-ASR`
+ by stripping the `thinker.model.` prefix and dropping the tied `lm_head`
+ (see [Reza2kn/mega-asr-mlx](https://huggingface.co/Reza2kn/mega-asr-mlx) for
+ the extraction script).
+
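The key renaming described above amounts to a small filter over the checkpoint's tensor names. The sketch below is illustrative only and is not the extraction script from the linked repo: `extract_llm_weights` is a hypothetical helper, it works on any name-to-tensor mapping, and it assumes that tensors outside the `thinker.model.` prefix (for example the audio encoder) do not belong to the LLM.

```python
LLM_PREFIX = "thinker.model."


def extract_llm_weights(state_dict):
    """Keep only LLM tensors, renamed without the `thinker.model.` prefix."""
    llm_weights = {}
    for name, tensor in state_dict.items():
        if "lm_head" in name:
            continue  # tied to the embeddings, so it is dropped
        if name.startswith(LLM_PREFIX):
            llm_weights[name[len(LLM_PREFIX):]] = tensor
        # Assumption: anything else (e.g. audio encoder tensors) is not LLM state.
    return llm_weights
```

The result is a plain Qwen3-style state dict that a text-only converter can load from the `--model` directory.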
+ coremltools 9.0 needed one local patch: the `_cast` op handler in
+ `coremltools/converters/mil/frontend/torch/ops.py` does not handle NumPy
+ arrays of size 1. The fix extracts the scalar via `.flatten()[0].item()`
+ before the dtype coercion.
+
+ ## Companion repos
+
+ - [Reza2kn/mega-asr-onnx](https://huggingface.co/Reza2kn/mega-asr-onnx): full ONNX pipeline (GPTQ-INT4 decoder, 92.7% on VITW)
+ - [Reza2kn/mega-asr-mlx](https://huggingface.co/Reza2kn/mega-asr-mlx): MLX 4-bit (mixed8/4 attention/MLP, 92.2% on VITW)
+ - [Reza2kn/mega-asr-bench](https://huggingface.co/spaces/Reza2kn/mega-asr-bench): live browser demo (WebGPU)
+
+ ## Credits
+
+ - Original model: [zhifeixie/Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (1.7B params, Apache-2.0)
+ - CoreML conversion via [ANEMLL](https://github.com/Anemll/Anemll) (Apple Neural Engine LLM port toolkit)
+ - Benchmark: [Voices-in-the-Wild-Bench](https://github.com/xzf-thu/Voices-in-the-Wild-Bench)