zhifeixie committed on
Commit
02abc58
·
verified ·
1 Parent(s): aa12b42

Add files using upload-large-folder tool

Qwen3-ASR-1.7B/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Qwen3-ASR-1.7B/README.md ADDED
@@ -0,0 +1,1393 @@
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: automatic-speech-recognition
4
+ ---
5
+
6
+ # Qwen3-ASR
7
+
8
+ ## Overview
9
+
10
+ ### Introduction
11
+
12
+ <p align="center">
13
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/qwen3_asr_introduction.png" width="90%"/>
14
+ </p>
15
+
16
+ The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects. Both leverage large-scale speech training data and the strong audio understanding capability of their foundation model, Qwen3-Omni. Experiments show that the 1.7B version achieves state-of-the-art performance among open-source ASR models and is competitive with the strongest proprietary commercial APIs. Here are the main features:
17
+
18
+ * **All-in-one**: Qwen3-ASR-1.7B and Qwen3-ASR-0.6B support language identification and speech recognition for 30 languages and 22 Chinese dialects, as well as English accents from multiple countries and regions.
19
+
20
+ * **Excellent and Fast**: The Qwen3-ASR models maintain high-quality, robust recognition under complex acoustic environments and challenging text patterns. Qwen3-ASR-1.7B achieves strong performance on both open-source and internal benchmarks, while the 0.6B version offers an accuracy-efficiency trade-off, reaching 2000 times throughput at a concurrency of 128. Both models support unified streaming / offline inference with a single model and can transcribe long audio.
21
+
22
+ * **Novel and strong forced-alignment solution**: We introduce Qwen3-ForcedAligner-0.6B, which supports timestamp prediction for arbitrary units within up to 5 minutes of speech in 11 languages. Evaluations show that its timestamp accuracy surpasses that of E2E-based forced-alignment models.
23
+
24
+ * **Comprehensive inference toolkit**: In addition to open-sourcing the architectures and weights of the Qwen3-ASR series, we also release a powerful, full-featured inference framework that supports vLLM-based batch inference, asynchronous serving, streaming inference, timestamp prediction, and more.
25
+
26
+ ### Model Architecture
27
+
28
+ <p align="center">
29
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/overview.jpg" width="100%"/>
30
+ </p>
31
+
32
+
33
+ ### Released Models Description and Download
34
+
35
+ Below is an introduction and download information for the Qwen3-ASR models. Please select and download the model that fits your needs.
36
+
37
+ | Model | Supported Languages | Supported Dialects | Inference Mode | Audio Types |
38
+ |---|---|---|---|---|
39
+ | Qwen3-ASR-1.7B & Qwen3-ASR-0.6B | Chinese (zh), English (en), Cantonese (yue), Arabic (ar), German (de), French (fr), Spanish (es), Portuguese (pt), Indonesian (id), Italian (it), Korean (ko), Russian (ru), Thai (th), Vietnamese (vi), Japanese (ja), Turkish (tr), Hindi (hi), Malay (ms), Dutch (nl), Swedish (sv), Danish (da), Finnish (fi), Polish (pl), Czech (cs), Filipino (fil), Persian (fa), Greek (el), Hungarian (hu), Macedonian (mk), Romanian (ro) | Anhui, Dongbei, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Sichuan, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong accent), Cantonese (Guangdong accent), Wu language, Minnan language. | Offline / Streaming | Speech, Singing Voice, Songs with BGM |
40
+ | Qwen3-ForcedAligner-0.6B | Chinese, English, Cantonese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish | -- | NAR | Speech |
41
+
42
+ During model loading in the `qwen-asr` package or vLLM, model weights will be downloaded automatically based on the model name. However, if your runtime environment does not allow downloading weights during execution, you can use the following commands to manually download the model weights to a local directory:
43
+
44
+ ```bash
45
+ # Download through ModelScope (recommended for users in Mainland China)
46
+ pip install -U modelscope
47
+ modelscope download --model Qwen/Qwen3-ASR-1.7B --local_dir ./Qwen3-ASR-1.7B
48
+ modelscope download --model Qwen/Qwen3-ASR-0.6B --local_dir ./Qwen3-ASR-0.6B
49
+ modelscope download --model Qwen/Qwen3-ForcedAligner-0.6B --local_dir ./Qwen3-ForcedAligner-0.6B
50
+ # Download through Hugging Face
51
+ pip install -U "huggingface_hub[cli]"
52
+ huggingface-cli download Qwen/Qwen3-ASR-1.7B --local-dir ./Qwen3-ASR-1.7B
53
+ huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir ./Qwen3-ASR-0.6B
54
+ huggingface-cli download Qwen/Qwen3-ForcedAligner-0.6B --local-dir ./Qwen3-ForcedAligner-0.6B
55
+ ```
56
+
57
+
58
+ ## Quickstart
59
+
60
+ ### Environment Setup
61
+
62
+ The easiest way to use Qwen3-ASR is to install the `qwen-asr` Python package from PyPI. This will pull in the required runtime dependencies and allow you to load any released Qwen3-ASR model. If you’d like to simplify environment setup further, you can also use our official [Docker image](#docker). The `qwen-asr` package provides two backends: the transformers backend and the vLLM backend. For usage instructions for different backends, please refer to [Python Package Usage](#python-package-usage). We recommend using a **fresh, isolated environment** to avoid dependency conflicts with existing packages. You can create a clean Python 3.12 environment like this:
63
+
64
+ ```bash
65
+ conda create -n qwen3-asr python=3.12 -y
66
+ conda activate qwen3-asr
67
+ ```
68
+
69
+ Run the following command to get the minimal installation with transformers-backend support:
70
+
71
+ ```bash
72
+ pip install -U qwen-asr
73
+ ```
74
+
75
+ To enable the vLLM backend for faster inference and streaming support, run:
76
+
77
+ ```bash
78
+ pip install -U "qwen-asr[vllm]"
79
+ ```
80
+
81
+ If you want to develop or modify the code locally, install from source in editable mode:
82
+
83
+ ```bash
84
+ git clone https://github.com/QwenLM/Qwen3-ASR.git
85
+ cd Qwen3-ASR
86
+ pip install -e .
87
+ # support vLLM backend
88
+ # pip install -e ".[vllm]"
89
+ ```
90
+
91
+ Additionally, we recommend using FlashAttention 2 to reduce GPU memory usage and accelerate inference speed, especially for long inputs and large batch sizes.
92
+
93
+ ```bash
94
+ pip install -U flash-attn --no-build-isolation
95
+ ```
96
+
97
+ If your machine has less than 96GB of RAM and many CPU cores, limit the number of parallel build jobs:
98
+
99
+ ```bash
100
+ MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
101
+ ```
102
+
103
+ Also, you should have hardware that is compatible with FlashAttention 2. Read more about it in the official documentation of the [FlashAttention repository](https://github.com/Dao-AILab/flash-attention). FlashAttention 2 can only be used when a model is loaded in `torch.float16` or `torch.bfloat16`.
104
+
105
+ ### Python Package Usage
106
+
107
+ #### Quick Inference
108
+
109
+ The `qwen-asr` package provides two backends: **transformers backend** and **vLLM backend**. You can pass audio inputs as a local path, a URL, base64 data, or a `(np.ndarray, sr)` tuple, and run batch inference. To quickly try Qwen3-ASR, you can use `Qwen3ASRModel.from_pretrained(...)` for the transformers backend with the following code:
110
+
111
+ ```python
112
+ import torch
113
+ from qwen_asr import Qwen3ASRModel
114
+
115
+ model = Qwen3ASRModel.from_pretrained(
116
+ "Qwen/Qwen3-ASR-1.7B",
117
+ dtype=torch.bfloat16,
118
+ device_map="cuda:0",
119
+ # attn_implementation="flash_attention_2",
120
+ max_inference_batch_size=32, # Batch size limit for inference. -1 means unlimited. Smaller values can help avoid OOM.
121
+ max_new_tokens=256, # Maximum number of tokens to generate. Set a larger value for long audio input.
122
+ )
123
+
124
+ results = model.transcribe(
125
+ audio="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
126
+ language=None, # set "English" to force the language
127
+ )
128
+
129
+ print(results[0].language)
130
+ print(results[0].text)
131
+ ```
132
+
133
+ If you want to return timestamps, pass `forced_aligner` and its init kwargs. Here is an example of batch inference with timestamp output:
134
+
135
+ ```python
136
+ import torch
137
+ from qwen_asr import Qwen3ASRModel
138
+
139
+ model = Qwen3ASRModel.from_pretrained(
140
+ "Qwen/Qwen3-ASR-1.7B",
141
+ dtype=torch.bfloat16,
142
+ device_map="cuda:0",
143
+ # attn_implementation="flash_attention_2",
144
+ max_inference_batch_size=32, # Batch size limit for inference. -1 means unlimited. Smaller values can help avoid OOM.
145
+ max_new_tokens=256, # Maximum number of tokens to generate. Set a larger value for long audio input.
146
+ forced_aligner="Qwen/Qwen3-ForcedAligner-0.6B",
147
+ forced_aligner_kwargs=dict(
148
+ dtype=torch.bfloat16,
149
+ device_map="cuda:0",
150
+ # attn_implementation="flash_attention_2",
151
+ ),
152
+ )
153
+
154
+ results = model.transcribe(
155
+ audio=[
156
+ "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
157
+ "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
158
+ ],
159
+ language=["Chinese", "English"], # can also be set to None for automatic language detection
160
+ return_time_stamps=True,
161
+ )
162
+
163
+ for r in results:
164
+     print(r.language, r.text, r.time_stamps[0])
165
+ ```
166
+
167
+ For more detailed usage examples, please refer to the [example code](https://github.com/QwenLM/Qwen3-ASR/blob/main/examples/example_qwen3_asr_transformers.py) for the transformers backend.
168
+
169
+ #### vLLM Backend
170
+
171
+ If you want the fastest inference speed with Qwen3-ASR, we strongly recommend using the vLLM backend by initializing the model with `Qwen3ASRModel.LLM(...)`. Example code is provided below. Note that this requires installing the package with the vLLM extra via `pip install -U "qwen-asr[vllm]"`. If you want the model to output timestamps, it’s best to install FlashAttention via `pip install -U flash-attn --no-build-isolation` to speed up inference for the forced aligner model. Remember to wrap your code under `if __name__ == '__main__':` to avoid the `spawn` error described in [vLLM Troubleshooting](https://docs.vllm.ai/en/latest/usage/troubleshooting/#python-multiprocessing).
172
+
173
+ ```python
174
+ import torch
175
+ from qwen_asr import Qwen3ASRModel
176
+
177
+ if __name__ == '__main__':
178
+     model = Qwen3ASRModel.LLM(
179
+         model="Qwen/Qwen3-ASR-1.7B",
180
+         gpu_memory_utilization=0.7,
181
+         max_inference_batch_size=128, # Batch size limit for inference. -1 means unlimited. Smaller values can help avoid OOM.
182
+         max_new_tokens=4096, # Maximum number of tokens to generate. Set a larger value for long audio input.
183
+         forced_aligner="Qwen/Qwen3-ForcedAligner-0.6B",
184
+         forced_aligner_kwargs=dict(
185
+             dtype=torch.bfloat16,
186
+             device_map="cuda:0",
187
+             # attn_implementation="flash_attention_2",
188
+         ),
189
+     )
190
+
191
+     results = model.transcribe(
192
+         audio=[
193
+             "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
194
+             "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
195
+         ],
196
+         language=["Chinese", "English"], # can also be set to None for automatic language detection
197
+         return_time_stamps=True,
198
+     )
199
+
200
+     for r in results:
201
+         print(r.language, r.text, r.time_stamps[0])
202
+ ```
203
+
204
+ For more detailed usage examples, please refer to the [example code](https://github.com/QwenLM/Qwen3-ASR/blob/main/examples/example_qwen3_asr_vllm.py) for the vLLM backend. In addition, you can start a vLLM server via the `qwen-asr-serve` command, which is a wrapper around `vllm serve`. You can pass any arguments supported by `vllm serve`, for example:
205
+
206
+ ```bash
207
+ qwen-asr-serve Qwen/Qwen3-ASR-1.7B --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 8000
208
+ ```
209
+
210
+ And send requests to the server via:
211
+
212
+ ```python
213
+ import requests
214
+
215
+ url = "http://localhost:8000/v1/chat/completions"
216
+ headers = {"Content-Type": "application/json"}
217
+
218
+ data = {
219
+ "messages": [
220
+ {
221
+ "role": "user",
222
+ "content": [
223
+ {
224
+ "type": "audio_url",
225
+ "audio_url": {
226
+ "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
227
+ },
228
+ }
229
+ ],
230
+ }
231
+ ]
232
+ }
233
+
234
+ response = requests.post(url, headers=headers, json=data, timeout=300)
235
+ response.raise_for_status()
236
+ content = response.json()['choices'][0]['message']['content']
237
+ print(content)
238
+
239
+ # parse ASR output if you want
240
+ from qwen_asr import parse_asr_output
241
+ language, text = parse_asr_output(content)
242
+ print(language)
243
+ print(text)
244
+ ```
245
+
246
+ #### Streaming Inference
247
+
248
+ Qwen3-ASR fully supports streaming inference. Currently, streaming inference is only available with the vLLM backend. Note that streaming inference does not support batch inference or returning timestamps. Please refer to the [example code](https://github.com/QwenLM/Qwen3-ASR/blob/main/examples/example_qwen3_asr_vllm_streaming.py) for details. You can also launch a streaming web demo through the [guide](#streaming-demo) to experience Qwen3-ASR’s streaming transcription capabilities.
249
+
250
+ #### ForcedAligner Usage
251
+
252
+ `Qwen3-ForcedAligner-0.6B` can align text–speech pairs and return word- or character-level timestamps. Here is an example of using the forced aligner directly:
253
+
254
+ ```python
255
+ import torch
256
+ from qwen_asr import Qwen3ForcedAligner
257
+
258
+ model = Qwen3ForcedAligner.from_pretrained(
259
+ "Qwen/Qwen3-ForcedAligner-0.6B",
260
+ dtype=torch.bfloat16,
261
+ device_map="cuda:0",
262
+ # attn_implementation="flash_attention_2",
263
+ )
264
+
265
+ results = model.align(
266
+ audio="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
267
+ text="甚至出现交易几乎停滞的情况。",
268
+ language="Chinese",
269
+ )
270
+
271
+ print(results[0])
272
+ print(results[0][0].text, results[0][0].start_time, results[0][0].end_time)
273
+ ```
274
+
275
+ In addition, the forced aligner supports local paths / URLs / base64 data / `(np.ndarray, sr)` inputs and batch inference. Please refer to the [example code](https://github.com/QwenLM/Qwen3-ASR/blob/main/examples/example_qwen3_forced_aligner.py) for details.
276
+
277
+ ### DashScope API Usage
278
+
279
+ To further explore Qwen3-ASR, we encourage you to try our DashScope API for a faster and more efficient experience. For detailed API information and documentation, please refer to the following:
280
+
281
+ | API Description | API Documentation (Mainland China) | API Documentation (International) |
282
+ |------------------|-----------------------------------|------------------------------------|
283
+ | Real-time API for Qwen3-ASR. | [https://help.aliyun.com/zh/model-studio/qwen-real-time-speech-recognition](https://help.aliyun.com/zh/model-studio/qwen-real-time-speech-recognition) | [https://www.alibabacloud.com/help/en/model-studio/qwen-real-time-speech-recognition](https://www.alibabacloud.com/help/en/model-studio/qwen-real-time-speech-recognition) |
284
+ | FileTrans API for Qwen3-ASR. | [https://help.aliyun.com/zh/model-studio/qwen-speech-recognition](https://help.aliyun.com/zh/model-studio/qwen-speech-recognition) | [https://www.alibabacloud.com/help/en/model-studio/qwen-speech-recognition](https://www.alibabacloud.com/help/en/model-studio/qwen-speech-recognition) |
285
+
286
+
287
+ ## Launch Local Web UI Demo
288
+
289
+ ### Gradio Demo
290
+
291
+ To launch the Qwen3-ASR Gradio web UI demo, install the `qwen-asr` package and run `qwen-asr-demo`. Use the command below for help:
292
+
293
+ ```bash
294
+ qwen-asr-demo --help
295
+ ```
296
+
297
+ To launch the demo, you can use the following commands:
298
+
299
+ ```bash
300
+ # Transformers backend
301
+ qwen-asr-demo \
302
+ --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
303
+ --backend transformers \
304
+ --cuda-visible-devices 0 \
305
+ --ip 0.0.0.0 --port 8000
306
+
307
+ # Transformers backend + Forced Aligner (enable timestamps)
308
+ qwen-asr-demo \
309
+ --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
310
+ --aligner-checkpoint Qwen/Qwen3-ForcedAligner-0.6B \
311
+ --backend transformers \
312
+ --cuda-visible-devices 0 \
313
+ --backend-kwargs '{"device_map":"cuda:0","dtype":"bfloat16","max_inference_batch_size":8,"max_new_tokens":256}' \
314
+ --aligner-kwargs '{"device_map":"cuda:0","dtype":"bfloat16"}' \
315
+ --ip 0.0.0.0 --port 8000
316
+
317
+ # vLLM backend + Forced Aligner (enable timestamps)
318
+ qwen-asr-demo \
319
+ --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
320
+ --aligner-checkpoint Qwen/Qwen3-ForcedAligner-0.6B \
321
+ --backend vllm \
322
+ --cuda-visible-devices 0 \
323
+ --backend-kwargs '{"gpu_memory_utilization":0.7,"max_inference_batch_size":8,"max_new_tokens":2048}' \
324
+ --aligner-kwargs '{"device_map":"cuda:0","dtype":"bfloat16"}' \
325
+ --ip 0.0.0.0 --port 8000
326
+ ```
327
+
328
+ Then open `http://<your-ip>:8000`, or access it via port forwarding in tools like VS Code.
329
+
330
+ #### Backend Notes
331
+
332
+ This demo supports two backends: transformers and vLLM. All backend-specific initialization parameters should be passed via `--backend-kwargs` as a JSON dict. If not provided, the demo will use sensible defaults.
333
+
334
+ ```bash
335
+ # Example: override transformers init args without flash attention
336
+ --backend-kwargs '{"device_map":"cuda:0","dtype":"bfloat16"}'
337
+
338
+ # Example: override vLLM init args with 65% GPU memory
339
+ --backend-kwargs '{"gpu_memory_utilization":0.65}'
340
+ ```
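
Hand-writing the JSON for `--backend-kwargs` is easy to get wrong once quoting is involved. The following is a minimal sketch, assuming you launch the demo from a Python script; the helper name `backend_kwargs_flag` is hypothetical and not part of the `qwen-asr` package. It serializes a dict with `json.dumps` and shell-quotes it with `shlex.quote`:

```python
import json
import shlex


def backend_kwargs_flag(kwargs: dict) -> str:
    """Return a shell-safe `--backend-kwargs <json>` fragment (hypothetical helper)."""
    # Compact separators keep the JSON free of spaces, as in the examples above.
    payload = json.dumps(kwargs, separators=(",", ":"))
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.
    return f"--backend-kwargs {shlex.quote(payload)}"


flag = backend_kwargs_flag({"gpu_memory_utilization": 0.65})
print(flag)
```

The printed fragment can be appended to a `qwen-asr-demo` command line. Building the dict in Python also lets you validate values before the demo starts.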
341
+
342
+ #### CUDA Device Notes
343
+
344
+ Because vLLM does not follow `cuda:0` style device selection, this demo selects GPUs by setting `CUDA_VISIBLE_DEVICES` via `--cuda-visible-devices`.
345
+
346
+ ```bash
347
+ # Use GPU 0
348
+ --cuda-visible-devices 0
349
+
350
+ # Use GPU 1
351
+ --cuda-visible-devices 1
352
+ ```
353
+
354
+ #### Timestamps Notes
355
+
356
+ Timestamps are only available when `--aligner-checkpoint` is provided. If you launch the demo without a forced aligner, the timestamps UI will be hidden automatically.
357
+
358
+ ```bash
359
+ # No forced aligner
360
+ qwen-asr-demo --asr-checkpoint Qwen/Qwen3-ASR-1.7B
361
+
362
+ # With forced aligner
363
+ qwen-asr-demo \
364
+ --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
365
+ --aligner-checkpoint Qwen/Qwen3-ForcedAligner-0.6B
366
+ ```
367
+
368
+ #### HTTPS Notes
369
+
370
+ To avoid browser microphone permission issues after deploying the server, we recommend running the Gradio service over HTTPS; this is often required when the demo is accessed remotely or behind modern browsers/gateways. Use `--ssl-certfile` and `--ssl-keyfile` to enable HTTPS. First, generate a private key and a self-signed certificate (valid for 365 days):
371
+
372
+ ```bash
373
+ openssl req -x509 -newkey rsa:2048 \
374
+ -keyout key.pem -out cert.pem \
375
+ -days 365 -nodes \
376
+ -subj "/CN=localhost"
377
+ ```
378
+
379
+ Then run the demo with HTTPS:
380
+
381
+ ```bash
382
+ qwen-asr-demo \
383
+ --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
384
+ --backend transformers \
385
+ --cuda-visible-devices 0 \
386
+ --ip 0.0.0.0 --port 8000 \
387
+ --ssl-certfile cert.pem \
388
+ --ssl-keyfile key.pem \
389
+ --no-ssl-verify
390
+ ```
391
+
392
+ Then open `https://<your-ip>:8000` to use it. If your browser shows a warning, that’s expected for self-signed certificates. For production, use a real certificate.
393
+
394
+ ### Streaming Demo
395
+
396
+ To experience Qwen3-ASR’s streaming transcription capability in a web UI, we provide a minimal Flask-based streaming demo. The demo captures microphone audio in the browser, resamples it to 16,000 Hz, and continuously pushes PCM chunks to the model. Run the demo with the following command:
397
+
398
+ ```bash
399
+ qwen-asr-demo-streaming \
400
+ --asr-model-path Qwen/Qwen3-ASR-1.7B \
401
+ --host 0.0.0.0 \
402
+ --port 8000 \
403
+ --gpu-memory-utilization 0.9
404
+ ```
405
+
406
+ Then open `http://<your-ip>:8000`, or access it via port forwarding in tools like VS Code.
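
The "push PCM chunks" step described above can be illustrated without a browser. This is a rough sketch only: the 16-bit little-endian sample format, the chunk duration, and the helper names `float_to_pcm16` and `iter_pcm_chunks` are assumptions made for illustration, and the actual demo may use a different sample format or chunk size.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # Hz, the rate the streaming demo resamples to


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM (assumed format)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 500,
                    sample_rate: int = SAMPLE_RATE) -> Iterator[bytes]:
    """Yield consecutive chunks of `chunk_ms` milliseconds; the last chunk may be shorter."""
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]


# One second of silence split into 400 ms chunks.
one_second = float_to_pcm16([0.0] * SAMPLE_RATE)
chunks = list(iter_pcm_chunks(one_second, chunk_ms=400))
print([len(c) for c in chunks])
```

Shorter chunks reduce latency at the cost of more requests. The value that works best depends on the server and is not specified in this README.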
407
+
408
+ ## Deployment with vLLM
409
+
410
+ vLLM officially provides day-0 model support for Qwen3-ASR for efficient inference.
411
+
412
+ ### Installation
413
+ You can run Qwen3-ASR with the vLLM nightly wheel or Docker image. To install the nightly version of vLLM, we recommend using `uv` as the environment manager:
414
+ ```bash
415
+ uv venv
416
+ source .venv/bin/activate
417
+ uv pip install -U vllm --pre \
418
+ --extra-index-url https://wheels.vllm.ai/nightly/cu129 \
419
+ --extra-index-url https://download.pytorch.org/whl/cu129 \
420
+ --index-strategy unsafe-best-match
421
+ uv pip install "vllm[audio]" # For additional audio dependencies
422
+ ```
423
+
424
+ ### Online Serving
425
+ You can deploy Qwen3-ASR with vLLM by running the following command:
426
+ ```bash
427
+ vllm serve Qwen/Qwen3-ASR-1.7B
428
+ ```
429
+ After the model server is successfully deployed, you can interact with it in multiple ways.
430
+
431
+ #### Using OpenAI SDK
432
+ ```python
433
+ import base64
434
+ import httpx
435
+ from openai import OpenAI
436
+
437
+ # Initialize client
438
+ client = OpenAI(
439
+ base_url="http://localhost:8000/v1",
440
+ api_key="EMPTY"
441
+ )
442
+
443
+ # Create multimodal chat completion request
444
+ response = client.chat.completions.create(
445
+ model="Qwen/Qwen3-ASR-1.7B",
446
+ messages=[
447
+ {
448
+ "role": "user",
449
+ "content": [
450
+ {
451
+ "type": "audio_url",
452
+ "audio_url": {
453
+ "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
454
+ }
455
+ }
456
+ ]
457
+ }
458
+ ],
459
+ )
460
+
461
+ print(response.choices[0].message.content)
462
+ ```
463
+ vLLM also serves this model through the OpenAI-compatible transcription API:
464
+ ```python
465
+ import httpx
466
+ from openai import OpenAI
467
+
468
+ # Initialize client
469
+ client = OpenAI(
470
+ base_url="http://localhost:8000/v1",
471
+ api_key="EMPTY"
472
+ )
473
+ audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
474
+ audio_file = httpx.get(audio_url).content
475
+
476
+ transcription = client.audio.transcriptions.create(
477
+ model="Qwen/Qwen3-ASR-1.7B",
478
+ file=audio_file,
479
+ )
480
+
481
+ print(transcription.text)
482
+ ```
483
+
484
+ #### Using cURL
485
+ ```bash
486
+ curl http://localhost:8000/v1/chat/completions \
487
+ -H "Content-Type: application/json" \
488
+ -d '{
489
+ "messages": [
490
+ {"role": "user", "content": [
491
+ {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"}}
492
+ ]}
493
+ ]
494
+ }'
495
+ ```
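
The requests above all reference a remote audio URL. For a local file, one option is to embed the audio as a base64 data URL in the same `audio_url` field. The sketch below only builds the request body. Whether your server version accepts `data:` URLs for audio is an assumption to check against the vLLM documentation, and the helper names `audio_data_url` and `build_chat_payload` are hypothetical.

```python
import base64
import json


def audio_data_url(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URL."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_chat_payload(audio_bytes: bytes, mime_type: str = "audio/wav") -> dict:
    """Build a chat-completions body with the same shape as the examples above."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "audio_url",
                        "audio_url": {"url": audio_data_url(audio_bytes, mime_type)},
                    }
                ],
            }
        ]
    }


# Placeholder bytes; in practice, read them from a local audio file.
payload = build_chat_payload(b"RIFF-placeholder-audio")
print(json.dumps(payload)[:60])
```

The resulting dict can be passed as `json=payload` to `requests.post`, as in the earlier `qwen-asr-serve` example.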
496
+
497
+ ### Offline Inference
498
+ The following example uses vLLM to run offline inference with Qwen3-ASR:
499
+ ```python
500
+ from vllm import LLM, SamplingParams
501
+ from vllm.assets.audio import AudioAsset
502
+ import base64
503
+ import requests
504
+
505
+ # Initialize the LLM
506
+ llm = LLM(
507
+ model="Qwen/Qwen3-ASR-1.7B"
508
+ )
509
+
510
+ # Load audio
511
+ audio_asset = AudioAsset("winning_call")
512
+
513
+ # Create conversation with audio content
514
+ conversation = [
515
+ {
516
+ "role": "user",
517
+ "content": [
518
+ {
519
+ "type": "audio_url",
520
+ "audio_url": {"url": audio_asset.url}
521
+ }
522
+ ]
523
+ }
524
+ ]
525
+
526
+ sampling_params = SamplingParams(temperature=0.01, max_tokens=256)
527
+
528
+ # Run inference using .chat()
529
+ outputs = llm.chat(conversation, sampling_params=sampling_params)
530
+ print(outputs[0].outputs[0].text)
531
+ ```
532
+
533
+
534
+ ## Docker
535
+
536
+ To make it easier to use our `qwen-asr` Python package, we provide a pre-built Docker image: [qwenllm/qwen3-asr](https://hub.docker.com/r/qwenllm/qwen3-asr). You only need to install the GPU driver and download the model files to run the code. Please follow the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) to ensure Docker can access your GPU. If you are in Mainland China and have trouble reaching Docker Hub, you may use a registry mirror to accelerate image pulls.
537
+
538
+ First, pull the image and start a container:
539
+
540
+ ```bash
541
+ LOCAL_WORKDIR=/path/to/your/workspace
542
+ HOST_PORT=8000
543
+ CONTAINER_PORT=80
544
+ docker run --gpus all --name qwen3-asr \
545
+ -v /var/run/docker.sock:/var/run/docker.sock -p $HOST_PORT:$CONTAINER_PORT \
546
+ --mount type=bind,source=$LOCAL_WORKDIR,target=/data/shared/Qwen3-ASR \
547
+ --shm-size=4gb \
548
+ -it qwenllm/qwen3-asr:latest
549
+ ```
550
+
551
+ After running the command, you will enter the container’s bash shell. Your local workspace (**replace** `/path/to/your/workspace` **with the actual path**) will be mounted inside the container at `/data/shared/Qwen3-ASR`. Port `8000` on the host is mapped to port `80` in the container, so you can access services running in the container via `http://<host-ip>:8000`. Note that services inside the container must bind to `0.0.0.0` (not `127.0.0.1`) for port forwarding to work.
552
+
553
+ If you exit the container, you can start it again and re-enter it with:
554
+
555
+ ```bash
556
+ docker start qwen3-asr
557
+ docker exec -it qwen3-asr bash
558
+ ```
559
+
560
+ To remove the container completely, run:
561
+
562
+ ```bash
563
+ docker rm -f qwen3-asr
564
+ ```
565
+
566
+
567
+ ## Evaluation
568
+
569
+ During evaluation, we ran inference for all models with `dtype=torch.bfloat16` and set `max_new_tokens=1024` using vLLM. Greedy search was used for all decoding, and none of the tests specified a language parameter. The detailed evaluation results are shown below.
570
+
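
For readers unfamiliar with the metric in the tables below, here is a minimal sketch of the standard word error rate (WER) definition: word-level edit distance divided by the number of reference words. It is not the scoring script used for these results. Text normalization and the choice between word-level and character-level scoring are not described here, so treat both as open assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

WER is commonly reported as a percentage, so the returned fraction would be multiplied by 100 for comparison with tables like these.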
571
+ <details>
572
+ <summary>ASR Benchmarks on Public Datasets (WER ↓)</summary>
573
+
574
+ <table>
575
+ <thead>
576
+ <tr>
577
+ <th colspan="2" style="text-align: left;"></th>
578
+ <th style="text-align: center;">GPT-4o<br>-Transcribe</th>
579
+ <th style="text-align: center;">Gemini-2.5<br>-Pro</th>
580
+ <th style="text-align: center;">Doubao-ASR</th>
581
+ <th style="text-align: center;">Whisper<br>-large-v3</th>
582
+ <th style="text-align: center;">Fun-ASR<br>-MLT-Nano</th>
583
+ <th style="text-align: center;">Qwen3-ASR<br>-0.6B</th>
584
+ <th style="text-align: center;">Qwen3-ASR<br>-1.7B</th>
585
+ </tr>
586
+ </thead>
587
+ <tbody>
588
+ <tr>
589
+ <td colspan="9" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">English (en)</td>
590
+ </tr>
591
+ <tr>
592
+ <td colspan="2" style="text-align: left;">Librispeech<br>clean | other</td>
593
+ <td style="text-align: center;"><strong>1.39</strong> | 3.75</td>
594
+ <td style="text-align: center;">2.89 | 3.56</td>
595
+ <td style="text-align: center;">2.78 | 5.70</td>
596
+ <td style="text-align: center;">1.51 | 3.97</td>
597
+ <td style="text-align: center;">1.68 | 4.03</td>
598
+ <td style="text-align: center;">2.11 | 4.55</td>
599
+ <td style="text-align: center;">1.63 | <strong>3.38</strong></td>
600
+ </tr>
601
+ <tr>
602
+ <td colspan="2" style="text-align: left;">GigaSpeech</td>
603
+ <td style="text-align: center;">25.50</td>
604
+ <td style="text-align: center;">9.37</td>
605
+ <td style="text-align: center;">9.55</td>
606
+ <td style="text-align: center;">9.76</td>
607
+ <td style="text-align: center;">-</td>
608
+ <td style="text-align: center;">8.88</td>
609
+ <td style="text-align: center;"><strong>8.45</strong></td>
610
+ </tr>
611
+ <tr>
612
+ <td colspan="2" style="text-align: left;">CV-en</td>
613
+ <td style="text-align: center;">9.08</td>
614
+ <td style="text-align: center;">14.49</td>
615
+ <td style="text-align: center;">13.78</td>
616
+ <td style="text-align: center;">9.90</td>
617
+ <td style="text-align: center;">9.90</td>
618
+ <td style="text-align: center;">9.92</td>
619
+ <td style="text-align: center;"><strong>7.39</strong></td>
620
+ </tr>
621
+ <tr>
622
+ <td colspan="2" style="text-align: left;">Fleurs-en</td>
623
+ <td style="text-align: center;"><strong>2.40</strong></td>
624
+ <td style="text-align: center;">2.94</td>
625
+ <td style="text-align: center;">6.31</td>
626
+ <td style="text-align: center;">4.08</td>
627
+ <td style="text-align: center;">5.49</td>
628
+ <td style="text-align: center;">4.39</td>
629
+ <td style="text-align: center;">3.35</td>
630
+ </tr>
631
+ <tr>
632
+ <td colspan="2" style="text-align: left;">MLS-en</td>
633
+ <td style="text-align: center;">5.12</td>
634
+ <td style="text-align: center;"><strong>3.68</strong></td>
635
+ <td style="text-align: center;">7.09</td>
636
+ <td style="text-align: center;">4.87</td>
637
+ <td style="text-align: center;">-</td>
638
+ <td style="text-align: center;">6.00</td>
639
+ <td style="text-align: center;">4.58</td>
640
+ </tr>
641
+ <tr>
642
+ <td colspan="2" style="text-align: left;">Tedlium</td>
643
+ <td style="text-align: center;">7.69</td>
644
+ <td style="text-align: center;">6.15</td>
645
+ <td style="text-align: center;">4.91</td>
646
+ <td style="text-align: center;">6.84</td>
647
+ <td style="text-align: center;">-</td>
648
+ <td style="text-align: center;"><strong>3.85</strong></td>
649
+ <td style="text-align: center;"><strong>4.50</strong></td>
650
+ </tr>
651
+ <tr>
652
+ <td colspan="2" style="text-align: left;">VoxPopuli</td>
653
+ <td style="text-align: center;">10.29</td>
654
+ <td style="text-align: center;">11.36</td>
655
+ <td style="text-align: center;">12.12</td>
656
+ <td style="text-align: center;">12.05</td>
657
+ <td style="text-align: center;">-</td>
658
+ <td style="text-align: center;"><strong>9.96</strong></td>
659
+ <td style="text-align: center;"><strong>9.15</strong></td>
660
+ </tr>
661
+ <tr>
662
+ <td colspan="9" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Chinese (zh)</td>
663
+ </tr>
664
+ <tr>
665
+ <td colspan="2" style="text-align: left;">WenetSpeech<br>net | meeting</td>
666
+ <td style="text-align: center;">15.30 | 32.27</td>
667
+ <td style="text-align: center;">14.43 | 13.47</td>
668
+ <td style="text-align: center;">N/A</td>
669
+ <td style="text-align: center;">9.86 | 19.11</td>
670
+ <td style="text-align: center;">6.35 | -</td>
671
+ <td style="text-align: center;">5.97 | 6.88</td>
672
+ <td style="text-align: center;"><strong>4.97</strong> | <strong>5.88</strong></td>
673
+ </tr>
674
+ <tr>
675
+ <td colspan="2" style="text-align: left;">AISHELL-2-test</td>
676
+ <td style="text-align: center;">4.24</td>
677
+ <td style="text-align: center;">11.62</td>
678
+ <td style="text-align: center;">2.85</td>
679
+ <td style="text-align: center;">5.06</td>
680
+ <td style="text-align: center;">-</td>
681
+ <td style="text-align: center;">3.15</td>
682
+ <td style="text-align: center;"><strong>2.71</strong></td>
683
+ </tr>
684
+ <tr>
685
+ <td colspan="2" style="text-align: left;">SpeechIO</td>
686
+ <td style="text-align: center;">12.86</td>
687
+ <td style="text-align: center;">5.30</td>
688
+ <td style="text-align: center;">2.93</td>
689
+ <td style="text-align: center;">7.56</td>
690
+ <td style="text-align: center;">-</td>
691
+ <td style="text-align: center;">3.44</td>
692
+ <td style="text-align: center;"><strong>2.88</strong></td>
693
+ </tr>
694
+ <tr>
695
+ <td colspan="2" style="text-align: left;">Fleurs-zh</td>
696
+ <td style="text-align: center;">2.44</td>
697
+ <td style="text-align: center;">2.71</td>
698
+ <td style="text-align: center;">2.69</td>
699
+ <td style="text-align: center;">4.09</td>
700
+ <td style="text-align: center;">3.51</td>
701
+ <td style="text-align: center;">2.88</td>
702
+ <td style="text-align: center;"><strong>2.41</strong></td>
703
+ </tr>
704
+ <tr>
705
+ <td colspan="2" style="text-align: left;">CV-zh</td>
706
+ <td style="text-align: center;">6.32</td>
707
+ <td style="text-align: center;">7.70</td>
708
+ <td style="text-align: center;">5.95</td>
709
+ <td style="text-align: center;">12.91</td>
710
+ <td style="text-align: center;">6.20</td>
711
+ <td style="text-align: center;">6.89</td>
712
+ <td style="text-align: center;"><strong>5.35</strong></td>
713
+ </tr>
714
+ <tr>
715
+ <td colspan="9" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Chinese Dialect</td>
716
+ </tr>
717
+ <tr>
718
+ <td colspan="2" style="text-align: left;">KeSpeech</td>
719
+ <td style="text-align: center;">26.87</td>
720
+ <td style="text-align: center;">24.71</td>
721
+ <td style="text-align: center;">5.27</td>
722
+ <td style="text-align: center;">28.79</td>
723
+ <td style="text-align: center;">-</td>
724
+ <td style="text-align: center;">7.08</td>
725
+ <td style="text-align: center;"><strong>5.10</strong></td>
726
+ </tr>
727
+ <tr>
728
+ <td colspan="2" style="text-align: left;">Fleurs-yue</td>
729
+ <td style="text-align: center;">4.98</td>
730
+ <td style="text-align: center;">9.43</td>
731
+ <td style="text-align: center;">4.98</td>
732
+ <td style="text-align: center;">9.18</td>
733
+ <td style="text-align: center;">-</td>
734
+ <td style="text-align: center;">5.79</td>
735
+ <td style="text-align: center;"><strong>3.98</strong></td>
736
+ </tr>
737
+ <tr>
738
+ <td colspan="2" style="text-align: left;">CV-yue</td>
739
+ <td style="text-align: center;">11.36</td>
740
+ <td style="text-align: center;">18.76</td>
741
+ <td style="text-align: center;">13.20</td>
742
+ <td style="text-align: center;">16.23</td>
743
+ <td style="text-align: center;">-</td>
744
+ <td style="text-align: center;">9.50</td>
745
+ <td style="text-align: center;"><strong>7.57</strong></td>
746
+ </tr>
747
+ <tr>
748
+ <td colspan="2" style="text-align: left;">CV-zh-tw</td>
749
+ <td style="text-align: center;">6.32</td>
750
+ <td style="text-align: center;">7.31</td>
751
+ <td style="text-align: center;">4.06</td>
752
+ <td style="text-align: center;">7.84</td>
753
+ <td style="text-align: center;">-</td>
754
+ <td style="text-align: center;">5.59</td>
755
+ <td style="text-align: center;"><strong>3.77</strong></td>
756
+ </tr>
757
+ <tr>
758
+ <td colspan="2" style="text-align: left;">WenetSpeech-Yue<br>short | long</td>
759
+ <td style="text-align: center;">15.62 | 25.29</td>
760
+ <td style="text-align: center;">25.19 | 11.23</td>
761
+ <td style="text-align: center;">9.74 | 11.40</td>
762
+ <td style="text-align: center;">32.26 | 46.64</td>
763
+ <td style="text-align: center;">- | -</td>
764
+ <td style="text-align: center;">7.54 | 9.92</td>
765
+ <td style="text-align: center;"><strong>5.82</strong> | <strong>8.85</strong></td>
766
+ </tr>
767
+ <tr>
768
+ <td colspan="2" style="text-align: left;">WenetSpeech-Chuan<br>easy | hard</td>
769
+ <td style="text-align: center;">34.81 | 53.98</td>
770
+ <td style="text-align: center;">43.79 | 67.30</td>
771
+ <td style="text-align: center;"><strong>11.40</strong> | <strong>20.20</strong></td>
772
+ <td style="text-align: center;">14.35 | 26.80</td>
773
+ <td style="text-align: center;">- | -</td>
774
+ <td style="text-align: center;">13.92 | 24.45</td>
775
+ <td style="text-align: center;">11.99 | 21.63</td>
776
+ </tr>
777
+ </tbody>
778
+ </table>
779
+
780
+ </details>
781
+
782
+ <details>
783
+ <summary>ASR Benchmarks on Internal Datasets (WER ↓)</summary>
784
+
785
+ <table>
786
+ <thead>
787
+ <tr>
788
+ <th style="text-align: left;"></th>
789
+ <th style="text-align: center;">GPT-4o<br>-Transcribe</th>
790
+ <th style="text-align: center;">Gemini-2.5<br>-Pro</th>
791
+ <th style="text-align: center;">Doubao-ASR</th>
792
+ <th style="text-align: center;">Whisper<br>-large-v3</th>
793
+ <th style="text-align: center;">Fun-ASR<br>-MLT-Nano</th>
794
+ <th style="text-align: center;">Qwen3-ASR<br>-0.6B</th>
795
+ <th style="text-align: center;">Qwen3-ASR<br>-1.7B</th>
796
+ </tr>
797
+ </thead>
798
+ <tbody>
799
+ <tr>
800
+ <td colspan="8" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Accented English</td>
801
+ </tr>
802
+ <tr>
803
+ <td style="text-align: left;">Dialog-Accented English</td>
804
+ <td style="text-align: center;">28.56</td>
805
+ <td style="text-align: center;">23.85</td>
806
+ <td style="text-align: center;">20.41</td>
807
+ <td style="text-align: center;">21.30</td>
808
+ <td style="text-align: center;">19.96</td>
809
+ <td style="text-align: center;"><strong>16.62</strong></td>
810
+ <td style="text-align: center;"><strong>16.07</strong></td>
811
+ </tr>
812
+ <tr>
813
+ <td colspan="8" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Chinese Mandarin</td>
814
+ </tr>
815
+ <tr>
816
+ <td style="text-align: left;">Elders&Kids</td>
817
+ <td style="text-align: center;">14.27</td>
818
+ <td style="text-align: center;">36.93</td>
819
+ <td style="text-align: center;">4.17</td>
820
+ <td style="text-align: center;">10.61</td>
821
+ <td style="text-align: center;">4.54</td>
822
+ <td style="text-align: center;">4.48</td>
823
+ <td style="text-align: center;"><strong>3.81</strong></td>
824
+ </tr>
825
+ <tr>
826
+ <td style="text-align: left;">ExtremeNoise</td>
827
+ <td style="text-align: center;">36.11</td>
828
+ <td style="text-align: center;">29.06</td>
829
+ <td style="text-align: center;">17.04</td>
830
+ <td style="text-align: center;">63.17</td>
831
+ <td style="text-align: center;">36.55</td>
832
+ <td style="text-align: center;">17.88</td>
833
+ <td style="text-align: center;"><strong>16.17</strong></td>
834
+ </tr>
835
+ <tr>
836
+ <td style="text-align: left;">TongueTwister</td>
837
+ <td style="text-align: center;">20.87</td>
838
+ <td style="text-align: center;">4.97</td>
839
+ <td style="text-align: center;">3.47</td>
840
+ <td style="text-align: center;">16.63</td>
841
+ <td style="text-align: center;">9.02</td>
842
+ <td style="text-align: center;">4.06</td>
843
+ <td style="text-align: center;"><strong>2.44</strong></td>
844
+ </tr>
845
+ <tr>
846
+ <td style="text-align: left;">Dialog-Mandarin</td>
847
+ <td style="text-align: center;">20.73</td>
848
+ <td style="text-align: center;">12.50</td>
849
+ <td style="text-align: center;">6.61</td>
850
+ <td style="text-align: center;">14.01</td>
851
+ <td style="text-align: center;">7.32</td>
852
+ <td style="text-align: center;">7.06</td>
853
+ <td style="text-align: center;"><strong>6.54</strong></td>
854
+ </tr>
855
+ <tr>
856
+ <td colspan="8" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Chinese Dialect</td>
857
+ </tr>
858
+ <tr>
859
+ <td style="text-align: left;">Dialog-Cantonese</td>
860
+ <td style="text-align: center;">16.05</td>
861
+ <td style="text-align: center;">14.98</td>
862
+ <td style="text-align: center;">7.56</td>
863
+ <td style="text-align: center;">31.04</td>
864
+ <td style="text-align: center;">5.85</td>
865
+ <td style="text-align: center;"><strong>4.80</strong></td>
866
+ <td style="text-align: center;"><strong>4.12</strong></td>
867
+ </tr>
868
+ <tr>
869
+ <td style="text-align: left;">Dialog-Chinese Dialects</td>
870
+ <td style="text-align: center;">45.37</td>
871
+ <td style="text-align: center;">47.70</td>
872
+ <td style="text-align: center;">19.85</td>
873
+ <td style="text-align: center;">44.55</td>
874
+ <td style="text-align: center;">19.41</td>
875
+ <td style="text-align: center;"><strong>18.24</strong></td>
876
+ <td style="text-align: center;"><strong>15.94</strong></td>
877
+ </tr>
878
+ </tbody>
879
+ </table>
880
+ <p><strong>Dialect coverage:</strong> Results for <em>Dialog-Accented English</em> are averaged over 16 accents, and results for <em>Dialog-Chinese Dialects</em> are averaged over 22 Chinese dialects.</p>
881
+
882
+ </details>
883
+
884
+ <details>
885
+ <summary>Multilingual ASR Benchmarks (WER ↓)</summary>
886
+
887
+ <table>
888
+ <thead>
889
+ <tr>
890
+ <th style="text-align: left;"></th>
891
+ <th style="text-align: center;">GLM-ASR<br>-Nano-2512</th>
892
+ <th style="text-align: center;">Whisper<br>-large-v3</th>
893
+ <th style="text-align: center;">Fun-ASR<br>-MLT-Nano</th>
894
+ <th style="text-align: center;">Qwen3-ASR<br>-0.6B</th>
895
+ <th style="text-align: center;">Qwen3-ASR<br>-1.7B</th>
896
+ </tr>
897
+ </thead>
898
+ <tbody>
899
+ <tr>
900
+ <td colspan="6" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Open-sourced Benchmarks</td>
901
+ </tr>
902
+ <tr>
903
+ <td style="text-align: left;">MLS</td>
904
+ <td style="text-align: center;">13.32</td>
905
+ <td style="text-align: center;">8.62</td>
906
+ <td style="text-align: center;">28.70</td>
907
+ <td style="text-align: center;">13.19</td>
908
+ <td style="text-align: center;"><strong>8.55</strong></td>
909
+ </tr>
910
+ <tr>
911
+ <td style="text-align: left;">CommonVoice</td>
912
+ <td style="text-align: center;">19.40</td>
913
+ <td style="text-align: center;">10.77</td>
914
+ <td style="text-align: center;">17.25</td>
915
+ <td style="text-align: center;">12.75</td>
916
+ <td style="text-align: center;"><strong>9.18</strong></td>
917
+ </tr>
918
+ <tr>
919
+ <td style="text-align: left;">MLC-SLM</td>
920
+ <td style="text-align: center;">34.93</td>
921
+ <td style="text-align: center;">15.68</td>
922
+ <td style="text-align: center;">29.94</td>
923
+ <td style="text-align: center;">15.84</td>
924
+ <td style="text-align: center;"><strong>12.74</strong></td>
925
+ </tr>
926
+ <tr>
927
+ <td style="text-align: left;">Fleurs</td>
928
+ <td style="text-align: center;">16.08</td>
929
+ <td style="text-align: center;">5.27</td>
930
+ <td style="text-align: center;">10.03</td>
931
+ <td style="text-align: center;">7.57</td>
932
+ <td style="text-align: center;"><strong>4.90</strong></td>
933
+ </tr>
934
+ <tr>
935
+ <td style="text-align: left;">Fleurs<sup>†</sup></td>
936
+ <td style="text-align: center;">20.05</td>
937
+ <td style="text-align: center;">6.85</td>
938
+ <td style="text-align: center;">31.89</td>
939
+ <td style="text-align: center;">10.37</td>
940
+ <td style="text-align: center;"><strong>6.62</strong></td>
941
+ </tr>
942
+ <tr>
943
+ <td style="text-align: left;">Fleurs<sup>††</sup></td>
944
+ <td style="text-align: center;">24.83</td>
945
+ <td style="text-align: center;"><strong>8.16</strong></td>
946
+ <td style="text-align: center;">47.84</td>
947
+ <td style="text-align: center;">21.80</td>
948
+ <td style="text-align: center;">12.60</td>
949
+ </tr>
950
+ <tr>
951
+ <td colspan="6" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Qwen-ASR Internal Benchmarks</td>
952
+ </tr>
953
+ <tr>
954
+ <td style="text-align: left;">News-Multilingual</td>
955
+ <td style="text-align: center;">49.40</td>
956
+ <td style="text-align: center;">14.80</td>
957
+ <td style="text-align: center;">65.07</td>
958
+ <td style="text-align: center;">17.39</td>
959
+ <td style="text-align: center;"><strong>12.80</strong></td>
960
+ </tr>
961
+ </tbody>
962
+ </table>
963
+ <p><strong>Language coverage:</strong> <em>MLS</em> includes 8 languages: {da, de, en, es, fr, it, pl, pt}.<br><em>CommonVoice</em> includes 13 languages: {en, zh, yue, zh_TW, ar, de, es, fr, it, ja, ko, pt, ru}.<br><em>MLC-SLM</em> includes 11 languages: {en, fr, de, it, pt, es, ja, ko, ru, th, vi}.<br><em>Fleurs</em> includes 12 languages: {en, zh, yue, ar, de, es, fr, it, ja, ko, pt, ru}.<br><em>Fleurs<sup>†</sup></em> includes 8 additional languages beyond Fleurs: {hi, id, ms, nl, pl, th, tr, vi}.<br><em>Fleurs<sup>††</sup></em> includes 10 additional languages beyond Fleurs<sup>†</sup>: {cs, da, el, fa, fi, fil, hu, mk, ro, sv}.<br><em>News-Multilingual</em> includes 15 languages: {ar, de, es, fr, hi, id, it, ja, ko, nl, pl, pt, ru, th, vi}.</p>
964
+
965
+ </details>
966
+
967
+ <details>
968
+ <summary>Language Identification Accuracy (%) ↑</summary>
969
+
970
+ <table>
971
+ <thead>
972
+ <tr>
973
+ <th style="text-align: left;"></th>
974
+ <th style="text-align: center;">Whisper-large-v3</th>
975
+ <th style="text-align: center;">Qwen3-ASR-0.6B</th>
976
+ <th style="text-align: center;">Qwen3-ASR-1.7B</th>
977
+ </tr>
978
+ </thead>
979
+ <tbody>
980
+ <tr>
981
+ <td style="text-align: left;">MLS</td>
982
+ <td style="text-align: center;"><strong>99.9</strong></td>
983
+ <td style="text-align: center;">99.3</td>
984
+ <td style="text-align: center;"><strong>99.9</strong></td>
985
+ </tr>
986
+ <tr>
987
+ <td style="text-align: left;">CommonVoice</td>
988
+ <td style="text-align: center;">92.7</td>
989
+ <td style="text-align: center;"><strong>98.2</strong></td>
990
+ <td style="text-align: center;"><strong>98.7</strong></td>
991
+ </tr>
992
+ <tr>
993
+ <td style="text-align: left;">MLC-SLM</td>
994
+ <td style="text-align: center;">89.2</td>
995
+ <td style="text-align: center;"><strong>92.7</strong></td>
996
+ <td style="text-align: center;"><strong>94.1</strong></td>
997
+ </tr>
998
+ <tr>
999
+ <td style="text-align: left;">Fleurs</td>
1000
+ <td style="text-align: center;">94.6</td>
1001
+ <td style="text-align: center;"><strong>97.1</strong></td>
1002
+ <td style="text-align: center;"><strong>98.7</strong></td>
1003
+ </tr>
1004
+ <tr style="border-top: 1px solid #ddd;">
1005
+ <td style="text-align: left;"><em>Avg.</em></td>
1006
+ <td style="text-align: center;">94.1</td>
1007
+ <td style="text-align: center;"><strong>96.8</strong></td>
1008
+ <td style="text-align: center;"><strong>97.9</strong></td>
1009
+ </tr>
1010
+ </tbody>
1011
+ </table>
1012
+ <p><strong>Language coverage:</strong> The language sets follow those listed for the Multilingual ASR Benchmarks. Here, Fleurs corresponds to Fleurs<sup>††</sup> in the Multilingual ASR Benchmarks and covers 30 languages.</p>
1013
+
1014
+ </details>
1015
+
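The Fleurs language sets are defined incrementally (12 base languages, then 8 more, then 10 more), which is where the figure of 30 languages comes from. The short sketch below rebuilds the three sets from the language codes listed in the coverage notes and checks the counts; the variable names are made up for this example.

```python
# Language codes copied from the coverage notes; variable names are illustrative only.
fleurs = {"en", "zh", "yue", "ar", "de", "es", "fr", "it", "ja", "ko", "pt", "ru"}
dagger_extra = {"hi", "id", "ms", "nl", "pl", "th", "tr", "vi"}
double_dagger_extra = {"cs", "da", "el", "fa", "fi", "fil", "hu", "mk", "ro", "sv"}

# Each larger set is the previous set plus its additional languages.
fleurs_dagger = fleurs | dagger_extra
fleurs_double_dagger = fleurs_dagger | double_dagger_extra

# The additions must not overlap with what is already covered.
assert fleurs.isdisjoint(dagger_extra)
assert fleurs_dagger.isdisjoint(double_dagger_extra)

if __name__ == "__main__":
    print(len(fleurs), len(fleurs_dagger), len(fleurs_double_dagger))
```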
1016
+ <details>
1017
+ <summary>Singing Voice & Song Transcription (WER ↓)</summary>
1018
+
1019
+ <table>
1020
+ <thead>
1021
+ <tr>
1022
+ <th style="text-align: left;"></th>
1023
+ <th style="text-align: center;">GPT-4o<br>-Transcribe</th>
1024
+ <th style="text-align: center;">Gemini-2.5<br>-Pro</th>
1025
+ <th style="text-align: center;">Doubao-ASR<br>-1.0</th>
1026
+ <th style="text-align: center;">Whisper<br>-large-v3</th>
1027
+ <th style="text-align: center;">Fun-ASR-MLT<br>-Nano</th>
1028
+ <th style="text-align: center;">Qwen3-ASR<br>-1.7B</th>
1029
+ </tr>
1030
+ </thead>
1031
+ <tbody>
1032
+ <tr>
1033
+ <td colspan="7" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Singing</td>
1034
+ </tr>
1035
+ <tr>
1036
+ <td style="text-align: left;">M4Singer</td>
1037
+ <td style="text-align: center;">16.77</td>
1038
+ <td style="text-align: center;">20.88</td>
1039
+ <td style="text-align: center;">7.88</td>
1040
+ <td style="text-align: center;">13.58</td>
1041
+ <td style="text-align: center;">7.29</td>
1042
+ <td style="text-align: center;"><strong>5.98</strong></td>
1043
+ </tr>
1044
+ <tr>
1045
+ <td style="text-align: left;">MIR-1k-vocal</td>
1046
+ <td style="text-align: center;">11.87</td>
1047
+ <td style="text-align: center;">9.85</td>
1048
+ <td style="text-align: center;">6.56</td>
1049
+ <td style="text-align: center;">11.71</td>
1050
+ <td style="text-align: center;">8.17</td>
1051
+ <td style="text-align: center;"><strong>6.25</strong></td>
1052
+ </tr>
1053
+ <tr>
1054
+ <td style="text-align: left;">Opencpop</td>
1055
+ <td style="text-align: center;">7.93</td>
1056
+ <td style="text-align: center;">6.49</td>
1057
+ <td style="text-align: center;">3.80</td>
1058
+ <td style="text-align: center;">9.52</td>
1059
+ <td style="text-align: center;"><strong>2.98</strong></td>
1060
+ <td style="text-align: center;">3.08</td>
1061
+ </tr>
1062
+ <tr>
1063
+ <td style="text-align: left;">Popcs</td>
1064
+ <td style="text-align: center;">32.84</td>
1065
+ <td style="text-align: center;">15.13</td>
1066
+ <td style="text-align: center;">8.97</td>
1067
+ <td style="text-align: center;">13.77</td>
1068
+ <td style="text-align: center;">9.42</td>
1069
+ <td style="text-align: center;"><strong>8.52</strong></td>
1070
+ </tr>
1071
+ <tr>
1072
+ <td colspan="7" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Songs with BGM</td>
1073
+ </tr>
1074
+ <tr>
1075
+ <td style="text-align: left;">EntireSongs-en</td>
1076
+ <td style="text-align: center;">30.71</td>
1077
+ <td style="text-align: center;"><strong>12.18</strong></td>
1078
+ <td style="text-align: center;">33.51</td>
1079
+ <td style="text-align: center;">N/A</td>
1080
+ <td style="text-align: center;">N/A</td>
1081
+ <td style="text-align: center;">14.60</td>
1082
+ </tr>
1083
+ <tr>
1084
+ <td style="text-align: left;">EntireSongs-zh</td>
1085
+ <td style="text-align: center;">34.86</td>
1086
+ <td style="text-align: center;">18.68</td>
1087
+ <td style="text-align: center;">23.99</td>
1088
+ <td style="text-align: center;">N/A</td>
1089
+ <td style="text-align: center;">N/A</td>
1090
+ <td style="text-align: center;"><strong>13.91</strong></td>
1091
+ </tr>
1092
+ </tbody>
1093
+ </table>
1094
+
1095
+ </details>
1096
+
1097
+ <details>
1098
+ <summary>ASR Inference Mode Performance (WER ↓)</summary>
1099
+
1100
+ <table>
1101
+ <thead>
1102
+ <tr>
1103
+ <th style="text-align: left;">Model</th>
1104
+ <th style="text-align: left;">Infer. Mode</th>
1105
+ <th style="text-align: center;">Librispeech</th>
1106
+ <th style="text-align: center;">Fleurs-en</th>
1107
+ <th style="text-align: center;">Fleurs-zh</th>
1108
+ <th style="text-align: center;">Avg.</th>
1109
+ </tr>
1110
+ </thead>
1111
+ <tbody>
1112
+ <tr>
1113
+ <td rowspan="2" style="text-align: left; vertical-align: middle;">Qwen3-ASR-1.7B</td>
1114
+ <td style="text-align: left;">Offline</td>
1115
+ <td style="text-align: center;">1.63 | 3.38</td>
1116
+ <td style="text-align: center;">3.35</td>
1117
+ <td style="text-align: center;">2.41</td>
1118
+ <td style="text-align: center;">2.69</td>
1119
+ </tr>
1120
+ <tr>
1121
+ <td style="text-align: left;">Streaming</td>
1122
+ <td style="text-align: center;">1.95 | 4.51</td>
1123
+ <td style="text-align: center;">4.02</td>
1124
+ <td style="text-align: center;">2.84</td>
1125
+ <td style="text-align: center;">3.33</td>
1126
+ </tr>
1127
+ <tr style="border-top: 1px solid #ddd;">
1128
+ <td rowspan="2" style="text-align: left; vertical-align: middle;">Qwen3-ASR-0.6B</td>
1129
+ <td style="text-align: left;">Offline</td>
1130
+ <td style="text-align: center;">2.11 | 4.55</td>
1131
+ <td style="text-align: center;">4.39</td>
1132
+ <td style="text-align: center;">2.88</td>
1133
+ <td style="text-align: center;">3.48</td>
1134
+ </tr>
1135
+ <tr>
1136
+ <td style="text-align: left;">Streaming</td>
1137
+ <td style="text-align: center;">2.54 | 6.27</td>
1138
+ <td style="text-align: center;">5.38</td>
1139
+ <td style="text-align: center;">3.40</td>
1140
+ <td style="text-align: center;">4.40</td>
1141
+ </tr>
1142
+ </tbody>
1143
+ </table>
1144
+
1145
+ </details>
1146
+
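The `Avg.` column in the inference mode table is consistent with a plain mean over the four reported numbers per row, counting Librispeech clean and other separately. This reading is inferred from the numbers rather than stated in the README, and the helper name below is hypothetical. The sketch recomputes each average from the table values:

```python
def row_average(wers, ndigits=2):
    """Mean of the per-benchmark WERs, rounded to the table's precision."""
    return round(sum(wers) / len(wers), ndigits)


# (model, mode) -> ([Librispeech clean, Librispeech other, Fleurs-en, Fleurs-zh], reported Avg.)
rows = {
    ("Qwen3-ASR-1.7B", "Offline"): ([1.63, 3.38, 3.35, 2.41], 2.69),
    ("Qwen3-ASR-1.7B", "Streaming"): ([1.95, 4.51, 4.02, 2.84], 3.33),
    ("Qwen3-ASR-0.6B", "Offline"): ([2.11, 4.55, 4.39, 2.88], 3.48),
    ("Qwen3-ASR-0.6B", "Streaming"): ([2.54, 6.27, 5.38, 3.40], 4.40),
}

if __name__ == "__main__":
    for (model, mode), (wers, reported) in rows.items():
        print(f"{model} {mode}: computed={row_average(wers):.2f} reported={reported:.2f}")
```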
1147
+ <details>
1148
+ <summary>Forced Alignment Benchmarks (AAS ms ↓)</summary>
1149
+
1150
+ <table>
1151
+ <thead>
1152
+ <tr>
1153
+ <th style="text-align: left;"></th>
1154
+ <th style="text-align: center;">Monotonic-Aligner</th>
1155
+ <th style="text-align: center;">NFA</th>
1156
+ <th style="text-align: center;">WhisperX</th>
1157
+ <th style="text-align: center;">Qwen3-ForcedAligner-0.6B</th>
1158
+ </tr>
1159
+ </thead>
1160
+ <tbody>
1161
+ <tr>
1162
+ <td colspan="5" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">MFA-Labeled Raw</td>
1163
+ </tr>
1164
+ <tr>
1165
+ <td style="text-align: left;">Chinese</td>
1166
+ <td style="text-align: center;">161.1</td>
1167
+ <td style="text-align: center;">109.8</td>
1168
+ <td style="text-align: center;">-</td>
1169
+ <td style="text-align: center;"><strong>33.1</strong></td>
1170
+ </tr>
1171
+ <tr>
1172
+ <td style="text-align: left;">English</td>
1173
+ <td style="text-align: center;">-</td>
1174
+ <td style="text-align: center;">107.5</td>
1175
+ <td style="text-align: center;">92.1</td>
1176
+ <td style="text-align: center;"><strong>37.5</strong></td>
1177
+ </tr>
1178
+ <tr>
1179
+ <td style="text-align: left;">French</td>
1180
+ <td style="text-align: center;">-</td>
1181
+ <td style="text-align: center;">100.7</td>
1182
+ <td style="text-align: center;">145.3</td>
1183
+ <td style="text-align: center;"><strong>41.7</strong></td>
1184
+ </tr>
1185
+ <tr>
1186
+ <td style="text-align: left;">German</td>
1187
+ <td style="text-align: center;">-</td>
1188
+ <td style="text-align: center;">122.7</td>
1189
+ <td style="text-align: center;">165.1</td>
1190
+ <td style="text-align: center;"><strong>46.5</strong></td>
1191
+ </tr>
1192
+ <tr>
1193
+ <td style="text-align: left;">Italian</td>
1194
+ <td style="text-align: center;">-</td>
1195
+ <td style="text-align: center;">142.7</td>
1196
+ <td style="text-align: center;">155.5</td>
1197
+ <td style="text-align: center;"><strong>75.5</strong></td>
1198
+ </tr>
1199
+ <tr>
1200
+ <td style="text-align: left;">Japanese</td>
1201
+ <td style="text-align: center;">-</td>
1202
+ <td style="text-align: center;">-</td>
1203
+ <td style="text-align: center;">-</td>
1204
+ <td style="text-align: center;"><strong>42.2</strong></td>
1205
+ </tr>
1206
+ <tr>
1207
+ <td style="text-align: left;">Korean</td>
1208
+ <td style="text-align: center;">-</td>
1209
+ <td style="text-align: center;">-</td>
1210
+ <td style="text-align: center;">-</td>
1211
+ <td style="text-align: center;"><strong>37.2</strong></td>
1212
+ </tr>
1213
+ <tr>
1214
+ <td style="text-align: left;">Portuguese</td>
1215
+ <td style="text-align: center;">-</td>
1216
+ <td style="text-align: center;">-</td>
1217
+ <td style="text-align: center;">-</td>
1218
+ <td style="text-align: center;"><strong>38.4</strong></td>
1219
+ </tr>
1220
+ <tr>
1221
+ <td style="text-align: left;">Russian</td>
1222
+ <td style="text-align: center;">-</td>
1223
+ <td style="text-align: center;">200.7</td>
1224
+ <td style="text-align: center;">-</td>
1225
+ <td style="text-align: center;"><strong>40.2</strong></td>
1226
+ </tr>
1227
+ <tr>
1228
+ <td style="text-align: left;">Spanish</td>
1229
+ <td style="text-align: center;">-</td>
1230
+ <td style="text-align: center;">124.7</td>
1231
+ <td style="text-align: center;">108.0</td>
1232
+ <td style="text-align: center;"><strong>36.8</strong></td>
1233
+ </tr>
1234
+ <tr>
1235
+ <td style="text-align: left;"><em>Avg.</em></td>
1236
+ <td style="text-align: center;">161.1</td>
1237
+ <td style="text-align: center;">129.8</td>
1238
+ <td style="text-align: center;">133.2</td>
1239
+ <td style="text-align: center;"><strong>42.9</strong></td>
1240
+ </tr>
1241
+ <tr>
1242
+ <td colspan="5" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">MFA-Labeled Concat-300s</td>
1243
+ </tr>
1244
+ <tr>
1245
+ <td style="text-align: left;">Chinese</td>
1246
+ <td style="text-align: center;">1742.4</td>
1247
+ <td style="text-align: center;">235.0</td>
1248
+ <td style="text-align: center;">-</td>
1249
+ <td style="text-align: center;"><strong>36.5</strong></td>
1250
+ </tr>
1251
+ <tr>
1252
+ <td style="text-align: left;">English</td>
1253
+ <td style="text-align: center;">-</td>
1254
+ <td style="text-align: center;">226.7</td>
1255
+ <td style="text-align: center;">227.2</td>
1256
+ <td style="text-align: center;"><strong>58.6</strong></td>
1257
+ </tr>
1258
+ <tr>
1259
+ <td style="text-align: left;">French</td>
1260
+ <td style="text-align: center;">-</td>
1261
+ <td style="text-align: center;">230.6</td>
1262
+ <td style="text-align: center;">2052.2</td>
1263
+ <td style="text-align: center;"><strong>53.4</strong></td>
1264
+ </tr>
1265
+ <tr>
1266
+ <td style="text-align: left;">German</td>
1267
+ <td style="text-align: center;">-</td>
1268
+ <td style="text-align: center;">220.3</td>
1269
+ <td style="text-align: center;">993.4</td>
1270
+ <td style="text-align: center;"><strong>62.4</strong></td>
1271
+ </tr>
1272
+ <tr>
1273
+ <td style="text-align: left;">Italian</td>
1274
+ <td style="text-align: center;">-</td>
1275
+ <td style="text-align: center;">290.5</td>
1276
+ <td style="text-align: center;">5719.4</td>
1277
+ <td style="text-align: center;"><strong>81.6</strong></td>
1278
+ </tr>
1279
+ <tr>
1280
+ <td style="text-align: left;">Japanese</td>
1281
+ <td style="text-align: center;">-</td>
1282
+ <td style="text-align: center;">-</td>
1283
+ <td style="text-align: center;">-</td>
1284
+ <td style="text-align: center;"><strong>81.3</strong></td>
1285
+ </tr>
1286
+ <tr>
1287
+ <td style="text-align: left;">Korean</td>
1288
+ <td style="text-align: center;">-</td>
1289
+ <td style="text-align: center;">-</td>
1290
+ <td style="text-align: center;">-</td>
1291
+ <td style="text-align: center;"><strong>42.2</strong></td>
1292
+ </tr>
1293
+ <tr>
1294
+ <td style="text-align: left;">Portuguese</td>
1295
+ <td style="text-align: center;">-</td>
1296
+ <td style="text-align: center;">-</td>
1297
+ <td style="text-align: center;">-</td>
1298
+ <td style="text-align: center;"><strong>50.0</strong></td>
1299
+ </tr>
1300
+ <tr>
1301
+ <td style="text-align: left;">Russian</td>
1302
+ <td style="text-align: center;">-</td>
1303
+ <td style="text-align: center;">283.3</td>
1304
+ <td style="text-align: center;">-</td>
1305
+ <td style="text-align: center;"><strong>43.0</strong></td>
1306
+ </tr>
1307
+ <tr>
1308
+ <td style="text-align: left;">Spanish</td>
1309
+ <td style="text-align: center;">-</td>
1310
+ <td style="text-align: center;">240.2</td>
1311
+ <td style="text-align: center;">4549.9</td>
1312
+ <td style="text-align: center;"><strong>39.6</strong></td>
1313
+ </tr>
1314
+ <tr>
1315
+ <td style="text-align: left;">Cross-lingual</td>
1316
+ <td style="text-align: center;">-</td>
1317
+ <td style="text-align: center;">-</td>
1318
+ <td style="text-align: center;">-</td>
1319
+ <td style="text-align: center;"><strong>34.2</strong></td>
1320
+ </tr>
1321
+ <tr>
1322
+ <td style="text-align: left;"><em>Avg.</em></td>
1323
+ <td style="text-align: center;">1742.4</td>
1324
+ <td style="text-align: center;">246.7</td>
1325
+ <td style="text-align: center;">2708.4</td>
1326
+ <td style="text-align: center;"><strong>52.9</strong></td>
1327
+ </tr>
1328
+ <tr>
1329
+ <td colspan="5" style="text-align: left; font-style: italic; border-top: 1px solid #ddd; border-bottom: 1px solid #ddd;">Human-Labeled</td>
1330
+ </tr>
1331
+ <tr>
1332
+ <td style="text-align: left;">Raw</td>
1333
+ <td style="text-align: center;">49.9</td>
1334
+ <td style="text-align: center;">88.6</td>
1335
+ <td style="text-align: center;">-</td>
1336
+ <td style="text-align: center;"><strong>27.8</strong></td>
1337
+ </tr>
1338
+ <tr>
1339
+ <td style="text-align: left;">Raw-Noisy</td>
1340
+ <td style="text-align: center;">53.3</td>
1341
+ <td style="text-align: center;">89.5</td>
1342
+ <td style="text-align: center;">-</td>
1343
+ <td style="text-align: center;"><strong>41.8</strong></td>
1344
+ </tr>
1345
+ <tr>
1346
+ <td style="text-align: left;">Concat-60s</td>
1347
+ <td style="text-align: center;">51.1</td>
1348
+ <td style="text-align: center;">86.7</td>
1349
+ <td style="text-align: center;">-</td>
1350
+ <td style="text-align: center;"><strong>25.3</strong></td>
1351
+ </tr>
1352
+ <tr>
1353
+ <td style="text-align: left;">Concat-300s</td>
1354
+ <td style="text-align: center;">410.8</td>
1355
+ <td style="text-align: center;">140.0</td>
1356
+ <td style="text-align: center;">-</td>
1357
+ <td style="text-align: center;"><strong>24.8</strong></td>
1358
+ </tr>
1359
+ <tr>
1360
+ <td style="text-align: left;">Concat-Cross-lingual</td>
1361
+ <td style="text-align: center;">-</td>
1362
+ <td style="text-align: center;">-</td>
1363
+ <td style="text-align: center;">-</td>
1364
+ <td style="text-align: center;"><strong>42.5</strong></td>
1365
+ </tr>
1366
+ <tr>
1367
+ <td style="text-align: left;"><em>Avg.</em></td>
1368
+ <td style="text-align: center;">141.3</td>
1369
+ <td style="text-align: center;">101.2</td>
1370
+ <td style="text-align: center;">-</td>
1371
+ <td style="text-align: center;"><strong>32.4</strong></td>
1372
+ </tr>
1373
+ </tbody>
1374
+ </table>
1375
+
1376
+ </details>
1377
+
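The *Avg.* row of the Human-Labeled block appears to be a plain mean over the cells that hold a number, with `-` cells skipped. The sketch below checks that reading against the values in the table; the column headers sit outside this excerpt, so columns are addressed by position only, and the "skip missing cells" rule is an assumption rather than something the README states.

```python
from statistics import mean

# Human-Labeled rows copied from the table above; None marks a "-" cell.
rows = {
    "Raw":                  [49.9, 88.6, None, 27.8],
    "Raw-Noisy":            [53.3, 89.5, None, 41.8],
    "Concat-60s":           [51.1, 86.7, None, 25.3],
    "Concat-300s":          [410.8, 140.0, None, 24.8],
    "Concat-Cross-lingual": [None, None, None, 42.5],
}
# The "Avg." row as printed in the table.
reported_avg = [141.3, 101.2, None, 32.4]


def column_mean(values):
    """Mean over the cells that hold a number; None if the column is empty."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Transpose the rows into columns and average each one.
column_means = [column_mean(col) for col in zip(*rows.values())]

for got, want in zip(column_means, reported_avg):
    print(got, want)
```

Under this reading the last column averages five rows while the first two average four, so the *Avg.* cells are not computed over identical test sets.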
1378
+
1379
+ ## Citation
1380
+
1381
+ If you find our paper and code useful in your research, please consider giving the repository a star :star: and citing the paper :pencil:
1382
+
1383
+ ```BibTeX
1384
+ @article{Qwen3-ASR,
1385
+ title={Qwen3-ASR Technical Report},
1386
+ author={Xian Shi and Xiong Wang and Zhifang Guo and Yongqi Wang and Pei Zhang and Xinyu Zhang and Zishan Guo and Hongkun Hao and Yu Xi and Baosong Yang and Jin Xu and Jingren Zhou and Junyang Lin},
1387
+ journal={arXiv preprint arXiv:2601.21337},
1388
+ year={2026}
1389
+ }
1390
+ ```
1391
+
1392
+
1393
+ <br>
Qwen3-ASR-1.7B/chat_template.json ADDED
@@ -0,0 +1 @@
1
+ {"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n {%- if m.role == 'system' -%}\n {%- if m.content is string -%}\n {%- set ns.system_text = ns.system_text + m.content -%}\n {%- else -%}\n {%- for c in m.content -%}\n {%- if c.type == 'text' and (c.text is defined) -%}\n {%- set ns.system_text = ns.system_text + c.text -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n {%- if m.content is not string -%}\n {%- for c in m.content -%}\n {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
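The Jinja template above can be hard to read in its JSON-escaped form. The following is a plain-Python re-implementation sketch of the same logic, written for illustration only: `render_prompt` and `AUDIO_PLACEHOLDER` are hypothetical names, and the actual rendering is done by the Jinja engine inside the tokenizer/processor, not by this function.

```python
# One placeholder triple is emitted per audio part, as in the template.
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_prompt(messages, add_generation_prompt=True):
    """Hypothetical Python mirror of the chat_template shown above."""
    system_text = ""
    audio_tokens = ""
    for m in messages:
        content = m["content"]
        # System messages contribute their text (string or text parts).
        if m["role"] == "system":
            if isinstance(content, str):
                system_text += content
            else:
                for c in content:
                    if c.get("type") == "text" and "text" in c:
                        system_text += c["text"]
        # Any message with list content contributes one placeholder per audio part.
        if not isinstance(content, str):
            for c in content:
                if c.get("type") == "audio" or "audio" in c or "audio_url" in c:
                    audio_tokens += AUDIO_PLACEHOLDER
    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "Transcribe."},
    {"role": "user", "content": [
        {"type": "audio", "audio": "a.wav"},      # hypothetical file name
        {"type": "text", "text": "ignored"},      # non-system text is not rendered
    ]},
]
prompt = render_prompt(messages)
print(prompt)
```

Two properties of the template follow from this reading: text parts outside system messages are dropped, and the user turn always contains only audio placeholders, regardless of which message the audio parts came from.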
Qwen3-ASR-1.7B/config.json ADDED
@@ -0,0 +1,221 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3ASRForConditionalGeneration"
4
+ ],
5
+ "model_type": "qwen3_asr",
6
+ "support_languages": [
7
+ "Chinese",
8
+ "English",
9
+ "Cantonese",
10
+ "Arabic",
11
+ "German",
12
+ "French",
13
+ "Spanish",
14
+ "Portuguese",
15
+ "Indonesian",
16
+ "Italian",
17
+ "Korean",
18
+ "Russian",
19
+ "Thai",
20
+ "Vietnamese",
21
+ "Japanese",
22
+ "Turkish",
23
+ "Hindi",
24
+ "Malay",
25
+ "Dutch",
26
+ "Swedish",
27
+ "Danish",
28
+ "Finnish",
29
+ "Polish",
30
+ "Czech",
31
+ "Filipino",
32
+ "Persian",
33
+ "Greek",
34
+ "Romanian",
35
+ "Hungarian",
36
+ "Macedonian"
37
+ ],
38
+ "thinker_config": {
39
+ "model_type": "qwen3_asr",
40
+ "architectures": [
41
+ "Qwen3ASRForConditionalGeneration"
42
+ ],
43
+ "audio_config": {
44
+ "_name_or_path": "",
45
+ "activation_dropout": 0,
46
+ "activation_function": "gelu",
47
+ "add_cross_attention": false,
48
+ "architectures": null,
49
+ "attention_dropout": 0,
50
+ "bad_words_ids": null,
51
+ "begin_suppress_tokens": null,
52
+ "bos_token_id": null,
53
+ "chunk_size_feed_forward": 0,
54
+ "conv_chunksize": 500,
55
+ "cross_attention_hidden_size": null,
56
+ "d_model": 1024,
57
+ "decoder_start_token_id": null,
58
+ "diversity_penalty": 0.0,
59
+ "do_sample": false,
60
+ "downsample_hidden_size": 480,
61
+ "dropout": 0,
62
+ "dtype": null,
63
+ "early_stopping": false,
64
+ "encoder_attention_heads": 16,
65
+ "encoder_ffn_dim": 4096,
66
+ "encoder_layers": 24,
67
+ "encoder_no_repeat_ngram_size": 0,
68
+ "eos_token_id": null,
69
+ "exponential_decay_length_penalty": null,
70
+ "finetuning_task": null,
71
+ "forced_bos_token_id": null,
72
+ "forced_eos_token_id": null,
73
+ "id2label": {
74
+ "0": "LABEL_0",
75
+ "1": "LABEL_1"
76
+ },
77
+ "initializer_range": 0.02,
78
+ "is_decoder": false,
79
+ "is_encoder_decoder": false,
80
+ "label2id": {
81
+ "LABEL_0": 0,
82
+ "LABEL_1": 1
83
+ },
84
+ "length_penalty": 1.0,
85
+ "max_length": 20,
86
+ "max_source_positions": 1500,
87
+ "min_length": 0,
88
+ "model_type": "qwen3_asr_audio_encoder",
89
+ "n_window": 50,
90
+ "n_window_infer": 800,
91
+ "no_repeat_ngram_size": 0,
92
+ "num_beam_groups": 1,
93
+ "num_beams": 1,
94
+ "num_hidden_layers": 24,
95
+ "num_mel_bins": 128,
96
+ "num_return_sequences": 1,
97
+ "output_attentions": false,
98
+ "output_dim": 2048,
99
+ "output_hidden_states": false,
100
+ "output_scores": false,
101
+ "pad_token_id": null,
102
+ "prefix": null,
103
+ "problem_type": null,
104
+ "pruned_heads": {},
105
+ "remove_invalid_values": false,
106
+ "repetition_penalty": 1.0,
107
+ "return_dict": true,
108
+ "return_dict_in_generate": false,
109
+ "scale_embedding": false,
110
+ "sep_token_id": null,
111
+ "suppress_tokens": null,
112
+ "task_specific_params": null,
113
+ "temperature": 1.0,
114
+ "tf_legacy_loss": false,
115
+ "tie_encoder_decoder": false,
116
+ "tie_word_embeddings": true,
117
+ "tokenizer_class": null,
118
+ "top_k": 50,
119
+ "top_p": 1.0,
120
+ "torchscript": false,
121
+ "typical_p": 1.0,
122
+ "use_bfloat16": false
123
+ },
124
+ "audio_end_token_id": 151670,
125
+ "audio_start_token_id": 151669,
126
+ "audio_token_id": 151676,
127
+ "dtype": "bfloat16",
128
+ "initializer_range": 0.02,
129
+ "text_config": {
130
+ "_name_or_path": "",
131
+ "add_cross_attention": false,
132
+ "architectures": null,
133
+ "attention_bias": false,
134
+ "attention_dropout": 0.0,
135
+ "bad_words_ids": null,
136
+ "begin_suppress_tokens": null,
137
+ "bos_token_id": null,
138
+ "chunk_size_feed_forward": 0,
139
+ "cross_attention_hidden_size": null,
140
+ "decoder_start_token_id": null,
141
+ "diversity_penalty": 0.0,
142
+ "do_sample": false,
143
+ "dtype": null,
144
+ "early_stopping": false,
145
+ "encoder_no_repeat_ngram_size": 0,
146
+ "eos_token_id": null,
147
+ "exponential_decay_length_penalty": null,
148
+ "finetuning_task": null,
149
+ "forced_bos_token_id": null,
150
+ "forced_eos_token_id": null,
151
+ "head_dim": 128,
152
+ "hidden_act": "silu",
153
+ "hidden_size": 2048,
154
+ "id2label": {
155
+ "0": "LABEL_0",
156
+ "1": "LABEL_1"
157
+ },
158
+ "initializer_range": 0.02,
159
+ "intermediate_size": 6144,
160
+ "is_decoder": false,
161
+ "is_encoder_decoder": false,
162
+ "label2id": {
163
+ "LABEL_0": 0,
164
+ "LABEL_1": 1
165
+ },
166
+ "length_penalty": 1.0,
167
+ "max_length": 20,
168
+ "max_position_embeddings": 65536,
169
+ "min_length": 0,
170
+ "model_type": "qwen3",
171
+ "no_repeat_ngram_size": 0,
172
+ "num_attention_heads": 16,
173
+ "num_beam_groups": 1,
174
+ "num_beams": 1,
175
+ "num_hidden_layers": 28,
176
+ "num_key_value_heads": 8,
177
+ "num_return_sequences": 1,
178
+ "output_attentions": false,
179
+ "output_hidden_states": false,
180
+ "output_scores": false,
181
+ "pad_token_id": null,
182
+ "prefix": null,
183
+ "problem_type": null,
184
+ "pruned_heads": {},
185
+ "remove_invalid_values": false,
186
+ "repetition_penalty": 1.0,
187
+ "return_dict": true,
188
+ "return_dict_in_generate": false,
189
+ "rms_norm_eps": 1e-06,
190
+ "rope_scaling": {
191
+ "interleaved": true,
192
+ "mrope_interleaved": true,
193
+ "mrope_section": [
194
+ 24,
195
+ 20,
196
+ 20
197
+ ],
198
+ "rope_type": "default",
199
+ "type": "default"
200
+ },
201
+ "rope_theta": 1000000,
202
+ "sep_token_id": null,
203
+ "suppress_tokens": null,
204
+ "task_specific_params": null,
205
+ "temperature": 1.0,
206
+ "tf_legacy_loss": false,
207
+ "tie_encoder_decoder": false,
208
+ "tie_word_embeddings": true,
209
+ "tokenizer_class": null,
210
+ "top_k": 50,
211
+ "top_p": 1.0,
212
+ "torchscript": false,
213
+ "typical_p": 1.0,
214
+ "use_bfloat16": false,
215
+ "use_cache": true,
216
+ "vocab_size": 151936
217
+ }
218
+ },
219
+ "transformers_version": "4.57.6"
220
+ }
221
+
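A few of the values in `config.json` constrain each other. The sketch below copies the relevant numbers from the config above and derives the implied projection widths and grouping. The interpretation of `mrope_section` as a split of the rotary frequency pairs, and of `output_dim` as the size the audio features take before entering the text decoder, are assumptions; the modeling code is not part of this diff.

```python
# Values copied from thinker_config.text_config above.
text_cfg = {
    "hidden_size": 2048,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "mrope_section": [24, 20, 20],
}
# Values copied from thinker_config.audio_config above.
audio_cfg = {
    "d_model": 1024,
    "encoder_attention_heads": 16,
    "output_dim": 2048,
}

# Width of the query and key/value projections implied by the head counts.
q_width = text_cfg["num_attention_heads"] * text_cfg["head_dim"]
kv_width = text_cfg["num_key_value_heads"] * text_cfg["head_dim"]

# Grouped-query attention: how many query heads share one key/value head.
queries_per_kv_head = text_cfg["num_attention_heads"] // text_cfg["num_key_value_heads"]

# Assumption: mrope_section partitions the head_dim // 2 rotary frequency pairs.
rotary_pairs = text_cfg["head_dim"] // 2
mrope_total = sum(text_cfg["mrope_section"])

# Per-head width in the audio encoder.
audio_head_dim = audio_cfg["d_model"] // audio_cfg["encoder_attention_heads"]

print(q_width, kv_width, queries_per_kv_head, rotary_pairs, mrope_total, audio_head_dim)
```

With these numbers the query width equals `hidden_size`, the `mrope_section` entries sum to `head_dim // 2`, and the audio `output_dim` equals the text `hidden_size`.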
Qwen3-ASR-1.7B/generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "eos_token_id": [
4
+ 151643,
5
+ 151645
6
+ ],
7
+ "pad_token_id": 151643,
8
+ "do_sample": false
9
+ }
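`generation_config.json` lists two end-of-sequence ids and disables sampling. As a minimal sketch of what a list-valued `eos_token_id` means for post-processing, the helper below cuts a generated id sequence at the first id found in that list. `truncate_at_eos` is a hypothetical helper written for this illustration, not a function of any library.

```python
# Ids copied from generation_config.json above.
EOS_TOKEN_IDS = frozenset({151643, 151645})
PAD_TOKEN_ID = 151643


def truncate_at_eos(token_ids, eos_ids=EOS_TOKEN_IDS):
    """Return the ids before the first end-of-sequence id (hypothetical helper)."""
    for i, tok in enumerate(token_ids):
        if tok in eos_ids:
            return list(token_ids[:i])
    # No end-of-sequence id seen: keep everything.
    return list(token_ids)


generated = [10, 11, 151645, PAD_TOKEN_ID, PAD_TOKEN_ID]
print(truncate_at_eos(generated))
```

Because `pad_token_id` is also one of the end-of-sequence ids, padding that follows a finished sequence in a batch is removed by the same cut.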
Qwen3-ASR-1.7B/merges.txt ADDED
The diff for this file is too large to render. See raw diff
Qwen3-ASR-1.7B/model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a4cd1f1a04d90b757dc7f7dd26254e69a013b19e80efe590a83c6a3bde8608d6
3
+ size 4220320824
Qwen3-ASR-1.7B/model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e0b9d9e09e2e0238e7ef3cc8a484ab387e91b90f1900bedf88bc92d7929ccfc
3
+ size 478200688
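The two `.safetensors` entries above are Git LFS pointer files rather than the weights themselves: each records a spec version, a SHA-256 object id, and the size in bytes. The sketch below parses the two pointers shown in this diff and sums their sizes; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the in-memory check is only suitable for small inputs (a multi-gigabyte shard would need to be hashed in chunks).

```python
import hashlib

# Pointer contents copied from the two shard entries above.
POINTERS = {
    "model-00001-of-00002.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:a4cd1f1a04d90b757dc7f7dd26254e69a013b19e80efe590a83c6a3bde8608d6\n"
        "size 4220320824\n"
    ),
    "model-00002-of-00002.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:6e0b9d9e09e2e0238e7ef3cc8a484ab387e91b90f1900bedf88bc92d7929ccfc\n"
        "size 478200688\n"
    ),
}


def parse_lfs_pointer(text):
    """Split a pointer file into its key/value fields (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that in-memory bytes have the size and SHA-256 the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


parsed = {name: parse_lfs_pointer(text) for name, text in POINTERS.items()}
total_bytes = sum(p["size"] for p in parsed.values())
print(total_bytes)
```

The summed size gives the download footprint of the checkpoint shards before any other repository files are counted.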
Qwen3-ASR-1.7B/model.safetensors.index.json ADDED
@@ -0,0 +1,715 @@
1
+ {
2
+ "metadata": {
3
+ "format": "pt"
4
+ },
5
+ "weight_map": {
6
+ "thinker.audio_tower.conv2d1.bias": "model-00001-of-00002.safetensors",
7
+ "thinker.audio_tower.conv2d1.weight": "model-00001-of-00002.safetensors",
8
+ "thinker.audio_tower.conv2d2.bias": "model-00001-of-00002.safetensors",
9
+ "thinker.audio_tower.conv2d2.weight": "model-00001-of-00002.safetensors",
10
+ "thinker.audio_tower.conv2d3.bias": "model-00001-of-00002.safetensors",
11
+ "thinker.audio_tower.conv2d3.weight": "model-00001-of-00002.safetensors",
12
+ "thinker.audio_tower.conv_out.weight": "model-00001-of-00002.safetensors",
13
+ "thinker.audio_tower.layers.0.fc1.bias": "model-00001-of-00002.safetensors",
14
+ "thinker.audio_tower.layers.0.fc1.weight": "model-00001-of-00002.safetensors",
15
+ "thinker.audio_tower.layers.0.fc2.bias": "model-00001-of-00002.safetensors",
16
+ "thinker.audio_tower.layers.0.fc2.weight": "model-00001-of-00002.safetensors",
17
+ "thinker.audio_tower.layers.0.final_layer_norm.bias": "model-00001-of-00002.safetensors",
18
+ "thinker.audio_tower.layers.0.final_layer_norm.weight": "model-00001-of-00002.safetensors",
19
+ "thinker.audio_tower.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
20
+ "thinker.audio_tower.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
21
+ "thinker.audio_tower.layers.0.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
22
+ "thinker.audio_tower.layers.0.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
23
+ "thinker.audio_tower.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
24
+ "thinker.audio_tower.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
25
+ "thinker.audio_tower.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
26
+ "thinker.audio_tower.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
27
+ "thinker.audio_tower.layers.0.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
28
+ "thinker.audio_tower.layers.0.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
29
+ "thinker.audio_tower.layers.1.fc1.bias": "model-00001-of-00002.safetensors",
30
+ "thinker.audio_tower.layers.1.fc1.weight": "model-00001-of-00002.safetensors",
31
+ "thinker.audio_tower.layers.1.fc2.bias": "model-00001-of-00002.safetensors",
32
+ "thinker.audio_tower.layers.1.fc2.weight": "model-00001-of-00002.safetensors",
33
+ "thinker.audio_tower.layers.1.final_layer_norm.bias": "model-00001-of-00002.safetensors",
34
+ "thinker.audio_tower.layers.1.final_layer_norm.weight": "model-00001-of-00002.safetensors",
35
+ "thinker.audio_tower.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
36
+ "thinker.audio_tower.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
37
+ "thinker.audio_tower.layers.1.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
38
+ "thinker.audio_tower.layers.1.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
39
+ "thinker.audio_tower.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
40
+ "thinker.audio_tower.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
41
+ "thinker.audio_tower.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
42
+ "thinker.audio_tower.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
43
+ "thinker.audio_tower.layers.1.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
44
+ "thinker.audio_tower.layers.1.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
45
+ "thinker.audio_tower.layers.10.fc1.bias": "model-00001-of-00002.safetensors",
46
+ "thinker.audio_tower.layers.10.fc1.weight": "model-00001-of-00002.safetensors",
47
+ "thinker.audio_tower.layers.10.fc2.bias": "model-00001-of-00002.safetensors",
48
+ "thinker.audio_tower.layers.10.fc2.weight": "model-00001-of-00002.safetensors",
49
+ "thinker.audio_tower.layers.10.final_layer_norm.bias": "model-00001-of-00002.safetensors",
50
+ "thinker.audio_tower.layers.10.final_layer_norm.weight": "model-00001-of-00002.safetensors",
51
+ "thinker.audio_tower.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
52
+ "thinker.audio_tower.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
53
+ "thinker.audio_tower.layers.10.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
54
+ "thinker.audio_tower.layers.10.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
55
+ "thinker.audio_tower.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
56
+ "thinker.audio_tower.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
57
+ "thinker.audio_tower.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
58
+ "thinker.audio_tower.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
59
+ "thinker.audio_tower.layers.10.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
60
+ "thinker.audio_tower.layers.10.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
61
+ "thinker.audio_tower.layers.11.fc1.bias": "model-00001-of-00002.safetensors",
62
+ "thinker.audio_tower.layers.11.fc1.weight": "model-00001-of-00002.safetensors",
63
+ "thinker.audio_tower.layers.11.fc2.bias": "model-00001-of-00002.safetensors",
64
+ "thinker.audio_tower.layers.11.fc2.weight": "model-00001-of-00002.safetensors",
65
+ "thinker.audio_tower.layers.11.final_layer_norm.bias": "model-00001-of-00002.safetensors",
66
+ "thinker.audio_tower.layers.11.final_layer_norm.weight": "model-00001-of-00002.safetensors",
67
+ "thinker.audio_tower.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
68
+ "thinker.audio_tower.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
69
+ "thinker.audio_tower.layers.11.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
70
+ "thinker.audio_tower.layers.11.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
71
+ "thinker.audio_tower.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
72
+ "thinker.audio_tower.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
73
+ "thinker.audio_tower.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
74
+ "thinker.audio_tower.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
75
+ "thinker.audio_tower.layers.11.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
76
+ "thinker.audio_tower.layers.11.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
77
+ "thinker.audio_tower.layers.12.fc1.bias": "model-00001-of-00002.safetensors",
78
+ "thinker.audio_tower.layers.12.fc1.weight": "model-00001-of-00002.safetensors",
79
+ "thinker.audio_tower.layers.12.fc2.bias": "model-00001-of-00002.safetensors",
80
+ "thinker.audio_tower.layers.12.fc2.weight": "model-00001-of-00002.safetensors",
81
+ "thinker.audio_tower.layers.12.final_layer_norm.bias": "model-00001-of-00002.safetensors",
82
+ "thinker.audio_tower.layers.12.final_layer_norm.weight": "model-00001-of-00002.safetensors",
83
+ "thinker.audio_tower.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
84
+ "thinker.audio_tower.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
85
+ "thinker.audio_tower.layers.12.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
86
+ "thinker.audio_tower.layers.12.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
87
+ "thinker.audio_tower.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
88
+ "thinker.audio_tower.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
89
+ "thinker.audio_tower.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
90
+ "thinker.audio_tower.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
91
+ "thinker.audio_tower.layers.12.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
92
+ "thinker.audio_tower.layers.12.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
93
+ "thinker.audio_tower.layers.13.fc1.bias": "model-00001-of-00002.safetensors",
94
+ "thinker.audio_tower.layers.13.fc1.weight": "model-00001-of-00002.safetensors",
95
+ "thinker.audio_tower.layers.13.fc2.bias": "model-00001-of-00002.safetensors",
96
+ "thinker.audio_tower.layers.13.fc2.weight": "model-00001-of-00002.safetensors",
97
+ "thinker.audio_tower.layers.13.final_layer_norm.bias": "model-00001-of-00002.safetensors",
98
+ "thinker.audio_tower.layers.13.final_layer_norm.weight": "model-00001-of-00002.safetensors",
99
+ "thinker.audio_tower.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
100
+ "thinker.audio_tower.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
101
+ "thinker.audio_tower.layers.13.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
102
+ "thinker.audio_tower.layers.13.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
103
+ "thinker.audio_tower.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
104
+ "thinker.audio_tower.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
105
+ "thinker.audio_tower.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
106
+ "thinker.audio_tower.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
107
+ "thinker.audio_tower.layers.13.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
108
+ "thinker.audio_tower.layers.13.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
109
+ "thinker.audio_tower.layers.14.fc1.bias": "model-00001-of-00002.safetensors",
110
+ "thinker.audio_tower.layers.14.fc1.weight": "model-00001-of-00002.safetensors",
111
+ "thinker.audio_tower.layers.14.fc2.bias": "model-00001-of-00002.safetensors",
112
+ "thinker.audio_tower.layers.14.fc2.weight": "model-00001-of-00002.safetensors",
113
+ "thinker.audio_tower.layers.14.final_layer_norm.bias": "model-00001-of-00002.safetensors",
114
+ "thinker.audio_tower.layers.14.final_layer_norm.weight": "model-00001-of-00002.safetensors",
115
+ "thinker.audio_tower.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
116
+ "thinker.audio_tower.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
117
+ "thinker.audio_tower.layers.14.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
118
+ "thinker.audio_tower.layers.14.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
119
+ "thinker.audio_tower.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
120
+ "thinker.audio_tower.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
121
+ "thinker.audio_tower.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
122
+ "thinker.audio_tower.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
123
+ "thinker.audio_tower.layers.14.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
124
+ "thinker.audio_tower.layers.14.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
125
+ "thinker.audio_tower.layers.15.fc1.bias": "model-00001-of-00002.safetensors",
126
+ "thinker.audio_tower.layers.15.fc1.weight": "model-00001-of-00002.safetensors",
127
+ "thinker.audio_tower.layers.15.fc2.bias": "model-00001-of-00002.safetensors",
128
+ "thinker.audio_tower.layers.15.fc2.weight": "model-00001-of-00002.safetensors",
129
+ "thinker.audio_tower.layers.15.final_layer_norm.bias": "model-00001-of-00002.safetensors",
130
+ "thinker.audio_tower.layers.15.final_layer_norm.weight": "model-00001-of-00002.safetensors",
131
+ "thinker.audio_tower.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
132
+ "thinker.audio_tower.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
133
+ "thinker.audio_tower.layers.15.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
134
+ "thinker.audio_tower.layers.15.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
135
+ "thinker.audio_tower.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
136
+ "thinker.audio_tower.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
137
+ "thinker.audio_tower.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
138
+ "thinker.audio_tower.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
139
+ "thinker.audio_tower.layers.15.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
140
+ "thinker.audio_tower.layers.15.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
141
+ "thinker.audio_tower.layers.16.fc1.bias": "model-00001-of-00002.safetensors",
142
+ "thinker.audio_tower.layers.16.fc1.weight": "model-00001-of-00002.safetensors",
143
+ "thinker.audio_tower.layers.16.fc2.bias": "model-00001-of-00002.safetensors",
144
+ "thinker.audio_tower.layers.16.fc2.weight": "model-00001-of-00002.safetensors",
145
+ "thinker.audio_tower.layers.16.final_layer_norm.bias": "model-00001-of-00002.safetensors",
146
+ "thinker.audio_tower.layers.16.final_layer_norm.weight": "model-00001-of-00002.safetensors",
147
+ "thinker.audio_tower.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
148
+ "thinker.audio_tower.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
149
+ "thinker.audio_tower.layers.16.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
150
+ "thinker.audio_tower.layers.16.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
151
+ "thinker.audio_tower.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
152
+ "thinker.audio_tower.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
153
+ "thinker.audio_tower.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
154
+ "thinker.audio_tower.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
155
+ "thinker.audio_tower.layers.16.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
156
+ "thinker.audio_tower.layers.16.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
157
+ "thinker.audio_tower.layers.17.fc1.bias": "model-00001-of-00002.safetensors",
158
+ "thinker.audio_tower.layers.17.fc1.weight": "model-00001-of-00002.safetensors",
159
+ "thinker.audio_tower.layers.17.fc2.bias": "model-00001-of-00002.safetensors",
160
+ "thinker.audio_tower.layers.17.fc2.weight": "model-00001-of-00002.safetensors",
161
+ "thinker.audio_tower.layers.17.final_layer_norm.bias": "model-00001-of-00002.safetensors",
162
+ "thinker.audio_tower.layers.17.final_layer_norm.weight": "model-00001-of-00002.safetensors",
163
+ "thinker.audio_tower.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
164
+ "thinker.audio_tower.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
165
+ "thinker.audio_tower.layers.17.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
166
+ "thinker.audio_tower.layers.17.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
167
+ "thinker.audio_tower.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
168
+ "thinker.audio_tower.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
169
+ "thinker.audio_tower.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
170
+ "thinker.audio_tower.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
171
+ "thinker.audio_tower.layers.17.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
172
+ "thinker.audio_tower.layers.17.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
173
+ "thinker.audio_tower.layers.18.fc1.bias": "model-00001-of-00002.safetensors",
174
+ "thinker.audio_tower.layers.18.fc1.weight": "model-00001-of-00002.safetensors",
175
+ "thinker.audio_tower.layers.18.fc2.bias": "model-00001-of-00002.safetensors",
176
+ "thinker.audio_tower.layers.18.fc2.weight": "model-00001-of-00002.safetensors",
177
+ "thinker.audio_tower.layers.18.final_layer_norm.bias": "model-00001-of-00002.safetensors",
178
+ "thinker.audio_tower.layers.18.final_layer_norm.weight": "model-00001-of-00002.safetensors",
179
+ "thinker.audio_tower.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
180
+ "thinker.audio_tower.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
181
+ "thinker.audio_tower.layers.18.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
182
+ "thinker.audio_tower.layers.18.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
183
+ "thinker.audio_tower.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
184
+ "thinker.audio_tower.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
185
+ "thinker.audio_tower.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
186
+ "thinker.audio_tower.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
187
+ "thinker.audio_tower.layers.18.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
188
+ "thinker.audio_tower.layers.18.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
189
+ "thinker.audio_tower.layers.19.fc1.bias": "model-00001-of-00002.safetensors",
190
+ "thinker.audio_tower.layers.19.fc1.weight": "model-00001-of-00002.safetensors",
191
+ "thinker.audio_tower.layers.19.fc2.bias": "model-00001-of-00002.safetensors",
192
+ "thinker.audio_tower.layers.19.fc2.weight": "model-00001-of-00002.safetensors",
193
+ "thinker.audio_tower.layers.19.final_layer_norm.bias": "model-00001-of-00002.safetensors",
194
+ "thinker.audio_tower.layers.19.final_layer_norm.weight": "model-00001-of-00002.safetensors",
195
+ "thinker.audio_tower.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
196
+ "thinker.audio_tower.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
197
+ "thinker.audio_tower.layers.19.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
198
+ "thinker.audio_tower.layers.19.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
199
+ "thinker.audio_tower.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
200
+ "thinker.audio_tower.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
201
+ "thinker.audio_tower.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
202
+ "thinker.audio_tower.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
203
+ "thinker.audio_tower.layers.19.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
204
+ "thinker.audio_tower.layers.19.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
205
+ "thinker.audio_tower.layers.2.fc1.bias": "model-00001-of-00002.safetensors",
206
+ "thinker.audio_tower.layers.2.fc1.weight": "model-00001-of-00002.safetensors",
207
+ "thinker.audio_tower.layers.2.fc2.bias": "model-00001-of-00002.safetensors",
208
+ "thinker.audio_tower.layers.2.fc2.weight": "model-00001-of-00002.safetensors",
209
+ "thinker.audio_tower.layers.2.final_layer_norm.bias": "model-00001-of-00002.safetensors",
210
+ "thinker.audio_tower.layers.2.final_layer_norm.weight": "model-00001-of-00002.safetensors",
211
+ "thinker.audio_tower.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
212
+ "thinker.audio_tower.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
213
+ "thinker.audio_tower.layers.2.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
214
+ "thinker.audio_tower.layers.2.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
215
+ "thinker.audio_tower.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
216
+ "thinker.audio_tower.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
217
+ "thinker.audio_tower.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
218
+ "thinker.audio_tower.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
219
+ "thinker.audio_tower.layers.2.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
220
+ "thinker.audio_tower.layers.2.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
221
+ "thinker.audio_tower.layers.20.fc1.bias": "model-00001-of-00002.safetensors",
222
+ "thinker.audio_tower.layers.20.fc1.weight": "model-00001-of-00002.safetensors",
223
+ "thinker.audio_tower.layers.20.fc2.bias": "model-00001-of-00002.safetensors",
224
+ "thinker.audio_tower.layers.20.fc2.weight": "model-00001-of-00002.safetensors",
225
+ "thinker.audio_tower.layers.20.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.20.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.21.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.22.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.23.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.3.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.4.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.5.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.6.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.7.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.8.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.fc1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.fc1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.fc2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.fc2.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.final_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.layers.9.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.ln_post.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.ln_post.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.proj1.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.proj1.weight": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.proj2.bias": "model-00001-of-00002.safetensors",
+ "thinker.audio_tower.proj2.weight": "model-00001-of-00002.safetensors",
+ "thinker.lm_head.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "thinker.model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
569
+ "thinker.model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
570
+ "thinker.model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
571
+ "thinker.model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
572
+ "thinker.model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
573
+ "thinker.model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
574
+ "thinker.model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
575
+ "thinker.model.layers.22.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
576
+ "thinker.model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
577
+ "thinker.model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
578
+ "thinker.model.layers.22.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
579
+ "thinker.model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
580
+ "thinker.model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
581
+ "thinker.model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
582
+ "thinker.model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
583
+ "thinker.model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
584
+ "thinker.model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
585
+ "thinker.model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
586
+ "thinker.model.layers.23.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
587
+ "thinker.model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
588
+ "thinker.model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
589
+ "thinker.model.layers.23.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
590
+ "thinker.model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
591
+ "thinker.model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
592
+ "thinker.model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
593
+ "thinker.model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
594
+ "thinker.model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
595
+ "thinker.model.layers.24.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
596
+ "thinker.model.layers.24.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
597
+ "thinker.model.layers.24.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
598
+ "thinker.model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
599
+ "thinker.model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
600
+ "thinker.model.layers.24.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
601
+ "thinker.model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
602
+ "thinker.model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
603
+ "thinker.model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
604
+ "thinker.model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
605
+ "thinker.model.layers.25.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
606
+ "thinker.model.layers.25.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
607
+ "thinker.model.layers.25.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
608
+ "thinker.model.layers.25.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
609
+ "thinker.model.layers.25.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
610
+ "thinker.model.layers.25.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
611
+ "thinker.model.layers.25.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
612
+ "thinker.model.layers.25.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
613
+ "thinker.model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
614
+ "thinker.model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
615
+ "thinker.model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
616
+ "thinker.model.layers.26.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
617
+ "thinker.model.layers.26.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
618
+ "thinker.model.layers.26.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
619
+ "thinker.model.layers.26.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
620
+ "thinker.model.layers.26.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
621
+ "thinker.model.layers.26.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
622
+ "thinker.model.layers.26.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
623
+ "thinker.model.layers.26.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
624
+ "thinker.model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
625
+ "thinker.model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
626
+ "thinker.model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
627
+ "thinker.model.layers.27.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
628
+ "thinker.model.layers.27.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
629
+ "thinker.model.layers.27.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
630
+ "thinker.model.layers.27.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
631
+ "thinker.model.layers.27.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
632
+ "thinker.model.layers.27.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
633
+ "thinker.model.layers.27.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
634
+ "thinker.model.layers.27.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
635
+ "thinker.model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
636
+ "thinker.model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
637
+ "thinker.model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
638
+ "thinker.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
639
+ "thinker.model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
640
+ "thinker.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
641
+ "thinker.model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
642
+ "thinker.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
643
+ "thinker.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
644
+ "thinker.model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
645
+ "thinker.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
646
+ "thinker.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
647
+ "thinker.model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
648
+ "thinker.model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
649
+ "thinker.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
650
+ "thinker.model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
651
+ "thinker.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
652
+ "thinker.model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
653
+ "thinker.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
654
+ "thinker.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
655
+ "thinker.model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
656
+ "thinker.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
657
+ "thinker.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
658
+ "thinker.model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
659
+ "thinker.model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
660
+ "thinker.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
661
+ "thinker.model.layers.5.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
662
+ "thinker.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
663
+ "thinker.model.layers.5.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
664
+ "thinker.model.layers.5.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
665
+ "thinker.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
666
+ "thinker.model.layers.5.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
667
+ "thinker.model.layers.5.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
668
+ "thinker.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
669
+ "thinker.model.layers.6.input_layernorm.weight": "model-00002-of-00002.safetensors",
670
+ "thinker.model.layers.6.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
671
+ "thinker.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
672
+ "thinker.model.layers.6.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
673
+ "thinker.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
674
+ "thinker.model.layers.6.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
675
+ "thinker.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
676
+ "thinker.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
677
+ "thinker.model.layers.6.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
678
+ "thinker.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
679
+ "thinker.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
680
+ "thinker.model.layers.7.input_layernorm.weight": "model-00002-of-00002.safetensors",
681
+ "thinker.model.layers.7.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
682
+ "thinker.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
683
+ "thinker.model.layers.7.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
684
+ "thinker.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
685
+ "thinker.model.layers.7.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
686
+ "thinker.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
687
+ "thinker.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
688
+ "thinker.model.layers.7.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
689
+ "thinker.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
690
+ "thinker.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
691
+ "thinker.model.layers.8.input_layernorm.weight": "model-00002-of-00002.safetensors",
692
+ "thinker.model.layers.8.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
693
+ "thinker.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
694
+ "thinker.model.layers.8.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
695
+ "thinker.model.layers.8.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
696
+ "thinker.model.layers.8.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
697
+ "thinker.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
698
+ "thinker.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
699
+ "thinker.model.layers.8.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
700
+ "thinker.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
701
+ "thinker.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
702
+ "thinker.model.layers.9.input_layernorm.weight": "model-00002-of-00002.safetensors",
703
+ "thinker.model.layers.9.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
704
+ "thinker.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
705
+ "thinker.model.layers.9.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
706
+ "thinker.model.layers.9.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
707
+ "thinker.model.layers.9.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
708
+ "thinker.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
709
+ "thinker.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
710
+ "thinker.model.layers.9.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
711
+ "thinker.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
712
+ "thinker.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
713
+ "thinker.model.norm.weight": "model-00002-of-00002.safetensors"
714
+ }
715
+ }
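The entries above map each tensor name to the shard file that stores it. As a minimal sketch (assuming the mapping has been loaded into a plain Python dict; the three sample entries are copied from the listing above, and the helper names `tensors_by_shard` and `shards_for_prefix` are hypothetical), the shards needed for one layer can be found like this:

```python
from collections import defaultdict

# Sample entries copied from the listing above; the real index has many more.
weight_map = {
    "thinker.model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "thinker.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "thinker.model.norm.weight": "model-00002-of-00002.safetensors",
}


def tensors_by_shard(mapping):
    """Group tensor names by the shard file that holds them."""
    grouped = defaultdict(list)
    for name, shard in mapping.items():
        grouped[shard].append(name)
    return dict(grouped)


def shards_for_prefix(mapping, prefix):
    """Return the sorted shard files needed for tensors whose name starts with prefix."""
    return sorted({shard for name, shard in mapping.items() if name.startswith(prefix)})


# Layer 5 straddles the shard boundary, so both files are needed to load it.
print(shards_for_prefix(weight_map, "thinker.model.layers.5."))
```

Because the keys are sorted as strings rather than by layer number, layers 20 to 27 land in the first shard while layers 6 to 9 land in the second, so a loader should look names up individually rather than assume contiguous layer ranges.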
Qwen3-ASR-1.7B/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "chunk_length": 30,
+   "dither": 0.0,
+   "feature_extractor_type": "WhisperFeatureExtractor",
+   "feature_size": 128,
+   "hop_length": 160,
+   "n_fft": 400,
+   "n_samples": 480000,
+   "nb_max_frames": 3000,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "processor_class": "Qwen3ASRProcessor",
+   "return_attention_mask": true
+ }
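The numeric fields in this config are related to each other. A small sketch of the arithmetic, assuming `chunk_length` is expressed in seconds (the usual `WhisperFeatureExtractor` convention) and noting that the sampling rate is derived here rather than read from the file:

```python
# Values copied from preprocessor_config.json above.
config = {
    "chunk_length": 30,
    "feature_size": 128,
    "hop_length": 160,
    "n_fft": 400,
    "n_samples": 480000,
    "nb_max_frames": 3000,
}

# Implied sampling rate: samples per chunk divided by chunk duration in seconds.
sampling_rate = config["n_samples"] // config["chunk_length"]

# One spectrogram frame is produced every hop_length samples.
frames_per_chunk = config["n_samples"] // config["hop_length"]

# Frame step and analysis window length in milliseconds.
hop_ms = 1000 * config["hop_length"] / sampling_rate
window_ms = 1000 * config["n_fft"] / sampling_rate

print(sampling_rate, frames_per_chunk, hop_ms, window_ms)
```

Under these assumptions each 30-second chunk yields a feature matrix of `feature_size` mel bins by `nb_max_frames` frames.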
Qwen3-ASR-1.7B/tokenizer_config.json ADDED
@@ -0,0 +1,549 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ },
213
+ "151669": {
214
+ "content": "<|audio_start|>",
215
+ "lstrip": false,
216
+ "normalized": false,
217
+ "rstrip": false,
218
+ "single_word": false,
219
+ "special": true
220
+ },
221
+ "151670": {
222
+ "content": "<|audio_end|>",
223
+ "lstrip": false,
224
+ "normalized": false,
225
+ "rstrip": false,
226
+ "single_word": false,
227
+ "special": true
228
+ },
229
+ "151671": {
230
+ "content": "<tts_pad>",
231
+ "lstrip": false,
232
+ "normalized": false,
233
+ "rstrip": false,
234
+ "single_word": false,
235
+ "special": true
236
+ },
237
+ "151672": {
238
+ "content": "<tts_text_bos>",
239
+ "lstrip": false,
240
+ "normalized": false,
241
+ "rstrip": false,
242
+ "single_word": false,
243
+ "special": true
244
+ },
245
+ "151673": {
246
+ "content": "<tts_text_eod>",
247
+ "lstrip": false,
248
+ "normalized": false,
249
+ "rstrip": false,
250
+ "single_word": false,
251
+ "special": true
252
+ },
253
+ "151674": {
254
+ "content": "<tts_text_bos_single>",
255
+ "lstrip": false,
256
+ "normalized": false,
257
+ "rstrip": false,
258
+ "single_word": false,
259
+ "special": true
260
+ },
261
+ "151675": {
262
+ "content": "<non_speech>",
263
+ "lstrip": false,
264
+ "normalized": false,
265
+ "rstrip": false,
266
+ "single_word": false,
267
+ "special": false
268
+ },
269
+ "151676": {
270
+ "content": "<|audio_pad|>",
271
+ "lstrip": false,
272
+ "normalized": false,
273
+ "rstrip": false,
274
+ "single_word": false,
275
+ "special": true
276
+ },
277
+ "151677": {
278
+ "content": "<blank1>",
279
+ "lstrip": false,
280
+ "normalized": false,
281
+ "rstrip": false,
282
+ "single_word": false,
283
+ "special": true
284
+ },
285
+ "151678": {
286
+ "content": "<blank2>",
287
+ "lstrip": false,
288
+ "normalized": false,
289
+ "rstrip": false,
290
+ "single_word": false,
291
+ "special": true
292
+ },
293
+ "151679": {
294
+ "content": "<blank3>",
295
+ "lstrip": false,
296
+ "normalized": false,
297
+ "rstrip": false,
298
+ "single_word": false,
299
+ "special": true
300
+ },
301
+ "151680": {
302
+ "content": "<blank4>",
303
+ "lstrip": false,
304
+ "normalized": false,
305
+ "rstrip": false,
306
+ "single_word": false,
307
+ "special": true
308
+ },
309
+ "151681": {
310
+ "content": "<blank5>",
311
+ "lstrip": false,
312
+ "normalized": false,
313
+ "rstrip": false,
314
+ "single_word": false,
315
+ "special": true
316
+ },
317
+ "151682": {
318
+ "content": "<blank6>",
319
+ "lstrip": false,
320
+ "normalized": false,
321
+ "rstrip": false,
322
+ "single_word": false,
323
+ "special": true
324
+ },
325
+ "151683": {
326
+ "content": "<blank7>",
327
+ "lstrip": false,
328
+ "normalized": false,
329
+ "rstrip": false,
330
+ "single_word": false,
331
+ "special": true
332
+ },
333
+ "151684": {
334
+ "content": "<blank8>",
335
+ "lstrip": false,
336
+ "normalized": false,
337
+ "rstrip": false,
338
+ "single_word": false,
339
+ "special": true
340
+ },
341
+ "151685": {
342
+ "content": "<blank9>",
343
+ "lstrip": false,
344
+ "normalized": false,
345
+ "rstrip": false,
346
+ "single_word": false,
347
+ "special": true
348
+ },
349
+ "151686": {
350
+ "content": "<blank10>",
351
+ "lstrip": false,
352
+ "normalized": false,
353
+ "rstrip": false,
354
+ "single_word": false,
355
+ "special": true
356
+ },
357
+ "151687": {
358
+ "content": "<blank11>",
359
+ "lstrip": false,
360
+ "normalized": false,
361
+ "rstrip": false,
362
+ "single_word": false,
363
+ "special": true
364
+ },
365
+ "151688": {
366
+ "content": "<blank12>",
367
+ "lstrip": false,
368
+ "normalized": false,
369
+ "rstrip": false,
370
+ "single_word": false,
371
+ "special": true
372
+ },
373
+ "151689": {
374
+ "content": "<blank13>",
375
+ "lstrip": false,
376
+ "normalized": false,
377
+ "rstrip": false,
378
+ "single_word": false,
379
+ "special": true
380
+ },
381
+ "151690": {
382
+ "content": "<blank14>",
383
+ "lstrip": false,
384
+ "normalized": false,
385
+ "rstrip": false,
386
+ "single_word": false,
387
+ "special": true
388
+ },
389
+ "151691": {
390
+ "content": "<blank15>",
391
+ "lstrip": false,
392
+ "normalized": false,
393
+ "rstrip": false,
394
+ "single_word": false,
395
+ "special": true
396
+ },
397
+ "151692": {
398
+ "content": "<blank16>",
399
+ "lstrip": false,
400
+ "normalized": false,
401
+ "rstrip": false,
402
+ "single_word": false,
403
+ "special": true
404
+ },
405
+ "151693": {
406
+ "content": "<blank17>",
407
+ "lstrip": false,
408
+ "normalized": false,
409
+ "rstrip": false,
410
+ "single_word": false,
411
+ "special": true
412
+ },
413
+ "151694": {
414
+ "content": "<blank18>",
415
+ "lstrip": false,
416
+ "normalized": false,
417
+ "rstrip": false,
418
+ "single_word": false,
419
+ "special": true
420
+ },
421
+ "151695": {
422
+ "content": "<blank19>",
423
+ "lstrip": false,
424
+ "normalized": false,
425
+ "rstrip": false,
426
+ "single_word": false,
427
+ "special": true
428
+ },
429
+ "151696": {
430
+ "content": "<blank20>",
431
+ "lstrip": false,
432
+ "normalized": false,
433
+ "rstrip": false,
434
+ "single_word": false,
435
+ "special": true
436
+ },
437
+ "151697": {
438
+ "content": "<blank21>",
439
+ "lstrip": false,
440
+ "normalized": false,
441
+ "rstrip": false,
442
+ "single_word": false,
443
+ "special": true
444
+ },
445
+ "151698": {
446
+ "content": "<blank22>",
447
+ "lstrip": false,
448
+ "normalized": false,
449
+ "rstrip": false,
450
+ "single_word": false,
451
+ "special": true
452
+ },
453
+ "151699": {
454
+ "content": "<blank23>",
455
+ "lstrip": false,
456
+ "normalized": false,
457
+ "rstrip": false,
458
+ "single_word": false,
459
+ "special": true
460
+ },
461
+ "151700": {
462
+ "content": "<blank24>",
463
+ "lstrip": false,
464
+ "normalized": false,
465
+ "rstrip": false,
466
+ "single_word": false,
467
+ "special": true
468
+ },
469
+ "151701": {
470
+ "content": "<blank25>",
471
+ "lstrip": false,
472
+ "normalized": false,
473
+ "rstrip": false,
474
+ "single_word": false,
475
+ "special": true
476
+ },
477
+ "151702": {
478
+ "content": "<blank26>",
479
+ "lstrip": false,
480
+ "normalized": false,
481
+ "rstrip": false,
482
+ "single_word": false,
483
+ "special": true
484
+ },
485
+ "151703": {
486
+ "content": "<blank27>",
487
+ "lstrip": false,
488
+ "normalized": false,
489
+ "rstrip": false,
490
+ "single_word": false,
491
+ "special": true
492
+ },
493
+ "151704": {
494
+ "content": "<asr_text>",
495
+ "lstrip": false,
496
+ "normalized": false,
497
+ "rstrip": false,
498
+ "single_word": false,
499
+ "special": false
500
+ }
501
+ },
502
+ "additional_special_tokens": [
503
+ "<|im_start|>",
504
+ "<|im_end|>",
505
+ "<|object_ref_start|>",
506
+ "<|object_ref_end|>",
507
+ "<|box_start|>",
508
+ "<|box_end|>",
509
+ "<|quad_start|>",
510
+ "<|quad_end|>",
511
+ "<|vision_start|>",
512
+ "<|vision_end|>",
513
+ "<|vision_pad|>",
514
+ "<|image_pad|>",
515
+ "<|video_pad|>",
516
+ "<|audio_start|>",
517
+ "<|audio_end|>",
518
+ "<tts_pad>",
519
+ "<tts_text_bos>",
520
+ "<tts_text_bos_single>",
521
+ "<|audio_pad|>"
522
+ ],
523
+ "audio_bos_token": "<|audio_start|>",
524
+ "audio_eos_token": "<|audio_end|>",
525
+ "audio_token": "<|audio_pad|>",
526
+ "bos_token": null,
527
+ "clean_up_tokenization_spaces": false,
528
+ "eos_token": "<|im_end|>",
529
+ "errors": "replace",
530
+ "extra_special_tokens": {
531
+ "audio_bos_token": "<|audio_start|>",
532
+ "audio_eos_token": "<|audio_end|>",
533
+ "audio_token": "<|audio_pad|>",
534
+ "image_token": "<|image_pad|>",
535
+ "video_token": "<|video_pad|>",
536
+ "vision_bos_token": "<|vision_start|>",
537
+ "vision_eos_token": "<|vision_end|>"
538
+ },
539
+ "image_token": "<|image_pad|>",
540
+ "model_max_length": 131072,
541
+ "pad_token": "<|endoftext|>",
542
+ "processor_class": "Qwen3ASRProcessor",
543
+ "split_special_tokens": false,
544
+ "tokenizer_class": "Qwen2Tokenizer",
545
+ "unk_token": null,
546
+ "video_token": "<|video_pad|>",
547
+ "vision_bos_token": "<|vision_start|>",
548
+ "vision_eos_token": "<|vision_end|>"
549
+ }
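The `added_tokens_decoder` table above marks each added token as special or not. A minimal sketch of separating the two groups, assuming the JSON has been parsed into a Python dict (the four sample entries are copied from the table, trimmed to the fields used here; `split_by_special` is a hypothetical helper):

```python
# Sample entries copied from added_tokens_decoder above; the full table has many more.
added_tokens_decoder = {
    "151643": {"content": "<|endoftext|>", "special": True},
    "151657": {"content": "<tool_call>", "special": False},
    "151676": {"content": "<|audio_pad|>", "special": True},
    "151704": {"content": "<asr_text>", "special": False},
}


def split_by_special(decoder):
    """Return (special, regular) dicts mapping integer token id to token string."""
    special, regular = {}, {}
    for token_id, entry in decoder.items():
        target = special if entry["special"] else regular
        target[int(token_id)] = entry["content"]
    return special, regular


special, regular = split_by_special(added_tokens_decoder)
print(special)
print(regular)
```

Tokens flagged as special are typically the ones dropped when text is decoded with special tokens skipped, whereas non-special entries such as `<asr_text>` would normally remain in the decoded string, so downstream code may need to strip them explicitly.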
Qwen3-ASR-1.7B/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
audio_quality_router/best_acc_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4539baeb97ef97b19648e79990b5966345805e2bef8760b28a8620544300a873
+ size 12359420
mega-asr-merged/adapter_config.json ADDED
@@ -0,0 +1,1099 @@
+ {
+ "alpha_pattern": {
+ "thinker.audio_tower.conv_out": 24,
+ "thinker.audio_tower.layers.0.fc1": 8,
+ "thinker.audio_tower.layers.0.fc2": 8,
+ "thinker.audio_tower.layers.0.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.0.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.0.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.0.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.1.fc1": 8,
+ "thinker.audio_tower.layers.1.fc2": 8,
+ "thinker.audio_tower.layers.1.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.1.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.1.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.1.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.10.fc1": 8,
+ "thinker.audio_tower.layers.10.fc2": 8,
+ "thinker.audio_tower.layers.10.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.10.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.10.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.10.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.11.fc1": 8,
+ "thinker.audio_tower.layers.11.fc2": 8,
+ "thinker.audio_tower.layers.11.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.11.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.11.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.11.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.12.fc1": 8,
+ "thinker.audio_tower.layers.12.fc2": 8,
+ "thinker.audio_tower.layers.12.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.12.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.12.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.12.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.13.fc1": 8,
+ "thinker.audio_tower.layers.13.fc2": 8,
+ "thinker.audio_tower.layers.13.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.13.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.13.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.13.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.14.fc1": 8,
+ "thinker.audio_tower.layers.14.fc2": 8,
+ "thinker.audio_tower.layers.14.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.14.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.14.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.14.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.15.fc1": 8,
+ "thinker.audio_tower.layers.15.fc2": 8,
+ "thinker.audio_tower.layers.15.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.15.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.15.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.15.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.16.fc1": 8,
+ "thinker.audio_tower.layers.16.fc2": 8,
+ "thinker.audio_tower.layers.16.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.16.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.16.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.16.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.17.fc1": 8,
+ "thinker.audio_tower.layers.17.fc2": 8,
+ "thinker.audio_tower.layers.17.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.17.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.17.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.17.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.18.fc1": 8,
+ "thinker.audio_tower.layers.18.fc2": 8,
+ "thinker.audio_tower.layers.18.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.18.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.18.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.18.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.19.fc1": 8,
+ "thinker.audio_tower.layers.19.fc2": 8,
+ "thinker.audio_tower.layers.19.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.19.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.19.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.19.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.2.fc1": 8,
+ "thinker.audio_tower.layers.2.fc2": 8,
+ "thinker.audio_tower.layers.2.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.2.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.2.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.2.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.20.fc1": 8,
+ "thinker.audio_tower.layers.20.fc2": 8,
+ "thinker.audio_tower.layers.20.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.20.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.20.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.20.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.21.fc1": 8,
+ "thinker.audio_tower.layers.21.fc2": 8,
+ "thinker.audio_tower.layers.21.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.21.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.21.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.21.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.22.fc1": 8,
+ "thinker.audio_tower.layers.22.fc2": 8,
+ "thinker.audio_tower.layers.22.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.22.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.22.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.22.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.23.fc1": 8,
+ "thinker.audio_tower.layers.23.fc2": 8,
+ "thinker.audio_tower.layers.23.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.23.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.23.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.23.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.3.fc1": 8,
+ "thinker.audio_tower.layers.3.fc2": 8,
+ "thinker.audio_tower.layers.3.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.3.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.3.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.3.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.4.fc1": 8,
+ "thinker.audio_tower.layers.4.fc2": 8,
+ "thinker.audio_tower.layers.4.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.4.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.4.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.4.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.5.fc1": 8,
+ "thinker.audio_tower.layers.5.fc2": 8,
+ "thinker.audio_tower.layers.5.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.5.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.5.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.5.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.6.fc1": 8,
+ "thinker.audio_tower.layers.6.fc2": 8,
+ "thinker.audio_tower.layers.6.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.6.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.6.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.6.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.7.fc1": 8,
+ "thinker.audio_tower.layers.7.fc2": 8,
+ "thinker.audio_tower.layers.7.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.7.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.7.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.7.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.8.fc1": 8,
+ "thinker.audio_tower.layers.8.fc2": 8,
+ "thinker.audio_tower.layers.8.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.8.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.8.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.8.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.9.fc1": 8,
+ "thinker.audio_tower.layers.9.fc2": 8,
+ "thinker.audio_tower.layers.9.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.9.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.9.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.9.self_attn.v_proj": 16,
+ "thinker.audio_tower.proj1": 16,
+ "thinker.audio_tower.proj2": 16,
+ "thinker.layers.0.mlp.down_proj": 8,
+ "thinker.layers.0.mlp.gate_proj": 8,
+ "thinker.layers.0.mlp.up_proj": 8,
+ "thinker.layers.0.self_attn.k_proj": 8,
+ "thinker.layers.0.self_attn.o_proj": 8,
+ "thinker.layers.0.self_attn.q_proj": 8,
+ "thinker.layers.0.self_attn.v_proj": 8,
+ "thinker.layers.1.mlp.down_proj": 8,
+ "thinker.layers.1.mlp.gate_proj": 8,
+ "thinker.layers.1.mlp.up_proj": 8,
+ "thinker.layers.1.self_attn.k_proj": 8,
+ "thinker.layers.1.self_attn.o_proj": 8,
+ "thinker.layers.1.self_attn.q_proj": 8,
+ "thinker.layers.1.self_attn.v_proj": 8,
+ "thinker.layers.10.mlp.down_proj": 8,
+ "thinker.layers.10.mlp.gate_proj": 8,
+ "thinker.layers.10.mlp.up_proj": 8,
+ "thinker.layers.10.self_attn.k_proj": 8,
+ "thinker.layers.10.self_attn.o_proj": 8,
+ "thinker.layers.10.self_attn.q_proj": 8,
+ "thinker.layers.10.self_attn.v_proj": 8,
+ "thinker.layers.11.mlp.down_proj": 8,
+ "thinker.layers.11.mlp.gate_proj": 8,
+ "thinker.layers.11.mlp.up_proj": 8,
+ "thinker.layers.11.self_attn.k_proj": 8,
+ "thinker.layers.11.self_attn.o_proj": 8,
+ "thinker.layers.11.self_attn.q_proj": 8,
+ "thinker.layers.11.self_attn.v_proj": 8,
+ "thinker.layers.12.mlp.down_proj": 8,
+ "thinker.layers.12.mlp.gate_proj": 8,
+ "thinker.layers.12.mlp.up_proj": 8,
+ "thinker.layers.12.self_attn.k_proj": 8,
+ "thinker.layers.12.self_attn.o_proj": 8,
+ "thinker.layers.12.self_attn.q_proj": 8,
+ "thinker.layers.12.self_attn.v_proj": 8,
+ "thinker.layers.13.mlp.down_proj": 8,
+ "thinker.layers.13.mlp.gate_proj": 8,
+ "thinker.layers.13.mlp.up_proj": 8,
+ "thinker.layers.13.self_attn.k_proj": 8,
+ "thinker.layers.13.self_attn.o_proj": 8,
+ "thinker.layers.13.self_attn.q_proj": 8,
+ "thinker.layers.13.self_attn.v_proj": 8,
+ "thinker.layers.14.mlp.down_proj": 8,
+ "thinker.layers.14.mlp.gate_proj": 8,
+ "thinker.layers.14.mlp.up_proj": 8,
+ "thinker.layers.14.self_attn.k_proj": 8,
+ "thinker.layers.14.self_attn.o_proj": 8,
+ "thinker.layers.14.self_attn.q_proj": 8,
+ "thinker.layers.14.self_attn.v_proj": 8,
+ "thinker.layers.15.mlp.down_proj": 8,
+ "thinker.layers.15.mlp.gate_proj": 8,
+ "thinker.layers.15.mlp.up_proj": 8,
+ "thinker.layers.15.self_attn.k_proj": 8,
+ "thinker.layers.15.self_attn.o_proj": 8,
+ "thinker.layers.15.self_attn.q_proj": 8,
+ "thinker.layers.15.self_attn.v_proj": 8,
+ "thinker.layers.16.mlp.down_proj": 8,
+ "thinker.layers.16.mlp.gate_proj": 8,
+ "thinker.layers.16.mlp.up_proj": 8,
+ "thinker.layers.16.self_attn.k_proj": 8,
+ "thinker.layers.16.self_attn.o_proj": 8,
+ "thinker.layers.16.self_attn.q_proj": 8,
+ "thinker.layers.16.self_attn.v_proj": 8,
+ "thinker.layers.17.mlp.down_proj": 8,
+ "thinker.layers.17.mlp.gate_proj": 8,
+ "thinker.layers.17.mlp.up_proj": 8,
+ "thinker.layers.17.self_attn.k_proj": 8,
+ "thinker.layers.17.self_attn.o_proj": 8,
+ "thinker.layers.17.self_attn.q_proj": 8,
+ "thinker.layers.17.self_attn.v_proj": 8,
+ "thinker.layers.18.mlp.down_proj": 8,
+ "thinker.layers.18.mlp.gate_proj": 8,
+ "thinker.layers.18.mlp.up_proj": 8,
+ "thinker.layers.18.self_attn.k_proj": 8,
+ "thinker.layers.18.self_attn.o_proj": 8,
+ "thinker.layers.18.self_attn.q_proj": 8,
+ "thinker.layers.18.self_attn.v_proj": 8,
+ "thinker.layers.19.mlp.down_proj": 8,
+ "thinker.layers.19.mlp.gate_proj": 8,
+ "thinker.layers.19.mlp.up_proj": 8,
+ "thinker.layers.19.self_attn.k_proj": 8,
+ "thinker.layers.19.self_attn.o_proj": 8,
+ "thinker.layers.19.self_attn.q_proj": 8,
+ "thinker.layers.19.self_attn.v_proj": 8,
+ "thinker.layers.2.mlp.down_proj": 8,
+ "thinker.layers.2.mlp.gate_proj": 8,
+ "thinker.layers.2.mlp.up_proj": 8,
+ "thinker.layers.2.self_attn.k_proj": 8,
+ "thinker.layers.2.self_attn.o_proj": 8,
+ "thinker.layers.2.self_attn.q_proj": 8,
+ "thinker.layers.2.self_attn.v_proj": 8,
+ "thinker.layers.20.mlp.down_proj": 8,
+ "thinker.layers.20.mlp.gate_proj": 8,
+ "thinker.layers.20.mlp.up_proj": 8,
+ "thinker.layers.20.self_attn.k_proj": 8,
+ "thinker.layers.20.self_attn.o_proj": 8,
+ "thinker.layers.20.self_attn.q_proj": 8,
+ "thinker.layers.20.self_attn.v_proj": 8,
+ "thinker.layers.21.mlp.down_proj": 8,
+ "thinker.layers.21.mlp.gate_proj": 8,
+ "thinker.layers.21.mlp.up_proj": 8,
+ "thinker.layers.21.self_attn.k_proj": 8,
+ "thinker.layers.21.self_attn.o_proj": 8,
+ "thinker.layers.21.self_attn.q_proj": 8,
+ "thinker.layers.21.self_attn.v_proj": 8,
+ "thinker.layers.22.mlp.down_proj": 8,
+ "thinker.layers.22.mlp.gate_proj": 8,
+ "thinker.layers.22.mlp.up_proj": 8,
+ "thinker.layers.22.self_attn.k_proj": 8,
+ "thinker.layers.22.self_attn.o_proj": 8,
+ "thinker.layers.22.self_attn.q_proj": 8,
+ "thinker.layers.22.self_attn.v_proj": 8,
+ "thinker.layers.23.mlp.down_proj": 8,
+ "thinker.layers.23.mlp.gate_proj": 8,
+ "thinker.layers.23.mlp.up_proj": 8,
+ "thinker.layers.23.self_attn.k_proj": 8,
+ "thinker.layers.23.self_attn.o_proj": 8,
+ "thinker.layers.23.self_attn.q_proj": 8,
+ "thinker.layers.23.self_attn.v_proj": 8,
+ "thinker.layers.24.mlp.down_proj": 8,
+ "thinker.layers.24.mlp.gate_proj": 8,
+ "thinker.layers.24.mlp.up_proj": 8,
+ "thinker.layers.24.self_attn.k_proj": 8,
+ "thinker.layers.24.self_attn.o_proj": 8,
+ "thinker.layers.24.self_attn.q_proj": 8,
+ "thinker.layers.24.self_attn.v_proj": 8,
+ "thinker.layers.25.mlp.down_proj": 8,
+ "thinker.layers.25.mlp.gate_proj": 8,
+ "thinker.layers.25.mlp.up_proj": 8,
+ "thinker.layers.25.self_attn.k_proj": 8,
+ "thinker.layers.25.self_attn.o_proj": 8,
+ "thinker.layers.25.self_attn.q_proj": 8,
+ "thinker.layers.25.self_attn.v_proj": 8,
+ "thinker.layers.26.mlp.down_proj": 8,
+ "thinker.layers.26.mlp.gate_proj": 8,
+ "thinker.layers.26.mlp.up_proj": 8,
+ "thinker.layers.26.self_attn.k_proj": 8,
+ "thinker.layers.26.self_attn.o_proj": 8,
+ "thinker.layers.26.self_attn.q_proj": 8,
+ "thinker.layers.26.self_attn.v_proj": 8,
+ "thinker.layers.27.mlp.down_proj": 8,
+ "thinker.layers.27.mlp.gate_proj": 8,
+ "thinker.layers.27.mlp.up_proj": 8,
+ "thinker.layers.27.self_attn.k_proj": 8,
+ "thinker.layers.27.self_attn.o_proj": 8,
+ "thinker.layers.27.self_attn.q_proj": 8,
+ "thinker.layers.27.self_attn.v_proj": 8,
+ "thinker.layers.3.mlp.down_proj": 8,
+ "thinker.layers.3.mlp.gate_proj": 8,
+ "thinker.layers.3.mlp.up_proj": 8,
+ "thinker.layers.3.self_attn.k_proj": 8,
+ "thinker.layers.3.self_attn.o_proj": 8,
+ "thinker.layers.3.self_attn.q_proj": 8,
+ "thinker.layers.3.self_attn.v_proj": 8,
+ "thinker.layers.4.mlp.down_proj": 8,
+ "thinker.layers.4.mlp.gate_proj": 8,
+ "thinker.layers.4.mlp.up_proj": 8,
+ "thinker.layers.4.self_attn.k_proj": 8,
+ "thinker.layers.4.self_attn.o_proj": 8,
+ "thinker.layers.4.self_attn.q_proj": 8,
+ "thinker.layers.4.self_attn.v_proj": 8,
+ "thinker.layers.5.mlp.down_proj": 8,
+ "thinker.layers.5.mlp.gate_proj": 8,
+ "thinker.layers.5.mlp.up_proj": 8,
+ "thinker.layers.5.self_attn.k_proj": 8,
+ "thinker.layers.5.self_attn.o_proj": 8,
+ "thinker.layers.5.self_attn.q_proj": 8,
+ "thinker.layers.5.self_attn.v_proj": 8,
+ "thinker.layers.6.mlp.down_proj": 8,
+ "thinker.layers.6.mlp.gate_proj": 8,
+ "thinker.layers.6.mlp.up_proj": 8,
+ "thinker.layers.6.self_attn.k_proj": 8,
+ "thinker.layers.6.self_attn.o_proj": 8,
+ "thinker.layers.6.self_attn.q_proj": 8,
+ "thinker.layers.6.self_attn.v_proj": 8,
+ "thinker.layers.7.mlp.down_proj": 8,
+ "thinker.layers.7.mlp.gate_proj": 8,
+ "thinker.layers.7.mlp.up_proj": 8,
+ "thinker.layers.7.self_attn.k_proj": 8,
+ "thinker.layers.7.self_attn.o_proj": 8,
+ "thinker.layers.7.self_attn.q_proj": 8,
+ "thinker.layers.7.self_attn.v_proj": 8,
+ "thinker.layers.8.mlp.down_proj": 8,
+ "thinker.layers.8.mlp.gate_proj": 8,
+ "thinker.layers.8.mlp.up_proj": 8,
+ "thinker.layers.8.self_attn.k_proj": 8,
+ "thinker.layers.8.self_attn.o_proj": 8,
+ "thinker.layers.8.self_attn.q_proj": 8,
+ "thinker.layers.8.self_attn.v_proj": 8,
+ "thinker.layers.9.mlp.down_proj": 8,
+ "thinker.layers.9.mlp.gate_proj": 8,
+ "thinker.layers.9.mlp.up_proj": 8,
+ "thinker.layers.9.self_attn.k_proj": 8,
+ "thinker.layers.9.self_attn.o_proj": 8,
+ "thinker.layers.9.self_attn.q_proj": 8,
+ "thinker.layers.9.self_attn.v_proj": 8,
+ "thinker.model.layers.0.mlp.down_proj": 8,
+ "thinker.model.layers.0.mlp.gate_proj": 8,
+ "thinker.model.layers.0.mlp.up_proj": 8,
+ "thinker.model.layers.0.self_attn.k_proj": 8,
+ "thinker.model.layers.0.self_attn.o_proj": 8,
+ "thinker.model.layers.0.self_attn.q_proj": 8,
+ "thinker.model.layers.0.self_attn.v_proj": 8,
+ "thinker.model.layers.1.mlp.down_proj": 8,
+ "thinker.model.layers.1.mlp.gate_proj": 8,
+ "thinker.model.layers.1.mlp.up_proj": 8,
+ "thinker.model.layers.1.self_attn.k_proj": 8,
+ "thinker.model.layers.1.self_attn.o_proj": 8,
+ "thinker.model.layers.1.self_attn.q_proj": 8,
+ "thinker.model.layers.1.self_attn.v_proj": 8,
+ "thinker.model.layers.10.mlp.down_proj": 8,
+ "thinker.model.layers.10.mlp.gate_proj": 8,
+ "thinker.model.layers.10.mlp.up_proj": 8,
+ "thinker.model.layers.10.self_attn.k_proj": 8,
+ "thinker.model.layers.10.self_attn.o_proj": 8,
+ "thinker.model.layers.10.self_attn.q_proj": 8,
+ "thinker.model.layers.10.self_attn.v_proj": 8,
+ "thinker.model.layers.11.mlp.down_proj": 8,
+ "thinker.model.layers.11.mlp.gate_proj": 8,
+ "thinker.model.layers.11.mlp.up_proj": 8,
+ "thinker.model.layers.11.self_attn.k_proj": 8,
+ "thinker.model.layers.11.self_attn.o_proj": 8,
+ "thinker.model.layers.11.self_attn.q_proj": 8,
+ "thinker.model.layers.11.self_attn.v_proj": 8,
+ "thinker.model.layers.12.mlp.down_proj": 8,
+ "thinker.model.layers.12.mlp.gate_proj": 8,
+ "thinker.model.layers.12.mlp.up_proj": 8,
+ "thinker.model.layers.12.self_attn.k_proj": 8,
+ "thinker.model.layers.12.self_attn.o_proj": 8,
+ "thinker.model.layers.12.self_attn.q_proj": 8,
+ "thinker.model.layers.12.self_attn.v_proj": 8,
+ "thinker.model.layers.13.mlp.down_proj": 8,
+ "thinker.model.layers.13.mlp.gate_proj": 8,
+ "thinker.model.layers.13.mlp.up_proj": 8,
+ "thinker.model.layers.13.self_attn.k_proj": 8,
+ "thinker.model.layers.13.self_attn.o_proj": 8,
+ "thinker.model.layers.13.self_attn.q_proj": 8,
+ "thinker.model.layers.13.self_attn.v_proj": 8,
+ "thinker.model.layers.14.mlp.down_proj": 8,
+ "thinker.model.layers.14.mlp.gate_proj": 8,
+ "thinker.model.layers.14.mlp.up_proj": 8,
+ "thinker.model.layers.14.self_attn.k_proj": 8,
+ "thinker.model.layers.14.self_attn.o_proj": 8,
+ "thinker.model.layers.14.self_attn.q_proj": 8,
+ "thinker.model.layers.14.self_attn.v_proj": 8,
+ "thinker.model.layers.15.mlp.down_proj": 8,
+ "thinker.model.layers.15.mlp.gate_proj": 8,
+ "thinker.model.layers.15.mlp.up_proj": 8,
+ "thinker.model.layers.15.self_attn.k_proj": 8,
+ "thinker.model.layers.15.self_attn.o_proj": 8,
+ "thinker.model.layers.15.self_attn.q_proj": 8,
+ "thinker.model.layers.15.self_attn.v_proj": 8,
+ "thinker.model.layers.16.mlp.down_proj": 8,
+ "thinker.model.layers.16.mlp.gate_proj": 8,
+ "thinker.model.layers.16.mlp.up_proj": 8,
+ "thinker.model.layers.16.self_attn.k_proj": 8,
+ "thinker.model.layers.16.self_attn.o_proj": 8,
+ "thinker.model.layers.16.self_attn.q_proj": 8,
+ "thinker.model.layers.16.self_attn.v_proj": 8,
+ "thinker.model.layers.17.mlp.down_proj": 8,
+ "thinker.model.layers.17.mlp.gate_proj": 8,
+ "thinker.model.layers.17.mlp.up_proj": 8,
+ "thinker.model.layers.17.self_attn.k_proj": 8,
+ "thinker.model.layers.17.self_attn.o_proj": 8,
+ "thinker.model.layers.17.self_attn.q_proj": 8,
+ "thinker.model.layers.17.self_attn.v_proj": 8,
+ "thinker.model.layers.18.mlp.down_proj": 8,
+ "thinker.model.layers.18.mlp.gate_proj": 8,
+ "thinker.model.layers.18.mlp.up_proj": 8,
+ "thinker.model.layers.18.self_attn.k_proj": 8,
+ "thinker.model.layers.18.self_attn.o_proj": 8,
+ "thinker.model.layers.18.self_attn.q_proj": 8,
+ "thinker.model.layers.18.self_attn.v_proj": 8,
+ "thinker.model.layers.19.mlp.down_proj": 8,
+ "thinker.model.layers.19.mlp.gate_proj": 8,
+ "thinker.model.layers.19.mlp.up_proj": 8,
+ "thinker.model.layers.19.self_attn.k_proj": 8,
+ "thinker.model.layers.19.self_attn.o_proj": 8,
+ "thinker.model.layers.19.self_attn.q_proj": 8,
+ "thinker.model.layers.19.self_attn.v_proj": 8,
+ "thinker.model.layers.2.mlp.down_proj": 8,
+ "thinker.model.layers.2.mlp.gate_proj": 8,
+ "thinker.model.layers.2.mlp.up_proj": 8,
+ "thinker.model.layers.2.self_attn.k_proj": 8,
+ "thinker.model.layers.2.self_attn.o_proj": 8,
+ "thinker.model.layers.2.self_attn.q_proj": 8,
+ "thinker.model.layers.2.self_attn.v_proj": 8,
+ "thinker.model.layers.20.mlp.down_proj": 8,
+ "thinker.model.layers.20.mlp.gate_proj": 8,
+ "thinker.model.layers.20.mlp.up_proj": 8,
+ "thinker.model.layers.20.self_attn.k_proj": 8,
+ "thinker.model.layers.20.self_attn.o_proj": 8,
+ "thinker.model.layers.20.self_attn.q_proj": 8,
+ "thinker.model.layers.20.self_attn.v_proj": 8,
+ "thinker.model.layers.21.mlp.down_proj": 8,
+ "thinker.model.layers.21.mlp.gate_proj": 8,
+ "thinker.model.layers.21.mlp.up_proj": 8,
+ "thinker.model.layers.21.self_attn.k_proj": 8,
+ "thinker.model.layers.21.self_attn.o_proj": 8,
+ "thinker.model.layers.21.self_attn.q_proj": 8,
+ "thinker.model.layers.21.self_attn.v_proj": 8,
+ "thinker.model.layers.22.mlp.down_proj": 8,
+ "thinker.model.layers.22.mlp.gate_proj": 8,
+ "thinker.model.layers.22.mlp.up_proj": 8,
+ "thinker.model.layers.22.self_attn.k_proj": 8,
+ "thinker.model.layers.22.self_attn.o_proj": 8,
+ "thinker.model.layers.22.self_attn.q_proj": 8,
+ "thinker.model.layers.22.self_attn.v_proj": 8,
+ "thinker.model.layers.23.mlp.down_proj": 8,
+ "thinker.model.layers.23.mlp.gate_proj": 8,
+ "thinker.model.layers.23.mlp.up_proj": 8,
+ "thinker.model.layers.23.self_attn.k_proj": 8,
+ "thinker.model.layers.23.self_attn.o_proj": 8,
+ "thinker.model.layers.23.self_attn.q_proj": 8,
+ "thinker.model.layers.23.self_attn.v_proj": 8,
+ "thinker.model.layers.24.mlp.down_proj": 8,
+ "thinker.model.layers.24.mlp.gate_proj": 8,
+ "thinker.model.layers.24.mlp.up_proj": 8,
+ "thinker.model.layers.24.self_attn.k_proj": 8,
+ "thinker.model.layers.24.self_attn.o_proj": 8,
+ "thinker.model.layers.24.self_attn.q_proj": 8,
+ "thinker.model.layers.24.self_attn.v_proj": 8,
+ "thinker.model.layers.25.mlp.down_proj": 8,
+ "thinker.model.layers.25.mlp.gate_proj": 8,
+ "thinker.model.layers.25.mlp.up_proj": 8,
+ "thinker.model.layers.25.self_attn.k_proj": 8,
+ "thinker.model.layers.25.self_attn.o_proj": 8,
+ "thinker.model.layers.25.self_attn.q_proj": 8,
+ "thinker.model.layers.25.self_attn.v_proj": 8,
+ "thinker.model.layers.26.mlp.down_proj": 8,
+ "thinker.model.layers.26.mlp.gate_proj": 8,
+ "thinker.model.layers.26.mlp.up_proj": 8,
+ "thinker.model.layers.26.self_attn.k_proj": 8,
+ "thinker.model.layers.26.self_attn.o_proj": 8,
+ "thinker.model.layers.26.self_attn.q_proj": 8,
+ "thinker.model.layers.26.self_attn.v_proj": 8,
+ "thinker.model.layers.27.mlp.down_proj": 8,
+ "thinker.model.layers.27.mlp.gate_proj": 8,
+ "thinker.model.layers.27.mlp.up_proj": 8,
+ "thinker.model.layers.27.self_attn.k_proj": 8,
+ "thinker.model.layers.27.self_attn.o_proj": 8,
+ "thinker.model.layers.27.self_attn.q_proj": 8,
+ "thinker.model.layers.27.self_attn.v_proj": 8,
+ "thinker.model.layers.3.mlp.down_proj": 8,
+ "thinker.model.layers.3.mlp.gate_proj": 8,
+ "thinker.model.layers.3.mlp.up_proj": 8,
+ "thinker.model.layers.3.self_attn.k_proj": 8,
+ "thinker.model.layers.3.self_attn.o_proj": 8,
+ "thinker.model.layers.3.self_attn.q_proj": 8,
+ "thinker.model.layers.3.self_attn.v_proj": 8,
+ "thinker.model.layers.4.mlp.down_proj": 8,
+ "thinker.model.layers.4.mlp.gate_proj": 8,
+ "thinker.model.layers.4.mlp.up_proj": 8,
+ "thinker.model.layers.4.self_attn.k_proj": 8,
+ "thinker.model.layers.4.self_attn.o_proj": 8,
+ "thinker.model.layers.4.self_attn.q_proj": 8,
+ "thinker.model.layers.4.self_attn.v_proj": 8,
+ "thinker.model.layers.5.mlp.down_proj": 8,
+ "thinker.model.layers.5.mlp.gate_proj": 8,
+ "thinker.model.layers.5.mlp.up_proj": 8,
+ "thinker.model.layers.5.self_attn.k_proj": 8,
+ "thinker.model.layers.5.self_attn.o_proj": 8,
+ "thinker.model.layers.5.self_attn.q_proj": 8,
+ "thinker.model.layers.5.self_attn.v_proj": 8,
+ "thinker.model.layers.6.mlp.down_proj": 8,
+ "thinker.model.layers.6.mlp.gate_proj": 8,
+ "thinker.model.layers.6.mlp.up_proj": 8,
+ "thinker.model.layers.6.self_attn.k_proj": 8,
+ "thinker.model.layers.6.self_attn.o_proj": 8,
+ "thinker.model.layers.6.self_attn.q_proj": 8,
+ "thinker.model.layers.6.self_attn.v_proj": 8,
+ "thinker.model.layers.7.mlp.down_proj": 8,
+ "thinker.model.layers.7.mlp.gate_proj": 8,
+ "thinker.model.layers.7.mlp.up_proj": 8,
+ "thinker.model.layers.7.self_attn.k_proj": 8,
+ "thinker.model.layers.7.self_attn.o_proj": 8,
+ "thinker.model.layers.7.self_attn.q_proj": 8,
+ "thinker.model.layers.7.self_attn.v_proj": 8,
+ "thinker.model.layers.8.mlp.down_proj": 8,
+ "thinker.model.layers.8.mlp.gate_proj": 8,
+ "thinker.model.layers.8.mlp.up_proj": 8,
+ "thinker.model.layers.8.self_attn.k_proj": 8,
+ "thinker.model.layers.8.self_attn.o_proj": 8,
+ "thinker.model.layers.8.self_attn.q_proj": 8,
+ "thinker.model.layers.8.self_attn.v_proj": 8,
+ "thinker.model.layers.9.mlp.down_proj": 8,
+ "thinker.model.layers.9.mlp.gate_proj": 8,
+ "thinker.model.layers.9.mlp.up_proj": 8,
+ "thinker.model.layers.9.self_attn.k_proj": 8,
+ "thinker.model.layers.9.self_attn.o_proj": 8,
+ "thinker.model.layers.9.self_attn.q_proj": 8,
+ "thinker.model.layers.9.self_attn.v_proj": 8
+ },
+ "base_model_name_or_path": "",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "lora_alpha": 24,
+ "lora_bias": false,
+ "lora_dropout": 0.0,
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 24,
+ "rank_pattern": {
+ "thinker.audio_tower.conv_out": 24,
+ "thinker.audio_tower.layers.0.fc1": 8,
+ "thinker.audio_tower.layers.0.fc2": 8,
+ "thinker.audio_tower.layers.0.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.0.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.0.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.0.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.1.fc1": 8,
+ "thinker.audio_tower.layers.1.fc2": 8,
+ "thinker.audio_tower.layers.1.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.1.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.1.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.1.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.10.fc1": 8,
+ "thinker.audio_tower.layers.10.fc2": 8,
+ "thinker.audio_tower.layers.10.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.10.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.10.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.10.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.11.fc1": 8,
+ "thinker.audio_tower.layers.11.fc2": 8,
+ "thinker.audio_tower.layers.11.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.11.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.11.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.11.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.12.fc1": 8,
+ "thinker.audio_tower.layers.12.fc2": 8,
+ "thinker.audio_tower.layers.12.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.12.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.12.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.12.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.13.fc1": 8,
+ "thinker.audio_tower.layers.13.fc2": 8,
+ "thinker.audio_tower.layers.13.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.13.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.13.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.13.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.14.fc1": 8,
+ "thinker.audio_tower.layers.14.fc2": 8,
+ "thinker.audio_tower.layers.14.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.14.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.14.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.14.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.15.fc1": 8,
+ "thinker.audio_tower.layers.15.fc2": 8,
+ "thinker.audio_tower.layers.15.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.15.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.15.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.15.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.16.fc1": 8,
+ "thinker.audio_tower.layers.16.fc2": 8,
+ "thinker.audio_tower.layers.16.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.16.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.16.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.16.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.17.fc1": 8,
+ "thinker.audio_tower.layers.17.fc2": 8,
+ "thinker.audio_tower.layers.17.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.17.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.17.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.17.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.18.fc1": 8,
+ "thinker.audio_tower.layers.18.fc2": 8,
+ "thinker.audio_tower.layers.18.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.18.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.18.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.18.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.19.fc1": 8,
+ "thinker.audio_tower.layers.19.fc2": 8,
+ "thinker.audio_tower.layers.19.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.19.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.19.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.19.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.2.fc1": 8,
+ "thinker.audio_tower.layers.2.fc2": 8,
+ "thinker.audio_tower.layers.2.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.2.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.2.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.2.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.20.fc1": 8,
+ "thinker.audio_tower.layers.20.fc2": 8,
+ "thinker.audio_tower.layers.20.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.20.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.20.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.20.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.21.fc1": 8,
+ "thinker.audio_tower.layers.21.fc2": 8,
+ "thinker.audio_tower.layers.21.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.21.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.21.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.21.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.22.fc1": 8,
+ "thinker.audio_tower.layers.22.fc2": 8,
+ "thinker.audio_tower.layers.22.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.22.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.22.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.22.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.23.fc1": 8,
+ "thinker.audio_tower.layers.23.fc2": 8,
+ "thinker.audio_tower.layers.23.self_attn.k_proj": 24,
+ "thinker.audio_tower.layers.23.self_attn.out_proj": 24,
+ "thinker.audio_tower.layers.23.self_attn.q_proj": 24,
+ "thinker.audio_tower.layers.23.self_attn.v_proj": 24,
+ "thinker.audio_tower.layers.3.fc1": 8,
+ "thinker.audio_tower.layers.3.fc2": 8,
+ "thinker.audio_tower.layers.3.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.3.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.3.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.3.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.4.fc1": 8,
+ "thinker.audio_tower.layers.4.fc2": 8,
+ "thinker.audio_tower.layers.4.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.4.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.4.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.4.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.5.fc1": 8,
+ "thinker.audio_tower.layers.5.fc2": 8,
+ "thinker.audio_tower.layers.5.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.5.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.5.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.5.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.6.fc1": 8,
+ "thinker.audio_tower.layers.6.fc2": 8,
+ "thinker.audio_tower.layers.6.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.6.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.6.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.6.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.7.fc1": 8,
+ "thinker.audio_tower.layers.7.fc2": 8,
+ "thinker.audio_tower.layers.7.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.7.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.7.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.7.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.8.fc1": 8,
+ "thinker.audio_tower.layers.8.fc2": 8,
+ "thinker.audio_tower.layers.8.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.8.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.8.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.8.self_attn.v_proj": 16,
+ "thinker.audio_tower.layers.9.fc1": 8,
+ "thinker.audio_tower.layers.9.fc2": 8,
+ "thinker.audio_tower.layers.9.self_attn.k_proj": 16,
+ "thinker.audio_tower.layers.9.self_attn.out_proj": 16,
+ "thinker.audio_tower.layers.9.self_attn.q_proj": 16,
+ "thinker.audio_tower.layers.9.self_attn.v_proj": 16,
+ "thinker.audio_tower.proj1": 16,
+ "thinker.audio_tower.proj2": 16,
702
+ "thinker.layers.0.mlp.down_proj": 8,
703
+ "thinker.layers.0.mlp.gate_proj": 8,
704
+ "thinker.layers.0.mlp.up_proj": 8,
705
+ "thinker.layers.0.self_attn.k_proj": 8,
706
+ "thinker.layers.0.self_attn.o_proj": 8,
707
+ "thinker.layers.0.self_attn.q_proj": 8,
708
+ "thinker.layers.0.self_attn.v_proj": 8,
709
+ "thinker.layers.1.mlp.down_proj": 8,
710
+ "thinker.layers.1.mlp.gate_proj": 8,
711
+ "thinker.layers.1.mlp.up_proj": 8,
712
+ "thinker.layers.1.self_attn.k_proj": 8,
713
+ "thinker.layers.1.self_attn.o_proj": 8,
714
+ "thinker.layers.1.self_attn.q_proj": 8,
715
+ "thinker.layers.1.self_attn.v_proj": 8,
716
+ "thinker.layers.10.mlp.down_proj": 8,
717
+ "thinker.layers.10.mlp.gate_proj": 8,
718
+ "thinker.layers.10.mlp.up_proj": 8,
719
+ "thinker.layers.10.self_attn.k_proj": 8,
720
+ "thinker.layers.10.self_attn.o_proj": 8,
721
+ "thinker.layers.10.self_attn.q_proj": 8,
722
+ "thinker.layers.10.self_attn.v_proj": 8,
723
+ "thinker.layers.11.mlp.down_proj": 8,
724
+ "thinker.layers.11.mlp.gate_proj": 8,
725
+ "thinker.layers.11.mlp.up_proj": 8,
726
+ "thinker.layers.11.self_attn.k_proj": 8,
727
+ "thinker.layers.11.self_attn.o_proj": 8,
728
+ "thinker.layers.11.self_attn.q_proj": 8,
729
+ "thinker.layers.11.self_attn.v_proj": 8,
730
+ "thinker.layers.12.mlp.down_proj": 8,
731
+ "thinker.layers.12.mlp.gate_proj": 8,
732
+ "thinker.layers.12.mlp.up_proj": 8,
733
+ "thinker.layers.12.self_attn.k_proj": 8,
734
+ "thinker.layers.12.self_attn.o_proj": 8,
735
+ "thinker.layers.12.self_attn.q_proj": 8,
736
+ "thinker.layers.12.self_attn.v_proj": 8,
737
+ "thinker.layers.13.mlp.down_proj": 8,
738
+ "thinker.layers.13.mlp.gate_proj": 8,
739
+ "thinker.layers.13.mlp.up_proj": 8,
740
+ "thinker.layers.13.self_attn.k_proj": 8,
741
+ "thinker.layers.13.self_attn.o_proj": 8,
742
+ "thinker.layers.13.self_attn.q_proj": 8,
743
+ "thinker.layers.13.self_attn.v_proj": 8,
744
+ "thinker.layers.14.mlp.down_proj": 8,
745
+ "thinker.layers.14.mlp.gate_proj": 8,
746
+ "thinker.layers.14.mlp.up_proj": 8,
747
+ "thinker.layers.14.self_attn.k_proj": 8,
748
+ "thinker.layers.14.self_attn.o_proj": 8,
749
+ "thinker.layers.14.self_attn.q_proj": 8,
750
+ "thinker.layers.14.self_attn.v_proj": 8,
751
+ "thinker.layers.15.mlp.down_proj": 8,
752
+ "thinker.layers.15.mlp.gate_proj": 8,
753
+ "thinker.layers.15.mlp.up_proj": 8,
754
+ "thinker.layers.15.self_attn.k_proj": 8,
755
+ "thinker.layers.15.self_attn.o_proj": 8,
756
+ "thinker.layers.15.self_attn.q_proj": 8,
757
+ "thinker.layers.15.self_attn.v_proj": 8,
758
+ "thinker.layers.16.mlp.down_proj": 8,
759
+ "thinker.layers.16.mlp.gate_proj": 8,
760
+ "thinker.layers.16.mlp.up_proj": 8,
761
+ "thinker.layers.16.self_attn.k_proj": 8,
762
+ "thinker.layers.16.self_attn.o_proj": 8,
763
+ "thinker.layers.16.self_attn.q_proj": 8,
764
+ "thinker.layers.16.self_attn.v_proj": 8,
765
+ "thinker.layers.17.mlp.down_proj": 8,
766
+ "thinker.layers.17.mlp.gate_proj": 8,
767
+ "thinker.layers.17.mlp.up_proj": 8,
768
+ "thinker.layers.17.self_attn.k_proj": 8,
769
+ "thinker.layers.17.self_attn.o_proj": 8,
770
+ "thinker.layers.17.self_attn.q_proj": 8,
771
+ "thinker.layers.17.self_attn.v_proj": 8,
772
+ "thinker.layers.18.mlp.down_proj": 8,
773
+ "thinker.layers.18.mlp.gate_proj": 8,
774
+ "thinker.layers.18.mlp.up_proj": 8,
775
+ "thinker.layers.18.self_attn.k_proj": 8,
776
+ "thinker.layers.18.self_attn.o_proj": 8,
777
+ "thinker.layers.18.self_attn.q_proj": 8,
778
+ "thinker.layers.18.self_attn.v_proj": 8,
779
+ "thinker.layers.19.mlp.down_proj": 8,
780
+ "thinker.layers.19.mlp.gate_proj": 8,
781
+ "thinker.layers.19.mlp.up_proj": 8,
782
+ "thinker.layers.19.self_attn.k_proj": 8,
783
+ "thinker.layers.19.self_attn.o_proj": 8,
784
+ "thinker.layers.19.self_attn.q_proj": 8,
785
+ "thinker.layers.19.self_attn.v_proj": 8,
786
+ "thinker.layers.2.mlp.down_proj": 8,
787
+ "thinker.layers.2.mlp.gate_proj": 8,
788
+ "thinker.layers.2.mlp.up_proj": 8,
789
+ "thinker.layers.2.self_attn.k_proj": 8,
790
+ "thinker.layers.2.self_attn.o_proj": 8,
791
+ "thinker.layers.2.self_attn.q_proj": 8,
792
+ "thinker.layers.2.self_attn.v_proj": 8,
793
+ "thinker.layers.20.mlp.down_proj": 8,
794
+ "thinker.layers.20.mlp.gate_proj": 8,
795
+ "thinker.layers.20.mlp.up_proj": 8,
796
+ "thinker.layers.20.self_attn.k_proj": 8,
797
+ "thinker.layers.20.self_attn.o_proj": 8,
798
+ "thinker.layers.20.self_attn.q_proj": 8,
799
+ "thinker.layers.20.self_attn.v_proj": 8,
800
+ "thinker.layers.21.mlp.down_proj": 8,
801
+ "thinker.layers.21.mlp.gate_proj": 8,
802
+ "thinker.layers.21.mlp.up_proj": 8,
803
+ "thinker.layers.21.self_attn.k_proj": 8,
804
+ "thinker.layers.21.self_attn.o_proj": 8,
805
+ "thinker.layers.21.self_attn.q_proj": 8,
806
+ "thinker.layers.21.self_attn.v_proj": 8,
807
+ "thinker.layers.22.mlp.down_proj": 8,
808
+ "thinker.layers.22.mlp.gate_proj": 8,
809
+ "thinker.layers.22.mlp.up_proj": 8,
810
+ "thinker.layers.22.self_attn.k_proj": 8,
811
+ "thinker.layers.22.self_attn.o_proj": 8,
812
+ "thinker.layers.22.self_attn.q_proj": 8,
813
+ "thinker.layers.22.self_attn.v_proj": 8,
814
+ "thinker.layers.23.mlp.down_proj": 8,
815
+ "thinker.layers.23.mlp.gate_proj": 8,
816
+ "thinker.layers.23.mlp.up_proj": 8,
817
+ "thinker.layers.23.self_attn.k_proj": 8,
818
+ "thinker.layers.23.self_attn.o_proj": 8,
819
+ "thinker.layers.23.self_attn.q_proj": 8,
820
+ "thinker.layers.23.self_attn.v_proj": 8,
821
+ "thinker.layers.24.mlp.down_proj": 8,
822
+ "thinker.layers.24.mlp.gate_proj": 8,
823
+ "thinker.layers.24.mlp.up_proj": 8,
824
+ "thinker.layers.24.self_attn.k_proj": 8,
825
+ "thinker.layers.24.self_attn.o_proj": 8,
826
+ "thinker.layers.24.self_attn.q_proj": 8,
827
+ "thinker.layers.24.self_attn.v_proj": 8,
828
+ "thinker.layers.25.mlp.down_proj": 8,
829
+ "thinker.layers.25.mlp.gate_proj": 8,
830
+ "thinker.layers.25.mlp.up_proj": 8,
831
+ "thinker.layers.25.self_attn.k_proj": 8,
832
+ "thinker.layers.25.self_attn.o_proj": 8,
833
+ "thinker.layers.25.self_attn.q_proj": 8,
834
+ "thinker.layers.25.self_attn.v_proj": 8,
835
+ "thinker.layers.26.mlp.down_proj": 8,
836
+ "thinker.layers.26.mlp.gate_proj": 8,
837
+ "thinker.layers.26.mlp.up_proj": 8,
838
+ "thinker.layers.26.self_attn.k_proj": 8,
839
+ "thinker.layers.26.self_attn.o_proj": 8,
840
+ "thinker.layers.26.self_attn.q_proj": 8,
841
+ "thinker.layers.26.self_attn.v_proj": 8,
842
+ "thinker.layers.27.mlp.down_proj": 8,
843
+ "thinker.layers.27.mlp.gate_proj": 8,
844
+ "thinker.layers.27.mlp.up_proj": 8,
845
+ "thinker.layers.27.self_attn.k_proj": 8,
846
+ "thinker.layers.27.self_attn.o_proj": 8,
847
+ "thinker.layers.27.self_attn.q_proj": 8,
848
+ "thinker.layers.27.self_attn.v_proj": 8,
849
+ "thinker.layers.3.mlp.down_proj": 8,
850
+ "thinker.layers.3.mlp.gate_proj": 8,
851
+ "thinker.layers.3.mlp.up_proj": 8,
852
+ "thinker.layers.3.self_attn.k_proj": 8,
853
+ "thinker.layers.3.self_attn.o_proj": 8,
854
+ "thinker.layers.3.self_attn.q_proj": 8,
855
+ "thinker.layers.3.self_attn.v_proj": 8,
856
+ "thinker.layers.4.mlp.down_proj": 8,
857
+ "thinker.layers.4.mlp.gate_proj": 8,
858
+ "thinker.layers.4.mlp.up_proj": 8,
859
+ "thinker.layers.4.self_attn.k_proj": 8,
860
+ "thinker.layers.4.self_attn.o_proj": 8,
861
+ "thinker.layers.4.self_attn.q_proj": 8,
862
+ "thinker.layers.4.self_attn.v_proj": 8,
863
+ "thinker.layers.5.mlp.down_proj": 8,
864
+ "thinker.layers.5.mlp.gate_proj": 8,
865
+ "thinker.layers.5.mlp.up_proj": 8,
866
+ "thinker.layers.5.self_attn.k_proj": 8,
867
+ "thinker.layers.5.self_attn.o_proj": 8,
868
+ "thinker.layers.5.self_attn.q_proj": 8,
869
+ "thinker.layers.5.self_attn.v_proj": 8,
870
+ "thinker.layers.6.mlp.down_proj": 8,
871
+ "thinker.layers.6.mlp.gate_proj": 8,
872
+ "thinker.layers.6.mlp.up_proj": 8,
873
+ "thinker.layers.6.self_attn.k_proj": 8,
874
+ "thinker.layers.6.self_attn.o_proj": 8,
875
+ "thinker.layers.6.self_attn.q_proj": 8,
876
+ "thinker.layers.6.self_attn.v_proj": 8,
877
+ "thinker.layers.7.mlp.down_proj": 8,
878
+ "thinker.layers.7.mlp.gate_proj": 8,
879
+ "thinker.layers.7.mlp.up_proj": 8,
880
+ "thinker.layers.7.self_attn.k_proj": 8,
881
+ "thinker.layers.7.self_attn.o_proj": 8,
882
+ "thinker.layers.7.self_attn.q_proj": 8,
883
+ "thinker.layers.7.self_attn.v_proj": 8,
884
+ "thinker.layers.8.mlp.down_proj": 8,
885
+ "thinker.layers.8.mlp.gate_proj": 8,
886
+ "thinker.layers.8.mlp.up_proj": 8,
887
+ "thinker.layers.8.self_attn.k_proj": 8,
888
+ "thinker.layers.8.self_attn.o_proj": 8,
889
+ "thinker.layers.8.self_attn.q_proj": 8,
890
+ "thinker.layers.8.self_attn.v_proj": 8,
891
+ "thinker.layers.9.mlp.down_proj": 8,
892
+ "thinker.layers.9.mlp.gate_proj": 8,
893
+ "thinker.layers.9.mlp.up_proj": 8,
894
+ "thinker.layers.9.self_attn.k_proj": 8,
895
+ "thinker.layers.9.self_attn.o_proj": 8,
896
+ "thinker.layers.9.self_attn.q_proj": 8,
897
+ "thinker.layers.9.self_attn.v_proj": 8,
898
+ "thinker.model.layers.0.mlp.down_proj": 8,
899
+ "thinker.model.layers.0.mlp.gate_proj": 8,
900
+ "thinker.model.layers.0.mlp.up_proj": 8,
901
+ "thinker.model.layers.0.self_attn.k_proj": 8,
902
+ "thinker.model.layers.0.self_attn.o_proj": 8,
903
+ "thinker.model.layers.0.self_attn.q_proj": 8,
904
+ "thinker.model.layers.0.self_attn.v_proj": 8,
905
+ "thinker.model.layers.1.mlp.down_proj": 8,
906
+ "thinker.model.layers.1.mlp.gate_proj": 8,
907
+ "thinker.model.layers.1.mlp.up_proj": 8,
908
+ "thinker.model.layers.1.self_attn.k_proj": 8,
909
+ "thinker.model.layers.1.self_attn.o_proj": 8,
910
+ "thinker.model.layers.1.self_attn.q_proj": 8,
911
+ "thinker.model.layers.1.self_attn.v_proj": 8,
912
+ "thinker.model.layers.10.mlp.down_proj": 8,
913
+ "thinker.model.layers.10.mlp.gate_proj": 8,
914
+ "thinker.model.layers.10.mlp.up_proj": 8,
915
+ "thinker.model.layers.10.self_attn.k_proj": 8,
916
+ "thinker.model.layers.10.self_attn.o_proj": 8,
917
+ "thinker.model.layers.10.self_attn.q_proj": 8,
918
+ "thinker.model.layers.10.self_attn.v_proj": 8,
919
+ "thinker.model.layers.11.mlp.down_proj": 8,
920
+ "thinker.model.layers.11.mlp.gate_proj": 8,
921
+ "thinker.model.layers.11.mlp.up_proj": 8,
922
+ "thinker.model.layers.11.self_attn.k_proj": 8,
923
+ "thinker.model.layers.11.self_attn.o_proj": 8,
924
+ "thinker.model.layers.11.self_attn.q_proj": 8,
925
+ "thinker.model.layers.11.self_attn.v_proj": 8,
926
+ "thinker.model.layers.12.mlp.down_proj": 8,
927
+ "thinker.model.layers.12.mlp.gate_proj": 8,
928
+ "thinker.model.layers.12.mlp.up_proj": 8,
929
+ "thinker.model.layers.12.self_attn.k_proj": 8,
930
+ "thinker.model.layers.12.self_attn.o_proj": 8,
931
+ "thinker.model.layers.12.self_attn.q_proj": 8,
932
+ "thinker.model.layers.12.self_attn.v_proj": 8,
933
+ "thinker.model.layers.13.mlp.down_proj": 8,
934
+ "thinker.model.layers.13.mlp.gate_proj": 8,
935
+ "thinker.model.layers.13.mlp.up_proj": 8,
936
+ "thinker.model.layers.13.self_attn.k_proj": 8,
937
+ "thinker.model.layers.13.self_attn.o_proj": 8,
938
+ "thinker.model.layers.13.self_attn.q_proj": 8,
939
+ "thinker.model.layers.13.self_attn.v_proj": 8,
940
+ "thinker.model.layers.14.mlp.down_proj": 8,
941
+ "thinker.model.layers.14.mlp.gate_proj": 8,
942
+ "thinker.model.layers.14.mlp.up_proj": 8,
943
+ "thinker.model.layers.14.self_attn.k_proj": 8,
944
+ "thinker.model.layers.14.self_attn.o_proj": 8,
945
+ "thinker.model.layers.14.self_attn.q_proj": 8,
946
+ "thinker.model.layers.14.self_attn.v_proj": 8,
947
+ "thinker.model.layers.15.mlp.down_proj": 8,
948
+ "thinker.model.layers.15.mlp.gate_proj": 8,
949
+ "thinker.model.layers.15.mlp.up_proj": 8,
950
+ "thinker.model.layers.15.self_attn.k_proj": 8,
951
+ "thinker.model.layers.15.self_attn.o_proj": 8,
952
+ "thinker.model.layers.15.self_attn.q_proj": 8,
953
+ "thinker.model.layers.15.self_attn.v_proj": 8,
954
+ "thinker.model.layers.16.mlp.down_proj": 8,
955
+ "thinker.model.layers.16.mlp.gate_proj": 8,
956
+ "thinker.model.layers.16.mlp.up_proj": 8,
957
+ "thinker.model.layers.16.self_attn.k_proj": 8,
958
+ "thinker.model.layers.16.self_attn.o_proj": 8,
959
+ "thinker.model.layers.16.self_attn.q_proj": 8,
960
+ "thinker.model.layers.16.self_attn.v_proj": 8,
961
+ "thinker.model.layers.17.mlp.down_proj": 8,
962
+ "thinker.model.layers.17.mlp.gate_proj": 8,
963
+ "thinker.model.layers.17.mlp.up_proj": 8,
964
+ "thinker.model.layers.17.self_attn.k_proj": 8,
965
+ "thinker.model.layers.17.self_attn.o_proj": 8,
966
+ "thinker.model.layers.17.self_attn.q_proj": 8,
967
+ "thinker.model.layers.17.self_attn.v_proj": 8,
968
+ "thinker.model.layers.18.mlp.down_proj": 8,
969
+ "thinker.model.layers.18.mlp.gate_proj": 8,
970
+ "thinker.model.layers.18.mlp.up_proj": 8,
971
+ "thinker.model.layers.18.self_attn.k_proj": 8,
972
+ "thinker.model.layers.18.self_attn.o_proj": 8,
973
+ "thinker.model.layers.18.self_attn.q_proj": 8,
974
+ "thinker.model.layers.18.self_attn.v_proj": 8,
975
+ "thinker.model.layers.19.mlp.down_proj": 8,
976
+ "thinker.model.layers.19.mlp.gate_proj": 8,
977
+ "thinker.model.layers.19.mlp.up_proj": 8,
978
+ "thinker.model.layers.19.self_attn.k_proj": 8,
979
+ "thinker.model.layers.19.self_attn.o_proj": 8,
980
+ "thinker.model.layers.19.self_attn.q_proj": 8,
981
+ "thinker.model.layers.19.self_attn.v_proj": 8,
982
+ "thinker.model.layers.2.mlp.down_proj": 8,
983
+ "thinker.model.layers.2.mlp.gate_proj": 8,
984
+ "thinker.model.layers.2.mlp.up_proj": 8,
985
+ "thinker.model.layers.2.self_attn.k_proj": 8,
986
+ "thinker.model.layers.2.self_attn.o_proj": 8,
987
+ "thinker.model.layers.2.self_attn.q_proj": 8,
988
+ "thinker.model.layers.2.self_attn.v_proj": 8,
989
+ "thinker.model.layers.20.mlp.down_proj": 8,
990
+ "thinker.model.layers.20.mlp.gate_proj": 8,
991
+ "thinker.model.layers.20.mlp.up_proj": 8,
992
+ "thinker.model.layers.20.self_attn.k_proj": 8,
993
+ "thinker.model.layers.20.self_attn.o_proj": 8,
994
+ "thinker.model.layers.20.self_attn.q_proj": 8,
995
+ "thinker.model.layers.20.self_attn.v_proj": 8,
996
+ "thinker.model.layers.21.mlp.down_proj": 8,
997
+ "thinker.model.layers.21.mlp.gate_proj": 8,
998
+ "thinker.model.layers.21.mlp.up_proj": 8,
999
+ "thinker.model.layers.21.self_attn.k_proj": 8,
1000
+ "thinker.model.layers.21.self_attn.o_proj": 8,
1001
+ "thinker.model.layers.21.self_attn.q_proj": 8,
1002
+ "thinker.model.layers.21.self_attn.v_proj": 8,
1003
+ "thinker.model.layers.22.mlp.down_proj": 8,
1004
+ "thinker.model.layers.22.mlp.gate_proj": 8,
1005
+ "thinker.model.layers.22.mlp.up_proj": 8,
1006
+ "thinker.model.layers.22.self_attn.k_proj": 8,
1007
+ "thinker.model.layers.22.self_attn.o_proj": 8,
1008
+ "thinker.model.layers.22.self_attn.q_proj": 8,
1009
+ "thinker.model.layers.22.self_attn.v_proj": 8,
1010
+ "thinker.model.layers.23.mlp.down_proj": 8,
1011
+ "thinker.model.layers.23.mlp.gate_proj": 8,
1012
+ "thinker.model.layers.23.mlp.up_proj": 8,
1013
+ "thinker.model.layers.23.self_attn.k_proj": 8,
1014
+ "thinker.model.layers.23.self_attn.o_proj": 8,
1015
+ "thinker.model.layers.23.self_attn.q_proj": 8,
1016
+ "thinker.model.layers.23.self_attn.v_proj": 8,
1017
+ "thinker.model.layers.24.mlp.down_proj": 8,
1018
+ "thinker.model.layers.24.mlp.gate_proj": 8,
1019
+ "thinker.model.layers.24.mlp.up_proj": 8,
1020
+ "thinker.model.layers.24.self_attn.k_proj": 8,
1021
+ "thinker.model.layers.24.self_attn.o_proj": 8,
1022
+ "thinker.model.layers.24.self_attn.q_proj": 8,
1023
+ "thinker.model.layers.24.self_attn.v_proj": 8,
1024
+ "thinker.model.layers.25.mlp.down_proj": 8,
1025
+ "thinker.model.layers.25.mlp.gate_proj": 8,
1026
+ "thinker.model.layers.25.mlp.up_proj": 8,
1027
+ "thinker.model.layers.25.self_attn.k_proj": 8,
1028
+ "thinker.model.layers.25.self_attn.o_proj": 8,
1029
+ "thinker.model.layers.25.self_attn.q_proj": 8,
1030
+ "thinker.model.layers.25.self_attn.v_proj": 8,
1031
+ "thinker.model.layers.26.mlp.down_proj": 8,
1032
+ "thinker.model.layers.26.mlp.gate_proj": 8,
1033
+ "thinker.model.layers.26.mlp.up_proj": 8,
1034
+ "thinker.model.layers.26.self_attn.k_proj": 8,
1035
+ "thinker.model.layers.26.self_attn.o_proj": 8,
1036
+ "thinker.model.layers.26.self_attn.q_proj": 8,
1037
+ "thinker.model.layers.26.self_attn.v_proj": 8,
1038
+ "thinker.model.layers.27.mlp.down_proj": 8,
1039
+ "thinker.model.layers.27.mlp.gate_proj": 8,
1040
+ "thinker.model.layers.27.mlp.up_proj": 8,
1041
+ "thinker.model.layers.27.self_attn.k_proj": 8,
1042
+ "thinker.model.layers.27.self_attn.o_proj": 8,
1043
+ "thinker.model.layers.27.self_attn.q_proj": 8,
1044
+ "thinker.model.layers.27.self_attn.v_proj": 8,
1045
+ "thinker.model.layers.3.mlp.down_proj": 8,
1046
+ "thinker.model.layers.3.mlp.gate_proj": 8,
1047
+ "thinker.model.layers.3.mlp.up_proj": 8,
1048
+ "thinker.model.layers.3.self_attn.k_proj": 8,
1049
+ "thinker.model.layers.3.self_attn.o_proj": 8,
1050
+ "thinker.model.layers.3.self_attn.q_proj": 8,
1051
+ "thinker.model.layers.3.self_attn.v_proj": 8,
1052
+ "thinker.model.layers.4.mlp.down_proj": 8,
1053
+ "thinker.model.layers.4.mlp.gate_proj": 8,
1054
+ "thinker.model.layers.4.mlp.up_proj": 8,
1055
+ "thinker.model.layers.4.self_attn.k_proj": 8,
1056
+ "thinker.model.layers.4.self_attn.o_proj": 8,
1057
+ "thinker.model.layers.4.self_attn.q_proj": 8,
1058
+ "thinker.model.layers.4.self_attn.v_proj": 8,
1059
+ "thinker.model.layers.5.mlp.down_proj": 8,
1060
+ "thinker.model.layers.5.mlp.gate_proj": 8,
1061
+ "thinker.model.layers.5.mlp.up_proj": 8,
1062
+ "thinker.model.layers.5.self_attn.k_proj": 8,
1063
+ "thinker.model.layers.5.self_attn.o_proj": 8,
1064
+ "thinker.model.layers.5.self_attn.q_proj": 8,
1065
+ "thinker.model.layers.5.self_attn.v_proj": 8,
1066
+ "thinker.model.layers.6.mlp.down_proj": 8,
1067
+ "thinker.model.layers.6.mlp.gate_proj": 8,
1068
+ "thinker.model.layers.6.mlp.up_proj": 8,
1069
+ "thinker.model.layers.6.self_attn.k_proj": 8,
1070
+ "thinker.model.layers.6.self_attn.o_proj": 8,
1071
+ "thinker.model.layers.6.self_attn.q_proj": 8,
1072
+ "thinker.model.layers.6.self_attn.v_proj": 8,
1073
+ "thinker.model.layers.7.mlp.down_proj": 8,
1074
+ "thinker.model.layers.7.mlp.gate_proj": 8,
1075
+ "thinker.model.layers.7.mlp.up_proj": 8,
1076
+ "thinker.model.layers.7.self_attn.k_proj": 8,
1077
+ "thinker.model.layers.7.self_attn.o_proj": 8,
1078
+ "thinker.model.layers.7.self_attn.q_proj": 8,
1079
+ "thinker.model.layers.7.self_attn.v_proj": 8,
1080
+ "thinker.model.layers.8.mlp.down_proj": 8,
1081
+ "thinker.model.layers.8.mlp.gate_proj": 8,
1082
+ "thinker.model.layers.8.mlp.up_proj": 8,
1083
+ "thinker.model.layers.8.self_attn.k_proj": 8,
1084
+ "thinker.model.layers.8.self_attn.o_proj": 8,
1085
+ "thinker.model.layers.8.self_attn.q_proj": 8,
1086
+ "thinker.model.layers.8.self_attn.v_proj": 8,
1087
+ "thinker.model.layers.9.mlp.down_proj": 8,
1088
+ "thinker.model.layers.9.mlp.gate_proj": 8,
1089
+ "thinker.model.layers.9.mlp.up_proj": 8,
1090
+ "thinker.model.layers.9.self_attn.k_proj": 8,
1091
+ "thinker.model.layers.9.self_attn.o_proj": 8,
1092
+ "thinker.model.layers.9.self_attn.q_proj": 8,
1093
+ "thinker.model.layers.9.self_attn.v_proj": 8
1094
+ },
1095
+ "target_modules": ".*",
1096
+ "task_type": "CAUSAL_LM",
1097
+ "use_dora": false,
1098
+ "use_rslora": false
1099
+ }
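The `rank_pattern` map above assigns each module a total LoRA rank (8, 16, or 24), while `mega_lora_blocks.json` later in this commit lists per-module blocks with `start`, `end`, `rank`, and `alpha`. The sketch below shows one way to check that the two files agree. It assumes that a module's blocks tile the range from 0 up to its `rank_pattern` value without gaps, which the visible entries suggest but the commit does not state. The function name `check_blocks` and the sample values are hypothetical and only mirror the shape of the data shown here.

```python
def check_blocks(rank_pattern, blocks_by_module):
    """Return a list of problems; an empty list means the two maps agree.

    Assumption (not confirmed by the commit): blocks for a module are
    contiguous, start at 0, and their combined width equals the module's
    entry in rank_pattern.
    """
    problems = []
    for name, blocks in blocks_by_module.items():
        cursor = 0
        for block in sorted(blocks, key=lambda b: b["start"]):
            if block["start"] != cursor:
                problems.append(f"{name}: expected start {cursor}, got {block['start']}")
            if block["end"] - block["start"] != block["rank"]:
                problems.append(f"{name}: block width does not match its rank")
            cursor = block["end"]
        expected = rank_pattern.get(name)
        if expected is not None and expected != cursor:
            problems.append(f"{name}: rank_pattern says {expected}, blocks cover {cursor}")
    return problems


# Illustrative values shaped like the entries in this commit.
sample_rank_pattern = {"thinker.audio_tower.layers.3.self_attn.k_proj": 16}
sample_blocks = {
    "thinker.audio_tower.layers.3.self_attn.k_proj": [
        {"alpha": 8, "end": 8, "rank": 8, "start": 0},
        {"alpha": 8, "end": 16, "rank": 8, "start": 8},
    ]
}
problems = check_blocks(sample_rank_pattern, sample_blocks)
```

With real data, both arguments would come from `json.load` on the two files; a non-empty result points at the first module whose blocks and total rank disagree.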
mega-asr-merged/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7fd88dc65e59d502689323b5a296d82f3d8225a38232773963a829e387d44a31
3
+ size 92538896
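The three added lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, a SHA-256 object ID, and the byte size of the real `adapter_model.safetensors`. A minimal sketch of reading such a pointer follows. The helper name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` line layout seen here rather than covering the full pointer specification.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dict.

    Assumes each non-empty line is "key value" with a single space
    separating the key from the rest, as in the pointer shown above.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content from this commit, without the diff's leading "+ " markers.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7fd88dc65e59d502689323b5a296d82f3d8225a38232773963a829e387d44a31
size 92538896
"""

info = parse_lfs_pointer(POINTER)
size_mib = info["size"] / (1024 * 1024)  # size of the real file in MiB
```

After downloading the actual file, its SHA-256 digest and byte length can be compared with `info["digest"]` and `info["size"]` to confirm the download is complete.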
mega-asr-merged/mega_lora_blocks.json ADDED
@@ -0,0 +1,5010 @@
1
+ {
2
+ "thinker.audio_tower.conv_out": [
3
+ {
4
+ "alpha": 8,
5
+ "end": 8,
6
+ "rank": 8,
7
+ "start": 0
8
+ },
9
+ {
10
+ "alpha": 8,
11
+ "end": 16,
12
+ "rank": 8,
13
+ "start": 8
14
+ },
15
+ {
16
+ "alpha": 8,
17
+ "end": 24,
18
+ "rank": 8,
19
+ "start": 16
20
+ }
21
+ ],
22
+ "thinker.audio_tower.layers.0.fc1": [
23
+ {
24
+ "alpha": 8,
25
+ "end": 8,
26
+ "rank": 8,
27
+ "start": 0
28
+ }
29
+ ],
30
+ "thinker.audio_tower.layers.0.fc2": [
31
+ {
32
+ "alpha": 8,
33
+ "end": 8,
34
+ "rank": 8,
35
+ "start": 0
36
+ }
37
+ ],
38
+ "thinker.audio_tower.layers.0.self_attn.k_proj": [
39
+ {
40
+ "alpha": 8,
41
+ "end": 8,
42
+ "rank": 8,
43
+ "start": 0
44
+ },
45
+ {
46
+ "alpha": 8,
47
+ "end": 16,
48
+ "rank": 8,
49
+ "start": 8
50
+ }
51
+ ],
52
+ "thinker.audio_tower.layers.0.self_attn.out_proj": [
53
+ {
54
+ "alpha": 8,
55
+ "end": 8,
56
+ "rank": 8,
57
+ "start": 0
58
+ },
59
+ {
60
+ "alpha": 8,
61
+ "end": 16,
62
+ "rank": 8,
63
+ "start": 8
64
+ }
65
+ ],
66
+ "thinker.audio_tower.layers.0.self_attn.q_proj": [
67
+ {
68
+ "alpha": 8,
69
+ "end": 8,
70
+ "rank": 8,
71
+ "start": 0
72
+ },
73
+ {
74
+ "alpha": 8,
75
+ "end": 16,
76
+ "rank": 8,
77
+ "start": 8
78
+ }
79
+ ],
80
+ "thinker.audio_tower.layers.0.self_attn.v_proj": [
81
+ {
82
+ "alpha": 8,
83
+ "end": 8,
84
+ "rank": 8,
85
+ "start": 0
86
+ },
87
+ {
88
+ "alpha": 8,
89
+ "end": 16,
90
+ "rank": 8,
91
+ "start": 8
92
+ }
93
+ ],
94
+ "thinker.audio_tower.layers.1.fc1": [
95
+ {
96
+ "alpha": 8,
97
+ "end": 8,
98
+ "rank": 8,
99
+ "start": 0
100
+ }
101
+ ],
102
+ "thinker.audio_tower.layers.1.fc2": [
103
+ {
104
+ "alpha": 8,
105
+ "end": 8,
106
+ "rank": 8,
107
+ "start": 0
108
+ }
109
+ ],
110
+ "thinker.audio_tower.layers.1.self_attn.k_proj": [
111
+ {
112
+ "alpha": 8,
113
+ "end": 8,
114
+ "rank": 8,
115
+ "start": 0
116
+ },
117
+ {
118
+ "alpha": 8,
119
+ "end": 16,
120
+ "rank": 8,
121
+ "start": 8
122
+ }
123
+ ],
124
+ "thinker.audio_tower.layers.1.self_attn.out_proj": [
125
+ {
126
+ "alpha": 8,
127
+ "end": 8,
128
+ "rank": 8,
129
+ "start": 0
130
+ },
131
+ {
132
+ "alpha": 8,
133
+ "end": 16,
134
+ "rank": 8,
135
+ "start": 8
136
+ }
137
+ ],
138
+ "thinker.audio_tower.layers.1.self_attn.q_proj": [
139
+ {
140
+ "alpha": 8,
141
+ "end": 8,
142
+ "rank": 8,
143
+ "start": 0
144
+ },
145
+ {
146
+ "alpha": 8,
147
+ "end": 16,
148
+ "rank": 8,
149
+ "start": 8
150
+ }
151
+ ],
152
+ "thinker.audio_tower.layers.1.self_attn.v_proj": [
153
+ {
154
+ "alpha": 8,
155
+ "end": 8,
156
+ "rank": 8,
157
+ "start": 0
158
+ },
159
+ {
160
+ "alpha": 8,
161
+ "end": 16,
162
+ "rank": 8,
163
+ "start": 8
164
+ }
165
+ ],
166
+ "thinker.audio_tower.layers.10.fc1": [
167
+ {
168
+ "alpha": 8,
169
+ "end": 8,
170
+ "rank": 8,
171
+ "start": 0
172
+ }
173
+ ],
174
+ "thinker.audio_tower.layers.10.fc2": [
175
+ {
176
+ "alpha": 8,
177
+ "end": 8,
178
+ "rank": 8,
179
+ "start": 0
180
+ }
181
+ ],
182
+ "thinker.audio_tower.layers.10.self_attn.k_proj": [
183
+ {
184
+ "alpha": 8,
185
+ "end": 8,
186
+ "rank": 8,
187
+ "start": 0
188
+ },
189
+ {
190
+ "alpha": 8,
191
+ "end": 16,
192
+ "rank": 8,
193
+ "start": 8
194
+ }
195
+ ],
196
+ "thinker.audio_tower.layers.10.self_attn.out_proj": [
197
+ {
198
+ "alpha": 8,
199
+ "end": 8,
200
+ "rank": 8,
201
+ "start": 0
202
+ },
203
+ {
204
+ "alpha": 8,
205
+ "end": 16,
206
+ "rank": 8,
207
+ "start": 8
208
+ }
209
+ ],
210
+ "thinker.audio_tower.layers.10.self_attn.q_proj": [
211
+ {
212
+ "alpha": 8,
213
+ "end": 8,
214
+ "rank": 8,
215
+ "start": 0
216
+ },
217
+ {
218
+ "alpha": 8,
219
+ "end": 16,
220
+ "rank": 8,
221
+ "start": 8
222
+ }
223
+ ],
224
+ "thinker.audio_tower.layers.10.self_attn.v_proj": [
225
+ {
226
+ "alpha": 8,
227
+ "end": 8,
228
+ "rank": 8,
229
+ "start": 0
230
+ },
231
+ {
232
+ "alpha": 8,
233
+ "end": 16,
234
+ "rank": 8,
235
+ "start": 8
236
+ }
237
+ ],
238
+ "thinker.audio_tower.layers.11.fc1": [
239
+ {
240
+ "alpha": 8,
241
+ "end": 8,
242
+ "rank": 8,
243
+ "start": 0
244
+ }
245
+ ],
246
+ "thinker.audio_tower.layers.11.fc2": [
247
+ {
248
+ "alpha": 8,
249
+ "end": 8,
250
+ "rank": 8,
251
+ "start": 0
252
+ }
253
+ ],
254
+ "thinker.audio_tower.layers.11.self_attn.k_proj": [
255
+ {
256
+ "alpha": 8,
257
+ "end": 8,
258
+ "rank": 8,
259
+ "start": 0
260
+ },
261
+ {
262
+ "alpha": 8,
263
+ "end": 16,
264
+ "rank": 8,
265
+ "start": 8
266
+ }
267
+ ],
268
+ "thinker.audio_tower.layers.11.self_attn.out_proj": [
269
+ {
270
+ "alpha": 8,
271
+ "end": 8,
272
+ "rank": 8,
273
+ "start": 0
274
+ },
275
+ {
276
+ "alpha": 8,
277
+ "end": 16,
278
+ "rank": 8,
279
+ "start": 8
280
+ }
281
+ ],
282
+ "thinker.audio_tower.layers.11.self_attn.q_proj": [
283
+ {
284
+ "alpha": 8,
285
+ "end": 8,
286
+ "rank": 8,
287
+ "start": 0
288
+ },
289
+ {
290
+ "alpha": 8,
291
+ "end": 16,
292
+ "rank": 8,
293
+ "start": 8
294
+ }
295
+ ],
296
+ "thinker.audio_tower.layers.11.self_attn.v_proj": [
297
+ {
298
+ "alpha": 8,
299
+ "end": 8,
300
+ "rank": 8,
301
+ "start": 0
302
+ },
303
+ {
304
+ "alpha": 8,
305
+ "end": 16,
306
+ "rank": 8,
307
+ "start": 8
308
+ }
309
+ ],
310
+ "thinker.audio_tower.layers.12.fc1": [
311
+ {
312
+ "alpha": 8,
313
+ "end": 8,
314
+ "rank": 8,
315
+ "start": 0
316
+ }
317
+ ],
318
+ "thinker.audio_tower.layers.12.fc2": [
319
+ {
320
+ "alpha": 8,
321
+ "end": 8,
322
+ "rank": 8,
323
+ "start": 0
324
+ }
325
+ ],
326
+ "thinker.audio_tower.layers.12.self_attn.k_proj": [
327
+ {
328
+ "alpha": 8,
329
+ "end": 8,
330
+ "rank": 8,
331
+ "start": 0
332
+ },
333
+ {
334
+ "alpha": 8,
335
+ "end": 16,
336
+ "rank": 8,
337
+ "start": 8
338
+ }
339
+ ],
340
+ "thinker.audio_tower.layers.12.self_attn.out_proj": [
341
+ {
342
+ "alpha": 8,
343
+ "end": 8,
344
+ "rank": 8,
345
+ "start": 0
346
+ },
347
+ {
348
+ "alpha": 8,
349
+ "end": 16,
350
+ "rank": 8,
351
+ "start": 8
352
+ }
353
+ ],
354
+ "thinker.audio_tower.layers.12.self_attn.q_proj": [
355
+ {
356
+ "alpha": 8,
357
+ "end": 8,
358
+ "rank": 8,
359
+ "start": 0
360
+ },
361
+ {
362
+ "alpha": 8,
363
+ "end": 16,
364
+ "rank": 8,
365
+ "start": 8
366
+ }
367
+ ],
368
+ "thinker.audio_tower.layers.12.self_attn.v_proj": [
369
+ {
370
+ "alpha": 8,
371
+ "end": 8,
372
+ "rank": 8,
373
+ "start": 0
374
+ },
375
+ {
376
+ "alpha": 8,
377
+ "end": 16,
378
+ "rank": 8,
379
+ "start": 8
380
+ }
381
+ ],
382
+ "thinker.audio_tower.layers.13.fc1": [
383
+ {
384
+ "alpha": 8,
385
+ "end": 8,
386
+ "rank": 8,
387
+ "start": 0
388
+ }
389
+ ],
390
+ "thinker.audio_tower.layers.13.fc2": [
391
+ {
392
+ "alpha": 8,
393
+ "end": 8,
394
+ "rank": 8,
395
+ "start": 0
396
+ }
397
+ ],
398
+ "thinker.audio_tower.layers.13.self_attn.k_proj": [
399
+ {
400
+ "alpha": 8,
401
+ "end": 8,
402
+ "rank": 8,
403
+ "start": 0
404
+ },
405
+ {
406
+ "alpha": 8,
407
+ "end": 16,
408
+ "rank": 8,
409
+ "start": 8
410
+ }
411
+ ],
412
+ "thinker.audio_tower.layers.13.self_attn.out_proj": [
413
+ {
414
+ "alpha": 8,
415
+ "end": 8,
416
+ "rank": 8,
417
+ "start": 0
418
+ },
419
+ {
420
+ "alpha": 8,
421
+ "end": 16,
422
+ "rank": 8,
423
+ "start": 8
424
+ }
425
+ ],
426
+ "thinker.audio_tower.layers.13.self_attn.q_proj": [
427
+ {
428
+ "alpha": 8,
429
+ "end": 8,
430
+ "rank": 8,
431
+ "start": 0
432
+ },
433
+ {
434
+ "alpha": 8,
435
+ "end": 16,
436
+ "rank": 8,
437
+ "start": 8
438
+ }
439
+ ],
440
+ "thinker.audio_tower.layers.13.self_attn.v_proj": [
441
+ {
442
+ "alpha": 8,
443
+ "end": 8,
444
+ "rank": 8,
445
+ "start": 0
446
+ },
447
+ {
448
+ "alpha": 8,
449
+ "end": 16,
450
+ "rank": 8,
451
+ "start": 8
452
+ }
453
+ ],
454
+ "thinker.audio_tower.layers.14.fc1": [
455
+ {
456
+ "alpha": 8,
457
+ "end": 8,
458
+ "rank": 8,
459
+ "start": 0
460
+ }
461
+ ],
462
+ "thinker.audio_tower.layers.14.fc2": [
463
+ {
464
+ "alpha": 8,
465
+ "end": 8,
466
+ "rank": 8,
467
+ "start": 0
468
+ }
469
+ ],
470
+ "thinker.audio_tower.layers.14.self_attn.k_proj": [
471
+ {
472
+ "alpha": 8,
473
+ "end": 8,
474
+ "rank": 8,
475
+ "start": 0
476
+ },
477
+ {
478
+ "alpha": 8,
479
+ "end": 16,
480
+ "rank": 8,
481
+ "start": 8
482
+ }
483
+ ],
484
+ "thinker.audio_tower.layers.14.self_attn.out_proj": [
485
+ {
486
+ "alpha": 8,
487
+ "end": 8,
488
+ "rank": 8,
489
+ "start": 0
490
+ },
491
+ {
492
+ "alpha": 8,
493
+ "end": 16,
494
+ "rank": 8,
495
+ "start": 8
496
+ }
497
+ ],
498
+ "thinker.audio_tower.layers.14.self_attn.q_proj": [
499
+ {
500
+ "alpha": 8,
501
+ "end": 8,
502
+ "rank": 8,
503
+ "start": 0
504
+ },
505
+ {
506
+ "alpha": 8,
507
+ "end": 16,
508
+ "rank": 8,
509
+ "start": 8
510
+ }
511
+ ],
512
+ "thinker.audio_tower.layers.14.self_attn.v_proj": [
513
+ {
514
+ "alpha": 8,
515
+ "end": 8,
516
+ "rank": 8,
517
+ "start": 0
518
+ },
519
+ {
520
+ "alpha": 8,
521
+ "end": 16,
522
+ "rank": 8,
523
+ "start": 8
524
+ }
525
+ ],
526
+ "thinker.audio_tower.layers.15.fc1": [
527
+ {
528
+ "alpha": 8,
529
+ "end": 8,
530
+ "rank": 8,
531
+ "start": 0
532
+ }
533
+ ],
534
+ "thinker.audio_tower.layers.15.fc2": [
535
+ {
536
+ "alpha": 8,
537
+ "end": 8,
538
+ "rank": 8,
539
+ "start": 0
540
+ }
541
+ ],
542
+ "thinker.audio_tower.layers.15.self_attn.k_proj": [
543
+ {
544
+ "alpha": 8,
545
+ "end": 8,
546
+ "rank": 8,
547
+ "start": 0
548
+ },
549
+ {
550
+ "alpha": 8,
551
+ "end": 16,
552
+ "rank": 8,
553
+ "start": 8
554
+ }
555
+ ],
556
+ "thinker.audio_tower.layers.15.self_attn.out_proj": [
557
+ {
558
+ "alpha": 8,
559
+ "end": 8,
560
+ "rank": 8,
561
+ "start": 0
562
+ },
563
+ {
564
+ "alpha": 8,
565
+ "end": 16,
566
+ "rank": 8,
567
+ "start": 8
568
+ }
569
+ ],
570
+ "thinker.audio_tower.layers.15.self_attn.q_proj": [
571
+ {
572
+ "alpha": 8,
573
+ "end": 8,
574
+ "rank": 8,
575
+ "start": 0
576
+ },
577
+ {
578
+ "alpha": 8,
579
+ "end": 16,
580
+ "rank": 8,
581
+ "start": 8
582
+ }
583
+ ],
584
+ "thinker.audio_tower.layers.15.self_attn.v_proj": [
585
+ {
586
+ "alpha": 8,
587
+ "end": 8,
588
+ "rank": 8,
589
+ "start": 0
590
+ },
591
+ {
592
+ "alpha": 8,
593
+ "end": 16,
594
+ "rank": 8,
595
+ "start": 8
596
+ }
597
+ ],
598
+ "thinker.audio_tower.layers.16.fc1": [
599
+ {
600
+ "alpha": 8,
601
+ "end": 8,
602
+ "rank": 8,
603
+ "start": 0
604
+ }
605
+ ],
606
+ "thinker.audio_tower.layers.16.fc2": [
607
+ {
608
+ "alpha": 8,
609
+ "end": 8,
610
+ "rank": 8,
611
+ "start": 0
612
+ }
613
+ ],
614
+ "thinker.audio_tower.layers.16.self_attn.k_proj": [
615
+ {
616
+ "alpha": 8,
617
+ "end": 8,
618
+ "rank": 8,
619
+ "start": 0
620
+ },
621
+ {
622
+ "alpha": 8,
623
+ "end": 16,
624
+ "rank": 8,
625
+ "start": 8
626
+ }
627
+ ],
628
+ "thinker.audio_tower.layers.16.self_attn.out_proj": [
629
+ {
630
+ "alpha": 8,
631
+ "end": 8,
632
+ "rank": 8,
633
+ "start": 0
634
+ },
635
+ {
636
+ "alpha": 8,
637
+ "end": 16,
638
+ "rank": 8,
639
+ "start": 8
640
+ }
641
+ ],
642
+ "thinker.audio_tower.layers.16.self_attn.q_proj": [
643
+ {
644
+ "alpha": 8,
645
+ "end": 8,
646
+ "rank": 8,
647
+ "start": 0
648
+ },
649
+ {
650
+ "alpha": 8,
651
+ "end": 16,
652
+ "rank": 8,
653
+ "start": 8
654
+ }
655
+ ],
656
+ "thinker.audio_tower.layers.16.self_attn.v_proj": [
657
+ {
658
+ "alpha": 8,
659
+ "end": 8,
660
+ "rank": 8,
661
+ "start": 0
662
+ },
663
+ {
664
+ "alpha": 8,
665
+ "end": 16,
666
+ "rank": 8,
667
+ "start": 8
668
+ }
669
+ ],
670
+ "thinker.audio_tower.layers.17.fc1": [
671
+ {
672
+ "alpha": 8,
673
+ "end": 8,
674
+ "rank": 8,
675
+ "start": 0
676
+ }
677
+ ],
678
+ "thinker.audio_tower.layers.17.fc2": [
679
+ {
680
+ "alpha": 8,
681
+ "end": 8,
682
+ "rank": 8,
683
+ "start": 0
684
+ }
685
+ ],
686
+ "thinker.audio_tower.layers.17.self_attn.k_proj": [
687
+ {
688
+ "alpha": 8,
689
+ "end": 8,
690
+ "rank": 8,
691
+ "start": 0
692
+ },
693
+ {
694
+ "alpha": 8,
695
+ "end": 16,
696
+ "rank": 8,
697
+ "start": 8
698
+ }
699
+ ],
700
+ "thinker.audio_tower.layers.17.self_attn.out_proj": [
701
+ {
702
+ "alpha": 8,
703
+ "end": 8,
704
+ "rank": 8,
705
+ "start": 0
706
+ },
707
+ {
708
+ "alpha": 8,
709
+ "end": 16,
710
+ "rank": 8,
711
+ "start": 8
712
+ }
713
+ ],
714
+ "thinker.audio_tower.layers.17.self_attn.q_proj": [
715
+ {
716
+ "alpha": 8,
717
+ "end": 8,
718
+ "rank": 8,
719
+ "start": 0
720
+ },
721
+ {
722
+ "alpha": 8,
723
+ "end": 16,
724
+ "rank": 8,
725
+ "start": 8
726
+ }
727
+ ],
728
+ "thinker.audio_tower.layers.17.self_attn.v_proj": [
729
+ {
730
+ "alpha": 8,
731
+ "end": 8,
732
+ "rank": 8,
733
+ "start": 0
734
+ },
735
+ {
736
+ "alpha": 8,
737
+ "end": 16,
738
+ "rank": 8,
739
+ "start": 8
740
+ }
741
+ ],
742
+ "thinker.audio_tower.layers.18.fc1": [
743
+ {
744
+ "alpha": 8,
745
+ "end": 8,
746
+ "rank": 8,
747
+ "start": 0
748
+ }
749
+ ],
750
+ "thinker.audio_tower.layers.18.fc2": [
751
+ {
752
+ "alpha": 8,
753
+ "end": 8,
754
+ "rank": 8,
755
+ "start": 0
756
+ }
757
+ ],
758
+ "thinker.audio_tower.layers.18.self_attn.k_proj": [
759
+ {
760
+ "alpha": 8,
761
+ "end": 8,
762
+ "rank": 8,
763
+ "start": 0
764
+ },
765
+ {
766
+ "alpha": 8,
767
+ "end": 16,
768
+ "rank": 8,
769
+ "start": 8
770
+ }
771
+ ],
772
+ "thinker.audio_tower.layers.18.self_attn.out_proj": [
773
+ {
774
+ "alpha": 8,
775
+ "end": 8,
776
+ "rank": 8,
777
+ "start": 0
778
+ },
779
+ {
780
+ "alpha": 8,
781
+ "end": 16,
782
+ "rank": 8,
783
+ "start": 8
784
+ }
785
+ ],
786
+ "thinker.audio_tower.layers.18.self_attn.q_proj": [
787
+ {
788
+ "alpha": 8,
789
+ "end": 8,
790
+ "rank": 8,
791
+ "start": 0
792
+ },
793
+ {
794
+ "alpha": 8,
795
+ "end": 16,
796
+ "rank": 8,
797
+ "start": 8
798
+ }
799
+ ],
800
+ "thinker.audio_tower.layers.18.self_attn.v_proj": [
801
+ {
802
+ "alpha": 8,
803
+ "end": 8,
804
+ "rank": 8,
805
+ "start": 0
806
+ },
807
+ {
808
+ "alpha": 8,
809
+ "end": 16,
810
+ "rank": 8,
811
+ "start": 8
812
+ }
813
+ ],
814
+ "thinker.audio_tower.layers.19.fc1": [
815
+ {
816
+ "alpha": 8,
817
+ "end": 8,
818
+ "rank": 8,
819
+ "start": 0
820
+ }
821
+ ],
822
+ "thinker.audio_tower.layers.19.fc2": [
823
+ {
824
+ "alpha": 8,
825
+ "end": 8,
826
+ "rank": 8,
827
+ "start": 0
828
+ }
829
+ ],
830
+ "thinker.audio_tower.layers.19.self_attn.k_proj": [
831
+ {
832
+ "alpha": 8,
833
+ "end": 8,
834
+ "rank": 8,
835
+ "start": 0
836
+ },
837
+ {
838
+ "alpha": 8,
839
+ "end": 16,
840
+ "rank": 8,
841
+ "start": 8
842
+ }
843
+ ],
844
+ "thinker.audio_tower.layers.19.self_attn.out_proj": [
845
+ {
846
+ "alpha": 8,
847
+ "end": 8,
848
+ "rank": 8,
849
+ "start": 0
850
+ },
851
+ {
852
+ "alpha": 8,
853
+ "end": 16,
854
+ "rank": 8,
855
+ "start": 8
856
+ }
857
+ ],
858
+ "thinker.audio_tower.layers.19.self_attn.q_proj": [
859
+ {
860
+ "alpha": 8,
861
+ "end": 8,
862
+ "rank": 8,
863
+ "start": 0
864
+ },
865
+ {
866
+ "alpha": 8,
867
+ "end": 16,
868
+ "rank": 8,
869
+ "start": 8
870
+ }
871
+ ],
872
+ "thinker.audio_tower.layers.19.self_attn.v_proj": [
873
+ {
874
+ "alpha": 8,
875
+ "end": 8,
876
+ "rank": 8,
877
+ "start": 0
878
+ },
879
+ {
880
+ "alpha": 8,
881
+ "end": 16,
882
+ "rank": 8,
883
+ "start": 8
884
+ }
885
+ ],
886
+ "thinker.audio_tower.layers.2.fc1": [
887
+ {
888
+ "alpha": 8,
889
+ "end": 8,
890
+ "rank": 8,
891
+ "start": 0
892
+ }
893
+ ],
894
+ "thinker.audio_tower.layers.2.fc2": [
895
+ {
896
+ "alpha": 8,
897
+ "end": 8,
898
+ "rank": 8,
899
+ "start": 0
900
+ }
901
+ ],
902
+ "thinker.audio_tower.layers.2.self_attn.k_proj": [
903
+ {
904
+ "alpha": 8,
905
+ "end": 8,
906
+ "rank": 8,
907
+ "start": 0
908
+ },
909
+ {
910
+ "alpha": 8,
911
+ "end": 16,
912
+ "rank": 8,
913
+ "start": 8
914
+ }
915
+ ],
916
+ "thinker.audio_tower.layers.2.self_attn.out_proj": [
917
+ {
918
+ "alpha": 8,
919
+ "end": 8,
920
+ "rank": 8,
921
+ "start": 0
922
+ },
923
+ {
924
+ "alpha": 8,
925
+ "end": 16,
926
+ "rank": 8,
927
+ "start": 8
928
+ }
929
+ ],
930
+ "thinker.audio_tower.layers.2.self_attn.q_proj": [
931
+ {
932
+ "alpha": 8,
933
+ "end": 8,
934
+ "rank": 8,
935
+ "start": 0
936
+ },
937
+ {
938
+ "alpha": 8,
939
+ "end": 16,
940
+ "rank": 8,
941
+ "start": 8
942
+ }
943
+ ],
944
+ "thinker.audio_tower.layers.2.self_attn.v_proj": [
945
+ {
946
+ "alpha": 8,
947
+ "end": 8,
948
+ "rank": 8,
949
+ "start": 0
950
+ },
951
+ {
952
+ "alpha": 8,
953
+ "end": 16,
954
+ "rank": 8,
955
+ "start": 8
956
+ }
957
+ ],
958
+ "thinker.audio_tower.layers.20.fc1": [
959
+ {
960
+ "alpha": 8,
961
+ "end": 8,
962
+ "rank": 8,
963
+ "start": 0
964
+ }
965
+ ],
966
+ "thinker.audio_tower.layers.20.fc2": [
967
+ {
968
+ "alpha": 8,
969
+ "end": 8,
970
+ "rank": 8,
971
+ "start": 0
972
+ }
973
+ ],
974
+ "thinker.audio_tower.layers.20.self_attn.k_proj": [
975
+ {
976
+ "alpha": 8,
977
+ "end": 8,
978
+ "rank": 8,
979
+ "start": 0
980
+ },
981
+ {
982
+ "alpha": 8,
983
+ "end": 16,
984
+ "rank": 8,
985
+ "start": 8
986
+ },
987
+ {
988
+ "alpha": 8,
989
+ "end": 24,
990
+ "rank": 8,
991
+ "start": 16
992
+ }
993
+ ],
994
+ "thinker.audio_tower.layers.20.self_attn.out_proj": [
995
+ {
996
+ "alpha": 8,
997
+ "end": 8,
998
+ "rank": 8,
999
+ "start": 0
1000
+ },
1001
+ {
1002
+ "alpha": 8,
1003
+ "end": 16,
1004
+ "rank": 8,
1005
+ "start": 8
1006
+ },
1007
+ {
1008
+ "alpha": 8,
1009
+ "end": 24,
1010
+ "rank": 8,
1011
+ "start": 16
1012
+ }
1013
+ ],
1014
+ "thinker.audio_tower.layers.20.self_attn.q_proj": [
1015
+ {
1016
+ "alpha": 8,
1017
+ "end": 8,
1018
+ "rank": 8,
1019
+ "start": 0
1020
+ },
1021
+ {
1022
+ "alpha": 8,
1023
+ "end": 16,
1024
+ "rank": 8,
1025
+ "start": 8
1026
+ },
1027
+ {
1028
+ "alpha": 8,
1029
+ "end": 24,
1030
+ "rank": 8,
1031
+ "start": 16
1032
+ }
1033
+ ],
1034
+ "thinker.audio_tower.layers.20.self_attn.v_proj": [
1035
+ {
1036
+ "alpha": 8,
1037
+ "end": 8,
1038
+ "rank": 8,
1039
+ "start": 0
1040
+ },
1041
+ {
1042
+ "alpha": 8,
1043
+ "end": 16,
1044
+ "rank": 8,
1045
+ "start": 8
1046
+ },
1047
+ {
1048
+ "alpha": 8,
1049
+ "end": 24,
1050
+ "rank": 8,
1051
+ "start": 16
1052
+ }
1053
+ ],
1054
+ "thinker.audio_tower.layers.21.fc1": [
1055
+ {
1056
+ "alpha": 8,
1057
+ "end": 8,
1058
+ "rank": 8,
1059
+ "start": 0
1060
+ }
1061
+ ],
1062
+ "thinker.audio_tower.layers.21.fc2": [
1063
+ {
1064
+ "alpha": 8,
1065
+ "end": 8,
1066
+ "rank": 8,
1067
+ "start": 0
1068
+ }
1069
+ ],
1070
+ "thinker.audio_tower.layers.21.self_attn.k_proj": [
1071
+ {
1072
+ "alpha": 8,
1073
+ "end": 8,
1074
+ "rank": 8,
1075
+ "start": 0
1076
+ },
1077
+ {
1078
+ "alpha": 8,
1079
+ "end": 16,
1080
+ "rank": 8,
1081
+ "start": 8
1082
+ },
1083
+ {
1084
+ "alpha": 8,
1085
+ "end": 24,
1086
+ "rank": 8,
1087
+ "start": 16
1088
+ }
1089
+ ],
1090
+ "thinker.audio_tower.layers.21.self_attn.out_proj": [
1091
+ {
1092
+ "alpha": 8,
1093
+ "end": 8,
1094
+ "rank": 8,
1095
+ "start": 0
1096
+ },
1097
+ {
1098
+ "alpha": 8,
1099
+ "end": 16,
1100
+ "rank": 8,
1101
+ "start": 8
1102
+ },
1103
+ {
1104
+ "alpha": 8,
1105
+ "end": 24,
1106
+ "rank": 8,
1107
+ "start": 16
1108
+ }
1109
+ ],
1110
+ "thinker.audio_tower.layers.21.self_attn.q_proj": [
1111
+ {
1112
+ "alpha": 8,
1113
+ "end": 8,
1114
+ "rank": 8,
1115
+ "start": 0
1116
+ },
1117
+ {
1118
+ "alpha": 8,
1119
+ "end": 16,
1120
+ "rank": 8,
1121
+ "start": 8
1122
+ },
1123
+ {
1124
+ "alpha": 8,
1125
+ "end": 24,
1126
+ "rank": 8,
1127
+ "start": 16
1128
+ }
1129
+ ],
1130
+ "thinker.audio_tower.layers.21.self_attn.v_proj": [
1131
+ {
1132
+ "alpha": 8,
1133
+ "end": 8,
1134
+ "rank": 8,
1135
+ "start": 0
1136
+ },
1137
+ {
1138
+ "alpha": 8,
1139
+ "end": 16,
1140
+ "rank": 8,
1141
+ "start": 8
1142
+ },
1143
+ {
1144
+ "alpha": 8,
1145
+ "end": 24,
1146
+ "rank": 8,
1147
+ "start": 16
1148
+ }
1149
+ ],
1150
+ "thinker.audio_tower.layers.22.fc1": [
1151
+ {
1152
+ "alpha": 8,
1153
+ "end": 8,
1154
+ "rank": 8,
1155
+ "start": 0
1156
+ }
1157
+ ],
1158
+ "thinker.audio_tower.layers.22.fc2": [
1159
+ {
1160
+ "alpha": 8,
1161
+ "end": 8,
1162
+ "rank": 8,
1163
+ "start": 0
1164
+ }
1165
+ ],
1166
+ "thinker.audio_tower.layers.22.self_attn.k_proj": [
1167
+ {
1168
+ "alpha": 8,
1169
+ "end": 8,
1170
+ "rank": 8,
1171
+ "start": 0
1172
+ },
1173
+ {
1174
+ "alpha": 8,
1175
+ "end": 16,
1176
+ "rank": 8,
1177
+ "start": 8
1178
+ },
1179
+ {
1180
+ "alpha": 8,
1181
+ "end": 24,
1182
+ "rank": 8,
1183
+ "start": 16
1184
+ }
1185
+ ],
1186
+ "thinker.audio_tower.layers.22.self_attn.out_proj": [
1187
+ {
1188
+ "alpha": 8,
1189
+ "end": 8,
1190
+ "rank": 8,
1191
+ "start": 0
1192
+ },
1193
+ {
1194
+ "alpha": 8,
1195
+ "end": 16,
1196
+ "rank": 8,
1197
+ "start": 8
1198
+ },
1199
+ {
1200
+ "alpha": 8,
1201
+ "end": 24,
1202
+ "rank": 8,
1203
+ "start": 16
1204
+ }
1205
+ ],
1206
+ "thinker.audio_tower.layers.22.self_attn.q_proj": [
1207
+ {
1208
+ "alpha": 8,
1209
+ "end": 8,
1210
+ "rank": 8,
1211
+ "start": 0
1212
+ },
1213
+ {
1214
+ "alpha": 8,
1215
+ "end": 16,
1216
+ "rank": 8,
1217
+ "start": 8
1218
+ },
1219
+ {
1220
+ "alpha": 8,
1221
+ "end": 24,
1222
+ "rank": 8,
1223
+ "start": 16
1224
+ }
1225
+ ],
1226
+ "thinker.audio_tower.layers.22.self_attn.v_proj": [
1227
+ {
1228
+ "alpha": 8,
1229
+ "end": 8,
1230
+ "rank": 8,
1231
+ "start": 0
1232
+ },
1233
+ {
1234
+ "alpha": 8,
1235
+ "end": 16,
1236
+ "rank": 8,
1237
+ "start": 8
1238
+ },
1239
+ {
1240
+ "alpha": 8,
1241
+ "end": 24,
1242
+ "rank": 8,
1243
+ "start": 16
1244
+ }
1245
+ ],
1246
+ "thinker.audio_tower.layers.23.fc1": [
1247
+ {
1248
+ "alpha": 8,
1249
+ "end": 8,
1250
+ "rank": 8,
1251
+ "start": 0
1252
+ }
1253
+ ],
1254
+ "thinker.audio_tower.layers.23.fc2": [
1255
+ {
1256
+ "alpha": 8,
1257
+ "end": 8,
1258
+ "rank": 8,
1259
+ "start": 0
1260
+ }
1261
+ ],
1262
+ "thinker.audio_tower.layers.23.self_attn.k_proj": [
1263
+ {
1264
+ "alpha": 8,
1265
+ "end": 8,
1266
+ "rank": 8,
1267
+ "start": 0
1268
+ },
1269
+ {
1270
+ "alpha": 8,
1271
+ "end": 16,
1272
+ "rank": 8,
1273
+ "start": 8
1274
+ },
1275
+ {
1276
+ "alpha": 8,
1277
+ "end": 24,
1278
+ "rank": 8,
1279
+ "start": 16
1280
+ }
1281
+ ],
1282
+ "thinker.audio_tower.layers.23.self_attn.out_proj": [
1283
+ {
1284
+ "alpha": 8,
1285
+ "end": 8,
1286
+ "rank": 8,
1287
+ "start": 0
1288
+ },
1289
+ {
1290
+ "alpha": 8,
1291
+ "end": 16,
1292
+ "rank": 8,
1293
+ "start": 8
1294
+ },
1295
+ {
1296
+ "alpha": 8,
1297
+ "end": 24,
1298
+ "rank": 8,
1299
+ "start": 16
1300
+ }
1301
+ ],
1302
+ "thinker.audio_tower.layers.23.self_attn.q_proj": [
1303
+ {
1304
+ "alpha": 8,
1305
+ "end": 8,
1306
+ "rank": 8,
1307
+ "start": 0
1308
+ },
1309
+ {
1310
+ "alpha": 8,
1311
+ "end": 16,
1312
+ "rank": 8,
1313
+ "start": 8
1314
+ },
1315
+ {
1316
+ "alpha": 8,
1317
+ "end": 24,
1318
+ "rank": 8,
1319
+ "start": 16
1320
+ }
1321
+ ],
1322
+ "thinker.audio_tower.layers.23.self_attn.v_proj": [
1323
+ {
1324
+ "alpha": 8,
1325
+ "end": 8,
1326
+ "rank": 8,
1327
+ "start": 0
1328
+ },
1329
+ {
1330
+ "alpha": 8,
1331
+ "end": 16,
1332
+ "rank": 8,
1333
+ "start": 8
1334
+ },
1335
+ {
1336
+ "alpha": 8,
1337
+ "end": 24,
1338
+ "rank": 8,
1339
+ "start": 16
1340
+ }
1341
+ ],
1342
+ "thinker.audio_tower.layers.3.fc1": [
1343
+ {
1344
+ "alpha": 8,
1345
+ "end": 8,
1346
+ "rank": 8,
1347
+ "start": 0
1348
+ }
1349
+ ],
1350
+ "thinker.audio_tower.layers.3.fc2": [
1351
+ {
1352
+ "alpha": 8,
1353
+ "end": 8,
1354
+ "rank": 8,
1355
+ "start": 0
1356
+ }
1357
+ ],
1358
+ "thinker.audio_tower.layers.3.self_attn.k_proj": [
1359
+ {
1360
+ "alpha": 8,
1361
+ "end": 8,
1362
+ "rank": 8,
1363
+ "start": 0
1364
+ },
1365
+ {
1366
+ "alpha": 8,
1367
+ "end": 16,
1368
+ "rank": 8,
1369
+ "start": 8
1370
+ }
1371
+ ],
1372
+ "thinker.audio_tower.layers.3.self_attn.out_proj": [
1373
+ {
1374
+ "alpha": 8,
1375
+ "end": 8,
1376
+ "rank": 8,
1377
+ "start": 0
1378
+ },
1379
+ {
1380
+ "alpha": 8,
1381
+ "end": 16,
1382
+ "rank": 8,
1383
+ "start": 8
1384
+ }
1385
+ ],
1386
+ "thinker.audio_tower.layers.3.self_attn.q_proj": [
1387
+ {
1388
+ "alpha": 8,
1389
+ "end": 8,
1390
+ "rank": 8,
1391
+ "start": 0
1392
+ },
1393
+ {
1394
+ "alpha": 8,
1395
+ "end": 16,
1396
+ "rank": 8,
1397
+ "start": 8
1398
+ }
1399
+ ],
1400
+ "thinker.audio_tower.layers.3.self_attn.v_proj": [
1401
+ {
1402
+ "alpha": 8,
1403
+ "end": 8,
1404
+ "rank": 8,
1405
+ "start": 0
1406
+ },
1407
+ {
1408
+ "alpha": 8,
1409
+ "end": 16,
1410
+ "rank": 8,
1411
+ "start": 8
1412
+ }
1413
+ ],
1414
+ "thinker.audio_tower.layers.4.fc1": [
1415
+ {
1416
+ "alpha": 8,
1417
+ "end": 8,
1418
+ "rank": 8,
1419
+ "start": 0
1420
+ }
1421
+ ],
1422
+ "thinker.audio_tower.layers.4.fc2": [
1423
+ {
1424
+ "alpha": 8,
1425
+ "end": 8,
1426
+ "rank": 8,
1427
+ "start": 0
1428
+ }
1429
+ ],
1430
+ "thinker.audio_tower.layers.4.self_attn.k_proj": [
1431
+ {
1432
+ "alpha": 8,
1433
+ "end": 8,
1434
+ "rank": 8,
1435
+ "start": 0
1436
+ },
1437
+ {
1438
+ "alpha": 8,
1439
+ "end": 16,
1440
+ "rank": 8,
1441
+ "start": 8
1442
+ }
1443
+ ],
1444
+ "thinker.audio_tower.layers.4.self_attn.out_proj": [
1445
+ {
1446
+ "alpha": 8,
1447
+ "end": 8,
1448
+ "rank": 8,
1449
+ "start": 0
1450
+ },
1451
+ {
1452
+ "alpha": 8,
1453
+ "end": 16,
1454
+ "rank": 8,
1455
+ "start": 8
1456
+ }
1457
+ ],
1458
+ "thinker.audio_tower.layers.4.self_attn.q_proj": [
1459
+ {
1460
+ "alpha": 8,
1461
+ "end": 8,
1462
+ "rank": 8,
1463
+ "start": 0
1464
+ },
1465
+ {
1466
+ "alpha": 8,
1467
+ "end": 16,
1468
+ "rank": 8,
1469
+ "start": 8
1470
+ }
1471
+ ],
1472
+ "thinker.audio_tower.layers.4.self_attn.v_proj": [
1473
+ {
1474
+ "alpha": 8,
1475
+ "end": 8,
1476
+ "rank": 8,
1477
+ "start": 0
1478
+ },
1479
+ {
1480
+ "alpha": 8,
1481
+ "end": 16,
1482
+ "rank": 8,
1483
+ "start": 8
1484
+ }
1485
+ ],
1486
+ "thinker.audio_tower.layers.5.fc1": [
1487
+ {
1488
+ "alpha": 8,
1489
+ "end": 8,
1490
+ "rank": 8,
1491
+ "start": 0
1492
+ }
1493
+ ],
1494
+ "thinker.audio_tower.layers.5.fc2": [
1495
+ {
1496
+ "alpha": 8,
1497
+ "end": 8,
1498
+ "rank": 8,
1499
+ "start": 0
1500
+ }
1501
+ ],
1502
+ "thinker.audio_tower.layers.5.self_attn.k_proj": [
1503
+ {
1504
+ "alpha": 8,
1505
+ "end": 8,
1506
+ "rank": 8,
1507
+ "start": 0
1508
+ },
1509
+ {
1510
+ "alpha": 8,
1511
+ "end": 16,
1512
+ "rank": 8,
1513
+ "start": 8
1514
+ }
1515
+ ],
1516
+ "thinker.audio_tower.layers.5.self_attn.out_proj": [
1517
+ {
1518
+ "alpha": 8,
1519
+ "end": 8,
1520
+ "rank": 8,
1521
+ "start": 0
1522
+ },
1523
+ {
1524
+ "alpha": 8,
1525
+ "end": 16,
1526
+ "rank": 8,
1527
+ "start": 8
1528
+ }
1529
+ ],
1530
+ "thinker.audio_tower.layers.5.self_attn.q_proj": [
1531
+ {
1532
+ "alpha": 8,
1533
+ "end": 8,
1534
+ "rank": 8,
1535
+ "start": 0
1536
+ },
1537
+ {
1538
+ "alpha": 8,
1539
+ "end": 16,
1540
+ "rank": 8,
1541
+ "start": 8
1542
+ }
1543
+ ],
1544
+ "thinker.audio_tower.layers.5.self_attn.v_proj": [
1545
+ {
1546
+ "alpha": 8,
1547
+ "end": 8,
1548
+ "rank": 8,
1549
+ "start": 0
1550
+ },
1551
+ {
1552
+ "alpha": 8,
1553
+ "end": 16,
1554
+ "rank": 8,
1555
+ "start": 8
1556
+ }
1557
+ ],
1558
+ "thinker.audio_tower.layers.6.fc1": [
1559
+ {
1560
+ "alpha": 8,
1561
+ "end": 8,
1562
+ "rank": 8,
1563
+ "start": 0
1564
+ }
1565
+ ],
1566
+ "thinker.audio_tower.layers.6.fc2": [
1567
+ {
1568
+ "alpha": 8,
1569
+ "end": 8,
1570
+ "rank": 8,
1571
+ "start": 0
1572
+ }
1573
+ ],
1574
+ "thinker.audio_tower.layers.6.self_attn.k_proj": [
1575
+ {
1576
+ "alpha": 8,
1577
+ "end": 8,
1578
+ "rank": 8,
1579
+ "start": 0
1580
+ },
1581
+ {
1582
+ "alpha": 8,
1583
+ "end": 16,
1584
+ "rank": 8,
1585
+ "start": 8
1586
+ }
1587
+ ],
1588
+ "thinker.audio_tower.layers.6.self_attn.out_proj": [
1589
+ {
1590
+ "alpha": 8,
1591
+ "end": 8,
1592
+ "rank": 8,
1593
+ "start": 0
1594
+ },
1595
+ {
1596
+ "alpha": 8,
1597
+ "end": 16,
1598
+ "rank": 8,
1599
+ "start": 8
1600
+ }
1601
+ ],
1602
+ "thinker.audio_tower.layers.6.self_attn.q_proj": [
1603
+ {
1604
+ "alpha": 8,
1605
+ "end": 8,
1606
+ "rank": 8,
1607
+ "start": 0
1608
+ },
1609
+ {
1610
+ "alpha": 8,
1611
+ "end": 16,
1612
+ "rank": 8,
1613
+ "start": 8
1614
+ }
1615
+ ],
1616
+ "thinker.audio_tower.layers.6.self_attn.v_proj": [
1617
+ {
1618
+ "alpha": 8,
1619
+ "end": 8,
1620
+ "rank": 8,
1621
+ "start": 0
1622
+ },
1623
+ {
1624
+ "alpha": 8,
1625
+ "end": 16,
1626
+ "rank": 8,
1627
+ "start": 8
1628
+ }
1629
+ ],
1630
+ "thinker.audio_tower.layers.7.fc1": [
1631
+ {
1632
+ "alpha": 8,
1633
+ "end": 8,
1634
+ "rank": 8,
1635
+ "start": 0
1636
+ }
1637
+ ],
1638
+ "thinker.audio_tower.layers.7.fc2": [
1639
+ {
1640
+ "alpha": 8,
1641
+ "end": 8,
1642
+ "rank": 8,
1643
+ "start": 0
1644
+ }
1645
+ ],
1646
+ "thinker.audio_tower.layers.7.self_attn.k_proj": [
1647
+ {
1648
+ "alpha": 8,
1649
+ "end": 8,
1650
+ "rank": 8,
1651
+ "start": 0
1652
+ },
1653
+ {
1654
+ "alpha": 8,
1655
+ "end": 16,
1656
+ "rank": 8,
1657
+ "start": 8
1658
+ }
1659
+ ],
1660
+ "thinker.audio_tower.layers.7.self_attn.out_proj": [
1661
+ {
1662
+ "alpha": 8,
1663
+ "end": 8,
1664
+ "rank": 8,
1665
+ "start": 0
1666
+ },
1667
+ {
1668
+ "alpha": 8,
1669
+ "end": 16,
1670
+ "rank": 8,
1671
+ "start": 8
1672
+ }
1673
+ ],
1674
+ "thinker.audio_tower.layers.7.self_attn.q_proj": [
1675
+ {
1676
+ "alpha": 8,
1677
+ "end": 8,
1678
+ "rank": 8,
1679
+ "start": 0
1680
+ },
1681
+ {
1682
+ "alpha": 8,
1683
+ "end": 16,
1684
+ "rank": 8,
1685
+ "start": 8
1686
+ }
1687
+ ],
1688
+ "thinker.audio_tower.layers.7.self_attn.v_proj": [
1689
+ {
1690
+ "alpha": 8,
1691
+ "end": 8,
1692
+ "rank": 8,
1693
+ "start": 0
1694
+ },
1695
+ {
1696
+ "alpha": 8,
1697
+ "end": 16,
1698
+ "rank": 8,
1699
+ "start": 8
1700
+ }
1701
+ ],
1702
+ "thinker.audio_tower.layers.8.fc1": [
1703
+ {
1704
+ "alpha": 8,
1705
+ "end": 8,
1706
+ "rank": 8,
1707
+ "start": 0
1708
+ }
1709
+ ],
1710
+ "thinker.audio_tower.layers.8.fc2": [
1711
+ {
1712
+ "alpha": 8,
1713
+ "end": 8,
1714
+ "rank": 8,
1715
+ "start": 0
1716
+ }
1717
+ ],
1718
+ "thinker.audio_tower.layers.8.self_attn.k_proj": [
1719
+ {
1720
+ "alpha": 8,
1721
+ "end": 8,
1722
+ "rank": 8,
1723
+ "start": 0
1724
+ },
1725
+ {
1726
+ "alpha": 8,
1727
+ "end": 16,
1728
+ "rank": 8,
1729
+ "start": 8
1730
+ }
1731
+ ],
1732
+ "thinker.audio_tower.layers.8.self_attn.out_proj": [
1733
+ {
1734
+ "alpha": 8,
1735
+ "end": 8,
1736
+ "rank": 8,
1737
+ "start": 0
1738
+ },
1739
+ {
1740
+ "alpha": 8,
1741
+ "end": 16,
1742
+ "rank": 8,
1743
+ "start": 8
1744
+ }
1745
+ ],
1746
+ "thinker.audio_tower.layers.8.self_attn.q_proj": [
1747
+ {
1748
+ "alpha": 8,
1749
+ "end": 8,
1750
+ "rank": 8,
1751
+ "start": 0
1752
+ },
1753
+ {
1754
+ "alpha": 8,
1755
+ "end": 16,
1756
+ "rank": 8,
1757
+ "start": 8
1758
+ }
1759
+ ],
1760
+ "thinker.audio_tower.layers.8.self_attn.v_proj": [
1761
+ {
1762
+ "alpha": 8,
1763
+ "end": 8,
1764
+ "rank": 8,
1765
+ "start": 0
1766
+ },
1767
+ {
1768
+ "alpha": 8,
1769
+ "end": 16,
1770
+ "rank": 8,
1771
+ "start": 8
1772
+ }
1773
+ ],
1774
+ "thinker.audio_tower.layers.9.fc1": [
1775
+ {
1776
+ "alpha": 8,
1777
+ "end": 8,
1778
+ "rank": 8,
1779
+ "start": 0
1780
+ }
1781
+ ],
1782
+ "thinker.audio_tower.layers.9.fc2": [
1783
+ {
1784
+ "alpha": 8,
1785
+ "end": 8,
1786
+ "rank": 8,
1787
+ "start": 0
1788
+ }
1789
+ ],
1790
+ "thinker.audio_tower.layers.9.self_attn.k_proj": [
1791
+ {
1792
+ "alpha": 8,
1793
+ "end": 8,
1794
+ "rank": 8,
1795
+ "start": 0
1796
+ },
1797
+ {
1798
+ "alpha": 8,
1799
+ "end": 16,
1800
+ "rank": 8,
1801
+ "start": 8
1802
+ }
1803
+ ],
1804
+ "thinker.audio_tower.layers.9.self_attn.out_proj": [
1805
+ {
1806
+ "alpha": 8,
1807
+ "end": 8,
1808
+ "rank": 8,
1809
+ "start": 0
1810
+ },
1811
+ {
1812
+ "alpha": 8,
1813
+ "end": 16,
1814
+ "rank": 8,
1815
+ "start": 8
1816
+ }
1817
+ ],
1818
+ "thinker.audio_tower.layers.9.self_attn.q_proj": [
1819
+ {
1820
+ "alpha": 8,
1821
+ "end": 8,
1822
+ "rank": 8,
1823
+ "start": 0
1824
+ },
1825
+ {
1826
+ "alpha": 8,
1827
+ "end": 16,
1828
+ "rank": 8,
1829
+ "start": 8
1830
+ }
1831
+ ],
1832
+ "thinker.audio_tower.layers.9.self_attn.v_proj": [
1833
+ {
1834
+ "alpha": 8,
1835
+ "end": 8,
1836
+ "rank": 8,
1837
+ "start": 0
1838
+ },
1839
+ {
1840
+ "alpha": 8,
1841
+ "end": 16,
1842
+ "rank": 8,
1843
+ "start": 8
1844
+ }
1845
+ ],
1846
+ "thinker.audio_tower.proj1": [
1847
+ {
1848
+ "alpha": 8,
1849
+ "end": 8,
1850
+ "rank": 8,
1851
+ "start": 0
1852
+ },
1853
+ {
1854
+ "alpha": 8,
1855
+ "end": 16,
1856
+ "rank": 8,
1857
+ "start": 8
1858
+ }
1859
+ ],
1860
+ "thinker.audio_tower.proj2": [
1861
+ {
1862
+ "alpha": 8,
1863
+ "end": 8,
1864
+ "rank": 8,
1865
+ "start": 0
1866
+ },
1867
+ {
1868
+ "alpha": 8,
1869
+ "end": 16,
1870
+ "rank": 8,
1871
+ "start": 8
1872
+ }
1873
+ ],
1874
+ "thinker.layers.0.mlp.down_proj": [
1875
+ {
1876
+ "alpha": 8,
1877
+ "end": 8,
1878
+ "rank": 8,
1879
+ "start": 0
1880
+ }
1881
+ ],
1882
+ "thinker.layers.0.mlp.gate_proj": [
1883
+ {
1884
+ "alpha": 8,
1885
+ "end": 8,
1886
+ "rank": 8,
1887
+ "start": 0
1888
+ }
1889
+ ],
1890
+ "thinker.layers.0.mlp.up_proj": [
1891
+ {
1892
+ "alpha": 8,
1893
+ "end": 8,
1894
+ "rank": 8,
1895
+ "start": 0
1896
+ }
1897
+ ],
1898
+ "thinker.layers.0.self_attn.k_proj": [
1899
+ {
1900
+ "alpha": 8,
1901
+ "end": 8,
1902
+ "rank": 8,
1903
+ "start": 0
1904
+ }
1905
+ ],
1906
+ "thinker.layers.0.self_attn.o_proj": [
1907
+ {
1908
+ "alpha": 8,
1909
+ "end": 8,
1910
+ "rank": 8,
1911
+ "start": 0
1912
+ }
1913
+ ],
1914
+ "thinker.layers.0.self_attn.q_proj": [
1915
+ {
1916
+ "alpha": 8,
1917
+ "end": 8,
1918
+ "rank": 8,
1919
+ "start": 0
1920
+ }
1921
+ ],
1922
+ "thinker.layers.0.self_attn.v_proj": [
1923
+ {
1924
+ "alpha": 8,
1925
+ "end": 8,
1926
+ "rank": 8,
1927
+ "start": 0
1928
+ }
1929
+ ],
1930
+ "thinker.layers.1.mlp.down_proj": [
1931
+ {
1932
+ "alpha": 8,
1933
+ "end": 8,
1934
+ "rank": 8,
1935
+ "start": 0
1936
+ }
1937
+ ],
1938
+ "thinker.layers.1.mlp.gate_proj": [
1939
+ {
1940
+ "alpha": 8,
1941
+ "end": 8,
1942
+ "rank": 8,
1943
+ "start": 0
1944
+ }
1945
+ ],
1946
+ "thinker.layers.1.mlp.up_proj": [
1947
+ {
1948
+ "alpha": 8,
1949
+ "end": 8,
1950
+ "rank": 8,
1951
+ "start": 0
1952
+ }
1953
+ ],
1954
+ "thinker.layers.1.self_attn.k_proj": [
1955
+ {
1956
+ "alpha": 8,
1957
+ "end": 8,
1958
+ "rank": 8,
1959
+ "start": 0
1960
+ }
1961
+ ],
1962
+ "thinker.layers.1.self_attn.o_proj": [
1963
+ {
1964
+ "alpha": 8,
1965
+ "end": 8,
1966
+ "rank": 8,
1967
+ "start": 0
1968
+ }
1969
+ ],
1970
+ "thinker.layers.1.self_attn.q_proj": [
1971
+ {
1972
+ "alpha": 8,
1973
+ "end": 8,
1974
+ "rank": 8,
1975
+ "start": 0
1976
+ }
1977
+ ],
1978
+ "thinker.layers.1.self_attn.v_proj": [
1979
+ {
1980
+ "alpha": 8,
1981
+ "end": 8,
1982
+ "rank": 8,
1983
+ "start": 0
1984
+ }
1985
+ ],
1986
+ "thinker.layers.10.mlp.down_proj": [
1987
+ {
1988
+ "alpha": 8,
1989
+ "end": 8,
1990
+ "rank": 8,
1991
+ "start": 0
1992
+ }
1993
+ ],
1994
+ "thinker.layers.10.mlp.gate_proj": [
1995
+ {
1996
+ "alpha": 8,
1997
+ "end": 8,
1998
+ "rank": 8,
1999
+ "start": 0
2000
+ }
2001
+ ],
2002
+ "thinker.layers.10.mlp.up_proj": [
2003
+ {
2004
+ "alpha": 8,
2005
+ "end": 8,
2006
+ "rank": 8,
2007
+ "start": 0
2008
+ }
2009
+ ],
2010
+ "thinker.layers.10.self_attn.k_proj": [
2011
+ {
2012
+ "alpha": 8,
2013
+ "end": 8,
2014
+ "rank": 8,
2015
+ "start": 0
2016
+ }
2017
+ ],
2018
+ "thinker.layers.10.self_attn.o_proj": [
2019
+ {
2020
+ "alpha": 8,
2021
+ "end": 8,
2022
+ "rank": 8,
2023
+ "start": 0
2024
+ }
2025
+ ],
2026
+ "thinker.layers.10.self_attn.q_proj": [
2027
+ {
2028
+ "alpha": 8,
2029
+ "end": 8,
2030
+ "rank": 8,
2031
+ "start": 0
2032
+ }
2033
+ ],
2034
+ "thinker.layers.10.self_attn.v_proj": [
2035
+ {
2036
+ "alpha": 8,
2037
+ "end": 8,
2038
+ "rank": 8,
2039
+ "start": 0
2040
+ }
2041
+ ],
2042
+ "thinker.layers.11.mlp.down_proj": [
2043
+ {
2044
+ "alpha": 8,
2045
+ "end": 8,
2046
+ "rank": 8,
2047
+ "start": 0
2048
+ }
2049
+ ],
2050
+ "thinker.layers.11.mlp.gate_proj": [
2051
+ {
2052
+ "alpha": 8,
2053
+ "end": 8,
2054
+ "rank": 8,
2055
+ "start": 0
2056
+ }
2057
+ ],
2058
+ "thinker.layers.11.mlp.up_proj": [
2059
+ {
2060
+ "alpha": 8,
2061
+ "end": 8,
2062
+ "rank": 8,
2063
+ "start": 0
2064
+ }
2065
+ ],
2066
+ "thinker.layers.11.self_attn.k_proj": [
2067
+ {
2068
+ "alpha": 8,
2069
+ "end": 8,
2070
+ "rank": 8,
2071
+ "start": 0
2072
+ }
2073
+ ],
2074
+ "thinker.layers.11.self_attn.o_proj": [
2075
+ {
2076
+ "alpha": 8,
2077
+ "end": 8,
2078
+ "rank": 8,
2079
+ "start": 0
2080
+ }
2081
+ ],
2082
+ "thinker.layers.11.self_attn.q_proj": [
2083
+ {
2084
+ "alpha": 8,
2085
+ "end": 8,
2086
+ "rank": 8,
2087
+ "start": 0
2088
+ }
2089
+ ],
2090
+ "thinker.layers.11.self_attn.v_proj": [
2091
+ {
2092
+ "alpha": 8,
2093
+ "end": 8,
2094
+ "rank": 8,
2095
+ "start": 0
2096
+ }
2097
+ ],
2098
+ "thinker.layers.12.mlp.down_proj": [
2099
+ {
2100
+ "alpha": 8,
2101
+ "end": 8,
2102
+ "rank": 8,
2103
+ "start": 0
2104
+ }
2105
+ ],
2106
+ "thinker.layers.12.mlp.gate_proj": [
2107
+ {
2108
+ "alpha": 8,
2109
+ "end": 8,
2110
+ "rank": 8,
2111
+ "start": 0
2112
+ }
2113
+ ],
2114
+ "thinker.layers.12.mlp.up_proj": [
2115
+ {
2116
+ "alpha": 8,
2117
+ "end": 8,
2118
+ "rank": 8,
2119
+ "start": 0
2120
+ }
2121
+ ],
2122
+ "thinker.layers.12.self_attn.k_proj": [
2123
+ {
2124
+ "alpha": 8,
2125
+ "end": 8,
2126
+ "rank": 8,
2127
+ "start": 0
2128
+ }
2129
+ ],
2130
+ "thinker.layers.12.self_attn.o_proj": [
2131
+ {
2132
+ "alpha": 8,
2133
+ "end": 8,
2134
+ "rank": 8,
2135
+ "start": 0
2136
+ }
2137
+ ],
2138
+ "thinker.layers.12.self_attn.q_proj": [
2139
+ {
2140
+ "alpha": 8,
2141
+ "end": 8,
2142
+ "rank": 8,
2143
+ "start": 0
2144
+ }
2145
+ ],
2146
+ "thinker.layers.12.self_attn.v_proj": [
2147
+ {
2148
+ "alpha": 8,
2149
+ "end": 8,
2150
+ "rank": 8,
2151
+ "start": 0
2152
+ }
2153
+ ],
2154
+ "thinker.layers.13.mlp.down_proj": [
2155
+ {
2156
+ "alpha": 8,
2157
+ "end": 8,
2158
+ "rank": 8,
2159
+ "start": 0
2160
+ }
2161
+ ],
2162
+ "thinker.layers.13.mlp.gate_proj": [
2163
+ {
2164
+ "alpha": 8,
2165
+ "end": 8,
2166
+ "rank": 8,
2167
+ "start": 0
2168
+ }
2169
+ ],
2170
+ "thinker.layers.13.mlp.up_proj": [
2171
+ {
2172
+ "alpha": 8,
2173
+ "end": 8,
2174
+ "rank": 8,
2175
+ "start": 0
2176
+ }
2177
+ ],
2178
+ "thinker.layers.13.self_attn.k_proj": [
2179
+ {
2180
+ "alpha": 8,
2181
+ "end": 8,
2182
+ "rank": 8,
2183
+ "start": 0
2184
+ }
2185
+ ],
2186
+ "thinker.layers.13.self_attn.o_proj": [
2187
+ {
2188
+ "alpha": 8,
2189
+ "end": 8,
2190
+ "rank": 8,
2191
+ "start": 0
2192
+ }
2193
+ ],
2194
+ "thinker.layers.13.self_attn.q_proj": [
2195
+ {
2196
+ "alpha": 8,
2197
+ "end": 8,
2198
+ "rank": 8,
2199
+ "start": 0
2200
+ }
2201
+ ],
2202
+ "thinker.layers.13.self_attn.v_proj": [
2203
+ {
2204
+ "alpha": 8,
2205
+ "end": 8,
2206
+ "rank": 8,
2207
+ "start": 0
2208
+ }
2209
+ ],
2210
+ "thinker.layers.14.mlp.down_proj": [
2211
+ {
2212
+ "alpha": 8,
2213
+ "end": 8,
2214
+ "rank": 8,
2215
+ "start": 0
2216
+ }
2217
+ ],
2218
+ "thinker.layers.14.mlp.gate_proj": [
2219
+ {
2220
+ "alpha": 8,
2221
+ "end": 8,
2222
+ "rank": 8,
2223
+ "start": 0
2224
+ }
2225
+ ],
2226
+ "thinker.layers.14.mlp.up_proj": [
2227
+ {
2228
+ "alpha": 8,
2229
+ "end": 8,
2230
+ "rank": 8,
2231
+ "start": 0
2232
+ }
2233
+ ],
2234
+ "thinker.layers.14.self_attn.k_proj": [
2235
+ {
2236
+ "alpha": 8,
2237
+ "end": 8,
2238
+ "rank": 8,
2239
+ "start": 0
2240
+ }
2241
+ ],
2242
+ "thinker.layers.14.self_attn.o_proj": [
2243
+ {
2244
+ "alpha": 8,
2245
+ "end": 8,
2246
+ "rank": 8,
2247
+ "start": 0
2248
+ }
2249
+ ],
2250
+ "thinker.layers.14.self_attn.q_proj": [
2251
+ {
2252
+ "alpha": 8,
2253
+ "end": 8,
2254
+ "rank": 8,
2255
+ "start": 0
2256
+ }
2257
+ ],
2258
+ "thinker.layers.14.self_attn.v_proj": [
2259
+ {
2260
+ "alpha": 8,
2261
+ "end": 8,
2262
+ "rank": 8,
2263
+ "start": 0
2264
+ }
2265
+ ],
2266
+ "thinker.layers.15.mlp.down_proj": [
2267
+ {
2268
+ "alpha": 8,
2269
+ "end": 8,
2270
+ "rank": 8,
2271
+ "start": 0
2272
+ }
2273
+ ],
2274
+ "thinker.layers.15.mlp.gate_proj": [
2275
+ {
2276
+ "alpha": 8,
2277
+ "end": 8,
2278
+ "rank": 8,
2279
+ "start": 0
2280
+ }
2281
+ ],
2282
+ "thinker.layers.15.mlp.up_proj": [
2283
+ {
2284
+ "alpha": 8,
2285
+ "end": 8,
2286
+ "rank": 8,
2287
+ "start": 0
2288
+ }
2289
+ ],
2290
+ "thinker.layers.15.self_attn.k_proj": [
2291
+ {
2292
+ "alpha": 8,
2293
+ "end": 8,
2294
+ "rank": 8,
2295
+ "start": 0
2296
+ }
2297
+ ],
2298
+ "thinker.layers.15.self_attn.o_proj": [
2299
+ {
2300
+ "alpha": 8,
2301
+ "end": 8,
2302
+ "rank": 8,
2303
+ "start": 0
2304
+ }
2305
+ ],
2306
+ "thinker.layers.15.self_attn.q_proj": [
2307
+ {
2308
+ "alpha": 8,
2309
+ "end": 8,
2310
+ "rank": 8,
2311
+ "start": 0
2312
+ }
2313
+ ],
2314
+ "thinker.layers.15.self_attn.v_proj": [
2315
+ {
2316
+ "alpha": 8,
2317
+ "end": 8,
2318
+ "rank": 8,
2319
+ "start": 0
2320
+ }
2321
+ ],
2322
+ "thinker.layers.16.mlp.down_proj": [
2323
+ {
2324
+ "alpha": 8,
2325
+ "end": 8,
2326
+ "rank": 8,
2327
+ "start": 0
2328
+ }
2329
+ ],
2330
+ "thinker.layers.16.mlp.gate_proj": [
2331
+ {
2332
+ "alpha": 8,
2333
+ "end": 8,
2334
+ "rank": 8,
2335
+ "start": 0
2336
+ }
2337
+ ],
2338
+ "thinker.layers.16.mlp.up_proj": [
2339
+ {
2340
+ "alpha": 8,
2341
+ "end": 8,
2342
+ "rank": 8,
2343
+ "start": 0
2344
+ }
2345
+ ],
2346
+ "thinker.layers.16.self_attn.k_proj": [
2347
+ {
2348
+ "alpha": 8,
2349
+ "end": 8,
2350
+ "rank": 8,
2351
+ "start": 0
2352
+ }
2353
+ ],
2354
+ "thinker.layers.16.self_attn.o_proj": [
2355
+ {
2356
+ "alpha": 8,
2357
+ "end": 8,
2358
+ "rank": 8,
2359
+ "start": 0
2360
+ }
2361
+ ],
2362
+ "thinker.layers.16.self_attn.q_proj": [
2363
+ {
2364
+ "alpha": 8,
2365
+ "end": 8,
2366
+ "rank": 8,
2367
+ "start": 0
2368
+ }
2369
+ ],
2370
+ "thinker.layers.16.self_attn.v_proj": [
2371
+ {
2372
+ "alpha": 8,
2373
+ "end": 8,
2374
+ "rank": 8,
2375
+ "start": 0
2376
+ }
2377
+ ],
2378
+ "thinker.layers.17.mlp.down_proj": [
2379
+ {
2380
+ "alpha": 8,
2381
+ "end": 8,
2382
+ "rank": 8,
2383
+ "start": 0
2384
+ }
2385
+ ],
2386
+ "thinker.layers.17.mlp.gate_proj": [
2387
+ {
2388
+ "alpha": 8,
2389
+ "end": 8,
2390
+ "rank": 8,
2391
+ "start": 0
2392
+ }
2393
+ ],
2394
+ "thinker.layers.17.mlp.up_proj": [
2395
+ {
2396
+ "alpha": 8,
2397
+ "end": 8,
2398
+ "rank": 8,
2399
+ "start": 0
2400
+ }
2401
+ ],
2402
+ "thinker.layers.17.self_attn.k_proj": [
2403
+ {
2404
+ "alpha": 8,
2405
+ "end": 8,
2406
+ "rank": 8,
2407
+ "start": 0
2408
+ }
2409
+ ],
2410
+ "thinker.layers.17.self_attn.o_proj": [
2411
+ {
2412
+ "alpha": 8,
2413
+ "end": 8,
2414
+ "rank": 8,
2415
+ "start": 0
2416
+ }
2417
+ ],
2418
+ "thinker.layers.17.self_attn.q_proj": [
2419
+ {
2420
+ "alpha": 8,
2421
+ "end": 8,
2422
+ "rank": 8,
2423
+ "start": 0
2424
+ }
2425
+ ],
2426
+ "thinker.layers.17.self_attn.v_proj": [
2427
+ {
2428
+ "alpha": 8,
2429
+ "end": 8,
2430
+ "rank": 8,
2431
+ "start": 0
2432
+ }
2433
+ ],
2434
+ "thinker.layers.18.mlp.down_proj": [
2435
+ {
2436
+ "alpha": 8,
2437
+ "end": 8,
2438
+ "rank": 8,
2439
+ "start": 0
2440
+ }
2441
+ ],
2442
+ "thinker.layers.18.mlp.gate_proj": [
2443
+ {
2444
+ "alpha": 8,
2445
+ "end": 8,
2446
+ "rank": 8,
2447
+ "start": 0
2448
+ }
2449
+ ],
2450
+ "thinker.layers.18.mlp.up_proj": [
2451
+ {
2452
+ "alpha": 8,
2453
+ "end": 8,
2454
+ "rank": 8,
2455
+ "start": 0
2456
+ }
2457
+ ],
2458
+ "thinker.layers.18.self_attn.k_proj": [
2459
+ {
2460
+ "alpha": 8,
2461
+ "end": 8,
2462
+ "rank": 8,
2463
+ "start": 0
2464
+ }
2465
+ ],
2466
+ "thinker.layers.18.self_attn.o_proj": [
2467
+ {
2468
+ "alpha": 8,
2469
+ "end": 8,
2470
+ "rank": 8,
2471
+ "start": 0
2472
+ }
2473
+ ],
2474
+ "thinker.layers.18.self_attn.q_proj": [
2475
+ {
2476
+ "alpha": 8,
2477
+ "end": 8,
2478
+ "rank": 8,
2479
+ "start": 0
2480
+ }
2481
+ ],
2482
+ "thinker.layers.18.self_attn.v_proj": [
2483
+ {
2484
+ "alpha": 8,
2485
+ "end": 8,
2486
+ "rank": 8,
2487
+ "start": 0
2488
+ }
2489
+ ],
2490
+ "thinker.layers.19.mlp.down_proj": [
2491
+ {
2492
+ "alpha": 8,
2493
+ "end": 8,
2494
+ "rank": 8,
2495
+ "start": 0
2496
+ }
2497
+ ],
2498
+ "thinker.layers.19.mlp.gate_proj": [
2499
+ {
2500
+ "alpha": 8,
2501
+ "end": 8,
2502
+ "rank": 8,
2503
+ "start": 0
2504
+ }
2505
+ ],
2506
+ "thinker.layers.19.mlp.up_proj": [
2507
+ {
2508
+ "alpha": 8,
2509
+ "end": 8,
2510
+ "rank": 8,
2511
+ "start": 0
2512
+ }
2513
+ ],
2514
+ "thinker.layers.19.self_attn.k_proj": [
2515
+ {
2516
+ "alpha": 8,
2517
+ "end": 8,
2518
+ "rank": 8,
2519
+ "start": 0
2520
+ }
2521
+ ],
2522
+ "thinker.layers.19.self_attn.o_proj": [
2523
+ {
2524
+ "alpha": 8,
2525
+ "end": 8,
2526
+ "rank": 8,
2527
+ "start": 0
2528
+ }
2529
+ ],
2530
+ "thinker.layers.19.self_attn.q_proj": [
2531
+ {
2532
+ "alpha": 8,
2533
+ "end": 8,
2534
+ "rank": 8,
2535
+ "start": 0
2536
+ }
2537
+ ],
2538
+ "thinker.layers.19.self_attn.v_proj": [
2539
+ {
2540
+ "alpha": 8,
2541
+ "end": 8,
2542
+ "rank": 8,
2543
+ "start": 0
2544
+ }
2545
+ ],
2546
+ "thinker.layers.2.mlp.down_proj": [
2547
+ {
2548
+ "alpha": 8,
2549
+ "end": 8,
2550
+ "rank": 8,
2551
+ "start": 0
2552
+ }
2553
+ ],
2554
+ "thinker.layers.2.mlp.gate_proj": [
2555
+ {
2556
+ "alpha": 8,
2557
+ "end": 8,
2558
+ "rank": 8,
2559
+ "start": 0
2560
+ }
2561
+ ],
2562
+ "thinker.layers.2.mlp.up_proj": [
2563
+ {
2564
+ "alpha": 8,
2565
+ "end": 8,
2566
+ "rank": 8,
2567
+ "start": 0
2568
+ }
2569
+ ],
2570
+ "thinker.layers.2.self_attn.k_proj": [
2571
+ {
2572
+ "alpha": 8,
2573
+ "end": 8,
2574
+ "rank": 8,
2575
+ "start": 0
2576
+ }
2577
+ ],
2578
+ "thinker.layers.2.self_attn.o_proj": [
2579
+ {
2580
+ "alpha": 8,
2581
+ "end": 8,
2582
+ "rank": 8,
2583
+ "start": 0
2584
+ }
2585
+ ],
2586
+ "thinker.layers.2.self_attn.q_proj": [
2587
+ {
2588
+ "alpha": 8,
2589
+ "end": 8,
2590
+ "rank": 8,
2591
+ "start": 0
2592
+ }
2593
+ ],
2594
+ "thinker.layers.2.self_attn.v_proj": [
2595
+ {
2596
+ "alpha": 8,
2597
+ "end": 8,
2598
+ "rank": 8,
2599
+ "start": 0
2600
+ }
2601
+ ],
2602
+ "thinker.layers.20.mlp.down_proj": [
2603
+ {
2604
+ "alpha": 8,
2605
+ "end": 8,
2606
+ "rank": 8,
2607
+ "start": 0
2608
+ }
2609
+ ],
2610
+ "thinker.layers.20.mlp.gate_proj": [
2611
+ {
2612
+ "alpha": 8,
2613
+ "end": 8,
2614
+ "rank": 8,
2615
+ "start": 0
2616
+ }
2617
+ ],
2618
+ "thinker.layers.20.mlp.up_proj": [
2619
+ {
2620
+ "alpha": 8,
2621
+ "end": 8,
2622
+ "rank": 8,
2623
+ "start": 0
2624
+ }
2625
+ ],
2626
+ "thinker.layers.20.self_attn.k_proj": [
2627
+ {
2628
+ "alpha": 8,
2629
+ "end": 8,
2630
+ "rank": 8,
2631
+ "start": 0
2632
+ }
2633
+ ],
2634
+ "thinker.layers.20.self_attn.o_proj": [
2635
+ {
2636
+ "alpha": 8,
2637
+ "end": 8,
2638
+ "rank": 8,
2639
+ "start": 0
2640
+ }
2641
+ ],
2642
+ "thinker.layers.20.self_attn.q_proj": [
2643
+ {
2644
+ "alpha": 8,
2645
+ "end": 8,
2646
+ "rank": 8,
2647
+ "start": 0
2648
+ }
2649
+ ],
2650
+ "thinker.layers.20.self_attn.v_proj": [
2651
+ {
2652
+ "alpha": 8,
2653
+ "end": 8,
2654
+ "rank": 8,
2655
+ "start": 0
2656
+ }
2657
+ ],
2658
+ "thinker.layers.21.mlp.down_proj": [
2659
+ {
2660
+ "alpha": 8,
2661
+ "end": 8,
2662
+ "rank": 8,
2663
+ "start": 0
2664
+ }
2665
+ ],
2666
+ "thinker.layers.21.mlp.gate_proj": [
2667
+ {
2668
+ "alpha": 8,
2669
+ "end": 8,
2670
+ "rank": 8,
2671
+ "start": 0
2672
+ }
2673
+ ],
2674
+ "thinker.layers.21.mlp.up_proj": [
2675
+ {
2676
+ "alpha": 8,
2677
+ "end": 8,
2678
+ "rank": 8,
2679
+ "start": 0
2680
+ }
2681
+ ],
2682
+ "thinker.layers.21.self_attn.k_proj": [
2683
+ {
2684
+ "alpha": 8,
2685
+ "end": 8,
2686
+ "rank": 8,
2687
+ "start": 0
2688
+ }
2689
+ ],
2690
+ "thinker.layers.21.self_attn.o_proj": [
2691
+ {
2692
+ "alpha": 8,
2693
+ "end": 8,
2694
+ "rank": 8,
2695
+ "start": 0
2696
+ }
2697
+ ],
2698
+ "thinker.layers.21.self_attn.q_proj": [
2699
+ {
2700
+ "alpha": 8,
2701
+ "end": 8,
2702
+ "rank": 8,
2703
+ "start": 0
2704
+ }
2705
+ ],
2706
+ "thinker.layers.21.self_attn.v_proj": [
2707
+ {
2708
+ "alpha": 8,
2709
+ "end": 8,
2710
+ "rank": 8,
2711
+ "start": 0
2712
+ }
2713
+ ],
2714
+ "thinker.layers.22.mlp.down_proj": [
2715
+ {
2716
+ "alpha": 8,
2717
+ "end": 8,
2718
+ "rank": 8,
2719
+ "start": 0
2720
+ }
2721
+ ],
2722
+ "thinker.layers.22.mlp.gate_proj": [
2723
+ {
2724
+ "alpha": 8,
2725
+ "end": 8,
2726
+ "rank": 8,
2727
+ "start": 0
2728
+ }
2729
+ ],
2730
+ "thinker.layers.22.mlp.up_proj": [
2731
+ {
2732
+ "alpha": 8,
2733
+ "end": 8,
2734
+ "rank": 8,
2735
+ "start": 0
2736
+ }
2737
+ ],
2738
+ "thinker.layers.22.self_attn.k_proj": [
2739
+ {
2740
+ "alpha": 8,
2741
+ "end": 8,
2742
+ "rank": 8,
2743
+ "start": 0
2744
+ }
2745
+ ],
2746
+ "thinker.layers.22.self_attn.o_proj": [
2747
+ {
2748
+ "alpha": 8,
2749
+ "end": 8,
2750
+ "rank": 8,
2751
+ "start": 0
2752
+ }
2753
+ ],
2754
+ "thinker.layers.22.self_attn.q_proj": [
2755
+ {
2756
+ "alpha": 8,
2757
+ "end": 8,
2758
+ "rank": 8,
2759
+ "start": 0
2760
+ }
2761
+ ],
2762
+ "thinker.layers.22.self_attn.v_proj": [
2763
+ {
2764
+ "alpha": 8,
2765
+ "end": 8,
2766
+ "rank": 8,
2767
+ "start": 0
2768
+ }
2769
+ ],
2770
+ "thinker.layers.23.mlp.down_proj": [
2771
+ {
2772
+ "alpha": 8,
2773
+ "end": 8,
2774
+ "rank": 8,
2775
+ "start": 0
2776
+ }
2777
+ ],
2778
+ "thinker.layers.23.mlp.gate_proj": [
2779
+ {
2780
+ "alpha": 8,
2781
+ "end": 8,
2782
+ "rank": 8,
2783
+ "start": 0
2784
+ }
2785
+ ],
2786
+ "thinker.layers.23.mlp.up_proj": [
2787
+ {
2788
+ "alpha": 8,
2789
+ "end": 8,
2790
+ "rank": 8,
2791
+ "start": 0
2792
+ }
2793
+ ],
2794
+ "thinker.layers.23.self_attn.k_proj": [
2795
+ {
2796
+ "alpha": 8,
2797
+ "end": 8,
2798
+ "rank": 8,
2799
+ "start": 0
2800
+ }
2801
+ ],
2802
+ "thinker.layers.23.self_attn.o_proj": [
2803
+ {
2804
+ "alpha": 8,
2805
+ "end": 8,
2806
+ "rank": 8,
2807
+ "start": 0
2808
+ }
2809
+ ],
2810
+ "thinker.layers.23.self_attn.q_proj": [
2811
+ {
2812
+ "alpha": 8,
2813
+ "end": 8,
2814
+ "rank": 8,
2815
+ "start": 0
2816
+ }
2817
+ ],
2818
+ "thinker.layers.23.self_attn.v_proj": [
2819
+ {
2820
+ "alpha": 8,
2821
+ "end": 8,
2822
+ "rank": 8,
2823
+ "start": 0
2824
+ }
2825
+ ],
2826
+ "thinker.layers.24.mlp.down_proj": [
2827
+ {
2828
+ "alpha": 8,
2829
+ "end": 8,
2830
+ "rank": 8,
2831
+ "start": 0
2832
+ }
2833
+ ],
2834
+ "thinker.layers.24.mlp.gate_proj": [
2835
+ {
2836
+ "alpha": 8,
2837
+ "end": 8,
2838
+ "rank": 8,
2839
+ "start": 0
2840
+ }
2841
+ ],
2842
+ "thinker.layers.24.mlp.up_proj": [
2843
+ {
2844
+ "alpha": 8,
2845
+ "end": 8,
2846
+ "rank": 8,
2847
+ "start": 0
2848
+ }
2849
+ ],
2850
+ "thinker.layers.24.self_attn.k_proj": [
2851
+ {
2852
+ "alpha": 8,
2853
+ "end": 8,
2854
+ "rank": 8,
2855
+ "start": 0
2856
+ }
2857
+ ],
2858
+ "thinker.layers.24.self_attn.o_proj": [
2859
+ {
2860
+ "alpha": 8,
2861
+ "end": 8,
2862
+ "rank": 8,
2863
+ "start": 0
2864
+ }
2865
+ ],
2866
+ "thinker.layers.24.self_attn.q_proj": [
2867
+ {
2868
+ "alpha": 8,
2869
+ "end": 8,
2870
+ "rank": 8,
2871
+ "start": 0
2872
+ }
2873
+ ],
2874
+ "thinker.layers.24.self_attn.v_proj": [
2875
+ {
2876
+ "alpha": 8,
2877
+ "end": 8,
2878
+ "rank": 8,
2879
+ "start": 0
2880
+ }
2881
+ ],
2882
+ "thinker.layers.25.mlp.down_proj": [
2883
+ {
2884
+ "alpha": 8,
2885
+ "end": 8,
2886
+ "rank": 8,
2887
+ "start": 0
2888
+ }
2889
+ ],
2890
+ "thinker.layers.25.mlp.gate_proj": [
2891
+ {
2892
+ "alpha": 8,
2893
+ "end": 8,
2894
+ "rank": 8,
2895
+ "start": 0
2896
+ }
2897
+ ],
2898
+ "thinker.layers.25.mlp.up_proj": [
2899
+ {
2900
+ "alpha": 8,
2901
+ "end": 8,
2902
+ "rank": 8,
2903
+ "start": 0
2904
+ }
2905
+ ],
2906
+ "thinker.layers.25.self_attn.k_proj": [
2907
+ {
2908
+ "alpha": 8,
2909
+ "end": 8,
2910
+ "rank": 8,
2911
+ "start": 0
2912
+ }
2913
+ ],
2914
+ "thinker.layers.25.self_attn.o_proj": [
2915
+ {
2916
+ "alpha": 8,
2917
+ "end": 8,
2918
+ "rank": 8,
2919
+ "start": 0
2920
+ }
2921
+ ],
2922
+ "thinker.layers.25.self_attn.q_proj": [
2923
+ {
2924
+ "alpha": 8,
2925
+ "end": 8,
2926
+ "rank": 8,
2927
+ "start": 0
2928
+ }
2929
+ ],
2930
+ "thinker.layers.25.self_attn.v_proj": [
2931
+ {
2932
+ "alpha": 8,
2933
+ "end": 8,
2934
+ "rank": 8,
2935
+ "start": 0
2936
+ }
2937
+ ],
2938
+ "thinker.layers.26.mlp.down_proj": [
2939
+ {
2940
+ "alpha": 8,
2941
+ "end": 8,
2942
+ "rank": 8,
2943
+ "start": 0
2944
+ }
2945
+ ],
2946
+ "thinker.layers.26.mlp.gate_proj": [
2947
+ {
2948
+ "alpha": 8,
2949
+ "end": 8,
2950
+ "rank": 8,
2951
+ "start": 0
2952
+ }
2953
+ ],
2954
+ "thinker.layers.26.mlp.up_proj": [
2955
+ {
2956
+ "alpha": 8,
2957
+ "end": 8,
2958
+ "rank": 8,
2959
+ "start": 0
2960
+ }
2961
+ ],
2962
+ "thinker.layers.26.self_attn.k_proj": [
2963
+ {
2964
+ "alpha": 8,
2965
+ "end": 8,
2966
+ "rank": 8,
2967
+ "start": 0
2968
+ }
2969
+ ],
2970
+ "thinker.layers.26.self_attn.o_proj": [
2971
+ {
2972
+ "alpha": 8,
2973
+ "end": 8,
2974
+ "rank": 8,
2975
+ "start": 0
2976
+ }
2977
+ ],
2978
+ "thinker.layers.26.self_attn.q_proj": [
2979
+ {
2980
+ "alpha": 8,
2981
+ "end": 8,
2982
+ "rank": 8,
2983
+ "start": 0
2984
+ }
2985
+ ],
2986
+ "thinker.layers.26.self_attn.v_proj": [
2987
+ {
2988
+ "alpha": 8,
2989
+ "end": 8,
2990
+ "rank": 8,
2991
+ "start": 0
2992
+ }
2993
+ ],
2994
+ "thinker.layers.27.mlp.down_proj": [
2995
+ {
2996
+ "alpha": 8,
2997
+ "end": 8,
2998
+ "rank": 8,
2999
+ "start": 0
3000
+ }
3001
+ ],
3002
+ "thinker.layers.27.mlp.gate_proj": [
3003
+ {
3004
+ "alpha": 8,
3005
+ "end": 8,
3006
+ "rank": 8,
3007
+ "start": 0
3008
+ }
3009
+ ],
3010
+ "thinker.layers.27.mlp.up_proj": [
3011
+ {
3012
+ "alpha": 8,
3013
+ "end": 8,
3014
+ "rank": 8,
3015
+ "start": 0
3016
+ }
3017
+ ],
3018
+ "thinker.layers.27.self_attn.k_proj": [
3019
+ {
3020
+ "alpha": 8,
3021
+ "end": 8,
3022
+ "rank": 8,
3023
+ "start": 0
3024
+ }
3025
+ ],
3026
+ "thinker.layers.27.self_attn.o_proj": [
3027
+ {
3028
+ "alpha": 8,
3029
+ "end": 8,
3030
+ "rank": 8,
3031
+ "start": 0
3032
+ }
3033
+ ],
3034
+ "thinker.layers.27.self_attn.q_proj": [
3035
+ {
3036
+ "alpha": 8,
3037
+ "end": 8,
3038
+ "rank": 8,
3039
+ "start": 0
3040
+ }
3041
+ ],
3042
+ "thinker.layers.27.self_attn.v_proj": [
3043
+ {
3044
+ "alpha": 8,
3045
+ "end": 8,
3046
+ "rank": 8,
3047
+ "start": 0
3048
+ }
3049
+ ],
3050
+ "thinker.layers.3.mlp.down_proj": [
3051
+ {
3052
+ "alpha": 8,
3053
+ "end": 8,
3054
+ "rank": 8,
3055
+ "start": 0
3056
+ }
3057
+ ],
3058
+ "thinker.layers.3.mlp.gate_proj": [
3059
+ {
3060
+ "alpha": 8,
3061
+ "end": 8,
3062
+ "rank": 8,
3063
+ "start": 0
3064
+ }
3065
+ ],
3066
+ "thinker.layers.3.mlp.up_proj": [
3067
+ {
3068
+ "alpha": 8,
3069
+ "end": 8,
3070
+ "rank": 8,
3071
+ "start": 0
3072
+ }
3073
+ ],
3074
+ "thinker.layers.3.self_attn.k_proj": [
3075
+ {
3076
+ "alpha": 8,
3077
+ "end": 8,
3078
+ "rank": 8,
3079
+ "start": 0
3080
+ }
3081
+ ],
3082
+ "thinker.layers.3.self_attn.o_proj": [
3083
+ {
3084
+ "alpha": 8,
3085
+ "end": 8,
3086
+ "rank": 8,
3087
+ "start": 0
3088
+ }
3089
+ ],
3090
+ "thinker.layers.3.self_attn.q_proj": [
3091
+ {
3092
+ "alpha": 8,
3093
+ "end": 8,
3094
+ "rank": 8,
3095
+ "start": 0
3096
+ }
3097
+ ],
3098
+ "thinker.layers.3.self_attn.v_proj": [
3099
+ {
3100
+ "alpha": 8,
3101
+ "end": 8,
3102
+ "rank": 8,
3103
+ "start": 0
3104
+ }
3105
+ ],
3106
+ "thinker.layers.4.mlp.down_proj": [
3107
+ {
3108
+ "alpha": 8,
3109
+ "end": 8,
3110
+ "rank": 8,
3111
+ "start": 0
3112
+ }
3113
+ ],
3114
+ "thinker.layers.4.mlp.gate_proj": [
3115
+ {
3116
+ "alpha": 8,
3117
+ "end": 8,
3118
+ "rank": 8,
3119
+ "start": 0
3120
+ }
3121
+ ],
3122
+ "thinker.layers.4.mlp.up_proj": [
3123
+ {
3124
+ "alpha": 8,
3125
+ "end": 8,
3126
+ "rank": 8,
3127
+ "start": 0
3128
+ }
3129
+ ],
3130
+ "thinker.layers.4.self_attn.k_proj": [
3131
+ {
3132
+ "alpha": 8,
3133
+ "end": 8,
3134
+ "rank": 8,
3135
+ "start": 0
3136
+ }
3137
+ ],
3138
+ "thinker.layers.4.self_attn.o_proj": [
3139
+ {
3140
+ "alpha": 8,
3141
+ "end": 8,
3142
+ "rank": 8,
3143
+ "start": 0
3144
+ }
3145
+ ],
3146
+ "thinker.layers.4.self_attn.q_proj": [
3147
+ {
3148
+ "alpha": 8,
3149
+ "end": 8,
3150
+ "rank": 8,
3151
+ "start": 0
3152
+ }
3153
+ ],
3154
+ "thinker.layers.4.self_attn.v_proj": [
3155
+ {
3156
+ "alpha": 8,
3157
+ "end": 8,
3158
+ "rank": 8,
3159
+ "start": 0
3160
+ }
3161
+ ],
3162
+ "thinker.layers.5.mlp.down_proj": [
3163
+ {
3164
+ "alpha": 8,
3165
+ "end": 8,
3166
+ "rank": 8,
3167
+ "start": 0
3168
+ }
3169
+ ],
3170
+ "thinker.layers.5.mlp.gate_proj": [
3171
+ {
3172
+ "alpha": 8,
3173
+ "end": 8,
3174
+ "rank": 8,
3175
+ "start": 0
3176
+ }
3177
+ ],
3178
+ "thinker.layers.5.mlp.up_proj": [
3179
+ {
3180
+ "alpha": 8,
3181
+ "end": 8,
3182
+ "rank": 8,
3183
+ "start": 0
3184
+ }
3185
+ ],
3186
+ "thinker.layers.5.self_attn.k_proj": [
3187
+ {
3188
+ "alpha": 8,
3189
+ "end": 8,
3190
+ "rank": 8,
3191
+ "start": 0
3192
+ }
3193
+ ],
3194
+ "thinker.layers.5.self_attn.o_proj": [
3195
+ {
3196
+ "alpha": 8,
3197
+ "end": 8,
3198
+ "rank": 8,
3199
+ "start": 0
3200
+ }
3201
+ ],
3202
+ "thinker.layers.5.self_attn.q_proj": [
3203
+ {
3204
+ "alpha": 8,
3205
+ "end": 8,
3206
+ "rank": 8,
3207
+ "start": 0
3208
+ }
3209
+ ],
3210
+ "thinker.layers.5.self_attn.v_proj": [
3211
+ {
3212
+ "alpha": 8,
3213
+ "end": 8,
3214
+ "rank": 8,
3215
+ "start": 0
3216
+ }
3217
+ ],
3218
+ "thinker.layers.6.mlp.down_proj": [
3219
+ {
3220
+ "alpha": 8,
3221
+ "end": 8,
3222
+ "rank": 8,
3223
+ "start": 0
3224
+ }
3225
+ ],
3226
+ "thinker.layers.6.mlp.gate_proj": [
3227
+ {
3228
+ "alpha": 8,
3229
+ "end": 8,
3230
+ "rank": 8,
3231
+ "start": 0
3232
+ }
3233
+ ],
3234
+ "thinker.layers.6.mlp.up_proj": [
3235
+ {
3236
+ "alpha": 8,
3237
+ "end": 8,
3238
+ "rank": 8,
3239
+ "start": 0
3240
+ }
3241
+ ],
3242
+ "thinker.layers.6.self_attn.k_proj": [
3243
+ {
3244
+ "alpha": 8,
3245
+ "end": 8,
3246
+ "rank": 8,
3247
+ "start": 0
3248
+ }
3249
+ ],
3250
+ "thinker.layers.6.self_attn.o_proj": [
3251
+ {
3252
+ "alpha": 8,
3253
+ "end": 8,
3254
+ "rank": 8,
3255
+ "start": 0
3256
+ }
3257
+ ],
3258
+ "thinker.layers.6.self_attn.q_proj": [
3259
+ {
3260
+ "alpha": 8,
3261
+ "end": 8,
3262
+ "rank": 8,
3263
+ "start": 0
3264
+ }
3265
+ ],
3266
+ "thinker.layers.6.self_attn.v_proj": [
3267
+ {
3268
+ "alpha": 8,
3269
+ "end": 8,
3270
+ "rank": 8,
3271
+ "start": 0
3272
+ }
3273
+ ],
3274
+ "thinker.layers.7.mlp.down_proj": [
3275
+ {
3276
+ "alpha": 8,
3277
+ "end": 8,
3278
+ "rank": 8,
3279
+ "start": 0
3280
+ }
3281
+ ],
3282
+ "thinker.layers.7.mlp.gate_proj": [
3283
+ {
3284
+ "alpha": 8,
3285
+ "end": 8,
3286
+ "rank": 8,
3287
+ "start": 0
3288
+ }
3289
+ ],
3290
+ "thinker.layers.7.mlp.up_proj": [
3291
+ {
3292
+ "alpha": 8,
3293
+ "end": 8,
3294
+ "rank": 8,
3295
+ "start": 0
3296
+ }
3297
+ ],
3298
+ "thinker.layers.7.self_attn.k_proj": [
3299
+ {
3300
+ "alpha": 8,
3301
+ "end": 8,
3302
+ "rank": 8,
3303
+ "start": 0
3304
+ }
3305
+ ],
3306
+ "thinker.layers.7.self_attn.o_proj": [
3307
+ {
3308
+ "alpha": 8,
3309
+ "end": 8,
3310
+ "rank": 8,
3311
+ "start": 0
3312
+ }
3313
+ ],
3314
+ "thinker.layers.7.self_attn.q_proj": [
3315
+ {
3316
+ "alpha": 8,
3317
+ "end": 8,
3318
+ "rank": 8,
3319
+ "start": 0
3320
+ }
3321
+ ],
3322
+ "thinker.layers.7.self_attn.v_proj": [
3323
+ {
3324
+ "alpha": 8,
3325
+ "end": 8,
3326
+ "rank": 8,
3327
+ "start": 0
3328
+ }
3329
+ ],
3330
+ "thinker.layers.8.mlp.down_proj": [
3331
+ {
3332
+ "alpha": 8,
3333
+ "end": 8,
3334
+ "rank": 8,
3335
+ "start": 0
3336
+ }
3337
+ ],
3338
+ "thinker.layers.8.mlp.gate_proj": [
3339
+ {
3340
+ "alpha": 8,
3341
+ "end": 8,
3342
+ "rank": 8,
3343
+ "start": 0
3344
+ }
3345
+ ],
3346
+ "thinker.layers.8.mlp.up_proj": [
3347
+ {
3348
+ "alpha": 8,
3349
+ "end": 8,
3350
+ "rank": 8,
3351
+ "start": 0
3352
+ }
3353
+ ],
3354
+ "thinker.layers.8.self_attn.k_proj": [
3355
+ {
3356
+ "alpha": 8,
3357
+ "end": 8,
3358
+ "rank": 8,
3359
+ "start": 0
3360
+ }
3361
+ ],
3362
+ "thinker.layers.8.self_attn.o_proj": [
3363
+ {
3364
+ "alpha": 8,
3365
+ "end": 8,
3366
+ "rank": 8,
3367
+ "start": 0
3368
+ }
3369
+ ],
3370
+ "thinker.layers.8.self_attn.q_proj": [
3371
+ {
3372
+ "alpha": 8,
3373
+ "end": 8,
3374
+ "rank": 8,
3375
+ "start": 0
3376
+ }
3377
+ ],
3378
+ "thinker.layers.8.self_attn.v_proj": [
3379
+ {
3380
+ "alpha": 8,
3381
+ "end": 8,
3382
+ "rank": 8,
3383
+ "start": 0
3384
+ }
3385
+ ],
3386
+ "thinker.layers.9.mlp.down_proj": [
3387
+ {
3388
+ "alpha": 8,
3389
+ "end": 8,
3390
+ "rank": 8,
3391
+ "start": 0
3392
+ }
3393
+ ],
3394
+ "thinker.layers.9.mlp.gate_proj": [
3395
+ {
3396
+ "alpha": 8,
3397
+ "end": 8,
3398
+ "rank": 8,
3399
+ "start": 0
3400
+ }
3401
+ ],
3402
+ "thinker.layers.9.mlp.up_proj": [
3403
+ {
3404
+ "alpha": 8,
3405
+ "end": 8,
3406
+ "rank": 8,
3407
+ "start": 0
3408
+ }
3409
+ ],
3410
+ "thinker.layers.9.self_attn.k_proj": [
3411
+ {
3412
+ "alpha": 8,
3413
+ "end": 8,
3414
+ "rank": 8,
3415
+ "start": 0
3416
+ }
3417
+ ],
3418
+ "thinker.layers.9.self_attn.o_proj": [
3419
+ {
3420
+ "alpha": 8,
3421
+ "end": 8,
3422
+ "rank": 8,
3423
+ "start": 0
3424
+ }
3425
+ ],
3426
+ "thinker.layers.9.self_attn.q_proj": [
3427
+ {
3428
+ "alpha": 8,
3429
+ "end": 8,
3430
+ "rank": 8,
3431
+ "start": 0
3432
+ }
3433
+ ],
3434
+ "thinker.layers.9.self_attn.v_proj": [
3435
+ {
3436
+ "alpha": 8,
3437
+ "end": 8,
3438
+ "rank": 8,
3439
+ "start": 0
3440
+ }
3441
+ ],
3442
+ "thinker.model.layers.0.mlp.down_proj": [
3443
+ {
3444
+ "alpha": 8,
3445
+ "end": 8,
3446
+ "rank": 8,
3447
+ "start": 0
3448
+ }
3449
+ ],
3450
+ "thinker.model.layers.0.mlp.gate_proj": [
3451
+ {
3452
+ "alpha": 8,
3453
+ "end": 8,
3454
+ "rank": 8,
3455
+ "start": 0
3456
+ }
3457
+ ],
3458
+ "thinker.model.layers.0.mlp.up_proj": [
3459
+ {
3460
+ "alpha": 8,
3461
+ "end": 8,
3462
+ "rank": 8,
3463
+ "start": 0
3464
+ }
3465
+ ],
3466
+ "thinker.model.layers.0.self_attn.k_proj": [
3467
+ {
3468
+ "alpha": 8,
3469
+ "end": 8,
3470
+ "rank": 8,
3471
+ "start": 0
3472
+ }
3473
+ ],
3474
+ "thinker.model.layers.0.self_attn.o_proj": [
3475
+ {
3476
+ "alpha": 8,
3477
+ "end": 8,
3478
+ "rank": 8,
3479
+ "start": 0
3480
+ }
3481
+ ],
3482
+ "thinker.model.layers.0.self_attn.q_proj": [
3483
+ {
3484
+ "alpha": 8,
3485
+ "end": 8,
3486
+ "rank": 8,
3487
+ "start": 0
3488
+ }
3489
+ ],
3490
+ "thinker.model.layers.0.self_attn.v_proj": [
3491
+ {
3492
+ "alpha": 8,
3493
+ "end": 8,
3494
+ "rank": 8,
3495
+ "start": 0
3496
+ }
3497
+ ],
3498
+ "thinker.model.layers.1.mlp.down_proj": [
3499
+ {
3500
+ "alpha": 8,
3501
+ "end": 8,
3502
+ "rank": 8,
3503
+ "start": 0
3504
+ }
3505
+ ],
3506
+ "thinker.model.layers.1.mlp.gate_proj": [
3507
+ {
3508
+ "alpha": 8,
3509
+ "end": 8,
3510
+ "rank": 8,
3511
+ "start": 0
3512
+ }
3513
+ ],
3514
+ "thinker.model.layers.1.mlp.up_proj": [
3515
+ {
3516
+ "alpha": 8,
3517
+ "end": 8,
3518
+ "rank": 8,
3519
+ "start": 0
3520
+ }
3521
+ ],
3522
+ "thinker.model.layers.1.self_attn.k_proj": [
3523
+ {
3524
+ "alpha": 8,
3525
+ "end": 8,
3526
+ "rank": 8,
3527
+ "start": 0
3528
+ }
3529
+ ],
3530
+ "thinker.model.layers.1.self_attn.o_proj": [
3531
+ {
3532
+ "alpha": 8,
3533
+ "end": 8,
3534
+ "rank": 8,
3535
+ "start": 0
3536
+ }
3537
+ ],
3538
+ "thinker.model.layers.1.self_attn.q_proj": [
3539
+ {
3540
+ "alpha": 8,
3541
+ "end": 8,
3542
+ "rank": 8,
3543
+ "start": 0
3544
+ }
3545
+ ],
3546
+ "thinker.model.layers.1.self_attn.v_proj": [
3547
+ {
3548
+ "alpha": 8,
3549
+ "end": 8,
3550
+ "rank": 8,
3551
+ "start": 0
3552
+ }
3553
+ ],
3554
+ "thinker.model.layers.10.mlp.down_proj": [
3555
+ {
3556
+ "alpha": 8,
3557
+ "end": 8,
3558
+ "rank": 8,
3559
+ "start": 0
3560
+ }
3561
+ ],
3562
+ "thinker.model.layers.10.mlp.gate_proj": [
3563
+ {
3564
+ "alpha": 8,
3565
+ "end": 8,
3566
+ "rank": 8,
3567
+ "start": 0
3568
+ }
3569
+ ],
3570
+ "thinker.model.layers.10.mlp.up_proj": [
3571
+ {
3572
+ "alpha": 8,
3573
+ "end": 8,
3574
+ "rank": 8,
3575
+ "start": 0
3576
+ }
3577
+ ],
3578
+ "thinker.model.layers.10.self_attn.k_proj": [
3579
+ {
3580
+ "alpha": 8,
3581
+ "end": 8,
3582
+ "rank": 8,
3583
+ "start": 0
3584
+ }
3585
+ ],
3586
+ "thinker.model.layers.10.self_attn.o_proj": [
3587
+ {
3588
+ "alpha": 8,
3589
+ "end": 8,
3590
+ "rank": 8,
3591
+ "start": 0
3592
+ }
3593
+ ],
3594
+ "thinker.model.layers.10.self_attn.q_proj": [
3595
+ {
3596
+ "alpha": 8,
3597
+ "end": 8,
3598
+ "rank": 8,
3599
+ "start": 0
3600
+ }
3601
+ ],
3602
+ "thinker.model.layers.10.self_attn.v_proj": [
3603
+ {
3604
+ "alpha": 8,
3605
+ "end": 8,
3606
+ "rank": 8,
3607
+ "start": 0
3608
+ }
3609
+ ],
3610
+ "thinker.model.layers.11.mlp.down_proj": [
3611
+ {
3612
+ "alpha": 8,
3613
+ "end": 8,
3614
+ "rank": 8,
3615
+ "start": 0
3616
+ }
3617
+ ],
3618
+ "thinker.model.layers.11.mlp.gate_proj": [
3619
+ {
3620
+ "alpha": 8,
3621
+ "end": 8,
3622
+ "rank": 8,
3623
+ "start": 0
3624
+ }
3625
+ ],
3626
+ "thinker.model.layers.11.mlp.up_proj": [
3627
+ {
3628
+ "alpha": 8,
3629
+ "end": 8,
3630
+ "rank": 8,
3631
+ "start": 0
3632
+ }
3633
+ ],
3634
+ "thinker.model.layers.11.self_attn.k_proj": [
3635
+ {
3636
+ "alpha": 8,
3637
+ "end": 8,
3638
+ "rank": 8,
3639
+ "start": 0
3640
+ }
3641
+ ],
3642
+ "thinker.model.layers.11.self_attn.o_proj": [
3643
+ {
3644
+ "alpha": 8,
3645
+ "end": 8,
3646
+ "rank": 8,
3647
+ "start": 0
3648
+ }
3649
+ ],
3650
+ "thinker.model.layers.11.self_attn.q_proj": [
3651
+ {
3652
+ "alpha": 8,
3653
+ "end": 8,
3654
+ "rank": 8,
3655
+ "start": 0
3656
+ }
3657
+ ],
3658
+ "thinker.model.layers.11.self_attn.v_proj": [
3659
+ {
3660
+ "alpha": 8,
3661
+ "end": 8,
3662
+ "rank": 8,
3663
+ "start": 0
3664
+ }
3665
+ ],
3666
+ "thinker.model.layers.12.mlp.down_proj": [
3667
+ {
3668
+ "alpha": 8,
3669
+ "end": 8,
3670
+ "rank": 8,
3671
+ "start": 0
3672
+ }
3673
+ ],
3674
+ "thinker.model.layers.12.mlp.gate_proj": [
3675
+ {
3676
+ "alpha": 8,
3677
+ "end": 8,
3678
+ "rank": 8,
3679
+ "start": 0
3680
+ }
3681
+ ],
3682
+ "thinker.model.layers.12.mlp.up_proj": [
3683
+ {
3684
+ "alpha": 8,
3685
+ "end": 8,
3686
+ "rank": 8,
3687
+ "start": 0
3688
+ }
3689
+ ],
3690
+ "thinker.model.layers.12.self_attn.k_proj": [
3691
+ {
3692
+ "alpha": 8,
3693
+ "end": 8,
3694
+ "rank": 8,
3695
+ "start": 0
3696
+ }
3697
+ ],
3698
+ "thinker.model.layers.12.self_attn.o_proj": [
3699
+ {
3700
+ "alpha": 8,
3701
+ "end": 8,
3702
+ "rank": 8,
3703
+ "start": 0
3704
+ }
3705
+ ],
3706
+ "thinker.model.layers.12.self_attn.q_proj": [
3707
+ {
3708
+ "alpha": 8,
3709
+ "end": 8,
3710
+ "rank": 8,
3711
+ "start": 0
3712
+ }
3713
+ ],
3714
+ "thinker.model.layers.12.self_attn.v_proj": [
3715
+ {
3716
+ "alpha": 8,
3717
+ "end": 8,
3718
+ "rank": 8,
3719
+ "start": 0
3720
+ }
3721
+ ],
3722
+ "thinker.model.layers.13.mlp.down_proj": [
3723
+ {
3724
+ "alpha": 8,
3725
+ "end": 8,
3726
+ "rank": 8,
3727
+ "start": 0
3728
+ }
3729
+ ],
3730
+ "thinker.model.layers.13.mlp.gate_proj": [
3731
+ {
3732
+ "alpha": 8,
3733
+ "end": 8,
3734
+ "rank": 8,
3735
+ "start": 0
3736
+ }
3737
+ ],
3738
+ "thinker.model.layers.13.mlp.up_proj": [
3739
+ {
3740
+ "alpha": 8,
3741
+ "end": 8,
3742
+ "rank": 8,
3743
+ "start": 0
3744
+ }
3745
+ ],
3746
+ "thinker.model.layers.13.self_attn.k_proj": [
3747
+ {
3748
+ "alpha": 8,
3749
+ "end": 8,
3750
+ "rank": 8,
3751
+ "start": 0
3752
+ }
3753
+ ],
3754
+ "thinker.model.layers.13.self_attn.o_proj": [
3755
+ {
3756
+ "alpha": 8,
3757
+ "end": 8,
3758
+ "rank": 8,
3759
+ "start": 0
3760
+ }
3761
+ ],
3762
+ "thinker.model.layers.13.self_attn.q_proj": [
3763
+ {
3764
+ "alpha": 8,
3765
+ "end": 8,
3766
+ "rank": 8,
3767
+ "start": 0
3768
+ }
3769
+ ],
3770
+ "thinker.model.layers.13.self_attn.v_proj": [
3771
+ {
3772
+ "alpha": 8,
3773
+ "end": 8,
3774
+ "rank": 8,
3775
+ "start": 0
3776
+ }
3777
+ ],
3778
+ "thinker.model.layers.14.mlp.down_proj": [
3779
+ {
3780
+ "alpha": 8,
3781
+ "end": 8,
3782
+ "rank": 8,
3783
+ "start": 0
3784
+ }
3785
+ ],
3786
+ "thinker.model.layers.14.mlp.gate_proj": [
3787
+ {
3788
+ "alpha": 8,
3789
+ "end": 8,
3790
+ "rank": 8,
3791
+ "start": 0
3792
+ }
3793
+ ],
3794
+ "thinker.model.layers.14.mlp.up_proj": [
3795
+ {
3796
+ "alpha": 8,
3797
+ "end": 8,
3798
+ "rank": 8,
3799
+ "start": 0
3800
+ }
3801
+ ],
3802
+ "thinker.model.layers.14.self_attn.k_proj": [
3803
+ {
3804
+ "alpha": 8,
3805
+ "end": 8,
3806
+ "rank": 8,
3807
+ "start": 0
3808
+ }
3809
+ ],
3810
+ "thinker.model.layers.14.self_attn.o_proj": [
3811
+ {
3812
+ "alpha": 8,
3813
+ "end": 8,
3814
+ "rank": 8,
3815
+ "start": 0
3816
+ }
3817
+ ],
3818
+ "thinker.model.layers.14.self_attn.q_proj": [
3819
+ {
3820
+ "alpha": 8,
3821
+ "end": 8,
3822
+ "rank": 8,
3823
+ "start": 0
3824
+ }
3825
+ ],
3826
+ "thinker.model.layers.14.self_attn.v_proj": [
3827
+ {
3828
+ "alpha": 8,
3829
+ "end": 8,
3830
+ "rank": 8,
3831
+ "start": 0
3832
+ }
3833
+ ],
3834
+ "thinker.model.layers.15.mlp.down_proj": [
3835
+ {
3836
+ "alpha": 8,
3837
+ "end": 8,
3838
+ "rank": 8,
3839
+ "start": 0
3840
+ }
3841
+ ],
3842
+ "thinker.model.layers.15.mlp.gate_proj": [
3843
+ {
3844
+ "alpha": 8,
3845
+ "end": 8,
3846
+ "rank": 8,
3847
+ "start": 0
3848
+ }
3849
+ ],
3850
+ "thinker.model.layers.15.mlp.up_proj": [
3851
+ {
3852
+ "alpha": 8,
3853
+ "end": 8,
3854
+ "rank": 8,
3855
+ "start": 0
3856
+ }
3857
+ ],
3858
+ "thinker.model.layers.15.self_attn.k_proj": [
3859
+ {
3860
+ "alpha": 8,
3861
+ "end": 8,
3862
+ "rank": 8,
3863
+ "start": 0
3864
+ }
3865
+ ],
3866
+ "thinker.model.layers.15.self_attn.o_proj": [
3867
+ {
3868
+ "alpha": 8,
3869
+ "end": 8,
3870
+ "rank": 8,
3871
+ "start": 0
3872
+ }
3873
+ ],
3874
+ "thinker.model.layers.15.self_attn.q_proj": [
3875
+ {
3876
+ "alpha": 8,
3877
+ "end": 8,
3878
+ "rank": 8,
3879
+ "start": 0
3880
+ }
3881
+ ],
3882
+ "thinker.model.layers.15.self_attn.v_proj": [
3883
+ {
3884
+ "alpha": 8,
3885
+ "end": 8,
3886
+ "rank": 8,
3887
+ "start": 0
3888
+ }
3889
+ ],
3890
+ "thinker.model.layers.16.mlp.down_proj": [
3891
+ {
3892
+ "alpha": 8,
3893
+ "end": 8,
3894
+ "rank": 8,
3895
+ "start": 0
3896
+ }
3897
+ ],
3898
+ "thinker.model.layers.16.mlp.gate_proj": [
3899
+ {
3900
+ "alpha": 8,
3901
+ "end": 8,
3902
+ "rank": 8,
3903
+ "start": 0
3904
+ }
3905
+ ],
3906
+ "thinker.model.layers.16.mlp.up_proj": [
3907
+ {
3908
+ "alpha": 8,
3909
+ "end": 8,
3910
+ "rank": 8,
3911
+ "start": 0
3912
+ }
3913
+ ],
3914
+ "thinker.model.layers.16.self_attn.k_proj": [
3915
+ {
3916
+ "alpha": 8,
3917
+ "end": 8,
3918
+ "rank": 8,
3919
+ "start": 0
3920
+ }
3921
+ ],
3922
+ "thinker.model.layers.16.self_attn.o_proj": [
3923
+ {
3924
+ "alpha": 8,
3925
+ "end": 8,
3926
+ "rank": 8,
3927
+ "start": 0
3928
+ }
3929
+ ],
3930
+ "thinker.model.layers.16.self_attn.q_proj": [
3931
+ {
3932
+ "alpha": 8,
3933
+ "end": 8,
3934
+ "rank": 8,
3935
+ "start": 0
3936
+ }
3937
+ ],
3938
+ "thinker.model.layers.16.self_attn.v_proj": [
3939
+ {
3940
+ "alpha": 8,
3941
+ "end": 8,
3942
+ "rank": 8,
3943
+ "start": 0
3944
+ }
3945
+ ],
3946
+ "thinker.model.layers.17.mlp.down_proj": [
3947
+ {
3948
+ "alpha": 8,
3949
+ "end": 8,
3950
+ "rank": 8,
3951
+ "start": 0
3952
+ }
3953
+ ],
3954
+ "thinker.model.layers.17.mlp.gate_proj": [
3955
+ {
3956
+ "alpha": 8,
3957
+ "end": 8,
3958
+ "rank": 8,
3959
+ "start": 0
3960
+ }
3961
+ ],
3962
+ "thinker.model.layers.17.mlp.up_proj": [
3963
+ {
3964
+ "alpha": 8,
3965
+ "end": 8,
3966
+ "rank": 8,
3967
+ "start": 0
3968
+ }
3969
+ ],
3970
+ "thinker.model.layers.17.self_attn.k_proj": [
3971
+ {
3972
+ "alpha": 8,
3973
+ "end": 8,
3974
+ "rank": 8,
3975
+ "start": 0
3976
+ }
3977
+ ],
3978
+ "thinker.model.layers.17.self_attn.o_proj": [
3979
+ {
3980
+ "alpha": 8,
3981
+ "end": 8,
3982
+ "rank": 8,
3983
+ "start": 0
3984
+ }
3985
+ ],
3986
+ "thinker.model.layers.17.self_attn.q_proj": [
3987
+ {
3988
+ "alpha": 8,
3989
+ "end": 8,
3990
+ "rank": 8,
3991
+ "start": 0
3992
+ }
3993
+ ],
3994
+ "thinker.model.layers.17.self_attn.v_proj": [
3995
+ {
3996
+ "alpha": 8,
3997
+ "end": 8,
3998
+ "rank": 8,
3999
+ "start": 0
4000
+ }
4001
+ ],
4002
+ "thinker.model.layers.18.mlp.down_proj": [
4003
+ {
4004
+ "alpha": 8,
4005
+ "end": 8,
4006
+ "rank": 8,
4007
+ "start": 0
4008
+ }
4009
+ ],
4010
+ "thinker.model.layers.18.mlp.gate_proj": [
4011
+ {
4012
+ "alpha": 8,
4013
+ "end": 8,
4014
+ "rank": 8,
4015
+ "start": 0
4016
+ }
4017
+ ],
4018
+ "thinker.model.layers.18.mlp.up_proj": [
4019
+ {
4020
+ "alpha": 8,
4021
+ "end": 8,
4022
+ "rank": 8,
4023
+ "start": 0
4024
+ }
4025
+ ],
4026
+ "thinker.model.layers.18.self_attn.k_proj": [
4027
+ {
4028
+ "alpha": 8,
4029
+ "end": 8,
4030
+ "rank": 8,
4031
+ "start": 0
4032
+ }
4033
+ ],
4034
+ "thinker.model.layers.18.self_attn.o_proj": [
4035
+ {
4036
+ "alpha": 8,
4037
+ "end": 8,
4038
+ "rank": 8,
4039
+ "start": 0
4040
+ }
4041
+ ],
4042
+ "thinker.model.layers.18.self_attn.q_proj": [
4043
+ {
4044
+ "alpha": 8,
4045
+ "end": 8,
4046
+ "rank": 8,
4047
+ "start": 0
4048
+ }
4049
+ ],
4050
+ "thinker.model.layers.18.self_attn.v_proj": [
4051
+ {
4052
+ "alpha": 8,
4053
+ "end": 8,
4054
+ "rank": 8,
4055
+ "start": 0
4056
+ }
4057
+ ],
4058
+ "thinker.model.layers.19.mlp.down_proj": [
4059
+ {
4060
+ "alpha": 8,
4061
+ "end": 8,
4062
+ "rank": 8,
4063
+ "start": 0
4064
+ }
4065
+ ],
4066
+ "thinker.model.layers.19.mlp.gate_proj": [
4067
+ {
4068
+ "alpha": 8,
4069
+ "end": 8,
4070
+ "rank": 8,
4071
+ "start": 0
4072
+ }
4073
+ ],
4074
+ "thinker.model.layers.19.mlp.up_proj": [
4075
+ {
4076
+ "alpha": 8,
4077
+ "end": 8,
4078
+ "rank": 8,
4079
+ "start": 0
4080
+ }
4081
+ ],
4082
+ "thinker.model.layers.19.self_attn.k_proj": [
4083
+ {
4084
+ "alpha": 8,
4085
+ "end": 8,
4086
+ "rank": 8,
4087
+ "start": 0
4088
+ }
4089
+ ],
4090
+ "thinker.model.layers.19.self_attn.o_proj": [
4091
+ {
4092
+ "alpha": 8,
4093
+ "end": 8,
4094
+ "rank": 8,
4095
+ "start": 0
4096
+ }
4097
+ ],
4098
+ "thinker.model.layers.19.self_attn.q_proj": [
4099
+ {
4100
+ "alpha": 8,
4101
+ "end": 8,
4102
+ "rank": 8,
4103
+ "start": 0
4104
+ }
4105
+ ],
4106
+ "thinker.model.layers.19.self_attn.v_proj": [
4107
+ {
4108
+ "alpha": 8,
4109
+ "end": 8,
4110
+ "rank": 8,
4111
+ "start": 0
4112
+ }
4113
+ ],
4114
+ "thinker.model.layers.2.mlp.down_proj": [
4115
+ {
4116
+ "alpha": 8,
4117
+ "end": 8,
4118
+ "rank": 8,
4119
+ "start": 0
4120
+ }
4121
+ ],
4122
+ "thinker.model.layers.2.mlp.gate_proj": [
4123
+ {
4124
+ "alpha": 8,
4125
+ "end": 8,
4126
+ "rank": 8,
4127
+ "start": 0
4128
+ }
4129
+ ],
4130
+ "thinker.model.layers.2.mlp.up_proj": [
4131
+ {
4132
+ "alpha": 8,
4133
+ "end": 8,
4134
+ "rank": 8,
4135
+ "start": 0
4136
+ }
4137
+ ],
4138
+ "thinker.model.layers.2.self_attn.k_proj": [
4139
+ {
4140
+ "alpha": 8,
4141
+ "end": 8,
4142
+ "rank": 8,
4143
+ "start": 0
4144
+ }
4145
+ ],
4146
+ "thinker.model.layers.2.self_attn.o_proj": [
4147
+ {
4148
+ "alpha": 8,
4149
+ "end": 8,
4150
+ "rank": 8,
4151
+ "start": 0
4152
+ }
4153
+ ],
4154
+ "thinker.model.layers.2.self_attn.q_proj": [
4155
+ {
4156
+ "alpha": 8,
4157
+ "end": 8,
4158
+ "rank": 8,
4159
+ "start": 0
4160
+ }
4161
+ ],
4162
+ "thinker.model.layers.2.self_attn.v_proj": [
4163
+ {
4164
+ "alpha": 8,
4165
+ "end": 8,
4166
+ "rank": 8,
4167
+ "start": 0
4168
+ }
4169
+ ],
4170
+ "thinker.model.layers.20.mlp.down_proj": [
4171
+ {
4172
+ "alpha": 8,
4173
+ "end": 8,
4174
+ "rank": 8,
4175
+ "start": 0
4176
+ }
4177
+ ],
4178
+ "thinker.model.layers.20.mlp.gate_proj": [
4179
+ {
4180
+ "alpha": 8,
4181
+ "end": 8,
4182
+ "rank": 8,
4183
+ "start": 0
4184
+ }
4185
+ ],
4186
+ "thinker.model.layers.20.mlp.up_proj": [
4187
+ {
4188
+ "alpha": 8,
4189
+ "end": 8,
4190
+ "rank": 8,
4191
+ "start": 0
4192
+ }
4193
+ ],
4194
+ "thinker.model.layers.20.self_attn.k_proj": [
4195
+ {
4196
+ "alpha": 8,
4197
+ "end": 8,
4198
+ "rank": 8,
4199
+ "start": 0
4200
+ }
4201
+ ],
4202
+ "thinker.model.layers.20.self_attn.o_proj": [
4203
+ {
4204
+ "alpha": 8,
4205
+ "end": 8,
4206
+ "rank": 8,
4207
+ "start": 0
4208
+ }
4209
+ ],
4210
+ "thinker.model.layers.20.self_attn.q_proj": [
4211
+ {
4212
+ "alpha": 8,
4213
+ "end": 8,
4214
+ "rank": 8,
4215
+ "start": 0
4216
+ }
4217
+ ],
4218
+ "thinker.model.layers.20.self_attn.v_proj": [
4219
+ {
4220
+ "alpha": 8,
4221
+ "end": 8,
4222
+ "rank": 8,
4223
+ "start": 0
4224
+ }
4225
+ ],
4226
+ "thinker.model.layers.21.mlp.down_proj": [
4227
+ {
4228
+ "alpha": 8,
4229
+ "end": 8,
4230
+ "rank": 8,
4231
+ "start": 0
4232
+ }
4233
+ ],
4234
+ "thinker.model.layers.21.mlp.gate_proj": [
4235
+ {
4236
+ "alpha": 8,
4237
+ "end": 8,
4238
+ "rank": 8,
4239
+ "start": 0
4240
+ }
4241
+ ],
4242
+ "thinker.model.layers.21.mlp.up_proj": [
4243
+ {
4244
+ "alpha": 8,
4245
+ "end": 8,
4246
+ "rank": 8,
4247
+ "start": 0
4248
+ }
4249
+ ],
4250
+ "thinker.model.layers.21.self_attn.k_proj": [
4251
+ {
4252
+ "alpha": 8,
4253
+ "end": 8,
4254
+ "rank": 8,
4255
+ "start": 0
4256
+ }
4257
+ ],
4258
+ "thinker.model.layers.21.self_attn.o_proj": [
4259
+ {
4260
+ "alpha": 8,
4261
+ "end": 8,
4262
+ "rank": 8,
4263
+ "start": 0
4264
+ }
4265
+ ],
4266
+ "thinker.model.layers.21.self_attn.q_proj": [
4267
+ {
4268
+ "alpha": 8,
4269
+ "end": 8,
4270
+ "rank": 8,
4271
+ "start": 0
4272
+ }
4273
+ ],
4274
+ "thinker.model.layers.21.self_attn.v_proj": [
4275
+ {
4276
+ "alpha": 8,
4277
+ "end": 8,
4278
+ "rank": 8,
4279
+ "start": 0
4280
+ }
4281
+ ],
4282
+ "thinker.model.layers.22.mlp.down_proj": [
4283
+ {
4284
+ "alpha": 8,
4285
+ "end": 8,
4286
+ "rank": 8,
4287
+ "start": 0
4288
+ }
4289
+ ],
4290
+ "thinker.model.layers.22.mlp.gate_proj": [
4291
+ {
4292
+ "alpha": 8,
4293
+ "end": 8,
4294
+ "rank": 8,
4295
+ "start": 0
4296
+ }
4297
+ ],
4298
+ "thinker.model.layers.22.mlp.up_proj": [
4299
+ {
4300
+ "alpha": 8,
4301
+ "end": 8,
4302
+ "rank": 8,
4303
+ "start": 0
4304
+ }
4305
+ ],
4306
+ "thinker.model.layers.22.self_attn.k_proj": [
4307
+ {
4308
+ "alpha": 8,
4309
+ "end": 8,
4310
+ "rank": 8,
4311
+ "start": 0
4312
+ }
4313
+ ],
4314
+ "thinker.model.layers.22.self_attn.o_proj": [
4315
+ {
4316
+ "alpha": 8,
4317
+ "end": 8,
4318
+ "rank": 8,
4319
+ "start": 0
4320
+ }
4321
+ ],
4322
+ "thinker.model.layers.22.self_attn.q_proj": [
4323
+ {
4324
+ "alpha": 8,
4325
+ "end": 8,
4326
+ "rank": 8,
4327
+ "start": 0
4328
+ }
4329
+ ],
4330
+ "thinker.model.layers.22.self_attn.v_proj": [
4331
+ {
4332
+ "alpha": 8,
4333
+ "end": 8,
4334
+ "rank": 8,
4335
+ "start": 0
4336
+ }
4337
+ ],
4338
+ "thinker.model.layers.23.mlp.down_proj": [
4339
+ {
4340
+ "alpha": 8,
4341
+ "end": 8,
4342
+ "rank": 8,
4343
+ "start": 0
4344
+ }
4345
+ ],
4346
+ "thinker.model.layers.23.mlp.gate_proj": [
4347
+ {
4348
+ "alpha": 8,
4349
+ "end": 8,
4350
+ "rank": 8,
4351
+ "start": 0
4352
+ }
4353
+ ],
4354
+ "thinker.model.layers.23.mlp.up_proj": [
4355
+ {
4356
+ "alpha": 8,
4357
+ "end": 8,
4358
+ "rank": 8,
4359
+ "start": 0
4360
+ }
4361
+ ],
4362
+ "thinker.model.layers.23.self_attn.k_proj": [
4363
+ {
4364
+ "alpha": 8,
4365
+ "end": 8,
4366
+ "rank": 8,
4367
+ "start": 0
4368
+ }
4369
+ ],
4370
+ "thinker.model.layers.23.self_attn.o_proj": [
4371
+ {
4372
+ "alpha": 8,
4373
+ "end": 8,
4374
+ "rank": 8,
4375
+ "start": 0
4376
+ }
4377
+ ],
4378
+ "thinker.model.layers.23.self_attn.q_proj": [
4379
+ {
4380
+ "alpha": 8,
4381
+ "end": 8,
4382
+ "rank": 8,
4383
+ "start": 0
4384
+ }
4385
+ ],
4386
+ "thinker.model.layers.23.self_attn.v_proj": [
4387
+ {
4388
+ "alpha": 8,
4389
+ "end": 8,
4390
+ "rank": 8,
4391
+ "start": 0
4392
+ }
4393
+ ],
4394
+ "thinker.model.layers.24.mlp.down_proj": [
4395
+ {
4396
+ "alpha": 8,
4397
+ "end": 8,
4398
+ "rank": 8,
4399
+ "start": 0
4400
+ }
4401
+ ],
4402
+ "thinker.model.layers.24.mlp.gate_proj": [
4403
+ {
4404
+ "alpha": 8,
4405
+ "end": 8,
4406
+ "rank": 8,
4407
+ "start": 0
4408
+ }
4409
+ ],
4410
+ "thinker.model.layers.24.mlp.up_proj": [
4411
+ {
4412
+ "alpha": 8,
4413
+ "end": 8,
4414
+ "rank": 8,
4415
+ "start": 0
4416
+ }
4417
+ ],
4418
+ "thinker.model.layers.24.self_attn.k_proj": [
4419
+ {
4420
+ "alpha": 8,
4421
+ "end": 8,
4422
+ "rank": 8,
4423
+ "start": 0
4424
+ }
4425
+ ],
4426
+ "thinker.model.layers.24.self_attn.o_proj": [
4427
+ {
4428
+ "alpha": 8,
4429
+ "end": 8,
4430
+ "rank": 8,
4431
+ "start": 0
4432
+ }
4433
+ ],
4434
+ "thinker.model.layers.24.self_attn.q_proj": [
4435
+ {
4436
+ "alpha": 8,
4437
+ "end": 8,
4438
+ "rank": 8,
4439
+ "start": 0
4440
+ }
4441
+ ],
4442
+ "thinker.model.layers.24.self_attn.v_proj": [
4443
+ {
4444
+ "alpha": 8,
4445
+ "end": 8,
4446
+ "rank": 8,
4447
+ "start": 0
4448
+ }
4449
+ ],
4450
+ "thinker.model.layers.25.mlp.down_proj": [
4451
+ {
4452
+ "alpha": 8,
4453
+ "end": 8,
4454
+ "rank": 8,
4455
+ "start": 0
4456
+ }
4457
+ ],
4458
+ "thinker.model.layers.25.mlp.gate_proj": [
4459
+ {
4460
+ "alpha": 8,
4461
+ "end": 8,
4462
+ "rank": 8,
4463
+ "start": 0
4464
+ }
4465
+ ],
4466
+ "thinker.model.layers.25.mlp.up_proj": [
4467
+ {
4468
+ "alpha": 8,
4469
+ "end": 8,
4470
+ "rank": 8,
4471
+ "start": 0
4472
+ }
4473
+ ],
4474
+ "thinker.model.layers.25.self_attn.k_proj": [
4475
+ {
4476
+ "alpha": 8,
4477
+ "end": 8,
4478
+ "rank": 8,
4479
+ "start": 0
4480
+ }
4481
+ ],
4482
+ "thinker.model.layers.25.self_attn.o_proj": [
4483
+ {
4484
+ "alpha": 8,
4485
+ "end": 8,
4486
+ "rank": 8,
4487
+ "start": 0
4488
+ }
4489
+ ],
4490
+ "thinker.model.layers.25.self_attn.q_proj": [
4491
+ {
4492
+ "alpha": 8,
4493
+ "end": 8,
4494
+ "rank": 8,
4495
+ "start": 0
4496
+ }
4497
+ ],
4498
+ "thinker.model.layers.25.self_attn.v_proj": [
4499
+ {
4500
+ "alpha": 8,
4501
+ "end": 8,
4502
+ "rank": 8,
4503
+ "start": 0
4504
+ }
4505
+ ],
4506
+ "thinker.model.layers.26.mlp.down_proj": [
4507
+ {
4508
+ "alpha": 8,
4509
+ "end": 8,
4510
+ "rank": 8,
4511
+ "start": 0
4512
+ }
4513
+ ],
4514
+ "thinker.model.layers.26.mlp.gate_proj": [
4515
+ {
4516
+ "alpha": 8,
4517
+ "end": 8,
4518
+ "rank": 8,
4519
+ "start": 0
4520
+ }
4521
+ ],
4522
+ "thinker.model.layers.26.mlp.up_proj": [
4523
+ {
4524
+ "alpha": 8,
4525
+ "end": 8,
4526
+ "rank": 8,
4527
+ "start": 0
4528
+ }
4529
+ ],
4530
+ "thinker.model.layers.26.self_attn.k_proj": [
4531
+ {
4532
+ "alpha": 8,
4533
+ "end": 8,
4534
+ "rank": 8,
4535
+ "start": 0
4536
+ }
4537
+ ],
4538
+ "thinker.model.layers.26.self_attn.o_proj": [
4539
+ {
4540
+ "alpha": 8,
4541
+ "end": 8,
4542
+ "rank": 8,
4543
+ "start": 0
4544
+ }
4545
+ ],
4546
+ "thinker.model.layers.26.self_attn.q_proj": [
4547
+ {
4548
+ "alpha": 8,
4549
+ "end": 8,
4550
+ "rank": 8,
4551
+ "start": 0
4552
+ }
4553
+ ],
4554
+ "thinker.model.layers.26.self_attn.v_proj": [
4555
+ {
4556
+ "alpha": 8,
4557
+ "end": 8,
4558
+ "rank": 8,
4559
+ "start": 0
4560
+ }
4561
+ ],
4562
+ "thinker.model.layers.27.mlp.down_proj": [
4563
+ {
4564
+ "alpha": 8,
4565
+ "end": 8,
4566
+ "rank": 8,
4567
+ "start": 0
4568
+ }
4569
+ ],
4570
+ "thinker.model.layers.27.mlp.gate_proj": [
4571
+ {
4572
+ "alpha": 8,
4573
+ "end": 8,
4574
+ "rank": 8,
4575
+ "start": 0
4576
+ }
4577
+ ],
4578
+ "thinker.model.layers.27.mlp.up_proj": [
4579
+ {
4580
+ "alpha": 8,
4581
+ "end": 8,
4582
+ "rank": 8,
4583
+ "start": 0
4584
+ }
4585
+ ],
4586
+ "thinker.model.layers.27.self_attn.k_proj": [
4587
+ {
4588
+ "alpha": 8,
4589
+ "end": 8,
4590
+ "rank": 8,
4591
+ "start": 0
4592
+ }
4593
+ ],
4594
+ "thinker.model.layers.27.self_attn.o_proj": [
4595
+ {
4596
+ "alpha": 8,
4597
+ "end": 8,
4598
+ "rank": 8,
4599
+ "start": 0
4600
+ }
4601
+ ],
4602
+ "thinker.model.layers.27.self_attn.q_proj": [
4603
+ {
4604
+ "alpha": 8,
4605
+ "end": 8,
4606
+ "rank": 8,
4607
+ "start": 0
4608
+ }
4609
+ ],
4610
+ "thinker.model.layers.27.self_attn.v_proj": [
4611
+ {
4612
+ "alpha": 8,
4613
+ "end": 8,
4614
+ "rank": 8,
4615
+ "start": 0
4616
+ }
4617
+ ],
4618
+ "thinker.model.layers.3.mlp.down_proj": [
4619
+ {
4620
+ "alpha": 8,
4621
+ "end": 8,
4622
+ "rank": 8,
4623
+ "start": 0
4624
+ }
4625
+ ],
4626
+ "thinker.model.layers.3.mlp.gate_proj": [
4627
+ {
4628
+ "alpha": 8,
4629
+ "end": 8,
4630
+ "rank": 8,
4631
+ "start": 0
4632
+ }
4633
+ ],
4634
+ "thinker.model.layers.3.mlp.up_proj": [
4635
+ {
4636
+ "alpha": 8,
4637
+ "end": 8,
4638
+ "rank": 8,
4639
+ "start": 0
4640
+ }
4641
+ ],
4642
+ "thinker.model.layers.3.self_attn.k_proj": [
4643
+ {
4644
+ "alpha": 8,
4645
+ "end": 8,
4646
+ "rank": 8,
4647
+ "start": 0
4648
+ }
4649
+ ],
4650
+ "thinker.model.layers.3.self_attn.o_proj": [
4651
+ {
4652
+ "alpha": 8,
4653
+ "end": 8,
4654
+ "rank": 8,
4655
+ "start": 0
4656
+ }
4657
+ ],
4658
+ "thinker.model.layers.3.self_attn.q_proj": [
4659
+ {
4660
+ "alpha": 8,
4661
+ "end": 8,
4662
+ "rank": 8,
4663
+ "start": 0
4664
+ }
4665
+ ],
4666
+ "thinker.model.layers.3.self_attn.v_proj": [
4667
+ {
4668
+ "alpha": 8,
4669
+ "end": 8,
4670
+ "rank": 8,
4671
+ "start": 0
4672
+ }
4673
+ ],
4674
+ "thinker.model.layers.4.mlp.down_proj": [
4675
+ {
4676
+ "alpha": 8,
4677
+ "end": 8,
4678
+ "rank": 8,
4679
+ "start": 0
4680
+ }
4681
+ ],
4682
+ "thinker.model.layers.4.mlp.gate_proj": [
4683
+ {
4684
+ "alpha": 8,
4685
+ "end": 8,
4686
+ "rank": 8,
4687
+ "start": 0
4688
+ }
4689
+ ],
4690
+ "thinker.model.layers.4.mlp.up_proj": [
4691
+ {
4692
+ "alpha": 8,
4693
+ "end": 8,
4694
+ "rank": 8,
4695
+ "start": 0
4696
+ }
4697
+ ],
4698
+ "thinker.model.layers.4.self_attn.k_proj": [
4699
+ {
4700
+ "alpha": 8,
4701
+ "end": 8,
4702
+ "rank": 8,
4703
+ "start": 0
4704
+ }
4705
+ ],
4706
+ "thinker.model.layers.4.self_attn.o_proj": [
4707
+ {
4708
+ "alpha": 8,
4709
+ "end": 8,
4710
+ "rank": 8,
4711
+ "start": 0
4712
+ }
4713
+ ],
4714
+ "thinker.model.layers.4.self_attn.q_proj": [
4715
+ {
4716
+ "alpha": 8,
4717
+ "end": 8,
4718
+ "rank": 8,
4719
+ "start": 0
4720
+ }
4721
+ ],
4722
+ "thinker.model.layers.4.self_attn.v_proj": [
4723
+ {
4724
+ "alpha": 8,
4725
+ "end": 8,
4726
+ "rank": 8,
4727
+ "start": 0
4728
+ }
4729
+ ],
4730
+ "thinker.model.layers.5.mlp.down_proj": [
4731
+ {
4732
+ "alpha": 8,
4733
+ "end": 8,
4734
+ "rank": 8,
4735
+ "start": 0
4736
+ }
4737
+ ],
4738
+ "thinker.model.layers.5.mlp.gate_proj": [
4739
+ {
4740
+ "alpha": 8,
4741
+ "end": 8,
4742
+ "rank": 8,
4743
+ "start": 0
4744
+ }
4745
+ ],
4746
+ "thinker.model.layers.5.mlp.up_proj": [
4747
+ {
4748
+ "alpha": 8,
4749
+ "end": 8,
4750
+ "rank": 8,
4751
+ "start": 0
4752
+ }
4753
+ ],
4754
+ "thinker.model.layers.5.self_attn.k_proj": [
4755
+ {
4756
+ "alpha": 8,
4757
+ "end": 8,
4758
+ "rank": 8,
4759
+ "start": 0
4760
+ }
4761
+ ],
4762
+ "thinker.model.layers.5.self_attn.o_proj": [
4763
+ {
4764
+ "alpha": 8,
4765
+ "end": 8,
4766
+ "rank": 8,
4767
+ "start": 0
4768
+ }
4769
+ ],
4770
+ "thinker.model.layers.5.self_attn.q_proj": [
4771
+ {
4772
+ "alpha": 8,
4773
+ "end": 8,
4774
+ "rank": 8,
4775
+ "start": 0
4776
+ }
4777
+ ],
4778
+ "thinker.model.layers.5.self_attn.v_proj": [
4779
+ {
4780
+ "alpha": 8,
4781
+ "end": 8,
4782
+ "rank": 8,
4783
+ "start": 0
4784
+ }
4785
+ ],
4786
+ "thinker.model.layers.6.mlp.down_proj": [
4787
+ {
4788
+ "alpha": 8,
4789
+ "end": 8,
4790
+ "rank": 8,
4791
+ "start": 0
4792
+ }
4793
+ ],
4794
+ "thinker.model.layers.6.mlp.gate_proj": [
4795
+ {
4796
+ "alpha": 8,
4797
+ "end": 8,
4798
+ "rank": 8,
4799
+ "start": 0
4800
+ }
4801
+ ],
4802
+ "thinker.model.layers.6.mlp.up_proj": [
4803
+ {
4804
+ "alpha": 8,
4805
+ "end": 8,
4806
+ "rank": 8,
4807
+ "start": 0
4808
+ }
4809
+ ],
4810
+ "thinker.model.layers.6.self_attn.k_proj": [
4811
+ {
4812
+ "alpha": 8,
4813
+ "end": 8,
4814
+ "rank": 8,
4815
+ "start": 0
4816
+ }
4817
+ ],
4818
+ "thinker.model.layers.6.self_attn.o_proj": [
4819
+ {
4820
+ "alpha": 8,
4821
+ "end": 8,
4822
+ "rank": 8,
4823
+ "start": 0
4824
+ }
4825
+ ],
4826
+ "thinker.model.layers.6.self_attn.q_proj": [
4827
+ {
4828
+ "alpha": 8,
4829
+ "end": 8,
4830
+ "rank": 8,
4831
+ "start": 0
4832
+ }
4833
+ ],
4834
+ "thinker.model.layers.6.self_attn.v_proj": [
4835
+ {
4836
+ "alpha": 8,
4837
+ "end": 8,
4838
+ "rank": 8,
4839
+ "start": 0
4840
+ }
4841
+ ],
4842
+ "thinker.model.layers.7.mlp.down_proj": [
4843
+ {
4844
+ "alpha": 8,
4845
+ "end": 8,
4846
+ "rank": 8,
4847
+ "start": 0
4848
+ }
4849
+ ],
4850
+ "thinker.model.layers.7.mlp.gate_proj": [
4851
+ {
4852
+ "alpha": 8,
4853
+ "end": 8,
4854
+ "rank": 8,
4855
+ "start": 0
4856
+ }
4857
+ ],
4858
+ "thinker.model.layers.7.mlp.up_proj": [
4859
+ {
4860
+ "alpha": 8,
4861
+ "end": 8,
4862
+ "rank": 8,
4863
+ "start": 0
4864
+ }
4865
+ ],
4866
+ "thinker.model.layers.7.self_attn.k_proj": [
4867
+ {
4868
+ "alpha": 8,
4869
+ "end": 8,
4870
+ "rank": 8,
4871
+ "start": 0
4872
+ }
4873
+ ],
4874
+ "thinker.model.layers.7.self_attn.o_proj": [
4875
+ {
4876
+ "alpha": 8,
4877
+ "end": 8,
4878
+ "rank": 8,
4879
+ "start": 0
4880
+ }
4881
+ ],
4882
+ "thinker.model.layers.7.self_attn.q_proj": [
4883
+ {
4884
+ "alpha": 8,
4885
+ "end": 8,
4886
+ "rank": 8,
4887
+ "start": 0
4888
+ }
4889
+ ],
4890
+ "thinker.model.layers.7.self_attn.v_proj": [
4891
+ {
4892
+ "alpha": 8,
4893
+ "end": 8,
4894
+ "rank": 8,
4895
+ "start": 0
4896
+ }
4897
+ ],
4898
+ "thinker.model.layers.8.mlp.down_proj": [
4899
+ {
4900
+ "alpha": 8,
4901
+ "end": 8,
4902
+ "rank": 8,
4903
+ "start": 0
4904
+ }
4905
+ ],
4906
+ "thinker.model.layers.8.mlp.gate_proj": [
4907
+ {
4908
+ "alpha": 8,
4909
+ "end": 8,
4910
+ "rank": 8,
4911
+ "start": 0
4912
+ }
4913
+ ],
4914
+ "thinker.model.layers.8.mlp.up_proj": [
4915
+ {
4916
+ "alpha": 8,
4917
+ "end": 8,
4918
+ "rank": 8,
4919
+ "start": 0
4920
+ }
4921
+ ],
4922
+ "thinker.model.layers.8.self_attn.k_proj": [
4923
+ {
4924
+ "alpha": 8,
4925
+ "end": 8,
4926
+ "rank": 8,
4927
+ "start": 0
4928
+ }
4929
+ ],
4930
+ "thinker.model.layers.8.self_attn.o_proj": [
4931
+ {
4932
+ "alpha": 8,
4933
+ "end": 8,
4934
+ "rank": 8,
4935
+ "start": 0
4936
+ }
4937
+ ],
4938
+ "thinker.model.layers.8.self_attn.q_proj": [
4939
+ {
4940
+ "alpha": 8,
4941
+ "end": 8,
4942
+ "rank": 8,
4943
+ "start": 0
4944
+ }
4945
+ ],
4946
+ "thinker.model.layers.8.self_attn.v_proj": [
4947
+ {
4948
+ "alpha": 8,
4949
+ "end": 8,
4950
+ "rank": 8,
4951
+ "start": 0
4952
+ }
4953
+ ],
4954
+ "thinker.model.layers.9.mlp.down_proj": [
4955
+ {
4956
+ "alpha": 8,
4957
+ "end": 8,
4958
+ "rank": 8,
4959
+ "start": 0
4960
+ }
4961
+ ],
4962
+ "thinker.model.layers.9.mlp.gate_proj": [
4963
+ {
4964
+ "alpha": 8,
4965
+ "end": 8,
4966
+ "rank": 8,
4967
+ "start": 0
4968
+ }
4969
+ ],
4970
+ "thinker.model.layers.9.mlp.up_proj": [
4971
+ {
4972
+ "alpha": 8,
4973
+ "end": 8,
4974
+ "rank": 8,
4975
+ "start": 0
4976
+ }
4977
+ ],
4978
+ "thinker.model.layers.9.self_attn.k_proj": [
4979
+ {
4980
+ "alpha": 8,
4981
+ "end": 8,
4982
+ "rank": 8,
4983
+ "start": 0
4984
+ }
4985
+ ],
4986
+ "thinker.model.layers.9.self_attn.o_proj": [
4987
+ {
4988
+ "alpha": 8,
4989
+ "end": 8,
4990
+ "rank": 8,
4991
+ "start": 0
4992
+ }
4993
+ ],
4994
+ "thinker.model.layers.9.self_attn.q_proj": [
4995
+ {
4996
+ "alpha": 8,
4997
+ "end": 8,
4998
+ "rank": 8,
4999
+ "start": 0
5000
+ }
5001
+ ],
5002
+ "thinker.model.layers.9.self_attn.v_proj": [
5003
+ {
5004
+ "alpha": 8,
5005
+ "end": 8,
5006
+ "rank": 8,
5007
+ "start": 0
5008
+ }
5009
+ ]
5010
+ }