zhifeixie committed · Commit 72ee2c2 · verified · 1 Parent(s): 02abc58

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +2 -0
  2. lora/lora-stage1/README.md +207 -0
  3. lora/lora-stage1/adapter_config.json +38 -0
  4. lora/lora-stage1/adapter_model.safetensors +3 -0
  5. lora/lora-stage1/added_tokens.json +64 -0
  6. lora/lora-stage1/base_model.txt +1 -0
  7. lora/lora-stage1/chat_template.jinja +31 -0
  8. lora/lora-stage1/chat_template.json +1 -0
  9. lora/lora-stage1/config.json +221 -0
  10. lora/lora-stage1/generation_config.json +7 -0
  11. lora/lora-stage1/merges.txt +0 -0
  12. lora/lora-stage1/preprocessor_config.json +14 -0
  13. lora/lora-stage1/rng_state_0.pth +3 -0
  14. lora/lora-stage1/special_tokens_map.json +44 -0
  15. lora/lora-stage1/tokenizer.json +3 -0
  16. lora/lora-stage1/tokenizer_config.json +549 -0
  17. lora/lora-stage1/trainer_state.json +774 -0
  18. lora/lora-stage1/vocab.json +0 -0
  19. lora/lora-stage2/README.md +207 -0
  20. lora/lora-stage2/adapter_config.json +38 -0
  21. lora/lora-stage2/adapter_model.safetensors +3 -0
  22. lora/lora-stage2/added_tokens.json +64 -0
  23. lora/lora-stage2/base_model.txt +1 -0
  24. lora/lora-stage2/chat_template.jinja +31 -0
  25. lora/lora-stage2/chat_template.json +1 -0
  26. lora/lora-stage2/config.json +221 -0
  27. lora/lora-stage2/generation_config.json +7 -0
  28. lora/lora-stage2/merged_from_lora.txt +1 -0
  29. lora/lora-stage2/merges.txt +0 -0
  30. lora/lora-stage2/optimizer.pt +3 -0
  31. lora/lora-stage2/preprocessor_config.json +14 -0
  32. lora/lora-stage2/rng_state_0.pth +3 -0
  33. lora/lora-stage2/rng_state_1.pth +3 -0
  34. lora/lora-stage2/scheduler.pt +3 -0
  35. lora/lora-stage2/special_tokens_map.json +44 -0
  36. lora/lora-stage2/tokenizer.json +3 -0
  37. lora/lora-stage2/tokenizer_config.json +549 -0
  38. lora/lora-stage2/trainer_state.json +0 -0
  39. lora/lora-stage2/vocab.json +0 -0
  40. lora/lora-stage3/README.md +207 -0
  41. lora/lora-stage3/adapter_config.json +38 -0
  42. lora/lora-stage3/adapter_model.safetensors +3 -0
  43. lora/lora-stage3/additional_config.json +1 -0
  44. lora/lora-stage3/args.json +502 -0
  45. lora/lora-stage3/optimizer.pt +3 -0
  46. lora/lora-stage3/rng_state_0.pth +3 -0
  47. lora/lora-stage3/rng_state_1.pth +3 -0
  48. lora/lora-stage3/rng_state_2.pth +3 -0
  49. lora/lora-stage3/scheduler.pt +3 -0
  50. lora/lora-stage3/trainer_state.json +0 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+lora/lora-stage1/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+lora/lora-stage2/tokenizer.json filter=lfs diff=lfs merge=lfs -text
lora/lora-stage1/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: ''
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - 'base_model:adapter:'
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.1
lora/lora-stage1/adapter_config.json ADDED
@@ -0,0 +1,38 @@
+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "^(audio_tower\\.(conv_out|proj1|proj2)$|audio_tower\\.layers\\.(20|21|22|23)\\..*\\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$)",
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
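Reviewer note: the `target_modules` value above is a single regular expression rather than a list of names. The sketch below is a minimal illustration of which module names such a pattern selects, assuming the pattern is applied as a full match against each dotted module name (PEFT documents that a string value is treated as a regex). The candidate names are hypothetical and are not taken from the real Qwen3-ASR module tree. The last line computes the usual LoRA scaling factor `lora_alpha / r`, which applies here because `use_rslora` is false.

```python
import re

# Pattern copied from "target_modules" in adapter_config.json (JSON escaping removed).
TARGET_MODULES = (
    r"^(audio_tower\.(conv_out|proj1|proj2)$"
    r"|audio_tower\.layers\.(20|21|22|23)\..*\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$)"
)


def is_targeted(module_name: str) -> bool:
    """Return True if the whole module name matches the pattern."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Hypothetical module names, chosen only to exercise the pattern.
candidates = [
    "audio_tower.conv_out",                     # first alternative
    "audio_tower.layers.23.self_attn.q_proj",   # one of the last four layers
    "audio_tower.layers.19.self_attn.q_proj",   # layer index outside 20-23
    "audio_tower.layers.20.fc1",                # no component between index and fc1
    "model.layers.20.self_attn.q_proj",         # not under audio_tower
]
matched = [name for name in candidates if is_targeted(name)]

# Standard LoRA scaling when use_rslora is false: lora_alpha / r.
lora_scaling = 16 / 8
```

Note that the `\..*\.` segment requires at least one intermediate component after the layer index, so a name such as `audio_tower.layers.20.fc1` would not be selected by this pattern.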
lora/lora-stage1/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31052a993cbb582a250886db7dfcc327ab86ee8adc5229882bd48227b892c752
+size 1496072
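Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. The sketch below is a minimal parser for that pointer text, assuming the simple `key value` layout shown in this diff; `parse_lfs_pointer` is a hypothetical helper name, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer text copied from lora/lora-stage1/adapter_model.safetensors in this diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:31052a993cbb582a250886db7dfcc327ab86ee8adc5229882bd48227b892c752
size 1496072
"""

pointer = parse_lfs_pointer(POINTER)
```

The recorded size (1496072 bytes, roughly 1.5 MB) refers to the adapter file that the pointer stands in for.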
lora/lora-stage1/added_tokens.json ADDED
@@ -0,0 +1,64 @@
+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<asr_text>": 151704,
+  "<blank10>": 151686,
+  "<blank11>": 151687,
+  "<blank12>": 151688,
+  "<blank13>": 151689,
+  "<blank14>": 151690,
+  "<blank15>": 151691,
+  "<blank16>": 151692,
+  "<blank17>": 151693,
+  "<blank18>": 151694,
+  "<blank19>": 151695,
+  "<blank1>": 151677,
+  "<blank20>": 151696,
+  "<blank21>": 151697,
+  "<blank22>": 151698,
+  "<blank23>": 151699,
+  "<blank24>": 151700,
+  "<blank25>": 151701,
+  "<blank26>": 151702,
+  "<blank27>": 151703,
+  "<blank2>": 151678,
+  "<blank3>": 151679,
+  "<blank4>": 151680,
+  "<blank5>": 151681,
+  "<blank6>": 151682,
+  "<blank7>": 151683,
+  "<blank8>": 151684,
+  "<blank9>": 151685,
+  "<non_speech>": 151675,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<tts_pad>": 151671,
+  "<tts_text_bos>": 151672,
+  "<tts_text_bos_single>": 151674,
+  "<tts_text_eod>": 151673,
+  "<|audio_end|>": 151670,
+  "<|audio_pad|>": 151676,
+  "<|audio_start|>": 151669,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
lora/lora-stage1/base_model.txt ADDED
@@ -0,0 +1 @@
+/data/haobin/pky_train/qwen3/Qwen3-ASR-1.7B
lora/lora-stage1/chat_template.jinja ADDED
@@ -0,0 +1,31 @@
+{%- set ns = namespace(system_text="") -%}
+{%- for m in messages -%}
+  {%- if m.role == 'system' -%}
+    {%- if m.content is string -%}
+      {%- set ns.system_text = ns.system_text + m.content -%}
+    {%- else -%}
+      {%- for c in m.content -%}
+        {%- if c.type == 'text' and (c.text is defined) -%}
+          {%- set ns.system_text = ns.system_text + c.text -%}
+        {%- endif -%}
+      {%- endfor -%}
+    {%- endif -%}
+  {%- endif -%}
+{%- endfor -%}
+
+{%- set ns2 = namespace(audio_tokens="") -%}
+{%- for m in messages -%}
+  {%- if m.content is not string -%}
+    {%- for c in m.content -%}
+      {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}
+        {%- set ns2.audio_tokens = ns2.audio_tokens + "<|audio_start|><|audio_pad|><|audio_end|>" -%}
+      {%- endif -%}
+    {%- endfor -%}
+  {%- endif -%}
+{%- endfor -%}
+
+{{- '<|im_start|>system\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\n' -}}
+{{- '<|im_start|>user\n' + ns2.audio_tokens + '<|im_end|>\n' -}}
+{%- if add_generation_prompt -%}
+{{- '<|im_start|>assistant\n' -}}
+{%- endif -%}
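Reviewer note: the template above collects all system text, emits one audio placeholder per audio part, and drops any other user text. The sketch below re-implements that logic in plain Python to make the resulting prompt string easy to inspect. It is an approximation written for illustration, not the Jinja renderer used by the tokenizer; `render_prompt` and the sample messages are hypothetical.

```python
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_prompt(messages, add_generation_prompt=True):
    """Approximate the chat template: system text, then audio placeholders only."""
    system_text = ""
    audio_tokens = ""
    for m in messages:
        content = m.get("content")
        if isinstance(content, str):
            # Plain-string content only contributes when the role is "system".
            if m.get("role") == "system":
                system_text += content
            continue
        for part in content or []:
            if m.get("role") == "system" and part.get("type") == "text" and "text" in part:
                system_text += part["text"]
            # Any part that looks like audio adds one placeholder triple.
            if part.get("type") == "audio" or "audio" in part or "audio_url" in part:
                audio_tokens += AUDIO_PLACEHOLDER
    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


# Hypothetical conversation: the user text part is ignored by the template.
messages = [
    {"role": "system", "content": "Transcribe."},
    {"role": "user", "content": [
        {"type": "audio", "audio": "clip.wav"},
        {"type": "text", "text": "ignored"},
    ]},
]
prompt = render_prompt(messages)
```

Because the template never reads user text, any instruction for the model has to be passed in the system message.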
lora/lora-stage1/chat_template.json ADDED
@@ -0,0 +1 @@
+ {"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n {%- if m.role == 'system' -%}\n {%- if m.content is string -%}\n {%- set ns.system_text = ns.system_text + m.content -%}\n {%- else -%}\n {%- for c in m.content -%}\n {%- if c.type == 'text' and (c.text is defined) -%}\n {%- set ns.system_text = ns.system_text + c.text -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n {%- if m.content is not string -%}\n {%- for c in m.content -%}\n {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
lora/lora-stage1/config.json ADDED
@@ -0,0 +1,221 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3ASRForConditionalGeneration"
4
+ ],
5
+ "model_type": "qwen3_asr",
6
+ "support_languages": [
7
+ "Chinese",
8
+ "English",
9
+ "Cantonese",
10
+ "Arabic",
11
+ "German",
12
+ "French",
13
+ "Spanish",
14
+ "Portuguese",
15
+ "Indonesian",
16
+ "Italian",
17
+ "Korean",
18
+ "Russian",
19
+ "Thai",
20
+ "Vietnamese",
21
+ "Japanese",
22
+ "Turkish",
23
+ "Hindi",
24
+ "Malay",
25
+ "Dutch",
26
+ "Swedish",
27
+ "Danish",
28
+ "Finnish",
29
+ "Polish",
30
+ "Czech",
31
+ "Filipino",
32
+ "Persian",
33
+ "Greek",
34
+ "Romanian",
35
+ "Hungarian",
36
+ "Macedonian"
37
+ ],
38
+ "thinker_config": {
39
+ "model_type": "qwen3_asr",
40
+ "architectures": [
41
+ "Qwen3ASRForConditionalGeneration"
42
+ ],
43
+ "audio_config": {
44
+ "_name_or_path": "",
45
+ "activation_dropout": 0,
46
+ "activation_function": "gelu",
47
+ "add_cross_attention": false,
48
+ "architectures": null,
49
+ "attention_dropout": 0,
50
+ "bad_words_ids": null,
51
+ "begin_suppress_tokens": null,
52
+ "bos_token_id": null,
53
+ "chunk_size_feed_forward": 0,
54
+ "conv_chunksize": 500,
55
+ "cross_attention_hidden_size": null,
56
+ "d_model": 1024,
57
+ "decoder_start_token_id": null,
58
+ "diversity_penalty": 0.0,
59
+ "do_sample": false,
60
+ "downsample_hidden_size": 480,
61
+ "dropout": 0,
62
+ "dtype": null,
63
+ "early_stopping": false,
64
+ "encoder_attention_heads": 16,
65
+ "encoder_ffn_dim": 4096,
66
+ "encoder_layers": 24,
67
+ "encoder_no_repeat_ngram_size": 0,
68
+ "eos_token_id": null,
69
+ "exponential_decay_length_penalty": null,
70
+ "finetuning_task": null,
71
+ "forced_bos_token_id": null,
72
+ "forced_eos_token_id": null,
73
+ "id2label": {
74
+ "0": "LABEL_0",
75
+ "1": "LABEL_1"
76
+ },
77
+ "initializer_range": 0.02,
78
+ "is_decoder": false,
79
+ "is_encoder_decoder": false,
80
+ "label2id": {
81
+ "LABEL_0": 0,
82
+ "LABEL_1": 1
83
+ },
84
+ "length_penalty": 1.0,
85
+ "max_length": 20,
86
+ "max_source_positions": 1500,
87
+ "min_length": 0,
88
+ "model_type": "qwen3_asr_audio_encoder",
89
+ "n_window": 50,
90
+ "n_window_infer": 800,
91
+ "no_repeat_ngram_size": 0,
92
+ "num_beam_groups": 1,
93
+ "num_beams": 1,
94
+ "num_hidden_layers": 24,
95
+ "num_mel_bins": 128,
96
+ "num_return_sequences": 1,
97
+ "output_attentions": false,
98
+ "output_dim": 2048,
99
+ "output_hidden_states": false,
100
+ "output_scores": false,
101
+ "pad_token_id": null,
102
+ "prefix": null,
103
+ "problem_type": null,
104
+ "pruned_heads": {},
105
+ "remove_invalid_values": false,
106
+ "repetition_penalty": 1.0,
107
+ "return_dict": true,
108
+ "return_dict_in_generate": false,
109
+ "scale_embedding": false,
110
+ "sep_token_id": null,
111
+ "suppress_tokens": null,
112
+ "task_specific_params": null,
113
+ "temperature": 1.0,
114
+ "tf_legacy_loss": false,
115
+ "tie_encoder_decoder": false,
116
+ "tie_word_embeddings": true,
117
+ "tokenizer_class": null,
118
+ "top_k": 50,
119
+ "top_p": 1.0,
120
+ "torchscript": false,
121
+ "typical_p": 1.0,
122
+ "use_bfloat16": false
123
+ },
124
+ "audio_end_token_id": 151670,
125
+ "audio_start_token_id": 151669,
126
+ "audio_token_id": 151676,
127
+ "dtype": "bfloat16",
128
+ "initializer_range": 0.02,
129
+ "text_config": {
130
+ "_name_or_path": "",
131
+ "add_cross_attention": false,
132
+ "architectures": null,
133
+ "attention_bias": false,
134
+ "attention_dropout": 0.0,
135
+ "bad_words_ids": null,
136
+ "begin_suppress_tokens": null,
137
+ "bos_token_id": null,
138
+ "chunk_size_feed_forward": 0,
139
+ "cross_attention_hidden_size": null,
140
+ "decoder_start_token_id": null,
141
+ "diversity_penalty": 0.0,
142
+ "do_sample": false,
143
+ "dtype": null,
144
+ "early_stopping": false,
145
+ "encoder_no_repeat_ngram_size": 0,
146
+ "eos_token_id": null,
147
+ "exponential_decay_length_penalty": null,
148
+ "finetuning_task": null,
149
+ "forced_bos_token_id": null,
150
+ "forced_eos_token_id": null,
151
+ "head_dim": 128,
152
+ "hidden_act": "silu",
153
+ "hidden_size": 2048,
154
+ "id2label": {
155
+ "0": "LABEL_0",
156
+ "1": "LABEL_1"
157
+ },
158
+ "initializer_range": 0.02,
159
+ "intermediate_size": 6144,
160
+ "is_decoder": false,
161
+ "is_encoder_decoder": false,
162
+ "label2id": {
163
+ "LABEL_0": 0,
164
+ "LABEL_1": 1
165
+ },
166
+ "length_penalty": 1.0,
167
+ "max_length": 20,
168
+ "max_position_embeddings": 65536,
169
+ "min_length": 0,
170
+ "model_type": "qwen3",
171
+ "no_repeat_ngram_size": 0,
172
+ "num_attention_heads": 16,
173
+ "num_beam_groups": 1,
174
+ "num_beams": 1,
175
+ "num_hidden_layers": 28,
176
+ "num_key_value_heads": 8,
177
+ "num_return_sequences": 1,
178
+ "output_attentions": false,
179
+ "output_hidden_states": false,
180
+ "output_scores": false,
181
+ "pad_token_id": null,
182
+ "prefix": null,
183
+ "problem_type": null,
184
+ "pruned_heads": {},
185
+ "remove_invalid_values": false,
186
+ "repetition_penalty": 1.0,
187
+ "return_dict": true,
188
+ "return_dict_in_generate": false,
189
+ "rms_norm_eps": 1e-06,
190
+ "rope_scaling": {
191
+ "interleaved": true,
192
+ "mrope_interleaved": true,
193
+ "mrope_section": [
194
+ 24,
195
+ 20,
196
+ 20
197
+ ],
198
+ "rope_type": "default",
199
+ "type": "default"
200
+ },
201
+ "rope_theta": 1000000,
202
+ "sep_token_id": null,
203
+ "suppress_tokens": null,
204
+ "task_specific_params": null,
205
+ "temperature": 1.0,
206
+ "tf_legacy_loss": false,
207
+ "tie_encoder_decoder": false,
208
+ "tie_word_embeddings": true,
209
+ "tokenizer_class": null,
210
+ "top_k": 50,
211
+ "top_p": 1.0,
212
+ "torchscript": false,
213
+ "typical_p": 1.0,
214
+ "use_bfloat16": false,
215
+ "use_cache": true,
216
+ "vocab_size": 151936
217
+ }
218
+ },
219
+ "transformers_version": "4.57.6"
220
+ }
221
+
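Reviewer note: several values in `config.json` are tied together by simple arithmetic. The sketch below copies a handful of fields from the diff above and checks the relations between them. The interpretations in the comments (even splitting of `d_model` across heads, `mrope_section` covering rotary dimension pairs) are assumptions based on common transformer conventions, not statements taken from the Qwen3-ASR source.

```python
# Values copied from thinker_config.text_config in the diff above.
text_cfg = {
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "mrope_section": [24, 20, 20],
}

# Values copied from thinker_config.audio_config in the diff above.
audio_cfg = {
    "d_model": 1024,
    "encoder_attention_heads": 16,
    "output_dim": 2048,
}

# Total query width and key/value width of one attention layer in the text decoder.
query_width = text_cfg["num_attention_heads"] * text_cfg["head_dim"]
kv_width = text_cfg["num_key_value_heads"] * text_cfg["head_dim"]

# Grouped-query attention: how many query heads share one key/value head.
queries_per_kv_head = text_cfg["num_attention_heads"] // text_cfg["num_key_value_heads"]

# Assumption: the audio encoder splits d_model evenly across its heads.
audio_head_dim = audio_cfg["d_model"] // audio_cfg["encoder_attention_heads"]

# Assumption: mrope_section counts rotary dimension pairs, i.e. half of head_dim.
rotary_pairs = sum(text_cfg["mrope_section"])

# The audio encoder output width equals the text hidden size in this config.
widths_match = audio_cfg["output_dim"] == text_cfg["hidden_size"]
```

With these numbers the query width equals `hidden_size`, two query heads share each key/value head, and the `mrope_section` entries sum to half of `head_dim`.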
lora/lora-stage1/generation_config.json ADDED
@@ -0,0 +1,7 @@
+{
+  "_from_model_config": true,
+  "eos_token_id": [151643,151645],
+  "pad_token_id": 151643,
+  "do_sample": false,
+  "temperature": 0.000001
+}
lora/lora-stage1/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage1/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+{
+  "chunk_length": 30,
+  "dither": 0.0,
+  "feature_extractor_type": "WhisperFeatureExtractor",
+  "feature_size": 128,
+  "hop_length": 160,
+  "n_fft": 400,
+  "n_samples": 480000,
+  "nb_max_frames": 3000,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "processor_class": "Qwen3ASRProcessor",
+  "return_attention_mask": true
+}
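Reviewer note: the feature-extractor values above are mutually consistent. The sketch below derives the relations, assuming `chunk_length` is in seconds and `hop_length` and `n_fft` are in samples (the usual Whisper-style convention). The sampling rate is not listed in this file; it is inferred here from `n_samples / chunk_length`.

```python
# Values copied from preprocessor_config.json in the diff above.
chunk_length = 30      # assumed to be seconds per chunk
n_samples = 480000     # samples per chunk
hop_length = 160       # samples between successive frames
n_fft = 400            # samples per analysis window
nb_max_frames = 3000   # frames per chunk

# Inferred, not stated in the file: samples per second.
sample_rate = n_samples // chunk_length

# Number of frames that fit in one chunk at the given hop.
frames_per_chunk = n_samples // hop_length

# Frame step and window length in milliseconds.
hop_ms = 1000 * hop_length / sample_rate
window_ms = 1000 * n_fft / sample_rate
```

Under these assumptions each 30-second chunk yields 3000 frames at a 10 ms step, with `feature_size` (128) mel bins per frame, which agrees with `num_mel_bins` in the audio encoder config.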
lora/lora-stage1/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:916059f3d5e18a65741db0b5dc2209e8c6aad0736bace4b346dacc3a0ed5408c
+size 14917
lora/lora-stage1/special_tokens_map.json ADDED
@@ -0,0 +1,44 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>",
+    "<|audio_start|>",
+    "<|audio_end|>",
+    "<tts_pad>",
+    "<tts_text_bos>",
+    "<tts_text_bos_single>",
+    "<|audio_pad|>"
+  ],
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<|image_pad|>",
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}
lora/lora-stage1/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0499602714160467f2d68b910651d6216020689f1e016be87a2d0019ee3baeab
+size 11429499
lora/lora-stage1/tokenizer_config.json ADDED
@@ -0,0 +1,549 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ },
213
+ "151669": {
214
+ "content": "<|audio_start|>",
215
+ "lstrip": false,
216
+ "normalized": false,
217
+ "rstrip": false,
218
+ "single_word": false,
219
+ "special": true
220
+ },
221
+ "151670": {
222
+ "content": "<|audio_end|>",
223
+ "lstrip": false,
224
+ "normalized": false,
225
+ "rstrip": false,
226
+ "single_word": false,
227
+ "special": true
228
+ },
229
+ "151671": {
230
+ "content": "<tts_pad>",
231
+ "lstrip": false,
232
+ "normalized": false,
233
+ "rstrip": false,
234
+ "single_word": false,
235
+ "special": true
236
+ },
237
+ "151672": {
238
+ "content": "<tts_text_bos>",
239
+ "lstrip": false,
240
+ "normalized": false,
241
+ "rstrip": false,
242
+ "single_word": false,
243
+ "special": true
244
+ },
245
+ "151673": {
246
+ "content": "<tts_text_eod>",
247
+ "lstrip": false,
248
+ "normalized": false,
249
+ "rstrip": false,
250
+ "single_word": false,
251
+ "special": true
252
+ },
253
+ "151674": {
254
+ "content": "<tts_text_bos_single>",
255
+ "lstrip": false,
256
+ "normalized": false,
257
+ "rstrip": false,
258
+ "single_word": false,
259
+ "special": true
260
+ },
261
+ "151675": {
262
+ "content": "<non_speech>",
263
+ "lstrip": false,
264
+ "normalized": false,
265
+ "rstrip": false,
266
+ "single_word": false,
267
+ "special": false
268
+ },
269
+ "151676": {
270
+ "content": "<|audio_pad|>",
271
+ "lstrip": false,
272
+ "normalized": false,
273
+ "rstrip": false,
274
+ "single_word": false,
275
+ "special": true
276
+ },
277
+ "151677": {
278
+ "content": "<blank1>",
279
+ "lstrip": false,
280
+ "normalized": false,
281
+ "rstrip": false,
282
+ "single_word": false,
283
+ "special": true
284
+ },
285
+ "151678": {
286
+ "content": "<blank2>",
287
+ "lstrip": false,
288
+ "normalized": false,
289
+ "rstrip": false,
290
+ "single_word": false,
291
+ "special": true
292
+ },
293
+ "151679": {
294
+ "content": "<blank3>",
295
+ "lstrip": false,
296
+ "normalized": false,
297
+ "rstrip": false,
298
+ "single_word": false,
299
+ "special": true
300
+ },
301
+ "151680": {
302
+ "content": "<blank4>",
303
+ "lstrip": false,
304
+ "normalized": false,
305
+ "rstrip": false,
306
+ "single_word": false,
307
+ "special": true
308
+ },
309
+ "151681": {
310
+ "content": "<blank5>",
311
+ "lstrip": false,
312
+ "normalized": false,
313
+ "rstrip": false,
314
+ "single_word": false,
315
+ "special": true
316
+ },
317
+ "151682": {
318
+ "content": "<blank6>",
319
+ "lstrip": false,
320
+ "normalized": false,
321
+ "rstrip": false,
322
+ "single_word": false,
323
+ "special": true
324
+ },
325
+ "151683": {
326
+ "content": "<blank7>",
327
+ "lstrip": false,
328
+ "normalized": false,
329
+ "rstrip": false,
330
+ "single_word": false,
331
+ "special": true
332
+ },
333
+ "151684": {
334
+ "content": "<blank8>",
335
+ "lstrip": false,
336
+ "normalized": false,
337
+ "rstrip": false,
338
+ "single_word": false,
339
+ "special": true
340
+ },
341
+ "151685": {
342
+ "content": "<blank9>",
343
+ "lstrip": false,
344
+ "normalized": false,
345
+ "rstrip": false,
346
+ "single_word": false,
347
+ "special": true
348
+ },
349
+ "151686": {
350
+ "content": "<blank10>",
351
+ "lstrip": false,
352
+ "normalized": false,
353
+ "rstrip": false,
354
+ "single_word": false,
355
+ "special": true
356
+ },
357
+ "151687": {
358
+ "content": "<blank11>",
359
+ "lstrip": false,
360
+ "normalized": false,
361
+ "rstrip": false,
362
+ "single_word": false,
363
+ "special": true
364
+ },
365
+ "151688": {
366
+ "content": "<blank12>",
367
+ "lstrip": false,
368
+ "normalized": false,
369
+ "rstrip": false,
370
+ "single_word": false,
371
+ "special": true
372
+ },
373
+ "151689": {
374
+ "content": "<blank13>",
375
+ "lstrip": false,
376
+ "normalized": false,
377
+ "rstrip": false,
378
+ "single_word": false,
379
+ "special": true
380
+ },
381
+ "151690": {
382
+ "content": "<blank14>",
383
+ "lstrip": false,
384
+ "normalized": false,
385
+ "rstrip": false,
386
+ "single_word": false,
387
+ "special": true
388
+ },
389
+ "151691": {
390
+ "content": "<blank15>",
391
+ "lstrip": false,
392
+ "normalized": false,
393
+ "rstrip": false,
394
+ "single_word": false,
395
+ "special": true
396
+ },
397
+ "151692": {
398
+ "content": "<blank16>",
399
+ "lstrip": false,
400
+ "normalized": false,
401
+ "rstrip": false,
402
+ "single_word": false,
403
+ "special": true
404
+ },
405
+ "151693": {
406
+ "content": "<blank17>",
407
+ "lstrip": false,
408
+ "normalized": false,
409
+ "rstrip": false,
410
+ "single_word": false,
411
+ "special": true
412
+ },
413
+ "151694": {
414
+ "content": "<blank18>",
415
+ "lstrip": false,
416
+ "normalized": false,
417
+ "rstrip": false,
418
+ "single_word": false,
419
+ "special": true
420
+ },
421
+ "151695": {
422
+ "content": "<blank19>",
423
+ "lstrip": false,
424
+ "normalized": false,
425
+ "rstrip": false,
426
+ "single_word": false,
427
+ "special": true
428
+ },
429
+ "151696": {
430
+ "content": "<blank20>",
431
+ "lstrip": false,
432
+ "normalized": false,
433
+ "rstrip": false,
434
+ "single_word": false,
435
+ "special": true
436
+ },
437
+ "151697": {
438
+ "content": "<blank21>",
439
+ "lstrip": false,
440
+ "normalized": false,
441
+ "rstrip": false,
442
+ "single_word": false,
443
+ "special": true
444
+ },
445
+ "151698": {
446
+ "content": "<blank22>",
447
+ "lstrip": false,
448
+ "normalized": false,
449
+ "rstrip": false,
450
+ "single_word": false,
451
+ "special": true
452
+ },
453
+ "151699": {
454
+ "content": "<blank23>",
455
+ "lstrip": false,
456
+ "normalized": false,
457
+ "rstrip": false,
458
+ "single_word": false,
459
+ "special": true
460
+ },
461
+ "151700": {
462
+ "content": "<blank24>",
463
+ "lstrip": false,
464
+ "normalized": false,
465
+ "rstrip": false,
466
+ "single_word": false,
467
+ "special": true
468
+ },
469
+ "151701": {
470
+ "content": "<blank25>",
471
+ "lstrip": false,
472
+ "normalized": false,
473
+ "rstrip": false,
474
+ "single_word": false,
475
+ "special": true
476
+ },
477
+ "151702": {
478
+ "content": "<blank26>",
479
+ "lstrip": false,
480
+ "normalized": false,
481
+ "rstrip": false,
482
+ "single_word": false,
483
+ "special": true
484
+ },
485
+ "151703": {
486
+ "content": "<blank27>",
487
+ "lstrip": false,
488
+ "normalized": false,
489
+ "rstrip": false,
490
+ "single_word": false,
491
+ "special": true
492
+ },
493
+ "151704": {
494
+ "content": "<asr_text>",
495
+ "lstrip": false,
496
+ "normalized": false,
497
+ "rstrip": false,
498
+ "single_word": false,
499
+ "special": false
500
+ }
501
+ },
502
+ "additional_special_tokens": [
503
+ "<|im_start|>",
504
+ "<|im_end|>",
505
+ "<|object_ref_start|>",
506
+ "<|object_ref_end|>",
507
+ "<|box_start|>",
508
+ "<|box_end|>",
509
+ "<|quad_start|>",
510
+ "<|quad_end|>",
511
+ "<|vision_start|>",
512
+ "<|vision_end|>",
513
+ "<|vision_pad|>",
514
+ "<|image_pad|>",
515
+ "<|video_pad|>",
516
+ "<|audio_start|>",
517
+ "<|audio_end|>",
518
+ "<tts_pad>",
519
+ "<tts_text_bos>",
520
+ "<tts_text_bos_single>",
521
+ "<|audio_pad|>"
522
+ ],
523
+ "audio_bos_token": "<|audio_start|>",
524
+ "audio_eos_token": "<|audio_end|>",
525
+ "audio_token": "<|audio_pad|>",
526
+ "bos_token": null,
527
+ "clean_up_tokenization_spaces": false,
528
+ "eos_token": "<|im_end|>",
529
+ "errors": "replace",
530
+ "extra_special_tokens": {
531
+ "audio_bos_token": "<|audio_start|>",
532
+ "audio_eos_token": "<|audio_end|>",
533
+ "audio_token": "<|audio_pad|>",
534
+ "image_token": "<|image_pad|>",
535
+ "video_token": "<|video_pad|>",
536
+ "vision_bos_token": "<|vision_start|>",
537
+ "vision_eos_token": "<|vision_end|>"
538
+ },
539
+ "image_token": "<|image_pad|>",
540
+ "model_max_length": 131072,
541
+ "pad_token": "<|endoftext|>",
542
+ "processor_class": "Qwen3ASRProcessor",
543
+ "split_special_tokens": false,
544
+ "tokenizer_class": "Qwen2Tokenizer",
545
+ "unk_token": null,
546
+ "video_token": "<|video_pad|>",
547
+ "vision_bos_token": "<|vision_start|>",
548
+ "vision_eos_token": "<|vision_end|>"
549
+ }
lora/lora-stage1/trainer_state.json ADDED
@@ -0,0 +1,774 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 0.29335191228777824,
6
+ "eval_steps": 200,
7
+ "global_step": 1000,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.002933519122877782,
14
+ "grad_norm": 31.058988571166992,
15
+ "learning_rate": 2.6392961876832844e-08,
16
+ "loss": 222.2233,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.005867038245755564,
21
+ "grad_norm": 29.318532943725586,
22
+ "learning_rate": 5.571847507331378e-08,
23
+ "loss": 223.4508,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.008800557368633347,
28
+ "grad_norm": 31.036029815673828,
29
+ "learning_rate": 8.504398826979471e-08,
30
+ "loss": 223.5497,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.011734076491511128,
35
+ "grad_norm": 31.80939483642578,
36
+ "learning_rate": 1.1436950146627565e-07,
37
+ "loss": 218.2694,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.014667595614388912,
42
+ "grad_norm": 32.80522918701172,
43
+ "learning_rate": 1.436950146627566e-07,
44
+ "loss": 219.2423,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.017601114737266693,
49
+ "grad_norm": 32.718772888183594,
50
+ "learning_rate": 1.7302052785923753e-07,
51
+ "loss": 224.9209,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.020534633860144477,
56
+ "grad_norm": 30.853660583496094,
57
+ "learning_rate": 2.0234604105571846e-07,
58
+ "loss": 220.9806,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.023468152983022256,
63
+ "grad_norm": 31.83987045288086,
64
+ "learning_rate": 2.3167155425219938e-07,
65
+ "loss": 221.7758,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.02640167210590004,
70
+ "grad_norm": 33.82211685180664,
71
+ "learning_rate": 2.609970674486803e-07,
72
+ "loss": 220.2104,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.029335191228777823,
77
+ "grad_norm": 39.342655181884766,
78
+ "learning_rate": 2.903225806451613e-07,
79
+ "loss": 223.5162,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.032268710351655606,
84
+ "grad_norm": 32.44097900390625,
85
+ "learning_rate": 3.196480938416422e-07,
86
+ "loss": 222.4887,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.035202229474533386,
91
+ "grad_norm": 30.906185150146484,
92
+ "learning_rate": 3.489736070381232e-07,
93
+ "loss": 221.0162,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.038135748597411166,
98
+ "grad_norm": 30.318588256835938,
99
+ "learning_rate": 3.7829912023460407e-07,
100
+ "loss": 219.1895,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.04106926772028895,
105
+ "grad_norm": 33.13260269165039,
106
+ "learning_rate": 4.0762463343108505e-07,
107
+ "loss": 219.1354,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.04400278684316673,
112
+ "grad_norm": 32.98201370239258,
113
+ "learning_rate": 4.36950146627566e-07,
114
+ "loss": 221.9035,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.04693630596604451,
119
+ "grad_norm": 30.733919143676758,
120
+ "learning_rate": 4.6627565982404685e-07,
121
+ "loss": 219.0109,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.0498698250889223,
126
+ "grad_norm": 35.68417739868164,
127
+ "learning_rate": 4.956011730205278e-07,
128
+ "loss": 222.1004,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.05280334421180008,
133
+ "grad_norm": 34.876121520996094,
134
+ "learning_rate": 5.249266862170088e-07,
135
+ "loss": 220.137,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.055736863334677866,
140
+ "grad_norm": 33.82151412963867,
141
+ "learning_rate": 5.542521994134897e-07,
142
+ "loss": 224.7452,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.058670382457555646,
147
+ "grad_norm": 36.70476531982422,
148
+ "learning_rate": 5.835777126099707e-07,
149
+ "loss": 219.8298,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.058670382457555646,
154
+ "eval_loss": 24.500732421875,
155
+ "eval_runtime": 98.9198,
156
+ "eval_samples_per_second": 98.019,
157
+ "eval_steps_per_second": 6.126,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.061603901580433426,
162
+ "grad_norm": 34.49006652832031,
163
+ "learning_rate": 6.129032258064516e-07,
164
+ "loss": 223.5638,
165
+ "step": 210
166
+ },
167
+ {
168
+ "epoch": 0.06453742070331121,
169
+ "grad_norm": 32.312313079833984,
170
+ "learning_rate": 6.422287390029325e-07,
171
+ "loss": 225.3921,
172
+ "step": 220
173
+ },
174
+ {
175
+ "epoch": 0.06747093982618899,
176
+ "grad_norm": 33.46302032470703,
177
+ "learning_rate": 6.715542521994134e-07,
178
+ "loss": 219.3619,
179
+ "step": 230
180
+ },
181
+ {
182
+ "epoch": 0.07040445894906677,
183
+ "grad_norm": 47.695858001708984,
184
+ "learning_rate": 7.008797653958944e-07,
185
+ "loss": 221.6162,
186
+ "step": 240
187
+ },
188
+ {
189
+ "epoch": 0.07333797807194456,
190
+ "grad_norm": 36.99955368041992,
191
+ "learning_rate": 7.302052785923753e-07,
192
+ "loss": 224.4357,
193
+ "step": 250
194
+ },
195
+ {
196
+ "epoch": 0.07627149719482233,
197
+ "grad_norm": 33.713096618652344,
198
+ "learning_rate": 7.595307917888563e-07,
199
+ "loss": 218.6644,
200
+ "step": 260
201
+ },
202
+ {
203
+ "epoch": 0.07920501631770012,
204
+ "grad_norm": 36.349666595458984,
205
+ "learning_rate": 7.888563049853372e-07,
206
+ "loss": 221.4383,
207
+ "step": 270
208
+ },
209
+ {
210
+ "epoch": 0.0821385354405779,
211
+ "grad_norm": 36.67658615112305,
212
+ "learning_rate": 8.181818181818182e-07,
213
+ "loss": 221.6365,
214
+ "step": 280
215
+ },
216
+ {
217
+ "epoch": 0.08507205456345568,
218
+ "grad_norm": 31.31206512451172,
219
+ "learning_rate": 8.475073313782992e-07,
220
+ "loss": 219.8238,
221
+ "step": 290
222
+ },
223
+ {
224
+ "epoch": 0.08800557368633347,
225
+ "grad_norm": 33.81391525268555,
226
+ "learning_rate": 8.7683284457478e-07,
227
+ "loss": 222.3335,
228
+ "step": 300
229
+ },
230
+ {
231
+ "epoch": 0.09093909280921125,
232
+ "grad_norm": 39.456138610839844,
233
+ "learning_rate": 9.061583577712609e-07,
234
+ "loss": 225.0574,
235
+ "step": 310
236
+ },
237
+ {
238
+ "epoch": 0.09387261193208903,
239
+ "grad_norm": 62.84433364868164,
240
+ "learning_rate": 9.354838709677418e-07,
241
+ "loss": 222.3193,
242
+ "step": 320
243
+ },
244
+ {
245
+ "epoch": 0.09680613105496681,
246
+ "grad_norm": 37.60541915893555,
247
+ "learning_rate": 9.648093841642228e-07,
248
+ "loss": 215.4717,
249
+ "step": 330
250
+ },
251
+ {
252
+ "epoch": 0.0997396501778446,
253
+ "grad_norm": 42.61164855957031,
254
+ "learning_rate": 9.941348973607037e-07,
255
+ "loss": 220.7702,
256
+ "step": 340
257
+ },
258
+ {
259
+ "epoch": 0.10267316930072237,
260
+ "grad_norm": 41.35678482055664,
261
+ "learning_rate": 9.987648602748184e-07,
262
+ "loss": 221.7166,
263
+ "step": 350
264
+ },
265
+ {
266
+ "epoch": 0.10560668842360016,
267
+ "grad_norm": 41.287208557128906,
268
+ "learning_rate": 9.972209356183417e-07,
269
+ "loss": 222.1618,
270
+ "step": 360
271
+ },
272
+ {
273
+ "epoch": 0.10854020754647795,
274
+ "grad_norm": 54.5716667175293,
275
+ "learning_rate": 9.956770109618649e-07,
276
+ "loss": 221.5792,
277
+ "step": 370
278
+ },
279
+ {
280
+ "epoch": 0.11147372666935573,
281
+ "grad_norm": 40.734012603759766,
282
+ "learning_rate": 9.941330863053883e-07,
283
+ "loss": 219.901,
284
+ "step": 380
285
+ },
286
+ {
287
+ "epoch": 0.1144072457922335,
288
+ "grad_norm": 43.457218170166016,
289
+ "learning_rate": 9.925891616489115e-07,
290
+ "loss": 223.7378,
291
+ "step": 390
292
+ },
293
+ {
294
+ "epoch": 0.11734076491511129,
295
+ "grad_norm": 42.917686462402344,
296
+ "learning_rate": 9.910452369924347e-07,
297
+ "loss": 222.8944,
298
+ "step": 400
299
+ },
300
+ {
301
+ "epoch": 0.11734076491511129,
302
+ "eval_loss": 24.425460815429688,
303
+ "eval_runtime": 94.8923,
304
+ "eval_samples_per_second": 102.179,
305
+ "eval_steps_per_second": 6.386,
306
+ "step": 400
307
+ },
308
+ {
309
+ "epoch": 0.12027428403798908,
310
+ "grad_norm": 39.965293884277344,
311
+ "learning_rate": 9.89501312335958e-07,
312
+ "loss": 220.1472,
313
+ "step": 410
314
+ },
315
+ {
316
+ "epoch": 0.12320780316086685,
317
+ "grad_norm": 45.19244384765625,
318
+ "learning_rate": 9.879573876794812e-07,
319
+ "loss": 224.2056,
320
+ "step": 420
321
+ },
322
+ {
323
+ "epoch": 0.12614132228374464,
324
+ "grad_norm": 41.27251434326172,
325
+ "learning_rate": 9.864134630230044e-07,
326
+ "loss": 217.4446,
327
+ "step": 430
328
+ },
329
+ {
330
+ "epoch": 0.12907484140662243,
331
+ "grad_norm": 49.71922302246094,
332
+ "learning_rate": 9.848695383665276e-07,
333
+ "loss": 220.2578,
334
+ "step": 440
335
+ },
336
+ {
337
+ "epoch": 0.1320083605295002,
338
+ "grad_norm": 65.56668853759766,
339
+ "learning_rate": 9.833256137100508e-07,
340
+ "loss": 221.2077,
341
+ "step": 450
342
+ },
343
+ {
344
+ "epoch": 0.13494187965237797,
345
+ "grad_norm": 41.73335266113281,
346
+ "learning_rate": 9.817816890535742e-07,
347
+ "loss": 219.8,
348
+ "step": 460
349
+ },
350
+ {
351
+ "epoch": 0.13787539877525576,
352
+ "grad_norm": 51.275718688964844,
353
+ "learning_rate": 9.802377643970974e-07,
354
+ "loss": 221.3817,
355
+ "step": 470
356
+ },
357
+ {
358
+ "epoch": 0.14080891789813355,
359
+ "grad_norm": 55.4876823425293,
360
+ "learning_rate": 9.786938397406207e-07,
361
+ "loss": 216.4269,
362
+ "step": 480
363
+ },
364
+ {
365
+ "epoch": 0.14374243702101133,
366
+ "grad_norm": 55.99393844604492,
367
+ "learning_rate": 9.771499150841439e-07,
368
+ "loss": 218.8694,
369
+ "step": 490
370
+ },
371
+ {
372
+ "epoch": 0.14667595614388912,
373
+ "grad_norm": 95.5741958618164,
374
+ "learning_rate": 9.75605990427667e-07,
375
+ "loss": 221.4839,
376
+ "step": 500
377
+ },
378
+ {
379
+ "epoch": 0.1496094752667669,
380
+ "grad_norm": 49.25442886352539,
381
+ "learning_rate": 9.740620657711903e-07,
382
+ "loss": 222.9515,
383
+ "step": 510
384
+ },
385
+ {
386
+ "epoch": 0.15254299438964466,
387
+ "grad_norm": 50.05457305908203,
388
+ "learning_rate": 9.725181411147135e-07,
389
+ "loss": 218.0743,
390
+ "step": 520
391
+ },
392
+ {
393
+ "epoch": 0.15547651351252245,
394
+ "grad_norm": 43.44709777832031,
395
+ "learning_rate": 9.709742164582367e-07,
396
+ "loss": 218.6208,
397
+ "step": 530
398
+ },
399
+ {
400
+ "epoch": 0.15841003263540024,
401
+ "grad_norm": 66.39103698730469,
402
+ "learning_rate": 9.694302918017602e-07,
403
+ "loss": 219.6833,
404
+ "step": 540
405
+ },
406
+ {
407
+ "epoch": 0.16134355175827803,
408
+ "grad_norm": 54.72968292236328,
409
+ "learning_rate": 9.678863671452832e-07,
410
+ "loss": 221.8852,
411
+ "step": 550
412
+ },
413
+ {
414
+ "epoch": 0.1642770708811558,
415
+ "grad_norm": 65.26374816894531,
416
+ "learning_rate": 9.663424424888064e-07,
417
+ "loss": 219.7626,
418
+ "step": 560
419
+ },
420
+ {
421
+ "epoch": 0.1672105900040336,
422
+ "grad_norm": 60.0925178527832,
423
+ "learning_rate": 9.647985178323296e-07,
424
+ "loss": 217.8218,
425
+ "step": 570
426
+ },
427
+ {
428
+ "epoch": 0.17014410912691136,
429
+ "grad_norm": 47.97535705566406,
430
+ "learning_rate": 9.63254593175853e-07,
431
+ "loss": 217.9315,
432
+ "step": 580
433
+ },
434
+ {
435
+ "epoch": 0.17307762824978914,
436
+ "grad_norm": 53.61656951904297,
437
+ "learning_rate": 9.617106685193762e-07,
438
+ "loss": 219.2269,
439
+ "step": 590
440
+ },
441
+ {
442
+ "epoch": 0.17601114737266693,
443
+ "grad_norm": 52.75293731689453,
444
+ "learning_rate": 9.601667438628995e-07,
445
+ "loss": 216.867,
446
+ "step": 600
447
+ },
448
+ {
449
+ "epoch": 0.17601114737266693,
450
+ "eval_loss": 24.240764617919922,
451
+ "eval_runtime": 97.5766,
452
+ "eval_samples_per_second": 99.368,
453
+ "eval_steps_per_second": 6.211,
454
+ "step": 600
455
+ },
456
+ {
457
+ "epoch": 0.17894466649554472,
458
+ "grad_norm": 59.573219299316406,
459
+ "learning_rate": 9.586228192064227e-07,
460
+ "loss": 213.2538,
461
+ "step": 610
462
+ },
463
+ {
464
+ "epoch": 0.1818781856184225,
465
+ "grad_norm": 113.46548461914062,
466
+ "learning_rate": 9.570788945499459e-07,
467
+ "loss": 218.0255,
468
+ "step": 620
469
+ },
470
+ {
471
+ "epoch": 0.1848117047413003,
472
+ "grad_norm": 119.12982177734375,
473
+ "learning_rate": 9.55534969893469e-07,
474
+ "loss": 216.9313,
475
+ "step": 630
476
+ },
477
+ {
478
+ "epoch": 0.18774522386417805,
479
+ "grad_norm": 54.008338928222656,
480
+ "learning_rate": 9.539910452369923e-07,
481
+ "loss": 220.8365,
482
+ "step": 640
483
+ },
484
+ {
485
+ "epoch": 0.19067874298705584,
486
+ "grad_norm": 59.56270217895508,
487
+ "learning_rate": 9.524471205805155e-07,
488
+ "loss": 218.8136,
489
+ "step": 650
490
+ },
491
+ {
492
+ "epoch": 0.19361226210993362,
493
+ "grad_norm": 52.067115783691406,
494
+ "learning_rate": 9.509031959240389e-07,
495
+ "loss": 220.6164,
496
+ "step": 660
497
+ },
498
+ {
499
+ "epoch": 0.1965457812328114,
500
+ "grad_norm": 60.61309051513672,
501
+ "learning_rate": 9.493592712675621e-07,
502
+ "loss": 217.9881,
503
+ "step": 670
504
+ },
505
+ {
506
+ "epoch": 0.1994793003556892,
507
+ "grad_norm": 49.88456726074219,
508
+ "learning_rate": 9.478153466110853e-07,
509
+ "loss": 217.0137,
510
+ "step": 680
511
+ },
512
+ {
513
+ "epoch": 0.20241281947856699,
514
+ "grad_norm": 49.28492736816406,
515
+ "learning_rate": 9.462714219546085e-07,
516
+ "loss": 212.2388,
517
+ "step": 690
518
+ },
519
+ {
520
+ "epoch": 0.20534633860144474,
521
+ "grad_norm": 55.44947814941406,
522
+ "learning_rate": 9.447274972981318e-07,
523
+ "loss": 221.7097,
524
+ "step": 700
525
+ },
526
+ {
527
+ "epoch": 0.20827985772432253,
528
+ "grad_norm": 47.7352409362793,
529
+ "learning_rate": 9.43183572641655e-07,
530
+ "loss": 217.8991,
531
+ "step": 710
532
+ },
533
+ {
534
+ "epoch": 0.21121337684720032,
535
+ "grad_norm": 56.91552734375,
536
+ "learning_rate": 9.416396479851782e-07,
537
+ "loss": 216.618,
538
+ "step": 720
539
+ },
540
+ {
541
+ "epoch": 0.2141468959700781,
542
+ "grad_norm": 50.68717575073242,
543
+ "learning_rate": 9.400957233287015e-07,
544
+ "loss": 217.7346,
545
+ "step": 730
546
+ },
547
+ {
548
+ "epoch": 0.2170804150929559,
549
+ "grad_norm": 75.52225494384766,
550
+ "learning_rate": 9.385517986722248e-07,
551
+ "loss": 215.9344,
552
+ "step": 740
553
+ },
554
+ {
555
+ "epoch": 0.22001393421583368,
556
+ "grad_norm": 74.4793472290039,
557
+ "learning_rate": 9.37007874015748e-07,
558
+ "loss": 222.193,
559
+ "step": 750
560
+ },
561
+ {
562
+ "epoch": 0.22294745333871147,
563
+ "grad_norm": 58.30630111694336,
564
+ "learning_rate": 9.354639493592712e-07,
565
+ "loss": 215.5639,
566
+ "step": 760
567
+ },
568
+ {
569
+ "epoch": 0.22588097246158922,
570
+ "grad_norm": 52.7680778503418,
571
+ "learning_rate": 9.339200247027944e-07,
572
+ "loss": 219.3169,
573
+ "step": 770
574
+ },
575
+ {
576
+ "epoch": 0.228814491584467,
577
+ "grad_norm": 51.10957717895508,
578
+ "learning_rate": 9.323761000463177e-07,
579
+ "loss": 213.7119,
580
+ "step": 780
581
+ },
582
+ {
583
+ "epoch": 0.2317480107073448,
584
+ "grad_norm": 96.71678161621094,
585
+ "learning_rate": 9.30832175389841e-07,
586
+ "loss": 216.2126,
587
+ "step": 790
588
+ },
589
+ {
590
+ "epoch": 0.23468152983022258,
591
+ "grad_norm": 59.496395111083984,
592
+ "learning_rate": 9.292882507333642e-07,
593
+ "loss": 220.2937,
594
+ "step": 800
595
+ },
596
+ {
597
+ "epoch": 0.23468152983022258,
598
+ "eval_loss": 24.050508499145508,
599
+ "eval_runtime": 98.6094,
600
+ "eval_samples_per_second": 98.327,
601
+ "eval_steps_per_second": 6.145,
602
+ "step": 800
603
+ },
604
+ {
605
+ "epoch": 0.23761504895310037,
606
+ "grad_norm": 115.57308959960938,
607
+ "learning_rate": 9.277443260768874e-07,
608
+ "loss": 214.0267,
609
+ "step": 810
610
+ },
611
+ {
612
+ "epoch": 0.24054856807597816,
613
+ "grad_norm": 58.29754638671875,
614
+ "learning_rate": 9.262004014204107e-07,
615
+ "loss": 219.2819,
616
+ "step": 820
617
+ },
618
+ {
619
+ "epoch": 0.24348208719885592,
620
+ "grad_norm": 137.19517517089844,
621
+ "learning_rate": 9.246564767639339e-07,
622
+ "loss": 217.8361,
623
+ "step": 830
624
+ },
625
+ {
626
+ "epoch": 0.2464156063217337,
627
+ "grad_norm": 62.34098434448242,
628
+ "learning_rate": 9.23112552107457e-07,
629
+ "loss": 217.4855,
630
+ "step": 840
631
+ },
632
+ {
633
+ "epoch": 0.2493491254446115,
634
+ "grad_norm": 57.445247650146484,
635
+ "learning_rate": 9.215686274509803e-07,
636
+ "loss": 217.8953,
637
+ "step": 850
638
+ },
639
+ {
640
+ "epoch": 0.2522826445674893,
641
+ "grad_norm": 61.09876251220703,
642
+ "learning_rate": 9.200247027945036e-07,
643
+ "loss": 215.2011,
644
+ "step": 860
645
+ },
646
+ {
647
+ "epoch": 0.25521616369036704,
648
+ "grad_norm": 59.176513671875,
649
+ "learning_rate": 9.184807781380268e-07,
650
+ "loss": 217.2304,
651
+ "step": 870
652
+ },
653
+ {
654
+ "epoch": 0.25814968281324485,
655
+ "grad_norm": 52.66059494018555,
656
+ "learning_rate": 9.1693685348155e-07,
657
+ "loss": 218.234,
658
+ "step": 880
659
+ },
660
+ {
661
+ "epoch": 0.2610832019361226,
662
+ "grad_norm": 98.39973449707031,
663
+ "learning_rate": 9.153929288250732e-07,
664
+ "loss": 214.297,
665
+ "step": 890
666
+ },
667
+ {
668
+ "epoch": 0.2640167210590004,
669
+ "grad_norm": 72.08065795898438,
670
+ "learning_rate": 9.138490041685965e-07,
671
+ "loss": 217.044,
672
+ "step": 900
673
+ },
674
+ {
675
+ "epoch": 0.2669502401818782,
676
+ "grad_norm": 59.712371826171875,
677
+ "learning_rate": 9.123050795121198e-07,
678
+ "loss": 215.2483,
679
+ "step": 910
680
+ },
681
+ {
682
+ "epoch": 0.26988375930475594,
683
+ "grad_norm": 64.43281555175781,
684
+ "learning_rate": 9.10761154855643e-07,
685
+ "loss": 211.9948,
686
+ "step": 920
687
+ },
688
+ {
689
+ "epoch": 0.27281727842763376,
690
+ "grad_norm": 61.78029251098633,
691
+ "learning_rate": 9.092172301991662e-07,
692
+ "loss": 217.2441,
693
+ "step": 930
694
+ },
695
+ {
696
+ "epoch": 0.2757507975505115,
697
+ "grad_norm": 68.14164733886719,
698
+ "learning_rate": 9.076733055426895e-07,
699
+ "loss": 214.7014,
700
+ "step": 940
701
+ },
702
+ {
703
+ "epoch": 0.27868431667338933,
704
+ "grad_norm": 61.65287399291992,
705
+ "learning_rate": 9.061293808862127e-07,
706
+ "loss": 212.859,
707
+ "step": 950
708
+ },
709
+ {
710
+ "epoch": 0.2816178357962671,
711
+ "grad_norm": 64.0514144897461,
712
+ "learning_rate": 9.045854562297359e-07,
713
+ "loss": 217.1946,
714
+ "step": 960
715
+ },
716
+ {
717
+ "epoch": 0.2845513549191449,
718
+ "grad_norm": 91.87364959716797,
719
+ "learning_rate": 9.030415315732592e-07,
720
+ "loss": 215.7542,
721
+ "step": 970
722
+ },
723
+ {
724
+ "epoch": 0.28748487404202266,
725
+ "grad_norm": 54.730316162109375,
726
+ "learning_rate": 9.014976069167825e-07,
727
+ "loss": 218.0408,
728
+ "step": 980
729
+ },
730
+ {
731
+ "epoch": 0.2904183931649004,
732
+ "grad_norm": 56.43712615966797,
733
+ "learning_rate": 8.999536822603057e-07,
734
+ "loss": 212.8671,
735
+ "step": 990
736
+ },
737
+ {
738
+ "epoch": 0.29335191228777824,
739
+ "grad_norm": 59.28590393066406,
740
+ "learning_rate": 8.984097576038289e-07,
741
+ "loss": 215.822,
742
+ "step": 1000
743
+ },
744
+ {
745
+ "epoch": 0.29335191228777824,
746
+ "eval_loss": 23.851858139038086,
747
+ "eval_runtime": 96.4448,
748
+ "eval_samples_per_second": 100.534,
749
+ "eval_steps_per_second": 6.283,
750
+ "step": 1000
751
+ }
752
+ ],
753
+ "logging_steps": 10,
754
+ "max_steps": 6818,
755
+ "num_input_tokens_seen": 0,
756
+ "num_train_epochs": 2,
757
+ "save_steps": 200,
758
+ "stateful_callbacks": {
759
+ "TrainerControl": {
760
+ "args": {
761
+ "should_epoch_stop": false,
762
+ "should_evaluate": false,
763
+ "should_log": false,
764
+ "should_save": true,
765
+ "should_training_stop": false
766
+ },
767
+ "attributes": {}
768
+ }
769
+ },
770
+ "total_flos": 3.503007404654592e+17,
771
+ "train_batch_size": 8,
772
+ "trial_name": null,
773
+ "trial_params": null
774
+ }
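Reviewer note: the `log_history` array in `trainer_state.json` above interleaves training entries (which carry `loss`, `grad_norm`, and `learning_rate`) with evaluation entries (which carry `eval_loss` and the `eval_*` throughput fields), both keyed by `step`. A minimal sketch of separating the two is shown below. It assumes only the field names visible in this diff; the helper name `split_log_history` and the inline `STATE_JSON` excerpt are illustrative and are not part of the commit.

```python
import json

# Small excerpt shaped like trainer_state.json; the values are copied from the
# step-200 and step-1000 entries in the diff above. In practice the state would
# be loaded from the file itself (path is up to the reader).
STATE_JSON = """
{
  "global_step": 1000,
  "log_history": [
    {"epoch": 0.058670382457555646, "grad_norm": 36.70476531982422,
     "learning_rate": 5.835777126099707e-07, "loss": 219.8298, "step": 200},
    {"epoch": 0.058670382457555646, "eval_loss": 24.500732421875,
     "eval_runtime": 98.9198, "eval_samples_per_second": 98.019,
     "eval_steps_per_second": 6.126, "step": 200},
    {"epoch": 0.29335191228777824, "grad_norm": 59.28590393066406,
     "learning_rate": 8.984097576038289e-07, "loss": 215.822, "step": 1000},
    {"epoch": 0.29335191228777824, "eval_loss": 23.851858139038086,
     "eval_runtime": 96.4448, "eval_samples_per_second": 100.534,
     "eval_steps_per_second": 6.283, "step": 1000}
  ]
}
"""


def split_log_history(state):
    """Hypothetical helper: split log_history into (train, eval) entry lists.

    Training entries are identified by the "loss" key and evaluation entries
    by the "eval_loss" key, matching the records shown in this diff.
    """
    train = [entry for entry in state["log_history"] if "loss" in entry]
    evals = [entry for entry in state["log_history"] if "eval_loss" in entry]
    return train, evals


state = json.loads(STATE_JSON)
train_logs, eval_logs = split_log_history(state)

# (step, eval_loss) pairs, convenient for plotting or a quick trend check.
eval_curve = [(entry["step"], entry["eval_loss"]) for entry in eval_logs]
```

Note that the logged training `loss` (around 215 to 225) and `eval_loss` (around 24) are on different scales in this file, so the two series are better inspected separately than compared directly.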
lora/lora-stage1/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage2/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: ''
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - 'base_model:adapter:'
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.1
lora/lora-stage2/adapter_config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": "^(audio_tower\\.(conv_out|proj1|proj2)$|audio_tower\\.layers\\.\\d+\\..*\\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$|model\\.layers\\.\\d+\\..*\\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)$)",
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
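The `target_modules` value in this adapter config is a single regular expression rather than a list of module names. The following is a minimal sketch of how such a pattern selects modules, assuming it is applied to each full module name with `re.fullmatch` (which is how PEFT treats a string-valued `target_modules`). The module names in the example are hypothetical and were not read from the checkpoint. The sketch also computes the LoRA scaling factor `lora_alpha / r` implied by the values above, given that `use_rslora` is false.

```python
import re

# Pattern copied from adapter_config.json (JSON "\\." decodes to regex "\.").
TARGET_MODULES = (
    r"^(audio_tower\.(conv_out|proj1|proj2)$"
    r"|audio_tower\.layers\.\d+\..*\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$"
    r"|model\.layers\.\d+\..*\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)$)"
)


def is_targeted(module_name: str) -> bool:
    """Return True if the full module name matches the adapter's pattern."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Hypothetical module names, used only to exercise the three alternatives.
examples = [
    "audio_tower.conv_out",                   # first alternative
    "audio_tower.layers.3.self_attn.q_proj",  # second alternative
    "model.layers.0.mlp.gate_proj",           # third alternative
    "audio_tower.layers.3.fc1",               # no component between index and name
    "model.embed_tokens",                     # not covered by the pattern
]
for name in examples:
    print(f"{name}: {is_targeted(name)}")

# LoRA scaling applied to the low-rank update when use_rslora is false.
lora_alpha, r = 16, 8
scaling = lora_alpha / r
print("scaling:", scaling)
```

Note that the layer alternatives require at least one intermediate component between the layer index and the final name (the `\..*\.` part), so a name such as `audio_tower.layers.3.fc1` would not match even though `fc1` appears in the list.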
lora/lora-stage2/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd4baa6a45645b280fdddb3c722186d149b5f64daab687300dba0c08373e3962
+ size 41677888
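Large binaries in this commit (adapter weights, optimizer and scheduler state, RNG state, `tokenizer.json`) appear only as Git LFS pointer files: three `key value` lines giving the spec version, the content hash, and the byte size. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` and the returned field names are illustrative choices, not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer for lora/lora-stage2/adapter_model.safetensors, as shown in the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:dd4baa6a45645b280fdddb3c722186d149b5f64daab687300dba0c08373e3962
size 41677888
"""

info = parse_lfs_pointer(POINTER)
print(info["oid_algo"], info["size"])
```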
lora/lora-stage2/added_tokens.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<asr_text>": 151704,
+   "<blank10>": 151686,
+   "<blank11>": 151687,
+   "<blank12>": 151688,
+   "<blank13>": 151689,
+   "<blank14>": 151690,
+   "<blank15>": 151691,
+   "<blank16>": 151692,
+   "<blank17>": 151693,
+   "<blank18>": 151694,
+   "<blank19>": 151695,
+   "<blank1>": 151677,
+   "<blank20>": 151696,
+   "<blank21>": 151697,
+   "<blank22>": 151698,
+   "<blank23>": 151699,
+   "<blank24>": 151700,
+   "<blank25>": 151701,
+   "<blank26>": 151702,
+   "<blank27>": 151703,
+   "<blank2>": 151678,
+   "<blank3>": 151679,
+   "<blank4>": 151680,
+   "<blank5>": 151681,
+   "<blank6>": 151682,
+   "<blank7>": 151683,
+   "<blank8>": 151684,
+   "<blank9>": 151685,
+   "<non_speech>": 151675,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<tts_pad>": 151671,
+   "<tts_text_bos>": 151672,
+   "<tts_text_bos_single>": 151674,
+   "<tts_text_eod>": 151673,
+   "<|audio_end|>": 151670,
+   "<|audio_pad|>": 151676,
+   "<|audio_start|>": 151669,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
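Because `added_tokens.json` is sorted by token string rather than by ID, it is easy to miss that the `<blankN>` tokens occupy one contiguous ID range directly after `<|audio_pad|>` (151676). A small sketch of that relationship, using only the IDs listed above; the helper name `blank_token_id` is hypothetical.

```python
AUDIO_PAD_ID = 151676  # "<|audio_pad|>" in added_tokens.json


def blank_token_id(n: int) -> int:
    """Return the ID of "<blankN>" for N in 1..27, per added_tokens.json."""
    if not 1 <= n <= 27:
        raise ValueError("added_tokens.json defines <blank1> .. <blank27> only")
    return AUDIO_PAD_ID + n


print(blank_token_id(1), blank_token_id(27))
```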
lora/lora-stage2/base_model.txt ADDED
@@ -0,0 +1 @@
+ /data/haobin/pky_train/qwen3/Qwen3-ASR-1.7B
lora/lora-stage2/chat_template.jinja ADDED
@@ -0,0 +1,31 @@
+ {%- set ns = namespace(system_text="") -%}
+ {%- for m in messages -%}
+     {%- if m.role == 'system' -%}
+         {%- if m.content is string -%}
+             {%- set ns.system_text = ns.system_text + m.content -%}
+         {%- else -%}
+             {%- for c in m.content -%}
+                 {%- if c.type == 'text' and (c.text is defined) -%}
+                     {%- set ns.system_text = ns.system_text + c.text -%}
+                 {%- endif -%}
+             {%- endfor -%}
+         {%- endif -%}
+     {%- endif -%}
+ {%- endfor -%}
+
+ {%- set ns2 = namespace(audio_tokens="") -%}
+ {%- for m in messages -%}
+     {%- if m.content is not string -%}
+         {%- for c in m.content -%}
+             {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}
+                 {%- set ns2.audio_tokens = ns2.audio_tokens + "<|audio_start|><|audio_pad|><|audio_end|>" -%}
+             {%- endif -%}
+         {%- endfor -%}
+     {%- endif -%}
+ {%- endfor -%}
+
+ {{- '<|im_start|>system\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\n' -}}
+ {{- '<|im_start|>user\n' + ns2.audio_tokens + '<|im_end|>\n' -}}
+ {%- if add_generation_prompt -%}
+ {{- '<|im_start|>assistant\n' -}}
+ {%- endif -%}
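The template above collects all system text, emits one audio placeholder triple per audio item, and ignores any user text. As a way to check that reading, here is a plain-Python sketch that mirrors the template's logic without Jinja. It is an illustration, not the code the processor runs; the function name `render_prompt` and the sample messages are assumptions.

```python
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_prompt(messages, add_generation_prompt=True):
    """Mirror of chat_template.jinja: system text plus audio placeholders."""
    system_text = ""
    audio_tokens = ""
    for m in messages:
        content = m["content"]
        # First loop of the template: concatenate system text.
        if m.get("role") == "system":
            if isinstance(content, str):
                system_text += content
            else:
                for c in content:
                    if c.get("type") == "text" and "text" in c:
                        system_text += c["text"]
        # Second loop of the template: one placeholder per audio item,
        # regardless of the message role.
        if not isinstance(content, str):
            for c in content:
                if c.get("type") == "audio" or "audio" in c or "audio_url" in c:
                    audio_tokens += AUDIO_PLACEHOLDER
    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


# Hypothetical conversation: the user's text item is dropped by the template.
messages = [
    {"role": "system", "content": "Transcribe the audio."},
    {"role": "user", "content": [
        {"type": "audio", "audio": "sample.wav"},
        {"type": "text", "text": "ignored by the template"},
    ]},
]
print(render_prompt(messages))
```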
lora/lora-stage2/chat_template.json ADDED
@@ -0,0 +1 @@
+ {"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n {%- if m.role == 'system' -%}\n {%- if m.content is string -%}\n {%- set ns.system_text = ns.system_text + m.content -%}\n {%- else -%}\n {%- for c in m.content -%}\n {%- if c.type == 'text' and (c.text is defined) -%}\n {%- set ns.system_text = ns.system_text + c.text -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n {%- if m.content is not string -%}\n {%- for c in m.content -%}\n {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
lora/lora-stage2/config.json ADDED
@@ -0,0 +1,221 @@
+ {
+   "architectures": [
+     "Qwen3ASRForConditionalGeneration"
+   ],
+   "model_type": "qwen3_asr",
+   "support_languages": [
+     "Chinese",
+     "English",
+     "Cantonese",
+     "Arabic",
+     "German",
+     "French",
+     "Spanish",
+     "Portuguese",
+     "Indonesian",
+     "Italian",
+     "Korean",
+     "Russian",
+     "Thai",
+     "Vietnamese",
+     "Japanese",
+     "Turkish",
+     "Hindi",
+     "Malay",
+     "Dutch",
+     "Swedish",
+     "Danish",
+     "Finnish",
+     "Polish",
+     "Czech",
+     "Filipino",
+     "Persian",
+     "Greek",
+     "Romanian",
+     "Hungarian",
+     "Macedonian"
+   ],
+   "thinker_config": {
+     "model_type": "qwen3_asr",
+     "architectures": [
+       "Qwen3ASRForConditionalGeneration"
+     ],
+     "audio_config": {
+       "_name_or_path": "",
+       "activation_dropout": 0,
+       "activation_function": "gelu",
+       "add_cross_attention": false,
+       "architectures": null,
+       "attention_dropout": 0,
+       "bad_words_ids": null,
+       "begin_suppress_tokens": null,
+       "bos_token_id": null,
+       "chunk_size_feed_forward": 0,
+       "conv_chunksize": 500,
+       "cross_attention_hidden_size": null,
+       "d_model": 1024,
+       "decoder_start_token_id": null,
+       "diversity_penalty": 0.0,
+       "do_sample": false,
+       "downsample_hidden_size": 480,
+       "dropout": 0,
+       "dtype": null,
+       "early_stopping": false,
+       "encoder_attention_heads": 16,
+       "encoder_ffn_dim": 4096,
+       "encoder_layers": 24,
+       "encoder_no_repeat_ngram_size": 0,
+       "eos_token_id": null,
+       "exponential_decay_length_penalty": null,
+       "finetuning_task": null,
+       "forced_bos_token_id": null,
+       "forced_eos_token_id": null,
+       "id2label": {
+         "0": "LABEL_0",
+         "1": "LABEL_1"
+       },
+       "initializer_range": 0.02,
+       "is_decoder": false,
+       "is_encoder_decoder": false,
+       "label2id": {
+         "LABEL_0": 0,
+         "LABEL_1": 1
+       },
+       "length_penalty": 1.0,
+       "max_length": 20,
+       "max_source_positions": 1500,
+       "min_length": 0,
+       "model_type": "qwen3_asr_audio_encoder",
+       "n_window": 50,
+       "n_window_infer": 800,
+       "no_repeat_ngram_size": 0,
+       "num_beam_groups": 1,
+       "num_beams": 1,
+       "num_hidden_layers": 24,
+       "num_mel_bins": 128,
+       "num_return_sequences": 1,
+       "output_attentions": false,
+       "output_dim": 2048,
+       "output_hidden_states": false,
+       "output_scores": false,
+       "pad_token_id": null,
+       "prefix": null,
+       "problem_type": null,
+       "pruned_heads": {},
+       "remove_invalid_values": false,
+       "repetition_penalty": 1.0,
+       "return_dict": true,
+       "return_dict_in_generate": false,
+       "scale_embedding": false,
+       "sep_token_id": null,
+       "suppress_tokens": null,
+       "task_specific_params": null,
+       "temperature": 1.0,
+       "tf_legacy_loss": false,
+       "tie_encoder_decoder": false,
+       "tie_word_embeddings": true,
+       "tokenizer_class": null,
+       "top_k": 50,
+       "top_p": 1.0,
+       "torchscript": false,
+       "typical_p": 1.0,
+       "use_bfloat16": false
+     },
+     "audio_end_token_id": 151670,
+     "audio_start_token_id": 151669,
+     "audio_token_id": 151676,
+     "dtype": "bfloat16",
+     "initializer_range": 0.02,
+     "text_config": {
+       "_name_or_path": "",
+       "add_cross_attention": false,
+       "architectures": null,
+       "attention_bias": false,
+       "attention_dropout": 0.0,
+       "bad_words_ids": null,
+       "begin_suppress_tokens": null,
+       "bos_token_id": null,
+       "chunk_size_feed_forward": 0,
+       "cross_attention_hidden_size": null,
+       "decoder_start_token_id": null,
+       "diversity_penalty": 0.0,
+       "do_sample": false,
+       "dtype": null,
+       "early_stopping": false,
+       "encoder_no_repeat_ngram_size": 0,
+       "eos_token_id": null,
+       "exponential_decay_length_penalty": null,
+       "finetuning_task": null,
+       "forced_bos_token_id": null,
+       "forced_eos_token_id": null,
+       "head_dim": 128,
+       "hidden_act": "silu",
+       "hidden_size": 2048,
+       "id2label": {
+         "0": "LABEL_0",
+         "1": "LABEL_1"
+       },
+       "initializer_range": 0.02,
+       "intermediate_size": 6144,
+       "is_decoder": false,
+       "is_encoder_decoder": false,
+       "label2id": {
+         "LABEL_0": 0,
+         "LABEL_1": 1
+       },
+       "length_penalty": 1.0,
+       "max_length": 20,
+       "max_position_embeddings": 65536,
+       "min_length": 0,
+       "model_type": "qwen3",
+       "no_repeat_ngram_size": 0,
+       "num_attention_heads": 16,
+       "num_beam_groups": 1,
+       "num_beams": 1,
+       "num_hidden_layers": 28,
+       "num_key_value_heads": 8,
+       "num_return_sequences": 1,
+       "output_attentions": false,
+       "output_hidden_states": false,
+       "output_scores": false,
+       "pad_token_id": null,
+       "prefix": null,
+       "problem_type": null,
+       "pruned_heads": {},
+       "remove_invalid_values": false,
+       "repetition_penalty": 1.0,
+       "return_dict": true,
+       "return_dict_in_generate": false,
+       "rms_norm_eps": 1e-06,
+       "rope_scaling": {
+         "interleaved": true,
+         "mrope_interleaved": true,
+         "mrope_section": [
+           24,
+           20,
+           20
+         ],
+         "rope_type": "default",
+         "type": "default"
+       },
+       "rope_theta": 1000000,
+       "sep_token_id": null,
+       "suppress_tokens": null,
+       "task_specific_params": null,
+       "temperature": 1.0,
+       "tf_legacy_loss": false,
+       "tie_encoder_decoder": false,
+       "tie_word_embeddings": true,
+       "tokenizer_class": null,
+       "top_k": 50,
+       "top_p": 1.0,
+       "torchscript": false,
+       "typical_p": 1.0,
+       "use_bfloat16": false,
+       "use_cache": true,
+       "vocab_size": 151936
+     }
+   },
+   "transformers_version": "4.57.6"
+ }
+
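Several values in `config.json` are related by simple arithmetic, and checking them makes the shapes easier to read. The sketch below uses only numbers copied from the config. The interpretation of `mrope_section` as a partition of the `head_dim / 2` rotary frequency pairs is an assumption about how this architecture uses the field, not something the config states.

```python
# Values copied from thinker_config.text_config and thinker_config.audio_config.
text = {
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "mrope_section": [24, 20, 20],
}
audio = {"d_model": 1024, "encoder_attention_heads": 16, "output_dim": 2048}

# Width of the concatenated query heads and of the key/value heads.
q_width = text["num_attention_heads"] * text["head_dim"]
kv_width = text["num_key_value_heads"] * text["head_dim"]

# Grouped-query attention: number of query heads sharing one key/value head.
queries_per_kv = text["num_attention_heads"] // text["num_key_value_heads"]

# Rotary embeddings act on pairs of channels, so there are head_dim / 2 pairs.
rotary_pairs = text["head_dim"] // 2
mrope_total = sum(text["mrope_section"])

# Per-head width in the audio encoder.
audio_head_dim = audio["d_model"] // audio["encoder_attention_heads"]

print("q_width:", q_width, "kv_width:", kv_width)
print("queries per kv head:", queries_per_kv)
print("rotary pairs:", rotary_pairs, "mrope_section sum:", mrope_total)
print("audio head dim:", audio_head_dim)
print("audio output matches text hidden size:",
      audio["output_dim"] == text["hidden_size"])
```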
lora/lora-stage2/generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "eos_token_id": [151643,151645],
+   "pad_token_id": 151643,
+   "do_sample": false,
+   "temperature": 0.000001
+ }
lora/lora-stage2/merged_from_lora.txt ADDED
@@ -0,0 +1 @@
+ /data/haobin/pky_train/qwen3/out_qwen3-asr-lora-0317_550000_wer3_towerb4+proj_2gpu_bs128/checkpoint-1000
lora/lora-stage2/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage2/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b26cb33d8c7aefdee4dcd88af58551b5a01e16c9f852a1c1ffb0d1a47e6421b4
+ size 83695117
lora/lora-stage2/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "chunk_length": 30,
+   "dither": 0.0,
+   "feature_extractor_type": "WhisperFeatureExtractor",
+   "feature_size": 128,
+   "hop_length": 160,
+   "n_fft": 400,
+   "n_samples": 480000,
+   "nb_max_frames": 3000,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "processor_class": "Qwen3ASRProcessor",
+   "return_attention_mask": true
+ }
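The feature-extractor values above are mutually consistent, which can be verified with a few lines of arithmetic. The config as shown does not list a sampling rate, so the sketch infers it from `n_samples / chunk_length`; treat the resulting 16 kHz figure as derived from these numbers rather than stated by the file.

```python
# Values copied from preprocessor_config.json.
chunk_length_s = 30   # seconds of audio per chunk
n_samples = 480000    # samples per chunk
hop_length = 160      # samples between successive frames
n_fft = 400           # samples per analysis window
nb_max_frames = 3000  # frames per chunk

# Sampling rate implied by the chunk length and the sample count.
sampling_rate = n_samples // chunk_length_s

# Number of frames produced by stepping hop_length samples at a time.
frames_per_chunk = n_samples // hop_length

# Window and hop durations in milliseconds.
window_ms = 1000 * n_fft / sampling_rate
hop_ms = 1000 * hop_length / sampling_rate

print("sampling rate (Hz):", sampling_rate)
print("frames per chunk:", frames_per_chunk)
print("window (ms):", window_ms, "hop (ms):", hop_ms)
```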
lora/lora-stage2/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de015da1ba6a4dc8cf66420b3b9b378bc07585bfb14a0c37fb50e723424b9768
+ size 14917
lora/lora-stage2/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:681f2e7cc7c3d884111a86a3bcdeeaea97b22ebf60e4f765788ee5cbeb94e2d9
+ size 14917
lora/lora-stage2/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a7077d452a1df5790a83102fc7a743c5150e80f24610df63abd069404ebe93a
+ size 1465
lora/lora-stage2/special_tokens_map.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>",
+     "<|audio_start|>",
+     "<|audio_end|>",
+     "<tts_pad>",
+     "<tts_text_bos>",
+     "<tts_text_bos_single>",
+     "<|audio_pad|>"
+   ],
+   "audio_bos_token": "<|audio_start|>",
+   "audio_eos_token": "<|audio_end|>",
+   "audio_token": "<|audio_pad|>",
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<|image_pad|>",
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "video_token": "<|video_pad|>",
+   "vision_bos_token": "<|vision_start|>",
+   "vision_eos_token": "<|vision_end|>"
+ }
lora/lora-stage2/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0499602714160467f2d68b910651d6216020689f1e016be87a2d0019ee3baeab
+ size 11429499
lora/lora-stage2/tokenizer_config.json ADDED
@@ -0,0 +1,549 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ },
213
+ "151669": {
214
+ "content": "<|audio_start|>",
215
+ "lstrip": false,
216
+ "normalized": false,
217
+ "rstrip": false,
218
+ "single_word": false,
219
+ "special": true
220
+ },
221
+ "151670": {
222
+ "content": "<|audio_end|>",
223
+ "lstrip": false,
224
+ "normalized": false,
225
+ "rstrip": false,
226
+ "single_word": false,
227
+ "special": true
228
+ },
229
+ "151671": {
230
+ "content": "<tts_pad>",
231
+ "lstrip": false,
232
+ "normalized": false,
233
+ "rstrip": false,
234
+ "single_word": false,
235
+ "special": true
236
+ },
237
+ "151672": {
238
+ "content": "<tts_text_bos>",
239
+ "lstrip": false,
240
+ "normalized": false,
241
+ "rstrip": false,
242
+ "single_word": false,
243
+ "special": true
244
+ },
245
+ "151673": {
246
+ "content": "<tts_text_eod>",
247
+ "lstrip": false,
248
+ "normalized": false,
249
+ "rstrip": false,
250
+ "single_word": false,
251
+ "special": true
252
+ },
253
+ "151674": {
254
+ "content": "<tts_text_bos_single>",
255
+ "lstrip": false,
256
+ "normalized": false,
257
+ "rstrip": false,
258
+ "single_word": false,
259
+ "special": true
260
+ },
261
+ "151675": {
262
+ "content": "<non_speech>",
263
+ "lstrip": false,
264
+ "normalized": false,
265
+ "rstrip": false,
266
+ "single_word": false,
267
+ "special": false
268
+ },
269
+ "151676": {
270
+ "content": "<|audio_pad|>",
271
+ "lstrip": false,
272
+ "normalized": false,
273
+ "rstrip": false,
274
+ "single_word": false,
275
+ "special": true
276
+ },
277
+ "151677": {
278
+ "content": "<blank1>",
279
+ "lstrip": false,
280
+ "normalized": false,
281
+ "rstrip": false,
282
+ "single_word": false,
283
+ "special": true
284
+ },
285
+ "151678": {
286
+ "content": "<blank2>",
287
+ "lstrip": false,
288
+ "normalized": false,
289
+ "rstrip": false,
290
+ "single_word": false,
291
+ "special": true
292
+ },
293
+ "151679": {
294
+ "content": "<blank3>",
295
+ "lstrip": false,
296
+ "normalized": false,
297
+ "rstrip": false,
298
+ "single_word": false,
299
+ "special": true
300
+ },
301
+ "151680": {
302
+ "content": "<blank4>",
303
+ "lstrip": false,
304
+ "normalized": false,
305
+ "rstrip": false,
306
+ "single_word": false,
307
+ "special": true
308
+ },
309
+ "151681": {
310
+ "content": "<blank5>",
311
+ "lstrip": false,
312
+ "normalized": false,
313
+ "rstrip": false,
314
+ "single_word": false,
315
+ "special": true
316
+ },
317
+ "151682": {
318
+ "content": "<blank6>",
319
+ "lstrip": false,
320
+ "normalized": false,
321
+ "rstrip": false,
322
+ "single_word": false,
323
+ "special": true
324
+ },
325
+ "151683": {
326
+ "content": "<blank7>",
327
+ "lstrip": false,
328
+ "normalized": false,
329
+ "rstrip": false,
330
+ "single_word": false,
331
+ "special": true
332
+ },
333
+ "151684": {
334
+ "content": "<blank8>",
335
+ "lstrip": false,
336
+ "normalized": false,
337
+ "rstrip": false,
338
+ "single_word": false,
339
+ "special": true
340
+ },
341
+ "151685": {
342
+ "content": "<blank9>",
343
+ "lstrip": false,
344
+ "normalized": false,
345
+ "rstrip": false,
346
+ "single_word": false,
347
+ "special": true
348
+ },
349
+ "151686": {
350
+ "content": "<blank10>",
351
+ "lstrip": false,
352
+ "normalized": false,
353
+ "rstrip": false,
354
+ "single_word": false,
355
+ "special": true
356
+ },
357
+ "151687": {
358
+ "content": "<blank11>",
359
+ "lstrip": false,
360
+ "normalized": false,
361
+ "rstrip": false,
362
+ "single_word": false,
363
+ "special": true
364
+ },
365
+ "151688": {
366
+ "content": "<blank12>",
367
+ "lstrip": false,
368
+ "normalized": false,
369
+ "rstrip": false,
370
+ "single_word": false,
371
+ "special": true
372
+ },
373
+ "151689": {
374
+ "content": "<blank13>",
375
+ "lstrip": false,
376
+ "normalized": false,
377
+ "rstrip": false,
378
+ "single_word": false,
379
+ "special": true
380
+ },
381
+ "151690": {
382
+ "content": "<blank14>",
383
+ "lstrip": false,
384
+ "normalized": false,
385
+ "rstrip": false,
386
+ "single_word": false,
387
+ "special": true
388
+ },
389
+ "151691": {
390
+ "content": "<blank15>",
391
+ "lstrip": false,
392
+ "normalized": false,
393
+ "rstrip": false,
394
+ "single_word": false,
395
+ "special": true
396
+ },
397
+ "151692": {
398
+ "content": "<blank16>",
399
+ "lstrip": false,
400
+ "normalized": false,
401
+ "rstrip": false,
402
+ "single_word": false,
403
+ "special": true
404
+ },
405
+ "151693": {
406
+ "content": "<blank17>",
407
+ "lstrip": false,
408
+ "normalized": false,
409
+ "rstrip": false,
410
+ "single_word": false,
411
+ "special": true
412
+ },
413
+ "151694": {
414
+ "content": "<blank18>",
415
+ "lstrip": false,
416
+ "normalized": false,
417
+ "rstrip": false,
418
+ "single_word": false,
419
+ "special": true
420
+ },
421
+ "151695": {
422
+ "content": "<blank19>",
423
+ "lstrip": false,
424
+ "normalized": false,
425
+ "rstrip": false,
426
+ "single_word": false,
427
+ "special": true
428
+ },
429
+ "151696": {
430
+ "content": "<blank20>",
431
+ "lstrip": false,
432
+ "normalized": false,
433
+ "rstrip": false,
434
+ "single_word": false,
435
+ "special": true
436
+ },
437
+ "151697": {
438
+ "content": "<blank21>",
439
+ "lstrip": false,
440
+ "normalized": false,
441
+ "rstrip": false,
442
+ "single_word": false,
443
+ "special": true
444
+ },
445
+ "151698": {
446
+ "content": "<blank22>",
447
+ "lstrip": false,
448
+ "normalized": false,
449
+ "rstrip": false,
450
+ "single_word": false,
451
+ "special": true
452
+ },
453
+ "151699": {
454
+ "content": "<blank23>",
455
+ "lstrip": false,
456
+ "normalized": false,
457
+ "rstrip": false,
458
+ "single_word": false,
459
+ "special": true
460
+ },
461
+ "151700": {
462
+ "content": "<blank24>",
463
+ "lstrip": false,
464
+ "normalized": false,
465
+ "rstrip": false,
466
+ "single_word": false,
467
+ "special": true
468
+ },
469
+ "151701": {
470
+ "content": "<blank25>",
471
+ "lstrip": false,
472
+ "normalized": false,
473
+ "rstrip": false,
474
+ "single_word": false,
475
+ "special": true
476
+ },
477
+ "151702": {
478
+ "content": "<blank26>",
479
+ "lstrip": false,
480
+ "normalized": false,
481
+ "rstrip": false,
482
+ "single_word": false,
483
+ "special": true
484
+ },
485
+ "151703": {
486
+ "content": "<blank27>",
487
+ "lstrip": false,
488
+ "normalized": false,
489
+ "rstrip": false,
490
+ "single_word": false,
491
+ "special": true
492
+ },
493
+ "151704": {
494
+ "content": "<asr_text>",
495
+ "lstrip": false,
496
+ "normalized": false,
497
+ "rstrip": false,
498
+ "single_word": false,
499
+ "special": false
500
+ }
501
+ },
502
+ "additional_special_tokens": [
503
+ "<|im_start|>",
504
+ "<|im_end|>",
505
+ "<|object_ref_start|>",
506
+ "<|object_ref_end|>",
507
+ "<|box_start|>",
508
+ "<|box_end|>",
509
+ "<|quad_start|>",
510
+ "<|quad_end|>",
511
+ "<|vision_start|>",
512
+ "<|vision_end|>",
513
+ "<|vision_pad|>",
514
+ "<|image_pad|>",
515
+ "<|video_pad|>",
516
+ "<|audio_start|>",
517
+ "<|audio_end|>",
518
+ "<tts_pad>",
519
+ "<tts_text_bos>",
520
+ "<tts_text_bos_single>",
521
+ "<|audio_pad|>"
522
+ ],
523
+ "audio_bos_token": "<|audio_start|>",
524
+ "audio_eos_token": "<|audio_end|>",
525
+ "audio_token": "<|audio_pad|>",
526
+ "bos_token": null,
527
+ "clean_up_tokenization_spaces": false,
528
+ "eos_token": "<|im_end|>",
529
+ "errors": "replace",
530
+ "extra_special_tokens": {
531
+ "audio_bos_token": "<|audio_start|>",
532
+ "audio_eos_token": "<|audio_end|>",
533
+ "audio_token": "<|audio_pad|>",
534
+ "image_token": "<|image_pad|>",
535
+ "video_token": "<|video_pad|>",
536
+ "vision_bos_token": "<|vision_start|>",
537
+ "vision_eos_token": "<|vision_end|>"
538
+ },
539
+ "image_token": "<|image_pad|>",
540
+ "model_max_length": 131072,
541
+ "pad_token": "<|endoftext|>",
542
+ "processor_class": "Qwen3ASRProcessor",
543
+ "split_special_tokens": false,
544
+ "tokenizer_class": "Qwen2Tokenizer",
545
+ "unk_token": null,
546
+ "video_token": "<|video_pad|>",
547
+ "vision_bos_token": "<|vision_start|>",
548
+ "vision_eos_token": "<|vision_end|>"
549
+ }
lora/lora-stage2/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage2/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage3/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: /data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.1
lora/lora-stage3/adapter_config.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 32,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": [],
25
+ "peft_type": "LORA",
26
+ "peft_version": "0.18.1",
27
+ "qalora_group_size": 16,
28
+ "r": 8,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": "^(thinker\\.model(?=\\.).*\\.(k_proj|q_proj|o_proj|up_proj|down_proj|v_proj|gate_proj)|(?!(thinker.audio_tower.proj1|thinker.audio_tower.proj2))thinker\\.audio_tower(?=\\.).*\\.(fc1|out_proj|proj1|k_proj|q_proj|fc2|proj2|v_proj|conv_out)|thinker\\.audio_tower\\.proj1(?=\\.)|thinker\\.audio_tower\\.proj2(?=\\.))$",
32
+ "target_parameters": null,
33
+ "task_type": "CAUSAL_LM",
34
+ "trainable_token_indices": null,
35
+ "use_dora": false,
36
+ "use_qalora": false,
37
+ "use_rslora": false
38
+ }
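The `target_modules` value in this `adapter_config.json` is a single regular expression, not a list of names. A minimal sketch of how it selects modules, assuming it is applied as a full match against dotted module names: the pattern is copied from the file with JSON escaping removed, the module names under `layers.0` are hypothetical examples, and `is_targeted` is an illustrative helper rather than a PEFT function.

```python
import re

# target_modules from adapter_config.json, JSON escaping removed.
TARGET_MODULES = (
    r"^(thinker\.model(?=\.).*\.(k_proj|q_proj|o_proj|up_proj|down_proj|v_proj|gate_proj)"
    r"|(?!(thinker.audio_tower.proj1|thinker.audio_tower.proj2))thinker\.audio_tower(?=\.).*"
    r"\.(fc1|out_proj|proj1|k_proj|q_proj|fc2|proj2|v_proj|conv_out)"
    r"|thinker\.audio_tower\.proj1(?=\.)|thinker\.audio_tower\.proj2(?=\.))$"
)


def is_targeted(module_name):
    """True if the whole module name matches the target_modules pattern."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Hypothetical module names, chosen only to exercise each branch of the pattern.
CANDIDATES = [
    "thinker.model.layers.0.self_attn.q_proj",
    "thinker.model.layers.0.mlp.gate_proj",
    "thinker.audio_tower.layers.0.fc1",
    "thinker.audio_tower.proj1",
    "thinker.lm_head",
]

results = {name: is_targeted(name) for name in CANDIDATES}
for name, hit in results.items():
    print(f"{name}: {'LoRA target' if hit else 'skipped'}")
```

In this sketch the bare name `thinker.audio_tower.proj1` is skipped: the negative lookahead excludes it from the audio-tower branch, and the dedicated branch ends in `(?=\.)` immediately before `$`, which cannot both hold. Whether that matters for the real model depends on its actual module tree, which is not visible in this diff.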
lora/lora-stage3/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1b5507acb5bb51851c4db58504cac3dcc748dbc37210b986e93624bb9ea115b0
3
+ size 49395592
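The three lines above are a Git LFS pointer, not the adapter weights themselves. A small sketch of reading such a pointer, assuming the usual `key value` line layout; `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
# Pointer text copied from the diff of adapter_model.safetensors.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1b5507acb5bb51851c4db58504cac3dcc748dbc37210b986e93624bb9ea115b0
size 49395592
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER)
print(f"{pointer['algo']} {pointer['digest'][:12]}... {pointer['size'] / 2**20:.1f} MiB")
```

The `size` field gives the byte length of the stored adapter file, so the download can be checked against it and against the SHA-256 digest.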
lora/lora-stage3/additional_config.json ADDED
@@ -0,0 +1 @@
1
+ {"lora_dtype": null, "lorap_lr_ratio": null, "lorap_emb_lr": 1e-06}
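The `args.json` that follows records several batch-related values for the stage 3 run: `per_device_train_batch_size` 4, `gradient_accumulation_steps` 16, `global_world_size` 3, `num_generations` 12, `generation_batch_size` 48, and `steps_per_generation` 4 in the serialized `GRPOConfig`. The sketch below shows how these fit together, assuming the TRL-style relation in which the generation batch is the per-device batch times the number of processes times `steps_per_generation`. The `RolloutShape` class is hypothetical, and the interpretation is an assumption rather than something this diff states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutShape:
    """Batch-related settings copied from args.json (names follow the file)."""
    per_device_train_batch_size: int
    world_size: int
    gradient_accumulation_steps: int
    steps_per_generation: int
    num_generations: int

    @property
    def generation_batch_size(self):
        # Completions sampled per generation round (assumed TRL-style relation).
        return (self.per_device_train_batch_size
                * self.world_size
                * self.steps_per_generation)

    @property
    def prompts_per_generation_batch(self):
        # Each prompt is sampled num_generations times to form one group.
        return self.generation_batch_size // self.num_generations

    @property
    def completions_per_optimizer_step(self):
        return (self.per_device_train_batch_size
                * self.world_size
                * self.gradient_accumulation_steps)


shape = RolloutShape(
    per_device_train_batch_size=4,
    world_size=3,
    gradient_accumulation_steps=16,
    steps_per_generation=4,
    num_generations=12,
)
print("generation_batch_size:", shape.generation_batch_size)
print("prompts per generation batch:", shape.prompts_per_generation_batch)
print("completions per optimizer step:", shape.completions_per_optimizer_step)
```

Under this reading, the recorded `generation_batch_size` of 48 corresponds to 4 prompts with 12 sampled completions each, and one optimizer step covers 192 completions.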
lora/lora-stage3/args.json ADDED
@@ -0,0 +1,502 @@
1
+ {
2
+ "output_dir": "/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721",
3
+ "overwrite_output_dir": false,
4
+ "do_train": false,
5
+ "do_eval": false,
6
+ "do_predict": false,
7
+ "eval_strategy": "steps",
8
+ "prediction_loss_only": false,
9
+ "per_device_train_batch_size": 4,
10
+ "per_device_eval_batch_size": 4,
11
+ "per_gpu_train_batch_size": null,
12
+ "per_gpu_eval_batch_size": null,
13
+ "gradient_accumulation_steps": 16,
14
+ "eval_accumulation_steps": null,
15
+ "eval_delay": 0,
16
+ "torch_empty_cache_steps": null,
17
+ "learning_rate": 5e-05,
18
+ "weight_decay": 0.1,
19
+ "adam_beta1": 0.9,
20
+ "adam_beta2": 0.95,
21
+ "adam_epsilon": 1e-08,
22
+ "max_grad_norm": 1.0,
23
+ "num_train_epochs": 3.0,
24
+ "max_steps": -1,
25
+ "lr_scheduler_type": "cosine",
26
+ "lr_scheduler_kwargs": null,
27
+ "warmup_ratio": 0.03,
28
+ "warmup_steps": 0,
29
+ "log_level": "passive",
30
+ "log_level_replica": "warning",
31
+ "log_on_each_node": true,
32
+ "logging_dir": "/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721/runs",
33
+ "logging_strategy": "steps",
34
+ "logging_first_step": true,
35
+ "logging_steps": 5,
36
+ "logging_nan_inf_filter": true,
37
+ "save_strategy": "steps",
38
+ "save_steps": 20.0,
39
+ "save_total_limit": null,
40
+ "save_safetensors": true,
41
+ "save_on_each_node": false,
42
+ "save_only_model": false,
43
+ "restore_callback_states_from_checkpoint": false,
44
+ "no_cuda": false,
45
+ "use_cpu": false,
46
+ "use_mps_device": false,
47
+ "seed": 42,
48
+ "data_seed": 42,
49
+ "jit_mode_eval": false,
50
+ "bf16": true,
51
+ "fp16": false,
52
+ "fp16_opt_level": "O1",
53
+ "half_precision_backend": "auto",
54
+ "bf16_full_eval": false,
55
+ "fp16_full_eval": false,
56
+ "tf32": null,
57
+ "local_rank": 0,
58
+ "ddp_backend": null,
59
+ "tpu_num_cores": null,
60
+ "tpu_metrics_debug": false,
61
+ "debug": null,
62
+ "dataloader_drop_last": false,
63
+ "eval_steps": 20.0,
64
+ "dataloader_num_workers": null,
65
+ "dataloader_prefetch_factor": null,
66
+ "past_index": -1,
67
+ "run_name": "qwen3asr_dapo_reward5_3x8x8_12gen_3GPU",
68
+ "disable_tqdm": null,
69
+ "remove_unused_columns": false,
70
+ "label_names": null,
71
+ "load_best_model_at_end": false,
72
+ "metric_for_best_model": "loss",
73
+ "greater_is_better": false,
74
+ "ignore_data_skip": false,
75
+ "fsdp": [],
76
+ "fsdp_min_num_params": 0,
77
+ "fsdp_config": null,
78
+ "fsdp_transformer_layer_cls_to_wrap": null,
79
+ "accelerator_config": {
80
+ "dispatch_batches": false
81
+ },
82
+ "parallelism_config": null,
83
+ "deepspeed": null,
84
+ "label_smoothing_factor": 0.0,
85
+ "optim": "adamw_torch_fused",
86
+ "optim_args": null,
87
+ "adafactor": false,
88
+ "group_by_length": false,
89
+ "length_column_name": "length",
90
+ "report_to": [
91
+ "wandb"
92
+ ],
93
+ "project": "huggingface",
94
+ "trackio_space_id": "trackio",
95
+ "ddp_find_unused_parameters": null,
96
+ "ddp_bucket_cap_mb": null,
97
+ "ddp_broadcast_buffers": null,
98
+ "dataloader_pin_memory": true,
99
+ "dataloader_persistent_workers": false,
100
+ "skip_memory_metrics": true,
101
+ "use_legacy_prediction_loop": false,
102
+ "push_to_hub": false,
103
+ "resume_from_checkpoint": null,
104
+ "hub_model_id": null,
105
+ "hub_strategy": "every_save",
106
+ "hub_token": null,
107
+ "hub_private_repo": null,
108
+ "hub_always_push": false,
109
+ "hub_revision": null,
110
+ "gradient_checkpointing": true,
111
+ "gradient_checkpointing_kwargs": null,
112
+ "include_inputs_for_metrics": false,
113
+ "include_for_metrics": [],
114
+ "eval_do_concat_batches": true,
115
+ "fp16_backend": "auto",
116
+ "push_to_hub_model_id": null,
117
+ "push_to_hub_organization": null,
118
+ "push_to_hub_token": null,
119
+ "mp_parameters": "",
120
+ "auto_find_batch_size": false,
121
+ "full_determinism": false,
122
+ "torchdynamo": null,
123
+ "ray_scope": "last",
124
+ "ddp_timeout": 18000000,
125
+ "torch_compile": false,
126
+ "torch_compile_backend": null,
127
+ "torch_compile_mode": null,
128
+ "include_tokens_per_second": false,
129
+ "include_num_input_tokens_seen": false,
130
+ "neftune_noise_alpha": null,
131
+ "optim_target_modules": null,
132
+ "batch_eval_metrics": false,
133
+ "eval_on_start": false,
134
+ "use_liger_kernel": false,
135
+ "liger_kernel_config": null,
136
+ "eval_use_gather_object": false,
137
+ "average_tokens_across_devices": true,
138
+ "sortish_sampler": false,
139
+ "predict_with_generate": false,
140
+ "generation_max_length": null,
141
+ "generation_num_beams": null,
142
+ "generation_config": null,
143
+ "tuner_backend": "peft",
144
+ "vit_gradient_checkpointing": null,
145
+ "router_aux_loss_coef": 0.0,
146
+ "enable_dft_loss": false,
147
+ "enable_channel_loss": false,
148
+ "safe_serialization": true,
149
+ "max_shard_size": "5GB",
150
+ "check_model": true,
151
+ "acc_strategy": "token",
152
+ "train_dataloader_shuffle": true,
153
+ "max_epochs": null,
154
+ "aligner_lr": null,
155
+ "vit_lr": null,
156
+ "use_logits_to_keep": null,
157
+ "ds3_gather_for_generation": true,
158
+ "resume_only_model": false,
159
+ "optimizer": null,
160
+ "loss_type": "dapo",
161
+ "eval_metric": null,
162
+ "callbacks": [],
163
+ "early_stop_interval": null,
164
+ "eval_use_evalscope": false,
165
+ "eval_dataset": [],
166
+ "eval_dataset_args": null,
167
+ "eval_limit": null,
168
+ "eval_generation_config": null,
169
+ "extra_eval_args": null,
170
+ "tuner_type": "lora",
171
+ "use_galore": false,
172
+ "galore_target_modules": null,
173
+ "galore_rank": 128,
174
+ "galore_update_proj_gap": 50,
175
+ "galore_scale": 1.0,
176
+ "galore_proj_type": "std",
177
+ "galore_optim_per_parameter": false,
178
+ "galore_with_embedding": false,
179
+ "galore_quantization": false,
180
+ "galore_proj_quant": false,
181
+ "galore_proj_bits": 4,
182
+ "galore_proj_group_size": 256,
183
+ "galore_cos_threshold": 0.4,
184
+ "galore_gamma_proj": 2,
185
+ "galore_queue_size": 5,
186
+ "lisa_activated_layers": 0,
187
+ "lisa_step_interval": 20,
188
+ "use_flash_ckpt": false,
189
+ "use_ray": false,
190
+ "ray_exp_name": null,
191
+ "device_groups": null,
192
+ "model": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
193
+ "model_type": "my_qwen3_asr_rl",
194
+ "model_revision": null,
195
+ "task_type": "causal_lm",
196
+ "torch_dtype": "bfloat16",
197
+ "attn_impl": null,
198
+ "experts_impl": null,
199
+ "new_special_tokens": [],
200
+ "num_labels": null,
201
+ "problem_type": null,
202
+ "rope_scaling": null,
203
+ "device_map": null,
204
+ "max_memory": {},
205
+ "max_model_len": null,
206
+ "local_repo_path": null,
207
+ "init_strategy": null,
208
+ "template": "my_qwen3_asr_rl",
209
+ "system": null,
210
+ "max_length": 65536,
211
+ "truncation_strategy": "delete",
212
+ "max_pixels": null,
213
+ "agent_template": null,
214
+ "norm_bbox": null,
215
+ "use_chat_template": true,
216
+ "padding_side": "left",
217
+ "padding_free": false,
218
+ "loss_scale": "last_round",
219
+ "sequence_parallel_size": 1,
220
+ "template_backend": "swift",
221
+ "response_prefix": null,
222
+ "enable_thinking": null,
223
+ "add_non_thinking_prefix": true,
224
+ "dataset": [
225
+ "/data/haobin/batch_process/lora_0323_10w+55w+error+syn_with_domain_train90_targeted_rl_train90_loramerged_basewer_271.jsonl"
226
+ ],
227
+ "val_dataset": [
228
+ "/data/haobin/batch_process/lora_0323_10w+55w+error+syn_with_domain_train90_targeted_rl_val5_sample5p.jsonl"
229
+ ],
230
+ "cached_dataset": [],
231
+ "cached_val_dataset": [],
232
+ "split_dataset_ratio": 0.0,
233
+ "dataset_num_proc": 1,
234
+ "load_from_cache_file": false,
235
+ "dataset_shuffle": true,
236
+ "val_dataset_shuffle": false,
237
+ "streaming": false,
238
+ "interleave_prob": null,
239
+ "stopping_strategy": "first_exhausted",
240
+ "shuffle_buffer_size": 1000,
241
+ "download_mode": "reuse_dataset_if_exists",
242
+ "columns": {},
243
+ "strict": false,
244
+ "model_name": null,
245
+ "model_author": null,
246
+ "custom_dataset_info": [],
247
+ "quant_method": null,
248
+ "quant_bits": null,
249
+ "hqq_axis": null,
250
+ "bnb_4bit_compute_dtype": "bfloat16",
251
+ "bnb_4bit_quant_type": "nf4",
252
+ "bnb_4bit_use_double_quant": true,
253
+ "bnb_4bit_quant_storage": null,
254
+ "max_new_tokens": 256,
255
+ "temperature": 0.5,
256
+ "top_k": 50,
257
+ "top_p": 0.95,
258
+ "repetition_penalty": 1.08,
259
+ "num_beams": 1,
260
+ "stream": false,
261
+ "stop_words": [],
262
+ "logprobs": false,
263
+ "top_logprobs": null,
264
+ "structured_outputs_regex": null,
265
+ "train_type": "lora",
266
+ "adapters": [],
267
+ "external_plugins": [
268
+ "/data/haobin/pky_train/qwen3_swift/my_qwen3_asr_dapo_register.py",
269
+ "/data/haobin/pky_train/qwen3_swift/qwen3_RL_reward5.py"
270
+ ],
271
+ "custom_register_path": [],
272
+ "model_kwargs": {},
273
+ "load_args": false,
274
+ "load_data_args": false,
275
+ "packing": false,
276
+ "packing_length": null,
277
+ "packing_num_proc": 1,
278
+ "lazy_tokenize": true,
279
+ "use_hf": false,
280
+ "ignore_args_error": false,
281
+ "use_swift_lora": false,
282
+ "freeze_parameters": [],
283
+ "freeze_parameters_regex": null,
284
+ "freeze_parameters_ratio": 0.0,
285
+ "trainable_parameters": [],
286
+ "trainable_parameters_regex": null,
287
+ "freeze_llm": false,
288
+ "freeze_vit": false,
289
+ "freeze_aligner": false,
290
+ "target_modules": [
291
+ "all-linear"
292
+ ],
293
+ "target_regex": null,
294
+ "target_parameters": null,
295
+ "modules_to_save": [],
296
+ "lora_rank": 8,
297
+ "lora_alpha": 32,
298
+ "lora_dropout": 0.05,
299
+ "lora_bias": "none",
300
+ "lora_dtype": null,
301
+ "lorap_lr_ratio": null,
302
+ "use_rslora": false,
303
+ "use_dora": false,
304
+ "lora_ga_batch_size": 2,
305
+ "lora_ga_iters": 2,
306
+ "lora_ga_max_length": 1024,
307
+ "lora_ga_direction": "ArB2r",
308
+ "lora_ga_scale": "stable",
309
+ "lora_ga_stable_gamma": 16,
310
+ "init_weights": true,
311
+ "fourier_n_frequency": 2000,
312
+ "fourier_scaling": 300.0,
313
+ "boft_block_size": 4,
314
+ "boft_block_num": 0,
315
+ "boft_n_butterfly_factor": 1,
316
+ "boft_dropout": 0.0,
317
+ "vera_rank": 256,
318
+ "vera_projection_prng_key": 0,
319
+ "vera_dropout": 0.0,
320
+ "vera_d_initial": 0.1,
321
+ "adapter_act": "gelu",
322
+ "adapter_length": 128,
323
+ "adalora_target_r": 8,
324
+ "adalora_init_r": 12,
325
+ "adalora_tinit": 0,
326
+ "adalora_tfinal": 0,
327
+ "adalora_deltaT": 1,
328
+ "adalora_beta1": 0.85,
329
+ "adalora_beta2": 0.85,
330
+ "adalora_orth_reg_weight": 0.5,
331
+ "llamapro_num_new_blocks": 4,
332
+ "llamapro_num_groups": null,
333
+ "reft_layer_key": null,
334
+ "reft_layers": null,
335
+ "reft_rank": 4,
336
+ "reft_intervention_type": "LoreftIntervention",
337
+ "reft_args": null,
338
+ "swanlab_token": null,
339
+ "swanlab_project": "ms-swift",
340
+ "swanlab_workspace": null,
341
+ "swanlab_exp_name": null,
342
+ "swanlab_notification_method": null,
343
+ "swanlab_webhook_url": null,
344
+ "swanlab_secret": null,
345
+ "swanlab_sender_email": null,
346
+ "swanlab_receiver_email": null,
347
+ "swanlab_smtp_server": null,
348
+ "swanlab_smtp_port": null,
349
+ "swanlab_email_language": "zh",
350
+ "swanlab_mode": "cloud",
351
+ "add_version": true,
352
+ "create_checkpoint_symlink": false,
353
+ "zero_hpz_partition_size": null,
354
+ "deepspeed_autotp_size": null,
355
+ "reward_model": null,
356
+ "reward_adapters": [],
357
+ "reward_model_type": null,
358
+ "reward_model_revision": null,
359
+ "num_ppo_epochs": 4,
360
+ "whiten_rewards": false,
361
+ "kl_coef": 0.05,
362
+ "cliprange": 0.2,
363
+ "vf_coef": 0.1,
364
+ "cliprange_value": 0.2,
365
+ "gamma": 1.0,
366
+ "lam": 0.95,
367
+ "num_mini_batches": 1,
368
+ "local_rollout_forward_batch_size": 64,
369
+ "num_sample_generations": 10,
370
+ "response_length": 256,
371
+ "missing_eos_penalty": null,
372
+ "vllm_gpu_memory_utilization": 0.9,
373
+ "vllm_tensor_parallel_size": 1,
374
+ "vllm_pipeline_parallel_size": 1,
375
+ "vllm_enable_expert_parallel": false,
376
+ "vllm_max_num_seqs": null,
377
+ "vllm_max_model_len": null,
378
+ "vllm_disable_custom_all_reduce": true,
379
+ "vllm_enforce_eager": false,
380
+ "vllm_limit_mm_per_prompt": null,
381
+ "vllm_max_lora_rank": 16,
382
+ "vllm_enable_prefix_caching": true,
383
+ "vllm_use_async_engine": null,
384
+ "vllm_quantization": null,
385
+ "vllm_reasoning_parser": null,
386
+ "vllm_disable_cascade_attn": false,
387
+ "vllm_mm_processor_cache_gb": null,
388
+ "vllm_speculative_config": null,
389
+ "vllm_engine_kwargs": {},
390
+ "vllm_data_parallel_size": 1,
391
+ "use_vllm": false,
392
+ "vllm_mode": null,
393
+ "vllm_enable_lora": false,
394
+ "vllm_server_base_url": null,
395
+ "vllm_server_host": null,
396
+ "vllm_server_port": [
397
+ 8000
398
+ ],
399
+ "vllm_server_timeout": 240.0,
400
+ "vllm_server_group_port": null,
401
+ "enable_flattened_weight_sync": true,
402
+ "async_generate": false,
403
+ "sleep_level": 0,
404
+ "move_model_batches": null,
405
+ "offload_optimizer": false,
406
+ "offload_model": false,
407
+ "wandb_log_unique_prompts": null,
408
+ "epsilon": 0.2,
409
+ "epsilon_high": 0.28,
410
+ "delta": null,
411
+ "cosine_min_len_value_wrong": -0.5,
412
+ "cosine_max_len_value_wrong": 0.0,
413
+ "cosine_min_len_value_correct": 1.0,
414
+ "cosine_max_len_value_correct": 0.5,
415
+ "cosine_max_len": null,
416
+ "repetition_n_grams": 3,
417
+ "repetition_max_penalty": -1.0,
418
+ "reward_model_plugin": null,
419
+ "chord_sft_dataset": [],
420
+ "chord_sft_per_device_train_batch_size": null,
421
+ "chord_enable_phi_function": false,
422
+ "chord_mu_warmup_steps": null,
423
+ "chord_mu_decay_steps": null,
424
+ "chord_mu_peak": null,
425
+ "chord_mu_valley": null,
426
+ "sync_ref_model": false,
427
+ "ref_model_sync_steps": 512,
428
+ "ref_model_mixup_alpha": 0.6,
429
+ "multi_turn_scheduler": null,
430
+ "max_turns": null,
431
+ "completion_length_limit_scope": "per_round",
432
+ "vllm_server_pass_dataset": false,
433
+ "dynamic_sample": true,
434
+ "max_resample_times": 4,
435
+ "overlong_filter": true,
436
+ "soft_max_length": null,
437
+ "soft_cache_length": null,
438
+ "scale_rewards": "group",
439
+ "log_entropy": false,
440
+ "top_entropy_quantile": 1.0,
441
+ "importance_sampling_level": "token",
442
+ "tau_pos": 1.0,
443
+ "tau_neg": 1.05,
444
+ "advantage_estimator": "grpo",
445
+ "kl_in_reward": false,
446
+ "generation_batch_size": 48,
447
+ "steps_per_generation": null,
448
+ "num_generations_eval": 4,
449
+ "rollout_importance_sampling_mode": null,
450
+ "rollout_importance_sampling_threshold": 2.0,
451
+ "log_rollout_offpolicy_metrics": false,
452
+ "off_policy_sequence_mask_delta": null,
453
+ "num_generations": 12,
454
+ "reward_funcs": [
455
+ "asr_wer_hallu_len_v5"
456
+ ],
457
+ "reward_weights": null,
458
+ "log_completions": true,
459
+ "num_iterations": 2,
460
+ "teacher_model": null,
461
+ "teacher_adapters": [],
462
+ "teacher_model_type": null,
463
+ "teacher_model_revision": null,
464
+ "teacher_deepspeed": null,
465
+ "teacher_model_server": null,
466
+ "rlhf_type": "grpo",
467
+ "ref_model": null,
468
+ "ref_adapters": [],
469
+ "ref_model_type": null,
470
+ "ref_model_revision": null,
471
+ "beta": 0.04,
472
+ "label_smoothing": 0,
473
+ "max_completion_length": 256,
474
+ "rpo_alpha": null,
475
+ "ld_alpha": null,
476
+ "discopop_tau": 0.05,
477
+ "loss_weights": null,
478
+ "cpo_alpha": 1.0,
479
+ "simpo_gamma": 1,
480
+ "desirable_weight": 1.0,
481
+ "undesirable_weight": 1.0,
482
+ "center_rewards_coefficient": null,
483
+ "sft_alpha": 0,
484
+ "lmbda": 0.5,
485
+ "seq_kd": false,
486
+ "gkd_logits_topk": null,
487
+ "offload_teacher_model": false,
488
+ "swift_version": "4.0.3",
489
+ "ckpt_dir": null,
490
+ "rank": 0,
491
+ "global_world_size": 3,
492
+ "local_world_size": 3,
493
+ "model_suffix": "Qwen3-ASR-1.7B-lora-merged",
494
+ "model_info": "ModelInfo(model_type='my_qwen3_asr_rl', model_dir='/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged', torch_dtype=torch.bfloat16, max_model_len=65536, quant_method=None, quant_bits=None, rope_scaling={'interleaved': True, 'mrope_interleaved': True, 'mrope_section': [24, 20, 20], 'rope_type': 'default', 'type': 'default'}, is_moe_model=False, is_multimodal=True, config=None, task_type='causal_lm', num_labels=None)",
495
+ "model_meta": "ModelMeta(model_type='my_qwen3_asr_rl', model_groups=[ModelGroup(models=[Model(ms_model_id='Qwen/Qwen3-ASR-0.6B', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-ASR-1.7B', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None)], template=None, ignore_patterns=None, requires=None, tags=[])], loader=<class 'my_qwen3_asr_dapo_register.Qwen3ASRRLLoader'>, template='my_qwen3_asr_rl', model_arch=MultiModelKeys(arch_name='my_qwen3_asr_rl', embedding=None, module_list=None, lm_head=None, q_proj=None, k_proj=None, v_proj=None, o_proj=None, attention=None, mlp=None, down_proj=None, qkv_proj=None, qk_proj=None, qa_proj=None, qb_proj=None, kv_proj=None, kva_proj=None, kvb_proj=None, language_model=['thinker.model', 'thinker.lm_head'], aligner=['thinker.audio_tower.proj1', 'thinker.audio_tower.proj2'], vision_tower=['thinker.audio_tower'], generator=[]), architectures=['Qwen3ASRForConditionalGeneration'], additional_saved_files=['generation_config.json', 'preprocessor_config.json', 'processor_config.json', 'tokenizer_config.json', 'tokenizer.json', 'special_tokens_map.json', 'chat_template.json', 'merges.txt', 'vocab.json'], torch_dtype=None, is_multimodal=True, is_reward=False, task_type=None, ignore_patterns=None, requires=['transformers>=4.57', 'qwen-asr', 'librosa'], tags=['audio'])",
496
+ "model_dir": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
497
+ "template_meta": "TemplateMeta(template_type='my_qwen3_asr_rl', prefix=[], prompt=['{{QUERY}}'], chat_sep=[], suffix=[''], template_cls=<class 'my_qwen3_asr_dapo_register.Qwen3ASRRLTemplate'>, system_prefix=[], default_system=None, auto_add_bos=False, stop_words=[], agent_template='react_en', is_thinking=False, thinking_prefix='', non_thinking_prefix='', history_thinking_prefix='')",
498
+ "_val_dataset_exists": true,
499
+ "hub": "<class 'swift.hub.hub.MSHub'>",
500
+ "evaluation_strategy": "steps",
501
+ "training_args": "GRPOConfig(output_dir='/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=4, per_device_eval_batch_size=4, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=16, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=5e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs=None, warmup_ratio=0.03, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=5, logging_nan_inf_filter=True, save_strategy=<SaveStrategy.STEPS: 'steps'>, save_steps=20, save_total_limit=None, save_safetensors=True, save_on_each_node=False, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=True, eval_steps=20, dataloader_num_workers=1, dataloader_prefetch_factor=2, past_index=-1, run_name='qwen3asr_dapo_reward5_3x8x8_12gen_3GPU', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, 
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False), parallelism_config=None, deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH_FUSED: 'adamw_torch_fused'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['wandb'], project='huggingface', trackio_space_id='trackio', ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=[], eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=18000000, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=None, include_num_input_tokens_seen=None, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=None, model_init_kwargs=None, disable_dropout=False, cast_lm_head_to_fp32=False, num_generations=12, num_generations_eval=4, max_completion_length=256, ds3_gather_for_generation=True, shuffle_dataset=True, 
generation_batch_size=48, steps_per_generation=4, temperature=0.5, top_p=0.95, top_k=50, min_p=None, generation_kwargs=None, chat_template_kwargs=None, repetition_penalty=1.08, use_transformers_paged=False, cache_implementation=None, use_vllm=False, vllm_mode=None, vllm_model_impl='vllm', vllm_enable_sleep_mode=False, vllm_structured_outputs_regex=None, vllm_server_base_url=None, vllm_server_host=None, vllm_server_port=[8000], vllm_server_timeout=240.0, vllm_group_port=51216, vllm_gpu_memory_utilization=0.9, vllm_max_model_length=None, vllm_tensor_parallel_size=1, beta=0.04, num_iterations=2, epsilon=0.2, delta=None, epsilon_high=0.28, sapo_temperature_neg=1.05, sapo_temperature_pos=1.0, importance_sampling_level='token', reward_weights=None, multi_objective_aggregation='sum_then_normalize', scale_rewards='group', loss_type='dapo', mask_truncated_completions=False, sync_ref_model=False, ref_model_mixup_alpha=0.6, ref_model_sync_steps=512, top_entropy_quantile=1.0, max_tool_calling_iterations=None, vllm_importance_sampling_correction=True, vllm_importance_sampling_mode='sequence_mask', vllm_importance_sampling_cap=3.0, off_policy_mask_threshold=None, use_bias_correction_kl=False, log_completions=True, num_completions_to_print=None, log_unique_prompts=False, log_completions_hub_repo=None, tuner_backend='peft', vit_gradient_checkpointing=True, router_aux_loss_coef=0.0, enable_dft_loss=False, enable_channel_loss=False, safe_serialization=True, max_shard_size='5GB', check_model=True, acc_strategy='token', train_dataloader_shuffle=True, max_epochs=None, aligner_lr=None, vit_lr=None, use_logits_to_keep=None, resume_only_model=False, optimizer=None, eval_metric=None, callbacks=[], early_stop_interval=None, eval_use_evalscope=False, eval_dataset=[], eval_dataset_args=None, eval_limit=None, eval_generation_config=None, extra_eval_args=None, tuner_type='lora', use_galore=False, galore_target_modules=None, galore_rank=128, galore_update_proj_gap=50, galore_scale=1.0, 
galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, galore_quantization=False, galore_proj_quant=False, galore_proj_bits=4, galore_proj_group_size=256, galore_cos_threshold=0.4, galore_gamma_proj=2, galore_queue_size=5, lisa_activated_layers=0, lisa_step_interval=20, use_flash_ckpt=False, vllm_pipeline_parallel_size=1, vllm_enable_expert_parallel=False, vllm_max_num_seqs=None, vllm_max_model_len=None, vllm_disable_custom_all_reduce=True, vllm_enforce_eager=False, vllm_limit_mm_per_prompt=None, vllm_max_lora_rank=16, vllm_enable_prefix_caching=True, vllm_use_async_engine=None, vllm_quantization=None, vllm_reasoning_parser=None, vllm_disable_cascade_attn=False, vllm_mm_processor_cache_gb=None, vllm_speculative_config=None, vllm_engine_kwargs={}, vllm_data_parallel_size=1, stop_words=[], vllm_enable_lora=False, lora_rank=8, vllm_server_group_port=None, enable_flattened_weight_sync=True, async_generate=False, structured_outputs_regex=None, sleep_level=0, move_model_batches=None, offload_optimizer=False, offload_model=False, wandb_log_unique_prompts=None, cosine_min_len_value_wrong=-0.5, cosine_max_len_value_wrong=0.0, cosine_min_len_value_correct=1.0, cosine_max_len_value_correct=0.5, cosine_max_len=256, repetition_n_grams=3, repetition_max_penalty=-1.0, reward_model=None, reward_model_plugin=None, chord_sft_dataset=[], chord_sft_per_device_train_batch_size=None, chord_enable_phi_function=False, chord_mu_warmup_steps=None, chord_mu_decay_steps=None, chord_mu_peak=None, chord_mu_valley=None, multi_turn_scheduler=None, max_turns=None, completion_length_limit_scope='per_round', vllm_server_pass_dataset=False, dynamic_sample=True, max_resample_times=4, overlong_filter=True, soft_max_length=None, soft_cache_length=None, log_entropy=False, tau_pos=1.0, tau_neg=1.05, advantage_estimator='grpo', kl_in_reward=False, dataset_shuffle=True, rollout_importance_sampling_mode=None, rollout_importance_sampling_threshold=2.0, 
log_rollout_offpolicy_metrics=False, off_policy_sequence_mask_delta=None)"
+ }
lora/lora-stage3/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:226e26b176c37ed7c792adfd2fe4136f95d2fdb572f9bb695f787161e8da0faa
+ size 99183201
lora/lora-stage3/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6aa29f654dcff45f4d494e85fba95c300e2ba77360edeca5a3899f79909e7ce
+ size 14725
lora/lora-stage3/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47db367bb33a2abe8e3e662eec69e0be4925b4a0a64b5b6c12647bc9faa62ad2
+ size 14661
lora/lora-stage3/rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2faaa8f2708f53af7418ce42a1a06c28bcd4f75dce65c528b4f754d02132f5c0
+ size 14661
lora/lora-stage3/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34c1c30cb5e25ddb67ea8d805e6a5f129c70e970f08b80a6781c3286db43ea15
+ size 1465
lora/lora-stage3/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff