zhifeixie committed on
Commit 6185df0 · verified · 1 parent: bbc585b

Remove staged LoRA folder

This view is limited to 50 files because the commit contains too many changes.
Files changed (50):
  1. lora/lora-stage1/README.md +0 -207
  2. lora/lora-stage1/adapter_config.json +0 -38
  3. lora/lora-stage1/adapter_model.safetensors +0 -3
  4. lora/lora-stage1/added_tokens.json +0 -64
  5. lora/lora-stage1/base_model.txt +0 -1
  6. lora/lora-stage1/chat_template.jinja +0 -31
  7. lora/lora-stage1/chat_template.json +0 -1
  8. lora/lora-stage1/config.json +0 -221
  9. lora/lora-stage1/generation_config.json +0 -7
  10. lora/lora-stage1/merges.txt +0 -0
  11. lora/lora-stage1/optimizer.pt +0 -3
  12. lora/lora-stage1/preprocessor_config.json +0 -14
  13. lora/lora-stage1/rng_state_0.pth +0 -3
  14. lora/lora-stage1/rng_state_1.pth +0 -3
  15. lora/lora-stage1/scheduler.pt +0 -3
  16. lora/lora-stage1/special_tokens_map.json +0 -44
  17. lora/lora-stage1/tokenizer.json +0 -3
  18. lora/lora-stage1/tokenizer_config.json +0 -549
  19. lora/lora-stage1/trainer_state.json +0 -774
  20. lora/lora-stage1/vocab.json +0 -0
  21. lora/lora-stage2/README.md +0 -207
  22. lora/lora-stage2/adapter_config.json +0 -38
  23. lora/lora-stage2/adapter_model.safetensors +0 -3
  24. lora/lora-stage2/added_tokens.json +0 -64
  25. lora/lora-stage2/base_model.txt +0 -1
  26. lora/lora-stage2/chat_template.jinja +0 -31
  27. lora/lora-stage2/chat_template.json +0 -1
  28. lora/lora-stage2/config.json +0 -221
  29. lora/lora-stage2/generation_config.json +0 -7
  30. lora/lora-stage2/merged_from_lora.txt +0 -1
  31. lora/lora-stage2/merges.txt +0 -0
  32. lora/lora-stage2/optimizer.pt +0 -3
  33. lora/lora-stage2/preprocessor_config.json +0 -14
  34. lora/lora-stage2/rng_state_0.pth +0 -3
  35. lora/lora-stage2/rng_state_1.pth +0 -3
  36. lora/lora-stage2/scheduler.pt +0 -3
  37. lora/lora-stage2/special_tokens_map.json +0 -44
  38. lora/lora-stage2/tokenizer.json +0 -3
  39. lora/lora-stage2/tokenizer_config.json +0 -549
  40. lora/lora-stage2/trainer_state.json +0 -0
  41. lora/lora-stage2/vocab.json +0 -0
  42. lora/lora-stage3/README.md +0 -207
  43. lora/lora-stage3/adapter_config.json +0 -38
  44. lora/lora-stage3/adapter_model.safetensors +0 -3
  45. lora/lora-stage3/additional_config.json +0 -1
  46. lora/lora-stage3/args.json +0 -502
  47. lora/lora-stage3/optimizer.pt +0 -3
  48. lora/lora-stage3/rng_state_0.pth +0 -3
  49. lora/lora-stage3/rng_state_1.pth +0 -3
  50. lora/lora-stage3/rng_state_2.pth +0 -3
lora/lora-stage1/README.md DELETED
@@ -1,207 +0,0 @@
- ---
- base_model: ''
- library_name: peft
- pipeline_tag: text-generation
- tags:
- - 'base_model:adapter:'
- - lora
- - transformers
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.18.1
 
lora/lora-stage1/adapter_config.json DELETED
@@ -1,38 +0,0 @@
- {
-   "alora_invocation_tokens": null,
-   "alpha_pattern": {},
-   "arrow_config": null,
-   "auto_mapping": null,
-   "base_model_name_or_path": "",
-   "bias": "none",
-   "corda_config": null,
-   "ensure_weight_tying": false,
-   "eva_config": null,
-   "exclude_modules": null,
-   "fan_in_fan_out": false,
-   "inference_mode": true,
-   "init_lora_weights": true,
-   "layer_replication": null,
-   "layers_pattern": null,
-   "layers_to_transform": null,
-   "loftq_config": {},
-   "lora_alpha": 16,
-   "lora_bias": false,
-   "lora_dropout": 0.05,
-   "megatron_config": null,
-   "megatron_core": "megatron.core",
-   "modules_to_save": null,
-   "peft_type": "LORA",
-   "peft_version": "0.18.1",
-   "qalora_group_size": 16,
-   "r": 8,
-   "rank_pattern": {},
-   "revision": null,
-   "target_modules": "^(audio_tower\\.(conv_out|proj1|proj2)$|audio_tower\\.layers\\.(20|21|22|23)\\..*\\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$)",
-   "target_parameters": null,
-   "task_type": "CAUSAL_LM",
-   "trainable_token_indices": null,
-   "use_dora": false,
-   "use_qalora": false,
-   "use_rslora": false
- }
 
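The stage-1 adapter selects its modules with a single regular-expression string in `target_modules` and uses `r = 8`, `lora_alpha = 16`, with `use_rslora` off. The sketch below checks which module names that pattern would select and works out the LoRA scaling factor and the per-layer parameter cost. It assumes that PEFT full-matches a string `target_modules` against each dotted module name; the module names in the demo are hypothetical and were not taken from the Qwen3-ASR code.

```python
import re

# Values copied from lora-stage1/adapter_config.json (JSON "\\." becomes regex "\.").
TARGET_MODULES = (
    r"^(audio_tower\.(conv_out|proj1|proj2)$"
    r"|audio_tower\.layers\.(20|21|22|23)\..*\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$)"
)
R = 8
LORA_ALPHA = 16


def is_targeted(module_name: str) -> bool:
    """Assumption: a string target_modules is full-matched against each dotted module name."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


def lora_scaling(r: int, alpha: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A: alpha / r, or alpha / sqrt(r) for rsLoRA."""
    return alpha / r ** 0.5 if use_rslora else alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters LoRA adds to one Linear: A is (r, in), B is (out, r)."""
    return r * (in_features + out_features)


if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    for name in (
        "audio_tower.conv_out",
        "audio_tower.layers.20.self_attn.q_proj",
        "audio_tower.layers.19.self_attn.q_proj",
        "audio_tower.layers.23.fc1",
    ):
        print(f"{name}: {is_targeted(name)}")
    print("scaling:", lora_scaling(R, LORA_ALPHA))
    # d_model = 1024 and encoder_ffn_dim = 4096 come from audio_config in config.json.
    print("q_proj LoRA params:", lora_params(1024, 1024, R))
    print("fc1 LoRA params:", lora_params(1024, 4096, R))
```

The sketch shows that a name such as `audio_tower.layers.23.fc1` would not match. The pattern needs at least one dotted segment between the layer index and the projection name, so it cannot match a projection that sits directly under the layer. If the encoder's feed-forward projections are direct children of each layer, which has not been verified here, `fc1` and `fc2` would be left out of the adapter.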
lora/lora-stage1/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:31052a993cbb582a250886db7dfcc327ab86ee8adc5229882bd48227b892c752
- size 1496072
 
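Several binary files in this commit appear in the diff only as three-line Git LFS pointer files. These include `adapter_model.safetensors`, `optimizer.pt`, the `rng_state_*.pth` files, `scheduler.pt` and `tokenizer.json`. Each pointer holds the spec `version`, an `oid` (the SHA-256 of the real file content) and the real file's `size` in bytes. Below is a minimal parser that assumes the simple one-`key value`-per-line layout shown here. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers and are not part of any Git LFS tooling.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {'version', 'algo', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def matches_pointer(blob: bytes, pointer: dict) -> bool:
    """Check downloaded content against a pointer: same byte length and same SHA-256."""
    return len(blob) == pointer["size"] and hashlib.sha256(blob).hexdigest() == pointer["oid"]


# The stage-1 adapter_model.safetensors pointer from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:31052a993cbb582a250886db7dfcc327ab86ee8adc5229882bd48227b892c752
size 1496072
"""

info = parse_lfs_pointer(pointer_text)
print(info["algo"], info["size"])
```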
lora/lora-stage1/added_tokens.json DELETED
@@ -1,64 +0,0 @@
- {
-   "</think>": 151668,
-   "</tool_call>": 151658,
-   "</tool_response>": 151666,
-   "<asr_text>": 151704,
-   "<blank10>": 151686,
-   "<blank11>": 151687,
-   "<blank12>": 151688,
-   "<blank13>": 151689,
-   "<blank14>": 151690,
-   "<blank15>": 151691,
-   "<blank16>": 151692,
-   "<blank17>": 151693,
-   "<blank18>": 151694,
-   "<blank19>": 151695,
-   "<blank1>": 151677,
-   "<blank20>": 151696,
-   "<blank21>": 151697,
-   "<blank22>": 151698,
-   "<blank23>": 151699,
-   "<blank24>": 151700,
-   "<blank25>": 151701,
-   "<blank26>": 151702,
-   "<blank27>": 151703,
-   "<blank2>": 151678,
-   "<blank3>": 151679,
-   "<blank4>": 151680,
-   "<blank5>": 151681,
-   "<blank6>": 151682,
-   "<blank7>": 151683,
-   "<blank8>": 151684,
-   "<blank9>": 151685,
-   "<non_speech>": 151675,
-   "<think>": 151667,
-   "<tool_call>": 151657,
-   "<tool_response>": 151665,
-   "<tts_pad>": 151671,
-   "<tts_text_bos>": 151672,
-   "<tts_text_bos_single>": 151674,
-   "<tts_text_eod>": 151673,
-   "<|audio_end|>": 151670,
-   "<|audio_pad|>": 151676,
-   "<|audio_start|>": 151669,
-   "<|box_end|>": 151649,
-   "<|box_start|>": 151648,
-   "<|endoftext|>": 151643,
-   "<|file_sep|>": 151664,
-   "<|fim_middle|>": 151660,
-   "<|fim_pad|>": 151662,
-   "<|fim_prefix|>": 151659,
-   "<|fim_suffix|>": 151661,
-   "<|im_end|>": 151645,
-   "<|im_start|>": 151644,
-   "<|image_pad|>": 151655,
-   "<|object_ref_end|>": 151647,
-   "<|object_ref_start|>": 151646,
-   "<|quad_end|>": 151651,
-   "<|quad_start|>": 151650,
-   "<|repo_name|>": 151663,
-   "<|video_pad|>": 151656,
-   "<|vision_end|>": 151653,
-   "<|vision_pad|>": 151654,
-   "<|vision_start|>": 151652
- }
 
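The IDs in `added_tokens.json` should agree with the audio token IDs in `thinker_config` in `config.json`: `audio_start_token_id` 151669, `audio_end_token_id` 151670 and `audio_token_id` 151676. They must also stay below the text model's `vocab_size` of 151936. The sketch below runs that cross-check on values copied from the two files, with the token table abbreviated to a subset. The token-to-field pairing follows `special_tokens_map.json` (`audio_bos_token`, `audio_eos_token`, `audio_token`). `check_token_ids` is a hypothetical helper name.

```python
# Subset of lora-stage1/added_tokens.json (token string -> id).
ADDED_TOKENS = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
    "<|audio_start|>": 151669,
    "<|audio_end|>": 151670,
    "<|audio_pad|>": 151676,
    "<asr_text>": 151704,
}

# Values from thinker_config in lora-stage1/config.json.
THINKER_CONFIG = {
    "audio_start_token_id": 151669,
    "audio_end_token_id": 151670,
    "audio_token_id": 151676,
    "text_config": {"vocab_size": 151936},
}

# Which added token each config field should point at (as in special_tokens_map.json).
EXPECTED = {
    "audio_start_token_id": "<|audio_start|>",
    "audio_end_token_id": "<|audio_end|>",
    "audio_token_id": "<|audio_pad|>",
}


def check_token_ids(added: dict, thinker: dict) -> list:
    """Return a list of problems; an empty list means the two files agree."""
    problems = []
    for field, token in EXPECTED.items():
        if added.get(token) != thinker.get(field):
            problems.append(f"{field}={thinker.get(field)} but {token}={added.get(token)}")
    vocab_size = thinker["text_config"]["vocab_size"]
    too_big = sorted(tok for tok, idx in added.items() if idx >= vocab_size)
    if too_big:
        problems.append(f"ids outside vocab_size {vocab_size}: {too_big}")
    if len(set(added.values())) != len(added):
        problems.append("duplicate ids")
    return problems


print(check_token_ids(ADDED_TOKENS, THINKER_CONFIG) or "ok")
```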
lora/lora-stage1/base_model.txt DELETED
@@ -1 +0,0 @@
- /data/haobin/pky_train/qwen3/Qwen3-ASR-1.7B
 
lora/lora-stage1/chat_template.jinja DELETED
@@ -1,31 +0,0 @@
- {%- set ns = namespace(system_text="") -%}
- {%- for m in messages -%}
-   {%- if m.role == 'system' -%}
-     {%- if m.content is string -%}
-       {%- set ns.system_text = ns.system_text + m.content -%}
-     {%- else -%}
-       {%- for c in m.content -%}
-         {%- if c.type == 'text' and (c.text is defined) -%}
-           {%- set ns.system_text = ns.system_text + c.text -%}
-         {%- endif -%}
-       {%- endfor -%}
-     {%- endif -%}
-   {%- endif -%}
- {%- endfor -%}
-
- {%- set ns2 = namespace(audio_tokens="") -%}
- {%- for m in messages -%}
-   {%- if m.content is not string -%}
-     {%- for c in m.content -%}
-       {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}
-         {%- set ns2.audio_tokens = ns2.audio_tokens + "<|audio_start|><|audio_pad|><|audio_end|>" -%}
-       {%- endif -%}
-     {%- endfor -%}
-   {%- endif -%}
- {%- endfor -%}
-
- {{- '<|im_start|>system\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\n' -}}
- {{- '<|im_start|>user\n' + ns2.audio_tokens + '<|im_end|>\n' -}}
- {%- if add_generation_prompt -%}
- {{- '<|im_start|>assistant\n' -}}
- {%- endif -%}
 
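The template above does three things:

- It concatenates all system text into a single system turn.
- It emits one `<|audio_start|><|audio_pad|><|audio_end|>` placeholder for each audio part it finds in any message.
- It builds the user turn from those placeholders alone, so text parts of user messages never reach the prompt.

The plain-Python restatement below mirrors that logic for illustration, assuming every message has a `content` field. It is not how `transformers` renders chat templates (that goes through Jinja), and `render_prompt` is a hypothetical name.

```python
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_prompt(messages: list, add_generation_prompt: bool = True) -> str:
    """Plain-Python restatement of the stage-1 chat template's logic (illustrative)."""
    # Pass 1: concatenate every system message's text, whether a string or a list of parts.
    system_text = ""
    for m in messages:
        if m.get("role") == "system":
            content = m["content"]
            if isinstance(content, str):
                system_text += content
            else:
                for part in content:
                    if part.get("type") == "text" and "text" in part:
                        system_text += part["text"]

    # Pass 2: one audio placeholder per audio part, taken from messages of any role.
    audio_tokens = ""
    for m in messages:
        content = m["content"]
        if not isinstance(content, str):
            for part in content:
                if part.get("type") == "audio" or "audio" in part or "audio_url" in part:
                    audio_tokens += AUDIO_PLACEHOLDER

    # The user turn carries only the audio placeholders; user text parts are dropped.
    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "Transcribe the audio."},
    {"role": "user", "content": [{"type": "audio", "audio": "clip.wav"},
                                 {"type": "text", "text": "this text is not rendered"}]},
]
print(render_prompt(messages))
```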
lora/lora-stage1/chat_template.json DELETED
@@ -1 +0,0 @@
- {"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n {%- if m.role == 'system' -%}\n {%- if m.content is string -%}\n {%- set ns.system_text = ns.system_text + m.content -%}\n {%- else -%}\n {%- for c in m.content -%}\n {%- if c.type == 'text' and (c.text is defined) -%}\n {%- set ns.system_text = ns.system_text + c.text -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n {%- if m.content is not string -%}\n {%- for c in m.content -%}\n {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
 
lora/lora-stage1/config.json DELETED
@@ -1,221 +0,0 @@
- {
-   "architectures": [
-     "Qwen3ASRForConditionalGeneration"
-   ],
-   "model_type": "qwen3_asr",
-   "support_languages": [
-     "Chinese",
-     "English",
-     "Cantonese",
-     "Arabic",
-     "German",
-     "French",
-     "Spanish",
-     "Portuguese",
-     "Indonesian",
-     "Italian",
-     "Korean",
-     "Russian",
-     "Thai",
-     "Vietnamese",
-     "Japanese",
-     "Turkish",
-     "Hindi",
-     "Malay",
-     "Dutch",
-     "Swedish",
-     "Danish",
-     "Finnish",
-     "Polish",
-     "Czech",
-     "Filipino",
-     "Persian",
-     "Greek",
-     "Romanian",
-     "Hungarian",
-     "Macedonian"
-   ],
-   "thinker_config": {
-     "model_type": "qwen3_asr",
-     "architectures": [
-       "Qwen3ASRForConditionalGeneration"
-     ],
-     "audio_config": {
-       "_name_or_path": "",
-       "activation_dropout": 0,
-       "activation_function": "gelu",
-       "add_cross_attention": false,
-       "architectures": null,
-       "attention_dropout": 0,
-       "bad_words_ids": null,
-       "begin_suppress_tokens": null,
-       "bos_token_id": null,
-       "chunk_size_feed_forward": 0,
-       "conv_chunksize": 500,
-       "cross_attention_hidden_size": null,
-       "d_model": 1024,
-       "decoder_start_token_id": null,
-       "diversity_penalty": 0.0,
-       "do_sample": false,
-       "downsample_hidden_size": 480,
-       "dropout": 0,
-       "dtype": null,
-       "early_stopping": false,
-       "encoder_attention_heads": 16,
-       "encoder_ffn_dim": 4096,
-       "encoder_layers": 24,
-       "encoder_no_repeat_ngram_size": 0,
-       "eos_token_id": null,
-       "exponential_decay_length_penalty": null,
-       "finetuning_task": null,
-       "forced_bos_token_id": null,
-       "forced_eos_token_id": null,
-       "id2label": {
-         "0": "LABEL_0",
-         "1": "LABEL_1"
-       },
-       "initializer_range": 0.02,
-       "is_decoder": false,
-       "is_encoder_decoder": false,
-       "label2id": {
-         "LABEL_0": 0,
-         "LABEL_1": 1
-       },
-       "length_penalty": 1.0,
-       "max_length": 20,
-       "max_source_positions": 1500,
-       "min_length": 0,
-       "model_type": "qwen3_asr_audio_encoder",
-       "n_window": 50,
-       "n_window_infer": 800,
-       "no_repeat_ngram_size": 0,
-       "num_beam_groups": 1,
-       "num_beams": 1,
-       "num_hidden_layers": 24,
-       "num_mel_bins": 128,
-       "num_return_sequences": 1,
-       "output_attentions": false,
-       "output_dim": 2048,
-       "output_hidden_states": false,
-       "output_scores": false,
-       "pad_token_id": null,
-       "prefix": null,
-       "problem_type": null,
-       "pruned_heads": {},
-       "remove_invalid_values": false,
-       "repetition_penalty": 1.0,
-       "return_dict": true,
-       "return_dict_in_generate": false,
-       "scale_embedding": false,
-       "sep_token_id": null,
-       "suppress_tokens": null,
-       "task_specific_params": null,
-       "temperature": 1.0,
-       "tf_legacy_loss": false,
-       "tie_encoder_decoder": false,
-       "tie_word_embeddings": true,
-       "tokenizer_class": null,
-       "top_k": 50,
-       "top_p": 1.0,
-       "torchscript": false,
-       "typical_p": 1.0,
-       "use_bfloat16": false
-     },
-     "audio_end_token_id": 151670,
-     "audio_start_token_id": 151669,
-     "audio_token_id": 151676,
-     "dtype": "bfloat16",
-     "initializer_range": 0.02,
-     "text_config": {
-       "_name_or_path": "",
-       "add_cross_attention": false,
-       "architectures": null,
-       "attention_bias": false,
-       "attention_dropout": 0.0,
-       "bad_words_ids": null,
-       "begin_suppress_tokens": null,
-       "bos_token_id": null,
-       "chunk_size_feed_forward": 0,
-       "cross_attention_hidden_size": null,
-       "decoder_start_token_id": null,
-       "diversity_penalty": 0.0,
-       "do_sample": false,
-       "dtype": null,
-       "early_stopping": false,
-       "encoder_no_repeat_ngram_size": 0,
-       "eos_token_id": null,
-       "exponential_decay_length_penalty": null,
-       "finetuning_task": null,
-       "forced_bos_token_id": null,
-       "forced_eos_token_id": null,
-       "head_dim": 128,
-       "hidden_act": "silu",
-       "hidden_size": 2048,
-       "id2label": {
-         "0": "LABEL_0",
-         "1": "LABEL_1"
-       },
-       "initializer_range": 0.02,
-       "intermediate_size": 6144,
-       "is_decoder": false,
-       "is_encoder_decoder": false,
-       "label2id": {
-         "LABEL_0": 0,
-         "LABEL_1": 1
-       },
-       "length_penalty": 1.0,
-       "max_length": 20,
-       "max_position_embeddings": 65536,
-       "min_length": 0,
-       "model_type": "qwen3",
-       "no_repeat_ngram_size": 0,
-       "num_attention_heads": 16,
-       "num_beam_groups": 1,
-       "num_beams": 1,
-       "num_hidden_layers": 28,
-       "num_key_value_heads": 8,
-       "num_return_sequences": 1,
-       "output_attentions": false,
-       "output_hidden_states": false,
-       "output_scores": false,
-       "pad_token_id": null,
-       "prefix": null,
-       "problem_type": null,
-       "pruned_heads": {},
-       "remove_invalid_values": false,
-       "repetition_penalty": 1.0,
-       "return_dict": true,
-       "return_dict_in_generate": false,
-       "rms_norm_eps": 1e-06,
-       "rope_scaling": {
-         "interleaved": true,
-         "mrope_interleaved": true,
-         "mrope_section": [
-           24,
-           20,
-           20
-         ],
-         "rope_type": "default",
-         "type": "default"
-       },
-       "rope_theta": 1000000,
-       "sep_token_id": null,
-       "suppress_tokens": null,
-       "task_specific_params": null,
-       "temperature": 1.0,
-       "tf_legacy_loss": false,
-       "tie_encoder_decoder": false,
-       "tie_word_embeddings": true,
-       "tokenizer_class": null,
-       "top_k": 50,
-       "top_p": 1.0,
-       "torchscript": false,
-       "typical_p": 1.0,
-       "use_bfloat16": false,
-       "use_cache": true,
-       "vocab_size": 151936
-     }
-   },
-   "transformers_version": "4.57.6"
- }
-
 
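Several fields in this `config.json` constrain one another:

- `text_config` has `num_attention_heads` 16, `num_key_value_heads` 8 and `head_dim` 128. That describes grouped-query attention with two query heads per key/value head, and a query-projection width of 16 × 128 = 2048, equal to `hidden_size`.
- The `mrope_section` entries 24 + 20 + 20 sum to 64, which is `head_dim` / 2. This fits the assumption that the sections split each head's rotary frequency pairs, as in Qwen2-VL-style multimodal RoPE; the file itself does not state this.
- The audio encoder's `output_dim` of 2048 also equals the text model's `hidden_size`.

A quick check of these relationships, on values copied from the file:

```python
# Values copied from thinker_config in lora-stage1/config.json.
text_cfg = {
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "rope_scaling": {"mrope_section": [24, 20, 20]},
}
audio_cfg = {"d_model": 1024, "encoder_attention_heads": 16, "output_dim": 2048}


def attention_shapes(cfg: dict) -> dict:
    """Output widths of the q and k/v projections, and query heads per k/v head (GQA group)."""
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    return {"q_out": q_out, "kv_out": kv_out, "group": group}


shapes = attention_shapes(text_cfg)
print(shapes)
print("q width == hidden_size:", shapes["q_out"] == text_cfg["hidden_size"])

# Assumption: the M-RoPE sections partition the head_dim / 2 rotary frequency pairs.
print("mrope sections cover head_dim / 2:",
      sum(text_cfg["rope_scaling"]["mrope_section"]) == text_cfg["head_dim"] // 2)

# Per-head width of the audio encoder, and whether its output width matches the text model.
print("audio head dim:", audio_cfg["d_model"] // audio_cfg["encoder_attention_heads"])
print("audio output_dim == hidden_size:", audio_cfg["output_dim"] == text_cfg["hidden_size"])
```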
lora/lora-stage1/generation_config.json DELETED
@@ -1,7 +0,0 @@
- {
-   "_from_model_config": true,
-   "eos_token_id": [151643,151645],
-   "pad_token_id": 151643,
-   "do_sample": false,
-   "temperature": 0.000001
- }
 
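`generation_config.json` disables sampling (`do_sample: false`), so decoding is greedy and the near-zero `temperature` is not used. It also lists two end-of-sequence IDs, 151643 (`<|endoftext|>`) and 151645 (`<|im_end|>`), and generation stops as soon as either one is produced. The toy loop below illustrates that stopping rule. It uses a hypothetical `next_token_scores` callable in place of the model and is not the `transformers` implementation.

```python
from typing import Callable, Dict, List

# eos_token_id from lora-stage1/generation_config.json: <|endoftext|>, <|im_end|>.
EOS_TOKEN_IDS = {151643, 151645}


def greedy_decode(
    next_token_scores: Callable[[List[int]], Dict[int, float]],
    prompt: List[int],
    max_new_tokens: int,
) -> List[int]:
    """Greedy decoding: append the top-scoring id until an EOS id appears or the budget ends."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        next_id = max(scores, key=scores.get)  # argmax; no temperature, no sampling
        tokens.append(next_id)
        if next_id in EOS_TOKEN_IDS:
            break
    return tokens[len(prompt):]


# Hypothetical stand-in for the model: a fixed table of "next token" choices.
TRANSITIONS = {1: 10, 10: 11, 11: 151645}


def toy_scores(tokens: List[int]) -> Dict[int, float]:
    best = TRANSITIONS.get(tokens[-1], 7)
    return {7: 0.1, best: 1.0}


print(greedy_decode(toy_scores, [1], max_new_tokens=10))
```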
lora/lora-stage1/merges.txt DELETED
The diff for this file is too large to render.
 
lora/lora-stage1/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:edb1941ff96a7dc9ec4447ad24fd82907e8053fa66090d9f91e1fe84d42fde4c
- size 3014667
 
lora/lora-stage1/preprocessor_config.json DELETED
@@ -1,14 +0,0 @@
- {
-   "chunk_length": 30,
-   "dither": 0.0,
-   "feature_extractor_type": "WhisperFeatureExtractor",
-   "feature_size": 128,
-   "hop_length": 160,
-   "n_fft": 400,
-   "n_samples": 480000,
-   "nb_max_frames": 3000,
-   "padding_side": "right",
-   "padding_value": 0.0,
-   "processor_class": "Qwen3ASRProcessor",
-   "return_attention_mask": true
- }
 
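The Whisper-style feature-extractor settings are tied together by simple arithmetic:

- 480,000 samples over a 30-second chunk implies a 16 kHz sampling rate. The file has no explicit sampling-rate field, so this is inferred.
- At 16 kHz, a hop of 160 samples is a 10 ms frame shift, and an FFT window of 400 samples spans 25 ms.
- 480,000 / 160 = 3,000 frames, which matches `nb_max_frames`.
- The 128-bin `feature_size` matches `num_mel_bins` in the audio encoder config.

A small consistency check, using a hypothetical helper name:

```python
# Values copied from lora-stage1/preprocessor_config.json.
PREPROCESSOR = {
    "chunk_length": 30,
    "feature_size": 128,
    "hop_length": 160,
    "n_fft": 400,
    "n_samples": 480000,
    "nb_max_frames": 3000,
}
NUM_MEL_BINS = 128  # audio_config.num_mel_bins in config.json


def derived_audio_params(cfg: dict) -> dict:
    """Sampling rate and frame timings implied by the feature-extractor config."""
    sampling_rate = cfg["n_samples"] // cfg["chunk_length"]
    return {
        "sampling_rate": sampling_rate,
        "frame_shift_ms": 1000 * cfg["hop_length"] / sampling_rate,
        "window_ms": 1000 * cfg["n_fft"] / sampling_rate,
        "frames_per_chunk": cfg["n_samples"] // cfg["hop_length"],
    }


params = derived_audio_params(PREPROCESSOR)
print(params)
print("frames match nb_max_frames:", params["frames_per_chunk"] == PREPROCESSOR["nb_max_frames"])
print("mel bins match encoder:", PREPROCESSOR["feature_size"] == NUM_MEL_BINS)
```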
lora/lora-stage1/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:916059f3d5e18a65741db0b5dc2209e8c6aad0736bace4b346dacc3a0ed5408c
- size 14917
 
lora/lora-stage1/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:ed259d907743b5e6197bd66f79158ba04a3fdb590d48d290ac086e406341e1de
- size 14917
 
lora/lora-stage1/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e54a63f2bf7b963121e794c245cd0c84f1ea5bad1a8e2686e9c59fa50b56ee1e
- size 1465
 
lora/lora-stage1/special_tokens_map.json DELETED
@@ -1,44 +0,0 @@
- {
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>",
-     "<|audio_start|>",
-     "<|audio_end|>",
-     "<tts_pad>",
-     "<tts_text_bos>",
-     "<tts_text_bos_single>",
-     "<|audio_pad|>"
-   ],
-   "audio_bos_token": "<|audio_start|>",
-   "audio_eos_token": "<|audio_end|>",
-   "audio_token": "<|audio_pad|>",
-   "eos_token": {
-     "content": "<|im_end|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "image_token": "<|image_pad|>",
-   "pad_token": {
-     "content": "<|endoftext|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "video_token": "<|video_pad|>",
-   "vision_bos_token": "<|vision_start|>",
-   "vision_eos_token": "<|vision_end|>"
- }
 
lora/lora-stage1/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0499602714160467f2d68b910651d6216020689f1e016be87a2d0019ee3baeab
- size 11429499
 
lora/lora-stage1/tokenizer_config.json DELETED
@@ -1,549 +0,0 @@
- {
-   "add_bos_token": false,
-   "add_prefix_space": false,
-   "added_tokens_decoder": {
-     "151643": {
-       "content": "<|endoftext|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151644": {
-       "content": "<|im_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151645": {
-       "content": "<|im_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151646": {
-       "content": "<|object_ref_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151647": {
-       "content": "<|object_ref_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151648": {
-       "content": "<|box_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151649": {
-       "content": "<|box_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151650": {
-       "content": "<|quad_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151651": {
-       "content": "<|quad_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151652": {
-       "content": "<|vision_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151653": {
-       "content": "<|vision_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151654": {
-       "content": "<|vision_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151655": {
-       "content": "<|image_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151656": {
-       "content": "<|video_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151657": {
-       "content": "<tool_call>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151658": {
-       "content": "</tool_call>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151659": {
-       "content": "<|fim_prefix|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151660": {
-       "content": "<|fim_middle|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151661": {
-       "content": "<|fim_suffix|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151662": {
-       "content": "<|fim_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151663": {
-       "content": "<|repo_name|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151664": {
-       "content": "<|file_sep|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151665": {
-       "content": "<tool_response>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151666": {
-       "content": "</tool_response>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151667": {
-       "content": "<think>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151668": {
-       "content": "</think>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151669": {
-       "content": "<|audio_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151670": {
-       "content": "<|audio_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151671": {
-       "content": "<tts_pad>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151672": {
-       "content": "<tts_text_bos>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151673": {
-       "content": "<tts_text_eod>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151674": {
-       "content": "<tts_text_bos_single>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151675": {
-       "content": "<non_speech>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151676": {
-       "content": "<|audio_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151677": {
-       "content": "<blank1>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151678": {
-       "content": "<blank2>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151679": {
-       "content": "<blank3>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151680": {
-       "content": "<blank4>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151681": {
-       "content": "<blank5>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151682": {
-       "content": "<blank6>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151683": {
-       "content": "<blank7>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151684": {
-       "content": "<blank8>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151685": {
-       "content": "<blank9>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151686": {
-       "content": "<blank10>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151687": {
-       "content": "<blank11>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151688": {
-       "content": "<blank12>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151689": {
-       "content": "<blank13>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151690": {
-       "content": "<blank14>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151691": {
-       "content": "<blank15>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151692": {
-       "content": "<blank16>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151693": {
-       "content": "<blank17>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151694": {
-       "content": "<blank18>",
-       "lstrip": false,
- "normalized": false,
417
- "rstrip": false,
418
- "single_word": false,
419
- "special": true
420
- },
421
- "151695": {
422
- "content": "<blank19>",
423
- "lstrip": false,
424
- "normalized": false,
425
- "rstrip": false,
426
- "single_word": false,
427
- "special": true
428
- },
429
- "151696": {
430
- "content": "<blank20>",
431
- "lstrip": false,
432
- "normalized": false,
433
- "rstrip": false,
434
- "single_word": false,
435
- "special": true
436
- },
437
- "151697": {
438
- "content": "<blank21>",
439
- "lstrip": false,
440
- "normalized": false,
441
- "rstrip": false,
442
- "single_word": false,
443
- "special": true
444
- },
445
- "151698": {
446
- "content": "<blank22>",
447
- "lstrip": false,
448
- "normalized": false,
449
- "rstrip": false,
450
- "single_word": false,
451
- "special": true
452
- },
453
- "151699": {
454
- "content": "<blank23>",
455
- "lstrip": false,
456
- "normalized": false,
457
- "rstrip": false,
458
- "single_word": false,
459
- "special": true
460
- },
461
- "151700": {
462
- "content": "<blank24>",
463
- "lstrip": false,
464
- "normalized": false,
465
- "rstrip": false,
466
- "single_word": false,
467
- "special": true
468
- },
469
- "151701": {
470
- "content": "<blank25>",
471
- "lstrip": false,
472
- "normalized": false,
473
- "rstrip": false,
474
- "single_word": false,
475
- "special": true
476
- },
477
- "151702": {
478
- "content": "<blank26>",
479
- "lstrip": false,
480
- "normalized": false,
481
- "rstrip": false,
482
- "single_word": false,
483
- "special": true
484
- },
485
- "151703": {
486
- "content": "<blank27>",
487
- "lstrip": false,
488
- "normalized": false,
489
- "rstrip": false,
490
- "single_word": false,
491
- "special": true
492
- },
493
- "151704": {
494
- "content": "<asr_text>",
495
- "lstrip": false,
496
- "normalized": false,
497
- "rstrip": false,
498
- "single_word": false,
499
- "special": false
500
- }
501
- },
502
- "additional_special_tokens": [
503
- "<|im_start|>",
504
- "<|im_end|>",
505
- "<|object_ref_start|>",
506
- "<|object_ref_end|>",
507
- "<|box_start|>",
508
- "<|box_end|>",
509
- "<|quad_start|>",
510
- "<|quad_end|>",
511
- "<|vision_start|>",
512
- "<|vision_end|>",
513
- "<|vision_pad|>",
514
- "<|image_pad|>",
515
- "<|video_pad|>",
516
- "<|audio_start|>",
517
- "<|audio_end|>",
518
- "<tts_pad>",
519
- "<tts_text_bos>",
520
- "<tts_text_bos_single>",
521
- "<|audio_pad|>"
522
- ],
523
- "audio_bos_token": "<|audio_start|>",
524
- "audio_eos_token": "<|audio_end|>",
525
- "audio_token": "<|audio_pad|>",
526
- "bos_token": null,
527
- "clean_up_tokenization_spaces": false,
528
- "eos_token": "<|im_end|>",
529
- "errors": "replace",
530
- "extra_special_tokens": {
531
- "audio_bos_token": "<|audio_start|>",
532
- "audio_eos_token": "<|audio_end|>",
533
- "audio_token": "<|audio_pad|>",
534
- "image_token": "<|image_pad|>",
535
- "video_token": "<|video_pad|>",
536
- "vision_bos_token": "<|vision_start|>",
537
- "vision_eos_token": "<|vision_end|>"
538
- },
539
- "image_token": "<|image_pad|>",
540
- "model_max_length": 131072,
541
- "pad_token": "<|endoftext|>",
542
- "processor_class": "Qwen3ASRProcessor",
543
- "split_special_tokens": false,
544
- "tokenizer_class": "Qwen2Tokenizer",
545
- "unk_token": null,
546
- "video_token": "<|video_pad|>",
547
- "vision_bos_token": "<|vision_start|>",
548
- "vision_eos_token": "<|vision_end|>"
549
- }
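The `added_tokens_decoder` entries above mix `"special": true` and `"special": false` flags, and not every special-flagged token also appears in `additional_special_tokens`. Below is a minimal sketch of auditing that using only the standard library. It runs on an abridged excerpt copied from this diff. The excerpt and the `audit_added_tokens` helper are illustrative only and are not part of the repository.

```python
import json

# Abridged excerpt of the deleted tokenizer_config.json: entries copied from
# the diff above, with most fields and most tokens omitted for brevity.
EXCERPT = json.loads("""
{
  "added_tokens_decoder": {
    "151673": {"content": "<tts_text_eod>", "special": true},
    "151674": {"content": "<tts_text_bos_single>", "special": true},
    "151675": {"content": "<non_speech>", "special": false},
    "151676": {"content": "<|audio_pad|>", "special": true},
    "151677": {"content": "<blank1>", "special": true},
    "151704": {"content": "<asr_text>", "special": false}
  },
  "additional_special_tokens": ["<tts_text_bos_single>", "<|audio_pad|>"]
}
""")


def audit_added_tokens(cfg):
    """Return (non_special, special_unlisted) as lists of (id, content).

    non_special: added tokens whose "special" flag is false.
    special_unlisted: tokens flagged special but absent from
    additional_special_tokens.
    """
    listed = set(cfg.get("additional_special_tokens", []))
    non_special, special_unlisted = [], []
    # Sort by integer id so the report follows vocabulary order.
    for token_id, entry in sorted(
        cfg["added_tokens_decoder"].items(), key=lambda kv: int(kv[0])
    ):
        if not entry["special"]:
            non_special.append((int(token_id), entry["content"]))
        elif entry["content"] not in listed:
            special_unlisted.append((int(token_id), entry["content"]))
    return non_special, special_unlisted


non_special, special_unlisted = audit_added_tokens(EXCERPT)
print(non_special)       # [(151675, '<non_speech>'), (151704, '<asr_text>')]
print(special_unlisted)  # [(151673, '<tts_text_eod>'), (151677, '<blank1>')]
```

In this excerpt, `<non_speech>` and `<asr_text>` are the only non-special entries. `<tts_text_eod>` and `<blank1>` are flagged special but are not listed in `additional_special_tokens`. Whether that distinction matters at decode time depends on the tokenizer implementation. Confirm it against the loaded tokenizer rather than relying on the JSON alone.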
 
lora/lora-stage1/trainer_state.json DELETED
@@ -1,774 +0,0 @@
1
- {
2
- "best_global_step": null,
3
- "best_metric": null,
4
- "best_model_checkpoint": null,
5
- "epoch": 0.29335191228777824,
6
- "eval_steps": 200,
7
- "global_step": 1000,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "epoch": 0.002933519122877782,
14
- "grad_norm": 31.058988571166992,
15
- "learning_rate": 2.6392961876832844e-08,
16
- "loss": 222.2233,
17
- "step": 10
18
- },
19
- {
20
- "epoch": 0.005867038245755564,
21
- "grad_norm": 29.318532943725586,
22
- "learning_rate": 5.571847507331378e-08,
23
- "loss": 223.4508,
24
- "step": 20
25
- },
26
- {
27
- "epoch": 0.008800557368633347,
28
- "grad_norm": 31.036029815673828,
29
- "learning_rate": 8.504398826979471e-08,
30
- "loss": 223.5497,
31
- "step": 30
32
- },
33
- {
34
- "epoch": 0.011734076491511128,
35
- "grad_norm": 31.80939483642578,
36
- "learning_rate": 1.1436950146627565e-07,
37
- "loss": 218.2694,
38
- "step": 40
39
- },
40
- {
41
- "epoch": 0.014667595614388912,
42
- "grad_norm": 32.80522918701172,
43
- "learning_rate": 1.436950146627566e-07,
44
- "loss": 219.2423,
45
- "step": 50
46
- },
47
- {
48
- "epoch": 0.017601114737266693,
49
- "grad_norm": 32.718772888183594,
50
- "learning_rate": 1.7302052785923753e-07,
51
- "loss": 224.9209,
52
- "step": 60
53
- },
54
- {
55
- "epoch": 0.020534633860144477,
56
- "grad_norm": 30.853660583496094,
57
- "learning_rate": 2.0234604105571846e-07,
58
- "loss": 220.9806,
59
- "step": 70
60
- },
61
- {
62
- "epoch": 0.023468152983022256,
63
- "grad_norm": 31.83987045288086,
64
- "learning_rate": 2.3167155425219938e-07,
65
- "loss": 221.7758,
66
- "step": 80
67
- },
68
- {
69
- "epoch": 0.02640167210590004,
70
- "grad_norm": 33.82211685180664,
71
- "learning_rate": 2.609970674486803e-07,
72
- "loss": 220.2104,
73
- "step": 90
74
- },
75
- {
76
- "epoch": 0.029335191228777823,
77
- "grad_norm": 39.342655181884766,
78
- "learning_rate": 2.903225806451613e-07,
79
- "loss": 223.5162,
80
- "step": 100
81
- },
82
- {
83
- "epoch": 0.032268710351655606,
84
- "grad_norm": 32.44097900390625,
85
- "learning_rate": 3.196480938416422e-07,
86
- "loss": 222.4887,
87
- "step": 110
88
- },
89
- {
90
- "epoch": 0.035202229474533386,
91
- "grad_norm": 30.906185150146484,
92
- "learning_rate": 3.489736070381232e-07,
93
- "loss": 221.0162,
94
- "step": 120
95
- },
96
- {
97
- "epoch": 0.038135748597411166,
98
- "grad_norm": 30.318588256835938,
99
- "learning_rate": 3.7829912023460407e-07,
100
- "loss": 219.1895,
101
- "step": 130
102
- },
103
- {
104
- "epoch": 0.04106926772028895,
105
- "grad_norm": 33.13260269165039,
106
- "learning_rate": 4.0762463343108505e-07,
107
- "loss": 219.1354,
108
- "step": 140
109
- },
110
- {
111
- "epoch": 0.04400278684316673,
112
- "grad_norm": 32.98201370239258,
113
- "learning_rate": 4.36950146627566e-07,
114
- "loss": 221.9035,
115
- "step": 150
116
- },
117
- {
118
- "epoch": 0.04693630596604451,
119
- "grad_norm": 30.733919143676758,
120
- "learning_rate": 4.6627565982404685e-07,
121
- "loss": 219.0109,
122
- "step": 160
123
- },
124
- {
125
- "epoch": 0.0498698250889223,
126
- "grad_norm": 35.68417739868164,
127
- "learning_rate": 4.956011730205278e-07,
128
- "loss": 222.1004,
129
- "step": 170
130
- },
131
- {
132
- "epoch": 0.05280334421180008,
133
- "grad_norm": 34.876121520996094,
134
- "learning_rate": 5.249266862170088e-07,
135
- "loss": 220.137,
136
- "step": 180
137
- },
138
- {
139
- "epoch": 0.055736863334677866,
140
- "grad_norm": 33.82151412963867,
141
- "learning_rate": 5.542521994134897e-07,
142
- "loss": 224.7452,
143
- "step": 190
144
- },
145
- {
146
- "epoch": 0.058670382457555646,
147
- "grad_norm": 36.70476531982422,
148
- "learning_rate": 5.835777126099707e-07,
149
- "loss": 219.8298,
150
- "step": 200
151
- },
152
- {
153
- "epoch": 0.058670382457555646,
154
- "eval_loss": 24.500732421875,
155
- "eval_runtime": 98.9198,
156
- "eval_samples_per_second": 98.019,
157
- "eval_steps_per_second": 6.126,
158
- "step": 200
159
- },
160
- {
161
- "epoch": 0.061603901580433426,
162
- "grad_norm": 34.49006652832031,
163
- "learning_rate": 6.129032258064516e-07,
164
- "loss": 223.5638,
165
- "step": 210
166
- },
167
- {
168
- "epoch": 0.06453742070331121,
169
- "grad_norm": 32.312313079833984,
170
- "learning_rate": 6.422287390029325e-07,
171
- "loss": 225.3921,
172
- "step": 220
173
- },
174
- {
175
- "epoch": 0.06747093982618899,
176
- "grad_norm": 33.46302032470703,
177
- "learning_rate": 6.715542521994134e-07,
178
- "loss": 219.3619,
179
- "step": 230
180
- },
181
- {
182
- "epoch": 0.07040445894906677,
183
- "grad_norm": 47.695858001708984,
184
- "learning_rate": 7.008797653958944e-07,
185
- "loss": 221.6162,
186
- "step": 240
187
- },
188
- {
189
- "epoch": 0.07333797807194456,
190
- "grad_norm": 36.99955368041992,
191
- "learning_rate": 7.302052785923753e-07,
192
- "loss": 224.4357,
193
- "step": 250
194
- },
195
- {
196
- "epoch": 0.07627149719482233,
197
- "grad_norm": 33.713096618652344,
198
- "learning_rate": 7.595307917888563e-07,
199
- "loss": 218.6644,
200
- "step": 260
201
- },
202
- {
203
- "epoch": 0.07920501631770012,
204
- "grad_norm": 36.349666595458984,
205
- "learning_rate": 7.888563049853372e-07,
206
- "loss": 221.4383,
207
- "step": 270
208
- },
209
- {
210
- "epoch": 0.0821385354405779,
211
- "grad_norm": 36.67658615112305,
212
- "learning_rate": 8.181818181818182e-07,
213
- "loss": 221.6365,
214
- "step": 280
215
- },
216
- {
217
- "epoch": 0.08507205456345568,
218
- "grad_norm": 31.31206512451172,
219
- "learning_rate": 8.475073313782992e-07,
220
- "loss": 219.8238,
221
- "step": 290
222
- },
223
- {
224
- "epoch": 0.08800557368633347,
225
- "grad_norm": 33.81391525268555,
226
- "learning_rate": 8.7683284457478e-07,
227
- "loss": 222.3335,
228
- "step": 300
229
- },
230
- {
231
- "epoch": 0.09093909280921125,
232
- "grad_norm": 39.456138610839844,
233
- "learning_rate": 9.061583577712609e-07,
234
- "loss": 225.0574,
235
- "step": 310
236
- },
237
- {
238
- "epoch": 0.09387261193208903,
239
- "grad_norm": 62.84433364868164,
240
- "learning_rate": 9.354838709677418e-07,
241
- "loss": 222.3193,
242
- "step": 320
243
- },
244
- {
245
- "epoch": 0.09680613105496681,
246
- "grad_norm": 37.60541915893555,
247
- "learning_rate": 9.648093841642228e-07,
248
- "loss": 215.4717,
249
- "step": 330
250
- },
251
- {
252
- "epoch": 0.0997396501778446,
253
- "grad_norm": 42.61164855957031,
254
- "learning_rate": 9.941348973607037e-07,
255
- "loss": 220.7702,
256
- "step": 340
257
- },
258
- {
259
- "epoch": 0.10267316930072237,
260
- "grad_norm": 41.35678482055664,
261
- "learning_rate": 9.987648602748184e-07,
262
- "loss": 221.7166,
263
- "step": 350
264
- },
265
- {
266
- "epoch": 0.10560668842360016,
267
- "grad_norm": 41.287208557128906,
268
- "learning_rate": 9.972209356183417e-07,
269
- "loss": 222.1618,
270
- "step": 360
271
- },
272
- {
273
- "epoch": 0.10854020754647795,
274
- "grad_norm": 54.5716667175293,
275
- "learning_rate": 9.956770109618649e-07,
276
- "loss": 221.5792,
277
- "step": 370
278
- },
279
- {
280
- "epoch": 0.11147372666935573,
281
- "grad_norm": 40.734012603759766,
282
- "learning_rate": 9.941330863053883e-07,
283
- "loss": 219.901,
284
- "step": 380
285
- },
286
- {
287
- "epoch": 0.1144072457922335,
288
- "grad_norm": 43.457218170166016,
289
- "learning_rate": 9.925891616489115e-07,
290
- "loss": 223.7378,
291
- "step": 390
292
- },
293
- {
294
- "epoch": 0.11734076491511129,
295
- "grad_norm": 42.917686462402344,
296
- "learning_rate": 9.910452369924347e-07,
297
- "loss": 222.8944,
298
- "step": 400
299
- },
300
- {
301
- "epoch": 0.11734076491511129,
302
- "eval_loss": 24.425460815429688,
303
- "eval_runtime": 94.8923,
304
- "eval_samples_per_second": 102.179,
305
- "eval_steps_per_second": 6.386,
306
- "step": 400
307
- },
308
- {
309
- "epoch": 0.12027428403798908,
310
- "grad_norm": 39.965293884277344,
311
- "learning_rate": 9.89501312335958e-07,
312
- "loss": 220.1472,
313
- "step": 410
314
- },
315
- {
316
- "epoch": 0.12320780316086685,
317
- "grad_norm": 45.19244384765625,
318
- "learning_rate": 9.879573876794812e-07,
319
- "loss": 224.2056,
320
- "step": 420
321
- },
322
- {
323
- "epoch": 0.12614132228374464,
324
- "grad_norm": 41.27251434326172,
325
- "learning_rate": 9.864134630230044e-07,
326
- "loss": 217.4446,
327
- "step": 430
328
- },
329
- {
330
- "epoch": 0.12907484140662243,
331
- "grad_norm": 49.71922302246094,
332
- "learning_rate": 9.848695383665276e-07,
333
- "loss": 220.2578,
334
- "step": 440
335
- },
336
- {
337
- "epoch": 0.1320083605295002,
338
- "grad_norm": 65.56668853759766,
339
- "learning_rate": 9.833256137100508e-07,
340
- "loss": 221.2077,
341
- "step": 450
342
- },
343
- {
344
- "epoch": 0.13494187965237797,
345
- "grad_norm": 41.73335266113281,
346
- "learning_rate": 9.817816890535742e-07,
347
- "loss": 219.8,
348
- "step": 460
349
- },
350
- {
351
- "epoch": 0.13787539877525576,
352
- "grad_norm": 51.275718688964844,
353
- "learning_rate": 9.802377643970974e-07,
354
- "loss": 221.3817,
355
- "step": 470
356
- },
357
- {
358
- "epoch": 0.14080891789813355,
359
- "grad_norm": 55.4876823425293,
360
- "learning_rate": 9.786938397406207e-07,
361
- "loss": 216.4269,
362
- "step": 480
363
- },
364
- {
365
- "epoch": 0.14374243702101133,
366
- "grad_norm": 55.99393844604492,
367
- "learning_rate": 9.771499150841439e-07,
368
- "loss": 218.8694,
369
- "step": 490
370
- },
371
- {
372
- "epoch": 0.14667595614388912,
373
- "grad_norm": 95.5741958618164,
374
- "learning_rate": 9.75605990427667e-07,
375
- "loss": 221.4839,
376
- "step": 500
377
- },
378
- {
379
- "epoch": 0.1496094752667669,
380
- "grad_norm": 49.25442886352539,
381
- "learning_rate": 9.740620657711903e-07,
382
- "loss": 222.9515,
383
- "step": 510
384
- },
385
- {
386
- "epoch": 0.15254299438964466,
387
- "grad_norm": 50.05457305908203,
388
- "learning_rate": 9.725181411147135e-07,
389
- "loss": 218.0743,
390
- "step": 520
391
- },
392
- {
393
- "epoch": 0.15547651351252245,
394
- "grad_norm": 43.44709777832031,
395
- "learning_rate": 9.709742164582367e-07,
396
- "loss": 218.6208,
397
- "step": 530
398
- },
399
- {
400
- "epoch": 0.15841003263540024,
401
- "grad_norm": 66.39103698730469,
402
- "learning_rate": 9.694302918017602e-07,
403
- "loss": 219.6833,
404
- "step": 540
405
- },
406
- {
407
- "epoch": 0.16134355175827803,
408
- "grad_norm": 54.72968292236328,
409
- "learning_rate": 9.678863671452832e-07,
410
- "loss": 221.8852,
411
- "step": 550
412
- },
413
- {
414
- "epoch": 0.1642770708811558,
415
- "grad_norm": 65.26374816894531,
416
- "learning_rate": 9.663424424888064e-07,
417
- "loss": 219.7626,
418
- "step": 560
419
- },
420
- {
421
- "epoch": 0.1672105900040336,
422
- "grad_norm": 60.0925178527832,
423
- "learning_rate": 9.647985178323296e-07,
424
- "loss": 217.8218,
425
- "step": 570
426
- },
427
- {
428
- "epoch": 0.17014410912691136,
429
- "grad_norm": 47.97535705566406,
430
- "learning_rate": 9.63254593175853e-07,
431
- "loss": 217.9315,
432
- "step": 580
433
- },
434
- {
435
- "epoch": 0.17307762824978914,
436
- "grad_norm": 53.61656951904297,
437
- "learning_rate": 9.617106685193762e-07,
438
- "loss": 219.2269,
439
- "step": 590
440
- },
441
- {
442
- "epoch": 0.17601114737266693,
443
- "grad_norm": 52.75293731689453,
444
- "learning_rate": 9.601667438628995e-07,
445
- "loss": 216.867,
446
- "step": 600
447
- },
448
- {
449
- "epoch": 0.17601114737266693,
450
- "eval_loss": 24.240764617919922,
451
- "eval_runtime": 97.5766,
452
- "eval_samples_per_second": 99.368,
453
- "eval_steps_per_second": 6.211,
454
- "step": 600
455
- },
456
- {
457
- "epoch": 0.17894466649554472,
458
- "grad_norm": 59.573219299316406,
459
- "learning_rate": 9.586228192064227e-07,
460
- "loss": 213.2538,
461
- "step": 610
462
- },
463
- {
464
- "epoch": 0.1818781856184225,
465
- "grad_norm": 113.46548461914062,
466
- "learning_rate": 9.570788945499459e-07,
467
- "loss": 218.0255,
468
- "step": 620
469
- },
470
- {
471
- "epoch": 0.1848117047413003,
472
- "grad_norm": 119.12982177734375,
473
- "learning_rate": 9.55534969893469e-07,
474
- "loss": 216.9313,
475
- "step": 630
476
- },
477
- {
478
- "epoch": 0.18774522386417805,
479
- "grad_norm": 54.008338928222656,
480
- "learning_rate": 9.539910452369923e-07,
481
- "loss": 220.8365,
482
- "step": 640
483
- },
484
- {
485
- "epoch": 0.19067874298705584,
486
- "grad_norm": 59.56270217895508,
487
- "learning_rate": 9.524471205805155e-07,
488
- "loss": 218.8136,
489
- "step": 650
490
- },
491
- {
492
- "epoch": 0.19361226210993362,
493
- "grad_norm": 52.067115783691406,
494
- "learning_rate": 9.509031959240389e-07,
495
- "loss": 220.6164,
496
- "step": 660
497
- },
498
- {
499
- "epoch": 0.1965457812328114,
500
- "grad_norm": 60.61309051513672,
501
- "learning_rate": 9.493592712675621e-07,
502
- "loss": 217.9881,
503
- "step": 670
504
- },
505
- {
506
- "epoch": 0.1994793003556892,
507
- "grad_norm": 49.88456726074219,
508
- "learning_rate": 9.478153466110853e-07,
509
- "loss": 217.0137,
510
- "step": 680
511
- },
512
- {
513
- "epoch": 0.20241281947856699,
514
- "grad_norm": 49.28492736816406,
515
- "learning_rate": 9.462714219546085e-07,
516
- "loss": 212.2388,
517
- "step": 690
518
- },
519
- {
520
- "epoch": 0.20534633860144474,
521
- "grad_norm": 55.44947814941406,
522
- "learning_rate": 9.447274972981318e-07,
523
- "loss": 221.7097,
524
- "step": 700
525
- },
526
- {
527
- "epoch": 0.20827985772432253,
528
- "grad_norm": 47.7352409362793,
529
- "learning_rate": 9.43183572641655e-07,
530
- "loss": 217.8991,
531
- "step": 710
532
- },
533
- {
534
- "epoch": 0.21121337684720032,
535
- "grad_norm": 56.91552734375,
536
- "learning_rate": 9.416396479851782e-07,
537
- "loss": 216.618,
538
- "step": 720
539
- },
540
- {
541
- "epoch": 0.2141468959700781,
542
- "grad_norm": 50.68717575073242,
543
- "learning_rate": 9.400957233287015e-07,
544
- "loss": 217.7346,
545
- "step": 730
546
- },
547
- {
548
- "epoch": 0.2170804150929559,
549
- "grad_norm": 75.52225494384766,
550
- "learning_rate": 9.385517986722248e-07,
551
- "loss": 215.9344,
552
- "step": 740
553
- },
554
- {
555
- "epoch": 0.22001393421583368,
556
- "grad_norm": 74.4793472290039,
557
- "learning_rate": 9.37007874015748e-07,
558
- "loss": 222.193,
559
- "step": 750
560
- },
561
- {
562
- "epoch": 0.22294745333871147,
563
- "grad_norm": 58.30630111694336,
564
- "learning_rate": 9.354639493592712e-07,
565
- "loss": 215.5639,
566
- "step": 760
567
- },
568
- {
569
- "epoch": 0.22588097246158922,
570
- "grad_norm": 52.7680778503418,
571
- "learning_rate": 9.339200247027944e-07,
572
- "loss": 219.3169,
573
- "step": 770
574
- },
575
- {
576
- "epoch": 0.228814491584467,
577
- "grad_norm": 51.10957717895508,
578
- "learning_rate": 9.323761000463177e-07,
579
- "loss": 213.7119,
580
- "step": 780
581
- },
582
- {
583
- "epoch": 0.2317480107073448,
584
- "grad_norm": 96.71678161621094,
585
- "learning_rate": 9.30832175389841e-07,
586
- "loss": 216.2126,
587
- "step": 790
588
- },
589
- {
590
- "epoch": 0.23468152983022258,
591
- "grad_norm": 59.496395111083984,
592
- "learning_rate": 9.292882507333642e-07,
593
- "loss": 220.2937,
594
- "step": 800
595
- },
596
- {
597
- "epoch": 0.23468152983022258,
598
- "eval_loss": 24.050508499145508,
599
- "eval_runtime": 98.6094,
600
- "eval_samples_per_second": 98.327,
601
- "eval_steps_per_second": 6.145,
602
- "step": 800
603
- },
604
- {
605
- "epoch": 0.23761504895310037,
606
- "grad_norm": 115.57308959960938,
607
- "learning_rate": 9.277443260768874e-07,
608
- "loss": 214.0267,
609
- "step": 810
610
- },
611
- {
612
- "epoch": 0.24054856807597816,
613
- "grad_norm": 58.29754638671875,
614
- "learning_rate": 9.262004014204107e-07,
615
- "loss": 219.2819,
616
- "step": 820
617
- },
618
- {
619
- "epoch": 0.24348208719885592,
620
- "grad_norm": 137.19517517089844,
621
- "learning_rate": 9.246564767639339e-07,
622
- "loss": 217.8361,
623
- "step": 830
624
- },
625
- {
626
- "epoch": 0.2464156063217337,
627
- "grad_norm": 62.34098434448242,
628
- "learning_rate": 9.23112552107457e-07,
629
- "loss": 217.4855,
630
- "step": 840
631
- },
632
- {
633
- "epoch": 0.2493491254446115,
634
- "grad_norm": 57.445247650146484,
635
- "learning_rate": 9.215686274509803e-07,
636
- "loss": 217.8953,
637
- "step": 850
638
- },
639
- {
640
- "epoch": 0.2522826445674893,
641
- "grad_norm": 61.09876251220703,
642
- "learning_rate": 9.200247027945036e-07,
643
- "loss": 215.2011,
644
- "step": 860
645
- },
646
- {
647
- "epoch": 0.25521616369036704,
648
- "grad_norm": 59.176513671875,
649
- "learning_rate": 9.184807781380268e-07,
650
- "loss": 217.2304,
651
- "step": 870
652
- },
653
- {
654
- "epoch": 0.25814968281324485,
655
- "grad_norm": 52.66059494018555,
656
- "learning_rate": 9.1693685348155e-07,
657
- "loss": 218.234,
658
- "step": 880
659
- },
660
- {
661
- "epoch": 0.2610832019361226,
662
- "grad_norm": 98.39973449707031,
663
- "learning_rate": 9.153929288250732e-07,
664
- "loss": 214.297,
665
- "step": 890
666
- },
667
- {
668
- "epoch": 0.2640167210590004,
669
- "grad_norm": 72.08065795898438,
670
- "learning_rate": 9.138490041685965e-07,
671
- "loss": 217.044,
672
- "step": 900
673
- },
674
- {
675
- "epoch": 0.2669502401818782,
676
- "grad_norm": 59.712371826171875,
677
- "learning_rate": 9.123050795121198e-07,
678
- "loss": 215.2483,
679
- "step": 910
680
- },
681
- {
682
- "epoch": 0.26988375930475594,
683
- "grad_norm": 64.43281555175781,
684
- "learning_rate": 9.10761154855643e-07,
685
- "loss": 211.9948,
686
- "step": 920
687
- },
688
- {
689
- "epoch": 0.27281727842763376,
690
- "grad_norm": 61.78029251098633,
691
- "learning_rate": 9.092172301991662e-07,
692
- "loss": 217.2441,
693
- "step": 930
694
- },
695
- {
696
- "epoch": 0.2757507975505115,
697
- "grad_norm": 68.14164733886719,
698
- "learning_rate": 9.076733055426895e-07,
699
- "loss": 214.7014,
700
- "step": 940
701
- },
702
- {
703
- "epoch": 0.27868431667338933,
704
- "grad_norm": 61.65287399291992,
705
- "learning_rate": 9.061293808862127e-07,
706
- "loss": 212.859,
707
- "step": 950
708
- },
709
- {
710
- "epoch": 0.2816178357962671,
711
- "grad_norm": 64.0514144897461,
712
- "learning_rate": 9.045854562297359e-07,
713
- "loss": 217.1946,
714
- "step": 960
715
- },
716
- {
717
- "epoch": 0.2845513549191449,
718
- "grad_norm": 91.87364959716797,
719
- "learning_rate": 9.030415315732592e-07,
720
- "loss": 215.7542,
721
- "step": 970
722
- },
723
- {
724
- "epoch": 0.28748487404202266,
725
- "grad_norm": 54.730316162109375,
726
- "learning_rate": 9.014976069167825e-07,
727
- "loss": 218.0408,
728
- "step": 980
729
- },
730
- {
731
- "epoch": 0.2904183931649004,
732
- "grad_norm": 56.43712615966797,
733
- "learning_rate": 8.999536822603057e-07,
734
- "loss": 212.8671,
735
- "step": 990
736
- },
737
- {
738
- "epoch": 0.29335191228777824,
739
- "grad_norm": 59.28590393066406,
740
- "learning_rate": 8.984097576038289e-07,
741
- "loss": 215.822,
742
- "step": 1000
743
- },
744
- {
745
- "epoch": 0.29335191228777824,
746
- "eval_loss": 23.851858139038086,
747
- "eval_runtime": 96.4448,
748
- "eval_samples_per_second": 100.534,
749
- "eval_steps_per_second": 6.283,
750
- "step": 1000
751
- }
752
- ],
753
- "logging_steps": 10,
754
- "max_steps": 6818,
755
- "num_input_tokens_seen": 0,
756
- "num_train_epochs": 2,
757
- "save_steps": 200,
758
- "stateful_callbacks": {
759
- "TrainerControl": {
760
- "args": {
761
- "should_epoch_stop": false,
762
- "should_evaluate": false,
763
- "should_log": false,
764
- "should_save": true,
765
- "should_training_stop": false
766
- },
767
- "attributes": {}
768
- }
769
- },
770
- "total_flos": 3.503007404654592e+17,
771
- "train_batch_size": 8,
772
- "trial_name": null,
773
- "trial_params": null
774
- }
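The `learning_rate` values in `log_history` are consistent with a linear warmup followed by a linear decay to zero at `max_steps` (6818). This is the shape produced by `transformers.get_linear_schedule_with_warmup`. The sketch below checks that reading against a few logged points. Three things are inferred from the log and are not stated in the file: the 1e-6 peak rate, the 341-step warmup (equal to `ceil(0.05 * 6818)`), and a one-step offset between `global_step` and the schedule step. The sketch also recovers the eval set size from the eval throughput fields. This assumes they were computed as count divided by `eval_runtime`.

```python
import math

MAX_STEPS = 6818  # "max_steps" in the deleted trainer_state.json
# Assumption: warmup length inferred from the log (consistent with a 5% ratio).
WARMUP_STEPS = math.ceil(0.05 * MAX_STEPS)
# Assumption: peak learning rate inferred from the logged values.
PEAK_LR = 1e-6


def linear_warmup_decay(step):
    """Linear warmup to PEAK_LR, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = (MAX_STEPS - step) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


def logged_lr(global_step):
    # Empirical observation: the value logged at global step s matches the
    # schedule evaluated at s - 1.
    return linear_warmup_decay(global_step - 1)


# (global_step, logged learning_rate) pairs copied from log_history.
LOGGED = [
    (10, 2.6392961876832844e-08),
    (340, 9.941348973607037e-07),
    (350, 9.987648602748184e-07),
    (1000, 8.984097576038289e-07),
]

# (eval_runtime, eval_samples_per_second, eval_steps_per_second) from the log.
EVALS = [
    (98.9198, 98.019, 6.126),
    (94.8923, 102.179, 6.386),
    (96.4448, 100.534, 6.283),
]

for step, lr in LOGGED:
    print(step, math.isclose(logged_lr(step), lr, rel_tol=1e-9))

for runtime, samples_ps, steps_ps in EVALS:
    # rate * runtime approximately recovers the sample and step counts.
    print(round(runtime * samples_ps), round(runtime * steps_ps))
```

Every checked point matches to within floating-point rounding. Each evaluation works out to roughly 9,696 samples over about 606 steps, or 16 samples per eval step. These are reconstructions from rounded metrics, not values recorded in the file.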
 
lora/lora-stage1/vocab.json DELETED
The diff for this file is too large to render.
 
lora/lora-stage2/README.md DELETED
@@ -1,207 +0,0 @@
1
- ---
2
- base_model: ''
3
- library_name: peft
4
- pipeline_tag: text-generation
5
- tags:
6
- - 'base_model:adapter:'
7
- - lora
8
- - transformers
9
- ---
10
-
11
- # Model Card for Model ID
12
-
13
- <!-- Provide a quick summary of what the model is/does. -->
14
-
15
-
16
-
17
- ## Model Details
18
-
19
- ### Model Description
20
-
21
- <!-- Provide a longer summary of what this model is. -->
22
-
23
-
24
-
25
- - **Developed by:** [More Information Needed]
26
- - **Funded by [optional]:** [More Information Needed]
27
- - **Shared by [optional]:** [More Information Needed]
28
- - **Model type:** [More Information Needed]
29
- - **Language(s) (NLP):** [More Information Needed]
30
- - **License:** [More Information Needed]
31
- - **Finetuned from model [optional]:** [More Information Needed]
32
-
33
- ### Model Sources [optional]
34
-
35
- <!-- Provide the basic links for the model. -->
36
-
37
- - **Repository:** [More Information Needed]
38
- - **Paper [optional]:** [More Information Needed]
39
- - **Demo [optional]:** [More Information Needed]
40
-
41
- ## Uses
42
-
43
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
-
45
- ### Direct Use
46
-
47
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
-
49
- [More Information Needed]
50
-
51
- ### Downstream Use [optional]
52
-
53
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
-
55
- [More Information Needed]
56
-
57
- ### Out-of-Scope Use
58
-
59
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
-
61
- [More Information Needed]
62
-
63
- ## Bias, Risks, and Limitations
64
-
65
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
-
67
- [More Information Needed]
68
-
69
- ### Recommendations
70
-
71
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
-
73
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
-
75
- ## How to Get Started with the Model
76
-
77
- Use the code below to get started with the model.
78
-
79
- [More Information Needed]
80
-
81
- ## Training Details
82
-
83
- ### Training Data
84
-
85
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
-
87
- [More Information Needed]
88
-
89
- ### Training Procedure
90
-
91
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
-
93
- #### Preprocessing [optional]
94
-
95
- [More Information Needed]
96
-
97
-
98
- #### Training Hyperparameters
99
-
100
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
-
102
- #### Speeds, Sizes, Times [optional]
103
-
104
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
-
106
- [More Information Needed]
107
-
108
- ## Evaluation
109
-
110
- <!-- This section describes the evaluation protocols and provides the results. -->
111
-
112
- ### Testing Data, Factors & Metrics
113
-
114
- #### Testing Data
115
-
116
- <!-- This should link to a Dataset Card if possible. -->
117
-
118
- [More Information Needed]
119
-
120
- #### Factors
121
-
122
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
-
124
- [More Information Needed]
125
-
126
- #### Metrics
127
-
128
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
-
130
- [More Information Needed]
131
-
132
- ### Results
133
-
134
- [More Information Needed]
135
-
136
- #### Summary
137
-
138
-
139
-
140
- ## Model Examination [optional]
141
-
142
- <!-- Relevant interpretability work for the model goes here -->
143
-
144
- [More Information Needed]
145
-
146
- ## Environmental Impact
147
-
148
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
-
150
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
-
152
- - **Hardware Type:** [More Information Needed]
153
- - **Hours used:** [More Information Needed]
154
- - **Cloud Provider:** [More Information Needed]
155
- - **Compute Region:** [More Information Needed]
156
- - **Carbon Emitted:** [More Information Needed]
157
-
158
- ## Technical Specifications [optional]
159
-
160
- ### Model Architecture and Objective
161
-
162
- [More Information Needed]
163
-
164
- ### Compute Infrastructure
165
-
166
- [More Information Needed]
167
-
168
- #### Hardware
169
-
170
- [More Information Needed]
171
-
172
- #### Software
173
-
174
- [More Information Needed]
175
-
176
- ## Citation [optional]
177
-
178
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
-
180
- **BibTeX:**
181
-
182
- [More Information Needed]
183
-
184
- **APA:**
185
-
186
- [More Information Needed]
187
-
188
- ## Glossary [optional]
189
-
190
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
-
192
- [More Information Needed]
193
-
194
- ## More Information [optional]
195
-
196
- [More Information Needed]
197
-
198
- ## Model Card Authors [optional]
199
-
200
- [More Information Needed]
201
-
202
- ## Model Card Contact
203
-
204
- [More Information Needed]
205
- ### Framework versions
206
-
207
- - PEFT 0.18.1
 
lora/lora-stage2/adapter_config.json DELETED
@@ -1,38 +0,0 @@
1
- {
2
- "alora_invocation_tokens": null,
3
- "alpha_pattern": {},
4
- "arrow_config": null,
5
- "auto_mapping": null,
6
- "base_model_name_or_path": "",
7
- "bias": "none",
8
- "corda_config": null,
9
- "ensure_weight_tying": false,
10
- "eva_config": null,
11
- "exclude_modules": null,
12
- "fan_in_fan_out": false,
13
- "inference_mode": true,
14
- "init_lora_weights": true,
15
- "layer_replication": null,
16
- "layers_pattern": null,
17
- "layers_to_transform": null,
18
- "loftq_config": {},
19
- "lora_alpha": 16,
20
- "lora_bias": false,
21
- "lora_dropout": 0.05,
22
- "megatron_config": null,
23
- "megatron_core": "megatron.core",
24
- "modules_to_save": null,
25
- "peft_type": "LORA",
26
- "peft_version": "0.18.1",
27
- "qalora_group_size": 16,
28
- "r": 8,
29
- "rank_pattern": {},
30
- "revision": null,
31
- "target_modules": "^(audio_tower\\.(conv_out|proj1|proj2)$|audio_tower\\.layers\\.\\d+\\..*\\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$|model\\.layers\\.\\d+\\..*\\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)$)",
32
- "target_parameters": null,
33
- "task_type": "CAUSAL_LM",
34
- "trainable_token_indices": null,
35
- "use_dora": false,
36
- "use_qalora": false,
37
- "use_rslora": false
38
- }
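This adapter sets `target_modules` to a single regex string rather than a list. The sketch below assumes PEFT's behaviour for string patterns, which is to apply `re.fullmatch` to each module's fully qualified name. It checks which module names would receive LoRA layers and computes the LoRA scaling factor. Because `use_rslora` is false, that factor is `lora_alpha / r = 16 / 8`. The module names in the example are hypothetical. The real names depend on how `Qwen3ASRForConditionalGeneration` nests its submodules.

```python
import re

# target_modules copied from adapter_config.json (the JSON "\\." is "\." in a raw string).
TARGET_MODULES = (
    r"^(audio_tower\.(conv_out|proj1|proj2)$"
    r"|audio_tower\.layers\.\d+\..*\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$"
    r"|model\.layers\.\d+\..*\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)$)"
)


def is_lora_target(module_name: str) -> bool:
    """True if the full module name matches the pattern (assumes re.fullmatch semantics)."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return lora_alpha / (r ** 0.5) if use_rslora else lora_alpha / r


# Hypothetical module names, for illustration only.
for name in [
    "audio_tower.proj1",
    "audio_tower.layers.0.self_attn.q_proj",
    "model.layers.3.mlp.gate_proj",
    "model.layers.3.input_layernorm",
    "lm_head",
]:
    print(f"{name}: {is_lora_target(name)}")

print("scaling:", lora_scaling(16, 8))
```

Note that the audio-tower alternative requires at least one extra dotted component between `layers.N.` and the projection name. A name such as `audio_tower.layers.0.fc1` would therefore not match.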
 
lora/lora-stage2/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:dd4baa6a45645b280fdddb3c722186d149b5f64daab687300dba0c08373e3962
3
- size 41677888
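Several entries in this diff are Git LFS pointer files rather than binary payloads. These include `adapter_model.safetensors`, `optimizer.pt`, `rng_state_*.pth`, `scheduler.pt` and `tokenizer.json`. Each pointer follows the v1 format of one `key value` pair per line (`version`, `oid`, `size`). The sketch below parses a pointer into its fields. The function name `parse_lfs_pointer` is hypothetical and does not come from the git-lfs tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest and byte size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# The pointer for adapter_model.safetensors shown above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:dd4baa6a45645b280fdddb3c722186d149b5f64daab687300dba0c08373e3962
size 41677888
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 41677888
print(f"{info['size'] / 2**20:.1f} MiB")  # 39.7 MiB
```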
 
lora/lora-stage2/added_tokens.json DELETED
@@ -1,64 +0,0 @@
1
- {
2
- "</think>": 151668,
3
- "</tool_call>": 151658,
4
- "</tool_response>": 151666,
5
- "<asr_text>": 151704,
6
- "<blank10>": 151686,
7
- "<blank11>": 151687,
8
- "<blank12>": 151688,
9
- "<blank13>": 151689,
10
- "<blank14>": 151690,
11
- "<blank15>": 151691,
12
- "<blank16>": 151692,
13
- "<blank17>": 151693,
14
- "<blank18>": 151694,
15
- "<blank19>": 151695,
16
- "<blank1>": 151677,
17
- "<blank20>": 151696,
18
- "<blank21>": 151697,
19
- "<blank22>": 151698,
20
- "<blank23>": 151699,
21
- "<blank24>": 151700,
22
- "<blank25>": 151701,
23
- "<blank26>": 151702,
24
- "<blank27>": 151703,
25
- "<blank2>": 151678,
26
- "<blank3>": 151679,
27
- "<blank4>": 151680,
28
- "<blank5>": 151681,
29
- "<blank6>": 151682,
30
- "<blank7>": 151683,
31
- "<blank8>": 151684,
32
- "<blank9>": 151685,
33
- "<non_speech>": 151675,
34
- "<think>": 151667,
35
- "<tool_call>": 151657,
36
- "<tool_response>": 151665,
37
- "<tts_pad>": 151671,
38
- "<tts_text_bos>": 151672,
39
- "<tts_text_bos_single>": 151674,
40
- "<tts_text_eod>": 151673,
41
- "<|audio_end|>": 151670,
42
- "<|audio_pad|>": 151676,
43
- "<|audio_start|>": 151669,
44
- "<|box_end|>": 151649,
45
- "<|box_start|>": 151648,
46
- "<|endoftext|>": 151643,
47
- "<|file_sep|>": 151664,
48
- "<|fim_middle|>": 151660,
49
- "<|fim_pad|>": 151662,
50
- "<|fim_prefix|>": 151659,
51
- "<|fim_suffix|>": 151661,
52
- "<|im_end|>": 151645,
53
- "<|im_start|>": 151644,
54
- "<|image_pad|>": 151655,
55
- "<|object_ref_end|>": 151647,
56
- "<|object_ref_start|>": 151646,
57
- "<|quad_end|>": 151651,
58
- "<|quad_start|>": 151650,
59
- "<|repo_name|>": 151663,
60
- "<|video_pad|>": 151656,
61
- "<|vision_end|>": 151653,
62
- "<|vision_pad|>": 151654,
63
- "<|vision_start|>": 151652
64
- }
 
lora/lora-stage2/base_model.txt DELETED
@@ -1 +0,0 @@
1
- /data/haobin/pky_train/qwen3/Qwen3-ASR-1.7B
 
lora/lora-stage2/chat_template.jinja DELETED
@@ -1,31 +0,0 @@
1
- {%- set ns = namespace(system_text="") -%}
2
- {%- for m in messages -%}
3
- {%- if m.role == 'system' -%}
4
- {%- if m.content is string -%}
5
- {%- set ns.system_text = ns.system_text + m.content -%}
6
- {%- else -%}
7
- {%- for c in m.content -%}
8
- {%- if c.type == 'text' and (c.text is defined) -%}
9
- {%- set ns.system_text = ns.system_text + c.text -%}
10
- {%- endif -%}
11
- {%- endfor -%}
12
- {%- endif -%}
13
- {%- endif -%}
14
- {%- endfor -%}
15
-
16
- {%- set ns2 = namespace(audio_tokens="") -%}
17
- {%- for m in messages -%}
18
- {%- if m.content is not string -%}
19
- {%- for c in m.content -%}
20
- {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}
21
- {%- set ns2.audio_tokens = ns2.audio_tokens + "<|audio_start|><|audio_pad|><|audio_end|>" -%}
22
- {%- endif -%}
23
- {%- endfor -%}
24
- {%- endif -%}
25
- {%- endfor -%}
26
-
27
- {{- '<|im_start|>system\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\n' -}}
28
- {{- '<|im_start|>user\n' + ns2.audio_tokens + '<|im_end|>\n' -}}
29
- {%- if add_generation_prompt -%}
30
- {{- '<|im_start|>assistant\n' -}}
31
- {%- endif -%}
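The chat template uses only two inputs. It concatenates all system text, and it emits one `<|audio_start|><|audio_pad|><|audio_end|>` placeholder per audio item found in any message. Non-system text is dropped. The sketch below mirrors that logic in plain Python, so the resulting prompt can be inspected without a Jinja environment. It is a re-implementation for illustration, not the tokenizer's own renderer. The example messages and file name are hypothetical. Presumably the processor later expands the single `<|audio_pad|>` to match the audio features, but that step is not part of this template.

```python
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_prompt(messages: list, add_generation_prompt: bool = True) -> str:
    """Plain-Python mirror of chat_template.jinja above."""
    # Pass 1: concatenate system text (string content or {"type": "text"} parts).
    system_text = ""
    for m in messages:
        if m["role"] == "system":
            if isinstance(m["content"], str):
                system_text += m["content"]
            else:
                for c in m["content"]:
                    if c.get("type") == "text" and "text" in c:
                        system_text += c["text"]
    # Pass 2: one placeholder per audio part in any message; other parts are ignored.
    audio_tokens = ""
    for m in messages:
        if not isinstance(m["content"], str):
            for c in m["content"]:
                if c.get("type") == "audio" or "audio" in c or "audio_url" in c:
                    audio_tokens += AUDIO_PLACEHOLDER
    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


# Hypothetical conversation: the user's text part is dropped by the template.
messages = [
    {"role": "system", "content": "Transcribe the audio."},
    {"role": "user", "content": [{"type": "audio", "audio": "clip.wav"},
                                 {"type": "text", "text": "ignored"}]},
]
print(render_prompt(messages))
```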
 
lora/lora-stage2/chat_template.json DELETED
@@ -1 +0,0 @@
1
- {"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n {%- if m.role == 'system' -%}\n {%- if m.content is string -%}\n {%- set ns.system_text = ns.system_text + m.content -%}\n {%- else -%}\n {%- for c in m.content -%}\n {%- if c.type == 'text' and (c.text is defined) -%}\n {%- set ns.system_text = ns.system_text + c.text -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n {%- if m.content is not string -%}\n {%- for c in m.content -%}\n {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
 
lora/lora-stage2/config.json DELETED
@@ -1,221 +0,0 @@
1
- {
2
- "architectures": [
3
- "Qwen3ASRForConditionalGeneration"
4
- ],
5
- "model_type": "qwen3_asr",
6
- "support_languages": [
7
- "Chinese",
8
- "English",
9
- "Cantonese",
10
- "Arabic",
11
- "German",
12
- "French",
13
- "Spanish",
14
- "Portuguese",
15
- "Indonesian",
16
- "Italian",
17
- "Korean",
18
- "Russian",
19
- "Thai",
20
- "Vietnamese",
21
- "Japanese",
22
- "Turkish",
23
- "Hindi",
24
- "Malay",
25
- "Dutch",
26
- "Swedish",
27
- "Danish",
28
- "Finnish",
29
- "Polish",
30
- "Czech",
31
- "Filipino",
32
- "Persian",
33
- "Greek",
34
- "Romanian",
35
- "Hungarian",
36
- "Macedonian"
37
- ],
38
- "thinker_config": {
39
- "model_type": "qwen3_asr",
40
- "architectures": [
41
- "Qwen3ASRForConditionalGeneration"
42
- ],
43
- "audio_config": {
44
- "_name_or_path": "",
45
- "activation_dropout": 0,
46
- "activation_function": "gelu",
47
- "add_cross_attention": false,
48
- "architectures": null,
49
- "attention_dropout": 0,
50
- "bad_words_ids": null,
51
- "begin_suppress_tokens": null,
52
- "bos_token_id": null,
53
- "chunk_size_feed_forward": 0,
54
- "conv_chunksize": 500,
55
- "cross_attention_hidden_size": null,
56
- "d_model": 1024,
57
- "decoder_start_token_id": null,
58
- "diversity_penalty": 0.0,
59
- "do_sample": false,
60
- "downsample_hidden_size": 480,
61
- "dropout": 0,
62
- "dtype": null,
63
- "early_stopping": false,
64
- "encoder_attention_heads": 16,
65
- "encoder_ffn_dim": 4096,
66
- "encoder_layers": 24,
67
- "encoder_no_repeat_ngram_size": 0,
68
- "eos_token_id": null,
69
- "exponential_decay_length_penalty": null,
70
- "finetuning_task": null,
71
- "forced_bos_token_id": null,
72
- "forced_eos_token_id": null,
73
- "id2label": {
74
- "0": "LABEL_0",
75
- "1": "LABEL_1"
76
- },
77
- "initializer_range": 0.02,
78
- "is_decoder": false,
79
- "is_encoder_decoder": false,
80
- "label2id": {
81
- "LABEL_0": 0,
82
- "LABEL_1": 1
83
- },
84
- "length_penalty": 1.0,
85
- "max_length": 20,
86
- "max_source_positions": 1500,
87
- "min_length": 0,
88
- "model_type": "qwen3_asr_audio_encoder",
89
- "n_window": 50,
90
- "n_window_infer": 800,
91
- "no_repeat_ngram_size": 0,
92
- "num_beam_groups": 1,
93
- "num_beams": 1,
94
- "num_hidden_layers": 24,
95
- "num_mel_bins": 128,
96
- "num_return_sequences": 1,
97
- "output_attentions": false,
98
- "output_dim": 2048,
99
- "output_hidden_states": false,
100
- "output_scores": false,
101
- "pad_token_id": null,
102
- "prefix": null,
103
- "problem_type": null,
104
- "pruned_heads": {},
105
- "remove_invalid_values": false,
106
- "repetition_penalty": 1.0,
107
- "return_dict": true,
108
- "return_dict_in_generate": false,
109
- "scale_embedding": false,
110
- "sep_token_id": null,
111
- "suppress_tokens": null,
112
- "task_specific_params": null,
113
- "temperature": 1.0,
114
- "tf_legacy_loss": false,
115
- "tie_encoder_decoder": false,
116
- "tie_word_embeddings": true,
117
- "tokenizer_class": null,
118
- "top_k": 50,
119
- "top_p": 1.0,
120
- "torchscript": false,
121
- "typical_p": 1.0,
122
- "use_bfloat16": false
123
- },
124
- "audio_end_token_id": 151670,
125
- "audio_start_token_id": 151669,
126
- "audio_token_id": 151676,
127
- "dtype": "bfloat16",
128
- "initializer_range": 0.02,
129
- "text_config": {
130
- "_name_or_path": "",
131
- "add_cross_attention": false,
132
- "architectures": null,
133
- "attention_bias": false,
134
- "attention_dropout": 0.0,
135
- "bad_words_ids": null,
136
- "begin_suppress_tokens": null,
137
- "bos_token_id": null,
138
- "chunk_size_feed_forward": 0,
139
- "cross_attention_hidden_size": null,
140
- "decoder_start_token_id": null,
141
- "diversity_penalty": 0.0,
142
- "do_sample": false,
143
- "dtype": null,
144
- "early_stopping": false,
145
- "encoder_no_repeat_ngram_size": 0,
146
- "eos_token_id": null,
147
- "exponential_decay_length_penalty": null,
148
- "finetuning_task": null,
149
- "forced_bos_token_id": null,
150
- "forced_eos_token_id": null,
151
- "head_dim": 128,
152
- "hidden_act": "silu",
153
- "hidden_size": 2048,
154
- "id2label": {
155
- "0": "LABEL_0",
156
- "1": "LABEL_1"
157
- },
158
- "initializer_range": 0.02,
159
- "intermediate_size": 6144,
160
- "is_decoder": false,
161
- "is_encoder_decoder": false,
162
- "label2id": {
163
- "LABEL_0": 0,
164
- "LABEL_1": 1
165
- },
166
- "length_penalty": 1.0,
167
- "max_length": 20,
168
- "max_position_embeddings": 65536,
169
- "min_length": 0,
170
- "model_type": "qwen3",
171
- "no_repeat_ngram_size": 0,
172
- "num_attention_heads": 16,
173
- "num_beam_groups": 1,
174
- "num_beams": 1,
175
- "num_hidden_layers": 28,
176
- "num_key_value_heads": 8,
177
- "num_return_sequences": 1,
178
- "output_attentions": false,
179
- "output_hidden_states": false,
180
- "output_scores": false,
181
- "pad_token_id": null,
182
- "prefix": null,
183
- "problem_type": null,
184
- "pruned_heads": {},
185
- "remove_invalid_values": false,
186
- "repetition_penalty": 1.0,
187
- "return_dict": true,
188
- "return_dict_in_generate": false,
189
- "rms_norm_eps": 1e-06,
190
- "rope_scaling": {
191
- "interleaved": true,
192
- "mrope_interleaved": true,
193
- "mrope_section": [
194
- 24,
195
- 20,
196
- 20
197
- ],
198
- "rope_type": "default",
199
- "type": "default"
200
- },
201
- "rope_theta": 1000000,
202
- "sep_token_id": null,
203
- "suppress_tokens": null,
204
- "task_specific_params": null,
205
- "temperature": 1.0,
206
- "tf_legacy_loss": false,
207
- "tie_encoder_decoder": false,
208
- "tie_word_embeddings": true,
209
- "tokenizer_class": null,
210
- "top_k": 50,
211
- "top_p": 1.0,
212
- "torchscript": false,
213
- "typical_p": 1.0,
214
- "use_bfloat16": false,
215
- "use_cache": true,
216
- "vocab_size": 151936
217
- }
218
- },
219
- "transformers_version": "4.57.6"
220
- }
221
-
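The `thinker_config` above fixes several shapes that can be cross-checked by arithmetic. The sketch below derives the attention projection widths and the grouped-query attention group size (16 query heads over 8 key/value heads). It also checks two relationships that are assumptions, not statements from this repository. The first is the Qwen2-VL-style M-RoPE convention, in which `mrope_section` partitions the `head_dim / 2` rotary frequency pairs. The second is that the audio tower's `output_dim` is meant to equal the text `hidden_size`.

```python
# Values copied from thinker_config in config.json above.
TEXT_CFG = {
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "mrope_section": [24, 20, 20],
}
AUDIO_CFG = {"d_model": 1024, "encoder_attention_heads": 16, "output_dim": 2048}


def derived_shapes(text_cfg: dict, audio_cfg: dict) -> dict:
    """Derive attention widths and run simple (assumed) consistency checks."""
    heads = text_cfg["num_attention_heads"]
    kv_heads = text_cfg["num_key_value_heads"]
    head_dim = text_cfg["head_dim"]
    return {
        # Output width of q_proj, and of each of k_proj / v_proj.
        "q_width": heads * head_dim,
        "kv_width": kv_heads * head_dim,
        # Grouped-query attention: query heads sharing one key/value head.
        "gqa_group": heads // kv_heads,
        # Assumed M-RoPE convention: sections sum to head_dim / 2.
        "mrope_covers_head": sum(text_cfg["mrope_section"]) == head_dim // 2,
        "audio_head_dim": audio_cfg["d_model"] // audio_cfg["encoder_attention_heads"],
        # Assumption: audio features are projected to the text hidden size.
        "audio_matches_text": audio_cfg["output_dim"] == text_cfg["hidden_size"],
    }


print(derived_shapes(TEXT_CFG, AUDIO_CFG))
```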
 
lora/lora-stage2/generation_config.json DELETED
@@ -1,7 +0,0 @@
1
- {
2
- "_from_model_config": true,
3
- "eos_token_id": [151643,151645],
4
- "pad_token_id": 151643,
5
- "do_sample": false,
6
- "temperature": 0.000001
7
- }
 
lora/lora-stage2/merged_from_lora.txt DELETED
@@ -1 +0,0 @@
1
- /data/haobin/pky_train/qwen3/out_qwen3-asr-lora-0317_550000_wer3_towerb4+proj_2gpu_bs128/checkpoint-1000
 
lora/lora-stage2/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage2/optimizer.pt DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:b26cb33d8c7aefdee4dcd88af58551b5a01e16c9f852a1c1ffb0d1a47e6421b4
3
- size 83695117
 
lora/lora-stage2/preprocessor_config.json DELETED
@@ -1,14 +0,0 @@
1
- {
2
- "chunk_length": 30,
3
- "dither": 0.0,
4
- "feature_extractor_type": "WhisperFeatureExtractor",
5
- "feature_size": 128,
6
- "hop_length": 160,
7
- "n_fft": 400,
8
- "n_samples": 480000,
9
- "nb_max_frames": 3000,
10
- "padding_side": "right",
11
- "padding_value": 0.0,
12
- "processor_class": "Qwen3ASRProcessor",
13
- "return_attention_mask": true
14
- }
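The feature-extractor fields are mutually consistent. Thirty-second chunks give `n_samples = 480000`, and a 160-sample hop gives `nb_max_frames = 3000`. The sampling rate is not stored in this file. The 16 kHz used below is inferred from `n_samples / chunk_length`, so treat it as an assumption. The `num_frames` helper is hypothetical and ignores STFT edge padding.

```python
CHUNK_LENGTH_S = 30   # "chunk_length"
HOP_LENGTH = 160      # "hop_length"
N_SAMPLES = 480000    # "n_samples"

# Sampling rate implied by the config (inferred, not stored in the file).
sampling_rate = N_SAMPLES // CHUNK_LENGTH_S
frame_ms = 1000 * HOP_LENGTH / sampling_rate
nb_max_frames = N_SAMPLES // HOP_LENGTH


def num_frames(duration_s: float) -> int:
    """Approximate number of mel frames for a clip of the given length."""
    return int(duration_s * sampling_rate) // HOP_LENGTH


print(sampling_rate, frame_ms, nb_max_frames)  # 16000 10.0 3000
print(num_frames(5.0))  # 500
```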
 
lora/lora-stage2/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:de015da1ba6a4dc8cf66420b3b9b378bc07585bfb14a0c37fb50e723424b9768
3
- size 14917
 
lora/lora-stage2/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:681f2e7cc7c3d884111a86a3bcdeeaea97b22ebf60e4f765788ee5cbeb94e2d9
3
- size 14917
 
lora/lora-stage2/scheduler.pt DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:2a7077d452a1df5790a83102fc7a743c5150e80f24610df63abd069404ebe93a
3
- size 1465
 
lora/lora-stage2/special_tokens_map.json DELETED
@@ -1,44 +0,0 @@
1
- {
2
- "additional_special_tokens": [
3
- "<|im_start|>",
4
- "<|im_end|>",
5
- "<|object_ref_start|>",
6
- "<|object_ref_end|>",
7
- "<|box_start|>",
8
- "<|box_end|>",
9
- "<|quad_start|>",
10
- "<|quad_end|>",
11
- "<|vision_start|>",
12
- "<|vision_end|>",
13
- "<|vision_pad|>",
14
- "<|image_pad|>",
15
- "<|video_pad|>",
16
- "<|audio_start|>",
17
- "<|audio_end|>",
18
- "<tts_pad>",
19
- "<tts_text_bos>",
20
- "<tts_text_bos_single>",
21
- "<|audio_pad|>"
22
- ],
23
- "audio_bos_token": "<|audio_start|>",
24
- "audio_eos_token": "<|audio_end|>",
25
- "audio_token": "<|audio_pad|>",
26
- "eos_token": {
27
- "content": "<|im_end|>",
28
- "lstrip": false,
29
- "normalized": false,
30
- "rstrip": false,
31
- "single_word": false
32
- },
33
- "image_token": "<|image_pad|>",
34
- "pad_token": {
35
- "content": "<|endoftext|>",
36
- "lstrip": false,
37
- "normalized": false,
38
- "rstrip": false,
39
- "single_word": false
40
- },
41
- "video_token": "<|video_pad|>",
42
- "vision_bos_token": "<|vision_start|>",
43
- "vision_eos_token": "<|vision_end|>"
44
- }
 
lora/lora-stage2/tokenizer.json DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:0499602714160467f2d68b910651d6216020689f1e016be87a2d0019ee3baeab
3
- size 11429499
 
lora/lora-stage2/tokenizer_config.json DELETED
@@ -1,549 +0,0 @@
1
- {
2
- "add_bos_token": false,
3
- "add_prefix_space": false,
4
- "added_tokens_decoder": {
5
- "151643": {
6
- "content": "<|endoftext|>",
7
- "lstrip": false,
8
- "normalized": false,
9
- "rstrip": false,
10
- "single_word": false,
11
- "special": true
12
- },
13
- "151644": {
14
- "content": "<|im_start|>",
15
- "lstrip": false,
16
- "normalized": false,
17
- "rstrip": false,
18
- "single_word": false,
19
- "special": true
20
- },
21
- "151645": {
22
- "content": "<|im_end|>",
23
- "lstrip": false,
24
- "normalized": false,
25
- "rstrip": false,
26
- "single_word": false,
27
- "special": true
28
- },
29
- "151646": {
30
- "content": "<|object_ref_start|>",
31
- "lstrip": false,
32
- "normalized": false,
33
- "rstrip": false,
34
- "single_word": false,
35
- "special": true
36
- },
37
- "151647": {
38
- "content": "<|object_ref_end|>",
39
- "lstrip": false,
40
- "normalized": false,
41
- "rstrip": false,
42
- "single_word": false,
43
- "special": true
44
- },
45
- "151648": {
46
- "content": "<|box_start|>",
47
- "lstrip": false,
48
- "normalized": false,
49
- "rstrip": false,
50
- "single_word": false,
51
- "special": true
52
- },
53
- "151649": {
54
- "content": "<|box_end|>",
55
- "lstrip": false,
56
- "normalized": false,
57
- "rstrip": false,
58
- "single_word": false,
59
- "special": true
60
- },
61
- "151650": {
62
- "content": "<|quad_start|>",
63
- "lstrip": false,
64
- "normalized": false,
65
- "rstrip": false,
66
- "single_word": false,
67
- "special": true
68
- },
69
- "151651": {
70
- "content": "<|quad_end|>",
71
- "lstrip": false,
72
- "normalized": false,
73
- "rstrip": false,
74
- "single_word": false,
75
- "special": true
76
- },
77
- "151652": {
78
- "content": "<|vision_start|>",
79
- "lstrip": false,
80
- "normalized": false,
81
- "rstrip": false,
82
- "single_word": false,
83
- "special": true
84
- },
85
- "151653": {
86
- "content": "<|vision_end|>",
87
- "lstrip": false,
88
- "normalized": false,
89
- "rstrip": false,
90
- "single_word": false,
91
- "special": true
92
- },
93
- "151654": {
94
- "content": "<|vision_pad|>",
95
- "lstrip": false,
96
- "normalized": false,
97
- "rstrip": false,
98
- "single_word": false,
99
- "special": true
100
- },
101
- "151655": {
102
- "content": "<|image_pad|>",
103
- "lstrip": false,
104
- "normalized": false,
105
- "rstrip": false,
106
- "single_word": false,
107
- "special": true
108
- },
109
- "151656": {
110
- "content": "<|video_pad|>",
111
- "lstrip": false,
112
- "normalized": false,
113
- "rstrip": false,
114
- "single_word": false,
115
- "special": true
116
- },
117
- "151657": {
118
- "content": "<tool_call>",
119
- "lstrip": false,
120
- "normalized": false,
121
- "rstrip": false,
122
- "single_word": false,
123
- "special": false
124
- },
125
- "151658": {
126
- "content": "</tool_call>",
127
- "lstrip": false,
128
- "normalized": false,
129
- "rstrip": false,
130
- "single_word": false,
131
- "special": false
132
- },
133
- "151659": {
134
- "content": "<|fim_prefix|>",
135
- "lstrip": false,
136
- "normalized": false,
137
- "rstrip": false,
138
- "single_word": false,
139
- "special": false
140
- },
141
- "151660": {
142
- "content": "<|fim_middle|>",
143
- "lstrip": false,
144
- "normalized": false,
145
- "rstrip": false,
146
- "single_word": false,
147
- "special": false
148
- },
149
- "151661": {
150
- "content": "<|fim_suffix|>",
151
- "lstrip": false,
152
- "normalized": false,
153
- "rstrip": false,
154
- "single_word": false,
155
- "special": false
156
- },
157
- "151662": {
158
- "content": "<|fim_pad|>",
159
- "lstrip": false,
160
- "normalized": false,
161
- "rstrip": false,
162
- "single_word": false,
163
- "special": false
164
- },
165
- "151663": {
166
- "content": "<|repo_name|>",
167
- "lstrip": false,
168
- "normalized": false,
169
- "rstrip": false,
170
- "single_word": false,
171
- "special": false
172
- },
173
- "151664": {
174
- "content": "<|file_sep|>",
175
- "lstrip": false,
176
- "normalized": false,
177
- "rstrip": false,
178
- "single_word": false,
179
- "special": false
180
- },
181
- "151665": {
182
- "content": "<tool_response>",
183
- "lstrip": false,
184
- "normalized": false,
185
- "rstrip": false,
186
- "single_word": false,
187
- "special": false
188
- },
189
- "151666": {
190
- "content": "</tool_response>",
191
- "lstrip": false,
192
- "normalized": false,
193
- "rstrip": false,
194
- "single_word": false,
195
- "special": false
196
- },
197
- "151667": {
198
- "content": "<think>",
199
- "lstrip": false,
200
- "normalized": false,
201
- "rstrip": false,
202
- "single_word": false,
203
- "special": false
204
- },
205
- "151668": {
206
- "content": "</think>",
207
- "lstrip": false,
208
- "normalized": false,
209
- "rstrip": false,
210
- "single_word": false,
211
- "special": false
212
- },
213
- "151669": {
214
- "content": "<|audio_start|>",
215
- "lstrip": false,
216
- "normalized": false,
217
- "rstrip": false,
218
- "single_word": false,
219
- "special": true
220
- },
221
- "151670": {
222
- "content": "<|audio_end|>",
223
- "lstrip": false,
224
- "normalized": false,
225
- "rstrip": false,
226
- "single_word": false,
227
- "special": true
228
- },
229
- "151671": {
230
- "content": "<tts_pad>",
231
- "lstrip": false,
232
- "normalized": false,
233
- "rstrip": false,
234
- "single_word": false,
235
- "special": true
236
- },
237
- "151672": {
238
- "content": "<tts_text_bos>",
239
- "lstrip": false,
240
- "normalized": false,
241
- "rstrip": false,
242
- "single_word": false,
243
- "special": true
244
- },
245
- "151673": {
246
- "content": "<tts_text_eod>",
247
- "lstrip": false,
248
- "normalized": false,
249
- "rstrip": false,
250
- "single_word": false,
251
- "special": true
252
- },
253
- "151674": {
254
- "content": "<tts_text_bos_single>",
255
- "lstrip": false,
256
- "normalized": false,
257
- "rstrip": false,
258
- "single_word": false,
259
- "special": true
260
- },
261
- "151675": {
262
- "content": "<non_speech>",
263
- "lstrip": false,
264
- "normalized": false,
265
- "rstrip": false,
266
- "single_word": false,
267
- "special": false
268
- },
269
- "151676": {
270
- "content": "<|audio_pad|>",
271
- "lstrip": false,
272
- "normalized": false,
273
- "rstrip": false,
274
- "single_word": false,
275
- "special": true
276
- },
277
- "151677": {
278
- "content": "<blank1>",
279
- "lstrip": false,
280
- "normalized": false,
281
- "rstrip": false,
282
- "single_word": false,
283
- "special": true
284
- },
285
- "151678": {
286
- "content": "<blank2>",
287
- "lstrip": false,
288
- "normalized": false,
289
- "rstrip": false,
290
- "single_word": false,
291
- "special": true
292
- },
293
- "151679": {
294
- "content": "<blank3>",
295
- "lstrip": false,
296
- "normalized": false,
297
- "rstrip": false,
298
- "single_word": false,
299
- "special": true
300
- },
301
- "151680": {
302
- "content": "<blank4>",
303
- "lstrip": false,
304
- "normalized": false,
305
- "rstrip": false,
306
- "single_word": false,
307
- "special": true
308
- },
309
- "151681": {
310
- "content": "<blank5>",
311
- "lstrip": false,
312
- "normalized": false,
313
- "rstrip": false,
314
- "single_word": false,
315
- "special": true
316
- },
317
- "151682": {
318
- "content": "<blank6>",
319
- "lstrip": false,
320
- "normalized": false,
321
- "rstrip": false,
322
- "single_word": false,
323
- "special": true
324
- },
325
- "151683": {
326
- "content": "<blank7>",
327
- "lstrip": false,
328
- "normalized": false,
329
- "rstrip": false,
330
- "single_word": false,
331
- "special": true
332
- },
333
- "151684": {
334
- "content": "<blank8>",
335
- "lstrip": false,
336
- "normalized": false,
337
- "rstrip": false,
338
- "single_word": false,
339
- "special": true
340
- },
341
- "151685": {
342
- "content": "<blank9>",
343
- "lstrip": false,
344
- "normalized": false,
345
- "rstrip": false,
346
- "single_word": false,
347
- "special": true
348
- },
349
- "151686": {
350
- "content": "<blank10>",
351
- "lstrip": false,
352
- "normalized": false,
353
- "rstrip": false,
354
- "single_word": false,
355
- "special": true
356
- },
357
- "151687": {
358
- "content": "<blank11>",
359
- "lstrip": false,
360
- "normalized": false,
361
- "rstrip": false,
362
- "single_word": false,
363
- "special": true
364
- },
365
- "151688": {
366
- "content": "<blank12>",
367
- "lstrip": false,
368
- "normalized": false,
369
- "rstrip": false,
370
- "single_word": false,
371
- "special": true
372
- },
373
- "151689": {
374
- "content": "<blank13>",
375
- "lstrip": false,
376
- "normalized": false,
377
- "rstrip": false,
378
- "single_word": false,
379
- "special": true
380
- },
381
- "151690": {
382
- "content": "<blank14>",
383
- "lstrip": false,
384
- "normalized": false,
385
- "rstrip": false,
386
- "single_word": false,
387
- "special": true
388
- },
389
- "151691": {
390
- "content": "<blank15>",
391
- "lstrip": false,
392
- "normalized": false,
393
- "rstrip": false,
394
- "single_word": false,
395
- "special": true
396
- },
397
- "151692": {
398
- "content": "<blank16>",
399
- "lstrip": false,
400
- "normalized": false,
401
- "rstrip": false,
402
- "single_word": false,
403
- "special": true
404
- },
405
- "151693": {
406
- "content": "<blank17>",
407
- "lstrip": false,
408
- "normalized": false,
409
- "rstrip": false,
410
- "single_word": false,
411
- "special": true
412
- },
413
- "151694": {
414
- "content": "<blank18>",
415
- "lstrip": false,
416
- "normalized": false,
417
- "rstrip": false,
418
- "single_word": false,
419
- "special": true
420
- },
421
- "151695": {
422
- "content": "<blank19>",
423
- "lstrip": false,
424
- "normalized": false,
425
- "rstrip": false,
426
- "single_word": false,
427
- "special": true
428
- },
429
- "151696": {
430
- "content": "<blank20>",
431
- "lstrip": false,
432
- "normalized": false,
433
- "rstrip": false,
434
- "single_word": false,
435
- "special": true
436
- },
437
- "151697": {
438
- "content": "<blank21>",
439
- "lstrip": false,
440
- "normalized": false,
441
- "rstrip": false,
442
- "single_word": false,
443
- "special": true
444
- },
445
- "151698": {
446
- "content": "<blank22>",
447
- "lstrip": false,
448
- "normalized": false,
449
- "rstrip": false,
450
- "single_word": false,
451
- "special": true
452
- },
453
- "151699": {
454
- "content": "<blank23>",
455
- "lstrip": false,
456
- "normalized": false,
457
- "rstrip": false,
458
- "single_word": false,
459
- "special": true
460
- },
461
- "151700": {
462
- "content": "<blank24>",
463
- "lstrip": false,
464
- "normalized": false,
465
- "rstrip": false,
466
- "single_word": false,
467
- "special": true
468
- },
469
- "151701": {
470
- "content": "<blank25>",
471
- "lstrip": false,
472
- "normalized": false,
473
- "rstrip": false,
474
- "single_word": false,
475
- "special": true
476
- },
477
- "151702": {
478
- "content": "<blank26>",
479
- "lstrip": false,
480
- "normalized": false,
481
- "rstrip": false,
482
- "single_word": false,
483
- "special": true
484
- },
485
- "151703": {
486
- "content": "<blank27>",
487
- "lstrip": false,
488
- "normalized": false,
489
- "rstrip": false,
490
- "single_word": false,
491
- "special": true
492
- },
493
- "151704": {
494
- "content": "<asr_text>",
495
- "lstrip": false,
496
- "normalized": false,
497
- "rstrip": false,
498
- "single_word": false,
499
- "special": false
500
- }
501
- },
502
- "additional_special_tokens": [
503
- "<|im_start|>",
504
- "<|im_end|>",
505
- "<|object_ref_start|>",
506
- "<|object_ref_end|>",
507
- "<|box_start|>",
508
- "<|box_end|>",
509
- "<|quad_start|>",
510
- "<|quad_end|>",
511
- "<|vision_start|>",
512
- "<|vision_end|>",
513
- "<|vision_pad|>",
514
- "<|image_pad|>",
515
- "<|video_pad|>",
516
- "<|audio_start|>",
517
- "<|audio_end|>",
518
- "<tts_pad>",
519
- "<tts_text_bos>",
520
- "<tts_text_bos_single>",
521
- "<|audio_pad|>"
522
- ],
523
- "audio_bos_token": "<|audio_start|>",
524
- "audio_eos_token": "<|audio_end|>",
525
- "audio_token": "<|audio_pad|>",
526
- "bos_token": null,
527
- "clean_up_tokenization_spaces": false,
528
- "eos_token": "<|im_end|>",
529
- "errors": "replace",
530
- "extra_special_tokens": {
531
- "audio_bos_token": "<|audio_start|>",
532
- "audio_eos_token": "<|audio_end|>",
533
- "audio_token": "<|audio_pad|>",
534
- "image_token": "<|image_pad|>",
535
- "video_token": "<|video_pad|>",
536
- "vision_bos_token": "<|vision_start|>",
537
- "vision_eos_token": "<|vision_end|>"
538
- },
539
- "image_token": "<|image_pad|>",
540
- "model_max_length": 131072,
541
- "pad_token": "<|endoftext|>",
542
- "processor_class": "Qwen3ASRProcessor",
543
- "split_special_tokens": false,
544
- "tokenizer_class": "Qwen2Tokenizer",
545
- "unk_token": null,
546
- "video_token": "<|video_pad|>",
547
- "vision_bos_token": "<|vision_start|>",
548
- "vision_eos_token": "<|vision_end|>"
549
- }
 
lora/lora-stage2/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage2/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
lora/lora-stage3/README.md DELETED
@@ -1,207 +0,0 @@
1
- ---
2
- base_model: /data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged
3
- library_name: peft
4
- pipeline_tag: text-generation
5
- tags:
6
- - base_model:adapter:/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged
7
- - lora
8
- - transformers
9
- ---
10
-
11
- # Model Card for Model ID
12
-
13
- <!-- Provide a quick summary of what the model is/does. -->
14
-
15
-
16
-
17
- ## Model Details
18
-
19
- ### Model Description
20
-
21
- <!-- Provide a longer summary of what this model is. -->
22
-
23
-
24
-
25
- - **Developed by:** [More Information Needed]
26
- - **Funded by [optional]:** [More Information Needed]
27
- - **Shared by [optional]:** [More Information Needed]
28
- - **Model type:** [More Information Needed]
29
- - **Language(s) (NLP):** [More Information Needed]
30
- - **License:** [More Information Needed]
31
- - **Finetuned from model [optional]:** [More Information Needed]
32
-
33
- ### Model Sources [optional]
34
-
35
- <!-- Provide the basic links for the model. -->
36
-
37
- - **Repository:** [More Information Needed]
38
- - **Paper [optional]:** [More Information Needed]
39
- - **Demo [optional]:** [More Information Needed]
40
-
41
- ## Uses
42
-
43
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
-
45
- ### Direct Use
46
-
47
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
-
49
- [More Information Needed]
50
-
51
- ### Downstream Use [optional]
52
-
53
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
-
55
- [More Information Needed]
56
-
57
- ### Out-of-Scope Use
58
-
59
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
-
61
- [More Information Needed]
62
-
63
- ## Bias, Risks, and Limitations
64
-
65
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
-
67
- [More Information Needed]
68
-
69
- ### Recommendations
70
-
71
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
-
73
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
-
75
- ## How to Get Started with the Model
76
-
77
- Use the code below to get started with the model.
78
-
79
- [More Information Needed]
80
-
81
- ## Training Details
82
-
83
- ### Training Data
84
-
85
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
-
87
- [More Information Needed]
88
-
89
- ### Training Procedure
90
-
91
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
-
93
- #### Preprocessing [optional]
94
-
95
- [More Information Needed]
96
-
97
-
98
- #### Training Hyperparameters
99
-
100
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
-
102
- #### Speeds, Sizes, Times [optional]
103
-
104
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
-
106
- [More Information Needed]
107
-
108
- ## Evaluation
109
-
110
- <!-- This section describes the evaluation protocols and provides the results. -->
111
-
112
- ### Testing Data, Factors & Metrics
113
-
114
- #### Testing Data
115
-
116
- <!-- This should link to a Dataset Card if possible. -->
117
-
118
- [More Information Needed]
119
-
120
- #### Factors
121
-
122
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
-
124
- [More Information Needed]
125
-
126
- #### Metrics
127
-
128
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
-
130
- [More Information Needed]
131
-
132
- ### Results
133
-
134
- [More Information Needed]
135
-
136
- #### Summary
137
-
138
-
139
-
140
- ## Model Examination [optional]
141
-
142
- <!-- Relevant interpretability work for the model goes here -->
143
-
144
- [More Information Needed]
145
-
146
- ## Environmental Impact
147
-
148
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
-
150
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
-
152
- - **Hardware Type:** [More Information Needed]
153
- - **Hours used:** [More Information Needed]
154
- - **Cloud Provider:** [More Information Needed]
155
- - **Compute Region:** [More Information Needed]
156
- - **Carbon Emitted:** [More Information Needed]
157
-
158
- ## Technical Specifications [optional]
159
-
160
- ### Model Architecture and Objective
161
-
162
- [More Information Needed]
163
-
164
- ### Compute Infrastructure
165
-
166
- [More Information Needed]
167
-
168
- #### Hardware
169
-
170
- [More Information Needed]
171
-
172
- #### Software
173
-
174
- [More Information Needed]
175
-
176
- ## Citation [optional]
177
-
178
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
-
180
- **BibTeX:**
181
-
182
- [More Information Needed]
183
-
184
- **APA:**
185
-
186
- [More Information Needed]
187
-
188
- ## Glossary [optional]
189
-
190
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
-
192
- [More Information Needed]
193
-
194
- ## More Information [optional]
195
-
196
- [More Information Needed]
197
-
198
- ## Model Card Authors [optional]
199
-
200
- [More Information Needed]
201
-
202
- ## Model Card Contact
203
-
204
- [More Information Needed]
205
- ### Framework versions
206
-
207
- - PEFT 0.18.1
 
lora/lora-stage3/adapter_config.json DELETED
@@ -1,38 +0,0 @@
1
- {
2
- "alora_invocation_tokens": null,
3
- "alpha_pattern": {},
4
- "arrow_config": null,
5
- "auto_mapping": null,
6
- "base_model_name_or_path": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
7
- "bias": "none",
8
- "corda_config": null,
9
- "ensure_weight_tying": false,
10
- "eva_config": null,
11
- "exclude_modules": null,
12
- "fan_in_fan_out": false,
13
- "inference_mode": true,
14
- "init_lora_weights": true,
15
- "layer_replication": null,
16
- "layers_pattern": null,
17
- "layers_to_transform": null,
18
- "loftq_config": {},
19
- "lora_alpha": 32,
20
- "lora_bias": false,
21
- "lora_dropout": 0.05,
22
- "megatron_config": null,
23
- "megatron_core": "megatron.core",
24
- "modules_to_save": [],
25
- "peft_type": "LORA",
26
- "peft_version": "0.18.1",
27
- "qalora_group_size": 16,
28
- "r": 8,
29
- "rank_pattern": {},
30
- "revision": null,
31
- "target_modules": "^(thinker\\.model(?=\\.).*\\.(k_proj|q_proj|o_proj|up_proj|down_proj|v_proj|gate_proj)|(?!(thinker.audio_tower.proj1|thinker.audio_tower.proj2))thinker\\.audio_tower(?=\\.).*\\.(fc1|out_proj|proj1|k_proj|q_proj|fc2|proj2|v_proj|conv_out)|thinker\\.audio_tower\\.proj1(?=\\.)|thinker\\.audio_tower\\.proj2(?=\\.))$",
32
- "target_parameters": null,
33
- "task_type": "CAUSAL_LM",
34
- "trainable_token_indices": null,
35
- "use_dora": false,
36
- "use_qalora": false,
37
- "use_rslora": false
38
- }
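The deleted stage-3 adapter config sets `r` to 8, `lora_alpha` to 32 and `use_rslora` to false. It also gives `target_modules` as a single regex string rather than a list.

The sketch below is illustrative and assumes two things about PEFT's behaviour:
- Standard LoRA scales the low-rank update by `lora_alpha / r`, and rsLoRA scales it by `lora_alpha / sqrt(r)`.
- A string `target_modules` is checked against each full module name with `re.fullmatch`.

The module names passed to `is_targeted` are hypothetical examples. Only `thinker.lm_head`, `thinker.model` and `thinker.audio_tower` appear in the run's own metadata.

```python
import math
import re

# Values copied from lora/lora-stage3/adapter_config.json.
LORA_ALPHA = 32
LORA_R = 8
USE_RSLORA = False

# JSON-decoded "target_modules" string (each JSON "\\." becomes the regex "\.").
TARGET_MODULES = (
    r"^(thinker\.model(?=\.).*\.(k_proj|q_proj|o_proj|up_proj|down_proj|v_proj|gate_proj)"
    r"|(?!(thinker.audio_tower.proj1|thinker.audio_tower.proj2))thinker\.audio_tower(?=\.).*\."
    r"(fc1|out_proj|proj1|k_proj|q_proj|fc2|proj2|v_proj|conv_out)"
    r"|thinker\.audio_tower\.proj1(?=\.)|thinker\.audio_tower\.proj2(?=\.))$"
)


def lora_scaling(alpha: float, r: int, use_rslora: bool = False) -> float:
    """Factor applied to the LoRA update B @ A (alpha/r, or alpha/sqrt(r) for rsLoRA)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def is_targeted(module_name: str) -> bool:
    """Assumed PEFT behaviour: a string target_modules must fullmatch the module name."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


print(lora_scaling(LORA_ALPHA, LORA_R, USE_RSLORA))  # 4.0
# Hypothetical module paths, for illustration only.
for name in (
    "thinker.model.layers.0.self_attn.q_proj",
    "thinker.audio_tower.layers.0.fc1",
    "thinker.lm_head",
):
    print(name, is_targeted(name))
```

With `r=8` and `alpha=32` each adapted layer's update is multiplied by 4. Under the regex, `thinker.lm_head` is not adapted.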
 
lora/lora-stage3/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:1b5507acb5bb51851c4db58504cac3dcc748dbc37210b986e93624bb9ea115b0
3
- size 49395592
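The deleted `adapter_model.safetensors` entry is not the weights themselves but a Git LFS pointer: three `key value` lines giving the spec version, the object id (hash algorithm and digest), and the object size in bytes.

The sketch below is a minimal reader for that layout. It is not git-lfs's own implementation. It shows that the removed stage-3 adapter weighed roughly 47 MiB.

```python
from typing import Dict

# Pointer text copied from the deleted lora/lora-stage3/adapter_model.safetensors.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1b5507acb5bb51851c4db58504cac3dcc748dbc37210b986e93624bb9ea115b0
size 49395592
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each non-empty line on its first space into a key/value pair."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


info = parse_lfs_pointer(POINTER)
algo, _, digest = info["oid"].partition(":")
size_bytes = int(info["size"])
print(algo, size_bytes)  # sha256 49395592
print(f"{size_bytes / 2**20:.1f} MiB")  # 47.1 MiB
```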
 
lora/lora-stage3/additional_config.json DELETED
@@ -1 +0,0 @@
1
- {"lora_dtype": null, "lorap_lr_ratio": null, "lorap_emb_lr": 1e-06}
 
lora/lora-stage3/args.json DELETED
@@ -1,502 +0,0 @@
1
- {
2
- "output_dir": "/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721",
3
- "overwrite_output_dir": false,
4
- "do_train": false,
5
- "do_eval": false,
6
- "do_predict": false,
7
- "eval_strategy": "steps",
8
- "prediction_loss_only": false,
9
- "per_device_train_batch_size": 4,
10
- "per_device_eval_batch_size": 4,
11
- "per_gpu_train_batch_size": null,
12
- "per_gpu_eval_batch_size": null,
13
- "gradient_accumulation_steps": 16,
14
- "eval_accumulation_steps": null,
15
- "eval_delay": 0,
16
- "torch_empty_cache_steps": null,
17
- "learning_rate": 5e-05,
18
- "weight_decay": 0.1,
19
- "adam_beta1": 0.9,
20
- "adam_beta2": 0.95,
21
- "adam_epsilon": 1e-08,
22
- "max_grad_norm": 1.0,
23
- "num_train_epochs": 3.0,
24
- "max_steps": -1,
25
- "lr_scheduler_type": "cosine",
26
- "lr_scheduler_kwargs": null,
27
- "warmup_ratio": 0.03,
28
- "warmup_steps": 0,
29
- "log_level": "passive",
30
- "log_level_replica": "warning",
31
- "log_on_each_node": true,
32
- "logging_dir": "/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721/runs",
33
- "logging_strategy": "steps",
34
- "logging_first_step": true,
35
- "logging_steps": 5,
36
- "logging_nan_inf_filter": true,
37
- "save_strategy": "steps",
38
- "save_steps": 20.0,
39
- "save_total_limit": null,
40
- "save_safetensors": true,
41
- "save_on_each_node": false,
42
- "save_only_model": false,
43
- "restore_callback_states_from_checkpoint": false,
44
- "no_cuda": false,
45
- "use_cpu": false,
46
- "use_mps_device": false,
47
- "seed": 42,
48
- "data_seed": 42,
49
- "jit_mode_eval": false,
50
- "bf16": true,
51
- "fp16": false,
52
- "fp16_opt_level": "O1",
53
- "half_precision_backend": "auto",
54
- "bf16_full_eval": false,
55
- "fp16_full_eval": false,
56
- "tf32": null,
57
- "local_rank": 0,
58
- "ddp_backend": null,
59
- "tpu_num_cores": null,
60
- "tpu_metrics_debug": false,
61
- "debug": null,
62
- "dataloader_drop_last": false,
63
- "eval_steps": 20.0,
64
- "dataloader_num_workers": null,
65
- "dataloader_prefetch_factor": null,
66
- "past_index": -1,
67
- "run_name": "qwen3asr_dapo_reward5_3x8x8_12gen_3GPU",
68
- "disable_tqdm": null,
69
- "remove_unused_columns": false,
70
- "label_names": null,
71
- "load_best_model_at_end": false,
72
- "metric_for_best_model": "loss",
73
- "greater_is_better": false,
74
- "ignore_data_skip": false,
75
- "fsdp": [],
76
- "fsdp_min_num_params": 0,
77
- "fsdp_config": null,
78
- "fsdp_transformer_layer_cls_to_wrap": null,
79
- "accelerator_config": {
80
- "dispatch_batches": false
81
- },
82
- "parallelism_config": null,
83
- "deepspeed": null,
84
- "label_smoothing_factor": 0.0,
85
- "optim": "adamw_torch_fused",
86
- "optim_args": null,
87
- "adafactor": false,
88
- "group_by_length": false,
89
- "length_column_name": "length",
90
- "report_to": [
91
- "wandb"
92
- ],
93
- "project": "huggingface",
94
- "trackio_space_id": "trackio",
95
- "ddp_find_unused_parameters": null,
96
- "ddp_bucket_cap_mb": null,
97
- "ddp_broadcast_buffers": null,
98
- "dataloader_pin_memory": true,
99
- "dataloader_persistent_workers": false,
100
- "skip_memory_metrics": true,
101
- "use_legacy_prediction_loop": false,
102
- "push_to_hub": false,
103
- "resume_from_checkpoint": null,
104
- "hub_model_id": null,
105
- "hub_strategy": "every_save",
106
- "hub_token": null,
107
- "hub_private_repo": null,
108
- "hub_always_push": false,
109
- "hub_revision": null,
110
- "gradient_checkpointing": true,
111
- "gradient_checkpointing_kwargs": null,
112
- "include_inputs_for_metrics": false,
113
- "include_for_metrics": [],
114
- "eval_do_concat_batches": true,
115
- "fp16_backend": "auto",
116
- "push_to_hub_model_id": null,
117
- "push_to_hub_organization": null,
118
- "push_to_hub_token": null,
119
- "mp_parameters": "",
120
- "auto_find_batch_size": false,
121
- "full_determinism": false,
122
- "torchdynamo": null,
123
- "ray_scope": "last",
124
- "ddp_timeout": 18000000,
125
- "torch_compile": false,
126
- "torch_compile_backend": null,
127
- "torch_compile_mode": null,
128
- "include_tokens_per_second": false,
129
- "include_num_input_tokens_seen": false,
130
- "neftune_noise_alpha": null,
131
- "optim_target_modules": null,
132
- "batch_eval_metrics": false,
133
- "eval_on_start": false,
134
- "use_liger_kernel": false,
135
- "liger_kernel_config": null,
136
- "eval_use_gather_object": false,
137
- "average_tokens_across_devices": true,
138
- "sortish_sampler": false,
139
- "predict_with_generate": false,
140
- "generation_max_length": null,
141
- "generation_num_beams": null,
142
- "generation_config": null,
143
- "tuner_backend": "peft",
144
- "vit_gradient_checkpointing": null,
145
- "router_aux_loss_coef": 0.0,
146
- "enable_dft_loss": false,
147
- "enable_channel_loss": false,
148
- "safe_serialization": true,
149
- "max_shard_size": "5GB",
150
- "check_model": true,
151
- "acc_strategy": "token",
152
- "train_dataloader_shuffle": true,
153
- "max_epochs": null,
154
- "aligner_lr": null,
155
- "vit_lr": null,
156
- "use_logits_to_keep": null,
157
- "ds3_gather_for_generation": true,
158
- "resume_only_model": false,
159
- "optimizer": null,
160
- "loss_type": "dapo",
161
- "eval_metric": null,
162
- "callbacks": [],
163
- "early_stop_interval": null,
164
- "eval_use_evalscope": false,
165
- "eval_dataset": [],
166
- "eval_dataset_args": null,
167
- "eval_limit": null,
168
- "eval_generation_config": null,
169
- "extra_eval_args": null,
170
- "tuner_type": "lora",
171
- "use_galore": false,
172
- "galore_target_modules": null,
173
- "galore_rank": 128,
174
- "galore_update_proj_gap": 50,
175
- "galore_scale": 1.0,
176
- "galore_proj_type": "std",
177
- "galore_optim_per_parameter": false,
178
- "galore_with_embedding": false,
179
- "galore_quantization": false,
180
- "galore_proj_quant": false,
181
- "galore_proj_bits": 4,
182
- "galore_proj_group_size": 256,
183
- "galore_cos_threshold": 0.4,
184
- "galore_gamma_proj": 2,
185
- "galore_queue_size": 5,
186
- "lisa_activated_layers": 0,
187
- "lisa_step_interval": 20,
188
- "use_flash_ckpt": false,
189
- "use_ray": false,
190
- "ray_exp_name": null,
191
- "device_groups": null,
192
- "model": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
193
- "model_type": "my_qwen3_asr_rl",
194
- "model_revision": null,
195
- "task_type": "causal_lm",
196
- "torch_dtype": "bfloat16",
197
- "attn_impl": null,
198
- "experts_impl": null,
199
- "new_special_tokens": [],
200
- "num_labels": null,
201
- "problem_type": null,
202
- "rope_scaling": null,
203
- "device_map": null,
204
- "max_memory": {},
205
- "max_model_len": null,
206
- "local_repo_path": null,
207
- "init_strategy": null,
208
- "template": "my_qwen3_asr_rl",
209
- "system": null,
210
- "max_length": 65536,
211
- "truncation_strategy": "delete",
212
- "max_pixels": null,
213
- "agent_template": null,
214
- "norm_bbox": null,
215
- "use_chat_template": true,
216
- "padding_side": "left",
217
- "padding_free": false,
218
- "loss_scale": "last_round",
219
- "sequence_parallel_size": 1,
220
- "template_backend": "swift",
221
- "response_prefix": null,
222
- "enable_thinking": null,
223
- "add_non_thinking_prefix": true,
224
- "dataset": [
225
- "/data/haobin/batch_process/lora_0323_10w+55w+error+syn_with_domain_train90_targeted_rl_train90_loramerged_basewer_271.jsonl"
226
- ],
227
- "val_dataset": [
228
- "/data/haobin/batch_process/lora_0323_10w+55w+error+syn_with_domain_train90_targeted_rl_val5_sample5p.jsonl"
229
- ],
230
- "cached_dataset": [],
231
- "cached_val_dataset": [],
232
- "split_dataset_ratio": 0.0,
233
- "dataset_num_proc": 1,
234
- "load_from_cache_file": false,
235
- "dataset_shuffle": true,
236
- "val_dataset_shuffle": false,
237
- "streaming": false,
238
- "interleave_prob": null,
239
- "stopping_strategy": "first_exhausted",
240
- "shuffle_buffer_size": 1000,
241
- "download_mode": "reuse_dataset_if_exists",
242
- "columns": {},
243
- "strict": false,
244
- "model_name": null,
245
- "model_author": null,
246
- "custom_dataset_info": [],
247
- "quant_method": null,
248
- "quant_bits": null,
249
- "hqq_axis": null,
250
- "bnb_4bit_compute_dtype": "bfloat16",
251
- "bnb_4bit_quant_type": "nf4",
252
- "bnb_4bit_use_double_quant": true,
253
- "bnb_4bit_quant_storage": null,
254
- "max_new_tokens": 256,
255
- "temperature": 0.5,
256
- "top_k": 50,
257
- "top_p": 0.95,
258
- "repetition_penalty": 1.08,
259
- "num_beams": 1,
260
- "stream": false,
261
- "stop_words": [],
262
- "logprobs": false,
263
- "top_logprobs": null,
264
- "structured_outputs_regex": null,
265
- "train_type": "lora",
266
- "adapters": [],
267
- "external_plugins": [
268
- "/data/haobin/pky_train/qwen3_swift/my_qwen3_asr_dapo_register.py",
269
- "/data/haobin/pky_train/qwen3_swift/qwen3_RL_reward5.py"
270
- ],
271
- "custom_register_path": [],
272
- "model_kwargs": {},
273
- "load_args": false,
274
- "load_data_args": false,
275
- "packing": false,
276
- "packing_length": null,
277
- "packing_num_proc": 1,
278
- "lazy_tokenize": true,
279
- "use_hf": false,
280
- "ignore_args_error": false,
281
- "use_swift_lora": false,
282
- "freeze_parameters": [],
283
- "freeze_parameters_regex": null,
284
- "freeze_parameters_ratio": 0.0,
285
- "trainable_parameters": [],
286
- "trainable_parameters_regex": null,
287
- "freeze_llm": false,
288
- "freeze_vit": false,
289
- "freeze_aligner": false,
290
- "target_modules": [
291
- "all-linear"
292
- ],
293
- "target_regex": null,
294
- "target_parameters": null,
295
- "modules_to_save": [],
296
- "lora_rank": 8,
297
- "lora_alpha": 32,
298
- "lora_dropout": 0.05,
299
- "lora_bias": "none",
300
- "lora_dtype": null,
301
- "lorap_lr_ratio": null,
302
- "use_rslora": false,
303
- "use_dora": false,
304
- "lora_ga_batch_size": 2,
305
- "lora_ga_iters": 2,
306
- "lora_ga_max_length": 1024,
307
- "lora_ga_direction": "ArB2r",
308
- "lora_ga_scale": "stable",
309
- "lora_ga_stable_gamma": 16,
310
- "init_weights": true,
311
- "fourier_n_frequency": 2000,
312
- "fourier_scaling": 300.0,
313
- "boft_block_size": 4,
314
- "boft_block_num": 0,
315
- "boft_n_butterfly_factor": 1,
316
- "boft_dropout": 0.0,
317
- "vera_rank": 256,
318
- "vera_projection_prng_key": 0,
319
- "vera_dropout": 0.0,
320
- "vera_d_initial": 0.1,
321
- "adapter_act": "gelu",
322
- "adapter_length": 128,
323
- "adalora_target_r": 8,
324
- "adalora_init_r": 12,
325
- "adalora_tinit": 0,
326
- "adalora_tfinal": 0,
327
- "adalora_deltaT": 1,
328
- "adalora_beta1": 0.85,
329
- "adalora_beta2": 0.85,
330
- "adalora_orth_reg_weight": 0.5,
331
- "llamapro_num_new_blocks": 4,
332
- "llamapro_num_groups": null,
333
- "reft_layer_key": null,
334
- "reft_layers": null,
335
- "reft_rank": 4,
336
- "reft_intervention_type": "LoreftIntervention",
337
- "reft_args": null,
338
- "swanlab_token": null,
339
- "swanlab_project": "ms-swift",
340
- "swanlab_workspace": null,
341
- "swanlab_exp_name": null,
342
- "swanlab_notification_method": null,
343
- "swanlab_webhook_url": null,
344
- "swanlab_secret": null,
345
- "swanlab_sender_email": null,
346
- "swanlab_receiver_email": null,
347
- "swanlab_smtp_server": null,
348
- "swanlab_smtp_port": null,
349
- "swanlab_email_language": "zh",
350
- "swanlab_mode": "cloud",
351
- "add_version": true,
352
- "create_checkpoint_symlink": false,
353
- "zero_hpz_partition_size": null,
354
- "deepspeed_autotp_size": null,
355
- "reward_model": null,
356
- "reward_adapters": [],
357
- "reward_model_type": null,
358
- "reward_model_revision": null,
359
- "num_ppo_epochs": 4,
360
- "whiten_rewards": false,
361
- "kl_coef": 0.05,
362
- "cliprange": 0.2,
363
- "vf_coef": 0.1,
364
- "cliprange_value": 0.2,
365
- "gamma": 1.0,
366
- "lam": 0.95,
367
- "num_mini_batches": 1,
368
- "local_rollout_forward_batch_size": 64,
369
- "num_sample_generations": 10,
370
- "response_length": 256,
371
- "missing_eos_penalty": null,
372
- "vllm_gpu_memory_utilization": 0.9,
373
- "vllm_tensor_parallel_size": 1,
374
- "vllm_pipeline_parallel_size": 1,
375
- "vllm_enable_expert_parallel": false,
376
- "vllm_max_num_seqs": null,
377
- "vllm_max_model_len": null,
378
- "vllm_disable_custom_all_reduce": true,
379
- "vllm_enforce_eager": false,
380
- "vllm_limit_mm_per_prompt": null,
381
- "vllm_max_lora_rank": 16,
382
- "vllm_enable_prefix_caching": true,
383
- "vllm_use_async_engine": null,
384
- "vllm_quantization": null,
385
- "vllm_reasoning_parser": null,
386
- "vllm_disable_cascade_attn": false,
387
- "vllm_mm_processor_cache_gb": null,
388
- "vllm_speculative_config": null,
389
- "vllm_engine_kwargs": {},
390
- "vllm_data_parallel_size": 1,
391
- "use_vllm": false,
392
- "vllm_mode": null,
393
- "vllm_enable_lora": false,
394
- "vllm_server_base_url": null,
395
- "vllm_server_host": null,
396
- "vllm_server_port": [
397
- 8000
398
- ],
399
- "vllm_server_timeout": 240.0,
400
- "vllm_server_group_port": null,
401
- "enable_flattened_weight_sync": true,
402
- "async_generate": false,
403
- "sleep_level": 0,
404
- "move_model_batches": null,
405
- "offload_optimizer": false,
406
- "offload_model": false,
407
- "wandb_log_unique_prompts": null,
408
- "epsilon": 0.2,
409
- "epsilon_high": 0.28,
410
- "delta": null,
411
- "cosine_min_len_value_wrong": -0.5,
412
- "cosine_max_len_value_wrong": 0.0,
413
- "cosine_min_len_value_correct": 1.0,
414
- "cosine_max_len_value_correct": 0.5,
415
- "cosine_max_len": null,
416
- "repetition_n_grams": 3,
417
- "repetition_max_penalty": -1.0,
418
- "reward_model_plugin": null,
419
- "chord_sft_dataset": [],
420
- "chord_sft_per_device_train_batch_size": null,
421
- "chord_enable_phi_function": false,
422
- "chord_mu_warmup_steps": null,
423
- "chord_mu_decay_steps": null,
424
- "chord_mu_peak": null,
425
- "chord_mu_valley": null,
426
- "sync_ref_model": false,
427
- "ref_model_sync_steps": 512,
428
- "ref_model_mixup_alpha": 0.6,
429
- "multi_turn_scheduler": null,
430
- "max_turns": null,
431
- "completion_length_limit_scope": "per_round",
432
- "vllm_server_pass_dataset": false,
433
- "dynamic_sample": true,
434
- "max_resample_times": 4,
435
- "overlong_filter": true,
436
- "soft_max_length": null,
437
- "soft_cache_length": null,
438
- "scale_rewards": "group",
439
- "log_entropy": false,
440
- "top_entropy_quantile": 1.0,
441
- "importance_sampling_level": "token",
442
- "tau_pos": 1.0,
443
- "tau_neg": 1.05,
444
- "advantage_estimator": "grpo",
445
- "kl_in_reward": false,
446
- "generation_batch_size": 48,
447
- "steps_per_generation": null,
448
- "num_generations_eval": 4,
449
- "rollout_importance_sampling_mode": null,
450
- "rollout_importance_sampling_threshold": 2.0,
451
- "log_rollout_offpolicy_metrics": false,
452
- "off_policy_sequence_mask_delta": null,
453
- "num_generations": 12,
454
- "reward_funcs": [
455
- "asr_wer_hallu_len_v5"
456
- ],
457
- "reward_weights": null,
458
- "log_completions": true,
459
- "num_iterations": 2,
460
- "teacher_model": null,
461
- "teacher_adapters": [],
462
- "teacher_model_type": null,
463
- "teacher_model_revision": null,
464
- "teacher_deepspeed": null,
465
- "teacher_model_server": null,
466
- "rlhf_type": "grpo",
467
- "ref_model": null,
468
- "ref_adapters": [],
469
- "ref_model_type": null,
470
- "ref_model_revision": null,
471
- "beta": 0.04,
472
- "label_smoothing": 0,
473
- "max_completion_length": 256,
474
- "rpo_alpha": null,
475
- "ld_alpha": null,
476
- "discopop_tau": 0.05,
477
- "loss_weights": null,
478
- "cpo_alpha": 1.0,
479
- "simpo_gamma": 1,
480
- "desirable_weight": 1.0,
481
- "undesirable_weight": 1.0,
482
- "center_rewards_coefficient": null,
483
- "sft_alpha": 0,
484
- "lmbda": 0.5,
485
- "seq_kd": false,
486
- "gkd_logits_topk": null,
487
- "offload_teacher_model": false,
488
- "swift_version": "4.0.3",
489
- "ckpt_dir": null,
490
- "rank": 0,
491
- "global_world_size": 3,
492
- "local_world_size": 3,
493
- "model_suffix": "Qwen3-ASR-1.7B-lora-merged",
494
- "model_info": "ModelInfo(model_type='my_qwen3_asr_rl', model_dir='/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged', torch_dtype=torch.bfloat16, max_model_len=65536, quant_method=None, quant_bits=None, rope_scaling={'interleaved': True, 'mrope_interleaved': True, 'mrope_section': [24, 20, 20], 'rope_type': 'default', 'type': 'default'}, is_moe_model=False, is_multimodal=True, config=None, task_type='causal_lm', num_labels=None)",
495
- "model_meta": "ModelMeta(model_type='my_qwen3_asr_rl', model_groups=[ModelGroup(models=[Model(ms_model_id='Qwen/Qwen3-ASR-0.6B', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-ASR-1.7B', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None)], template=None, ignore_patterns=None, requires=None, tags=[])], loader=<class 'my_qwen3_asr_dapo_register.Qwen3ASRRLLoader'>, template='my_qwen3_asr_rl', model_arch=MultiModelKeys(arch_name='my_qwen3_asr_rl', embedding=None, module_list=None, lm_head=None, q_proj=None, k_proj=None, v_proj=None, o_proj=None, attention=None, mlp=None, down_proj=None, qkv_proj=None, qk_proj=None, qa_proj=None, qb_proj=None, kv_proj=None, kva_proj=None, kvb_proj=None, language_model=['thinker.model', 'thinker.lm_head'], aligner=['thinker.audio_tower.proj1', 'thinker.audio_tower.proj2'], vision_tower=['thinker.audio_tower'], generator=[]), architectures=['Qwen3ASRForConditionalGeneration'], additional_saved_files=['generation_config.json', 'preprocessor_config.json', 'processor_config.json', 'tokenizer_config.json', 'tokenizer.json', 'special_tokens_map.json', 'chat_template.json', 'merges.txt', 'vocab.json'], torch_dtype=None, is_multimodal=True, is_reward=False, task_type=None, ignore_patterns=None, requires=['transformers>=4.57', 'qwen-asr', 'librosa'], tags=['audio'])",
496
- "model_dir": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
497
- "template_meta": "TemplateMeta(template_type='my_qwen3_asr_rl', prefix=[], prompt=['{{QUERY}}'], chat_sep=[], suffix=[''], template_cls=<class 'my_qwen3_asr_dapo_register.Qwen3ASRRLTemplate'>, system_prefix=[], default_system=None, auto_add_bos=False, stop_words=[], agent_template='react_en', is_thinking=False, thinking_prefix='', non_thinking_prefix='', history_thinking_prefix='')",
498
- "_val_dataset_exists": true,
499
- "hub": "<class 'swift.hub.hub.MSHub'>",
500
- "evaluation_strategy": "steps",
501
- "training_args": "GRPOConfig(output_dir='/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=4, per_device_eval_batch_size=4, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=16, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=5e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs=None, warmup_ratio=0.03, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=5, logging_nan_inf_filter=True, save_strategy=<SaveStrategy.STEPS: 'steps'>, save_steps=20, save_total_limit=None, save_safetensors=True, save_on_each_node=False, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=True, eval_steps=20, dataloader_num_workers=1, dataloader_prefetch_factor=2, past_index=-1, run_name='qwen3asr_dapo_reward5_3x8x8_12gen_3GPU', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, 
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False), parallelism_config=None, deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH_FUSED: 'adamw_torch_fused'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['wandb'], project='huggingface', trackio_space_id='trackio', ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=[], eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=18000000, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=None, include_num_input_tokens_seen=None, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=None, model_init_kwargs=None, disable_dropout=False, cast_lm_head_to_fp32=False, num_generations=12, num_generations_eval=4, max_completion_length=256, ds3_gather_for_generation=True, shuffle_dataset=True, 
generation_batch_size=48, steps_per_generation=4, temperature=0.5, top_p=0.95, top_k=50, min_p=None, generation_kwargs=None, chat_template_kwargs=None, repetition_penalty=1.08, use_transformers_paged=False, cache_implementation=None, use_vllm=False, vllm_mode=None, vllm_model_impl='vllm', vllm_enable_sleep_mode=False, vllm_structured_outputs_regex=None, vllm_server_base_url=None, vllm_server_host=None, vllm_server_port=[8000], vllm_server_timeout=240.0, vllm_group_port=51216, vllm_gpu_memory_utilization=0.9, vllm_max_model_length=None, vllm_tensor_parallel_size=1, beta=0.04, num_iterations=2, epsilon=0.2, delta=None, epsilon_high=0.28, sapo_temperature_neg=1.05, sapo_temperature_pos=1.0, importance_sampling_level='token', reward_weights=None, multi_objective_aggregation='sum_then_normalize', scale_rewards='group', loss_type='dapo', mask_truncated_completions=False, sync_ref_model=False, ref_model_mixup_alpha=0.6, ref_model_sync_steps=512, top_entropy_quantile=1.0, max_tool_calling_iterations=None, vllm_importance_sampling_correction=True, vllm_importance_sampling_mode='sequence_mask', vllm_importance_sampling_cap=3.0, off_policy_mask_threshold=None, use_bias_correction_kl=False, log_completions=True, num_completions_to_print=None, log_unique_prompts=False, log_completions_hub_repo=None, tuner_backend='peft', vit_gradient_checkpointing=True, router_aux_loss_coef=0.0, enable_dft_loss=False, enable_channel_loss=False, safe_serialization=True, max_shard_size='5GB', check_model=True, acc_strategy='token', train_dataloader_shuffle=True, max_epochs=None, aligner_lr=None, vit_lr=None, use_logits_to_keep=None, resume_only_model=False, optimizer=None, eval_metric=None, callbacks=[], early_stop_interval=None, eval_use_evalscope=False, eval_dataset=[], eval_dataset_args=None, eval_limit=None, eval_generation_config=None, extra_eval_args=None, tuner_type='lora', use_galore=False, galore_target_modules=None, galore_rank=128, galore_update_proj_gap=50, galore_scale=1.0, 
galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, galore_quantization=False, galore_proj_quant=False, galore_proj_bits=4, galore_proj_group_size=256, galore_cos_threshold=0.4, galore_gamma_proj=2, galore_queue_size=5, lisa_activated_layers=0, lisa_step_interval=20, use_flash_ckpt=False, vllm_pipeline_parallel_size=1, vllm_enable_expert_parallel=False, vllm_max_num_seqs=None, vllm_max_model_len=None, vllm_disable_custom_all_reduce=True, vllm_enforce_eager=False, vllm_limit_mm_per_prompt=None, vllm_max_lora_rank=16, vllm_enable_prefix_caching=True, vllm_use_async_engine=None, vllm_quantization=None, vllm_reasoning_parser=None, vllm_disable_cascade_attn=False, vllm_mm_processor_cache_gb=None, vllm_speculative_config=None, vllm_engine_kwargs={}, vllm_data_parallel_size=1, stop_words=[], vllm_enable_lora=False, lora_rank=8, vllm_server_group_port=None, enable_flattened_weight_sync=True, async_generate=False, structured_outputs_regex=None, sleep_level=0, move_model_batches=None, offload_optimizer=False, offload_model=False, wandb_log_unique_prompts=None, cosine_min_len_value_wrong=-0.5, cosine_max_len_value_wrong=0.0, cosine_min_len_value_correct=1.0, cosine_max_len_value_correct=0.5, cosine_max_len=256, repetition_n_grams=3, repetition_max_penalty=-1.0, reward_model=None, reward_model_plugin=None, chord_sft_dataset=[], chord_sft_per_device_train_batch_size=None, chord_enable_phi_function=False, chord_mu_warmup_steps=None, chord_mu_decay_steps=None, chord_mu_peak=None, chord_mu_valley=None, multi_turn_scheduler=None, max_turns=None, completion_length_limit_scope='per_round', vllm_server_pass_dataset=False, dynamic_sample=True, max_resample_times=4, overlong_filter=True, soft_max_length=None, soft_cache_length=None, log_entropy=False, tau_pos=1.0, tau_neg=1.05, advantage_estimator='grpo', kl_in_reward=False, dataset_shuffle=True, rollout_importance_sampling_mode=None, rollout_importance_sampling_threshold=2.0, 
log_rollout_offpolicy_metrics=False, off_policy_sequence_mask_delta=None)"
- }
 
lora/lora-stage3/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:226e26b176c37ed7c792adfd2fe4136f95d2fdb572f9bb695f787161e8da0faa
- size 99183201
 
lora/lora-stage3/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e6aa29f654dcff45f4d494e85fba95c300e2ba77360edeca5a3899f79909e7ce
- size 14725
 
lora/lora-stage3/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:47db367bb33a2abe8e3e662eec69e0be4925b4a0a64b5b6c12647bc9faa62ad2
- size 14661
 
lora/lora-stage3/rng_state_2.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2faaa8f2708f53af7418ce42a1a06c28bcd4f75dce65c528b4f754d02132f5c0
- size 14661