Remove staged LoRA folder
This view is limited to 50 files because it contains too many changes.
- lora/lora-stage1/README.md +0 -207
- lora/lora-stage1/adapter_config.json +0 -38
- lora/lora-stage1/adapter_model.safetensors +0 -3
- lora/lora-stage1/added_tokens.json +0 -64
- lora/lora-stage1/base_model.txt +0 -1
- lora/lora-stage1/chat_template.jinja +0 -31
- lora/lora-stage1/chat_template.json +0 -1
- lora/lora-stage1/config.json +0 -221
- lora/lora-stage1/generation_config.json +0 -7
- lora/lora-stage1/merges.txt +0 -0
- lora/lora-stage1/optimizer.pt +0 -3
- lora/lora-stage1/preprocessor_config.json +0 -14
- lora/lora-stage1/rng_state_0.pth +0 -3
- lora/lora-stage1/rng_state_1.pth +0 -3
- lora/lora-stage1/scheduler.pt +0 -3
- lora/lora-stage1/special_tokens_map.json +0 -44
- lora/lora-stage1/tokenizer.json +0 -3
- lora/lora-stage1/tokenizer_config.json +0 -549
- lora/lora-stage1/trainer_state.json +0 -774
- lora/lora-stage1/vocab.json +0 -0
- lora/lora-stage2/README.md +0 -207
- lora/lora-stage2/adapter_config.json +0 -38
- lora/lora-stage2/adapter_model.safetensors +0 -3
- lora/lora-stage2/added_tokens.json +0 -64
- lora/lora-stage2/base_model.txt +0 -1
- lora/lora-stage2/chat_template.jinja +0 -31
- lora/lora-stage2/chat_template.json +0 -1
- lora/lora-stage2/config.json +0 -221
- lora/lora-stage2/generation_config.json +0 -7
- lora/lora-stage2/merged_from_lora.txt +0 -1
- lora/lora-stage2/merges.txt +0 -0
- lora/lora-stage2/optimizer.pt +0 -3
- lora/lora-stage2/preprocessor_config.json +0 -14
- lora/lora-stage2/rng_state_0.pth +0 -3
- lora/lora-stage2/rng_state_1.pth +0 -3
- lora/lora-stage2/scheduler.pt +0 -3
- lora/lora-stage2/special_tokens_map.json +0 -44
- lora/lora-stage2/tokenizer.json +0 -3
- lora/lora-stage2/tokenizer_config.json +0 -549
- lora/lora-stage2/trainer_state.json +0 -0
- lora/lora-stage2/vocab.json +0 -0
- lora/lora-stage3/README.md +0 -207
- lora/lora-stage3/adapter_config.json +0 -38
- lora/lora-stage3/adapter_model.safetensors +0 -3
- lora/lora-stage3/additional_config.json +0 -1
- lora/lora-stage3/args.json +0 -502
- lora/lora-stage3/optimizer.pt +0 -3
- lora/lora-stage3/rng_state_0.pth +0 -3
- lora/lora-stage3/rng_state_1.pth +0 -3
- lora/lora-stage3/rng_state_2.pth +0 -3
lora/lora-stage1/README.md
DELETED
@@ -1,207 +0,0 @@
----
-base_model: ''
-library_name: peft
-pipeline_tag: text-generation
-tags:
-- 'base_model:adapter:'
-- lora
-- transformers
----
-
-# Model Card for Model ID
-
-<!-- Provide a quick summary of what the model is/does. -->
-
-
-
-## Model Details
-
-### Model Description
-
-<!-- Provide a longer summary of what this model is. -->
-
-
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-
-### Model Sources [optional]
-
-<!-- Provide the basic links for the model. -->
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-
-## Uses
-
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
-### Direct Use
-
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
-[More Information Needed]
-
-### Downstream Use [optional]
-
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
-[More Information Needed]
-
-### Out-of-Scope Use
-
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
-[More Information Needed]
-
-## Bias, Risks, and Limitations
-
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-[More Information Needed]
-
-### Recommendations
-
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-[More Information Needed]
-
-## Training Details
-
-### Training Data
-
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
-
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-#### Preprocessing [optional]
-
-[More Information Needed]
-
-
-#### Training Hyperparameters
-
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-[More Information Needed]
-
-## Evaluation
-
-<!-- This section describes the evaluation protocols and provides the results. -->
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
-
-<!-- This should link to a Dataset Card if possible. -->
-
-[More Information Needed]
-
-#### Factors
-
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
-
-#### Metrics
-
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
-[More Information Needed]
-
-### Results
-
-[More Information Needed]
-
-#### Summary
-
-
-
-## Model Examination [optional]
-
-<!-- Relevant interpretability work for the model goes here -->
-
-[More Information Needed]
-
-## Environmental Impact
-
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-
-## Technical Specifications [optional]
-
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed]
-
-#### Software
-
-[More Information Needed]
-
-## Citation [optional]
-
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
-**BibTeX:**
-
-[More Information Needed]
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
-### Framework versions
-
-- PEFT 0.18.1
lora/lora-stage1/adapter_config.json
DELETED
@@ -1,38 +0,0 @@
-{
-"alora_invocation_tokens": null,
-"alpha_pattern": {},
-"arrow_config": null,
-"auto_mapping": null,
-"base_model_name_or_path": "",
-"bias": "none",
-"corda_config": null,
-"ensure_weight_tying": false,
-"eva_config": null,
-"exclude_modules": null,
-"fan_in_fan_out": false,
-"inference_mode": true,
-"init_lora_weights": true,
-"layer_replication": null,
-"layers_pattern": null,
-"layers_to_transform": null,
-"loftq_config": {},
-"lora_alpha": 16,
-"lora_bias": false,
-"lora_dropout": 0.05,
-"megatron_config": null,
-"megatron_core": "megatron.core",
-"modules_to_save": null,
-"peft_type": "LORA",
-"peft_version": "0.18.1",
-"qalora_group_size": 16,
-"r": 8,
-"rank_pattern": {},
-"revision": null,
-"target_modules": "^(audio_tower\\.(conv_out|proj1|proj2)$|audio_tower\\.layers\\.(20|21|22|23)\\..*\\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$)",
-"target_parameters": null,
-"task_type": "CAUSAL_LM",
-"trainable_token_indices": null,
-"use_dora": false,
-"use_qalora": false,
-"use_rslora": false
-}
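The `target_modules` value in the deleted `adapter_config.json` is a single regular expression, not a list of names. The sketch below is illustrative only: it applies that pattern with `re.fullmatch` (which is how PEFT is understood to treat a string `target_modules`; treat that as an assumption) to a few hypothetical module names that are not taken from the actual model. It also computes the usual LoRA scaling factor `lora_alpha / r` from the values in this file, which applies when `use_rslora` is false.

```python
import re

# Pattern copied from adapter_config.json; the JSON escape "\\." is "\." here.
TARGET_MODULES = (
    r"^(audio_tower\.(conv_out|proj1|proj2)$"
    r"|audio_tower\.layers\.(20|21|22|23)\..*\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$)"
)


def is_targeted(module_name: str) -> bool:
    """Return True if the whole module name matches the target pattern."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Hypothetical module names, chosen only to exercise the pattern.
CANDIDATES = [
    "audio_tower.conv_out",
    "audio_tower.layers.23.self_attn.q_proj",
    "audio_tower.layers.19.self_attn.q_proj",
    "audio_tower.layers.23.fc1",
]

for name in CANDIDATES:
    print(f"{name}: {is_targeted(name)}")

# Standard LoRA scaling (use_rslora is false in this config).
LORA_ALPHA = 16
R = 8
scaling = LORA_ALPHA / R
print("scaling:", scaling)
```

Under these made-up names, only encoder layers 20 to 23 are selected, and a name such as `audio_tower.layers.23.fc1` is not matched, because the pattern expects one more `.`-separated component between the layer index and the final module name. Whether that matters depends on the real module names, which this diff does not show.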
lora/lora-stage1/adapter_model.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:31052a993cbb582a250886db7dfcc327ab86ee8adc5229882bd48227b892c752
-size 1496072
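The three removed lines for `adapter_model.safetensors` are a Git LFS pointer, not the weights themselves. As a minimal sketch, the hypothetical helper `parse_lfs_pointer` below splits such a pointer into its fields. It assumes the simple `key value` line layout seen in this diff and does no validation beyond that.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the deleted adapter_model.safetensors entry.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:31052a993cbb582a250886db7dfcc327ab86ee8adc5229882bd48227b892c752
size 1496072
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The same layout appears in the pointers for `optimizer.pt`, `rng_state_*.pth`, `scheduler.pt`, and `tokenizer.json`, so the helper would apply to those entries too.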
lora/lora-stage1/added_tokens.json
DELETED
@@ -1,64 +0,0 @@
-{
-"</think>": 151668,
-"</tool_call>": 151658,
-"</tool_response>": 151666,
-"<asr_text>": 151704,
-"<blank10>": 151686,
-"<blank11>": 151687,
-"<blank12>": 151688,
-"<blank13>": 151689,
-"<blank14>": 151690,
-"<blank15>": 151691,
-"<blank16>": 151692,
-"<blank17>": 151693,
-"<blank18>": 151694,
-"<blank19>": 151695,
-"<blank1>": 151677,
-"<blank20>": 151696,
-"<blank21>": 151697,
-"<blank22>": 151698,
-"<blank23>": 151699,
-"<blank24>": 151700,
-"<blank25>": 151701,
-"<blank26>": 151702,
-"<blank27>": 151703,
-"<blank2>": 151678,
-"<blank3>": 151679,
-"<blank4>": 151680,
-"<blank5>": 151681,
-"<blank6>": 151682,
-"<blank7>": 151683,
-"<blank8>": 151684,
-"<blank9>": 151685,
-"<non_speech>": 151675,
-"<think>": 151667,
-"<tool_call>": 151657,
-"<tool_response>": 151665,
-"<tts_pad>": 151671,
-"<tts_text_bos>": 151672,
-"<tts_text_bos_single>": 151674,
-"<tts_text_eod>": 151673,
-"<|audio_end|>": 151670,
-"<|audio_pad|>": 151676,
-"<|audio_start|>": 151669,
-"<|box_end|>": 151649,
-"<|box_start|>": 151648,
-"<|endoftext|>": 151643,
-"<|file_sep|>": 151664,
-"<|fim_middle|>": 151660,
-"<|fim_pad|>": 151662,
-"<|fim_prefix|>": 151659,
-"<|fim_suffix|>": 151661,
-"<|im_end|>": 151645,
-"<|im_start|>": 151644,
-"<|image_pad|>": 151655,
-"<|object_ref_end|>": 151647,
-"<|object_ref_start|>": 151646,
-"<|quad_end|>": 151651,
-"<|quad_start|>": 151650,
-"<|repo_name|>": 151663,
-"<|video_pad|>": 151656,
-"<|vision_end|>": 151653,
-"<|vision_pad|>": 151654,
-"<|vision_start|>": 151652
-}
lora/lora-stage1/base_model.txt
DELETED
@@ -1 +0,0 @@
-/data/haobin/pky_train/qwen3/Qwen3-ASR-1.7B
lora/lora-stage1/chat_template.jinja
DELETED
@@ -1,31 +0,0 @@
-{%- set ns = namespace(system_text="") -%}
-{%- for m in messages -%}
-{%- if m.role == 'system' -%}
-{%- if m.content is string -%}
-{%- set ns.system_text = ns.system_text + m.content -%}
-{%- else -%}
-{%- for c in m.content -%}
-{%- if c.type == 'text' and (c.text is defined) -%}
-{%- set ns.system_text = ns.system_text + c.text -%}
-{%- endif -%}
-{%- endfor -%}
-{%- endif -%}
-{%- endif -%}
-{%- endfor -%}
-
-{%- set ns2 = namespace(audio_tokens="") -%}
-{%- for m in messages -%}
-{%- if m.content is not string -%}
-{%- for c in m.content -%}
-{%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}
-{%- set ns2.audio_tokens = ns2.audio_tokens + "<|audio_start|><|audio_pad|><|audio_end|>" -%}
-{%- endif -%}
-{%- endfor -%}
-{%- endif -%}
-{%- endfor -%}
-
-{{- '<|im_start|>system\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\n' -}}
-{{- '<|im_start|>user\n' + ns2.audio_tokens + '<|im_end|>\n' -}}
-{%- if add_generation_prompt -%}
-{{- '<|im_start|>assistant\n' -}}
-{%- endif -%}
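To make the behaviour of the deleted `chat_template.jinja` easier to follow, here is a rough plain-Python equivalent. It is a sketch that mirrors the template's logic as read from the diff, not the rendering code Transformers actually uses. The function name `render_prompt` and the sample messages are assumptions made for illustration.

```python
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_prompt(messages, add_generation_prompt=False):
    """Mirror the template: gather system text and one placeholder per audio item."""
    system_text = ""
    audio_tokens = ""
    for m in messages:
        content = m["content"]
        # First template loop: concatenate system text (string or text parts).
        if m["role"] == "system":
            if isinstance(content, str):
                system_text += content
            else:
                for c in content:
                    if c.get("type") == "text" and "text" in c:
                        system_text += c["text"]
        # Second template loop: any message with list content contributes
        # one audio placeholder per audio-like item.
        if not isinstance(content, str):
            for c in content:
                if c.get("type") == "audio" or "audio" in c or "audio_url" in c:
                    audio_tokens += AUDIO_PLACEHOLDER
    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


# Hypothetical conversation; the file name is a placeholder.
messages = [
    {"role": "system", "content": "Transcribe."},
    {"role": "user", "content": [
        {"type": "audio", "audio": "clip.wav"},
        {"type": "text", "text": "this text is not emitted"},
    ]},
]
prompt = render_prompt(messages, add_generation_prompt=True)
print(prompt)
```

As written, the template emits only audio placeholders in the user turn, so user-supplied text parts do not appear in the rendered prompt.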
lora/lora-stage1/chat_template.json
DELETED
@@ -1 +0,0 @@
-{"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n {%- if m.role == 'system' -%}\n {%- if m.content is string -%}\n {%- set ns.system_text = ns.system_text + m.content -%}\n {%- else -%}\n {%- for c in m.content -%}\n {%- if c.type == 'text' and (c.text is defined) -%}\n {%- set ns.system_text = ns.system_text + c.text -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n {%- if m.content is not string -%}\n {%- for c in m.content -%}\n {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n {%- endif -%}\n {%- endfor -%}\n {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
lora/lora-stage1/config.json
DELETED
@@ -1,221 +0,0 @@
-{
-"architectures": [
-"Qwen3ASRForConditionalGeneration"
-],
-"model_type": "qwen3_asr",
-"support_languages": [
-"Chinese",
-"English",
-"Cantonese",
-"Arabic",
-"German",
-"French",
-"Spanish",
-"Portuguese",
-"Indonesian",
-"Italian",
-"Korean",
-"Russian",
-"Thai",
-"Vietnamese",
-"Japanese",
-"Turkish",
-"Hindi",
-"Malay",
-"Dutch",
-"Swedish",
-"Danish",
-"Finnish",
-"Polish",
-"Czech",
-"Filipino",
-"Persian",
-"Greek",
-"Romanian",
-"Hungarian",
-"Macedonian"
-],
-"thinker_config": {
-"model_type": "qwen3_asr",
-"architectures": [
-"Qwen3ASRForConditionalGeneration"
-],
-"audio_config": {
-"_name_or_path": "",
-"activation_dropout": 0,
-"activation_function": "gelu",
-"add_cross_attention": false,
-"architectures": null,
-"attention_dropout": 0,
-"bad_words_ids": null,
-"begin_suppress_tokens": null,
-"bos_token_id": null,
-"chunk_size_feed_forward": 0,
-"conv_chunksize": 500,
-"cross_attention_hidden_size": null,
-"d_model": 1024,
-"decoder_start_token_id": null,
-"diversity_penalty": 0.0,
-"do_sample": false,
-"downsample_hidden_size": 480,
-"dropout": 0,
-"dtype": null,
-"early_stopping": false,
-"encoder_attention_heads": 16,
-"encoder_ffn_dim": 4096,
-"encoder_layers": 24,
-"encoder_no_repeat_ngram_size": 0,
-"eos_token_id": null,
-"exponential_decay_length_penalty": null,
-"finetuning_task": null,
-"forced_bos_token_id": null,
-"forced_eos_token_id": null,
-"id2label": {
-"0": "LABEL_0",
-"1": "LABEL_1"
-},
-"initializer_range": 0.02,
-"is_decoder": false,
-"is_encoder_decoder": false,
-"label2id": {
-"LABEL_0": 0,
-"LABEL_1": 1
-},
-"length_penalty": 1.0,
-"max_length": 20,
-"max_source_positions": 1500,
-"min_length": 0,
-"model_type": "qwen3_asr_audio_encoder",
-"n_window": 50,
-"n_window_infer": 800,
-"no_repeat_ngram_size": 0,
-"num_beam_groups": 1,
-"num_beams": 1,
-"num_hidden_layers": 24,
-"num_mel_bins": 128,
-"num_return_sequences": 1,
-"output_attentions": false,
-"output_dim": 2048,
-"output_hidden_states": false,
-"output_scores": false,
-"pad_token_id": null,
-"prefix": null,
-"problem_type": null,
-"pruned_heads": {},
-"remove_invalid_values": false,
-"repetition_penalty": 1.0,
-"return_dict": true,
-"return_dict_in_generate": false,
-"scale_embedding": false,
-"sep_token_id": null,
-"suppress_tokens": null,
-"task_specific_params": null,
-"temperature": 1.0,
-"tf_legacy_loss": false,
-"tie_encoder_decoder": false,
-"tie_word_embeddings": true,
-"tokenizer_class": null,
-"top_k": 50,
-"top_p": 1.0,
-"torchscript": false,
-"typical_p": 1.0,
-"use_bfloat16": false
-},
-"audio_end_token_id": 151670,
-"audio_start_token_id": 151669,
-"audio_token_id": 151676,
-"dtype": "bfloat16",
-"initializer_range": 0.02,
-"text_config": {
-"_name_or_path": "",
-"add_cross_attention": false,
-"architectures": null,
-"attention_bias": false,
-"attention_dropout": 0.0,
-"bad_words_ids": null,
-"begin_suppress_tokens": null,
-"bos_token_id": null,
-"chunk_size_feed_forward": 0,
-"cross_attention_hidden_size": null,
-"decoder_start_token_id": null,
-"diversity_penalty": 0.0,
-"do_sample": false,
-"dtype": null,
-"early_stopping": false,
-"encoder_no_repeat_ngram_size": 0,
-"eos_token_id": null,
-"exponential_decay_length_penalty": null,
-"finetuning_task": null,
-"forced_bos_token_id": null,
-"forced_eos_token_id": null,
-"head_dim": 128,
-"hidden_act": "silu",
-"hidden_size": 2048,
-"id2label": {
-"0": "LABEL_0",
-"1": "LABEL_1"
-},
-"initializer_range": 0.02,
-"intermediate_size": 6144,
-"is_decoder": false,
-"is_encoder_decoder": false,
-"label2id": {
-"LABEL_0": 0,
-"LABEL_1": 1
-},
-"length_penalty": 1.0,
-"max_length": 20,
-"max_position_embeddings": 65536,
-"min_length": 0,
-"model_type": "qwen3",
-"no_repeat_ngram_size": 0,
-"num_attention_heads": 16,
-"num_beam_groups": 1,
-"num_beams": 1,
-"num_hidden_layers": 28,
-"num_key_value_heads": 8,
-"num_return_sequences": 1,
-"output_attentions": false,
-"output_hidden_states": false,
-"output_scores": false,
-"pad_token_id": null,
-"prefix": null,
-"problem_type": null,
-"pruned_heads": {},
-"remove_invalid_values": false,
-"repetition_penalty": 1.0,
-"return_dict": true,
-"return_dict_in_generate": false,
-"rms_norm_eps": 1e-06,
-"rope_scaling": {
-"interleaved": true,
-"mrope_interleaved": true,
-"mrope_section": [
-24,
-20,
-20
-],
-"rope_type": "default",
-"type": "default"
-},
-"rope_theta": 1000000,
-"sep_token_id": null,
-"suppress_tokens": null,
-"task_specific_params": null,
-"temperature": 1.0,
-"tf_legacy_loss": false,
-"tie_encoder_decoder": false,
-"tie_word_embeddings": true,
-"tokenizer_class": null,
-"top_k": 50,
-"top_p": 1.0,
-"torchscript": false,
-"typical_p": 1.0,
-"use_bfloat16": false,
-"use_cache": true,
-"vocab_size": 151936
-}
-},
-"transformers_version": "4.57.6"
-}
-
lora/lora-stage1/generation_config.json
DELETED
@@ -1,7 +0,0 @@
-{
-"_from_model_config": true,
-"eos_token_id": [151643,151645],
-"pad_token_id": 151643,
-"do_sample": false,
-"temperature": 0.000001
-}
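The numeric ids in the deleted `generation_config.json` can be cross-checked against the token table in `added_tokens.json` from the same folder. The sketch below does that for the two ids involved. Only the relevant entries are copied in, and the variable names are illustrative assumptions.

```python
# Values copied from the deleted generation_config.json.
GENERATION_CONFIG = {
    "eos_token_id": [151643, 151645],
    "pad_token_id": 151643,
    "do_sample": False,
    "temperature": 0.000001,
}

# Subset of added_tokens.json from the same folder (token -> id).
ADDED_TOKENS = {
    "<|endoftext|>": 151643,
    "<|im_end|>": 151645,
}

# Invert the mapping so ids can be looked up.
ID_TO_TOKEN = {token_id: token for token, token_id in ADDED_TOKENS.items()}

eos_tokens = [ID_TO_TOKEN[i] for i in GENERATION_CONFIG["eos_token_id"]]
pad_token = ID_TO_TOKEN[GENERATION_CONFIG["pad_token_id"]]

print("EOS tokens:", eos_tokens)
print("PAD token:", pad_token)
```

Generation therefore stops on either `<|endoftext|>` or `<|im_end|>` and pads with `<|endoftext|>`. Since `do_sample` is false, decoding is presumably greedy and the very small `temperature` value would have no effect.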
lora/lora-stage1/merges.txt
DELETED
The diff for this file is too large to render.
lora/lora-stage1/optimizer.pt
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:edb1941ff96a7dc9ec4447ad24fd82907e8053fa66090d9f91e1fe84d42fde4c
-size 3014667
lora/lora-stage1/preprocessor_config.json
DELETED
@@ -1,14 +0,0 @@
-{
-"chunk_length": 30,
-"dither": 0.0,
-"feature_extractor_type": "WhisperFeatureExtractor",
-"feature_size": 128,
-"hop_length": 160,
-"n_fft": 400,
-"n_samples": 480000,
-"nb_max_frames": 3000,
-"padding_side": "right",
-"padding_value": 0.0,
-"processor_class": "Qwen3ASRProcessor",
-"return_attention_mask": true
-}
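The numbers in the deleted `preprocessor_config.json` are mutually consistent, which a little arithmetic makes visible. The sketch below derives the implied sampling rate and frame timing. Note that the sampling rate is not stated in the file; it is inferred here from `n_samples / chunk_length`, so treat it as an assumption.

```python
# Values copied from the deleted preprocessor_config.json.
CHUNK_LENGTH_S = 30      # seconds of audio per chunk
N_SAMPLES = 480000       # samples per chunk
HOP_LENGTH = 160         # samples between successive frames
N_FFT = 400              # samples per analysis window
NB_MAX_FRAMES = 3000     # frames per chunk

# Inferred, not stated in the file: samples per second.
sampling_rate = N_SAMPLES // CHUNK_LENGTH_S

frames_per_chunk = N_SAMPLES // HOP_LENGTH
hop_ms = 1000 * HOP_LENGTH / sampling_rate
window_ms = 1000 * N_FFT / sampling_rate

print("sampling rate (Hz):", sampling_rate)
print("frames per chunk:", frames_per_chunk)
print("hop (ms):", hop_ms, "window (ms):", window_ms)
```

The derived frame count equals `nb_max_frames`, and the hop and window sizes correspond to a 10 ms step with a 25 ms window at the inferred rate.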
lora/lora-stage1/rng_state_0.pth
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:916059f3d5e18a65741db0b5dc2209e8c6aad0736bace4b346dacc3a0ed5408c
-size 14917
lora/lora-stage1/rng_state_1.pth
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:ed259d907743b5e6197bd66f79158ba04a3fdb590d48d290ac086e406341e1de
-size 14917
lora/lora-stage1/scheduler.pt
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e54a63f2bf7b963121e794c245cd0c84f1ea5bad1a8e2686e9c59fa50b56ee1e
-size 1465
lora/lora-stage1/special_tokens_map.json
DELETED
@@ -1,44 +0,0 @@
-{
-"additional_special_tokens": [
-"<|im_start|>",
-"<|im_end|>",
-"<|object_ref_start|>",
-"<|object_ref_end|>",
-"<|box_start|>",
-"<|box_end|>",
-"<|quad_start|>",
-"<|quad_end|>",
-"<|vision_start|>",
-"<|vision_end|>",
-"<|vision_pad|>",
-"<|image_pad|>",
-"<|video_pad|>",
-"<|audio_start|>",
-"<|audio_end|>",
-"<tts_pad>",
-"<tts_text_bos>",
-"<tts_text_bos_single>",
-"<|audio_pad|>"
-],
-"audio_bos_token": "<|audio_start|>",
-"audio_eos_token": "<|audio_end|>",
-"audio_token": "<|audio_pad|>",
-"eos_token": {
-"content": "<|im_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false
-},
-"image_token": "<|image_pad|>",
-"pad_token": {
-"content": "<|endoftext|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false
-},
-"video_token": "<|video_pad|>",
-"vision_bos_token": "<|vision_start|>",
-"vision_eos_token": "<|vision_end|>"
-}
lora/lora-stage1/tokenizer.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:0499602714160467f2d68b910651d6216020689f1e016be87a2d0019ee3baeab
-size 11429499
lora/lora-stage1/tokenizer_config.json
DELETED
@@ -1,549 +0,0 @@
-{
-"add_bos_token": false,
-"add_prefix_space": false,
-"added_tokens_decoder": {
-"151643": {
-"content": "<|endoftext|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151644": {
-"content": "<|im_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151645": {
-"content": "<|im_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151646": {
-"content": "<|object_ref_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151647": {
-"content": "<|object_ref_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151648": {
-"content": "<|box_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151649": {
-"content": "<|box_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151650": {
-"content": "<|quad_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151651": {
-"content": "<|quad_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151652": {
-"content": "<|vision_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151653": {
-"content": "<|vision_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151654": {
-"content": "<|vision_pad|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151655": {
-"content": "<|image_pad|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151656": {
-"content": "<|video_pad|>",
|
| 111 |
-
"lstrip": false,
|
| 112 |
-
"normalized": false,
|
| 113 |
-
"rstrip": false,
|
| 114 |
-
"single_word": false,
|
| 115 |
-
"special": true
|
| 116 |
-
},
|
| 117 |
-
"151657": {
|
| 118 |
-
"content": "<tool_call>",
|
| 119 |
-
"lstrip": false,
|
| 120 |
-
"normalized": false,
|
| 121 |
-
"rstrip": false,
|
| 122 |
-
"single_word": false,
|
| 123 |
-
"special": false
|
| 124 |
-
},
|
| 125 |
-
"151658": {
|
| 126 |
-
"content": "</tool_call>",
|
| 127 |
-
"lstrip": false,
|
| 128 |
-
"normalized": false,
|
| 129 |
-
"rstrip": false,
|
| 130 |
-
"single_word": false,
|
| 131 |
-
"special": false
|
| 132 |
-
},
|
| 133 |
-
"151659": {
|
| 134 |
-
"content": "<|fim_prefix|>",
|
| 135 |
-
"lstrip": false,
|
| 136 |
-
"normalized": false,
|
| 137 |
-
"rstrip": false,
|
| 138 |
-
"single_word": false,
|
| 139 |
-
"special": false
|
| 140 |
-
},
|
| 141 |
-
"151660": {
|
| 142 |
-
"content": "<|fim_middle|>",
|
| 143 |
-
"lstrip": false,
|
| 144 |
-
"normalized": false,
|
| 145 |
-
"rstrip": false,
|
| 146 |
-
"single_word": false,
|
| 147 |
-
"special": false
|
| 148 |
-
},
|
| 149 |
-
"151661": {
|
| 150 |
-
"content": "<|fim_suffix|>",
|
| 151 |
-
"lstrip": false,
|
| 152 |
-
"normalized": false,
|
| 153 |
-
"rstrip": false,
|
| 154 |
-
"single_word": false,
|
| 155 |
-
"special": false
|
| 156 |
-
},
|
| 157 |
-
"151662": {
|
| 158 |
-
"content": "<|fim_pad|>",
|
| 159 |
-
"lstrip": false,
|
| 160 |
-
"normalized": false,
|
| 161 |
-
"rstrip": false,
|
| 162 |
-
"single_word": false,
|
| 163 |
-
"special": false
|
| 164 |
-
},
|
| 165 |
-
"151663": {
|
| 166 |
-
"content": "<|repo_name|>",
|
| 167 |
-
"lstrip": false,
|
| 168 |
-
"normalized": false,
|
| 169 |
-
"rstrip": false,
|
| 170 |
-
"single_word": false,
|
| 171 |
-
"special": false
|
| 172 |
-
},
|
| 173 |
-
"151664": {
|
| 174 |
-
"content": "<|file_sep|>",
|
| 175 |
-
"lstrip": false,
|
| 176 |
-
"normalized": false,
|
| 177 |
-
"rstrip": false,
|
| 178 |
-
"single_word": false,
|
| 179 |
-
"special": false
|
| 180 |
-
},
|
| 181 |
-
"151665": {
|
| 182 |
-
"content": "<tool_response>",
|
| 183 |
-
"lstrip": false,
|
| 184 |
-
"normalized": false,
|
| 185 |
-
"rstrip": false,
|
| 186 |
-
"single_word": false,
|
| 187 |
-
"special": false
|
| 188 |
-
},
|
| 189 |
-
"151666": {
|
| 190 |
-
"content": "</tool_response>",
|
| 191 |
-
"lstrip": false,
|
| 192 |
-
"normalized": false,
|
| 193 |
-
"rstrip": false,
|
| 194 |
-
"single_word": false,
|
| 195 |
-
"special": false
|
| 196 |
-
},
|
| 197 |
-
"151667": {
|
| 198 |
-
"content": "<think>",
|
| 199 |
-
"lstrip": false,
|
| 200 |
-
"normalized": false,
|
| 201 |
-
"rstrip": false,
|
| 202 |
-
"single_word": false,
|
| 203 |
-
"special": false
|
| 204 |
-
},
|
| 205 |
-
"151668": {
|
| 206 |
-
"content": "</think>",
|
| 207 |
-
"lstrip": false,
|
| 208 |
-
"normalized": false,
|
| 209 |
-
"rstrip": false,
|
| 210 |
-
"single_word": false,
|
| 211 |
-
"special": false
|
| 212 |
-
},
|
| 213 |
-
"151669": {
|
| 214 |
-
"content": "<|audio_start|>",
|
| 215 |
-
"lstrip": false,
|
| 216 |
-
"normalized": false,
|
| 217 |
-
"rstrip": false,
|
| 218 |
-
"single_word": false,
|
| 219 |
-
"special": true
|
| 220 |
-
},
|
| 221 |
-
"151670": {
|
| 222 |
-
"content": "<|audio_end|>",
|
| 223 |
-
"lstrip": false,
|
| 224 |
-
"normalized": false,
|
| 225 |
-
"rstrip": false,
|
| 226 |
-
"single_word": false,
|
| 227 |
-
"special": true
|
| 228 |
-
},
|
| 229 |
-
"151671": {
|
| 230 |
-
"content": "<tts_pad>",
|
| 231 |
-
"lstrip": false,
|
| 232 |
-
"normalized": false,
|
| 233 |
-
"rstrip": false,
|
| 234 |
-
"single_word": false,
|
| 235 |
-
"special": true
|
| 236 |
-
},
|
| 237 |
-
"151672": {
|
| 238 |
-
"content": "<tts_text_bos>",
|
| 239 |
-
"lstrip": false,
|
| 240 |
-
"normalized": false,
|
| 241 |
-
"rstrip": false,
|
| 242 |
-
"single_word": false,
|
| 243 |
-
"special": true
|
| 244 |
-
},
|
| 245 |
-
"151673": {
|
| 246 |
-
"content": "<tts_text_eod>",
|
| 247 |
-
"lstrip": false,
|
| 248 |
-
"normalized": false,
|
| 249 |
-
"rstrip": false,
|
| 250 |
-
"single_word": false,
|
| 251 |
-
"special": true
|
| 252 |
-
},
|
| 253 |
-
"151674": {
|
| 254 |
-
"content": "<tts_text_bos_single>",
|
| 255 |
-
"lstrip": false,
|
| 256 |
-
"normalized": false,
|
| 257 |
-
"rstrip": false,
|
| 258 |
-
"single_word": false,
|
| 259 |
-
"special": true
|
| 260 |
-
},
|
| 261 |
-
"151675": {
|
| 262 |
-
"content": "<non_speech>",
|
| 263 |
-
"lstrip": false,
|
| 264 |
-
"normalized": false,
|
| 265 |
-
"rstrip": false,
|
| 266 |
-
"single_word": false,
|
| 267 |
-
"special": false
|
| 268 |
-
},
|
| 269 |
-
"151676": {
|
| 270 |
-
"content": "<|audio_pad|>",
|
| 271 |
-
"lstrip": false,
|
| 272 |
-
"normalized": false,
|
| 273 |
-
"rstrip": false,
|
| 274 |
-
"single_word": false,
|
| 275 |
-
"special": true
|
| 276 |
-
},
|
| 277 |
-
"151677": {
|
| 278 |
-
"content": "<blank1>",
|
| 279 |
-
"lstrip": false,
|
| 280 |
-
"normalized": false,
|
| 281 |
-
"rstrip": false,
|
| 282 |
-
"single_word": false,
|
| 283 |
-
"special": true
|
| 284 |
-
},
|
| 285 |
-
"151678": {
|
| 286 |
-
"content": "<blank2>",
|
| 287 |
-
"lstrip": false,
|
| 288 |
-
"normalized": false,
|
| 289 |
-
"rstrip": false,
|
| 290 |
-
"single_word": false,
|
| 291 |
-
"special": true
|
| 292 |
-
},
|
| 293 |
-
"151679": {
|
| 294 |
-
"content": "<blank3>",
|
| 295 |
-
"lstrip": false,
|
| 296 |
-
"normalized": false,
|
| 297 |
-
"rstrip": false,
|
| 298 |
-
"single_word": false,
|
| 299 |
-
"special": true
|
| 300 |
-
},
|
| 301 |
-
"151680": {
|
| 302 |
-
"content": "<blank4>",
|
| 303 |
-
"lstrip": false,
|
| 304 |
-
"normalized": false,
|
| 305 |
-
"rstrip": false,
|
| 306 |
-
"single_word": false,
|
| 307 |
-
"special": true
|
| 308 |
-
},
|
| 309 |
-
"151681": {
|
| 310 |
-
"content": "<blank5>",
|
| 311 |
-
"lstrip": false,
|
| 312 |
-
"normalized": false,
|
| 313 |
-
"rstrip": false,
|
| 314 |
-
"single_word": false,
|
| 315 |
-
"special": true
|
| 316 |
-
},
|
| 317 |
-
"151682": {
|
| 318 |
-
"content": "<blank6>",
|
| 319 |
-
"lstrip": false,
|
| 320 |
-
"normalized": false,
|
| 321 |
-
"rstrip": false,
|
| 322 |
-
"single_word": false,
|
| 323 |
-
"special": true
|
| 324 |
-
},
|
| 325 |
-
"151683": {
|
| 326 |
-
"content": "<blank7>",
|
| 327 |
-
"lstrip": false,
|
| 328 |
-
"normalized": false,
|
| 329 |
-
"rstrip": false,
|
| 330 |
-
"single_word": false,
|
| 331 |
-
"special": true
|
| 332 |
-
},
|
| 333 |
-
"151684": {
|
| 334 |
-
"content": "<blank8>",
|
| 335 |
-
"lstrip": false,
|
| 336 |
-
"normalized": false,
|
| 337 |
-
"rstrip": false,
|
| 338 |
-
"single_word": false,
|
| 339 |
-
"special": true
|
| 340 |
-
},
|
| 341 |
-
"151685": {
|
| 342 |
-
"content": "<blank9>",
|
| 343 |
-
"lstrip": false,
|
| 344 |
-
"normalized": false,
|
| 345 |
-
"rstrip": false,
|
| 346 |
-
"single_word": false,
|
| 347 |
-
"special": true
|
| 348 |
-
},
|
| 349 |
-
"151686": {
|
| 350 |
-
"content": "<blank10>",
|
| 351 |
-
"lstrip": false,
|
| 352 |
-
"normalized": false,
|
| 353 |
-
"rstrip": false,
|
| 354 |
-
"single_word": false,
|
| 355 |
-
"special": true
|
| 356 |
-
},
|
| 357 |
-
"151687": {
|
| 358 |
-
"content": "<blank11>",
|
| 359 |
-
"lstrip": false,
|
| 360 |
-
"normalized": false,
|
| 361 |
-
"rstrip": false,
|
| 362 |
-
"single_word": false,
|
| 363 |
-
"special": true
|
| 364 |
-
},
|
| 365 |
-
"151688": {
|
| 366 |
-
"content": "<blank12>",
|
| 367 |
-
"lstrip": false,
|
| 368 |
-
"normalized": false,
|
| 369 |
-
"rstrip": false,
|
| 370 |
-
"single_word": false,
|
| 371 |
-
"special": true
|
| 372 |
-
},
|
| 373 |
-
"151689": {
|
| 374 |
-
"content": "<blank13>",
|
| 375 |
-
"lstrip": false,
|
| 376 |
-
"normalized": false,
|
| 377 |
-
"rstrip": false,
|
| 378 |
-
"single_word": false,
|
| 379 |
-
"special": true
|
| 380 |
-
},
|
| 381 |
-
"151690": {
|
| 382 |
-
"content": "<blank14>",
|
| 383 |
-
"lstrip": false,
|
| 384 |
-
"normalized": false,
|
| 385 |
-
"rstrip": false,
|
| 386 |
-
"single_word": false,
|
| 387 |
-
"special": true
|
| 388 |
-
},
|
| 389 |
-
"151691": {
|
| 390 |
-
"content": "<blank15>",
|
| 391 |
-
"lstrip": false,
|
| 392 |
-
"normalized": false,
|
| 393 |
-
"rstrip": false,
|
| 394 |
-
"single_word": false,
|
| 395 |
-
"special": true
|
| 396 |
-
},
|
| 397 |
-
"151692": {
|
| 398 |
-
"content": "<blank16>",
|
| 399 |
-
"lstrip": false,
|
| 400 |
-
"normalized": false,
|
| 401 |
-
"rstrip": false,
|
| 402 |
-
"single_word": false,
|
| 403 |
-
"special": true
|
| 404 |
-
},
|
| 405 |
-
"151693": {
|
| 406 |
-
"content": "<blank17>",
|
| 407 |
-
"lstrip": false,
|
| 408 |
-
"normalized": false,
|
| 409 |
-
"rstrip": false,
|
| 410 |
-
"single_word": false,
|
| 411 |
-
"special": true
|
| 412 |
-
},
|
| 413 |
-
"151694": {
|
| 414 |
-
"content": "<blank18>",
|
| 415 |
-
"lstrip": false,
|
| 416 |
-
"normalized": false,
|
| 417 |
-
"rstrip": false,
|
| 418 |
-
"single_word": false,
|
| 419 |
-
"special": true
|
| 420 |
-
},
|
| 421 |
-
"151695": {
|
| 422 |
-
"content": "<blank19>",
|
| 423 |
-
"lstrip": false,
|
| 424 |
-
"normalized": false,
|
| 425 |
-
"rstrip": false,
|
| 426 |
-
"single_word": false,
|
| 427 |
-
"special": true
|
| 428 |
-
},
|
| 429 |
-
"151696": {
|
| 430 |
-
"content": "<blank20>",
|
| 431 |
-
"lstrip": false,
|
| 432 |
-
"normalized": false,
|
| 433 |
-
"rstrip": false,
|
| 434 |
-
"single_word": false,
|
| 435 |
-
"special": true
|
| 436 |
-
},
|
| 437 |
-
"151697": {
|
| 438 |
-
"content": "<blank21>",
|
| 439 |
-
"lstrip": false,
|
| 440 |
-
"normalized": false,
|
| 441 |
-
"rstrip": false,
|
| 442 |
-
"single_word": false,
|
| 443 |
-
"special": true
|
| 444 |
-
},
|
| 445 |
-
"151698": {
|
| 446 |
-
"content": "<blank22>",
|
| 447 |
-
"lstrip": false,
|
| 448 |
-
"normalized": false,
|
| 449 |
-
"rstrip": false,
|
| 450 |
-
"single_word": false,
|
| 451 |
-
"special": true
|
| 452 |
-
},
|
| 453 |
-
"151699": {
|
| 454 |
-
"content": "<blank23>",
|
| 455 |
-
"lstrip": false,
|
| 456 |
-
"normalized": false,
|
| 457 |
-
"rstrip": false,
|
| 458 |
-
"single_word": false,
|
| 459 |
-
"special": true
|
| 460 |
-
},
|
| 461 |
-
"151700": {
|
| 462 |
-
"content": "<blank24>",
|
| 463 |
-
"lstrip": false,
|
| 464 |
-
"normalized": false,
|
| 465 |
-
"rstrip": false,
|
| 466 |
-
"single_word": false,
|
| 467 |
-
"special": true
|
| 468 |
-
},
|
| 469 |
-
"151701": {
|
| 470 |
-
"content": "<blank25>",
|
| 471 |
-
"lstrip": false,
|
| 472 |
-
"normalized": false,
|
| 473 |
-
"rstrip": false,
|
| 474 |
-
"single_word": false,
|
| 475 |
-
"special": true
|
| 476 |
-
},
|
| 477 |
-
"151702": {
|
| 478 |
-
"content": "<blank26>",
|
| 479 |
-
"lstrip": false,
|
| 480 |
-
"normalized": false,
|
| 481 |
-
"rstrip": false,
|
| 482 |
-
"single_word": false,
|
| 483 |
-
"special": true
|
| 484 |
-
},
|
| 485 |
-
"151703": {
|
| 486 |
-
"content": "<blank27>",
|
| 487 |
-
"lstrip": false,
|
| 488 |
-
"normalized": false,
|
| 489 |
-
"rstrip": false,
|
| 490 |
-
"single_word": false,
|
| 491 |
-
"special": true
|
| 492 |
-
},
|
| 493 |
-
"151704": {
|
| 494 |
-
"content": "<asr_text>",
|
| 495 |
-
"lstrip": false,
|
| 496 |
-
"normalized": false,
|
| 497 |
-
"rstrip": false,
|
| 498 |
-
"single_word": false,
|
| 499 |
-
"special": false
|
| 500 |
-
}
|
| 501 |
-
},
|
| 502 |
-
"additional_special_tokens": [
|
| 503 |
-
"<|im_start|>",
|
| 504 |
-
"<|im_end|>",
|
| 505 |
-
"<|object_ref_start|>",
|
| 506 |
-
"<|object_ref_end|>",
|
| 507 |
-
"<|box_start|>",
|
| 508 |
-
"<|box_end|>",
|
| 509 |
-
"<|quad_start|>",
|
| 510 |
-
"<|quad_end|>",
|
| 511 |
-
"<|vision_start|>",
|
| 512 |
-
"<|vision_end|>",
|
| 513 |
-
"<|vision_pad|>",
|
| 514 |
-
"<|image_pad|>",
|
| 515 |
-
"<|video_pad|>",
|
| 516 |
-
"<|audio_start|>",
|
| 517 |
-
"<|audio_end|>",
|
| 518 |
-
"<tts_pad>",
|
| 519 |
-
"<tts_text_bos>",
|
| 520 |
-
"<tts_text_bos_single>",
|
| 521 |
-
"<|audio_pad|>"
|
| 522 |
-
],
|
| 523 |
-
"audio_bos_token": "<|audio_start|>",
|
| 524 |
-
"audio_eos_token": "<|audio_end|>",
|
| 525 |
-
"audio_token": "<|audio_pad|>",
|
| 526 |
-
"bos_token": null,
|
| 527 |
-
"clean_up_tokenization_spaces": false,
|
| 528 |
-
"eos_token": "<|im_end|>",
|
| 529 |
-
"errors": "replace",
|
| 530 |
-
"extra_special_tokens": {
|
| 531 |
-
"audio_bos_token": "<|audio_start|>",
|
| 532 |
-
"audio_eos_token": "<|audio_end|>",
|
| 533 |
-
"audio_token": "<|audio_pad|>",
|
| 534 |
-
"image_token": "<|image_pad|>",
|
| 535 |
-
"video_token": "<|video_pad|>",
|
| 536 |
-
"vision_bos_token": "<|vision_start|>",
|
| 537 |
-
"vision_eos_token": "<|vision_end|>"
|
| 538 |
-
},
|
| 539 |
-
"image_token": "<|image_pad|>",
|
| 540 |
-
"model_max_length": 131072,
|
| 541 |
-
"pad_token": "<|endoftext|>",
|
| 542 |
-
"processor_class": "Qwen3ASRProcessor",
|
| 543 |
-
"split_special_tokens": false,
|
| 544 |
-
"tokenizer_class": "Qwen2Tokenizer",
|
| 545 |
-
"unk_token": null,
|
| 546 |
-
"video_token": "<|video_pad|>",
|
| 547 |
-
"vision_bos_token": "<|vision_start|>",
|
| 548 |
-
"vision_eos_token": "<|vision_end|>"
|
| 549 |
-
}
|
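The deleted tokenizer_config.json maps token IDs to entries in `added_tokens_decoder`, each carrying a `special` flag, and names `<|im_end|>` as the EOS token and `<|endoftext|>` as the pad token. A rough sketch of how those fields could be inspected with the standard library follows; the inline JSON is a four-entry excerpt of the values shown above, not the full file, and the variable names are illustrative only:

```python
import json

# Excerpt of the deleted config; IDs, contents, and flags are copied from the diff.
config_text = """
{
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "special": true},
    "151645": {"content": "<|im_end|>", "special": true},
    "151657": {"content": "<tool_call>", "special": false},
    "151704": {"content": "<asr_text>", "special": false}
  },
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}
"""

config = json.loads(config_text)
decoder = config["added_tokens_decoder"]

# Map token text to its integer ID (JSON object keys are always strings).
token_to_id = {entry["content"]: int(tid) for tid, entry in decoder.items()}

# Split the added tokens by their "special" flag.
special = sorted(e["content"] for e in decoder.values() if e["special"])
regular = sorted(e["content"] for e in decoder.values() if not e["special"])

eos_id = token_to_id[config["eos_token"]]
pad_id = token_to_id[config["pad_token"]]
print(eos_id, pad_id, special, regular)
```

In the excerpt, tokens flagged `"special": false` (such as `<tool_call>` and `<asr_text>`) are still added tokens with fixed IDs; the flag only marks whether the tokenizer treats them as special tokens.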
lora/lora-stage1/trainer_state.json
DELETED
|
@@ -1,774 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"best_global_step": null,
|
| 3 |
-
"best_metric": null,
|
| 4 |
-
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.29335191228777824,
|
| 6 |
-
"eval_steps": 200,
|
| 7 |
-
"global_step": 1000,
|
| 8 |
-
"is_hyper_param_search": false,
|
| 9 |
-
"is_local_process_zero": true,
|
| 10 |
-
"is_world_process_zero": true,
|
| 11 |
-
"log_history": [
|
| 12 |
-
{
|
| 13 |
-
"epoch": 0.002933519122877782,
|
| 14 |
-
"grad_norm": 31.058988571166992,
|
| 15 |
-
"learning_rate": 2.6392961876832844e-08,
|
| 16 |
-
"loss": 222.2233,
|
| 17 |
-
"step": 10
|
| 18 |
-
},
|
| 19 |
-
{
|
| 20 |
-
"epoch": 0.005867038245755564,
|
| 21 |
-
"grad_norm": 29.318532943725586,
|
| 22 |
-
"learning_rate": 5.571847507331378e-08,
|
| 23 |
-
"loss": 223.4508,
|
| 24 |
-
"step": 20
|
| 25 |
-
},
|
| 26 |
-
{
|
| 27 |
-
"epoch": 0.008800557368633347,
|
| 28 |
-
"grad_norm": 31.036029815673828,
|
| 29 |
-
"learning_rate": 8.504398826979471e-08,
|
| 30 |
-
"loss": 223.5497,
|
| 31 |
-
"step": 30
|
| 32 |
-
},
|
| 33 |
-
{
|
| 34 |
-
"epoch": 0.011734076491511128,
|
| 35 |
-
"grad_norm": 31.80939483642578,
|
| 36 |
-
"learning_rate": 1.1436950146627565e-07,
|
| 37 |
-
"loss": 218.2694,
|
| 38 |
-
"step": 40
|
| 39 |
-
},
|
| 40 |
-
{
|
| 41 |
-
"epoch": 0.014667595614388912,
|
| 42 |
-
"grad_norm": 32.80522918701172,
|
| 43 |
-
"learning_rate": 1.436950146627566e-07,
|
| 44 |
-
"loss": 219.2423,
|
| 45 |
-
"step": 50
|
| 46 |
-
},
|
| 47 |
-
{
|
| 48 |
-
"epoch": 0.017601114737266693,
|
| 49 |
-
"grad_norm": 32.718772888183594,
|
| 50 |
-
"learning_rate": 1.7302052785923753e-07,
|
| 51 |
-
"loss": 224.9209,
|
| 52 |
-
"step": 60
|
| 53 |
-
},
|
| 54 |
-
{
|
| 55 |
-
"epoch": 0.020534633860144477,
|
| 56 |
-
"grad_norm": 30.853660583496094,
|
| 57 |
-
"learning_rate": 2.0234604105571846e-07,
|
| 58 |
-
"loss": 220.9806,
|
| 59 |
-
"step": 70
|
| 60 |
-
},
|
| 61 |
-
{
|
| 62 |
-
"epoch": 0.023468152983022256,
|
| 63 |
-
"grad_norm": 31.83987045288086,
|
| 64 |
-
"learning_rate": 2.3167155425219938e-07,
|
| 65 |
-
"loss": 221.7758,
|
| 66 |
-
"step": 80
|
| 67 |
-
},
|
| 68 |
-
{
|
| 69 |
-
"epoch": 0.02640167210590004,
|
| 70 |
-
"grad_norm": 33.82211685180664,
|
| 71 |
-
"learning_rate": 2.609970674486803e-07,
|
| 72 |
-
"loss": 220.2104,
|
| 73 |
-
"step": 90
|
| 74 |
-
},
|
| 75 |
-
{
|
| 76 |
-
"epoch": 0.029335191228777823,
|
| 77 |
-
"grad_norm": 39.342655181884766,
|
| 78 |
-
"learning_rate": 2.903225806451613e-07,
|
| 79 |
-
"loss": 223.5162,
|
| 80 |
-
"step": 100
|
| 81 |
-
},
|
| 82 |
-
{
|
| 83 |
-
"epoch": 0.032268710351655606,
|
| 84 |
-
"grad_norm": 32.44097900390625,
|
| 85 |
-
"learning_rate": 3.196480938416422e-07,
|
| 86 |
-
"loss": 222.4887,
|
| 87 |
-
"step": 110
|
| 88 |
-
},
|
| 89 |
-
{
|
| 90 |
-
"epoch": 0.035202229474533386,
|
| 91 |
-
"grad_norm": 30.906185150146484,
|
| 92 |
-
"learning_rate": 3.489736070381232e-07,
|
| 93 |
-
"loss": 221.0162,
|
| 94 |
-
"step": 120
|
| 95 |
-
},
|
| 96 |
-
{
|
| 97 |
-
"epoch": 0.038135748597411166,
|
| 98 |
-
"grad_norm": 30.318588256835938,
|
| 99 |
-
"learning_rate": 3.7829912023460407e-07,
|
| 100 |
-
"loss": 219.1895,
|
| 101 |
-
"step": 130
|
| 102 |
-
},
|
| 103 |
-
{
|
| 104 |
-
"epoch": 0.04106926772028895,
|
| 105 |
-
"grad_norm": 33.13260269165039,
|
| 106 |
-
"learning_rate": 4.0762463343108505e-07,
|
| 107 |
-
"loss": 219.1354,
|
| 108 |
-
"step": 140
|
| 109 |
-
},
|
| 110 |
-
{
|
| 111 |
-
"epoch": 0.04400278684316673,
|
| 112 |
-
"grad_norm": 32.98201370239258,
|
| 113 |
-
"learning_rate": 4.36950146627566e-07,
|
| 114 |
-
"loss": 221.9035,
|
| 115 |
-
"step": 150
|
| 116 |
-
},
|
| 117 |
-
{
|
| 118 |
-
"epoch": 0.04693630596604451,
|
| 119 |
-
"grad_norm": 30.733919143676758,
|
| 120 |
-
"learning_rate": 4.6627565982404685e-07,
|
| 121 |
-
"loss": 219.0109,
|
| 122 |
-
"step": 160
|
| 123 |
-
},
|
| 124 |
-
{
|
| 125 |
-
"epoch": 0.0498698250889223,
|
| 126 |
-
"grad_norm": 35.68417739868164,
|
| 127 |
-
"learning_rate": 4.956011730205278e-07,
|
| 128 |
-
"loss": 222.1004,
|
| 129 |
-
"step": 170
|
| 130 |
-
},
|
| 131 |
-
{
|
| 132 |
-
"epoch": 0.05280334421180008,
|
| 133 |
-
"grad_norm": 34.876121520996094,
|
| 134 |
-
"learning_rate": 5.249266862170088e-07,
|
| 135 |
-
"loss": 220.137,
|
| 136 |
-
"step": 180
|
| 137 |
-
},
|
| 138 |
-
{
|
| 139 |
-
"epoch": 0.055736863334677866,
|
| 140 |
-
"grad_norm": 33.82151412963867,
|
| 141 |
-
"learning_rate": 5.542521994134897e-07,
|
| 142 |
-
"loss": 224.7452,
|
| 143 |
-
"step": 190
|
| 144 |
-
},
|
| 145 |
-
{
|
| 146 |
-
"epoch": 0.058670382457555646,
|
| 147 |
-
"grad_norm": 36.70476531982422,
|
| 148 |
-
"learning_rate": 5.835777126099707e-07,
|
| 149 |
-
"loss": 219.8298,
|
| 150 |
-
"step": 200
|
| 151 |
-
},
|
| 152 |
-
{
|
| 153 |
-
"epoch": 0.058670382457555646,
|
| 154 |
-
"eval_loss": 24.500732421875,
|
| 155 |
-
"eval_runtime": 98.9198,
|
| 156 |
-
"eval_samples_per_second": 98.019,
|
| 157 |
-
"eval_steps_per_second": 6.126,
|
| 158 |
-
"step": 200
|
| 159 |
-
},
|
| 160 |
-
{
|
| 161 |
-
"epoch": 0.061603901580433426,
|
| 162 |
-
"grad_norm": 34.49006652832031,
|
| 163 |
-
"learning_rate": 6.129032258064516e-07,
|
| 164 |
-
"loss": 223.5638,
|
| 165 |
-
"step": 210
|
| 166 |
-
},
|
| 167 |
-
{
|
| 168 |
-
"epoch": 0.06453742070331121,
|
| 169 |
-
"grad_norm": 32.312313079833984,
|
| 170 |
-
"learning_rate": 6.422287390029325e-07,
|
| 171 |
-
"loss": 225.3921,
|
| 172 |
-
"step": 220
|
| 173 |
-
},
|
| 174 |
-
{
|
| 175 |
-
"epoch": 0.06747093982618899,
|
| 176 |
-
"grad_norm": 33.46302032470703,
|
| 177 |
-
"learning_rate": 6.715542521994134e-07,
|
| 178 |
-
"loss": 219.3619,
|
| 179 |
-
"step": 230
|
| 180 |
-
},
|
| 181 |
-
{
|
| 182 |
-
"epoch": 0.07040445894906677,
|
| 183 |
-
"grad_norm": 47.695858001708984,
|
| 184 |
-
"learning_rate": 7.008797653958944e-07,
|
| 185 |
-
"loss": 221.6162,
|
| 186 |
-
"step": 240
|
| 187 |
-
},
|
| 188 |
-
{
|
| 189 |
-
"epoch": 0.07333797807194456,
|
| 190 |
-
"grad_norm": 36.99955368041992,
|
| 191 |
-
"learning_rate": 7.302052785923753e-07,
|
| 192 |
-
"loss": 224.4357,
|
| 193 |
-
"step": 250
|
| 194 |
-
},
|
| 195 |
-
{
|
| 196 |
-
"epoch": 0.07627149719482233,
|
| 197 |
-
"grad_norm": 33.713096618652344,
|
| 198 |
-
"learning_rate": 7.595307917888563e-07,
|
| 199 |
-
"loss": 218.6644,
|
| 200 |
-
"step": 260
|
| 201 |
-
},
|
| 202 |
-
{
|
| 203 |
-
"epoch": 0.07920501631770012,
|
| 204 |
-
"grad_norm": 36.349666595458984,
|
| 205 |
-
"learning_rate": 7.888563049853372e-07,
|
| 206 |
-
"loss": 221.4383,
|
| 207 |
-
"step": 270
|
| 208 |
-
},
|
| 209 |
-
{
|
| 210 |
-
"epoch": 0.0821385354405779,
|
| 211 |
-
"grad_norm": 36.67658615112305,
|
| 212 |
-
"learning_rate": 8.181818181818182e-07,
|
| 213 |
-
"loss": 221.6365,
|
| 214 |
-
"step": 280
|
| 215 |
-
},
|
| 216 |
-
{
|
| 217 |
-
"epoch": 0.08507205456345568,
|
| 218 |
-
"grad_norm": 31.31206512451172,
|
| 219 |
-
"learning_rate": 8.475073313782992e-07,
|
| 220 |
-
"loss": 219.8238,
|
| 221 |
-
"step": 290
|
| 222 |
-
},
|
| 223 |
-
{
|
| 224 |
-
"epoch": 0.08800557368633347,
|
| 225 |
-
"grad_norm": 33.81391525268555,
|
| 226 |
-
"learning_rate": 8.7683284457478e-07,
|
| 227 |
-
"loss": 222.3335,
|
| 228 |
-
"step": 300
|
| 229 |
-
},
|
| 230 |
-
{
|
| 231 |
-
"epoch": 0.09093909280921125,
|
| 232 |
-
"grad_norm": 39.456138610839844,
|
| 233 |
-
"learning_rate": 9.061583577712609e-07,
|
| 234 |
-
"loss": 225.0574,
|
| 235 |
-
"step": 310
|
| 236 |
-
},
|
| 237 |
-
{
|
| 238 |
-
"epoch": 0.09387261193208903,
|
| 239 |
-
"grad_norm": 62.84433364868164,
|
| 240 |
-
"learning_rate": 9.354838709677418e-07,
|
| 241 |
-
"loss": 222.3193,
|
| 242 |
-
"step": 320
|
| 243 |
-
},
|
| 244 |
-
{
|
| 245 |
-
"epoch": 0.09680613105496681,
|
| 246 |
-
"grad_norm": 37.60541915893555,
|
| 247 |
-
"learning_rate": 9.648093841642228e-07,
|
| 248 |
-
"loss": 215.4717,
|
| 249 |
-
"step": 330
|
| 250 |
-
},
|
| 251 |
-
{
|
| 252 |
-
"epoch": 0.0997396501778446,
|
| 253 |
-
"grad_norm": 42.61164855957031,
|
| 254 |
-
"learning_rate": 9.941348973607037e-07,
|
| 255 |
-
"loss": 220.7702,
|
| 256 |
-
"step": 340
|
| 257 |
-
},
|
| 258 |
-
{
|
| 259 |
-
"epoch": 0.10267316930072237,
|
| 260 |
-
"grad_norm": 41.35678482055664,
|
| 261 |
-
"learning_rate": 9.987648602748184e-07,
|
| 262 |
-
"loss": 221.7166,
|
| 263 |
-
"step": 350
|
| 264 |
-
},
|
| 265 |
-
{
|
| 266 |
-
"epoch": 0.10560668842360016,
|
| 267 |
-
"grad_norm": 41.287208557128906,
|
| 268 |
-
"learning_rate": 9.972209356183417e-07,
|
| 269 |
-
"loss": 222.1618,
|
| 270 |
-
"step": 360
|
| 271 |
-
},
|
| 272 |
-
{
|
| 273 |
-
"epoch": 0.10854020754647795,
|
| 274 |
-
"grad_norm": 54.5716667175293,
|
| 275 |
-
"learning_rate": 9.956770109618649e-07,
|
| 276 |
-
"loss": 221.5792,
|
| 277 |
-
"step": 370
|
| 278 |
-
},
|
| 279 |
-
{
|
| 280 |
-
"epoch": 0.11147372666935573,
|
| 281 |
-
"grad_norm": 40.734012603759766,
|
| 282 |
-
"learning_rate": 9.941330863053883e-07,
|
| 283 |
-
"loss": 219.901,
|
| 284 |
-
"step": 380
|
| 285 |
-
},
|
| 286 |
-
{
|
| 287 |
-
"epoch": 0.1144072457922335,
|
| 288 |
-
"grad_norm": 43.457218170166016,
|
| 289 |
-
"learning_rate": 9.925891616489115e-07,
|
| 290 |
-
"loss": 223.7378,
|
| 291 |
-
"step": 390
|
| 292 |
-
},
|
| 293 |
-
{
|
| 294 |
-
"epoch": 0.11734076491511129,
|
| 295 |
-
"grad_norm": 42.917686462402344,
|
| 296 |
-
"learning_rate": 9.910452369924347e-07,
|
| 297 |
-
"loss": 222.8944,
|
| 298 |
-
"step": 400
|
| 299 |
-
},
|
| 300 |
-
{
|
| 301 |
-
"epoch": 0.11734076491511129,
|
| 302 |
-
"eval_loss": 24.425460815429688,
|
| 303 |
-
"eval_runtime": 94.8923,
|
| 304 |
-
"eval_samples_per_second": 102.179,
|
| 305 |
-
"eval_steps_per_second": 6.386,
|
| 306 |
-
"step": 400
|
| 307 |
-
},
|
| 308 |
-
{
|
| 309 |
-
"epoch": 0.12027428403798908,
|
| 310 |
-
"grad_norm": 39.965293884277344,
|
| 311 |
-
"learning_rate": 9.89501312335958e-07,
|
| 312 |
-
"loss": 220.1472,
|
| 313 |
-
"step": 410
|
| 314 |
-
},
|
| 315 |
-
{
|
| 316 |
-
"epoch": 0.12320780316086685,
|
| 317 |
-
"grad_norm": 45.19244384765625,
|
| 318 |
-
"learning_rate": 9.879573876794812e-07,
|
| 319 |
-
"loss": 224.2056,
|
| 320 |
-
"step": 420
|
| 321 |
-
},
|
| 322 |
-
{
|
| 323 |
-
"epoch": 0.12614132228374464,
|
| 324 |
-
"grad_norm": 41.27251434326172,
|
| 325 |
-
"learning_rate": 9.864134630230044e-07,
|
| 326 |
-
"loss": 217.4446,
|
| 327 |
-
"step": 430
|
| 328 |
-
},
|
| 329 |
-
{
|
| 330 |
-
"epoch": 0.12907484140662243,
|
| 331 |
-
"grad_norm": 49.71922302246094,
|
| 332 |
-
"learning_rate": 9.848695383665276e-07,
|
| 333 |
-
"loss": 220.2578,
|
| 334 |
-
"step": 440
|
| 335 |
-
},
|
| 336 |
-
{
|
| 337 |
-
"epoch": 0.1320083605295002,
|
| 338 |
-
"grad_norm": 65.56668853759766,
|
| 339 |
-
"learning_rate": 9.833256137100508e-07,
|
| 340 |
-
"loss": 221.2077,
|
| 341 |
-
"step": 450
|
| 342 |
-
},
|
| 343 |
-
{
|
| 344 |
-
"epoch": 0.13494187965237797,
|
| 345 |
-
"grad_norm": 41.73335266113281,
|
| 346 |
-
"learning_rate": 9.817816890535742e-07,
|
| 347 |
-
"loss": 219.8,
|
| 348 |
-
"step": 460
|
| 349 |
-
},
|
| 350 |
-
{
|
| 351 |
-
"epoch": 0.13787539877525576,
|
| 352 |
-
"grad_norm": 51.275718688964844,
|
| 353 |
-
"learning_rate": 9.802377643970974e-07,
|
| 354 |
-
"loss": 221.3817,
|
| 355 |
-
"step": 470
|
| 356 |
-
},
|
| 357 |
-
{
|
| 358 |
-
"epoch": 0.14080891789813355,
|
| 359 |
-
"grad_norm": 55.4876823425293,
|
| 360 |
-
"learning_rate": 9.786938397406207e-07,
|
| 361 |
-
"loss": 216.4269,
|
| 362 |
-
"step": 480
|
| 363 |
-
},
|
| 364 |
-
{
|
| 365 |
-
"epoch": 0.14374243702101133,
|
| 366 |
-
"grad_norm": 55.99393844604492,
|
| 367 |
-
"learning_rate": 9.771499150841439e-07,
|
| 368 |
-
"loss": 218.8694,
|
| 369 |
-
"step": 490
|
| 370 |
-
},
|
| 371 |
-
{
|
| 372 |
-
"epoch": 0.14667595614388912,
|
| 373 |
-
"grad_norm": 95.5741958618164,
|
| 374 |
-
"learning_rate": 9.75605990427667e-07,
|
| 375 |
-
"loss": 221.4839,
|
| 376 |
-
"step": 500
|
| 377 |
-
},
|
| 378 |
-
{
|
| 379 |
-
"epoch": 0.1496094752667669,
|
| 380 |
-
"grad_norm": 49.25442886352539,
|
| 381 |
-
"learning_rate": 9.740620657711903e-07,
|
| 382 |
-
"loss": 222.9515,
|
| 383 |
-
"step": 510
|
| 384 |
-
},
|
| 385 |
-
{
|
| 386 |
-
"epoch": 0.15254299438964466,
|
| 387 |
-
"grad_norm": 50.05457305908203,
|
| 388 |
-
"learning_rate": 9.725181411147135e-07,
|
| 389 |
-
"loss": 218.0743,
|
| 390 |
-
"step": 520
|
| 391 |
-
},
|
| 392 |
-
{
|
| 393 |
-
"epoch": 0.15547651351252245,
|
| 394 |
-
"grad_norm": 43.44709777832031,
|
| 395 |
-
"learning_rate": 9.709742164582367e-07,
|
| 396 |
-
"loss": 218.6208,
|
| 397 |
-
"step": 530
|
| 398 |
-
},
|
| 399 |
-
{
|
| 400 |
-
"epoch": 0.15841003263540024,
|
| 401 |
-
"grad_norm": 66.39103698730469,
|
| 402 |
-
"learning_rate": 9.694302918017602e-07,
|
| 403 |
-
"loss": 219.6833,
|
| 404 |
-
"step": 540
|
| 405 |
-
},
|
| 406 |
-
{
|
| 407 |
-
"epoch": 0.16134355175827803,
|
| 408 |
-
"grad_norm": 54.72968292236328,
|
| 409 |
-
"learning_rate": 9.678863671452832e-07,
|
| 410 |
-
"loss": 221.8852,
|
| 411 |
-
"step": 550
|
| 412 |
-
},
|
| 413 |
-
{
|
| 414 |
-
"epoch": 0.1642770708811558,
|
| 415 |
-
"grad_norm": 65.26374816894531,
|
| 416 |
-
"learning_rate": 9.663424424888064e-07,
|
| 417 |
-
"loss": 219.7626,
|
| 418 |
-
"step": 560
|
| 419 |
-
},
|
| 420 |
-
{
|
| 421 |
-
"epoch": 0.1672105900040336,
|
| 422 |
-
"grad_norm": 60.0925178527832,
|
| 423 |
-
"learning_rate": 9.647985178323296e-07,
|
| 424 |
-
"loss": 217.8218,
|
| 425 |
-
"step": 570
|
| 426 |
-
},
|
| 427 |
-
{
|
| 428 |
-
"epoch": 0.17014410912691136,
|
| 429 |
-
"grad_norm": 47.97535705566406,
|
| 430 |
-
"learning_rate": 9.63254593175853e-07,
|
| 431 |
-
"loss": 217.9315,
|
| 432 |
-
"step": 580
|
| 433 |
-
},
|
| 434 |
-
{
|
| 435 |
-
"epoch": 0.17307762824978914,
|
| 436 |
-
"grad_norm": 53.61656951904297,
|
| 437 |
-
"learning_rate": 9.617106685193762e-07,
|
| 438 |
-
"loss": 219.2269,
|
| 439 |
-
"step": 590
|
| 440 |
-
},
|
| 441 |
-
{
|
| 442 |
-
"epoch": 0.17601114737266693,
|
| 443 |
-
"grad_norm": 52.75293731689453,
|
| 444 |
-
"learning_rate": 9.601667438628995e-07,
|
| 445 |
-
"loss": 216.867,
|
| 446 |
-
"step": 600
|
| 447 |
-
},
|
| 448 |
-
{
|
| 449 |
-
"epoch": 0.17601114737266693,
|
| 450 |
-
"eval_loss": 24.240764617919922,
|
| 451 |
-
"eval_runtime": 97.5766,
|
| 452 |
-
"eval_samples_per_second": 99.368,
|
| 453 |
-
"eval_steps_per_second": 6.211,
|
| 454 |
-
"step": 600
|
| 455 |
-
},
|
| 456 |
-
{
|
| 457 |
-
"epoch": 0.17894466649554472,
|
| 458 |
-
"grad_norm": 59.573219299316406,
|
| 459 |
-
"learning_rate": 9.586228192064227e-07,
|
| 460 |
-
"loss": 213.2538,
|
| 461 |
-
"step": 610
|
| 462 |
-
},
|
| 463 |
-
{
|
| 464 |
-
"epoch": 0.1818781856184225,
|
| 465 |
-
"grad_norm": 113.46548461914062,
|
| 466 |
-
"learning_rate": 9.570788945499459e-07,
|
| 467 |
-
"loss": 218.0255,
|
| 468 |
-
"step": 620
|
| 469 |
-
},
|
| 470 |
-
{
|
| 471 |
-
"epoch": 0.1848117047413003,
|
| 472 |
-
"grad_norm": 119.12982177734375,
|
| 473 |
-
"learning_rate": 9.55534969893469e-07,
|
| 474 |
-
"loss": 216.9313,
|
| 475 |
-
"step": 630
|
| 476 |
-
},
|
| 477 |
-
{
|
| 478 |
-
"epoch": 0.18774522386417805,
|
| 479 |
-
"grad_norm": 54.008338928222656,
|
| 480 |
-
"learning_rate": 9.539910452369923e-07,
|
| 481 |
-
"loss": 220.8365,
|
| 482 |
-
"step": 640
|
| 483 |
-
},
|
| 484 |
-
{
|
| 485 |
-
"epoch": 0.19067874298705584,
|
| 486 |
-
"grad_norm": 59.56270217895508,
|
| 487 |
-
"learning_rate": 9.524471205805155e-07,
|
| 488 |
-
"loss": 218.8136,
|
| 489 |
-
"step": 650
|
| 490 |
-
},
|
| 491 |
-
{
|
| 492 |
-
"epoch": 0.19361226210993362,
|
| 493 |
-
"grad_norm": 52.067115783691406,
|
| 494 |
-
"learning_rate": 9.509031959240389e-07,
|
| 495 |
-
"loss": 220.6164,
|
| 496 |
-
"step": 660
|
| 497 |
-
},
|
| 498 |
-
{
|
| 499 |
-
"epoch": 0.1965457812328114,
|
| 500 |
-
"grad_norm": 60.61309051513672,
|
| 501 |
-
"learning_rate": 9.493592712675621e-07,
|
| 502 |
-
"loss": 217.9881,
|
| 503 |
-
"step": 670
|
| 504 |
-
},
|
| 505 |
-
{
|
| 506 |
-
"epoch": 0.1994793003556892,
|
| 507 |
-
"grad_norm": 49.88456726074219,
|
| 508 |
-
"learning_rate": 9.478153466110853e-07,
|
| 509 |
-
"loss": 217.0137,
|
| 510 |
-
"step": 680
|
| 511 |
-
},
|
| 512 |
-
{
|
| 513 |
-
"epoch": 0.20241281947856699,
|
| 514 |
-
"grad_norm": 49.28492736816406,
|
| 515 |
-
"learning_rate": 9.462714219546085e-07,
|
| 516 |
-
"loss": 212.2388,
|
| 517 |
-
"step": 690
|
| 518 |
-
},
|
| 519 |
-
{
|
| 520 |
-
"epoch": 0.20534633860144474,
|
| 521 |
-
"grad_norm": 55.44947814941406,
|
| 522 |
-
"learning_rate": 9.447274972981318e-07,
|
| 523 |
-
"loss": 221.7097,
|
| 524 |
-
"step": 700
|
| 525 |
-
},
|
| 526 |
-
{
|
| 527 |
-
"epoch": 0.20827985772432253,
|
| 528 |
-
"grad_norm": 47.7352409362793,
|
| 529 |
-
"learning_rate": 9.43183572641655e-07,
|
| 530 |
-
"loss": 217.8991,
|
| 531 |
-
"step": 710
|
| 532 |
-
},
|
| 533 |
-
{
|
| 534 |
-
"epoch": 0.21121337684720032,
|
| 535 |
-
"grad_norm": 56.91552734375,
|
| 536 |
-
"learning_rate": 9.416396479851782e-07,
|
| 537 |
-
"loss": 216.618,
|
| 538 |
-
"step": 720
|
| 539 |
-
},
|
| 540 |
-
{
|
| 541 |
-
"epoch": 0.2141468959700781,
|
| 542 |
-
"grad_norm": 50.68717575073242,
|
| 543 |
-
"learning_rate": 9.400957233287015e-07,
|
| 544 |
-
"loss": 217.7346,
|
| 545 |
-
"step": 730
|
| 546 |
-
},
|
| 547 |
-
{
|
| 548 |
-
"epoch": 0.2170804150929559,
|
| 549 |
-
"grad_norm": 75.52225494384766,
|
| 550 |
-
"learning_rate": 9.385517986722248e-07,
|
| 551 |
-
"loss": 215.9344,
|
| 552 |
-
"step": 740
|
| 553 |
-
},
|
| 554 |
-
{
|
| 555 |
-
"epoch": 0.22001393421583368,
|
| 556 |
-
"grad_norm": 74.4793472290039,
|
| 557 |
-
"learning_rate": 9.37007874015748e-07,
|
| 558 |
-
"loss": 222.193,
|
| 559 |
-
"step": 750
|
| 560 |
-
},
|
| 561 |
-
{
|
| 562 |
-
"epoch": 0.22294745333871147,
|
| 563 |
-
"grad_norm": 58.30630111694336,
|
| 564 |
-
"learning_rate": 9.354639493592712e-07,
|
| 565 |
-
"loss": 215.5639,
|
| 566 |
-
"step": 760
|
| 567 |
-
},
|
| 568 |
-
{
|
| 569 |
-
"epoch": 0.22588097246158922,
|
| 570 |
-
"grad_norm": 52.7680778503418,
|
| 571 |
-
"learning_rate": 9.339200247027944e-07,
|
| 572 |
-
"loss": 219.3169,
|
| 573 |
-
"step": 770
|
| 574 |
-
},
|
| 575 |
-
{
|
| 576 |
-
"epoch": 0.228814491584467,
|
| 577 |
-
"grad_norm": 51.10957717895508,
|
| 578 |
-
"learning_rate": 9.323761000463177e-07,
|
| 579 |
-
"loss": 213.7119,
|
| 580 |
-
"step": 780
|
| 581 |
-
},
|
| 582 |
-
{
|
| 583 |
-
"epoch": 0.2317480107073448,
|
| 584 |
-
"grad_norm": 96.71678161621094,
|
| 585 |
-
"learning_rate": 9.30832175389841e-07,
|
| 586 |
-
"loss": 216.2126,
|
| 587 |
-
"step": 790
|
| 588 |
-
},
|
| 589 |
-
{
|
| 590 |
-
"epoch": 0.23468152983022258,
|
| 591 |
-
"grad_norm": 59.496395111083984,
|
| 592 |
-
"learning_rate": 9.292882507333642e-07,
|
| 593 |
-
"loss": 220.2937,
|
| 594 |
-
"step": 800
|
| 595 |
-
},
|
| 596 |
-
{
|
| 597 |
-
"epoch": 0.23468152983022258,
|
| 598 |
-
"eval_loss": 24.050508499145508,
|
| 599 |
-
"eval_runtime": 98.6094,
|
| 600 |
-
"eval_samples_per_second": 98.327,
|
| 601 |
-
"eval_steps_per_second": 6.145,
|
| 602 |
-
"step": 800
|
| 603 |
-
},
|
| 604 |
-
{
|
| 605 |
-
"epoch": 0.23761504895310037,
|
| 606 |
-
"grad_norm": 115.57308959960938,
|
| 607 |
-
"learning_rate": 9.277443260768874e-07,
|
| 608 |
-
"loss": 214.0267,
|
| 609 |
-
"step": 810
|
| 610 |
-
},
|
| 611 |
-
{
|
| 612 |
-
"epoch": 0.24054856807597816,
|
| 613 |
-
"grad_norm": 58.29754638671875,
|
| 614 |
-
"learning_rate": 9.262004014204107e-07,
|
| 615 |
-
"loss": 219.2819,
|
| 616 |
-
"step": 820
|
| 617 |
-
},
|
| 618 |
-
{
|
| 619 |
-
"epoch": 0.24348208719885592,
|
| 620 |
-
"grad_norm": 137.19517517089844,
|
| 621 |
-
"learning_rate": 9.246564767639339e-07,
|
| 622 |
-
"loss": 217.8361,
|
| 623 |
-
"step": 830
|
| 624 |
-
},
|
| 625 |
-
{
|
| 626 |
-
"epoch": 0.2464156063217337,
|
| 627 |
-
"grad_norm": 62.34098434448242,
|
| 628 |
-
"learning_rate": 9.23112552107457e-07,
|
| 629 |
-
"loss": 217.4855,
|
| 630 |
-
"step": 840
|
| 631 |
-
},
|
| 632 |
-
{
|
| 633 |
-
"epoch": 0.2493491254446115,
|
| 634 |
-
"grad_norm": 57.445247650146484,
|
| 635 |
-
"learning_rate": 9.215686274509803e-07,
|
| 636 |
-
"loss": 217.8953,
|
| 637 |
-
"step": 850
|
| 638 |
-
},
|
| 639 |
-
{
|
| 640 |
-
"epoch": 0.2522826445674893,
|
| 641 |
-
"grad_norm": 61.09876251220703,
|
| 642 |
-
"learning_rate": 9.200247027945036e-07,
|
| 643 |
-
"loss": 215.2011,
|
| 644 |
-
"step": 860
|
| 645 |
-
},
|
| 646 |
-
{
|
| 647 |
-
"epoch": 0.25521616369036704,
|
| 648 |
-
"grad_norm": 59.176513671875,
|
| 649 |
-
"learning_rate": 9.184807781380268e-07,
|
| 650 |
-
"loss": 217.2304,
|
| 651 |
-
"step": 870
|
| 652 |
-
},
|
| 653 |
-
{
|
| 654 |
-
"epoch": 0.25814968281324485,
|
| 655 |
-
"grad_norm": 52.66059494018555,
|
| 656 |
-
"learning_rate": 9.1693685348155e-07,
|
| 657 |
-
"loss": 218.234,
|
| 658 |
-
"step": 880
|
| 659 |
-
},
|
| 660 |
-
{
|
| 661 |
-
"epoch": 0.2610832019361226,
|
| 662 |
-
"grad_norm": 98.39973449707031,
|
| 663 |
-
"learning_rate": 9.153929288250732e-07,
|
| 664 |
-
"loss": 214.297,
|
| 665 |
-
"step": 890
|
| 666 |
-
},
|
| 667 |
-
{
|
| 668 |
-
"epoch": 0.2640167210590004,
|
| 669 |
-
"grad_norm": 72.08065795898438,
|
| 670 |
-
"learning_rate": 9.138490041685965e-07,
|
| 671 |
-
"loss": 217.044,
|
| 672 |
-
"step": 900
|
| 673 |
-
},
|
| 674 |
-
{
|
| 675 |
-
"epoch": 0.2669502401818782,
|
| 676 |
-
"grad_norm": 59.712371826171875,
|
| 677 |
-
"learning_rate": 9.123050795121198e-07,
|
| 678 |
-
"loss": 215.2483,
|
| 679 |
-
"step": 910
|
| 680 |
-
},
|
| 681 |
-
{
|
| 682 |
-
"epoch": 0.26988375930475594,
|
| 683 |
-
"grad_norm": 64.43281555175781,
|
| 684 |
-
"learning_rate": 9.10761154855643e-07,
|
| 685 |
-
"loss": 211.9948,
|
| 686 |
-
"step": 920
|
| 687 |
-
},
|
| 688 |
-
{
|
| 689 |
-
"epoch": 0.27281727842763376,
|
| 690 |
-
"grad_norm": 61.78029251098633,
|
| 691 |
-
"learning_rate": 9.092172301991662e-07,
|
| 692 |
-
"loss": 217.2441,
|
| 693 |
-
"step": 930
|
| 694 |
-
},
|
| 695 |
-
{
|
| 696 |
-
"epoch": 0.2757507975505115,
|
| 697 |
-
"grad_norm": 68.14164733886719,
|
| 698 |
-
"learning_rate": 9.076733055426895e-07,
|
| 699 |
-
"loss": 214.7014,
|
| 700 |
-
"step": 940
|
| 701 |
-
},
|
| 702 |
-
{
|
| 703 |
-
"epoch": 0.27868431667338933,
|
| 704 |
-
"grad_norm": 61.65287399291992,
|
| 705 |
-
"learning_rate": 9.061293808862127e-07,
|
| 706 |
-
"loss": 212.859,
|
| 707 |
-
"step": 950
|
| 708 |
-
},
|
| 709 |
-
{
|
| 710 |
-
"epoch": 0.2816178357962671,
|
| 711 |
-
"grad_norm": 64.0514144897461,
|
| 712 |
-
"learning_rate": 9.045854562297359e-07,
|
| 713 |
-
"loss": 217.1946,
|
| 714 |
-
"step": 960
|
| 715 |
-
},
|
| 716 |
-
{
|
| 717 |
-
"epoch": 0.2845513549191449,
|
| 718 |
-
"grad_norm": 91.87364959716797,
|
| 719 |
-
"learning_rate": 9.030415315732592e-07,
|
| 720 |
-
"loss": 215.7542,
|
| 721 |
-
"step": 970
|
| 722 |
-
},
|
| 723 |
-
{
|
| 724 |
-
"epoch": 0.28748487404202266,
|
| 725 |
-
"grad_norm": 54.730316162109375,
|
| 726 |
-
"learning_rate": 9.014976069167825e-07,
|
| 727 |
-
"loss": 218.0408,
|
| 728 |
-
"step": 980
|
| 729 |
-
},
|
| 730 |
-
{
|
| 731 |
-
"epoch": 0.2904183931649004,
|
| 732 |
-
"grad_norm": 56.43712615966797,
|
| 733 |
-
"learning_rate": 8.999536822603057e-07,
|
| 734 |
-
"loss": 212.8671,
|
| 735 |
-
"step": 990
|
| 736 |
-
},
|
| 737 |
-
{
|
| 738 |
-
"epoch": 0.29335191228777824,
|
| 739 |
-
"grad_norm": 59.28590393066406,
|
| 740 |
-
"learning_rate": 8.984097576038289e-07,
|
| 741 |
-
"loss": 215.822,
|
| 742 |
-
"step": 1000
|
| 743 |
-
},
|
| 744 |
-
{
|
| 745 |
-
"epoch": 0.29335191228777824,
|
| 746 |
-
"eval_loss": 23.851858139038086,
|
| 747 |
-
"eval_runtime": 96.4448,
|
| 748 |
-
"eval_samples_per_second": 100.534,
|
| 749 |
-
"eval_steps_per_second": 6.283,
|
| 750 |
-
"step": 1000
|
| 751 |
-
}
|
| 752 |
-
],
|
| 753 |
-
"logging_steps": 10,
|
| 754 |
-
"max_steps": 6818,
|
| 755 |
-
"num_input_tokens_seen": 0,
|
| 756 |
-
"num_train_epochs": 2,
|
| 757 |
-
"save_steps": 200,
|
| 758 |
-
"stateful_callbacks": {
|
| 759 |
-
"TrainerControl": {
|
| 760 |
-
"args": {
|
| 761 |
-
"should_epoch_stop": false,
|
| 762 |
-
"should_evaluate": false,
|
| 763 |
-
"should_log": false,
|
| 764 |
-
"should_save": true,
|
| 765 |
-
"should_training_stop": false
|
| 766 |
-
},
|
| 767 |
-
"attributes": {}
|
| 768 |
-
}
|
| 769 |
-
},
|
| 770 |
-
"total_flos": 3.503007404654592e+17,
|
| 771 |
-
"train_batch_size": 8,
|
| 772 |
-
"trial_name": null,
|
| 773 |
-
"trial_params": null
|
| 774 |
-
}
|
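The deleted `trainer_state.json` interleaves training entries (`loss`, `grad_norm`, `learning_rate`) with evaluation entries (`eval_loss`) that share a `step`. As a rough sketch of how such a log could be inspected, the Python below splits a handful of entries copied from the diff above into the two kinds and checks that the logged learning rate drops by a constant amount per step. The `split_log` helper and the variable names are illustrative assumptions, not part of the repository.

```python
import json

# A few entries copied verbatim from the deleted trainer_state.json above.
LOG_JSON = """
[
  {"epoch": 0.17307762824978914, "grad_norm": 53.61656951904297,
   "learning_rate": 9.617106685193762e-07, "loss": 219.2269, "step": 590},
  {"epoch": 0.17601114737266693, "grad_norm": 52.75293731689453,
   "learning_rate": 9.601667438628995e-07, "loss": 216.867, "step": 600},
  {"epoch": 0.17601114737266693, "eval_loss": 24.240764617919922,
   "eval_runtime": 97.5766, "eval_samples_per_second": 99.368,
   "eval_steps_per_second": 6.211, "step": 600},
  {"epoch": 0.17894466649554472, "grad_norm": 59.573219299316406,
   "learning_rate": 9.586228192064227e-07, "loss": 213.2538, "step": 610},
  {"epoch": 0.23468152983022258, "eval_loss": 24.050508499145508,
   "eval_runtime": 98.6094, "eval_samples_per_second": 98.327,
   "eval_steps_per_second": 6.145, "step": 800},
  {"epoch": 0.29335191228777824, "eval_loss": 23.851858139038086,
   "eval_runtime": 96.4448, "eval_samples_per_second": 100.534,
   "eval_steps_per_second": 6.283, "step": 1000}
]
"""


def split_log(entries):
    """Separate training entries (key 'loss') from eval entries (key 'eval_loss')."""
    train = [e for e in entries if "loss" in e]
    evals = [e for e in entries if "eval_loss" in e]
    return train, evals


train_entries, eval_entries = split_log(json.loads(LOG_JSON))

# (step, eval_loss) pairs in logged order.
eval_curve = [(e["step"], e["eval_loss"]) for e in eval_entries]

# Learning-rate decrease per optimizer step between consecutive training entries.
lr_slopes = [
    (a["learning_rate"] - b["learning_rate"]) / (b["step"] - a["step"])
    for a, b in zip(train_entries, train_entries[1:])
]

for step, loss in eval_curve:
    print(f"step {step}: eval_loss {loss:.4f}")
print("lr decrease per step:", lr_slopes)
```

On these entries the evaluation loss falls slightly at each of the three checkpoints, and the per-step learning-rate decrease is the same for both intervals, which is consistent with a linear decay schedule; the schedule type itself is not stated in the portion of the file shown here.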
lora/lora-stage1/vocab.json
DELETED
|
The diff for this file is too large to render.
See raw diff
|
|
|
lora/lora-stage2/README.md
DELETED
|
@@ -1,207 +0,0 @@
----
-base_model: ''
-library_name: peft
-pipeline_tag: text-generation
-tags:
-- 'base_model:adapter:'
-- lora
-- transformers
----
-
-# Model Card for Model ID
-
-<!-- Provide a quick summary of what the model is/does. -->
-
-
-
-## Model Details
-
-### Model Description
-
-<!-- Provide a longer summary of what this model is. -->
-
-
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-
-### Model Sources [optional]
-
-<!-- Provide the basic links for the model. -->
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-
-## Uses
-
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
-### Direct Use
-
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
-[More Information Needed]
-
-### Downstream Use [optional]
-
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
-[More Information Needed]
-
-### Out-of-Scope Use
-
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
-[More Information Needed]
-
-## Bias, Risks, and Limitations
-
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-[More Information Needed]
-
-### Recommendations
-
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-[More Information Needed]
-
-## Training Details
-
-### Training Data
-
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
-
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-#### Preprocessing [optional]
-
-[More Information Needed]
-
-
-#### Training Hyperparameters
-
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-[More Information Needed]
-
-## Evaluation
-
-<!-- This section describes the evaluation protocols and provides the results. -->
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
-
-<!-- This should link to a Dataset Card if possible. -->
-
-[More Information Needed]
-
-#### Factors
-
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
-
-#### Metrics
-
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
-[More Information Needed]
-
-### Results
-
-[More Information Needed]
-
-#### Summary
-
-
-
-## Model Examination [optional]
-
-<!-- Relevant interpretability work for the model goes here -->
-
-[More Information Needed]
-
-## Environmental Impact
-
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-
-## Technical Specifications [optional]
-
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed]
-
-#### Software
-
-[More Information Needed]
-
-## Citation [optional]
-
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
-**BibTeX:**
-
-[More Information Needed]
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
-### Framework versions
-
-- PEFT 0.18.1
lora/lora-stage2/adapter_config.json
DELETED
|
@@ -1,38 +0,0 @@
-{
-  "alora_invocation_tokens": null,
-  "alpha_pattern": {},
-  "arrow_config": null,
-  "auto_mapping": null,
-  "base_model_name_or_path": "",
-  "bias": "none",
-  "corda_config": null,
-  "ensure_weight_tying": false,
-  "eva_config": null,
-  "exclude_modules": null,
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 16,
-  "lora_bias": false,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "peft_version": "0.18.1",
-  "qalora_group_size": 16,
-  "r": 8,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": "^(audio_tower\\.(conv_out|proj1|proj2)$|audio_tower\\.layers\\.\\d+\\..*\\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$|model\\.layers\\.\\d+\\..*\\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)$)",
-  "target_parameters": null,
-  "task_type": "CAUSAL_LM",
-  "trainable_token_indices": null,
-  "use_dora": false,
-  "use_qalora": false,
-  "use_rslora": false
-}
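The `target_modules` value in this adapter config is a single regular expression rather than a list of names. PEFT documents that a string value is matched as a regex against module names; the sketch below assumes a full-string match (`re.fullmatch`) and runs the pattern over a few hypothetical module names to show which ones it would select. The candidate names are invented for illustration and are not taken from the model. It also computes the usual LoRA scaling factor `lora_alpha / r` from the values in the config, which applies here because `use_rslora` is `false`.

```python
import re

# target_modules from adapter_config.json above, with the JSON escaping (\\.)
# undone so the pattern can be written as Python raw strings.
TARGET_MODULES = (
    r"^(audio_tower\.(conv_out|proj1|proj2)$"
    r"|audio_tower\.layers\.\d+\..*\.(q_proj|k_proj|v_proj|out_proj|fc1|fc2)$"
    r"|model\.layers\.\d+\..*\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)$)"
)

# Values copied from the same config file.
LORA_ALPHA = 16
LORA_R = 8


def is_targeted(module_name):
    """Return True if the pattern matches the whole module name (assumed matching rule)."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Hypothetical module names, chosen only to exercise each branch of the pattern.
CANDIDATES = [
    "audio_tower.conv_out",
    "audio_tower.layers.3.self_attn.out_proj",
    "model.layers.0.self_attn.q_proj",
    "model.layers.12.mlp.down_proj",
    "model.embed_tokens",
    "lm_head",
]

selected = [name for name in CANDIDATES if is_targeted(name)]
scaling = LORA_ALPHA / LORA_R  # standard LoRA scaling when use_rslora is false

print("selected:", selected)
print("scaling:", scaling)
```

Under these assumptions the pattern picks up the attention and MLP projections in both the audio tower and the text decoder, and leaves the embedding and output head untouched.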
lora/lora-stage2/adapter_model.safetensors
DELETED
|
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:dd4baa6a45645b280fdddb3c722186d149b5f64daab687300dba0c08373e3962
-size 41677888
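The three deleted lines are a Git LFS pointer, not the adapter weights themselves: the repository stores only the spec version, a SHA-256 object id, and the byte size of the real file. A minimal sketch of reading such a pointer in Python follows; the `parse_lfs_pointer` helper is a hypothetical name introduced here for illustration.

```python
# Pointer text copied from the deleted adapter_model.safetensors entry above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:dd4baa6a45645b280fdddb3c722186d149b5f64daab687300dba0c08373e3962
size 41677888
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["digest"][:12], pointer["size_bytes"])
```

The `size` field shows that the stage-2 adapter weights occupy 41,677,888 bytes, which is the order of magnitude one would expect for rank-8 LoRA matrices rather than a full model checkpoint.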
lora/lora-stage2/added_tokens.json
DELETED
|
@@ -1,64 +0,0 @@
-{
-  "</think>": 151668,
-  "</tool_call>": 151658,
-  "</tool_response>": 151666,
-  "<asr_text>": 151704,
-  "<blank10>": 151686,
-  "<blank11>": 151687,
-  "<blank12>": 151688,
-  "<blank13>": 151689,
-  "<blank14>": 151690,
-  "<blank15>": 151691,
-  "<blank16>": 151692,
-  "<blank17>": 151693,
-  "<blank18>": 151694,
-  "<blank19>": 151695,
-  "<blank1>": 151677,
-  "<blank20>": 151696,
-  "<blank21>": 151697,
-  "<blank22>": 151698,
-  "<blank23>": 151699,
-  "<blank24>": 151700,
-  "<blank25>": 151701,
-  "<blank26>": 151702,
-  "<blank27>": 151703,
-  "<blank2>": 151678,
-  "<blank3>": 151679,
-  "<blank4>": 151680,
-  "<blank5>": 151681,
-  "<blank6>": 151682,
-  "<blank7>": 151683,
-  "<blank8>": 151684,
-  "<blank9>": 151685,
-  "<non_speech>": 151675,
-  "<think>": 151667,
-  "<tool_call>": 151657,
-  "<tool_response>": 151665,
-  "<tts_pad>": 151671,
-  "<tts_text_bos>": 151672,
-  "<tts_text_bos_single>": 151674,
-  "<tts_text_eod>": 151673,
-  "<|audio_end|>": 151670,
-  "<|audio_pad|>": 151676,
-  "<|audio_start|>": 151669,
-  "<|box_end|>": 151649,
-  "<|box_start|>": 151648,
-  "<|endoftext|>": 151643,
-  "<|file_sep|>": 151664,
-  "<|fim_middle|>": 151660,
-  "<|fim_pad|>": 151662,
-  "<|fim_prefix|>": 151659,
-  "<|fim_suffix|>": 151661,
-  "<|im_end|>": 151645,
-  "<|im_start|>": 151644,
-  "<|image_pad|>": 151655,
-  "<|object_ref_end|>": 151647,
-  "<|object_ref_start|>": 151646,
-  "<|quad_end|>": 151651,
-  "<|quad_start|>": 151650,
-  "<|repo_name|>": 151663,
-  "<|video_pad|>": 151656,
-  "<|vision_end|>": 151653,
-  "<|vision_pad|>": 151654,
-  "<|vision_start|>": 151652
-}
lora/lora-stage2/base_model.txt
DELETED
|
@@ -1 +0,0 @@
-/data/haobin/pky_train/qwen3/Qwen3-ASR-1.7B
lora/lora-stage2/chat_template.jinja
DELETED
|
@@ -1,31 +0,0 @@
-{%- set ns = namespace(system_text="") -%}
-{%- for m in messages -%}
-    {%- if m.role == 'system' -%}
-        {%- if m.content is string -%}
-            {%- set ns.system_text = ns.system_text + m.content -%}
-        {%- else -%}
-            {%- for c in m.content -%}
-                {%- if c.type == 'text' and (c.text is defined) -%}
-                    {%- set ns.system_text = ns.system_text + c.text -%}
-                {%- endif -%}
-            {%- endfor -%}
-        {%- endif -%}
-    {%- endif -%}
-{%- endfor -%}
-
-{%- set ns2 = namespace(audio_tokens="") -%}
-{%- for m in messages -%}
-    {%- if m.content is not string -%}
-        {%- for c in m.content -%}
-            {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}
-                {%- set ns2.audio_tokens = ns2.audio_tokens + "<|audio_start|><|audio_pad|><|audio_end|>" -%}
-            {%- endif -%}
-        {%- endfor -%}
-    {%- endif -%}
-{%- endfor -%}
-
-{{- '<|im_start|>system\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\n' -}}
-{{- '<|im_start|>user\n' + ns2.audio_tokens + '<|im_end|>\n' -}}
-{%- if add_generation_prompt -%}
-{{- '<|im_start|>assistant\n' -}}
-{%- endif -%}
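The chat template above builds a fixed three-part prompt: all system text is concatenated into one system turn, every audio part in any message contributes one `<|audio_start|><|audio_pad|><|audio_end|>` placeholder to a single user turn, and text parts outside system messages are dropped. The Python below is a plain re-implementation of that logic for illustration only; it does not use Jinja, and the `render_asr_prompt` name and the sample messages are assumptions introduced here.

```python
AUDIO_PLACEHOLDER = "<|audio_start|><|audio_pad|><|audio_end|>"


def render_asr_prompt(messages, add_generation_prompt=False):
    """Plain-Python paraphrase of the deleted chat_template.jinja."""
    system_text = ""
    audio_tokens = ""

    # First pass: collect text from system messages (string or list-of-parts content).
    for m in messages:
        if m["role"] != "system":
            continue
        content = m["content"]
        if isinstance(content, str):
            system_text += content
        else:
            for c in content:
                if c.get("type") == "text" and "text" in c:
                    system_text += c["text"]

    # Second pass: one placeholder per audio part, regardless of message role.
    for m in messages:
        content = m["content"]
        if isinstance(content, str):
            continue
        for c in content:
            if c.get("type") == "audio" or "audio" in c or "audio_url" in c:
                audio_tokens += AUDIO_PLACEHOLDER

    prompt = "<|im_start|>system\n" + system_text + "<|im_end|>\n"
    prompt += "<|im_start|>user\n" + audio_tokens + "<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


# Hypothetical conversation: one system instruction, one audio clip, one user text part.
messages = [
    {"role": "system", "content": "Transcribe the audio."},
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "clip.wav"},
            {"type": "text", "text": "please be brief"},
        ],
    },
]

prompt = render_asr_prompt(messages, add_generation_prompt=True)
print(prompt)
```

In this paraphrase, as in the template, the user's text part never reaches the prompt; only the system text and the audio placeholders do.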
lora/lora-stage2/chat_template.json
DELETED
|
@@ -1 +0,0 @@
-{"chat_template": "{%- set ns = namespace(system_text=\"\") -%}\n{%- for m in messages -%}\n    {%- if m.role == 'system' -%}\n        {%- if m.content is string -%}\n            {%- set ns.system_text = ns.system_text + m.content -%}\n        {%- else -%}\n            {%- for c in m.content -%}\n                {%- if c.type == 'text' and (c.text is defined) -%}\n                    {%- set ns.system_text = ns.system_text + c.text -%}\n                {%- endif -%}\n            {%- endfor -%}\n        {%- endif -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{%- set ns2 = namespace(audio_tokens=\"\") -%}\n{%- for m in messages -%}\n    {%- if m.content is not string -%}\n        {%- for c in m.content -%}\n            {%- if c.type == 'audio' or ('audio' in c) or ('audio_url' in c) -%}\n                {%- set ns2.audio_tokens = ns2.audio_tokens + \"<|audio_start|><|audio_pad|><|audio_end|>\" -%}\n            {%- endif -%}\n        {%- endfor -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{{- '<|im_start|>system\\n' + (ns.system_text if ns.system_text is string else '') + '<|im_end|>\\n' -}}\n{{- '<|im_start|>user\\n' + ns2.audio_tokens + '<|im_end|>\\n' -}}\n{%- if add_generation_prompt -%}\n{{- '<|im_start|>assistant\\n' -}}\n{%- endif -%}"}
lora/lora-stage2/config.json
DELETED
|
@@ -1,221 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"architectures": [
|
| 3 |
-
"Qwen3ASRForConditionalGeneration"
|
| 4 |
-
],
|
| 5 |
-
"model_type": "qwen3_asr",
|
| 6 |
-
"support_languages": [
|
| 7 |
-
"Chinese",
|
| 8 |
-
"English",
|
| 9 |
-
"Cantonese",
|
| 10 |
-
"Arabic",
|
| 11 |
-
"German",
|
| 12 |
-
"French",
|
| 13 |
-
"Spanish",
|
| 14 |
-
"Portuguese",
|
| 15 |
-
"Indonesian",
|
| 16 |
-
"Italian",
|
| 17 |
-
"Korean",
|
| 18 |
-
"Russian",
|
| 19 |
-
"Thai",
|
| 20 |
-
"Vietnamese",
|
| 21 |
-
"Japanese",
|
| 22 |
-
"Turkish",
|
| 23 |
-
"Hindi",
|
| 24 |
-
"Malay",
|
| 25 |
-
"Dutch",
|
| 26 |
-
"Swedish",
|
| 27 |
-
"Danish",
|
| 28 |
-
"Finnish",
|
| 29 |
-
"Polish",
|
| 30 |
-
"Czech",
|
| 31 |
-
"Filipino",
|
| 32 |
-
"Persian",
|
| 33 |
-
"Greek",
|
| 34 |
-
"Romanian",
|
| 35 |
-
"Hungarian",
|
| 36 |
-
"Macedonian"
|
| 37 |
-
],
|
| 38 |
-
"thinker_config": {
|
| 39 |
-
"model_type": "qwen3_asr",
|
| 40 |
-
"architectures": [
|
| 41 |
-
"Qwen3ASRForConditionalGeneration"
|
| 42 |
-
],
|
| 43 |
-
"audio_config": {
|
| 44 |
-
"_name_or_path": "",
|
| 45 |
-
"activation_dropout": 0,
|
| 46 |
-
"activation_function": "gelu",
|
| 47 |
-
"add_cross_attention": false,
|
| 48 |
-
"architectures": null,
|
| 49 |
-
"attention_dropout": 0,
|
| 50 |
-
"bad_words_ids": null,
|
| 51 |
-
"begin_suppress_tokens": null,
|
| 52 |
-
"bos_token_id": null,
|
| 53 |
-
"chunk_size_feed_forward": 0,
|
| 54 |
-
"conv_chunksize": 500,
|
| 55 |
-
"cross_attention_hidden_size": null,
|
| 56 |
-
"d_model": 1024,
|
| 57 |
-
"decoder_start_token_id": null,
|
| 58 |
-
"diversity_penalty": 0.0,
|
| 59 |
-
"do_sample": false,
|
| 60 |
-
"downsample_hidden_size": 480,
|
| 61 |
-
"dropout": 0,
|
| 62 |
-
"dtype": null,
|
| 63 |
-
"early_stopping": false,
|
| 64 |
-
"encoder_attention_heads": 16,
|
| 65 |
-
"encoder_ffn_dim": 4096,
|
| 66 |
-
"encoder_layers": 24,
|
| 67 |
-
"encoder_no_repeat_ngram_size": 0,
|
| 68 |
-
"eos_token_id": null,
|
| 69 |
-
"exponential_decay_length_penalty": null,
|
| 70 |
-
"finetuning_task": null,
|
| 71 |
-
"forced_bos_token_id": null,
|
| 72 |
-
"forced_eos_token_id": null,
|
| 73 |
-
"id2label": {
|
| 74 |
-
"0": "LABEL_0",
|
| 75 |
-
"1": "LABEL_1"
|
| 76 |
-
},
|
| 77 |
-
"initializer_range": 0.02,
|
| 78 |
-
"is_decoder": false,
|
| 79 |
-
"is_encoder_decoder": false,
|
| 80 |
-
"label2id": {
|
| 81 |
-
"LABEL_0": 0,
|
| 82 |
-
"LABEL_1": 1
|
| 83 |
-
},
|
| 84 |
-
"length_penalty": 1.0,
|
| 85 |
-
"max_length": 20,
|
| 86 |
-
"max_source_positions": 1500,
|
| 87 |
-
"min_length": 0,
|
| 88 |
-
"model_type": "qwen3_asr_audio_encoder",
|
| 89 |
-
"n_window": 50,
|
| 90 |
-
"n_window_infer": 800,
|
| 91 |
-
"no_repeat_ngram_size": 0,
|
| 92 |
-
"num_beam_groups": 1,
|
| 93 |
-
"num_beams": 1,
|
| 94 |
-
"num_hidden_layers": 24,
|
| 95 |
-
"num_mel_bins": 128,
|
| 96 |
-
"num_return_sequences": 1,
|
| 97 |
-
"output_attentions": false,
|
| 98 |
-
"output_dim": 2048,
|
| 99 |
-
"output_hidden_states": false,
|
| 100 |
-
"output_scores": false,
|
| 101 |
-
"pad_token_id": null,
|
| 102 |
-
"prefix": null,
|
| 103 |
-
"problem_type": null,
|
| 104 |
-
"pruned_heads": {},
|
| 105 |
-
"remove_invalid_values": false,
|
| 106 |
-
"repetition_penalty": 1.0,
|
| 107 |
-
"return_dict": true,
|
| 108 |
-
"return_dict_in_generate": false,
|
| 109 |
-
"scale_embedding": false,
|
| 110 |
-
"sep_token_id": null,
|
| 111 |
-
"suppress_tokens": null,
|
| 112 |
-
"task_specific_params": null,
|
| 113 |
-
"temperature": 1.0,
|
| 114 |
-
"tf_legacy_loss": false,
|
| 115 |
-
"tie_encoder_decoder": false,
|
| 116 |
-
"tie_word_embeddings": true,
|
| 117 |
-
"tokenizer_class": null,
|
| 118 |
-
"top_k": 50,
|
| 119 |
-
"top_p": 1.0,
|
| 120 |
-
"torchscript": false,
|
| 121 |
-
"typical_p": 1.0,
|
| 122 |
-
"use_bfloat16": false
|
| 123 |
-
},
|
| 124 |
-
"audio_end_token_id": 151670,
|
| 125 |
-
"audio_start_token_id": 151669,
|
| 126 |
-
"audio_token_id": 151676,
|
| 127 |
-
"dtype": "bfloat16",
|
| 128 |
-
"initializer_range": 0.02,
|
| 129 |
-
"text_config": {
|
| 130 |
-
"_name_or_path": "",
|
| 131 |
-
"add_cross_attention": false,
|
| 132 |
-
"architectures": null,
|
| 133 |
-
"attention_bias": false,
|
| 134 |
-
"attention_dropout": 0.0,
|
| 135 |
-
"bad_words_ids": null,
|
| 136 |
-
"begin_suppress_tokens": null,
|
| 137 |
-
"bos_token_id": null,
|
| 138 |
-
"chunk_size_feed_forward": 0,
|
| 139 |
-
"cross_attention_hidden_size": null,
|
| 140 |
-
"decoder_start_token_id": null,
|
| 141 |
-
"diversity_penalty": 0.0,
|
| 142 |
-
"do_sample": false,
|
| 143 |
-
"dtype": null,
|
| 144 |
-
"early_stopping": false,
|
| 145 |
-
"encoder_no_repeat_ngram_size": 0,
|
| 146 |
-
"eos_token_id": null,
|
| 147 |
-
"exponential_decay_length_penalty": null,
|
| 148 |
-
"finetuning_task": null,
|
| 149 |
-
"forced_bos_token_id": null,
|
| 150 |
-
"forced_eos_token_id": null,
|
| 151 |
-
"head_dim": 128,
|
| 152 |
-
"hidden_act": "silu",
|
| 153 |
-
"hidden_size": 2048,
|
| 154 |
-
"id2label": {
|
| 155 |
-
"0": "LABEL_0",
|
| 156 |
-
"1": "LABEL_1"
|
| 157 |
-
},
|
| 158 |
-
"initializer_range": 0.02,
|
| 159 |
-
"intermediate_size": 6144,
|
| 160 |
-
"is_decoder": false,
|
| 161 |
-
"is_encoder_decoder": false,
|
| 162 |
-
"label2id": {
|
| 163 |
-
"LABEL_0": 0,
|
| 164 |
-
"LABEL_1": 1
|
| 165 |
-
},
|
| 166 |
-
"length_penalty": 1.0,
|
| 167 |
-
"max_length": 20,
|
| 168 |
-
"max_position_embeddings": 65536,
|
| 169 |
-
"min_length": 0,
|
| 170 |
-
"model_type": "qwen3",
|
| 171 |
-
"no_repeat_ngram_size": 0,
|
| 172 |
-
"num_attention_heads": 16,
|
| 173 |
-
"num_beam_groups": 1,
|
| 174 |
-
"num_beams": 1,
|
| 175 |
-
"num_hidden_layers": 28,
|
| 176 |
-
"num_key_value_heads": 8,
|
| 177 |
-
"num_return_sequences": 1,
|
| 178 |
-
"output_attentions": false,
|
| 179 |
-
"output_hidden_states": false,
|
| 180 |
-
"output_scores": false,
|
| 181 |
-
"pad_token_id": null,
|
| 182 |
-
"prefix": null,
|
| 183 |
-
"problem_type": null,
|
| 184 |
-
"pruned_heads": {},
|
| 185 |
-
"remove_invalid_values": false,
|
| 186 |
-
"repetition_penalty": 1.0,
|
| 187 |
-
"return_dict": true,
|
| 188 |
-
"return_dict_in_generate": false,
|
| 189 |
-
"rms_norm_eps": 1e-06,
|
| 190 |
-
"rope_scaling": {
|
| 191 |
-
"interleaved": true,
|
| 192 |
-
"mrope_interleaved": true,
|
| 193 |
-
"mrope_section": [
|
| 194 |
-
24,
|
| 195 |
-
20,
|
| 196 |
-
20
|
| 197 |
-
],
|
| 198 |
-
"rope_type": "default",
|
| 199 |
-
"type": "default"
|
| 200 |
-
},
|
| 201 |
-
"rope_theta": 1000000,
|
| 202 |
-
"sep_token_id": null,
|
| 203 |
-
"suppress_tokens": null,
|
| 204 |
-
"task_specific_params": null,
|
| 205 |
-
"temperature": 1.0,
|
| 206 |
-
"tf_legacy_loss": false,
|
| 207 |
-
"tie_encoder_decoder": false,
|
| 208 |
-
"tie_word_embeddings": true,
|
| 209 |
-
"tokenizer_class": null,
|
| 210 |
-
"top_k": 50,
|
| 211 |
-
"top_p": 1.0,
|
| 212 |
-
"torchscript": false,
|
| 213 |
-
"typical_p": 1.0,
|
| 214 |
-
"use_bfloat16": false,
|
| 215 |
-
"use_cache": true,
|
| 216 |
-
"vocab_size": 151936
|
| 217 |
-
}
|
| 218 |
-
},
|
| 219 |
-
"transformers_version": "4.57.6"
|
| 220 |
-
}
|
| 221 |
-
|
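The `text_config` values in the deleted config.json imply grouped-query attention shapes that are easy to check by hand. The sketch below only does arithmetic on numbers copied from that file; the helper name `attention_shapes` is hypothetical, and reading the `mrope_section` sum as "half the head dimension" is an observation about these numbers, not a statement about how the model code consumes them.

```python
# Values copied from the deleted lora/lora-stage2/config.json (text_config).
text_config = {
    "hidden_size": 2048,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "rope_scaling": {"mrope_section": [24, 20, 20]},
}


def attention_shapes(cfg):
    """Derive projection widths and the query/KV grouping (illustrative helper)."""
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    return {
        "q_dim": q_dim,
        "kv_dim": kv_dim,
        "queries_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        "mrope_total": sum(cfg["rope_scaling"]["mrope_section"]),
    }


shapes = attention_shapes(text_config)
print(shapes)
# The query width equals hidden_size, and the mrope sections sum to head_dim // 2.
print(shapes["q_dim"] == text_config["hidden_size"],
      shapes["mrope_total"] == text_config["head_dim"] // 2)
```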
lora/lora-stage2/generation_config.json
DELETED
|
@@ -1,7 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"_from_model_config": true,
|
| 3 |
-
"eos_token_id": [151643,151645],
|
| 4 |
-
"pad_token_id": 151643,
|
| 5 |
-
"do_sample": false,
|
| 6 |
-
"temperature": 0.000001
|
| 7 |
-
}
|
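The deleted generation_config.json sets `do_sample` to false and lists two end-of-sequence ids; per the tokenizer_config.json in this same diff, 151643 is `<|endoftext|>` and 151645 is `<|im_end|>`. A toy sketch of greedy decoding with those stop ids is below. The function `greedy_decode` and the per-step score dictionaries are hypothetical stand-ins for a real model, not the library's implementation.

```python
# Values copied from the deleted lora/lora-stage2/generation_config.json.
generation_config = {
    "eos_token_id": [151643, 151645],
    "pad_token_id": 151643,
    "do_sample": False,
    "temperature": 0.000001,
}


def greedy_decode(step_scores, cfg):
    """Pick the highest-scoring token at each step and stop at any EOS id.

    `step_scores` is a list of {token_id: score} dicts standing in for model output.
    """
    eos_ids = set(cfg["eos_token_id"])
    output = []
    for scores in step_scores:
        token = max(scores, key=scores.get)  # argmax over candidate tokens
        if token in eos_ids:
            break
        output.append(token)
    return output


steps = [{10: 0.1, 20: 0.9}, {151645: 2.0, 10: 1.0}, {10: 5.0}]
print(greedy_decode(steps, generation_config))
```

Because argmax is unchanged by dividing every score by the same positive temperature, the tiny `temperature` value does not change which token the greedy step above selects.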
lora/lora-stage2/merged_from_lora.txt
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
/data/haobin/pky_train/qwen3/out_qwen3-asr-lora-0317_550000_wer3_towerb4+proj_2gpu_bs128/checkpoint-1000
|
|
|
|
|
|
lora/lora-stage2/merges.txt
DELETED
|
The diff for this file is too large to render.
See raw diff
|
|
|
lora/lora-stage2/optimizer.pt
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:b26cb33d8c7aefdee4dcd88af58551b5a01e16c9f852a1c1ffb0d1a47e6421b4
|
| 3 |
-
size 83695117
|
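Several deleted files in this diff (optimizer.pt, the rng_state files, scheduler.pt, tokenizer.json) appear only as three-line Git LFS pointers rather than their real contents. A small sketch for reading such a pointer is below; `parse_lfs_pointer` is a hypothetical helper, and the sample text is the optimizer.pt pointer shown above.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash and size fields (illustrative)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:b26cb33d8c7aefdee4dcd88af58551b5a01e16c9f852a1c1ffb0d1a47e6421b4
size 83695117
"""
info = parse_lfs_pointer(pointer_text)
print(info["hash_algo"], info["size_bytes"])
```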
lora/lora-stage2/preprocessor_config.json
DELETED
|
@@ -1,14 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"chunk_length": 30,
|
| 3 |
-
"dither": 0.0,
|
| 4 |
-
"feature_extractor_type": "WhisperFeatureExtractor",
|
| 5 |
-
"feature_size": 128,
|
| 6 |
-
"hop_length": 160,
|
| 7 |
-
"n_fft": 400,
|
| 8 |
-
"n_samples": 480000,
|
| 9 |
-
"nb_max_frames": 3000,
|
| 10 |
-
"padding_side": "right",
|
| 11 |
-
"padding_value": 0.0,
|
| 12 |
-
"processor_class": "Qwen3ASRProcessor",
|
| 13 |
-
"return_attention_mask": true
|
| 14 |
-
}
|
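The numbers in the deleted preprocessor_config.json are mutually consistent, which a few lines of arithmetic can confirm. The file as shown does not list a sampling rate, so the sketch below infers one from `n_samples / chunk_length`; that inference, the helper name `derived_audio_params`, and treating `n_fft` as the analysis window length are assumptions made for illustration.

```python
# Values copied from the deleted lora/lora-stage2/preprocessor_config.json.
preprocessor_config = {
    "chunk_length": 30,      # seconds per chunk
    "hop_length": 160,       # samples between frames
    "n_fft": 400,            # samples per analysis window (assumed)
    "n_samples": 480000,     # samples per chunk
    "nb_max_frames": 3000,
    "feature_size": 128,     # mel bins
}


def derived_audio_params(cfg):
    """Recover quantities implied by the config (sampling rate is inferred, not stated)."""
    sample_rate = cfg["n_samples"] // cfg["chunk_length"]
    return {
        "implied_sample_rate_hz": sample_rate,
        "frames_per_chunk": cfg["n_samples"] // cfg["hop_length"],
        "hop_ms": 1000 * cfg["hop_length"] / sample_rate,
        "window_ms": 1000 * cfg["n_fft"] / sample_rate,
    }


params = derived_audio_params(preprocessor_config)
print(params)
# frames_per_chunk should agree with the nb_max_frames value stored in the file.
print(params["frames_per_chunk"] == preprocessor_config["nb_max_frames"])
```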
lora/lora-stage2/rng_state_0.pth
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:de015da1ba6a4dc8cf66420b3b9b378bc07585bfb14a0c37fb50e723424b9768
|
| 3 |
-
size 14917
|
lora/lora-stage2/rng_state_1.pth
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:681f2e7cc7c3d884111a86a3bcdeeaea97b22ebf60e4f765788ee5cbeb94e2d9
|
| 3 |
-
size 14917
|
lora/lora-stage2/scheduler.pt
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:2a7077d452a1df5790a83102fc7a743c5150e80f24610df63abd069404ebe93a
|
| 3 |
-
size 1465
|
lora/lora-stage2/special_tokens_map.json
DELETED
|
@@ -1,44 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"additional_special_tokens": [
|
| 3 |
-
"<|im_start|>",
|
| 4 |
-
"<|im_end|>",
|
| 5 |
-
"<|object_ref_start|>",
|
| 6 |
-
"<|object_ref_end|>",
|
| 7 |
-
"<|box_start|>",
|
| 8 |
-
"<|box_end|>",
|
| 9 |
-
"<|quad_start|>",
|
| 10 |
-
"<|quad_end|>",
|
| 11 |
-
"<|vision_start|>",
|
| 12 |
-
"<|vision_end|>",
|
| 13 |
-
"<|vision_pad|>",
|
| 14 |
-
"<|image_pad|>",
|
| 15 |
-
"<|video_pad|>",
|
| 16 |
-
"<|audio_start|>",
|
| 17 |
-
"<|audio_end|>",
|
| 18 |
-
"<tts_pad>",
|
| 19 |
-
"<tts_text_bos>",
|
| 20 |
-
"<tts_text_bos_single>",
|
| 21 |
-
"<|audio_pad|>"
|
| 22 |
-
],
|
| 23 |
-
"audio_bos_token": "<|audio_start|>",
|
| 24 |
-
"audio_eos_token": "<|audio_end|>",
|
| 25 |
-
"audio_token": "<|audio_pad|>",
|
| 26 |
-
"eos_token": {
|
| 27 |
-
"content": "<|im_end|>",
|
| 28 |
-
"lstrip": false,
|
| 29 |
-
"normalized": false,
|
| 30 |
-
"rstrip": false,
|
| 31 |
-
"single_word": false
|
| 32 |
-
},
|
| 33 |
-
"image_token": "<|image_pad|>",
|
| 34 |
-
"pad_token": {
|
| 35 |
-
"content": "<|endoftext|>",
|
| 36 |
-
"lstrip": false,
|
| 37 |
-
"normalized": false,
|
| 38 |
-
"rstrip": false,
|
| 39 |
-
"single_word": false
|
| 40 |
-
},
|
| 41 |
-
"video_token": "<|video_pad|>",
|
| 42 |
-
"vision_bos_token": "<|vision_start|>",
|
| 43 |
-
"vision_eos_token": "<|vision_end|>"
|
| 44 |
-
}
|
lora/lora-stage2/tokenizer.json
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:0499602714160467f2d68b910651d6216020689f1e016be87a2d0019ee3baeab
|
| 3 |
-
size 11429499
|
lora/lora-stage2/tokenizer_config.json
DELETED
|
@@ -1,549 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"add_bos_token": false,
|
| 3 |
-
"add_prefix_space": false,
|
| 4 |
-
"added_tokens_decoder": {
|
| 5 |
-
"151643": {
|
| 6 |
-
"content": "<|endoftext|>",
|
| 7 |
-
"lstrip": false,
|
| 8 |
-
"normalized": false,
|
| 9 |
-
"rstrip": false,
|
| 10 |
-
"single_word": false,
|
| 11 |
-
"special": true
|
| 12 |
-
},
|
| 13 |
-
"151644": {
|
| 14 |
-
"content": "<|im_start|>",
|
| 15 |
-
"lstrip": false,
|
| 16 |
-
"normalized": false,
|
| 17 |
-
"rstrip": false,
|
| 18 |
-
"single_word": false,
|
| 19 |
-
"special": true
|
| 20 |
-
},
|
| 21 |
-
"151645": {
|
| 22 |
-
"content": "<|im_end|>",
|
| 23 |
-
"lstrip": false,
|
| 24 |
-
"normalized": false,
|
| 25 |
-
"rstrip": false,
|
| 26 |
-
"single_word": false,
|
| 27 |
-
"special": true
|
| 28 |
-
},
|
| 29 |
-
"151646": {
|
| 30 |
-
"content": "<|object_ref_start|>",
|
| 31 |
-
"lstrip": false,
|
| 32 |
-
"normalized": false,
|
| 33 |
-
"rstrip": false,
|
| 34 |
-
"single_word": false,
|
| 35 |
-
"special": true
|
| 36 |
-
},
|
| 37 |
-
"151647": {
|
| 38 |
-
"content": "<|object_ref_end|>",
|
| 39 |
-
"lstrip": false,
|
| 40 |
-
"normalized": false,
|
| 41 |
-
"rstrip": false,
|
| 42 |
-
"single_word": false,
|
| 43 |
-
"special": true
|
| 44 |
-
},
|
| 45 |
-
"151648": {
|
| 46 |
-
"content": "<|box_start|>",
|
| 47 |
-
"lstrip": false,
|
| 48 |
-
"normalized": false,
|
| 49 |
-
"rstrip": false,
|
| 50 |
-
"single_word": false,
|
| 51 |
-
"special": true
|
| 52 |
-
},
|
| 53 |
-
"151649": {
|
| 54 |
-
"content": "<|box_end|>",
|
| 55 |
-
"lstrip": false,
|
| 56 |
-
"normalized": false,
|
| 57 |
-
"rstrip": false,
|
| 58 |
-
"single_word": false,
|
| 59 |
-
"special": true
|
| 60 |
-
},
|
| 61 |
-
"151650": {
|
| 62 |
-
"content": "<|quad_start|>",
|
| 63 |
-
"lstrip": false,
|
| 64 |
-
"normalized": false,
|
| 65 |
-
"rstrip": false,
|
| 66 |
-
"single_word": false,
|
| 67 |
-
"special": true
|
| 68 |
-
},
|
| 69 |
-
"151651": {
|
| 70 |
-
"content": "<|quad_end|>",
|
| 71 |
-
"lstrip": false,
|
| 72 |
-
"normalized": false,
|
| 73 |
-
"rstrip": false,
|
| 74 |
-
"single_word": false,
|
| 75 |
-
"special": true
|
| 76 |
-
},
|
| 77 |
-
"151652": {
|
| 78 |
-
"content": "<|vision_start|>",
|
| 79 |
-
"lstrip": false,
|
| 80 |
-
"normalized": false,
|
| 81 |
-
"rstrip": false,
|
| 82 |
-
"single_word": false,
|
| 83 |
-
"special": true
|
| 84 |
-
},
|
| 85 |
-
"151653": {
|
| 86 |
-
"content": "<|vision_end|>",
|
| 87 |
-
"lstrip": false,
|
| 88 |
-
"normalized": false,
|
| 89 |
-
"rstrip": false,
|
| 90 |
-
"single_word": false,
|
| 91 |
-
"special": true
|
| 92 |
-
},
|
| 93 |
-
"151654": {
|
| 94 |
-
"content": "<|vision_pad|>",
|
| 95 |
-
"lstrip": false,
|
| 96 |
-
"normalized": false,
|
| 97 |
-
"rstrip": false,
|
| 98 |
-
"single_word": false,
|
| 99 |
-
"special": true
|
| 100 |
-
},
|
| 101 |
-
"151655": {
|
| 102 |
-
"content": "<|image_pad|>",
|
| 103 |
-
"lstrip": false,
|
| 104 |
-
"normalized": false,
|
| 105 |
-
"rstrip": false,
|
| 106 |
-
"single_word": false,
|
| 107 |
-
"special": true
|
| 108 |
-
},
|
| 109 |
-
"151656": {
|
| 110 |
-
"content": "<|video_pad|>",
|
| 111 |
-
"lstrip": false,
|
| 112 |
-
"normalized": false,
|
| 113 |
-
"rstrip": false,
|
| 114 |
-
"single_word": false,
|
| 115 |
-
"special": true
|
| 116 |
-
},
|
| 117 |
-
"151657": {
|
| 118 |
-
"content": "<tool_call>",
|
| 119 |
-
"lstrip": false,
|
| 120 |
-
"normalized": false,
|
| 121 |
-
"rstrip": false,
|
| 122 |
-
"single_word": false,
|
| 123 |
-
"special": false
|
| 124 |
-
},
|
| 125 |
-
"151658": {
|
| 126 |
-
"content": "</tool_call>",
|
| 127 |
-
"lstrip": false,
|
| 128 |
-
"normalized": false,
|
| 129 |
-
"rstrip": false,
|
| 130 |
-
"single_word": false,
|
| 131 |
-
"special": false
|
| 132 |
-
},
|
| 133 |
-
"151659": {
|
| 134 |
-
"content": "<|fim_prefix|>",
|
| 135 |
-
"lstrip": false,
|
| 136 |
-
"normalized": false,
|
| 137 |
-
"rstrip": false,
|
| 138 |
-
"single_word": false,
|
| 139 |
-
"special": false
|
| 140 |
-
},
|
| 141 |
-
"151660": {
|
| 142 |
-
"content": "<|fim_middle|>",
|
| 143 |
-
"lstrip": false,
|
| 144 |
-
"normalized": false,
|
| 145 |
-
"rstrip": false,
|
| 146 |
-
"single_word": false,
|
| 147 |
-
"special": false
|
| 148 |
-
},
|
| 149 |
-
"151661": {
|
| 150 |
-
"content": "<|fim_suffix|>",
|
| 151 |
-
"lstrip": false,
|
| 152 |
-
"normalized": false,
|
| 153 |
-
"rstrip": false,
|
| 154 |
-
"single_word": false,
|
| 155 |
-
"special": false
|
| 156 |
-
},
|
| 157 |
-
"151662": {
|
| 158 |
-
"content": "<|fim_pad|>",
|
| 159 |
-
"lstrip": false,
|
| 160 |
-
"normalized": false,
|
| 161 |
-
"rstrip": false,
|
| 162 |
-
"single_word": false,
|
| 163 |
-
"special": false
|
| 164 |
-
},
|
| 165 |
-
"151663": {
|
| 166 |
-
"content": "<|repo_name|>",
|
| 167 |
-
"lstrip": false,
|
| 168 |
-
"normalized": false,
|
| 169 |
-
"rstrip": false,
|
| 170 |
-
"single_word": false,
|
| 171 |
-
"special": false
|
| 172 |
-
},
|
| 173 |
-
"151664": {
|
| 174 |
-
"content": "<|file_sep|>",
|
| 175 |
-
"lstrip": false,
|
| 176 |
-
"normalized": false,
|
| 177 |
-
"rstrip": false,
|
| 178 |
-
"single_word": false,
|
| 179 |
-
"special": false
|
| 180 |
-
},
|
| 181 |
-
"151665": {
|
| 182 |
-
"content": "<tool_response>",
|
| 183 |
-
"lstrip": false,
|
| 184 |
-
"normalized": false,
|
| 185 |
-
"rstrip": false,
|
| 186 |
-
"single_word": false,
|
| 187 |
-
"special": false
|
| 188 |
-
},
|
| 189 |
-
"151666": {
|
| 190 |
-
"content": "</tool_response>",
|
| 191 |
-
"lstrip": false,
|
| 192 |
-
"normalized": false,
|
| 193 |
-
"rstrip": false,
|
| 194 |
-
"single_word": false,
|
| 195 |
-
"special": false
|
| 196 |
-
},
|
| 197 |
-
"151667": {
|
| 198 |
-
"content": "<think>",
|
| 199 |
-
"lstrip": false,
|
| 200 |
-
"normalized": false,
|
| 201 |
-
"rstrip": false,
|
| 202 |
-
"single_word": false,
|
| 203 |
-
"special": false
|
| 204 |
-
},
|
| 205 |
-
"151668": {
|
| 206 |
-
"content": "</think>",
|
| 207 |
-
"lstrip": false,
|
| 208 |
-
"normalized": false,
|
| 209 |
-
"rstrip": false,
|
| 210 |
-
"single_word": false,
|
| 211 |
-
"special": false
|
| 212 |
-
},
|
| 213 |
-
"151669": {
|
| 214 |
-
"content": "<|audio_start|>",
|
| 215 |
-
"lstrip": false,
|
| 216 |
-
"normalized": false,
|
| 217 |
-
"rstrip": false,
|
| 218 |
-
"single_word": false,
|
| 219 |
-
"special": true
|
| 220 |
-
},
|
| 221 |
-
"151670": {
|
| 222 |
-
"content": "<|audio_end|>",
|
| 223 |
-
"lstrip": false,
|
| 224 |
-
"normalized": false,
|
| 225 |
-
"rstrip": false,
|
| 226 |
-
"single_word": false,
|
| 227 |
-
"special": true
|
| 228 |
-
},
|
| 229 |
-
"151671": {
|
| 230 |
-
"content": "<tts_pad>",
|
| 231 |
-
"lstrip": false,
|
| 232 |
-
"normalized": false,
|
| 233 |
-
"rstrip": false,
|
| 234 |
-
"single_word": false,
|
| 235 |
-
"special": true
|
| 236 |
-
},
|
| 237 |
-
"151672": {
|
| 238 |
-
"content": "<tts_text_bos>",
|
| 239 |
-
"lstrip": false,
|
| 240 |
-
"normalized": false,
|
| 241 |
-
"rstrip": false,
|
| 242 |
-
"single_word": false,
|
| 243 |
-
"special": true
|
| 244 |
-
},
|
| 245 |
-
"151673": {
|
| 246 |
-
"content": "<tts_text_eod>",
|
| 247 |
-
"lstrip": false,
|
| 248 |
-
"normalized": false,
|
| 249 |
-
"rstrip": false,
|
| 250 |
-
"single_word": false,
|
| 251 |
-
"special": true
|
| 252 |
-
},
|
| 253 |
-
"151674": {
|
| 254 |
-
"content": "<tts_text_bos_single>",
|
| 255 |
-
"lstrip": false,
|
| 256 |
-
"normalized": false,
|
| 257 |
-
"rstrip": false,
|
| 258 |
-
"single_word": false,
|
| 259 |
-
"special": true
|
| 260 |
-
},
|
| 261 |
-
"151675": {
|
| 262 |
-
"content": "<non_speech>",
|
| 263 |
-
"lstrip": false,
|
| 264 |
-
"normalized": false,
|
| 265 |
-
"rstrip": false,
|
| 266 |
-
"single_word": false,
|
| 267 |
-
"special": false
|
| 268 |
-
},
|
| 269 |
-
"151676": {
|
| 270 |
-
"content": "<|audio_pad|>",
|
| 271 |
-
"lstrip": false,
|
| 272 |
-
"normalized": false,
|
| 273 |
-
"rstrip": false,
|
| 274 |
-
"single_word": false,
|
| 275 |
-
"special": true
|
| 276 |
-
},
|
| 277 |
-
"151677": {
|
| 278 |
-
"content": "<blank1>",
|
| 279 |
-
"lstrip": false,
|
| 280 |
-
"normalized": false,
|
| 281 |
-
"rstrip": false,
|
| 282 |
-
"single_word": false,
|
| 283 |
-
"special": true
|
| 284 |
-
},
|
| 285 |
-
"151678": {
|
| 286 |
-
"content": "<blank2>",
|
| 287 |
-
"lstrip": false,
|
| 288 |
-
"normalized": false,
|
| 289 |
-
"rstrip": false,
|
| 290 |
-
"single_word": false,
|
| 291 |
-
"special": true
|
| 292 |
-
},
|
| 293 |
-
"151679": {
|
| 294 |
-
"content": "<blank3>",
|
| 295 |
-
"lstrip": false,
|
| 296 |
-
"normalized": false,
|
| 297 |
-
"rstrip": false,
|
| 298 |
-
"single_word": false,
|
| 299 |
-
"special": true
|
| 300 |
-
},
|
| 301 |
-
"151680": {
|
| 302 |
-
"content": "<blank4>",
|
| 303 |
-
"lstrip": false,
|
| 304 |
-
"normalized": false,
|
| 305 |
-
"rstrip": false,
|
| 306 |
-
"single_word": false,
|
| 307 |
-
"special": true
|
| 308 |
-
},
|
| 309 |
-
"151681": {
|
| 310 |
-
"content": "<blank5>",
|
| 311 |
-
"lstrip": false,
|
| 312 |
-
"normalized": false,
|
| 313 |
-
"rstrip": false,
|
| 314 |
-
"single_word": false,
|
| 315 |
-
"special": true
|
| 316 |
-
},
|
| 317 |
-
"151682": {
|
| 318 |
-
"content": "<blank6>",
|
| 319 |
-
"lstrip": false,
|
| 320 |
-
"normalized": false,
|
| 321 |
-
"rstrip": false,
|
| 322 |
-
"single_word": false,
|
| 323 |
-
"special": true
|
| 324 |
-
},
|
| 325 |
-
"151683": {
|
| 326 |
-
"content": "<blank7>",
|
| 327 |
-
"lstrip": false,
|
| 328 |
-
"normalized": false,
|
| 329 |
-
"rstrip": false,
|
| 330 |
-
"single_word": false,
|
| 331 |
-
"special": true
|
| 332 |
-
},
|
| 333 |
-
"151684": {
|
| 334 |
-
"content": "<blank8>",
|
| 335 |
-
"lstrip": false,
|
| 336 |
-
"normalized": false,
|
| 337 |
-
"rstrip": false,
|
| 338 |
-
"single_word": false,
|
| 339 |
-
"special": true
|
| 340 |
-
},
|
| 341 |
-
"151685": {
|
| 342 |
-
"content": "<blank9>",
|
| 343 |
-
"lstrip": false,
|
| 344 |
-
"normalized": false,
|
| 345 |
-
"rstrip": false,
|
| 346 |
-
"single_word": false,
|
| 347 |
-
"special": true
|
| 348 |
-
},
|
| 349 |
-
"151686": {
|
| 350 |
-
"content": "<blank10>",
|
| 351 |
-
"lstrip": false,
|
| 352 |
-
"normalized": false,
|
| 353 |
-
"rstrip": false,
|
| 354 |
-
"single_word": false,
|
| 355 |
-
"special": true
|
| 356 |
-
},
|
| 357 |
-
"151687": {
|
| 358 |
-
"content": "<blank11>",
|
| 359 |
-
"lstrip": false,
|
| 360 |
-
"normalized": false,
|
| 361 |
-
"rstrip": false,
|
| 362 |
-
"single_word": false,
|
| 363 |
-
"special": true
|
| 364 |
-
},
|
| 365 |
-
"151688": {
|
| 366 |
-
"content": "<blank12>",
|
| 367 |
-
"lstrip": false,
|
| 368 |
-
"normalized": false,
|
| 369 |
-
"rstrip": false,
|
| 370 |
-
"single_word": false,
|
| 371 |
-
"special": true
|
| 372 |
-
},
|
| 373 |
-
"151689": {
|
| 374 |
-
"content": "<blank13>",
|
| 375 |
-
"lstrip": false,
|
| 376 |
-
"normalized": false,
|
| 377 |
-
"rstrip": false,
|
| 378 |
-
"single_word": false,
|
| 379 |
-
"special": true
|
| 380 |
-
},
|
| 381 |
-
"151690": {
|
| 382 |
-
"content": "<blank14>",
|
| 383 |
-
"lstrip": false,
|
| 384 |
-
"normalized": false,
|
| 385 |
-
"rstrip": false,
|
| 386 |
-
"single_word": false,
|
| 387 |
-
"special": true
|
| 388 |
-
},
|
| 389 |
-
"151691": {
|
| 390 |
-
"content": "<blank15>",
|
| 391 |
-
"lstrip": false,
|
| 392 |
-
"normalized": false,
|
| 393 |
-
"rstrip": false,
|
| 394 |
-
"single_word": false,
|
| 395 |
-
"special": true
|
| 396 |
-
},
|
| 397 |
-
"151692": {
|
| 398 |
-
"content": "<blank16>",
|
| 399 |
-
"lstrip": false,
|
| 400 |
-
"normalized": false,
|
| 401 |
-
"rstrip": false,
|
| 402 |
-
"single_word": false,
|
| 403 |
-
"special": true
|
| 404 |
-
},
|
| 405 |
-
"151693": {
|
| 406 |
-
"content": "<blank17>",
|
| 407 |
-
"lstrip": false,
|
| 408 |
-
"normalized": false,
|
| 409 |
-
"rstrip": false,
|
| 410 |
-
"single_word": false,
|
| 411 |
-
"special": true
|
| 412 |
-
},
|
| 413 |
-
"151694": {
|
| 414 |
-
"content": "<blank18>",
|
| 415 |
-
"lstrip": false,
|
| 416 |
-
"normalized": false,
|
| 417 |
-
"rstrip": false,
|
| 418 |
-
"single_word": false,
|
| 419 |
-
"special": true
|
| 420 |
-
},
|
| 421 |
-
"151695": {
|
| 422 |
-
"content": "<blank19>",
|
| 423 |
-
"lstrip": false,
|
| 424 |
-
"normalized": false,
|
| 425 |
-
"rstrip": false,
|
| 426 |
-
"single_word": false,
|
| 427 |
-
"special": true
|
| 428 |
-
},
|
| 429 |
-
"151696": {
|
| 430 |
-
"content": "<blank20>",
|
| 431 |
-
"lstrip": false,
|
| 432 |
-
"normalized": false,
|
| 433 |
-
"rstrip": false,
|
| 434 |
-
"single_word": false,
|
| 435 |
-
"special": true
|
| 436 |
-
},
|
| 437 |
-
"151697": {
|
| 438 |
-
"content": "<blank21>",
|
| 439 |
-
"lstrip": false,
|
| 440 |
-
"normalized": false,
|
| 441 |
-
"rstrip": false,
|
| 442 |
-
"single_word": false,
|
| 443 |
-
"special": true
|
| 444 |
-
},
|
| 445 |
-
"151698": {
|
| 446 |
-
"content": "<blank22>",
|
| 447 |
-
"lstrip": false,
|
| 448 |
-
"normalized": false,
|
| 449 |
-
"rstrip": false,
|
| 450 |
-
"single_word": false,
|
| 451 |
-
"special": true
|
| 452 |
-
},
|
| 453 |
-
"151699": {
|
| 454 |
-
"content": "<blank23>",
|
| 455 |
-
"lstrip": false,
|
| 456 |
-
"normalized": false,
|
| 457 |
-
"rstrip": false,
|
| 458 |
-
"single_word": false,
|
| 459 |
-
"special": true
|
| 460 |
-
},
|
| 461 |
-
"151700": {
|
| 462 |
-
"content": "<blank24>",
|
| 463 |
-
"lstrip": false,
|
| 464 |
-
"normalized": false,
|
| 465 |
-
"rstrip": false,
|
| 466 |
-
"single_word": false,
|
| 467 |
-
"special": true
|
| 468 |
-
},
|
| 469 |
-
"151701": {
|
| 470 |
-
"content": "<blank25>",
|
| 471 |
-
"lstrip": false,
|
| 472 |
-
"normalized": false,
|
| 473 |
-
"rstrip": false,
|
| 474 |
-
"single_word": false,
|
| 475 |
-
"special": true
|
| 476 |
-
},
|
| 477 |
-
"151702": {
|
| 478 |
-
"content": "<blank26>",
|
| 479 |
-
"lstrip": false,
|
| 480 |
-
"normalized": false,
|
| 481 |
-
"rstrip": false,
|
| 482 |
-
"single_word": false,
|
| 483 |
-
"special": true
|
| 484 |
-
},
|
| 485 |
-
"151703": {
|
| 486 |
-
"content": "<blank27>",
|
| 487 |
-
"lstrip": false,
|
| 488 |
-
"normalized": false,
|
| 489 |
-
"rstrip": false,
|
| 490 |
-
"single_word": false,
|
| 491 |
-
"special": true
|
| 492 |
-
},
|
| 493 |
-
"151704": {
|
| 494 |
-
"content": "<asr_text>",
|
| 495 |
-
"lstrip": false,
|
| 496 |
-
"normalized": false,
|
| 497 |
-
"rstrip": false,
|
| 498 |
-
"single_word": false,
|
| 499 |
-
"special": false
|
| 500 |
-
}
|
| 501 |
-
},
|
| 502 |
-
"additional_special_tokens": [
|
| 503 |
-
"<|im_start|>",
|
| 504 |
-
"<|im_end|>",
|
| 505 |
-
"<|object_ref_start|>",
|
| 506 |
-
"<|object_ref_end|>",
|
| 507 |
-
"<|box_start|>",
|
| 508 |
-
"<|box_end|>",
|
| 509 |
-
"<|quad_start|>",
|
| 510 |
-
"<|quad_end|>",
|
| 511 |
-
"<|vision_start|>",
|
| 512 |
-
"<|vision_end|>",
|
| 513 |
-
"<|vision_pad|>",
|
| 514 |
-
"<|image_pad|>",
|
| 515 |
-
"<|video_pad|>",
|
| 516 |
-
"<|audio_start|>",
|
| 517 |
-
"<|audio_end|>",
|
| 518 |
-
"<tts_pad>",
|
| 519 |
-
"<tts_text_bos>",
|
| 520 |
-
"<tts_text_bos_single>",
|
| 521 |
-
"<|audio_pad|>"
|
| 522 |
-
],
|
| 523 |
-
"audio_bos_token": "<|audio_start|>",
|
| 524 |
-
"audio_eos_token": "<|audio_end|>",
|
| 525 |
-
"audio_token": "<|audio_pad|>",
|
| 526 |
-
"bos_token": null,
|
| 527 |
-
"clean_up_tokenization_spaces": false,
|
| 528 |
-
"eos_token": "<|im_end|>",
|
| 529 |
-
"errors": "replace",
|
| 530 |
-
"extra_special_tokens": {
|
| 531 |
-
"audio_bos_token": "<|audio_start|>",
|
| 532 |
-
"audio_eos_token": "<|audio_end|>",
|
| 533 |
-
"audio_token": "<|audio_pad|>",
|
| 534 |
-
"image_token": "<|image_pad|>",
|
| 535 |
-
"video_token": "<|video_pad|>",
|
| 536 |
-
"vision_bos_token": "<|vision_start|>",
|
| 537 |
-
"vision_eos_token": "<|vision_end|>"
|
| 538 |
-
},
|
| 539 |
-
"image_token": "<|image_pad|>",
|
| 540 |
-
"model_max_length": 131072,
|
| 541 |
-
"pad_token": "<|endoftext|>",
|
| 542 |
-
"processor_class": "Qwen3ASRProcessor",
|
| 543 |
-
"split_special_tokens": false,
|
| 544 |
-
"tokenizer_class": "Qwen2Tokenizer",
|
| 545 |
-
"unk_token": null,
|
| 546 |
-
"video_token": "<|video_pad|>",
|
| 547 |
-
"vision_bos_token": "<|vision_start|>",
|
| 548 |
-
"vision_eos_token": "<|vision_end|>"
|
| 549 |
-
}
|
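The audio token ids in the deleted config.json (`audio_start_token_id`, `audio_end_token_id`, `audio_token_id`) should point at the matching entries in tokenizer_config.json's `added_tokens_decoder`. A small cross-check sketch is below; the dictionaries hold only the entries copied from this diff, and `find_token_mismatches` is a hypothetical helper.

```python
# Subset of added_tokens_decoder copied from the deleted tokenizer_config.json.
added_tokens = {
    151643: "<|endoftext|>",
    151645: "<|im_end|>",
    151669: "<|audio_start|>",
    151670: "<|audio_end|>",
    151676: "<|audio_pad|>",
    151704: "<asr_text>",
}

# Ids copied from thinker_config in the deleted config.json.
model_ids = {
    "audio_start_token_id": 151669,
    "audio_end_token_id": 151670,
    "audio_token_id": 151676,
}

# Token string each id is expected to decode to (from the tokenizer's audio_* fields).
expected_tokens = {
    "audio_start_token_id": "<|audio_start|>",
    "audio_end_token_id": "<|audio_end|>",
    "audio_token_id": "<|audio_pad|>",
}


def find_token_mismatches(ids, expected, decoder):
    """Return the config keys whose id does not decode to the expected token."""
    return [key for key, token_id in ids.items()
            if decoder.get(token_id) != expected[key]]


print(find_token_mismatches(model_ids, expected_tokens, added_tokens))
# Every added token id must also fit inside the model's vocab_size (151936).
print(max(added_tokens) < 151936)
```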
lora/lora-stage2/trainer_state.json
DELETED
|
The diff for this file is too large to render.
See raw diff
|
|
|
lora/lora-stage2/vocab.json
DELETED
|
The diff for this file is too large to render.
See raw diff
|
|
|
lora/lora-stage3/README.md
DELETED
|
@@ -1,207 +0,0 @@
|
|
| 1 |
-
---
|
| 2 |
-
base_model: /data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged
|
| 3 |
-
library_name: peft
|
| 4 |
-
pipeline_tag: text-generation
|
| 5 |
-
tags:
|
| 6 |
-
- base_model:adapter:/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged
|
| 7 |
-
- lora
|
| 8 |
-
- transformers
|
| 9 |
-
---
|
| 10 |
-
|
| 11 |
-
# Model Card for Model ID
|
| 12 |
-
|
| 13 |
-
<!-- Provide a quick summary of what the model is/does. -->
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
## Model Details
|
| 18 |
-
|
| 19 |
-
### Model Description
|
| 20 |
-
|
| 21 |
-
<!-- Provide a longer summary of what this model is. -->
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
- **Developed by:** [More Information Needed]
|
| 26 |
-
- **Funded by [optional]:** [More Information Needed]
|
| 27 |
-
- **Shared by [optional]:** [More Information Needed]
|
| 28 |
-
- **Model type:** [More Information Needed]
|
| 29 |
-
- **Language(s) (NLP):** [More Information Needed]
|
| 30 |
-
- **License:** [More Information Needed]
|
| 31 |
-
- **Finetuned from model [optional]:** [More Information Needed]
|
| 32 |
-
|
| 33 |
-
### Model Sources [optional]
|
| 34 |
-
|
| 35 |
-
<!-- Provide the basic links for the model. -->
|
| 36 |
-
|
| 37 |
-
- **Repository:** [More Information Needed]
|
| 38 |
-
- **Paper [optional]:** [More Information Needed]
|
| 39 |
-
- **Demo [optional]:** [More Information Needed]
|
| 40 |
-
|
| 41 |
-
## Uses
|
| 42 |
-
|
| 43 |
-
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
| 44 |
-
|
| 45 |
-
### Direct Use
|
| 46 |
-
|
| 47 |
-
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
| 48 |
-
|
| 49 |
-
[More Information Needed]
|
| 50 |
-
|
| 51 |
-
### Downstream Use [optional]
|
| 52 |
-
|
| 53 |
-
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
| 54 |
-
|
| 55 |
-
[More Information Needed]
|
| 56 |
-
|
| 57 |
-
### Out-of-Scope Use
|
| 58 |
-
|
| 59 |
-
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
| 60 |
-
|
| 61 |
-
[More Information Needed]
|
| 62 |
-
|
| 63 |
-
## Bias, Risks, and Limitations
|
| 64 |
-
|
| 65 |
-
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
| 66 |
-
|
| 67 |
-
[More Information Needed]
|
| 68 |
-
|
| 69 |
-
### Recommendations
|
| 70 |
-
|
| 71 |
-
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
| 72 |
-
|
| 73 |
-
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
|
| 74 |
-
|
| 75 |
-
## How to Get Started with the Model
|
| 76 |
-
|
| 77 |
-
Use the code below to get started with the model.
|
| 78 |
-
|
| 79 |
-
[More Information Needed]
|
| 80 |
-
|
| 81 |
-
## Training Details
|
| 82 |
-
|
| 83 |
-
### Training Data
|
| 84 |
-
|
| 85 |
-
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
| 86 |
-
|
| 87 |
-
[More Information Needed]
|
| 88 |
-
|
| 89 |
-
### Training Procedure
|
| 90 |
-
|
| 91 |
-
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
| 92 |
-
|
| 93 |
-
#### Preprocessing [optional]
|
| 94 |
-
|
| 95 |
-
[More Information Needed]
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
#### Training Hyperparameters
|
| 99 |
-
|
| 100 |
-
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
| 101 |
-
|
| 102 |
-
#### Speeds, Sizes, Times [optional]
|
| 103 |
-
|
| 104 |
-
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
| 105 |
-
|
| 106 |
-
[More Information Needed]
|
| 107 |
-
|
| 108 |
-
## Evaluation
|
| 109 |
-
|
| 110 |
-
<!-- This section describes the evaluation protocols and provides the results. -->
|
| 111 |
-
|
| 112 |
-
### Testing Data, Factors & Metrics
|
| 113 |
-
|
| 114 |
-
#### Testing Data
|
| 115 |
-
|
| 116 |
-
<!-- This should link to a Dataset Card if possible. -->
|
| 117 |
-
|
| 118 |
-
[More Information Needed]
|
| 119 |
-
|
| 120 |
-
#### Factors
|
| 121 |
-
|
| 122 |
-
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
| 123 |
-
|
| 124 |
-
[More Information Needed]
|
| 125 |
-
|
| 126 |
-
#### Metrics
|
| 127 |
-
|
| 128 |
-
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
| 129 |
-
|
| 130 |
-
[More Information Needed]
|
| 131 |
-
|
| 132 |
-
### Results
|
| 133 |
-
|
| 134 |
-
[More Information Needed]
|
| 135 |
-
|
| 136 |
-
#### Summary
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
## Model Examination [optional]
|
| 141 |
-
|
| 142 |
-
<!-- Relevant interpretability work for the model goes here -->
|
| 143 |
-
|
| 144 |
-
[More Information Needed]
|
| 145 |
-
|
| 146 |
-
## Environmental Impact
|
| 147 |
-
|
| 148 |
-
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
| 149 |
-
|
| 150 |
-
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
| 151 |
-
|
| 152 |
-
- **Hardware Type:** [More Information Needed]
|
| 153 |
-
- **Hours used:** [More Information Needed]
|
| 154 |
-
- **Cloud Provider:** [More Information Needed]
|
| 155 |
-
- **Compute Region:** [More Information Needed]
|
| 156 |
-
- **Carbon Emitted:** [More Information Needed]
|
| 157 |
-
|
| 158 |
-
## Technical Specifications [optional]
|
| 159 |
-
|
| 160 |
-
### Model Architecture and Objective
|
| 161 |
-
|
| 162 |
-
[More Information Needed]
|
| 163 |
-
|
| 164 |
-
### Compute Infrastructure
|
| 165 |
-
|
| 166 |
-
[More Information Needed]
|
| 167 |
-
|
| 168 |
-
#### Hardware
|
| 169 |
-
|
| 170 |
-
[More Information Needed]
|
| 171 |
-
|
| 172 |
-
#### Software
|
| 173 |
-
|
| 174 |
-
[More Information Needed]
|
| 175 |
-
|
| 176 |
-
## Citation [optional]
|
| 177 |
-
|
| 178 |
-
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
| 179 |
-
|
| 180 |
-
**BibTeX:**
|
| 181 |
-
|
| 182 |
-
[More Information Needed]
|
| 183 |
-
|
| 184 |
-
**APA:**
|
| 185 |
-
|
| 186 |
-
[More Information Needed]
|
| 187 |
-
|
| 188 |
-
## Glossary [optional]
|
| 189 |
-
|
| 190 |
-
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
|
| 191 |
-
|
| 192 |
-
[More Information Needed]
|
| 193 |
-
|
| 194 |
-
## More Information [optional]
|
| 195 |
-
|
| 196 |
-
[More Information Needed]
|
| 197 |
-
|
| 198 |
-
## Model Card Authors [optional]
|
| 199 |
-
|
| 200 |
-
[More Information Needed]
|
| 201 |
-
|
| 202 |
-
## Model Card Contact
|
| 203 |
-
|
| 204 |
-
[More Information Needed]
|
| 205 |
-
### Framework versions
|
| 206 |
-
|
| 207 |
-
- PEFT 0.18.1
|
|
|
lora/lora-stage3/adapter_config.json
DELETED
|
@@ -1,38 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"alora_invocation_tokens": null,
|
| 3 |
-
"alpha_pattern": {},
|
| 4 |
-
"arrow_config": null,
|
| 5 |
-
"auto_mapping": null,
|
| 6 |
-
"base_model_name_or_path": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
|
| 7 |
-
"bias": "none",
|
| 8 |
-
"corda_config": null,
|
| 9 |
-
"ensure_weight_tying": false,
|
| 10 |
-
"eva_config": null,
|
| 11 |
-
"exclude_modules": null,
|
| 12 |
-
"fan_in_fan_out": false,
|
| 13 |
-
"inference_mode": true,
|
| 14 |
-
"init_lora_weights": true,
|
| 15 |
-
"layer_replication": null,
|
| 16 |
-
"layers_pattern": null,
|
| 17 |
-
"layers_to_transform": null,
|
| 18 |
-
"loftq_config": {},
|
| 19 |
-
"lora_alpha": 32,
|
| 20 |
-
"lora_bias": false,
|
| 21 |
-
"lora_dropout": 0.05,
|
| 22 |
-
"megatron_config": null,
|
| 23 |
-
"megatron_core": "megatron.core",
|
| 24 |
-
"modules_to_save": [],
|
| 25 |
-
"peft_type": "LORA",
|
| 26 |
-
"peft_version": "0.18.1",
|
| 27 |
-
"qalora_group_size": 16,
|
| 28 |
-
"r": 8,
|
| 29 |
-
"rank_pattern": {},
|
| 30 |
-
"revision": null,
|
| 31 |
-
"target_modules": "^(thinker\\.model(?=\\.).*\\.(k_proj|q_proj|o_proj|up_proj|down_proj|v_proj|gate_proj)|(?!(thinker.audio_tower.proj1|thinker.audio_tower.proj2))thinker\\.audio_tower(?=\\.).*\\.(fc1|out_proj|proj1|k_proj|q_proj|fc2|proj2|v_proj|conv_out)|thinker\\.audio_tower\\.proj1(?=\\.)|thinker\\.audio_tower\\.proj2(?=\\.))$",
|
| 32 |
-
"target_parameters": null,
|
| 33 |
-
"task_type": "CAUSAL_LM",
|
| 34 |
-
"trainable_token_indices": null,
|
| 35 |
-
"use_dora": false,
|
| 36 |
-
"use_qalora": false,
|
| 37 |
-
"use_rslora": false
|
| 38 |
-
}
|
|
|
lora/lora-stage3/adapter_model.safetensors
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:1b5507acb5bb51851c4db58504cac3dcc748dbc37210b986e93624bb9ea115b0
|
| 3 |
-
size 49395592
|
|
|
lora/lora-stage3/additional_config.json
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
{"lora_dtype": null, "lorap_lr_ratio": null, "lorap_emb_lr": 1e-06}
|
|
|
lora/lora-stage3/args.json
DELETED
|
@@ -1,502 +0,0 @@
|
|
| 1 |
-
{
|
| 2 |
-
"output_dir": "/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721",
|
| 3 |
-
"overwrite_output_dir": false,
|
| 4 |
-
"do_train": false,
|
| 5 |
-
"do_eval": false,
|
| 6 |
-
"do_predict": false,
|
| 7 |
-
"eval_strategy": "steps",
|
| 8 |
-
"prediction_loss_only": false,
|
| 9 |
-
"per_device_train_batch_size": 4,
|
| 10 |
-
"per_device_eval_batch_size": 4,
|
| 11 |
-
"per_gpu_train_batch_size": null,
|
| 12 |
-
"per_gpu_eval_batch_size": null,
|
| 13 |
-
"gradient_accumulation_steps": 16,
|
| 14 |
-
"eval_accumulation_steps": null,
|
| 15 |
-
"eval_delay": 0,
|
| 16 |
-
"torch_empty_cache_steps": null,
|
| 17 |
-
"learning_rate": 5e-05,
|
| 18 |
-
"weight_decay": 0.1,
|
| 19 |
-
"adam_beta1": 0.9,
|
| 20 |
-
"adam_beta2": 0.95,
|
| 21 |
-
"adam_epsilon": 1e-08,
|
| 22 |
-
"max_grad_norm": 1.0,
|
| 23 |
-
"num_train_epochs": 3.0,
|
| 24 |
-
"max_steps": -1,
|
| 25 |
-
"lr_scheduler_type": "cosine",
|
| 26 |
-
"lr_scheduler_kwargs": null,
|
| 27 |
-
"warmup_ratio": 0.03,
|
| 28 |
-
"warmup_steps": 0,
|
| 29 |
-
"log_level": "passive",
|
| 30 |
-
"log_level_replica": "warning",
|
| 31 |
-
"log_on_each_node": true,
|
| 32 |
-
"logging_dir": "/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721/runs",
|
| 33 |
-
"logging_strategy": "steps",
|
| 34 |
-
"logging_first_step": true,
|
| 35 |
-
"logging_steps": 5,
|
| 36 |
-
"logging_nan_inf_filter": true,
|
| 37 |
-
"save_strategy": "steps",
|
| 38 |
-
"save_steps": 20.0,
|
| 39 |
-
"save_total_limit": null,
|
| 40 |
-
"save_safetensors": true,
|
| 41 |
-
"save_on_each_node": false,
|
| 42 |
-
"save_only_model": false,
|
| 43 |
-
"restore_callback_states_from_checkpoint": false,
|
| 44 |
-
"no_cuda": false,
|
| 45 |
-
"use_cpu": false,
|
| 46 |
-
"use_mps_device": false,
|
| 47 |
-
"seed": 42,
|
| 48 |
-
"data_seed": 42,
|
| 49 |
-
"jit_mode_eval": false,
|
| 50 |
-
"bf16": true,
|
| 51 |
-
"fp16": false,
|
| 52 |
-
"fp16_opt_level": "O1",
|
| 53 |
-
"half_precision_backend": "auto",
|
| 54 |
-
"bf16_full_eval": false,
|
| 55 |
-
"fp16_full_eval": false,
|
| 56 |
-
"tf32": null,
|
| 57 |
-
"local_rank": 0,
|
| 58 |
-
"ddp_backend": null,
|
| 59 |
-
"tpu_num_cores": null,
|
| 60 |
-
"tpu_metrics_debug": false,
|
| 61 |
-
"debug": null,
|
| 62 |
-
"dataloader_drop_last": false,
|
| 63 |
-
"eval_steps": 20.0,
|
| 64 |
-
"dataloader_num_workers": null,
|
| 65 |
-
"dataloader_prefetch_factor": null,
|
| 66 |
-
"past_index": -1,
|
| 67 |
-
"run_name": "qwen3asr_dapo_reward5_3x8x8_12gen_3GPU",
|
| 68 |
-
"disable_tqdm": null,
|
| 69 |
-
"remove_unused_columns": false,
|
| 70 |
-
"label_names": null,
|
| 71 |
-
"load_best_model_at_end": false,
|
| 72 |
-
"metric_for_best_model": "loss",
|
| 73 |
-
"greater_is_better": false,
|
| 74 |
-
"ignore_data_skip": false,
|
| 75 |
-
"fsdp": [],
|
| 76 |
-
"fsdp_min_num_params": 0,
|
| 77 |
-
"fsdp_config": null,
|
| 78 |
-
"fsdp_transformer_layer_cls_to_wrap": null,
|
| 79 |
-
"accelerator_config": {
|
| 80 |
-
"dispatch_batches": false
|
| 81 |
-
},
|
| 82 |
-
"parallelism_config": null,
|
| 83 |
-
"deepspeed": null,
|
| 84 |
-
"label_smoothing_factor": 0.0,
|
| 85 |
-
"optim": "adamw_torch_fused",
|
| 86 |
-
"optim_args": null,
|
| 87 |
-
"adafactor": false,
|
| 88 |
-
"group_by_length": false,
|
| 89 |
-
"length_column_name": "length",
|
| 90 |
-
"report_to": [
|
| 91 |
-
"wandb"
|
| 92 |
-
],
|
| 93 |
-
"project": "huggingface",
|
| 94 |
-
"trackio_space_id": "trackio",
|
| 95 |
-
"ddp_find_unused_parameters": null,
|
| 96 |
-
"ddp_bucket_cap_mb": null,
|
| 97 |
-
"ddp_broadcast_buffers": null,
|
| 98 |
-
"dataloader_pin_memory": true,
|
| 99 |
-
"dataloader_persistent_workers": false,
|
| 100 |
-
"skip_memory_metrics": true,
|
| 101 |
-
"use_legacy_prediction_loop": false,
|
| 102 |
-
"push_to_hub": false,
|
| 103 |
-
"resume_from_checkpoint": null,
|
| 104 |
-
"hub_model_id": null,
|
| 105 |
-
"hub_strategy": "every_save",
|
| 106 |
-
"hub_token": null,
|
| 107 |
-
"hub_private_repo": null,
|
| 108 |
-
"hub_always_push": false,
|
| 109 |
-
"hub_revision": null,
|
| 110 |
-
"gradient_checkpointing": true,
|
| 111 |
-
"gradient_checkpointing_kwargs": null,
|
| 112 |
-
"include_inputs_for_metrics": false,
|
| 113 |
-
"include_for_metrics": [],
|
| 114 |
-
"eval_do_concat_batches": true,
|
| 115 |
-
"fp16_backend": "auto",
|
| 116 |
-
"push_to_hub_model_id": null,
|
| 117 |
-
"push_to_hub_organization": null,
|
| 118 |
-
"push_to_hub_token": null,
|
| 119 |
-
"mp_parameters": "",
|
| 120 |
-
"auto_find_batch_size": false,
|
| 121 |
-
"full_determinism": false,
|
| 122 |
-
"torchdynamo": null,
|
| 123 |
-
"ray_scope": "last",
|
| 124 |
-
"ddp_timeout": 18000000,
|
| 125 |
-
"torch_compile": false,
|
| 126 |
-
"torch_compile_backend": null,
|
| 127 |
-
"torch_compile_mode": null,
|
| 128 |
-
"include_tokens_per_second": false,
|
| 129 |
-
"include_num_input_tokens_seen": false,
|
| 130 |
-
"neftune_noise_alpha": null,
|
| 131 |
-
"optim_target_modules": null,
|
| 132 |
-
"batch_eval_metrics": false,
|
| 133 |
-
"eval_on_start": false,
|
| 134 |
-
"use_liger_kernel": false,
|
| 135 |
-
"liger_kernel_config": null,
|
| 136 |
-
"eval_use_gather_object": false,
|
| 137 |
-
"average_tokens_across_devices": true,
|
| 138 |
-
"sortish_sampler": false,
|
| 139 |
-
"predict_with_generate": false,
|
| 140 |
-
"generation_max_length": null,
|
| 141 |
-
"generation_num_beams": null,
|
| 142 |
-
"generation_config": null,
|
| 143 |
-
"tuner_backend": "peft",
|
| 144 |
-
"vit_gradient_checkpointing": null,
|
| 145 |
-
"router_aux_loss_coef": 0.0,
|
| 146 |
-
"enable_dft_loss": false,
|
| 147 |
-
"enable_channel_loss": false,
|
| 148 |
-
"safe_serialization": true,
|
| 149 |
-
"max_shard_size": "5GB",
|
| 150 |
-
"check_model": true,
|
| 151 |
-
"acc_strategy": "token",
|
| 152 |
-
"train_dataloader_shuffle": true,
|
| 153 |
-
"max_epochs": null,
|
| 154 |
-
"aligner_lr": null,
|
| 155 |
-
"vit_lr": null,
|
| 156 |
-
"use_logits_to_keep": null,
|
| 157 |
-
"ds3_gather_for_generation": true,
|
| 158 |
-
"resume_only_model": false,
|
| 159 |
-
"optimizer": null,
|
| 160 |
-
"loss_type": "dapo",
|
| 161 |
-
"eval_metric": null,
|
| 162 |
-
"callbacks": [],
|
| 163 |
-
"early_stop_interval": null,
|
| 164 |
-
"eval_use_evalscope": false,
|
| 165 |
-
"eval_dataset": [],
|
| 166 |
-
"eval_dataset_args": null,
|
| 167 |
-
"eval_limit": null,
|
| 168 |
-
"eval_generation_config": null,
|
| 169 |
-
"extra_eval_args": null,
|
| 170 |
-
"tuner_type": "lora",
|
| 171 |
-
"use_galore": false,
|
| 172 |
-
"galore_target_modules": null,
|
| 173 |
-
"galore_rank": 128,
|
| 174 |
-
"galore_update_proj_gap": 50,
|
| 175 |
-
"galore_scale": 1.0,
|
| 176 |
-
"galore_proj_type": "std",
|
| 177 |
-
"galore_optim_per_parameter": false,
|
| 178 |
-
"galore_with_embedding": false,
|
| 179 |
-
"galore_quantization": false,
|
| 180 |
-
"galore_proj_quant": false,
|
| 181 |
-
"galore_proj_bits": 4,
|
| 182 |
-
"galore_proj_group_size": 256,
|
| 183 |
-
"galore_cos_threshold": 0.4,
|
| 184 |
-
"galore_gamma_proj": 2,
|
| 185 |
-
"galore_queue_size": 5,
|
| 186 |
-
"lisa_activated_layers": 0,
|
| 187 |
-
"lisa_step_interval": 20,
|
| 188 |
-
"use_flash_ckpt": false,
|
| 189 |
-
"use_ray": false,
|
| 190 |
-
"ray_exp_name": null,
|
| 191 |
-
"device_groups": null,
|
| 192 |
-
"model": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
|
| 193 |
-
"model_type": "my_qwen3_asr_rl",
|
| 194 |
-
"model_revision": null,
|
| 195 |
-
"task_type": "causal_lm",
|
| 196 |
-
"torch_dtype": "bfloat16",
|
| 197 |
-
"attn_impl": null,
|
| 198 |
-
"experts_impl": null,
|
| 199 |
-
"new_special_tokens": [],
|
| 200 |
-
"num_labels": null,
|
| 201 |
-
"problem_type": null,
|
| 202 |
-
"rope_scaling": null,
|
| 203 |
-
"device_map": null,
|
| 204 |
-
"max_memory": {},
|
| 205 |
-
"max_model_len": null,
|
| 206 |
-
"local_repo_path": null,
|
| 207 |
-
"init_strategy": null,
|
| 208 |
-
"template": "my_qwen3_asr_rl",
|
| 209 |
-
"system": null,
|
| 210 |
-
"max_length": 65536,
|
| 211 |
-
"truncation_strategy": "delete",
|
| 212 |
-
"max_pixels": null,
|
| 213 |
-
"agent_template": null,
|
| 214 |
-
"norm_bbox": null,
|
| 215 |
-
"use_chat_template": true,
|
| 216 |
-
"padding_side": "left",
|
| 217 |
-
"padding_free": false,
|
| 218 |
-
"loss_scale": "last_round",
|
| 219 |
-
"sequence_parallel_size": 1,
|
| 220 |
-
"template_backend": "swift",
|
| 221 |
-
"response_prefix": null,
|
| 222 |
-
"enable_thinking": null,
|
| 223 |
-
"add_non_thinking_prefix": true,
|
| 224 |
-
"dataset": [
|
| 225 |
-
"/data/haobin/batch_process/lora_0323_10w+55w+error+syn_with_domain_train90_targeted_rl_train90_loramerged_basewer_271.jsonl"
|
| 226 |
-
],
|
| 227 |
-
"val_dataset": [
|
| 228 |
-
"/data/haobin/batch_process/lora_0323_10w+55w+error+syn_with_domain_train90_targeted_rl_val5_sample5p.jsonl"
|
| 229 |
-
],
|
| 230 |
-
"cached_dataset": [],
|
| 231 |
-
"cached_val_dataset": [],
|
| 232 |
-
"split_dataset_ratio": 0.0,
|
| 233 |
-
"dataset_num_proc": 1,
|
| 234 |
-
"load_from_cache_file": false,
|
| 235 |
-
"dataset_shuffle": true,
|
| 236 |
-
"val_dataset_shuffle": false,
|
| 237 |
-
"streaming": false,
|
| 238 |
-
"interleave_prob": null,
|
| 239 |
-
"stopping_strategy": "first_exhausted",
|
| 240 |
-
"shuffle_buffer_size": 1000,
|
| 241 |
-
"download_mode": "reuse_dataset_if_exists",
|
| 242 |
-
"columns": {},
|
| 243 |
-
"strict": false,
|
| 244 |
-
"model_name": null,
|
| 245 |
-
"model_author": null,
|
| 246 |
-
"custom_dataset_info": [],
|
| 247 |
-
"quant_method": null,
|
| 248 |
-
"quant_bits": null,
|
| 249 |
-
"hqq_axis": null,
|
| 250 |
-
"bnb_4bit_compute_dtype": "bfloat16",
|
| 251 |
-
"bnb_4bit_quant_type": "nf4",
|
| 252 |
-
"bnb_4bit_use_double_quant": true,
|
| 253 |
-
"bnb_4bit_quant_storage": null,
|
| 254 |
-
"max_new_tokens": 256,
|
| 255 |
-
"temperature": 0.5,
|
| 256 |
-
"top_k": 50,
|
| 257 |
-
"top_p": 0.95,
|
| 258 |
-
"repetition_penalty": 1.08,
|
| 259 |
-
"num_beams": 1,
|
| 260 |
-
"stream": false,
|
| 261 |
-
"stop_words": [],
|
| 262 |
-
"logprobs": false,
|
| 263 |
-
"top_logprobs": null,
|
| 264 |
-
"structured_outputs_regex": null,
|
| 265 |
-
"train_type": "lora",
|
| 266 |
-
"adapters": [],
|
| 267 |
-
"external_plugins": [
|
| 268 |
-
"/data/haobin/pky_train/qwen3_swift/my_qwen3_asr_dapo_register.py",
|
| 269 |
-
"/data/haobin/pky_train/qwen3_swift/qwen3_RL_reward5.py"
|
| 270 |
-
],
|
| 271 |
-
"custom_register_path": [],
|
| 272 |
-
"model_kwargs": {},
|
| 273 |
-
"load_args": false,
|
| 274 |
-
"load_data_args": false,
|
| 275 |
-
"packing": false,
|
| 276 |
-
"packing_length": null,
|
| 277 |
-
"packing_num_proc": 1,
|
| 278 |
-
"lazy_tokenize": true,
|
| 279 |
-
"use_hf": false,
|
| 280 |
-
"ignore_args_error": false,
|
| 281 |
-
"use_swift_lora": false,
|
| 282 |
-
"freeze_parameters": [],
|
| 283 |
-
"freeze_parameters_regex": null,
|
| 284 |
-
"freeze_parameters_ratio": 0.0,
|
| 285 |
-
"trainable_parameters": [],
|
| 286 |
-
"trainable_parameters_regex": null,
|
| 287 |
-
"freeze_llm": false,
|
| 288 |
-
"freeze_vit": false,
|
| 289 |
-
"freeze_aligner": false,
|
| 290 |
-
"target_modules": [
|
| 291 |
-
"all-linear"
|
| 292 |
-
],
|
| 293 |
-
"target_regex": null,
|
| 294 |
-
"target_parameters": null,
|
| 295 |
-
"modules_to_save": [],
|
| 296 |
-
"lora_rank": 8,
|
| 297 |
-
"lora_alpha": 32,
|
| 298 |
-
"lora_dropout": 0.05,
|
| 299 |
-
"lora_bias": "none",
|
| 300 |
-
"lora_dtype": null,
|
| 301 |
-
"lorap_lr_ratio": null,
|
| 302 |
-
"use_rslora": false,
|
| 303 |
-
"use_dora": false,
|
| 304 |
-
"lora_ga_batch_size": 2,
|
| 305 |
-
"lora_ga_iters": 2,
|
| 306 |
-
"lora_ga_max_length": 1024,
|
| 307 |
-
"lora_ga_direction": "ArB2r",
|
| 308 |
-
"lora_ga_scale": "stable",
|
| 309 |
-
"lora_ga_stable_gamma": 16,
|
| 310 |
-
"init_weights": true,
|
| 311 |
-
"fourier_n_frequency": 2000,
|
| 312 |
-
"fourier_scaling": 300.0,
|
| 313 |
-
"boft_block_size": 4,
|
| 314 |
-
"boft_block_num": 0,
|
| 315 |
-
"boft_n_butterfly_factor": 1,
|
| 316 |
-
"boft_dropout": 0.0,
|
| 317 |
-
"vera_rank": 256,
|
| 318 |
-
"vera_projection_prng_key": 0,
|
| 319 |
-
"vera_dropout": 0.0,
|
| 320 |
-
"vera_d_initial": 0.1,
|
| 321 |
-
"adapter_act": "gelu",
|
| 322 |
-
"adapter_length": 128,
|
| 323 |
-
"adalora_target_r": 8,
|
| 324 |
-
"adalora_init_r": 12,
|
| 325 |
-
"adalora_tinit": 0,
|
| 326 |
-
"adalora_tfinal": 0,
|
| 327 |
-
"adalora_deltaT": 1,
|
| 328 |
-
"adalora_beta1": 0.85,
|
| 329 |
-
"adalora_beta2": 0.85,
|
| 330 |
-
"adalora_orth_reg_weight": 0.5,
|
| 331 |
-
"llamapro_num_new_blocks": 4,
|
| 332 |
-
"llamapro_num_groups": null,
|
| 333 |
-
"reft_layer_key": null,
|
| 334 |
-
"reft_layers": null,
|
| 335 |
-
"reft_rank": 4,
|
| 336 |
-
"reft_intervention_type": "LoreftIntervention",
|
| 337 |
-
"reft_args": null,
|
| 338 |
-
"swanlab_token": null,
|
| 339 |
-
"swanlab_project": "ms-swift",
|
| 340 |
-
"swanlab_workspace": null,
|
| 341 |
-
"swanlab_exp_name": null,
|
| 342 |
-
"swanlab_notification_method": null,
|
| 343 |
-
"swanlab_webhook_url": null,
|
| 344 |
-
"swanlab_secret": null,
|
| 345 |
-
"swanlab_sender_email": null,
|
| 346 |
-
"swanlab_receiver_email": null,
|
| 347 |
-
"swanlab_smtp_server": null,
|
| 348 |
-
"swanlab_smtp_port": null,
|
| 349 |
-
"swanlab_email_language": "zh",
|
| 350 |
-
"swanlab_mode": "cloud",
|
| 351 |
-
"add_version": true,
|
| 352 |
-
"create_checkpoint_symlink": false,
|
| 353 |
-
"zero_hpz_partition_size": null,
|
| 354 |
-
"deepspeed_autotp_size": null,
|
| 355 |
-
"reward_model": null,
|
| 356 |
-
"reward_adapters": [],
|
| 357 |
-
"reward_model_type": null,
|
| 358 |
-
"reward_model_revision": null,
|
| 359 |
-
"num_ppo_epochs": 4,
|
| 360 |
-
"whiten_rewards": false,
|
| 361 |
-
"kl_coef": 0.05,
|
| 362 |
-
"cliprange": 0.2,
|
| 363 |
-
"vf_coef": 0.1,
|
| 364 |
-
"cliprange_value": 0.2,
|
| 365 |
-
"gamma": 1.0,
|
| 366 |
-
"lam": 0.95,
|
| 367 |
-
"num_mini_batches": 1,
|
| 368 |
-
"local_rollout_forward_batch_size": 64,
|
| 369 |
-
"num_sample_generations": 10,
|
| 370 |
-
"response_length": 256,
|
| 371 |
-
"missing_eos_penalty": null,
|
| 372 |
-
"vllm_gpu_memory_utilization": 0.9,
|
| 373 |
-
"vllm_tensor_parallel_size": 1,
|
| 374 |
-
"vllm_pipeline_parallel_size": 1,
|
| 375 |
-
"vllm_enable_expert_parallel": false,
|
| 376 |
-
"vllm_max_num_seqs": null,
|
| 377 |
-
"vllm_max_model_len": null,
|
| 378 |
-
"vllm_disable_custom_all_reduce": true,
|
| 379 |
-
"vllm_enforce_eager": false,
|
| 380 |
-
"vllm_limit_mm_per_prompt": null,
|
| 381 |
-
"vllm_max_lora_rank": 16,
|
| 382 |
-
"vllm_enable_prefix_caching": true,
|
| 383 |
-
"vllm_use_async_engine": null,
|
| 384 |
-
"vllm_quantization": null,
|
| 385 |
-
"vllm_reasoning_parser": null,
|
| 386 |
-
"vllm_disable_cascade_attn": false,
|
| 387 |
-
"vllm_mm_processor_cache_gb": null,
|
| 388 |
-
"vllm_speculative_config": null,
|
| 389 |
-
"vllm_engine_kwargs": {},
|
| 390 |
-
"vllm_data_parallel_size": 1,
|
| 391 |
-
"use_vllm": false,
|
| 392 |
-
"vllm_mode": null,
|
| 393 |
-
"vllm_enable_lora": false,
|
| 394 |
-
"vllm_server_base_url": null,
|
| 395 |
-
"vllm_server_host": null,
|
| 396 |
-
"vllm_server_port": [
|
| 397 |
-
8000
|
| 398 |
-
],
|
| 399 |
-
"vllm_server_timeout": 240.0,
|
| 400 |
-
"vllm_server_group_port": null,
|
| 401 |
-
"enable_flattened_weight_sync": true,
|
| 402 |
-
"async_generate": false,
|
| 403 |
-
"sleep_level": 0,
|
| 404 |
-
"move_model_batches": null,
|
| 405 |
-
"offload_optimizer": false,
|
| 406 |
-
"offload_model": false,
|
| 407 |
-
"wandb_log_unique_prompts": null,
|
| 408 |
-
"epsilon": 0.2,
|
| 409 |
-
"epsilon_high": 0.28,
|
| 410 |
-
"delta": null,
|
| 411 |
-
"cosine_min_len_value_wrong": -0.5,
|
| 412 |
-
"cosine_max_len_value_wrong": 0.0,
|
| 413 |
-
"cosine_min_len_value_correct": 1.0,
|
| 414 |
-
"cosine_max_len_value_correct": 0.5,
|
| 415 |
-
"cosine_max_len": null,
|
| 416 |
-
"repetition_n_grams": 3,
|
| 417 |
-
"repetition_max_penalty": -1.0,
|
| 418 |
-
"reward_model_plugin": null,
|
| 419 |
-
"chord_sft_dataset": [],
|
| 420 |
-
"chord_sft_per_device_train_batch_size": null,
|
| 421 |
-
"chord_enable_phi_function": false,
|
| 422 |
-
"chord_mu_warmup_steps": null,
|
| 423 |
-
"chord_mu_decay_steps": null,
|
| 424 |
-
"chord_mu_peak": null,
|
| 425 |
-
"chord_mu_valley": null,
|
| 426 |
-
"sync_ref_model": false,
|
| 427 |
-
"ref_model_sync_steps": 512,
|
| 428 |
-
"ref_model_mixup_alpha": 0.6,
|
| 429 |
-
"multi_turn_scheduler": null,
|
| 430 |
-
"max_turns": null,
|
| 431 |
-
"completion_length_limit_scope": "per_round",
|
| 432 |
-
"vllm_server_pass_dataset": false,
|
| 433 |
-
"dynamic_sample": true,
|
| 434 |
-
"max_resample_times": 4,
|
| 435 |
-
"overlong_filter": true,
|
| 436 |
-
"soft_max_length": null,
|
| 437 |
-
"soft_cache_length": null,
|
| 438 |
-
"scale_rewards": "group",
|
| 439 |
-
"log_entropy": false,
|
| 440 |
-
"top_entropy_quantile": 1.0,
|
| 441 |
-
"importance_sampling_level": "token",
|
| 442 |
-
"tau_pos": 1.0,
|
| 443 |
-
"tau_neg": 1.05,
|
| 444 |
-
"advantage_estimator": "grpo",
|
| 445 |
-
"kl_in_reward": false,
|
| 446 |
-
"generation_batch_size": 48,
|
| 447 |
-
"steps_per_generation": null,
|
| 448 |
-
"num_generations_eval": 4,
|
| 449 |
-
"rollout_importance_sampling_mode": null,
|
| 450 |
-
"rollout_importance_sampling_threshold": 2.0,
|
| 451 |
-
"log_rollout_offpolicy_metrics": false,
|
| 452 |
-
"off_policy_sequence_mask_delta": null,
|
| 453 |
-
"num_generations": 12,
|
| 454 |
-
"reward_funcs": [
|
| 455 |
-
"asr_wer_hallu_len_v5"
|
| 456 |
-
],
|
| 457 |
-
"reward_weights": null,
|
| 458 |
-
"log_completions": true,
|
| 459 |
-
"num_iterations": 2,
|
| 460 |
-
"teacher_model": null,
|
| 461 |
-
"teacher_adapters": [],
|
| 462 |
-
"teacher_model_type": null,
|
| 463 |
-
"teacher_model_revision": null,
|
| 464 |
-
"teacher_deepspeed": null,
|
| 465 |
-
"teacher_model_server": null,
|
| 466 |
-
"rlhf_type": "grpo",
|
| 467 |
-
"ref_model": null,
|
| 468 |
-
"ref_adapters": [],
|
| 469 |
-
"ref_model_type": null,
|
| 470 |
-
"ref_model_revision": null,
|
| 471 |
-
"beta": 0.04,
|
| 472 |
-
"label_smoothing": 0,
|
| 473 |
-
"max_completion_length": 256,
|
| 474 |
-
"rpo_alpha": null,
|
| 475 |
-
"ld_alpha": null,
|
| 476 |
-
"discopop_tau": 0.05,
|
| 477 |
-
"loss_weights": null,
|
| 478 |
-
"cpo_alpha": 1.0,
|
| 479 |
-
"simpo_gamma": 1,
|
| 480 |
-
"desirable_weight": 1.0,
|
| 481 |
-
"undesirable_weight": 1.0,
|
| 482 |
-
"center_rewards_coefficient": null,
|
| 483 |
-
"sft_alpha": 0,
|
| 484 |
-
"lmbda": 0.5,
|
| 485 |
-
"seq_kd": false,
|
| 486 |
-
"gkd_logits_topk": null,
|
| 487 |
-
"offload_teacher_model": false,
|
| 488 |
-
"swift_version": "4.0.3",
|
| 489 |
-
"ckpt_dir": null,
|
| 490 |
-
"rank": 0,
|
| 491 |
-
"global_world_size": 3,
|
| 492 |
-
"local_world_size": 3,
|
| 493 |
-
"model_suffix": "Qwen3-ASR-1.7B-lora-merged",
|
| 494 |
-
"model_info": "ModelInfo(model_type='my_qwen3_asr_rl', model_dir='/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged', torch_dtype=torch.bfloat16, max_model_len=65536, quant_method=None, quant_bits=None, rope_scaling={'interleaved': True, 'mrope_interleaved': True, 'mrope_section': [24, 20, 20], 'rope_type': 'default', 'type': 'default'}, is_moe_model=False, is_multimodal=True, config=None, task_type='causal_lm', num_labels=None)",
|
| 495 |
-
"model_meta": "ModelMeta(model_type='my_qwen3_asr_rl', model_groups=[ModelGroup(models=[Model(ms_model_id='Qwen/Qwen3-ASR-0.6B', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-ASR-1.7B', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None)], template=None, ignore_patterns=None, requires=None, tags=[])], loader=<class 'my_qwen3_asr_dapo_register.Qwen3ASRRLLoader'>, template='my_qwen3_asr_rl', model_arch=MultiModelKeys(arch_name='my_qwen3_asr_rl', embedding=None, module_list=None, lm_head=None, q_proj=None, k_proj=None, v_proj=None, o_proj=None, attention=None, mlp=None, down_proj=None, qkv_proj=None, qk_proj=None, qa_proj=None, qb_proj=None, kv_proj=None, kva_proj=None, kvb_proj=None, language_model=['thinker.model', 'thinker.lm_head'], aligner=['thinker.audio_tower.proj1', 'thinker.audio_tower.proj2'], vision_tower=['thinker.audio_tower'], generator=[]), architectures=['Qwen3ASRForConditionalGeneration'], additional_saved_files=['generation_config.json', 'preprocessor_config.json', 'processor_config.json', 'tokenizer_config.json', 'tokenizer.json', 'special_tokens_map.json', 'chat_template.json', 'merges.txt', 'vocab.json'], torch_dtype=None, is_multimodal=True, is_reward=False, task_type=None, ignore_patterns=None, requires=['transformers>=4.57', 'qwen-asr', 'librosa'], tags=['audio'])",
|
| 496 |
-
"model_dir": "/data/haobin/Qwen3-ASR/Qwen3-ASR-1.7B-lora-merged",
|
| 497 |
-
"template_meta": "TemplateMeta(template_type='my_qwen3_asr_rl', prefix=[], prompt=['{{QUERY}}'], chat_sep=[], suffix=[''], template_cls=<class 'my_qwen3_asr_dapo_register.Qwen3ASRRLTemplate'>, system_prefix=[], default_system=None, auto_add_bos=False, stop_words=[], agent_template='react_en', is_thinking=False, thinking_prefix='', non_thinking_prefix='', history_thinking_prefix='')",
|
| 498 |
-
"_val_dataset_exists": true,
|
| 499 |
-
"hub": "<class 'swift.hub.hub.MSHub'>",
|
| 500 |
-
"evaluation_strategy": "steps",
|
| 501 |
-
"training_args": "GRPOConfig(output_dir='/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=4, per_device_eval_batch_size=4, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=16, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=5e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs=None, warmup_ratio=0.03, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/data/haobin/pky_train/qwen3_swift/pky_out/qwen3asr_dapo_reward5_3x8x8_12gen_3GPU/v3-20260410-173721/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=5, logging_nan_inf_filter=True, save_strategy=<SaveStrategy.STEPS: 'steps'>, save_steps=20, save_total_limit=None, save_safetensors=True, save_on_each_node=False, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=True, eval_steps=20, dataloader_num_workers=1, dataloader_prefetch_factor=2, past_index=-1, run_name='qwen3asr_dapo_reward5_3x8x8_12gen_3GPU', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, 
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False), parallelism_config=None, deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH_FUSED: 'adamw_torch_fused'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['wandb'], project='huggingface', trackio_space_id='trackio', ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=[], eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=18000000, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=None, include_num_input_tokens_seen=None, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=None, model_init_kwargs=None, disable_dropout=False, cast_lm_head_to_fp32=False, num_generations=12, num_generations_eval=4, max_completion_length=256, ds3_gather_for_generation=True, shuffle_dataset=True, 
generation_batch_size=48, steps_per_generation=4, temperature=0.5, top_p=0.95, top_k=50, min_p=None, generation_kwargs=None, chat_template_kwargs=None, repetition_penalty=1.08, use_transformers_paged=False, cache_implementation=None, use_vllm=False, vllm_mode=None, vllm_model_impl='vllm', vllm_enable_sleep_mode=False, vllm_structured_outputs_regex=None, vllm_server_base_url=None, vllm_server_host=None, vllm_server_port=[8000], vllm_server_timeout=240.0, vllm_group_port=51216, vllm_gpu_memory_utilization=0.9, vllm_max_model_length=None, vllm_tensor_parallel_size=1, beta=0.04, num_iterations=2, epsilon=0.2, delta=None, epsilon_high=0.28, sapo_temperature_neg=1.05, sapo_temperature_pos=1.0, importance_sampling_level='token', reward_weights=None, multi_objective_aggregation='sum_then_normalize', scale_rewards='group', loss_type='dapo', mask_truncated_completions=False, sync_ref_model=False, ref_model_mixup_alpha=0.6, ref_model_sync_steps=512, top_entropy_quantile=1.0, max_tool_calling_iterations=None, vllm_importance_sampling_correction=True, vllm_importance_sampling_mode='sequence_mask', vllm_importance_sampling_cap=3.0, off_policy_mask_threshold=None, use_bias_correction_kl=False, log_completions=True, num_completions_to_print=None, log_unique_prompts=False, log_completions_hub_repo=None, tuner_backend='peft', vit_gradient_checkpointing=True, router_aux_loss_coef=0.0, enable_dft_loss=False, enable_channel_loss=False, safe_serialization=True, max_shard_size='5GB', check_model=True, acc_strategy='token', train_dataloader_shuffle=True, max_epochs=None, aligner_lr=None, vit_lr=None, use_logits_to_keep=None, resume_only_model=False, optimizer=None, eval_metric=None, callbacks=[], early_stop_interval=None, eval_use_evalscope=False, eval_dataset=[], eval_dataset_args=None, eval_limit=None, eval_generation_config=None, extra_eval_args=None, tuner_type='lora', use_galore=False, galore_target_modules=None, galore_rank=128, galore_update_proj_gap=50, galore_scale=1.0, 
galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, galore_quantization=False, galore_proj_quant=False, galore_proj_bits=4, galore_proj_group_size=256, galore_cos_threshold=0.4, galore_gamma_proj=2, galore_queue_size=5, lisa_activated_layers=0, lisa_step_interval=20, use_flash_ckpt=False, vllm_pipeline_parallel_size=1, vllm_enable_expert_parallel=False, vllm_max_num_seqs=None, vllm_max_model_len=None, vllm_disable_custom_all_reduce=True, vllm_enforce_eager=False, vllm_limit_mm_per_prompt=None, vllm_max_lora_rank=16, vllm_enable_prefix_caching=True, vllm_use_async_engine=None, vllm_quantization=None, vllm_reasoning_parser=None, vllm_disable_cascade_attn=False, vllm_mm_processor_cache_gb=None, vllm_speculative_config=None, vllm_engine_kwargs={}, vllm_data_parallel_size=1, stop_words=[], vllm_enable_lora=False, lora_rank=8, vllm_server_group_port=None, enable_flattened_weight_sync=True, async_generate=False, structured_outputs_regex=None, sleep_level=0, move_model_batches=None, offload_optimizer=False, offload_model=False, wandb_log_unique_prompts=None, cosine_min_len_value_wrong=-0.5, cosine_max_len_value_wrong=0.0, cosine_min_len_value_correct=1.0, cosine_max_len_value_correct=0.5, cosine_max_len=256, repetition_n_grams=3, repetition_max_penalty=-1.0, reward_model=None, reward_model_plugin=None, chord_sft_dataset=[], chord_sft_per_device_train_batch_size=None, chord_enable_phi_function=False, chord_mu_warmup_steps=None, chord_mu_decay_steps=None, chord_mu_peak=None, chord_mu_valley=None, multi_turn_scheduler=None, max_turns=None, completion_length_limit_scope='per_round', vllm_server_pass_dataset=False, dynamic_sample=True, max_resample_times=4, overlong_filter=True, soft_max_length=None, soft_cache_length=None, log_entropy=False, tau_pos=1.0, tau_neg=1.05, advantage_estimator='grpo', kl_in_reward=False, dataset_shuffle=True, rollout_importance_sampling_mode=None, rollout_importance_sampling_threshold=2.0, 
log_rollout_offpolicy_metrics=False, off_policy_sequence_mask_delta=None)"
|
| 502 |
-
}
|
lora/lora-stage3/optimizer.pt
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:226e26b176c37ed7c792adfd2fe4136f95d2fdb572f9bb695f787161e8da0faa
|
| 3 |
-
size 99183201
|
lora/lora-stage3/rng_state_0.pth
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:e6aa29f654dcff45f4d494e85fba95c300e2ba77360edeca5a3899f79909e7ce
|
| 3 |
-
size 14725
|
lora/lora-stage3/rng_state_1.pth
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:47db367bb33a2abe8e3e662eec69e0be4925b4a0a64b5b6c12647bc9faa62ad2
|
| 3 |
-
size 14661
|
lora/lora-stage3/rng_state_2.pth
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:2faaa8f2708f53af7418ce42a1a06c28bcd4f75dce65c528b4f754d02132f5c0
|
| 3 |
-
size 14661
|