Instructions to use josephmayo/gemma-4-E4B-it-coding-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use josephmayo/gemma-4-E4B-it-coding-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it") model = PeftModel.from_pretrained(base_model, "josephmayo/gemma-4-E4B-it-coding-lora") - Notebooks
- Google Colab
- Kaggle
Upload Kaggle-trained coding LoRA adapter
Browse files- .gitattributes +1 -0
- README.md +43 -0
- adapter_config.json +308 -0
- adapter_model.safetensors +3 -0
- chat_template.jinja +351 -0
- eval_before_after.csv +150 -0
- executable_eval.json +135 -0
- nvidia_smi.txt +24 -0
- processor_config.json +75 -0
- proof_summary.json +175 -0
- summary.json +175 -0
- tokenizer.json +3 -0
- tokenizer_config.json +96 -0
- trainer_log_history.json +291 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,43 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
base_model: google/gemma-4-E4B-it
|
| 3 |
+
library_name: peft
|
| 4 |
+
license: apache-2.0
|
| 5 |
+
tags:
|
| 6 |
+
- gemma4
|
| 7 |
+
- coding
|
| 8 |
+
- qlora
|
| 9 |
+
- kaggle-proof
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
+
# Gemma 4 E4B IT Coding LoRA
|
| 13 |
+
|
| 14 |
+
QLoRA adapter for `google/gemma-4-E4B-it`, trained on filtered benign coding instructions.
|
| 15 |
+
|
| 16 |
+
## Training
|
| 17 |
+
|
| 18 |
+
- Runtime: Kaggle 2x Tesla T4
|
| 19 |
+
- Dataset: `ise-uiuc/Magicoder-Evol-Instruct-110K`, filtered to remove unsafe coding domains
|
| 20 |
+
- Safe rows used: 1024
|
| 21 |
+
- Steps: 200
|
| 22 |
+
- LoRA: r=16, alpha=32, target_modules=`all-linear`
|
| 23 |
+
- Trainable parameters: 50,499,584
|
| 24 |
+
- Final train loss: 1.1427
|
| 25 |
+
|
| 26 |
+
## Proof
|
| 27 |
+
|
| 28 |
+
- HumanEval subset: first 8 tasks
|
| 29 |
+
- Executable pass count before: 5/8
|
| 30 |
+
- Executable pass count after: 7/8
|
| 31 |
+
- Heuristic score before: 0.7688
|
| 32 |
+
- Heuristic score after: 0.7688
|
| 33 |
+
|
| 34 |
+
Artifacts included:
|
| 35 |
+
|
| 36 |
+
- `eval_before_after.csv`
|
| 37 |
+
- `executable_eval.json`
|
| 38 |
+
- `trainer_log_history.json`
|
| 39 |
+
- `summary.json`
|
| 40 |
+
- `proof_summary.json`
|
| 41 |
+
- `nvidia_smi.txt`
|
| 42 |
+
|
| 43 |
+
This adapter is for benign coding assistance only. It was not trained on malware, phishing, exploit, credential theft, evasion, or destructive automation examples.
|
adapter_config.json
ADDED
|
@@ -0,0 +1,308 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"alora_invocation_tokens": null,
|
| 3 |
+
"alpha_pattern": {},
|
| 4 |
+
"arrow_config": null,
|
| 5 |
+
"auto_mapping": null,
|
| 6 |
+
"base_model_name_or_path": "google/gemma-4-E4B-it",
|
| 7 |
+
"bias": "none",
|
| 8 |
+
"corda_config": null,
|
| 9 |
+
"ensure_weight_tying": false,
|
| 10 |
+
"eva_config": null,
|
| 11 |
+
"exclude_modules": null,
|
| 12 |
+
"fan_in_fan_out": false,
|
| 13 |
+
"inference_mode": true,
|
| 14 |
+
"init_lora_weights": true,
|
| 15 |
+
"layer_replication": null,
|
| 16 |
+
"layers_pattern": null,
|
| 17 |
+
"layers_to_transform": null,
|
| 18 |
+
"loftq_config": {},
|
| 19 |
+
"lora_alpha": 32,
|
| 20 |
+
"lora_bias": false,
|
| 21 |
+
"lora_dropout": 0.05,
|
| 22 |
+
"lora_ga_config": null,
|
| 23 |
+
"megatron_config": null,
|
| 24 |
+
"megatron_core": "megatron.core",
|
| 25 |
+
"modules_to_save": null,
|
| 26 |
+
"peft_type": "LORA",
|
| 27 |
+
"peft_version": "0.19.1",
|
| 28 |
+
"qalora_group_size": 16,
|
| 29 |
+
"r": 16,
|
| 30 |
+
"rank_pattern": {},
|
| 31 |
+
"revision": null,
|
| 32 |
+
"target_modules": [
|
| 33 |
+
"input_proj_linear",
|
| 34 |
+
"input_proj",
|
| 35 |
+
"26.mlp.down_proj",
|
| 36 |
+
"30.mlp.gate_proj",
|
| 37 |
+
"language_model.layers.11.self_attn.q_proj",
|
| 38 |
+
"language_model.layers.13.mlp.down_proj",
|
| 39 |
+
"language_model.layers.11.mlp.down_proj",
|
| 40 |
+
"language_model.layers.14.mlp.down_proj",
|
| 41 |
+
"21.self_attn.o_proj",
|
| 42 |
+
"language_model.layers.3.mlp.up_proj",
|
| 43 |
+
"33.mlp.up_proj",
|
| 44 |
+
"language_model.layers.1.self_attn.v_proj",
|
| 45 |
+
"language_model.layers.13.self_attn.v_proj",
|
| 46 |
+
"language_model.layers.8.self_attn.v_proj",
|
| 47 |
+
"23.self_attn.k_proj",
|
| 48 |
+
"41.mlp.up_proj",
|
| 49 |
+
"28.self_attn.o_proj",
|
| 50 |
+
"22.self_attn.q_proj",
|
| 51 |
+
"language_model.layers.8.mlp.gate_proj",
|
| 52 |
+
"19.self_attn.o_proj",
|
| 53 |
+
"language_model.layers.15.self_attn.v_proj",
|
| 54 |
+
"language_model.layers.9.self_attn.q_proj",
|
| 55 |
+
"language_model.layers.4.self_attn.q_proj",
|
| 56 |
+
"35.self_attn.o_proj",
|
| 57 |
+
"linear",
|
| 58 |
+
"22.self_attn.o_proj",
|
| 59 |
+
"24.mlp.down_proj",
|
| 60 |
+
"31.mlp.up_proj",
|
| 61 |
+
"language_model.layers.10.self_attn.v_proj",
|
| 62 |
+
"20.mlp.gate_proj",
|
| 63 |
+
"27.mlp.up_proj",
|
| 64 |
+
"41.mlp.down_proj",
|
| 65 |
+
"18.self_attn.o_proj",
|
| 66 |
+
"language_model.layers.7.self_attn.q_proj",
|
| 67 |
+
"23.mlp.gate_proj",
|
| 68 |
+
"18.self_attn.k_proj",
|
| 69 |
+
"language_model.layers.9.mlp.down_proj",
|
| 70 |
+
"language_model.layers.3.self_attn.o_proj",
|
| 71 |
+
"language_model.layers.4.self_attn.o_proj",
|
| 72 |
+
"27.mlp.down_proj",
|
| 73 |
+
"16.mlp.up_proj",
|
| 74 |
+
"19.self_attn.k_proj",
|
| 75 |
+
"language_model.layers.14.mlp.gate_proj",
|
| 76 |
+
"language_model.layers.8.self_attn.q_proj",
|
| 77 |
+
"29.self_attn.q_proj",
|
| 78 |
+
"17.self_attn.v_proj",
|
| 79 |
+
"language_model.layers.8.mlp.up_proj",
|
| 80 |
+
"32.mlp.up_proj",
|
| 81 |
+
"language_model.layers.2.self_attn.v_proj",
|
| 82 |
+
"20.self_attn.q_proj",
|
| 83 |
+
"17.self_attn.q_proj",
|
| 84 |
+
"29.mlp.down_proj",
|
| 85 |
+
"language_model.layers.7.mlp.gate_proj",
|
| 86 |
+
"28.mlp.gate_proj",
|
| 87 |
+
"18.mlp.up_proj",
|
| 88 |
+
"language_model.layers.8.mlp.down_proj",
|
| 89 |
+
"37.self_attn.o_proj",
|
| 90 |
+
"language_model.layers.6.self_attn.o_proj",
|
| 91 |
+
"language_model.layers.11.mlp.up_proj",
|
| 92 |
+
"21.mlp.down_proj",
|
| 93 |
+
"language_model.layers.0.mlp.gate_proj",
|
| 94 |
+
"25.self_attn.o_proj",
|
| 95 |
+
"language_model.layers.0.self_attn.q_proj",
|
| 96 |
+
"language_model.layers.0.self_attn.o_proj",
|
| 97 |
+
"33.self_attn.q_proj",
|
| 98 |
+
"per_layer_model_projection",
|
| 99 |
+
"32.self_attn.q_proj",
|
| 100 |
+
"language_model.layers.7.self_attn.v_proj",
|
| 101 |
+
"41.self_attn.o_proj",
|
| 102 |
+
"38.mlp.down_proj",
|
| 103 |
+
"25.mlp.up_proj",
|
| 104 |
+
"23.self_attn.v_proj",
|
| 105 |
+
"26.self_attn.q_proj",
|
| 106 |
+
"16.self_attn.v_proj",
|
| 107 |
+
"language_model.layers.11.self_attn.v_proj",
|
| 108 |
+
"language_model.layers.10.self_attn.q_proj",
|
| 109 |
+
"language_model.layers.9.self_attn.v_proj",
|
| 110 |
+
"35.mlp.down_proj",
|
| 111 |
+
"language_model.layers.7.mlp.down_proj",
|
| 112 |
+
"language_model.layers.3.self_attn.k_proj",
|
| 113 |
+
"38.mlp.gate_proj",
|
| 114 |
+
"language_model.layers.4.self_attn.k_proj",
|
| 115 |
+
"37.self_attn.q_proj",
|
| 116 |
+
"language_model.layers.1.mlp.gate_proj",
|
| 117 |
+
"language_model.layers.13.self_attn.q_proj",
|
| 118 |
+
"40.mlp.gate_proj",
|
| 119 |
+
"language_model.layers.1.self_attn.o_proj",
|
| 120 |
+
"38.self_attn.q_proj",
|
| 121 |
+
"19.mlp.gate_proj",
|
| 122 |
+
"36.mlp.gate_proj",
|
| 123 |
+
"language_model.layers.15.mlp.gate_proj",
|
| 124 |
+
"language_model.layers.10.mlp.gate_proj",
|
| 125 |
+
"language_model.layers.12.self_attn.q_proj",
|
| 126 |
+
"language_model.layers.12.mlp.down_proj",
|
| 127 |
+
"16.self_attn.q_proj",
|
| 128 |
+
"21.self_attn.q_proj",
|
| 129 |
+
"language_model.layers.13.mlp.up_proj",
|
| 130 |
+
"36.mlp.up_proj",
|
| 131 |
+
"language_model.layers.9.mlp.up_proj",
|
| 132 |
+
"16.mlp.down_proj",
|
| 133 |
+
"language_model.layers.15.self_attn.o_proj",
|
| 134 |
+
"41.mlp.gate_proj",
|
| 135 |
+
"26.mlp.up_proj",
|
| 136 |
+
"30.mlp.down_proj",
|
| 137 |
+
"39.mlp.up_proj",
|
| 138 |
+
"21.mlp.gate_proj",
|
| 139 |
+
"language_model.layers.3.self_attn.q_proj",
|
| 140 |
+
"language_model.layers.0.mlp.down_proj",
|
| 141 |
+
"language_model.layers.14.self_attn.o_proj",
|
| 142 |
+
"language_model.layers.11.self_attn.o_proj",
|
| 143 |
+
"17.mlp.up_proj",
|
| 144 |
+
"29.mlp.up_proj",
|
| 145 |
+
"23.self_attn.q_proj",
|
| 146 |
+
"21.mlp.up_proj",
|
| 147 |
+
"39.mlp.down_proj",
|
| 148 |
+
"40.self_attn.o_proj",
|
| 149 |
+
"35.mlp.up_proj",
|
| 150 |
+
"language_model.layers.5.mlp.gate_proj",
|
| 151 |
+
"output_proj",
|
| 152 |
+
"22.mlp.gate_proj",
|
| 153 |
+
"30.self_attn.o_proj",
|
| 154 |
+
"language_model.layers.1.self_attn.k_proj",
|
| 155 |
+
"language_model.layers.10.self_attn.k_proj",
|
| 156 |
+
"language_model.layers.10.mlp.down_proj",
|
| 157 |
+
"31.mlp.down_proj",
|
| 158 |
+
"26.mlp.gate_proj",
|
| 159 |
+
"language_model.layers.0.self_attn.k_proj",
|
| 160 |
+
"per_layer_input_gate",
|
| 161 |
+
"17.mlp.down_proj",
|
| 162 |
+
"31.mlp.gate_proj",
|
| 163 |
+
"language_model.layers.2.self_attn.q_proj",
|
| 164 |
+
"24.self_attn.q_proj",
|
| 165 |
+
"22.mlp.up_proj",
|
| 166 |
+
"language_model.layers.4.mlp.up_proj",
|
| 167 |
+
"25.mlp.gate_proj",
|
| 168 |
+
"language_model.layers.5.mlp.down_proj",
|
| 169 |
+
"21.self_attn.k_proj",
|
| 170 |
+
"16.self_attn.o_proj",
|
| 171 |
+
"language_model.layers.5.self_attn.o_proj",
|
| 172 |
+
"19.mlp.up_proj",
|
| 173 |
+
"21.self_attn.v_proj",
|
| 174 |
+
"language_model.layers.10.mlp.up_proj",
|
| 175 |
+
"18.self_attn.q_proj",
|
| 176 |
+
"language_model.layers.10.self_attn.o_proj",
|
| 177 |
+
"16.self_attn.k_proj",
|
| 178 |
+
"language_model.layers.6.mlp.up_proj",
|
| 179 |
+
"language_model.layers.7.self_attn.o_proj",
|
| 180 |
+
"language_model.layers.15.self_attn.k_proj",
|
| 181 |
+
"language_model.layers.2.mlp.gate_proj",
|
| 182 |
+
"17.mlp.gate_proj",
|
| 183 |
+
"28.self_attn.q_proj",
|
| 184 |
+
"30.mlp.up_proj",
|
| 185 |
+
"language_model.layers.4.self_attn.v_proj",
|
| 186 |
+
"language_model.layers.4.mlp.down_proj",
|
| 187 |
+
"25.self_attn.q_proj",
|
| 188 |
+
"30.self_attn.q_proj",
|
| 189 |
+
"language_model.layers.5.self_attn.q_proj",
|
| 190 |
+
"language_model.layers.14.self_attn.v_proj",
|
| 191 |
+
"language_model.layers.3.mlp.down_proj",
|
| 192 |
+
"24.mlp.gate_proj",
|
| 193 |
+
"38.mlp.up_proj",
|
| 194 |
+
"language_model.layers.13.mlp.gate_proj",
|
| 195 |
+
"22.mlp.down_proj",
|
| 196 |
+
"39.mlp.gate_proj",
|
| 197 |
+
"35.mlp.gate_proj",
|
| 198 |
+
"17.self_attn.k_proj",
|
| 199 |
+
"embedding_projection",
|
| 200 |
+
"language_model.layers.0.mlp.up_proj",
|
| 201 |
+
"20.self_attn.o_proj",
|
| 202 |
+
"language_model.layers.14.self_attn.k_proj",
|
| 203 |
+
"20.mlp.up_proj",
|
| 204 |
+
"22.self_attn.v_proj",
|
| 205 |
+
"18.mlp.down_proj",
|
| 206 |
+
"language_model.layers.1.mlp.up_proj",
|
| 207 |
+
"39.self_attn.o_proj",
|
| 208 |
+
"31.self_attn.q_proj",
|
| 209 |
+
"34.mlp.gate_proj",
|
| 210 |
+
"language_model.layers.4.mlp.gate_proj",
|
| 211 |
+
"40.mlp.down_proj",
|
| 212 |
+
"36.self_attn.o_proj",
|
| 213 |
+
"27.self_attn.q_proj",
|
| 214 |
+
"34.mlp.up_proj",
|
| 215 |
+
"language_model.layers.5.self_attn.v_proj",
|
| 216 |
+
"language_model.layers.6.mlp.down_proj",
|
| 217 |
+
"language_model.layers.12.mlp.gate_proj",
|
| 218 |
+
"28.mlp.down_proj",
|
| 219 |
+
"29.self_attn.o_proj",
|
| 220 |
+
"language_model.layers.1.self_attn.q_proj",
|
| 221 |
+
"36.mlp.down_proj",
|
| 222 |
+
"language_model.layers.15.mlp.down_proj",
|
| 223 |
+
"32.self_attn.o_proj",
|
| 224 |
+
"40.mlp.up_proj",
|
| 225 |
+
"33.mlp.down_proj",
|
| 226 |
+
"26.self_attn.o_proj",
|
| 227 |
+
"language_model.layers.3.self_attn.v_proj",
|
| 228 |
+
"language_model.layers.2.self_attn.o_proj",
|
| 229 |
+
"language_model.layers.2.mlp.up_proj",
|
| 230 |
+
"language_model.layers.12.mlp.up_proj",
|
| 231 |
+
"language_model.layers.2.self_attn.k_proj",
|
| 232 |
+
"language_model.layers.12.self_attn.o_proj",
|
| 233 |
+
"language_model.layers.6.self_attn.k_proj",
|
| 234 |
+
"18.mlp.gate_proj",
|
| 235 |
+
"34.self_attn.o_proj",
|
| 236 |
+
"23.mlp.up_proj",
|
| 237 |
+
"29.mlp.gate_proj",
|
| 238 |
+
"language_model.layers.0.self_attn.v_proj",
|
| 239 |
+
"language_model.layers.15.mlp.up_proj",
|
| 240 |
+
"19.self_attn.q_proj",
|
| 241 |
+
"32.mlp.gate_proj",
|
| 242 |
+
"per_layer_projection",
|
| 243 |
+
"34.self_attn.q_proj",
|
| 244 |
+
"language_model.layers.1.mlp.down_proj",
|
| 245 |
+
"language_model.layers.13.self_attn.o_proj",
|
| 246 |
+
"language_model.layers.6.mlp.gate_proj",
|
| 247 |
+
"language_model.layers.3.mlp.gate_proj",
|
| 248 |
+
"20.mlp.down_proj",
|
| 249 |
+
"language_model.layers.2.mlp.down_proj",
|
| 250 |
+
"37.mlp.down_proj",
|
| 251 |
+
"language_model.layers.12.self_attn.k_proj",
|
| 252 |
+
"28.mlp.up_proj",
|
| 253 |
+
"37.mlp.gate_proj",
|
| 254 |
+
"27.mlp.gate_proj",
|
| 255 |
+
"language_model.layers.9.self_attn.k_proj",
|
| 256 |
+
"24.mlp.up_proj",
|
| 257 |
+
"20.self_attn.v_proj",
|
| 258 |
+
"language_model.layers.13.self_attn.k_proj",
|
| 259 |
+
"19.mlp.down_proj",
|
| 260 |
+
"23.mlp.down_proj",
|
| 261 |
+
"31.self_attn.o_proj",
|
| 262 |
+
"language_model.layers.5.self_attn.k_proj",
|
| 263 |
+
"17.self_attn.o_proj",
|
| 264 |
+
"16.mlp.gate_proj",
|
| 265 |
+
"39.self_attn.q_proj",
|
| 266 |
+
"language_model.layers.5.mlp.up_proj",
|
| 267 |
+
"language_model.layers.12.self_attn.v_proj",
|
| 268 |
+
"language_model.layers.14.self_attn.q_proj",
|
| 269 |
+
"36.self_attn.q_proj",
|
| 270 |
+
"24.self_attn.o_proj",
|
| 271 |
+
"23.self_attn.o_proj",
|
| 272 |
+
"language_model.layers.9.mlp.gate_proj",
|
| 273 |
+
"relative_k_proj",
|
| 274 |
+
"40.self_attn.q_proj",
|
| 275 |
+
"18.self_attn.v_proj",
|
| 276 |
+
"language_model.layers.11.self_attn.k_proj",
|
| 277 |
+
"32.mlp.down_proj",
|
| 278 |
+
"33.self_attn.o_proj",
|
| 279 |
+
"38.self_attn.o_proj",
|
| 280 |
+
"34.mlp.down_proj",
|
| 281 |
+
"language_model.layers.9.self_attn.o_proj",
|
| 282 |
+
"35.self_attn.q_proj",
|
| 283 |
+
"19.self_attn.v_proj",
|
| 284 |
+
"language_model.layers.8.self_attn.k_proj",
|
| 285 |
+
"33.mlp.gate_proj",
|
| 286 |
+
"language_model.layers.7.self_attn.k_proj",
|
| 287 |
+
"22.self_attn.k_proj",
|
| 288 |
+
"41.self_attn.q_proj",
|
| 289 |
+
"37.mlp.up_proj",
|
| 290 |
+
"27.self_attn.o_proj",
|
| 291 |
+
"20.self_attn.k_proj",
|
| 292 |
+
"language_model.layers.11.mlp.gate_proj",
|
| 293 |
+
"language_model.layers.14.mlp.up_proj",
|
| 294 |
+
"language_model.layers.8.self_attn.o_proj",
|
| 295 |
+
"language_model.layers.15.self_attn.q_proj",
|
| 296 |
+
"language_model.layers.6.self_attn.q_proj",
|
| 297 |
+
"language_model.layers.6.self_attn.v_proj",
|
| 298 |
+
"25.mlp.down_proj",
|
| 299 |
+
"language_model.layers.7.mlp.up_proj"
|
| 300 |
+
],
|
| 301 |
+
"target_parameters": null,
|
| 302 |
+
"task_type": "CAUSAL_LM",
|
| 303 |
+
"trainable_token_indices": null,
|
| 304 |
+
"use_bdlora": null,
|
| 305 |
+
"use_dora": false,
|
| 306 |
+
"use_qalora": false,
|
| 307 |
+
"use_rslora": false
|
| 308 |
+
}
|
adapter_model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:7225065dc08e4127efff0a65cfbad75b9c45e317897e1725e1c85bf07a53bda9
|
| 3 |
+
size 202180544
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,351 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{%- macro format_parameters(properties, required, filter_keys=false) -%}
|
| 2 |
+
{%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
|
| 3 |
+
{%- set ns = namespace(found_first=false) -%}
|
| 4 |
+
{%- for key, value in properties | dictsort -%}
|
| 5 |
+
{%- set add_comma = false -%}
|
| 6 |
+
{%- if not filter_keys or key not in standard_keys -%}
|
| 7 |
+
{%- if ns.found_first %},{% endif -%}
|
| 8 |
+
{%- set ns.found_first = true -%}
|
| 9 |
+
{{ key }}:{
|
| 10 |
+
{%- if value['description'] -%}
|
| 11 |
+
description:<|"|>{{ value['description'] }}<|"|>
|
| 12 |
+
{%- set add_comma = true -%}
|
| 13 |
+
{%- endif -%}
|
| 14 |
+
{%- if value['type'] | upper == 'STRING' -%}
|
| 15 |
+
{%- if value['enum'] -%}
|
| 16 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 17 |
+
enum:{{ format_argument(value['enum']) }}
|
| 18 |
+
{%- endif -%}
|
| 19 |
+
{%- elif value['type'] | upper == 'ARRAY' -%}
|
| 20 |
+
{%- if value['items'] is mapping and value['items'] -%}
|
| 21 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 22 |
+
items:{
|
| 23 |
+
{%- set ns_items = namespace(found_first=false) -%}
|
| 24 |
+
{%- for item_key, item_value in value['items'] | dictsort -%}
|
| 25 |
+
{%- if item_value is not none -%}
|
| 26 |
+
{%- if ns_items.found_first %},{% endif -%}
|
| 27 |
+
{%- set ns_items.found_first = true -%}
|
| 28 |
+
{%- if item_key == 'properties' -%}
|
| 29 |
+
properties:{
|
| 30 |
+
{%- if item_value is mapping -%}
|
| 31 |
+
{{- format_parameters(item_value, value['items']['required'] | default([])) -}}
|
| 32 |
+
{%- endif -%}
|
| 33 |
+
}
|
| 34 |
+
{%- elif item_key == 'required' -%}
|
| 35 |
+
required:[
|
| 36 |
+
{%- for req_item in item_value -%}
|
| 37 |
+
<|"|>{{- req_item -}}<|"|>
|
| 38 |
+
{%- if not loop.last %},{% endif -%}
|
| 39 |
+
{%- endfor -%}
|
| 40 |
+
]
|
| 41 |
+
{%- elif item_key == 'type' -%}
|
| 42 |
+
{%- if item_value is string -%}
|
| 43 |
+
type:{{ format_argument(item_value | upper) }}
|
| 44 |
+
{%- else -%}
|
| 45 |
+
type:{{ format_argument(item_value | map('upper') | list) }}
|
| 46 |
+
{%- endif -%}
|
| 47 |
+
{%- else -%}
|
| 48 |
+
{{ item_key }}:{{ format_argument(item_value) }}
|
| 49 |
+
{%- endif -%}
|
| 50 |
+
{%- endif -%}
|
| 51 |
+
{%- endfor -%}
|
| 52 |
+
}
|
| 53 |
+
{%- endif -%}
|
| 54 |
+
{%- endif -%}
|
| 55 |
+
{%- if value['nullable'] %}
|
| 56 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 57 |
+
nullable:true
|
| 58 |
+
{%- endif -%}
|
| 59 |
+
{%- if value['type'] | upper == 'OBJECT' -%}
|
| 60 |
+
{%- if value['properties'] is defined and value['properties'] is mapping -%}
|
| 61 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 62 |
+
properties:{
|
| 63 |
+
{{- format_parameters(value['properties'], value['required'] | default([])) -}}
|
| 64 |
+
}
|
| 65 |
+
{%- elif value is mapping -%}
|
| 66 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 67 |
+
properties:{
|
| 68 |
+
{{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
|
| 69 |
+
}
|
| 70 |
+
{%- endif -%}
|
| 71 |
+
{%- if value['required'] -%}
|
| 72 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 73 |
+
required:[
|
| 74 |
+
{%- for item in value['required'] | default([]) -%}
|
| 75 |
+
<|"|>{{- item -}}<|"|>
|
| 76 |
+
{%- if not loop.last %},{% endif -%}
|
| 77 |
+
{%- endfor -%}
|
| 78 |
+
]
|
| 79 |
+
{%- endif -%}
|
| 80 |
+
{%- endif -%}
|
| 81 |
+
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
|
| 82 |
+
type:<|"|>{{ value['type'] | upper }}<|"|>}
|
| 83 |
+
{%- endif -%}
|
| 84 |
+
{%- endfor -%}
|
| 85 |
+
{%- endmacro -%}
|
| 86 |
+
{%- macro format_function_declaration(tool_data) -%}
|
| 87 |
+
declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
|
| 88 |
+
{%- set params = tool_data['function']['parameters'] -%}
|
| 89 |
+
{%- if params -%}
|
| 90 |
+
,parameters:{
|
| 91 |
+
{%- if params['properties'] -%}
|
| 92 |
+
properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
|
| 93 |
+
{%- endif -%}
|
| 94 |
+
{%- if params['required'] -%}
|
| 95 |
+
required:[
|
| 96 |
+
{%- for item in params['required'] -%}
|
| 97 |
+
<|"|>{{- item -}}<|"|>
|
| 98 |
+
{{- ',' if not loop.last -}}
|
| 99 |
+
{%- endfor -%}
|
| 100 |
+
],
|
| 101 |
+
{%- endif -%}
|
| 102 |
+
{%- if params['type'] -%}
|
| 103 |
+
type:<|"|>{{- params['type'] | upper -}}<|"|>}
|
| 104 |
+
{%- endif -%}
|
| 105 |
+
{%- endif -%}
|
| 106 |
+
{%- if 'response' in tool_data['function'] -%}
|
| 107 |
+
{%- set response_declaration = tool_data['function']['response'] -%}
|
| 108 |
+
,response:{
|
| 109 |
+
{%- if response_declaration['description'] -%}
|
| 110 |
+
description:<|"|>{{- response_declaration['description'] -}}<|"|>,
|
| 111 |
+
{%- endif -%}
|
| 112 |
+
{%- if response_declaration['type'] | upper == 'OBJECT' -%}
|
| 113 |
+
type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
|
| 114 |
+
{%- endif -%}
|
| 115 |
+
{%- endif -%}
|
| 116 |
+
}
|
| 117 |
+
{%- endmacro -%}
|
| 118 |
+
{%- macro format_argument(argument, escape_keys=True) -%}
|
| 119 |
+
{%- if argument is string -%}
|
| 120 |
+
{{- '<|"|>' + argument + '<|"|>' -}}
|
| 121 |
+
{%- elif argument is boolean -%}
|
| 122 |
+
{{- 'true' if argument else 'false' -}}
|
| 123 |
+
{%- elif argument is mapping -%}
|
| 124 |
+
{{- '{' -}}
|
| 125 |
+
{%- set ns = namespace(found_first=false) -%}
|
| 126 |
+
{%- for key, value in argument | dictsort -%}
|
| 127 |
+
{%- if ns.found_first %},{% endif -%}
|
| 128 |
+
{%- set ns.found_first = true -%}
|
| 129 |
+
{%- if escape_keys -%}
|
| 130 |
+
{{- '<|"|>' + key + '<|"|>' -}}
|
| 131 |
+
{%- else -%}
|
| 132 |
+
{{- key -}}
|
| 133 |
+
{%- endif -%}
|
| 134 |
+
:{{- format_argument(value, escape_keys=escape_keys) -}}
|
| 135 |
+
{%- endfor -%}
|
| 136 |
+
{{- '}' -}}
|
| 137 |
+
{%- elif argument is sequence -%}
|
| 138 |
+
{{- '[' -}}
|
| 139 |
+
{%- for item in argument -%}
|
| 140 |
+
{{- format_argument(item, escape_keys=escape_keys) -}}
|
| 141 |
+
{%- if not loop.last %},{% endif -%}
|
| 142 |
+
{%- endfor -%}
|
| 143 |
+
{{- ']' -}}
|
| 144 |
+
{%- else -%}
|
| 145 |
+
{{- argument -}}
|
| 146 |
+
{%- endif -%}
|
| 147 |
+
{%- endmacro -%}
|
| 148 |
+
{%- macro strip_thinking(text) -%}
|
| 149 |
+
{%- set ns = namespace(result='') -%}
|
| 150 |
+
{%- for part in text.split('<channel|>') -%}
|
| 151 |
+
{%- if '<|channel>' in part -%}
|
| 152 |
+
{%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
|
| 153 |
+
{%- else -%}
|
| 154 |
+
{%- set ns.result = ns.result + part -%}
|
| 155 |
+
{%- endif -%}
|
| 156 |
+
{%- endfor -%}
|
| 157 |
+
{{- ns.result | trim -}}
|
| 158 |
+
{%- endmacro -%}
|
| 159 |
+
|
| 160 |
+
{%- macro format_tool_response_block(tool_name, response) -%}
|
| 161 |
+
{{- '<|tool_response>' -}}
|
| 162 |
+
{%- if response is mapping -%}
|
| 163 |
+
{{- 'response:' + tool_name + '{' -}}
|
| 164 |
+
{%- for key, value in response | dictsort -%}
|
| 165 |
+
{{- key -}}:{{- format_argument(value, escape_keys=False) -}}
|
| 166 |
+
{%- if not loop.last %},{% endif -%}
|
| 167 |
+
{%- endfor -%}
|
| 168 |
+
{{- '}' -}}
|
| 169 |
+
{%- else -%}
|
| 170 |
+
{{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}}
|
| 171 |
+
{%- endif -%}
|
| 172 |
+
{{- '<tool_response|>' -}}
|
| 173 |
+
{%- endmacro -%}
|
| 174 |
+
|
| 175 |
+
{%- set ns = namespace(prev_message_type=None) -%}
|
| 176 |
+
{%- set loop_messages = messages -%}
|
| 177 |
+
{{- bos_token -}}
|
| 178 |
+
{#- Handle System/Tool Definitions Block -#}
|
| 179 |
+
{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
|
| 180 |
+
{{- '<|turn>system\n' -}}
|
| 181 |
+
{#- Inject Thinking token at the very top of the FIRST system turn -#}
|
| 182 |
+
{%- if enable_thinking is defined and enable_thinking -%}
|
| 183 |
+
{{- '<|think|>\n' -}}
|
| 184 |
+
{%- set ns.prev_message_type = 'think' -%}
|
| 185 |
+
{%- endif -%}
|
| 186 |
+
{%- if messages[0]['role'] in ['system', 'developer'] -%}
|
| 187 |
+
{%- if messages[0]['content'] is string -%}
|
| 188 |
+
{{- messages[0]['content'] | trim -}}
|
| 189 |
+
{%- elif messages[0]['content'] is sequence -%}
|
| 190 |
+
{%- for item in messages[0]['content'] -%}
|
| 191 |
+
{{- item['text'] | trim + ' '-}}
|
| 192 |
+
{%- endfor -%}
|
| 193 |
+
{%- endif -%}
|
| 194 |
+
{%- set loop_messages = messages[1:] -%}
|
| 195 |
+
{%- endif -%}
|
| 196 |
+
{%- if tools -%}
|
| 197 |
+
{%- for tool in tools %}
|
| 198 |
+
{{- '<|tool>' -}}
|
| 199 |
+
{{- format_function_declaration(tool) | trim -}}
|
| 200 |
+
{{- '<tool|>' -}}
|
| 201 |
+
{%- endfor %}
|
| 202 |
+
{%- set ns.prev_message_type = 'tool' -%}
|
| 203 |
+
{%- endif -%}
|
| 204 |
+
{{- '<turn|>\n' -}}
|
| 205 |
+
{%- endif %}
|
| 206 |
+
|
| 207 |
+
{#- Pre-scan: find last user message index for reasoning guard -#}
|
| 208 |
+
{%- set ns_turn = namespace(last_user_idx=-1) -%}
|
| 209 |
+
{%- for i in range(loop_messages | length) -%}
|
| 210 |
+
{%- if loop_messages[i]['role'] == 'user' -%}
|
| 211 |
+
{%- set ns_turn.last_user_idx = i -%}
|
| 212 |
+
{%- endif -%}
|
| 213 |
+
{%- endfor -%}
|
| 214 |
+
|
| 215 |
+
{#- Loop through messages -#}
|
| 216 |
+
{%- for message in loop_messages -%}
|
| 217 |
+
{%- if message['role'] != 'tool' -%}
|
| 218 |
+
{%- set ns.prev_message_type = None -%}
|
| 219 |
+
{%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
|
| 220 |
+
{#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#}
|
| 221 |
+
{%- set prev_nt = namespace(role=None, found=false) -%}
|
| 222 |
+
{%- if loop.index0 > 0 -%}
|
| 223 |
+
{%- for j in range(loop.index0 - 1, -1, -1) -%}
|
| 224 |
+
{%- if not prev_nt.found -%}
|
| 225 |
+
{%- if loop_messages[j]['role'] != 'tool' -%}
|
| 226 |
+
{%- set prev_nt.role = loop_messages[j]['role'] -%}
|
| 227 |
+
{%- set prev_nt.found = true -%}
|
| 228 |
+
{%- endif -%}
|
| 229 |
+
{%- endif -%}
|
| 230 |
+
{%- endfor -%}
|
| 231 |
+
{%- endif -%}
|
| 232 |
+
{%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%}
|
| 233 |
+
{%- if not continue_same_model_turn -%}
|
| 234 |
+
{{- '<|turn>' + role + '\n' }}
|
| 235 |
+
{%- endif -%}
|
| 236 |
+
|
| 237 |
+
{#- Render reasoning/reasoning_content as thinking channel -#}
|
| 238 |
+
{%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
|
| 239 |
+
{%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
|
| 240 |
+
{{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
|
| 241 |
+
{%- endif -%}
|
| 242 |
+
|
| 243 |
+
{%- if message['tool_calls'] -%}
|
| 244 |
+
{%- for tool_call in message['tool_calls'] -%}
|
| 245 |
+
{%- set function = tool_call['function'] -%}
|
| 246 |
+
{{- '<|tool_call>call:' + function['name'] + '{' -}}
|
| 247 |
+
{%- if function['arguments'] is mapping -%}
|
| 248 |
+
{%- set ns_args = namespace(found_first=false) -%}
|
| 249 |
+
{%- for key, value in function['arguments'] | dictsort -%}
|
| 250 |
+
{%- if ns_args.found_first %},{% endif -%}
|
| 251 |
+
{%- set ns_args.found_first = true -%}
|
| 252 |
+
{{- key -}}:{{- format_argument(value, escape_keys=False) -}}
|
| 253 |
+
{%- endfor -%}
|
| 254 |
+
{%- elif function['arguments'] is string -%}
|
| 255 |
+
{{- function['arguments'] -}}
|
| 256 |
+
{%- endif -%}
|
| 257 |
+
{{- '}<tool_call|>' -}}
|
| 258 |
+
{%- endfor -%}
|
| 259 |
+
{%- set ns.prev_message_type = 'tool_call' -%}
|
| 260 |
+
{%- endif -%}
|
| 261 |
+
|
| 262 |
+
{%- set ns_tr_out = namespace(flag=false) -%}
|
| 263 |
+
{%- if message.get('tool_responses') -%}
|
| 264 |
+
{#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
|
| 265 |
+
{%- for tool_response in message['tool_responses'] -%}
|
| 266 |
+
{{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}}
|
| 267 |
+
{%- set ns_tr_out.flag = true -%}
|
| 268 |
+
{%- set ns.prev_message_type = 'tool_response' -%}
|
| 269 |
+
{%- endfor -%}
|
| 270 |
+
{%- elif message.get('tool_calls') -%}
|
| 271 |
+
{#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
|
| 272 |
+
{%- set ns_tool_scan = namespace(stopped=false) -%}
|
| 273 |
+
{%- for k in range(loop.index0 + 1, loop_messages | length) -%}
|
| 274 |
+
{%- if ns_tool_scan.stopped -%}
|
| 275 |
+
{%- elif loop_messages[k]['role'] != 'tool' -%}
|
| 276 |
+
{%- set ns_tool_scan.stopped = true -%}
|
| 277 |
+
{%- else -%}
|
| 278 |
+
{%- set follow = loop_messages[k] -%}
|
| 279 |
+
{#- Resolve tool_call_id to function name -#}
|
| 280 |
+
{%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%}
|
| 281 |
+
{%- for tc in message['tool_calls'] -%}
|
| 282 |
+
{%- if tc.get('id') == follow.get('tool_call_id') -%}
|
| 283 |
+
{%- set ns_tname.name = tc['function']['name'] -%}
|
| 284 |
+
{%- endif -%}
|
| 285 |
+
{%- endfor -%}
|
| 286 |
+
{#- Handle content as string or content-parts array -#}
|
| 287 |
+
{%- set tool_body = follow.get('content') -%}
|
| 288 |
+
{%- if tool_body is string -%}
|
| 289 |
+
{{- format_tool_response_block(ns_tname.name, tool_body) -}}
|
| 290 |
+
{%- elif tool_body is sequence and tool_body is not string -%}
|
| 291 |
+
{%- set ns_txt = namespace(s='') -%}
|
| 292 |
+
{%- for part in tool_body -%}
|
| 293 |
+
{%- if part.get('type') == 'text' -%}
|
| 294 |
+
{%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
|
| 295 |
+
{%- endif -%}
|
| 296 |
+
{%- endfor -%}
|
| 297 |
+
{{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
|
| 298 |
+
{%- else -%}
|
| 299 |
+
{{- format_tool_response_block(ns_tname.name, tool_body) -}}
|
| 300 |
+
{%- endif -%}
|
| 301 |
+
{%- set ns_tr_out.flag = true -%}
|
| 302 |
+
{%- set ns.prev_message_type = 'tool_response' -%}
|
| 303 |
+
{%- endif -%}
|
| 304 |
+
{%- endfor -%}
|
| 305 |
+
{%- endif -%}
|
| 306 |
+
|
| 307 |
+
{%- set captured_content -%}
|
| 308 |
+
{%- if message['content'] is string -%}
|
| 309 |
+
{%- if role == 'model' -%}
|
| 310 |
+
{{- strip_thinking(message['content']) -}}
|
| 311 |
+
{%- else -%}
|
| 312 |
+
{{- message['content'] | trim -}}
|
| 313 |
+
{%- endif -%}
|
| 314 |
+
{%- elif message['content'] is sequence -%}
|
| 315 |
+
{%- for item in message['content'] -%}
|
| 316 |
+
{%- if item['type'] == 'text' -%}
|
| 317 |
+
{%- if role == 'model' -%}
|
| 318 |
+
{{- strip_thinking(item['text']) -}}
|
| 319 |
+
{%- else -%}
|
| 320 |
+
{{- item['text'] | trim -}}
|
| 321 |
+
{%- endif -%}
|
| 322 |
+
{%- elif item['type'] == 'image' -%}
|
| 323 |
+
{{- '<|image|>' -}}
|
| 324 |
+
{%- set ns.prev_message_type = 'image' -%}
|
| 325 |
+
{%- elif item['type'] == 'audio' -%}
|
| 326 |
+
{{- '<|audio|>' -}}
|
| 327 |
+
{%- set ns.prev_message_type = 'audio' -%}
|
| 328 |
+
{%- elif item['type'] == 'video' -%}
|
| 329 |
+
{{- '<|video|>' -}}
|
| 330 |
+
{%- set ns.prev_message_type = 'video' -%}
|
| 331 |
+
{%- endif -%}
|
| 332 |
+
{%- endfor -%}
|
| 333 |
+
{%- endif -%}
|
| 334 |
+
{%- endset -%}
|
| 335 |
+
|
| 336 |
+
{{- captured_content -}}
|
| 337 |
+
{%- set has_content = captured_content | trim | length > 0 -%}
|
| 338 |
+
|
| 339 |
+
{%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
|
| 340 |
+
{{- '<|tool_response>' -}}
|
| 341 |
+
{%- elif not (ns_tr_out.flag and not has_content) -%}
|
| 342 |
+
{{- '<turn|>\n' -}}
|
| 343 |
+
{%- endif -%}
|
| 344 |
+
{%- endif -%}
|
| 345 |
+
{%- endfor -%}
|
| 346 |
+
|
| 347 |
+
{%- if add_generation_prompt -%}
|
| 348 |
+
{%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
|
| 349 |
+
{{- '<|turn>model\n' -}}
|
| 350 |
+
{%- endif -%}
|
| 351 |
+
{%- endif -%}
|
eval_before_after.csv
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
phase,id,latency_s,score,chars,preview
|
| 2 |
+
before,HumanEval_0,35.316,0.8,536,"```python
|
| 3 |
+
from typing import List
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def has_close_elements(numbers: List[float], threshold: float) -> bool:
|
| 7 |
+
"""""" Check if in given list of numbers, are any two numbers closer to each other than
|
| 8 |
+
given threshold.
|
| 9 |
+
>>> has_close_elements([1.0, 2.0, 3.0], 0.5)
|
| 10 |
+
False
|
| 11 |
+
>>> has_close_eleme"
|
| 12 |
+
before,HumanEval_1,33.612,0.8,755,"```python
|
| 13 |
+
from typing import List
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
def separate_paren_groups(paren_string: str) -> List[str]:
|
| 17 |
+
"""""" Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
|
| 18 |
+
separate those group into separate strings and return the list of those.
|
| 19 |
+
Separate groups"
|
| 20 |
+
before,HumanEval_2,17.675,0.8,377,"```python
|
| 21 |
+
def truncate_number(number: float) -> float:
|
| 22 |
+
"""""" Given a positive floating point number, it can be decomposed into
|
| 23 |
+
and integer part (largest integer smaller than given number) and decimals
|
| 24 |
+
(leftover part always smaller than 1).
|
| 25 |
+
|
| 26 |
+
Return the decimal part of the number.
|
| 27 |
+
>>"
|
| 28 |
+
before,HumanEval_3,30.553,0.8,610,"```python
|
| 29 |
+
from typing import List
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
def below_zero(operations: List[int]) -> bool:
|
| 33 |
+
"""""" You're given a list of deposit and withdrawal operations on a bank account that starts with
|
| 34 |
+
zero balance. Your task is to detect if at any point the balance of account fallls below zero, and
|
| 35 |
+
at that po"
|
| 36 |
+
before,HumanEval_4,33.594,0.8,648,"```python
|
| 37 |
+
from typing import List
|
| 38 |
+
|
| 39 |
+
|
| 40 |
+
def mean_absolute_deviation(numbers: List[float]) -> float:
|
| 41 |
+
"""""" For a given list of input numbers, calculate Mean Absolute Deviation
|
| 42 |
+
around the mean of this dataset.
|
| 43 |
+
Mean Absolute Deviation is the average absolute difference between each
|
| 44 |
+
element an"
|
| 45 |
+
before,HumanEval_5,29.098,0.8,500,"```python
|
| 46 |
+
from typing import List
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
def intersperse(numbers: List[int], delimeter: int) -> List[int]:
|
| 50 |
+
"""""" Insert a number 'delimeter' between every two consecutive elements of input list `numbers'
|
| 51 |
+
>>> intersperse([], 4)
|
| 52 |
+
[]
|
| 53 |
+
>>> intersperse([1, 2, 3], 4)
|
| 54 |
+
[1, 4, 2, 4, 3]
|
| 55 |
+
""""""
|
| 56 |
+
"
|
| 57 |
+
before,HumanEval_6,33.787,0.55,702,"```python
|
| 58 |
+
from typing import List
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def parse_nested_parens(paren_string: str) -> List[int]:
|
| 62 |
+
"""""" Input to this function is a string represented multiple groups for nested parentheses separated by spaces.
|
| 63 |
+
For each of the group, output the deepest level of nesting of parentheses.
|
| 64 |
+
E.g. (()("
|
| 65 |
+
before,HumanEval_7,21.317,0.8,396,"```python
|
| 66 |
+
from typing import List
|
| 67 |
+
|
| 68 |
+
|
| 69 |
+
def filter_by_substring(strings: List[str], substring: str) -> List[str]:
|
| 70 |
+
"""""" Filter an input list of strings only for ones that contain given substring
|
| 71 |
+
>>> filter_by_substring([], 'a')
|
| 72 |
+
[]
|
| 73 |
+
>>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a"
|
| 74 |
+
after,HumanEval_0,53.709,0.8,568,"```python
|
| 75 |
+
from typing import List
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
def has_close_elements(numbers: List[float], threshold: float) -> bool:
|
| 79 |
+
"""""" Check if in given list of numbers, are any two numbers closer to each other than
|
| 80 |
+
given threshold.
|
| 81 |
+
>>> has_close_elements([1.0, 2.0, 3.0], 0.5)
|
| 82 |
+
False
|
| 83 |
+
>>> has_close_eleme"
|
| 84 |
+
after,HumanEval_1,53.466,0.8,763,"```python
|
| 85 |
+
from typing import List
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
def separate_paren_groups(paren_string: str) -> List[str]:
|
| 89 |
+
paren_string = paren_string.replace("" "", """")
|
| 90 |
+
groups = []
|
| 91 |
+
balance = 0
|
| 92 |
+
start = 0
|
| 93 |
+
for i, char in enumerate(paren_string):
|
| 94 |
+
if char == '(':
|
| 95 |
+
balance += 1
|
| 96 |
+
elif char"
|
| 97 |
+
after,HumanEval_2,53.294,0.8,824,"```python
|
| 98 |
+
def truncate_number(number: float) -> float:
|
| 99 |
+
"""""" Given a positive floating point number, it can be decomposed into
|
| 100 |
+
and an integer part (largest integer smaller than given number) and decimals
|
| 101 |
+
(leftover part always smaller than 1).
|
| 102 |
+
|
| 103 |
+
Return the decimal part of the number.
|
| 104 |
+
"
|
| 105 |
+
after,HumanEval_3,53.551,0.8,770,"```python
|
| 106 |
+
from typing import List
|
| 107 |
+
|
| 108 |
+
|
| 109 |
+
def below_zero(operations: List[int]) -> bool:
|
| 110 |
+
"""""" You're given a list of deposit and withdrawal operations on a bank account that starts with
|
| 111 |
+
zero balance. Your task is to detect if at any point the balance of account fallls below zero, and
|
| 112 |
+
at that po"
|
| 113 |
+
after,HumanEval_4,53.816,0.8,736,"```python
|
| 114 |
+
from typing import List
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
def mean_absolute_deviation(numbers: List[float]) -> float:
|
| 118 |
+
"""""" For a given list of input numbers, calculate Mean Absolute Deviation
|
| 119 |
+
around the mean of this dataset.
|
| 120 |
+
Mean Absolute Deviation is the average absolute difference between each
|
| 121 |
+
element an"
|
| 122 |
+
after,HumanEval_5,53.518,0.8,643,"```python
|
| 123 |
+
from typing import List
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
def intersperse(numbers: List[int], delimeter: int) -> List[int]:
|
| 127 |
+
"""""" Insert a number 'delimeter' between every two consecutive elements of input list `numbers'
|
| 128 |
+
>>> intersperse([], 4)
|
| 129 |
+
[]
|
| 130 |
+
>>> intersperse([1, 2, 3], 4)
|
| 131 |
+
[1, 4, 2, 4, 3]
|
| 132 |
+
""""""
|
| 133 |
+
"
|
| 134 |
+
after,HumanEval_6,53.528,0.55,701,"```python
|
| 135 |
+
from typing import List
|
| 136 |
+
|
| 137 |
+
|
| 138 |
+
def parse_nested_parens(paren_string: str) -> List[int]:
|
| 139 |
+
"""""" Input to this function is a string represented multiple groups for nested parentheses separated by spaces.
|
| 140 |
+
For each of the group, output the deepest level of nesting of parentheses.
|
| 141 |
+
E.g. (()("
|
| 142 |
+
after,HumanEval_7,53.317,0.8,739,"```python
|
| 143 |
+
from typing import List
|
| 144 |
+
|
| 145 |
+
|
| 146 |
+
def filter_by_substring(strings: List[str], substring: str) -> List[str]:
|
| 147 |
+
"""""" Filter an input list of strings only for ones that contain given substring
|
| 148 |
+
>>> filter_by_substring([], 'a')
|
| 149 |
+
[]
|
| 150 |
+
>>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a"
|
executable_eval.json
ADDED
|
@@ -0,0 +1,135 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"before_pass": 5,
|
| 3 |
+
"after_pass": 7,
|
| 4 |
+
"total": 8,
|
| 5 |
+
"rows": [
|
| 6 |
+
{
|
| 7 |
+
"phase": "before",
|
| 8 |
+
"task_id": "HumanEval/0",
|
| 9 |
+
"entry_point": "has_close_elements",
|
| 10 |
+
"passed": false,
|
| 11 |
+
"error": "Traceback (most recent call last):\n File \"C:\\Users\\USER\\AppData\\Local\\Temp\\tmp6oy5omdq.py\", line 8, in <module>\n exec(code, ns)\n ~~~~^^^^^^^^^^\n File \"<string>\", line 17\n if sorted_numbers[i+1] - sorted_numbers[i\n ^\nSyntaxError: '[' was never closed\n",
|
| 12 |
+
"chars": 526
|
| 13 |
+
},
|
| 14 |
+
{
|
| 15 |
+
"phase": "after",
|
| 16 |
+
"task_id": "HumanEval/0",
|
| 17 |
+
"entry_point": "has_close_elements",
|
| 18 |
+
"passed": true,
|
| 19 |
+
"error": null,
|
| 20 |
+
"chars": 495
|
| 21 |
+
},
|
| 22 |
+
{
|
| 23 |
+
"phase": "before",
|
| 24 |
+
"task_id": "HumanEval/1",
|
| 25 |
+
"entry_point": "separate_paren_groups",
|
| 26 |
+
"passed": false,
|
| 27 |
+
"error": "Traceback (most recent call last):\n File \"C:\\Users\\USER\\AppData\\Local\\Temp\\tmpbgx0dlv4.py\", line 10, in <module>\n ns[\"check\"](ns[entry_point])\n ~~~~~~~~~~~^^^^^^^^^^^^^^^^^\n File \"<string>\", line 10, in check\nAssertionError\n",
|
| 28 |
+
"chars": 745
|
| 29 |
+
},
|
| 30 |
+
{
|
| 31 |
+
"phase": "after",
|
| 32 |
+
"task_id": "HumanEval/1",
|
| 33 |
+
"entry_point": "separate_paren_groups",
|
| 34 |
+
"passed": true,
|
| 35 |
+
"error": null,
|
| 36 |
+
"chars": 455
|
| 37 |
+
},
|
| 38 |
+
{
|
| 39 |
+
"phase": "before",
|
| 40 |
+
"task_id": "HumanEval/2",
|
| 41 |
+
"entry_point": "truncate_number",
|
| 42 |
+
"passed": true,
|
| 43 |
+
"error": null,
|
| 44 |
+
"chars": 360
|
| 45 |
+
},
|
| 46 |
+
{
|
| 47 |
+
"phase": "after",
|
| 48 |
+
"task_id": "HumanEval/2",
|
| 49 |
+
"entry_point": "truncate_number",
|
| 50 |
+
"passed": true,
|
| 51 |
+
"error": null,
|
| 52 |
+
"chars": 363
|
| 53 |
+
},
|
| 54 |
+
{
|
| 55 |
+
"phase": "before",
|
| 56 |
+
"task_id": "HumanEval/3",
|
| 57 |
+
"entry_point": "below_zero",
|
| 58 |
+
"passed": true,
|
| 59 |
+
"error": null,
|
| 60 |
+
"chars": 590
|
| 61 |
+
},
|
| 62 |
+
{
|
| 63 |
+
"phase": "after",
|
| 64 |
+
"task_id": "HumanEval/3",
|
| 65 |
+
"entry_point": "below_zero",
|
| 66 |
+
"passed": true,
|
| 67 |
+
"error": null,
|
| 68 |
+
"chars": 590
|
| 69 |
+
},
|
| 70 |
+
{
|
| 71 |
+
"phase": "before",
|
| 72 |
+
"task_id": "HumanEval/4",
|
| 73 |
+
"entry_point": "mean_absolute_deviation",
|
| 74 |
+
"passed": true,
|
| 75 |
+
"error": null,
|
| 76 |
+
"chars": 632
|
| 77 |
+
},
|
| 78 |
+
{
|
| 79 |
+
"phase": "after",
|
| 80 |
+
"task_id": "HumanEval/4",
|
| 81 |
+
"entry_point": "mean_absolute_deviation",
|
| 82 |
+
"passed": true,
|
| 83 |
+
"error": null,
|
| 84 |
+
"chars": 530
|
| 85 |
+
},
|
| 86 |
+
{
|
| 87 |
+
"phase": "before",
|
| 88 |
+
"task_id": "HumanEval/5",
|
| 89 |
+
"entry_point": "intersperse",
|
| 90 |
+
"passed": true,
|
| 91 |
+
"error": null,
|
| 92 |
+
"chars": 486
|
| 93 |
+
},
|
| 94 |
+
{
|
| 95 |
+
"phase": "after",
|
| 96 |
+
"task_id": "HumanEval/5",
|
| 97 |
+
"entry_point": "intersperse",
|
| 98 |
+
"passed": true,
|
| 99 |
+
"error": null,
|
| 100 |
+
"chars": 455
|
| 101 |
+
},
|
| 102 |
+
{
|
| 103 |
+
"phase": "before",
|
| 104 |
+
"task_id": "HumanEval/6",
|
| 105 |
+
"entry_point": "parse_nested_parens",
|
| 106 |
+
"passed": false,
|
| 107 |
+
"error": "Traceback (most recent call last):\n File \"C:\\Users\\USER\\AppData\\Local\\Temp\\tmpx0bb66c4.py\", line 8, in <module>\n exec(code, ns)\n ~~~~^^^^^^^^^^\n File \"<string>\", line 21\n max_depth = max(max_depth\n ^\nSyntaxError: '(' was never closed\n",
|
| 108 |
+
"chars": 692
|
| 109 |
+
},
|
| 110 |
+
{
|
| 111 |
+
"phase": "after",
|
| 112 |
+
"task_id": "HumanEval/6",
|
| 113 |
+
"entry_point": "parse_nested_parens",
|
| 114 |
+
"passed": false,
|
| 115 |
+
"error": "Traceback (most recent call last):\n File \"C:\\Users\\USER\\AppData\\Local\\Temp\\tmpn7um62g2.py\", line 8, in <module>\n exec(code, ns)\n ~~~~^^^^^^^^^^\n File \"<string>\", line 20\n max_depth = max(max_depth\n ^\nSyntaxError: '(' was never closed\n",
|
| 116 |
+
"chars": 691
|
| 117 |
+
},
|
| 118 |
+
{
|
| 119 |
+
"phase": "before",
|
| 120 |
+
"task_id": "HumanEval/7",
|
| 121 |
+
"entry_point": "filter_by_substring",
|
| 122 |
+
"passed": true,
|
| 123 |
+
"error": null,
|
| 124 |
+
"chars": 379
|
| 125 |
+
},
|
| 126 |
+
{
|
| 127 |
+
"phase": "after",
|
| 128 |
+
"task_id": "HumanEval/7",
|
| 129 |
+
"entry_point": "filter_by_substring",
|
| 130 |
+
"passed": true,
|
| 131 |
+
"error": null,
|
| 132 |
+
"chars": 379
|
| 133 |
+
}
|
| 134 |
+
]
|
| 135 |
+
}
|
nvidia_smi.txt
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
Tue May 12 19:04:42 2026
|
| 2 |
+
+-----------------------------------------------------------------------------------------+
|
| 3 |
+
| NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 |
|
| 4 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 5 |
+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|
| 6 |
+
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|
| 7 |
+
| | | MIG M. |
|
| 8 |
+
|=========================================+========================+======================|
|
| 9 |
+
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
|
| 10 |
+
| N/A 39C P8 9W / 70W | 3MiB / 15360MiB | 0% Default |
|
| 11 |
+
| | | N/A |
|
| 12 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 13 |
+
| 1 Tesla T4 Off | 00000000:00:05.0 Off | 0 |
|
| 14 |
+
| N/A 38C P8 10W / 70W | 3MiB / 15360MiB | 0% Default |
|
| 15 |
+
| | | N/A |
|
| 16 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 17 |
+
|
| 18 |
+
+-----------------------------------------------------------------------------------------+
|
| 19 |
+
| Processes: |
|
| 20 |
+
| GPU GI CI PID Type Process name GPU Memory |
|
| 21 |
+
| ID ID Usage |
|
| 22 |
+
|=========================================================================================|
|
| 23 |
+
| No running processes found |
|
| 24 |
+
+-----------------------------------------------------------------------------------------+
|
processor_config.json
ADDED
|
@@ -0,0 +1,75 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"audio_ms_per_token": 40,
|
| 3 |
+
"audio_seq_length": 750,
|
| 4 |
+
"feature_extractor": {
|
| 5 |
+
"dither": 0.0,
|
| 6 |
+
"feature_extractor_type": "Gemma4AudioFeatureExtractor",
|
| 7 |
+
"feature_size": 128,
|
| 8 |
+
"fft_length": 512,
|
| 9 |
+
"fft_overdrive": false,
|
| 10 |
+
"frame_length": 320,
|
| 11 |
+
"hop_length": 160,
|
| 12 |
+
"input_scale_factor": 1.0,
|
| 13 |
+
"max_frequency": 8000.0,
|
| 14 |
+
"mel_floor": 0.001,
|
| 15 |
+
"min_frequency": 0.0,
|
| 16 |
+
"padding_side": "right",
|
| 17 |
+
"padding_value": 0.0,
|
| 18 |
+
"per_bin_mean": null,
|
| 19 |
+
"per_bin_stddev": null,
|
| 20 |
+
"preemphasis": 0.0,
|
| 21 |
+
"preemphasis_htk_flavor": true,
|
| 22 |
+
"return_attention_mask": true,
|
| 23 |
+
"sampling_rate": 16000
|
| 24 |
+
},
|
| 25 |
+
"image_processor": {
|
| 26 |
+
"do_convert_rgb": true,
|
| 27 |
+
"do_normalize": false,
|
| 28 |
+
"do_rescale": true,
|
| 29 |
+
"do_resize": true,
|
| 30 |
+
"image_mean": [
|
| 31 |
+
0.0,
|
| 32 |
+
0.0,
|
| 33 |
+
0.0
|
| 34 |
+
],
|
| 35 |
+
"image_processor_type": "Gemma4ImageProcessor",
|
| 36 |
+
"image_seq_length": 280,
|
| 37 |
+
"image_std": [
|
| 38 |
+
1.0,
|
| 39 |
+
1.0,
|
| 40 |
+
1.0
|
| 41 |
+
],
|
| 42 |
+
"max_soft_tokens": 280,
|
| 43 |
+
"patch_size": 16,
|
| 44 |
+
"pooling_kernel_size": 3,
|
| 45 |
+
"resample": 3,
|
| 46 |
+
"rescale_factor": 0.00392156862745098
|
| 47 |
+
},
|
| 48 |
+
"image_seq_length": 280,
|
| 49 |
+
"processor_class": "Gemma4Processor",
|
| 50 |
+
"video_processor": {
|
| 51 |
+
"do_convert_rgb": true,
|
| 52 |
+
"do_normalize": true,
|
| 53 |
+
"do_rescale": true,
|
| 54 |
+
"do_resize": true,
|
| 55 |
+
"do_sample_frames": true,
|
| 56 |
+
"image_mean": [
|
| 57 |
+
0.0,
|
| 58 |
+
0.0,
|
| 59 |
+
0.0
|
| 60 |
+
],
|
| 61 |
+
"image_std": [
|
| 62 |
+
1.0,
|
| 63 |
+
1.0,
|
| 64 |
+
1.0
|
| 65 |
+
],
|
| 66 |
+
"max_soft_tokens": 70,
|
| 67 |
+
"num_frames": 32,
|
| 68 |
+
"patch_size": 16,
|
| 69 |
+
"pooling_kernel_size": 3,
|
| 70 |
+
"resample": 3,
|
| 71 |
+
"rescale_factor": 0.00392156862745098,
|
| 72 |
+
"return_metadata": false,
|
| 73 |
+
"video_processor_type": "Gemma4VideoProcessor"
|
| 74 |
+
}
|
| 75 |
+
}
|
proof_summary.json
ADDED
|
@@ -0,0 +1,175 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"run_id": "20260512_190433",
|
| 3 |
+
"model_id": "google/gemma-4-E4B-it",
|
| 4 |
+
"dataset_id": "ise-uiuc/Magicoder-Evol-Instruct-110K",
|
| 5 |
+
"adapter_repo": "josephmayo/gemma-4-E4B-it-coding-lora",
|
| 6 |
+
"merged_repo": "josephmayo/gemma-4-E4B-it-coding-merged",
|
| 7 |
+
"stage": "after_eval",
|
| 8 |
+
"errors": [],
|
| 9 |
+
"cuda_available": true,
|
| 10 |
+
"cuda_device_count": 2,
|
| 11 |
+
"devices": [
|
| 12 |
+
"Tesla T4",
|
| 13 |
+
"Tesla T4"
|
| 14 |
+
],
|
| 15 |
+
"torch_version_initial": "2.10.0+cu128",
|
| 16 |
+
"hf_token_present": false,
|
| 17 |
+
"max_train_samples": 1024,
|
| 18 |
+
"max_steps": 200,
|
| 19 |
+
"max_seq_length": 512,
|
| 20 |
+
"eval_count": 8,
|
| 21 |
+
"lora_r": 16,
|
| 22 |
+
"lora_alpha": 32,
|
| 23 |
+
"lr": 0.0001,
|
| 24 |
+
"grad_accum": 8,
|
| 25 |
+
"push_to_hf": true,
|
| 26 |
+
"merge_and_push": false,
|
| 27 |
+
"load_in_4bit": true,
|
| 28 |
+
"memory_after_load": [
|
| 29 |
+
0,
|
| 30 |
+
9302143488
|
| 31 |
+
],
|
| 32 |
+
"eval_source": "openai/openai_humaneval:8",
|
| 33 |
+
"baseline_avg_score": 0.76875,
|
| 34 |
+
"safe_train_rows": 1024,
|
| 35 |
+
"trainable_parameters": {
|
| 36 |
+
"trainable": 50499584,
|
| 37 |
+
"total": 7991600416
|
| 38 |
+
},
|
| 39 |
+
"log_history_tail": [
|
| 40 |
+
{
|
| 41 |
+
"loss": 1.0154043197631837,
|
| 42 |
+
"grad_norm": 0.37521687150001526,
|
| 43 |
+
"learning_rate": 4.51495073572676e-05,
|
| 44 |
+
"epoch": 1.71875
|
| 45 |
+
},
|
| 46 |
+
{
|
| 47 |
+
"loss": 0.9917967796325684,
|
| 48 |
+
"grad_norm": 0.42887604236602783,
|
| 49 |
+
"learning_rate": 4.114045042103887e-05,
|
| 50 |
+
"epoch": 1.796875
|
| 51 |
+
},
|
| 52 |
+
{
|
| 53 |
+
"loss": 1.1146905899047852,
|
| 54 |
+
"grad_norm": 0.4208148717880249,
|
| 55 |
+
"learning_rate": 3.718944461187138e-05,
|
| 56 |
+
"epoch": 1.875
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"loss": 0.9283761978149414,
|
| 60 |
+
"grad_norm": 0.3849687874317169,
|
| 61 |
+
"learning_rate": 3.332237841745898e-05,
|
| 62 |
+
"epoch": 1.953125
|
| 63 |
+
},
|
| 64 |
+
{
|
| 65 |
+
"loss": 1.113053035736084,
|
| 66 |
+
"grad_norm": 0.4142734110355377,
|
| 67 |
+
"learning_rate": 2.9564590321322207e-05,
|
| 68 |
+
"epoch": 2.03125
|
| 69 |
+
},
|
| 70 |
+
{
|
| 71 |
+
"loss": 0.9842248916625976,
|
| 72 |
+
"grad_norm": 0.44529953598976135,
|
| 73 |
+
"learning_rate": 2.5940702775459747e-05,
|
| 74 |
+
"epoch": 2.109375
|
| 75 |
+
},
|
| 76 |
+
{
|
| 77 |
+
"loss": 0.9449721336364746,
|
| 78 |
+
"grad_norm": 0.3756776750087738,
|
| 79 |
+
"learning_rate": 2.2474460864709824e-05,
|
| 80 |
+
"epoch": 2.1875
|
| 81 |
+
},
|
| 82 |
+
{
|
| 83 |
+
"loss": 1.0590093612670899,
|
| 84 |
+
"grad_norm": 0.4192875325679779,
|
| 85 |
+
"learning_rate": 1.9188576719953633e-05,
|
| 86 |
+
"epoch": 2.265625
|
| 87 |
+
},
|
| 88 |
+
{
|
| 89 |
+
"loss": 0.9768091201782226,
|
| 90 |
+
"grad_norm": 0.5095818638801575,
|
| 91 |
+
"learning_rate": 1.6104580699624837e-05,
|
| 92 |
+
"epoch": 2.34375
|
| 93 |
+
},
|
| 94 |
+
{
|
| 95 |
+
"loss": 1.038302516937256,
|
| 96 |
+
"grad_norm": 0.41709497570991516,
|
| 97 |
+
"learning_rate": 1.3242680314639993e-05,
|
| 98 |
+
"epoch": 2.421875
|
| 99 |
+
},
|
| 100 |
+
{
|
| 101 |
+
"loss": 0.9975608825683594,
|
| 102 |
+
"grad_norm": 0.5563586354255676,
|
| 103 |
+
"learning_rate": 1.0621627821127289e-05,
|
| 104 |
+
"epoch": 2.5
|
| 105 |
+
},
|
| 106 |
+
{
|
| 107 |
+
"loss": 0.9714397430419922,
|
| 108 |
+
"grad_norm": 0.8915637135505676,
|
| 109 |
+
"learning_rate": 8.25859734853645e-06,
|
| 110 |
+
"epoch": 2.578125
|
| 111 |
+
},
|
| 112 |
+
{
|
| 113 |
+
"loss": 0.9948483467102051,
|
| 114 |
+
"grad_norm": 0.4391196370124817,
|
| 115 |
+
"learning_rate": 6.16907236823262e-06,
|
| 116 |
+
"epoch": 2.65625
|
| 117 |
+
},
|
| 118 |
+
{
|
| 119 |
+
"loss": 0.9389057159423828,
|
| 120 |
+
"grad_norm": 0.4650712311267853,
|
| 121 |
+
"learning_rate": 4.366744239922998e-06,
|
| 122 |
+
"epoch": 2.734375
|
| 123 |
+
},
|
| 124 |
+
{
|
| 125 |
+
"loss": 1.06390380859375,
|
| 126 |
+
"grad_norm": 0.4836062788963318,
|
| 127 |
+
"learning_rate": 2.8634225006782865e-06,
|
| 128 |
+
"epoch": 2.8125
|
| 129 |
+
},
|
| 130 |
+
{
|
| 131 |
+
"loss": 1.008359718322754,
|
| 132 |
+
"grad_norm": 0.45215511322021484,
|
| 133 |
+
"learning_rate": 1.6689574843694433e-06,
|
| 134 |
+
"epoch": 2.890625
|
| 135 |
+
},
|
| 136 |
+
{
|
| 137 |
+
"loss": 1.0110493659973145,
|
| 138 |
+
"grad_norm": 0.5408219695091248,
|
| 139 |
+
"learning_rate": 7.911757785462881e-07,
|
| 140 |
+
"epoch": 2.96875
|
| 141 |
+
},
|
| 142 |
+
{
|
| 143 |
+
"loss": 0.911649227142334,
|
| 144 |
+
"grad_norm": 0.4599083364009857,
|
| 145 |
+
"learning_rate": 2.3582894166930268e-07,
|
| 146 |
+
"epoch": 3.046875
|
| 147 |
+
},
|
| 148 |
+
{
|
| 149 |
+
"loss": 0.9673548698425293,
|
| 150 |
+
"grad_norm": 0.43304941058158875,
|
| 151 |
+
"learning_rate": 6.5558167183898955e-09,
|
| 152 |
+
"epoch": 3.125
|
| 153 |
+
},
|
| 154 |
+
{
|
| 155 |
+
"train_runtime": 4256.6409,
|
| 156 |
+
"train_samples_per_second": 0.752,
|
| 157 |
+
"train_steps_per_second": 0.047,
|
| 158 |
+
"total_flos": 4.259313762009523e+16,
|
| 159 |
+
"train_loss": 1.142699921131134,
|
| 160 |
+
"epoch": 3.125
|
| 161 |
+
}
|
| 162 |
+
],
|
| 163 |
+
"train_metrics": {
|
| 164 |
+
"train_runtime": 4256.6409,
|
| 165 |
+
"train_samples_per_second": 0.752,
|
| 166 |
+
"train_steps_per_second": 0.047,
|
| 167 |
+
"total_flos": 4.259313762009523e+16,
|
| 168 |
+
"train_loss": 1.142699921131134,
|
| 169 |
+
"epoch": 3.125
|
| 170 |
+
},
|
| 171 |
+
"after_avg_score": 0.76875,
|
| 172 |
+
"score_delta": 0.0,
|
| 173 |
+
"adapter_dir": "/kaggle/working/gemma4_e4b_coding_lora",
|
| 174 |
+
"release_gate_pass": true
|
| 175 |
+
}
|
summary.json
ADDED
|
@@ -0,0 +1,175 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"run_id": "20260512_190433",
|
| 3 |
+
"model_id": "google/gemma-4-E4B-it",
|
| 4 |
+
"dataset_id": "ise-uiuc/Magicoder-Evol-Instruct-110K",
|
| 5 |
+
"adapter_repo": "josephmayo/gemma-4-E4B-it-coding-lora",
|
| 6 |
+
"merged_repo": "josephmayo/gemma-4-E4B-it-coding-merged",
|
| 7 |
+
"stage": "after_eval",
|
| 8 |
+
"errors": [],
|
| 9 |
+
"cuda_available": true,
|
| 10 |
+
"cuda_device_count": 2,
|
| 11 |
+
"devices": [
|
| 12 |
+
"Tesla T4",
|
| 13 |
+
"Tesla T4"
|
| 14 |
+
],
|
| 15 |
+
"torch_version_initial": "2.10.0+cu128",
|
| 16 |
+
"hf_token_present": false,
|
| 17 |
+
"max_train_samples": 1024,
|
| 18 |
+
"max_steps": 200,
|
| 19 |
+
"max_seq_length": 512,
|
| 20 |
+
"eval_count": 8,
|
| 21 |
+
"lora_r": 16,
|
| 22 |
+
"lora_alpha": 32,
|
| 23 |
+
"lr": 0.0001,
|
| 24 |
+
"grad_accum": 8,
|
| 25 |
+
"push_to_hf": true,
|
| 26 |
+
"merge_and_push": false,
|
| 27 |
+
"load_in_4bit": true,
|
| 28 |
+
"memory_after_load": [
|
| 29 |
+
0,
|
| 30 |
+
9302143488
|
| 31 |
+
],
|
| 32 |
+
"eval_source": "openai/openai_humaneval:8",
|
| 33 |
+
"baseline_avg_score": 0.76875,
|
| 34 |
+
"safe_train_rows": 1024,
|
| 35 |
+
"trainable_parameters": {
|
| 36 |
+
"trainable": 50499584,
|
| 37 |
+
"total": 7991600416
|
| 38 |
+
},
|
| 39 |
+
"log_history_tail": [
|
| 40 |
+
{
|
| 41 |
+
"loss": 1.0154043197631837,
|
| 42 |
+
"grad_norm": 0.37521687150001526,
|
| 43 |
+
"learning_rate": 4.51495073572676e-05,
|
| 44 |
+
"epoch": 1.71875
|
| 45 |
+
},
|
| 46 |
+
{
|
| 47 |
+
"loss": 0.9917967796325684,
|
| 48 |
+
"grad_norm": 0.42887604236602783,
|
| 49 |
+
"learning_rate": 4.114045042103887e-05,
|
| 50 |
+
"epoch": 1.796875
|
| 51 |
+
},
|
| 52 |
+
{
|
| 53 |
+
"loss": 1.1146905899047852,
|
| 54 |
+
"grad_norm": 0.4208148717880249,
|
| 55 |
+
"learning_rate": 3.718944461187138e-05,
|
| 56 |
+
"epoch": 1.875
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"loss": 0.9283761978149414,
|
| 60 |
+
"grad_norm": 0.3849687874317169,
|
| 61 |
+
"learning_rate": 3.332237841745898e-05,
|
| 62 |
+
"epoch": 1.953125
|
| 63 |
+
},
|
| 64 |
+
{
|
| 65 |
+
"loss": 1.113053035736084,
|
| 66 |
+
"grad_norm": 0.4142734110355377,
|
| 67 |
+
"learning_rate": 2.9564590321322207e-05,
|
| 68 |
+
"epoch": 2.03125
|
| 69 |
+
},
|
| 70 |
+
{
|
| 71 |
+
"loss": 0.9842248916625976,
|
| 72 |
+
"grad_norm": 0.44529953598976135,
|
| 73 |
+
"learning_rate": 2.5940702775459747e-05,
|
| 74 |
+
"epoch": 2.109375
|
| 75 |
+
},
|
| 76 |
+
{
|
| 77 |
+
"loss": 0.9449721336364746,
|
| 78 |
+
"grad_norm": 0.3756776750087738,
|
| 79 |
+
"learning_rate": 2.2474460864709824e-05,
|
| 80 |
+
"epoch": 2.1875
|
| 81 |
+
},
|
| 82 |
+
{
|
| 83 |
+
"loss": 1.0590093612670899,
|
| 84 |
+
"grad_norm": 0.4192875325679779,
|
| 85 |
+
"learning_rate": 1.9188576719953633e-05,
|
| 86 |
+
"epoch": 2.265625
|
| 87 |
+
},
|
| 88 |
+
{
|
| 89 |
+
"loss": 0.9768091201782226,
|
| 90 |
+
"grad_norm": 0.5095818638801575,
|
| 91 |
+
"learning_rate": 1.6104580699624837e-05,
|
| 92 |
+
"epoch": 2.34375
|
| 93 |
+
},
|
| 94 |
+
{
|
| 95 |
+
"loss": 1.038302516937256,
|
| 96 |
+
"grad_norm": 0.41709497570991516,
|
| 97 |
+
"learning_rate": 1.3242680314639993e-05,
|
| 98 |
+
"epoch": 2.421875
|
| 99 |
+
},
|
| 100 |
+
{
|
| 101 |
+
"loss": 0.9975608825683594,
|
| 102 |
+
"grad_norm": 0.5563586354255676,
|
| 103 |
+
"learning_rate": 1.0621627821127289e-05,
|
| 104 |
+
"epoch": 2.5
|
| 105 |
+
},
|
| 106 |
+
{
|
| 107 |
+
"loss": 0.9714397430419922,
|
| 108 |
+
"grad_norm": 0.8915637135505676,
|
| 109 |
+
"learning_rate": 8.25859734853645e-06,
|
| 110 |
+
"epoch": 2.578125
|
| 111 |
+
},
|
| 112 |
+
{
|
| 113 |
+
"loss": 0.9948483467102051,
|
| 114 |
+
"grad_norm": 0.4391196370124817,
|
| 115 |
+
"learning_rate": 6.16907236823262e-06,
|
| 116 |
+
"epoch": 2.65625
|
| 117 |
+
},
|
| 118 |
+
{
|
| 119 |
+
"loss": 0.9389057159423828,
|
| 120 |
+
"grad_norm": 0.4650712311267853,
|
| 121 |
+
"learning_rate": 4.366744239922998e-06,
|
| 122 |
+
"epoch": 2.734375
|
| 123 |
+
},
|
| 124 |
+
{
|
| 125 |
+
"loss": 1.06390380859375,
|
| 126 |
+
"grad_norm": 0.4836062788963318,
|
| 127 |
+
"learning_rate": 2.8634225006782865e-06,
|
| 128 |
+
"epoch": 2.8125
|
| 129 |
+
},
|
| 130 |
+
{
|
| 131 |
+
"loss": 1.008359718322754,
|
| 132 |
+
"grad_norm": 0.45215511322021484,
|
| 133 |
+
"learning_rate": 1.6689574843694433e-06,
|
| 134 |
+
"epoch": 2.890625
|
| 135 |
+
},
|
| 136 |
+
{
|
| 137 |
+
"loss": 1.0110493659973145,
|
| 138 |
+
"grad_norm": 0.5408219695091248,
|
| 139 |
+
"learning_rate": 7.911757785462881e-07,
|
| 140 |
+
"epoch": 2.96875
|
| 141 |
+
},
|
| 142 |
+
{
|
| 143 |
+
"loss": 0.911649227142334,
|
| 144 |
+
"grad_norm": 0.4599083364009857,
|
| 145 |
+
"learning_rate": 2.3582894166930268e-07,
|
| 146 |
+
"epoch": 3.046875
|
| 147 |
+
},
|
| 148 |
+
{
|
| 149 |
+
"loss": 0.9673548698425293,
|
| 150 |
+
"grad_norm": 0.43304941058158875,
|
| 151 |
+
"learning_rate": 6.5558167183898955e-09,
|
| 152 |
+
"epoch": 3.125
|
| 153 |
+
},
|
| 154 |
+
{
|
| 155 |
+
"train_runtime": 4256.6409,
|
| 156 |
+
"train_samples_per_second": 0.752,
|
| 157 |
+
"train_steps_per_second": 0.047,
|
| 158 |
+
"total_flos": 4.259313762009523e+16,
|
| 159 |
+
"train_loss": 1.142699921131134,
|
| 160 |
+
"epoch": 3.125
|
| 161 |
+
}
|
| 162 |
+
],
|
| 163 |
+
"train_metrics": {
|
| 164 |
+
"train_runtime": 4256.6409,
|
| 165 |
+
"train_samples_per_second": 0.752,
|
| 166 |
+
"train_steps_per_second": 0.047,
|
| 167 |
+
"total_flos": 4.259313762009523e+16,
|
| 168 |
+
"train_loss": 1.142699921131134,
|
| 169 |
+
"epoch": 3.125
|
| 170 |
+
},
|
| 171 |
+
"after_avg_score": 0.76875,
|
| 172 |
+
"score_delta": 0.0,
|
| 173 |
+
"adapter_dir": "/kaggle/working/gemma4_e4b_coding_lora",
|
| 174 |
+
"release_gate_pass": true
|
| 175 |
+
}
|
tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:88e7140798a237085e49912b68c73b1928d746c7d263133d61f7c3f39dca8431
|
| 3 |
+
size 32169724
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,96 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"audio_token": "<|audio|>",
|
| 3 |
+
"backend": "tokenizers",
|
| 4 |
+
"boa_token": "<|audio>",
|
| 5 |
+
"boi_token": "<|image>",
|
| 6 |
+
"bos_token": "<bos>",
|
| 7 |
+
"eoa_token": "<audio|>",
|
| 8 |
+
"eoc_token": "<channel|>",
|
| 9 |
+
"eoi_token": "<image|>",
|
| 10 |
+
"eos_token": "<eos>",
|
| 11 |
+
"eot_token": "<turn|>",
|
| 12 |
+
"escape_token": "<|\"|>",
|
| 13 |
+
"etc_token": "<tool_call|>",
|
| 14 |
+
"etd_token": "<tool|>",
|
| 15 |
+
"etr_token": "<tool_response|>",
|
| 16 |
+
"extra_special_tokens": [
|
| 17 |
+
"<|video|>"
|
| 18 |
+
],
|
| 19 |
+
"image_token": "<|image|>",
|
| 20 |
+
"is_local": false,
|
| 21 |
+
"local_files_only": false,
|
| 22 |
+
"mask_token": "<mask>",
|
| 23 |
+
"model_max_length": 1000000000000000019884624838656,
|
| 24 |
+
"model_specific_special_tokens": {
|
| 25 |
+
"audio_token": "<|audio|>",
|
| 26 |
+
"boa_token": "<|audio>",
|
| 27 |
+
"boi_token": "<|image>",
|
| 28 |
+
"eoa_token": "<audio|>",
|
| 29 |
+
"eoc_token": "<channel|>",
|
| 30 |
+
"eoi_token": "<image|>",
|
| 31 |
+
"eot_token": "<turn|>",
|
| 32 |
+
"escape_token": "<|\"|>",
|
| 33 |
+
"etc_token": "<tool_call|>",
|
| 34 |
+
"etd_token": "<tool|>",
|
| 35 |
+
"etr_token": "<tool_response|>",
|
| 36 |
+
"image_token": "<|image|>",
|
| 37 |
+
"soc_token": "<|channel>",
|
| 38 |
+
"sot_token": "<|turn>",
|
| 39 |
+
"stc_token": "<|tool_call>",
|
| 40 |
+
"std_token": "<|tool>",
|
| 41 |
+
"str_token": "<|tool_response>",
|
| 42 |
+
"think_token": "<|think|>"
|
| 43 |
+
},
|
| 44 |
+
"pad_token": "<pad>",
|
| 45 |
+
"padding_side": "left",
|
| 46 |
+
"processor_class": "Gemma4Processor",
|
| 47 |
+
"response_schema": {
|
| 48 |
+
"properties": {
|
| 49 |
+
"content": {
|
| 50 |
+
"type": "string"
|
| 51 |
+
},
|
| 52 |
+
"role": {
|
| 53 |
+
"const": "assistant"
|
| 54 |
+
},
|
| 55 |
+
"thinking": {
|
| 56 |
+
"type": "string"
|
| 57 |
+
},
|
| 58 |
+
"tool_calls": {
|
| 59 |
+
"items": {
|
| 60 |
+
"properties": {
|
| 61 |
+
"function": {
|
| 62 |
+
"properties": {
|
| 63 |
+
"arguments": {
|
| 64 |
+
"additionalProperties": {},
|
| 65 |
+
"type": "object",
|
| 66 |
+
"x-parser": "gemma4-tool-call"
|
| 67 |
+
},
|
| 68 |
+
"name": {
|
| 69 |
+
"type": "string"
|
| 70 |
+
}
|
| 71 |
+
},
|
| 72 |
+
"type": "object",
|
| 73 |
+
"x-regex": "call\\:(?P<name>\\w+)(?P<arguments>\\{.*\\})"
|
| 74 |
+
},
|
| 75 |
+
"type": {
|
| 76 |
+
"const": "function"
|
| 77 |
+
}
|
| 78 |
+
},
|
| 79 |
+
"type": "object"
|
| 80 |
+
},
|
| 81 |
+
"type": "array",
|
| 82 |
+
"x-regex-iterator": "<\\|tool_call>(.*?)<tool_call\\|>"
|
| 83 |
+
}
|
| 84 |
+
},
|
| 85 |
+
"type": "object",
|
| 86 |
+
"x-regex": "(\\<\\|channel\\>thought\\n(?P<thinking>.*?)\\<channel\\|\\>)?(?P<tool_calls>\\<\\|tool_call\\>.*\\<tool_call\\|\\>)?(?P<content>(?:(?!\\<turn\\|\\>)(?!\\<\\|tool_response\\>).)+)?(?:\\<turn\\|\\>|\\<\\|tool_response\\>)?"
|
| 87 |
+
},
|
| 88 |
+
"soc_token": "<|channel>",
|
| 89 |
+
"sot_token": "<|turn>",
|
| 90 |
+
"stc_token": "<|tool_call>",
|
| 91 |
+
"std_token": "<|tool>",
|
| 92 |
+
"str_token": "<|tool_response>",
|
| 93 |
+
"think_token": "<|think|>",
|
| 94 |
+
"tokenizer_class": "GemmaTokenizer",
|
| 95 |
+
"unk_token": "<unk>"
|
| 96 |
+
}
|
trainer_log_history.json
ADDED
|
@@ -0,0 +1,291 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"loss": 2.7602264404296877,
|
| 4 |
+
"grad_norm": 3.374150276184082,
|
| 5 |
+
"learning_rate": 6.666666666666667e-05,
|
| 6 |
+
"epoch": 0.078125,
|
| 7 |
+
"step": 5
|
| 8 |
+
},
|
| 9 |
+
{
|
| 10 |
+
"loss": 1.8698038101196288,
|
| 11 |
+
"grad_norm": 1.457783579826355,
|
| 12 |
+
"learning_rate": 9.994100796397954e-05,
|
| 13 |
+
"epoch": 0.15625,
|
| 14 |
+
"step": 10
|
| 15 |
+
},
|
| 16 |
+
{
|
| 17 |
+
"loss": 1.319937038421631,
|
| 18 |
+
"grad_norm": 0.707484245300293,
|
| 19 |
+
"learning_rate": 9.958100506132127e-05,
|
| 20 |
+
"epoch": 0.234375,
|
| 21 |
+
"step": 15
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"loss": 1.3707428932189942,
|
| 25 |
+
"grad_norm": 0.43754199147224426,
|
| 26 |
+
"learning_rate": 9.889612861977853e-05,
|
| 27 |
+
"epoch": 0.3125,
|
| 28 |
+
"step": 20
|
| 29 |
+
},
|
| 30 |
+
{
|
| 31 |
+
"loss": 1.3082144737243653,
|
| 32 |
+
"grad_norm": 0.35036444664001465,
|
| 33 |
+
"learning_rate": 9.789086620939936e-05,
|
| 34 |
+
"epoch": 0.390625,
|
| 35 |
+
"step": 25
|
| 36 |
+
},
|
| 37 |
+
{
|
| 38 |
+
"loss": 1.283499526977539,
|
| 39 |
+
"grad_norm": 0.26873722672462463,
|
| 40 |
+
"learning_rate": 9.657180469054213e-05,
|
| 41 |
+
"epoch": 0.46875,
|
| 42 |
+
"step": 30
|
| 43 |
+
},
|
| 44 |
+
{
|
| 45 |
+
"loss": 1.1428126335144042,
|
| 46 |
+
"grad_norm": 0.31287550926208496,
|
| 47 |
+
"learning_rate": 9.494758705426978e-05,
|
| 48 |
+
"epoch": 0.546875,
|
| 49 |
+
"step": 35
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"loss": 1.1817453384399415,
|
| 53 |
+
"grad_norm": 0.2988075911998749,
|
| 54 |
+
"learning_rate": 9.302885579019627e-05,
|
| 55 |
+
"epoch": 0.625,
|
| 56 |
+
"step": 40
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"loss": 1.1872751235961914,
|
| 60 |
+
"grad_norm": 0.2937701642513275,
|
| 61 |
+
"learning_rate": 9.082818315286055e-05,
|
| 62 |
+
"epoch": 0.703125,
|
| 63 |
+
"step": 45
|
| 64 |
+
},
|
| 65 |
+
{
|
| 66 |
+
"loss": 1.146270751953125,
|
| 67 |
+
"grad_norm": 0.3410361111164093,
|
| 68 |
+
"learning_rate": 8.835998878354931e-05,
|
| 69 |
+
"epoch": 0.78125,
|
| 70 |
+
"step": 50
|
| 71 |
+
},
|
| 72 |
+
{
|
| 73 |
+
"loss": 1.167984962463379,
|
| 74 |
+
"grad_norm": 0.3269799053668976,
|
| 75 |
+
"learning_rate": 8.564044522734147e-05,
|
| 76 |
+
"epoch": 0.859375,
|
| 77 |
+
"step": 55
|
| 78 |
+
},
|
| 79 |
+
{
|
| 80 |
+
"loss": 1.0844226837158204,
|
| 81 |
+
"grad_norm": 0.44009652733802795,
|
| 82 |
+
"learning_rate": 8.268737196446264e-05,
|
| 83 |
+
"epoch": 0.9375,
|
| 84 |
+
"step": 60
|
| 85 |
+
},
|
| 86 |
+
{
|
| 87 |
+
"loss": 1.1493185043334961,
|
| 88 |
+
"grad_norm": 0.383281409740448,
|
| 89 |
+
"learning_rate": 7.952011865029614e-05,
|
| 90 |
+
"epoch": 1.015625,
|
| 91 |
+
"step": 65
|
| 92 |
+
},
|
| 93 |
+
{
|
| 94 |
+
"loss": 1.1071245193481445,
|
| 95 |
+
"grad_norm": 0.40226107835769653,
|
| 96 |
+
"learning_rate": 7.61594383291065e-05,
|
| 97 |
+
"epoch": 1.09375,
|
| 98 |
+
"step": 70
|
| 99 |
+
},
|
| 100 |
+
{
|
| 101 |
+
"loss": 1.062960433959961,
|
| 102 |
+
"grad_norm": 0.3419550359249115,
|
| 103 |
+
"learning_rate": 7.262735145222696e-05,
|
| 104 |
+
"epoch": 1.171875,
|
| 105 |
+
"step": 75
|
| 106 |
+
},
|
| 107 |
+
{
|
| 108 |
+
"loss": 1.1302300453186036,
|
| 109 |
+
"grad_norm": 0.39782464504241943,
|
| 110 |
+
"learning_rate": 6.894700159171534e-05,
|
| 111 |
+
"epoch": 1.25,
|
| 112 |
+
"step": 80
|
| 113 |
+
},
|
| 114 |
+
{
|
| 115 |
+
"loss": 1.0111728668212892,
|
| 116 |
+
"grad_norm": 0.3370531499385834,
|
| 117 |
+
"learning_rate": 6.514250379489753e-05,
|
| 118 |
+
"epoch": 1.328125,
|
| 119 |
+
"step": 85
|
| 120 |
+
},
|
| 121 |
+
{
|
| 122 |
+
"loss": 1.0343121528625487,
|
| 123 |
+
"grad_norm": 0.35490792989730835,
|
| 124 |
+
"learning_rate": 6.123878657343648e-05,
|
| 125 |
+
"epoch": 1.40625,
|
| 126 |
+
"step": 90
|
| 127 |
+
},
|
| 128 |
+
{
|
| 129 |
+
"loss": 1.161344337463379,
|
| 130 |
+
"grad_norm": 0.4397459328174591,
|
| 131 |
+
"learning_rate": 5.726142856227452e-05,
|
| 132 |
+
"epoch": 1.484375,
|
| 133 |
+
"step": 95
|
| 134 |
+
},
|
| 135 |
+
{
|
| 136 |
+
"loss": 1.1250411987304687,
|
| 137 |
+
"grad_norm": 0.375924289226532,
|
| 138 |
+
"learning_rate": 5.3236490918721794e-05,
|
| 139 |
+
"epoch": 1.5625,
|
| 140 |
+
"step": 100
|
| 141 |
+
},
|
| 142 |
+
{
|
| 143 |
+
"loss": 1.0718464851379395,
|
| 144 |
+
"grad_norm": 0.41714778542518616,
|
| 145 |
+
"learning_rate": 4.919034655987493e-05,
|
| 146 |
+
"epoch": 1.640625,
|
| 147 |
+
"step": 105
|
| 148 |
+
},
|
| 149 |
+
{
|
| 150 |
+
"loss": 1.0154043197631837,
|
| 151 |
+
"grad_norm": 0.37521687150001526,
|
| 152 |
+
"learning_rate": 4.51495073572676e-05,
|
| 153 |
+
"epoch": 1.71875,
|
| 154 |
+
"step": 110
|
| 155 |
+
},
|
| 156 |
+
{
|
| 157 |
+
"loss": 0.9917967796325684,
|
| 158 |
+
"grad_norm": 0.42887604236602783,
|
| 159 |
+
"learning_rate": 4.114045042103887e-05,
|
| 160 |
+
"epoch": 1.796875,
|
| 161 |
+
"step": 115
|
| 162 |
+
},
|
| 163 |
+
{
|
| 164 |
+
"loss": 1.1146905899047852,
|
| 165 |
+
"grad_norm": 0.4208148717880249,
|
| 166 |
+
"learning_rate": 3.718944461187138e-05,
|
| 167 |
+
"epoch": 1.875,
|
| 168 |
+
"step": 120
|
| 169 |
+
},
|
| 170 |
+
{
|
| 171 |
+
"loss": 0.9283761978149414,
|
| 172 |
+
"grad_norm": 0.3849687874317169,
|
| 173 |
+
"learning_rate": 3.332237841745898e-05,
|
| 174 |
+
"epoch": 1.953125,
|
| 175 |
+
"step": 125
|
| 176 |
+
},
|
| 177 |
+
{
|
| 178 |
+
"loss": 1.113053035736084,
|
| 179 |
+
"grad_norm": 0.4142734110355377,
|
| 180 |
+
"learning_rate": 2.9564590321322207e-05,
|
| 181 |
+
"epoch": 2.03125,
|
| 182 |
+
"step": 130
|
| 183 |
+
},
|
| 184 |
+
{
|
| 185 |
+
"loss": 0.9842248916625976,
|
| 186 |
+
"grad_norm": 0.44529953598976135,
|
| 187 |
+
"learning_rate": 2.5940702775459747e-05,
|
| 188 |
+
"epoch": 2.109375,
|
| 189 |
+
"step": 135
|
| 190 |
+
},
|
| 191 |
+
{
|
| 192 |
+
"loss": 0.9449721336364746,
|
| 193 |
+
"grad_norm": 0.3756776750087738,
|
| 194 |
+
"learning_rate": 2.2474460864709824e-05,
|
| 195 |
+
"epoch": 2.1875,
|
| 196 |
+
"step": 140
|
| 197 |
+
},
|
| 198 |
+
{
|
| 199 |
+
"loss": 1.0590093612670899,
|
| 200 |
+
"grad_norm": 0.4192875325679779,
|
| 201 |
+
"learning_rate": 1.9188576719953633e-05,
|
| 202 |
+
"epoch": 2.265625,
|
| 203 |
+
"step": 145
|
| 204 |
+
},
|
| 205 |
+
{
|
| 206 |
+
"loss": 0.9768091201782226,
|
| 207 |
+
"grad_norm": 0.5095818638801575,
|
| 208 |
+
"learning_rate": 1.6104580699624837e-05,
|
| 209 |
+
"epoch": 2.34375,
|
| 210 |
+
"step": 150
|
| 211 |
+
},
|
| 212 |
+
{
|
| 213 |
+
"loss": 1.038302516937256,
|
| 214 |
+
"grad_norm": 0.41709497570991516,
|
| 215 |
+
"learning_rate": 1.3242680314639993e-05,
|
| 216 |
+
"epoch": 2.421875,
|
| 217 |
+
"step": 155
|
| 218 |
+
},
|
| 219 |
+
{
|
| 220 |
+
"loss": 0.9975608825683594,
|
| 221 |
+
"grad_norm": 0.5563586354255676,
|
| 222 |
+
"learning_rate": 1.0621627821127289e-05,
|
| 223 |
+
"epoch": 2.5,
|
| 224 |
+
"step": 160
|
| 225 |
+
},
|
| 226 |
+
{
|
| 227 |
+
"loss": 0.9714397430419922,
|
| 228 |
+
"grad_norm": 0.8915637135505676,
|
| 229 |
+
"learning_rate": 8.25859734853645e-06,
|
| 230 |
+
"epoch": 2.578125,
|
| 231 |
+
"step": 165
|
| 232 |
+
},
|
| 233 |
+
{
|
| 234 |
+
"loss": 0.9948483467102051,
|
| 235 |
+
"grad_norm": 0.4391196370124817,
|
| 236 |
+
"learning_rate": 6.16907236823262e-06,
|
| 237 |
+
"epoch": 2.65625,
|
| 238 |
+
"step": 170
|
| 239 |
+
},
|
| 240 |
+
{
|
| 241 |
+
"loss": 0.9389057159423828,
|
| 242 |
+
"grad_norm": 0.4650712311267853,
|
| 243 |
+
"learning_rate": 4.366744239922998e-06,
|
| 244 |
+
"epoch": 2.734375,
|
| 245 |
+
"step": 175
|
| 246 |
+
},
|
| 247 |
+
{
|
| 248 |
+
"loss": 1.06390380859375,
|
| 249 |
+
"grad_norm": 0.4836062788963318,
|
| 250 |
+
"learning_rate": 2.8634225006782865e-06,
|
| 251 |
+
"epoch": 2.8125,
|
| 252 |
+
"step": 180
|
| 253 |
+
},
|
| 254 |
+
{
|
| 255 |
+
"loss": 1.008359718322754,
|
| 256 |
+
"grad_norm": 0.45215511322021484,
|
| 257 |
+
"learning_rate": 1.6689574843694433e-06,
|
| 258 |
+
"epoch": 2.890625,
|
| 259 |
+
"step": 185
|
| 260 |
+
},
|
| 261 |
+
{
|
| 262 |
+
"loss": 1.0110493659973145,
|
| 263 |
+
"grad_norm": 0.5408219695091248,
|
| 264 |
+
"learning_rate": 7.911757785462881e-07,
|
| 265 |
+
"epoch": 2.96875,
|
| 266 |
+
"step": 190
|
| 267 |
+
},
|
| 268 |
+
{
|
| 269 |
+
"loss": 0.911649227142334,
|
| 270 |
+
"grad_norm": 0.4599083364009857,
|
| 271 |
+
"learning_rate": 2.3582894166930268e-07,
|
| 272 |
+
"epoch": 3.046875,
|
| 273 |
+
"step": 195
|
| 274 |
+
},
|
| 275 |
+
{
|
| 276 |
+
"loss": 0.9673548698425293,
|
| 277 |
+
"grad_norm": 0.43304941058158875,
|
| 278 |
+
"learning_rate": 6.5558167183898955e-09,
|
| 279 |
+
"epoch": 3.125,
|
| 280 |
+
"step": 200
|
| 281 |
+
},
|
| 282 |
+
{
|
| 283 |
+
"train_runtime": 4256.6409,
|
| 284 |
+
"train_samples_per_second": 0.752,
|
| 285 |
+
"train_steps_per_second": 0.047,
|
| 286 |
+
"total_flos": 4.259313762009523e+16,
|
| 287 |
+
"train_loss": 1.142699921131134,
|
| 288 |
+
"epoch": 3.125,
|
| 289 |
+
"step": 200
|
| 290 |
+
}
|
| 291 |
+
]
|