Yichen Feng committed on
Commit 1f306bf · verified · 1 Parent(s): aed26a9

End of training

Files changed (5):
  1. README.md +2 -1
  2. all_results.json +8 -0
  3. train_results.json +8 -0
  4. trainer_state.json +1156 -0
  5. training_loss.png +0 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
 base_model: Qwen/Qwen3.5-35B-A3B
 tags:
 - llama-factory
+- full
 - generated_from_trainer
 model-index:
 - name: Qwen3.5-35B-A3B-SFT-artarena_sft-LR1.0e-6-EPOCHS3-LF
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Qwen3.5-35B-A3B-SFT-artarena_sft-LR1.0e-6-EPOCHS3-LF
 
-This model is a fine-tuned version of [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) on an unknown dataset.
+This model is a fine-tuned version of [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) on the artarena_sft dataset.
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 3.0,
+    "total_flos": 23335512768512.0,
+    "train_loss": 0.8809327138294963,
+    "train_runtime": 1440.1859,
+    "train_samples_per_second": 3.485,
+    "train_steps_per_second": 0.11
+}
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 3.0,
+    "total_flos": 23335512768512.0,
+    "train_loss": 0.8809327138294963,
+    "train_runtime": 1440.1859,
+    "train_samples_per_second": 3.485,
+    "train_steps_per_second": 0.11
+}
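From the throughput figures above one can back out the rough shape of the run. A minimal sketch (the input numbers are copied from train_results.json; the derived values are estimates, not logged quantities — note the rounded steps-per-second gives ~158 steps, while trainer_state.json reports 159):

```python
# Derived run statistics from train_results.json (inputs copied from the diff above).
train_runtime = 1440.1859          # seconds
samples_per_second = 3.485
steps_per_second = 0.11            # rounded in the log
epochs = 3.0

total_samples = train_runtime * samples_per_second      # samples seen across all epochs
total_steps = round(train_runtime * steps_per_second)   # ~158 here; trainer_state.json says 159
dataset_size = total_samples / epochs                   # approximate training-set size
effective_batch = total_samples / total_steps           # samples per optimizer step

print(f"total steps ≈ {total_steps}")
print(f"dataset size ≈ {dataset_size:.0f} samples")
print(f"effective batch size ≈ {effective_batch:.1f}")
```

So the run covered roughly 1.7k samples per epoch at an effective batch size of about 32 (consistent with train_batch_size 4 times data parallelism and/or gradient accumulation).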
trainer_state.json ADDED
@@ -0,0 +1,1156 @@
+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 159,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {"epoch": 0.018867924528301886, "grad_norm": 15.087599797333977, "learning_rate": 0.0, "loss": 1.3308970928192139, "step": 1},
+    {"epoch": 0.03773584905660377, "grad_norm": 13.905749218469701, "learning_rate": 1e-07, "loss": 1.285962700843811, "step": 2},
+    {"epoch": 0.05660377358490566, "grad_norm": 12.056842971309038, "learning_rate": 2e-07, "loss": 1.3014495372772217, "step": 3},
+    {"epoch": 0.07547169811320754, "grad_norm": 13.244790508016711, "learning_rate": 3e-07, "loss": 1.306698203086853, "step": 4},
+    {"epoch": 0.09433962264150944, "grad_norm": 13.343404572220207, "learning_rate": 4e-07, "loss": 1.316192388534546, "step": 5},
+    {"epoch": 0.11320754716981132, "grad_norm": 11.45228080585811, "learning_rate": 5e-07, "loss": 1.3045374155044556, "step": 6},
+    {"epoch": 0.1320754716981132, "grad_norm": 14.318691524925551, "learning_rate": 6e-07, "loss": 1.3311705589294434, "step": 7},
+    {"epoch": 0.1509433962264151, "grad_norm": 12.050953431316902, "learning_rate": 7e-07, "loss": 1.2908077239990234, "step": 8},
+    {"epoch": 0.16981132075471697, "grad_norm": 10.724349690276135, "learning_rate": 8e-07, "loss": 1.3058435916900635, "step": 9},
+    {"epoch": 0.18867924528301888, "grad_norm": 9.473154789049158, "learning_rate": 9e-07, "loss": 1.2856130599975586, "step": 10},
+    {"epoch": 0.20754716981132076, "grad_norm": 6.5491980764342275, "learning_rate": 1e-06, "loss": 1.2199636697769165, "step": 11},
+    {"epoch": 0.22641509433962265, "grad_norm": 6.462810113541478, "learning_rate": 9.99888649298090e-07, "loss": 1.1673463582992554, "step": 12},
+    {"epoch": 0.24528301886792453, "grad_norm": 6.9296822460672045, "learning_rate": 9.995555091232516e-07, "loss": 1.1699671745300293, "step": 13},
+    {"epoch": 0.2641509433962264, "grad_norm": 6.0515568106146596, "learning_rate": 9.990000807704114e-07, "loss": 1.1814613342285156, "step": 14},
+    {"epoch": 0.2830188679245283, "grad_norm": 4.743020637878028, "learning_rate": 9.982228267815643e-07, "loss": 1.0652694702148438, "step": 15},
+    {"epoch": 0.3018867924528302, "grad_norm": 4.526630266791274, "learning_rate": 9.972240926774166e-07, "loss": 1.0635337829589844, "step": 16},
+    {"epoch": 0.32075471698113206, "grad_norm": 4.609514753545406, "learning_rate": 9.96004322435508e-07, "loss": 1.0902111530303955, "step": 17},
+    {"epoch": 0.33962264150943394, "grad_norm": 4.34097054100359, "learning_rate": 9.945640582928437e-07, "loss": 1.06702721118927, "step": 18},
+    {"epoch": 0.3584905660377358, "grad_norm": 3.864434007517437, "learning_rate": 9.9290394050485e-07, "loss": 1.0476477146148682, "step": 19},
+    {"epoch": 0.37735849056603776, "grad_norm": 3.8969527857656656, "learning_rate": 9.91024707060755e-07, "loss": 1.0617330074310303, "step": 20},
+    {"epoch": 0.39622641509433965, "grad_norm": 3.911924948199517, "learning_rate": 9.889271933555212e-07, "loss": 1.07832932472229, "step": 21},
+    {"epoch": 0.41509433962264153, "grad_norm": 3.833567236811609, "learning_rate": 9.8661233181848e-07, "loss": 1.0324124097824097, "step": 22},
+    {"epoch": 0.4339622641509434, "grad_norm": 3.6955004487569396, "learning_rate": 9.840811514988293e-07, "loss": 0.9815853834152222, "step": 23},
+    {"epoch": 0.4528301886792453, "grad_norm": 4.023404125176107, "learning_rate": 9.813347776081788e-07, "loss": 1.0266845226287842, "step": 24},
+    {"epoch": 0.4716981132075472, "grad_norm": 3.712461139695743, "learning_rate": 9.78374431020349e-07, "loss": 1.0085935592651367, "step": 25},
+    {"epoch": 0.49056603773584906, "grad_norm": 3.7543864084874596, "learning_rate": 9.752014277286431e-07, "loss": 0.9968965649604797, "step": 26},
+    {"epoch": 0.5094339622641509, "grad_norm": 3.8046734306564467, "learning_rate": 9.718171782608353e-07, "loss": 0.9803509712219238, "step": 27},
+    {"epoch": 0.5283018867924528, "grad_norm": 3.6105782650336433, "learning_rate": 9.682231870521345e-07, "loss": 0.9759021997451782, "step": 28},
+    {"epoch": 0.5471698113207547, "grad_norm": 3.3896428780092753, "learning_rate": 9.644210517764013e-07, "loss": 0.9812103509902954, "step": 29},
+    {"epoch": 0.5660377358490566, "grad_norm": 3.118079780719029, "learning_rate": 9.60412462635919e-07, "loss": 0.9091012477874756, "step": 30},
+    {"epoch": 0.5849056603773585, "grad_norm": 3.3662986364845, "learning_rate": 9.561992016100291e-07, "loss": 0.9503388404846191, "step": 31},
+    {"epoch": 0.6037735849056604, "grad_norm": 2.9779547004368196, "learning_rate": 9.517831416629716e-07, "loss": 0.9247981309890747, "step": 32},
+    {"epoch": 0.6226415094339622, "grad_norm": 3.468415170701323, "learning_rate": 9.471662459112745e-07, "loss": 0.9473499655723572, "step": 33},
+    {"epoch": 0.6415094339622641, "grad_norm": 2.8573918489427688, "learning_rate": 9.423505667510723e-07, "loss": 0.9340516328811646, "step": 34},
+    {"epoch": 0.660377358490566, "grad_norm": 2.949529150108781, "learning_rate": 9.373382449457303e-07, "loss": 0.9248940348625183, "step": 35},
+    {"epoch": 0.6792452830188679, "grad_norm": 2.9658340262784697, "learning_rate": 9.321315086741915e-07, "loss": 0.9420664310455322, "step": 36},
+    {"epoch": 0.6981132075471698, "grad_norm": 3.019712899281778, "learning_rate": 9.267326725404598e-07, "loss": 0.9231287240982056, "step": 37},
+    {"epoch": 0.7169811320754716, "grad_norm": 2.827563138085356, "learning_rate": 9.21144136544666e-07, "loss": 0.9293084740638733, "step": 38},
+    {"epoch": 0.7358490566037735, "grad_norm": 3.126960585054511, "learning_rate": 9.153683850161705e-07, "loss": 0.9372609853744507, "step": 39},
+    {"epoch": 0.7547169811320755, "grad_norm": 2.7757572634358456, "learning_rate": 9.094079855091797e-07, "loss": 0.9204014539718628, "step": 40},
+    {"epoch": 0.7735849056603774, "grad_norm": 2.86268897243828, "learning_rate": 9.032655876613635e-07, "loss": 0.9143469333648682, "step": 41},
+    {"epoch": 0.7924528301886793, "grad_norm": 2.899411491265449, "learning_rate": 8.96943922015986e-07, "loss": 0.901626467704773, "step": 42},
+    {"epoch": 0.8113207547169812, "grad_norm": 3.0296165470958494, "learning_rate": 8.90445798808068e-07, "loss": 0.9193109273910522, "step": 43},
+    {"epoch": 0.8301886792452831, "grad_norm": 2.832066082274235, "learning_rate": 8.837741067151249e-07, "loss": 0.9078618288040161, "step": 44},
+    {"epoch": 0.8490566037735849, "grad_norm": 2.9792386000035083, "learning_rate": 8.769318115730328e-07, "loss": 0.9032235145568848, "step": 45},
+    {"epoch": 0.8679245283018868, "grad_norm": 2.8570785041355373, "learning_rate": 8.699219550575952e-07, "loss": 0.8799638152122498, "step": 46},
+    {"epoch": 0.8867924528301887, "grad_norm": 2.8898604537645185, "learning_rate": 8.627476533323956e-07, "loss": 0.9072629809379578, "step": 47},
+    {"epoch": 0.9056603773584906, "grad_norm": 2.819489131324746, "learning_rate": 8.554120956635374e-07, "loss": 0.879642128944397, "step": 48},
+    {"epoch": 0.9245283018867925, "grad_norm": 2.884576949261456, "learning_rate": 8.479185430018858e-07, "loss": 0.9129672050476074, "step": 49},
+    {"epoch": 0.9433962264150944, "grad_norm": 2.8206974490824663, "learning_rate": 8.402703265334454e-07, "loss": 0.9072036147117615, "step": 50},
+    {"epoch": 0.9622641509433962, "grad_norm": 2.8666837714043414, "learning_rate": 8.324708461985124e-07, "loss": 0.8936312198638916, "step": 51},
+    {"epoch": 0.9811320754716981, "grad_norm": 2.75278105425475, "learning_rate": 8.245235691802643e-07, "loss": 0.886029839515686, "step": 52},
+    {"epoch": 1.0, "grad_norm": 2.9063116637756807, "learning_rate": 8.164320283634585e-07, "loss": 0.886949360370636, "step": 53},
+    {"epoch": 1.0188679245283019, "grad_norm": 2.8027377644406104, "learning_rate": 8.081998207639212e-07, "loss": 0.8734487891197205, "step": 54},
+    {"epoch": 1.0377358490566038, "grad_norm": 2.975237594360833, "learning_rate": 7.998306059295302e-07, "loss": 0.8541756868362427, "step": 55},
+    {"epoch": 1.0566037735849056, "grad_norm": 2.7212092257296785, "learning_rate": 7.913281043133977e-07, "loss": 0.855162501335144, "step": 56},
+    {"epoch": 1.0754716981132075, "grad_norm": 4.004522306787069, "learning_rate": 7.826960956199794e-07, "loss": 0.8469276428222656, "step": 57},
+    {"epoch": 1.0943396226415094, "grad_norm": 2.789521379215554, "learning_rate": 7.739384171248434e-07, "loss": 0.8612252473831177, "step": 58},
+    {"epoch": 1.1132075471698113, "grad_norm": 3.0001618191920008, "learning_rate": 7.650589619688468e-07, "loss": 0.8504967093467712, "step": 59},
+    {"epoch": 1.1320754716981132, "grad_norm": 2.803340918384437, "learning_rate": 7.560616774274774e-07, "loss": 0.8487892150878906, "step": 60},
+    {"epoch": 1.150943396226415, "grad_norm": 2.7872996717171112, "learning_rate": 7.469505631561317e-07, "loss": 0.8430064916610718, "step": 61},
+    {"epoch": 1.169811320754717, "grad_norm": 2.767338948376076, "learning_rate": 7.377296694121058e-07, "loss": 0.834577202796936, "step": 62},
+    {"epoch": 1.1886792452830188, "grad_norm": 2.7744551402453883, "learning_rate": 7.284030952540936e-07, "loss": 0.8389214277267456, "step": 63},
+    {"epoch": 1.2075471698113207, "grad_norm": 2.94391173341089, "learning_rate": 7.189749867199898e-07, "loss": 0.8442764282226562, "step": 64},
+    {"epoch": 1.2264150943396226, "grad_norm": 2.9244734720758285, "learning_rate": 7.094495349838092e-07, "loss": 0.802047848701477, "step": 65},
+    {"epoch": 1.2452830188679245, "grad_norm": 2.997891576167027, "learning_rate": 6.998309744925411e-07, "loss": 0.8562427163124084, "step": 66},
+    {"epoch": 1.2641509433962264, "grad_norm": 2.7454101056544618, "learning_rate": 6.901235810837667e-07, "loss": 0.8214827179908752, "step": 67},
+    {"epoch": 1.2830188679245282, "grad_norm": 2.9952605769764853, "learning_rate": 6.803316700848778e-07, "loss": 0.7995479702949524, "step": 68},
+    {"epoch": 1.3018867924528301, "grad_norm": 2.86683247629566, "learning_rate": 6.704595943947385e-07, "loss": 0.8077808022499084, "step": 69},
+    {"epoch": 1.320754716981132, "grad_norm": 2.7702979738330322, "learning_rate": 6.605117425486481e-07, "loss": 0.8417398929595947, "step": 70},
+    {"epoch": 1.3396226415094339, "grad_norm": 2.725158428984504, "learning_rate": 6.504925367674594e-07, "loss": 0.8494030833244324, "step": 71},
+    {"epoch": 1.3584905660377358, "grad_norm": 2.8106277256279255, "learning_rate": 6.40406430991723e-07, "loss": 0.8620424866676331, "step": 72},
+    {"epoch": 1.3773584905660377, "grad_norm": 2.818628329932316, "learning_rate": 6.302579089017327e-07, "loss": 0.8398749232292175, "step": 73},
+    {"epoch": 1.3962264150943398, "grad_norm": 2.745904001646307, "learning_rate": 6.200514819243475e-07, "loss": 0.8420323133468628, "step": 74},
+    {"epoch": 1.4150943396226414, "grad_norm": 2.7850840819985416, "learning_rate": 6.097916872274814e-07, "loss": 0.8359158635139465, "step": 75},
+    {"epoch": 1.4339622641509435, "grad_norm": 2.793048578545994, "learning_rate": 5.994830857031499e-07, "loss": 0.8336814641952515, "step": 76},
+    {"epoch": 1.4528301886792452, "grad_norm": 2.8505241824701826, "learning_rate": 5.891302599399684e-07, "loss": 0.7930982112884521, "step": 77},
+    {"epoch": 1.4716981132075473, "grad_norm": 2.6769256052426615, "learning_rate": 5.78737812186009e-07, "loss": 0.8192281723022461, "step": 78},
+    {"epoch": 1.490566037735849, "grad_norm": 2.7762595596745916, "learning_rate": 5.683103623029134e-07, "loss": 0.8389377593994141, "step": 79},
+    {"epoch": 1.509433962264151, "grad_norm": 2.8899154085340166, "learning_rate": 5.578525457121806e-07, "loss": 0.8256187438964844, "step": 80},
+    {"epoch": 1.5283018867924527, "grad_norm": 2.7720983651750917, "learning_rate": 5.473690113345342e-07, "loss": 0.8473238945007324, "step": 81},
+    {"epoch": 1.5471698113207548, "grad_norm": 2.8065774463241495, "learning_rate": 5.368644195232895e-07, "loss": 0.8165145516395569, "step": 82},
+    {"epoch": 1.5660377358490565, "grad_norm": 2.9614754969968016, "learning_rate": 5.263434399926398e-07, "loss": 0.8529609441757202, "step": 83},
+    {"epoch": 1.5849056603773586, "grad_norm": 2.90447128441676, "learning_rate": 5.158107497417794e-07, "loss": 0.8249980211257935, "step": 84},
+    {"epoch": 1.6037735849056602, "grad_norm": 2.7563670691746767, "learning_rate": 5.052710309757898e-07, "loss": 0.7900608777999878, "step": 85},
+    {"epoch": 1.6226415094339623, "grad_norm": 2.781624786647774, "learning_rate": 4.947289690242102e-07, "loss": 0.7917711734771729, "step": 86},
+    {"epoch": 1.641509433962264, "grad_norm": 2.8227831992064165, "learning_rate": 4.841892502582205e-07, "loss": 0.8228881359100342, "step": 87},
+    {"epoch": 1.6603773584905661, "grad_norm": 3.0626612203128687, "learning_rate": 4.736565600073602e-07, "loss": 0.8176588416099548, "step": 88},
+    {"epoch": 1.6792452830188678, "grad_norm": 2.7691999193756316, "learning_rate": 4.6313558047671047e-07, "loss": 0.8315557837486267, "step": 89},
+    {"epoch": 1.6981132075471699, "grad_norm": 2.9603416787137276, "learning_rate": 4.5263098866546586e-07, "loss": 0.8079712390899658, "step": 90},
+    {"epoch": 1.7169811320754715, "grad_norm": 2.7648310195075023, "learning_rate": 4.421474542878194e-07, "loss": 0.7854694128036499, "step": 91},
+    {"epoch": 1.7358490566037736, "grad_norm": 2.9565749840190736, "learning_rate": 4.316896376970866e-07, "loss": 0.8382487297058105, "step": 92},
+    {"epoch": 1.7547169811320755, "grad_norm": 2.904524931485949, "learning_rate": 4.2126218781399114e-07, "loss": 0.8337287902832031, "step": 93},
+    {"epoch": 1.7735849056603774, "grad_norm": 2.9419686201700794, "learning_rate": 4.1086974006003154e-07, "loss": 0.8450314402580261, "step": 94},
+    {"epoch": 1.7924528301886793, "grad_norm": 2.738066358519684, "learning_rate": 4.0051691429685023e-07, "loss": 0.7846765518188477, "step": 95},
+    {"epoch": 1.8113207547169812, "grad_norm": 2.7276079074380895, "learning_rate": 3.902083127725186e-07, "loss": 0.814504861831665, "step": 96},
+    {"epoch": 1.830188679245283, "grad_norm": 2.8093937971147835, "learning_rate": 3.799485180756525e-07, "loss": 0.8011671304702759, "step": 97},
+    {"epoch": 1.849056603773585, "grad_norm": 2.842796846086812, "learning_rate": 3.697420910982672e-07, "loss": 0.8165295124053955, "step": 98},
+    {"epoch": 1.8679245283018868, "grad_norm": 2.8189503982268977, "learning_rate": 3.5959356900827687e-07, "loss": 0.8199301958084106, "step": 99},
+    {"epoch": 1.8867924528301887, "grad_norm": 2.910644604198592, "learning_rate": 3.4950746323254063e-07, "loss": 0.8019869327545166, "step": 100},
+    {"epoch": 1.9056603773584906, "grad_norm": 2.863904675767849, "learning_rate": 3.394882574513519e-07, "loss": 0.8060827255249023, "step": 101},
+    {"epoch": 1.9245283018867925, "grad_norm": 2.8904123754351723, "learning_rate": 3.295404056052616e-07, "loss": 0.8078351020812988, "step": 102},
+    {"epoch": 1.9433962264150944, "grad_norm": 2.8850916542883778, "learning_rate": 3.1966832991512225e-07, "loss": 0.8068495988845825, "step": 103},
+    {"epoch": 1.9622641509433962, "grad_norm": 2.9528533111592865, "learning_rate": 3.0987641891623315e-07, "loss": 0.8184278011322021, "step": 104},
+    {"epoch": 1.9811320754716981, "grad_norm": 2.869159446180868, "learning_rate": 3.0016902550745895e-07, "loss": 0.8299746513366699, "step": 105},
+    {"epoch": 2.0, "grad_norm": 2.778568933671074, "learning_rate": 2.9055046501619083e-07, "loss": 0.785747766494751, "step": 106},
+    {"epoch": 2.018867924528302, "grad_norm": 2.9408610818195062, "learning_rate": 2.810250132800103e-07, "loss": 0.7670397758483887, "step": 107},
+    {"epoch": 2.0377358490566038, "grad_norm": 2.6257935800346694, "learning_rate": 2.715969047459066e-07, "loss": 0.7878092527389526, "step": 108},
+    {"epoch": 2.056603773584906, "grad_norm": 3.058449053263793, "learning_rate": 2.6227033058789403e-07, "loss": 0.7904379367828369, "step": 109},
+    {"epoch": 2.0754716981132075, "grad_norm": 2.88973427193669, "learning_rate": 2.5304943684386825e-07, "loss": 0.8011707067489624, "step": 110},
+    {"epoch": 2.0943396226415096, "grad_norm": 2.723021754211135, "learning_rate": 2.439383225725225e-07, "loss": 0.7658779621124268, "step": 111},
+    {"epoch": 2.1132075471698113, "grad_norm": 2.787460559434829, "learning_rate": 2.3494103803115318e-07, "loss": 0.7720337510108948, "step": 112},
+    {"epoch": 2.1320754716981134, "grad_norm": 2.7422069166294802, "learning_rate": 2.2606158287515658e-07, "loss": 0.7842212915420532, "step": 113},
+    {"epoch": 2.150943396226415, "grad_norm": 3.381034950183202, "learning_rate": 2.1730390438002056e-07, "loss": 0.7690730094909668, "step": 114},
+    {"epoch": 2.169811320754717, "grad_norm": 2.7764924352985663, "learning_rate": 2.0867189568660236e-07, "loss": 0.7737655639648438, "step": 115},
+    {"epoch": 2.188679245283019, "grad_norm": 2.8245587551592264, "learning_rate": 2.0016939407046986e-07, "loss": 0.7852470278739929, "step": 116},
+    {"epoch": 2.207547169811321, "grad_norm": 3.429004827616326, "learning_rate": 1.9180017923607883e-07, "loss": 0.7893455624580383, "step": 117},
+    {"epoch": 2.2264150943396226, "grad_norm": 3.1969648790899408, "learning_rate": 1.835679716365417e-07, "loss": 0.7634609937667847, "step": 118},
+    {"epoch": 2.2452830188679247, "grad_norm": 2.70318214433158, "learning_rate": 1.7547643081973578e-07, "loss": 0.7859703898429871, "step": 119},
+    {"epoch": 2.2641509433962264, "grad_norm": 2.961996890522788, "learning_rate": 1.6752915380148768e-07, "loss": 0.7709099650382996, "step": 120},
+    {"epoch": 2.2830188679245285, "grad_norm": 2.8177889556978095, "learning_rate": 1.5972967346655448e-07, "loss": 0.7789061069488525, "step": 121},
+    {"epoch": 2.30188679245283, "grad_norm": 3.320024417308839, "learning_rate": 1.5208145699811415e-07, "loss": 0.7862054705619812, "step": 122},
+    {"epoch": 2.3207547169811322, "grad_norm": 2.8631784669698415, "learning_rate": 1.4458790433646263e-07, "loss": 0.7816888689994812, "step": 123},
+    {"epoch": 2.339622641509434, "grad_norm": 2.902161614336072, "learning_rate": 1.3725234666760427e-07, "loss": 0.7391059398651123, "step": 124},
+    {"epoch": 2.358490566037736, "grad_norm": 2.882470659827849, "learning_rate": 1.3007804494240476e-07, "loss": 0.7627633810043335, "step": 125},
+    {"epoch": 2.3773584905660377, "grad_norm": 2.8433427591245284, "learning_rate": 1.2306818842696715e-07, "loss": 0.7769066095352173, "step": 126},
+    {"epoch": 2.3962264150943398, "grad_norm": 2.8617729260756573, "learning_rate": 1.1622589328487503e-07, "loss": 0.7934216856956482, "step": 127},
+    {"epoch": 2.4150943396226414, "grad_norm": 2.8509595069990823, "learning_rate": 1.0955420119193198e-07, "loss": 0.7673547863960266, "step": 128},
+    {"epoch": 2.4339622641509435, "grad_norm": 2.874293982355328, "learning_rate": 1.03056077984014e-07, "loss": 0.7849991917610168, "step": 129},
+    {"epoch": 2.452830188679245, "grad_norm": 3.0937215388279, "learning_rate": 9.673441233863661e-08, "loss": 0.7473263740539551, "step": 130},
+    {"epoch": 2.4716981132075473, "grad_norm": 2.9292035796935054, "learning_rate": 9.059201449082043e-08, "loss": 0.784021258354187, "step": 131},
+    {"epoch": 2.490566037735849, "grad_norm": 2.810444173384006, "learning_rate": 8.463161498382949e-08, "loss": 0.7882828712463379, "step": 132},
+    {"epoch": 2.509433962264151, "grad_norm": 2.829313317652292, "learning_rate": 7.885586345533396e-08, "loss": 0.7572199702262878, "step": 133},
+    {"epoch": 2.5283018867924527, "grad_norm": 2.6656369607187567, "learning_rate": 7.326732745954e-08, "loss": 0.7826784253120422, "step": 134},
+    {"epoch": 2.547169811320755, "grad_norm": 2.7036355808226173, "learning_rate": 6.786849132580841e-08, "loss": 0.7726486325263977, "step": 135},
+    {"epoch": 2.5660377358490565, "grad_norm": 2.805033772692598, "learning_rate": 6.266175505426957e-08, "loss": 0.7736940383911133, "step": 136},
+    {"epoch": 2.5849056603773586, "grad_norm": 2.8181269221147396, "learning_rate": 5.7649433248927794e-08, "loss": 0.7888213396072388, "step": 137},
+    {"epoch": 2.6037735849056602, "grad_norm": 2.9760303324315256, "learning_rate": 5.283375408872537e-08, "loss": 0.7611340284347534, "step": 138},
+    {"epoch": 2.6226415094339623, "grad_norm": 2.828152013200315, "learning_rate": 4.821685833702849e-08, "loss": 0.779454231262207, "step": 139},
+    {"epoch": 2.641509433962264, "grad_norm": 2.8581322420761786, "learning_rate": 4.3800798389970863e-08, "loss": 0.769560694694519, "step": 140},
+    {"epoch": 2.660377358490566, "grad_norm": 2.8125888801619103, "learning_rate": 3.958753736408105e-08, "loss": 0.7890896797180176, "step": 141},
+    {"epoch": 2.6792452830188678, "grad_norm": 2.757727954638762, "learning_rate": 3.557894822359864e-08, "loss": 0.7476776838302612, "step": 142},
+    {"epoch": 2.69811320754717, "grad_norm": 2.802525331124496, "learning_rate": 3.1776812947865384e-08, "loss": 0.7551087737083435, "step": 143},
+    {"epoch": 2.7169811320754715, "grad_norm": 3.172109709327269, "learning_rate": 2.818282173916453e-08, "loss": 0.7675119638442993, "step": 144},
+    {"epoch": 2.7358490566037736, "grad_norm": 2.836017838014085, "learning_rate": 2.4798572271356843e-08, "loss": 0.7670686841011047, "step": 145},
+    {"epoch": 2.7547169811320753, "grad_norm": 2.9198667506437905, "learning_rate": 2.162556897965101e-08, "loss": 0.7993500828742981, "step": 146},
+    {"epoch": 2.7735849056603774, "grad_norm": 2.795471164301072, "learning_rate": 1.8665222391821166e-08, "loss": 0.7754116654396057, "step": 147},
+    {"epoch": 2.7924528301886795, "grad_norm": 2.7725526525432787, "learning_rate": 1.5918848501170644e-08, "loss": 0.7710179090499878, "step": 148},
+    {"epoch": 2.811320754716981, "grad_norm": 2.784214561225124, "learning_rate": 1.3387668181519818e-08, "loss": 0.7384580969810486, "step": 149},
+    {"epoch": 2.830188679245283, "grad_norm": 2.8847249743481833, "learning_rate": 1.1072806644478738e-08, "loss": 0.7740883827209473, "step": 150},
+    {"epoch": 2.849056603773585, "grad_norm": 2.8315645307075945, "learning_rate": 8.975292939244927e-09, "loss": 0.7919697165489197, "step": 151},
+    {"epoch": 2.867924528301887, "grad_norm": 2.9085892225722034, "learning_rate": 7.096059495149853e-09, "loss": 0.781722903251648, "step": 152},
+    {"epoch": 2.8867924528301887, "grad_norm": 2.7506543384708224, "learning_rate": 5.435941707156388e-09, "loss": 0.7471998929977417, "step": 153},
+    {"epoch": 2.9056603773584904, "grad_norm": 2.8426972222396136, "learning_rate": 3.995677564492039e-09, "loss": 0.7751771807670593, "step": 154},
+    {"epoch": 2.9245283018867925, "grad_norm": 2.844363880881091, "learning_rate": 2.7759073225832597e-09, "loss": 0.7668254375457764, "step": 155},
+    {"epoch": 2.9433962264150946, "grad_norm": 3.278094344932399, "learning_rate": 1.7771732184357901e-09, "loss": 0.7961957454681396, "step": 156},
+    {"epoch": 2.9622641509433962, "grad_norm": 2.9897635623753955, "learning_rate": 9.999192295886971e-10, "loss": 0.7848834991455078, "step": 157},
+    {"epoch": 2.981132075471698, "grad_norm": 2.748244107712091, "learning_rate": 4.4449087674847117e-10, "loss": 0.777495801448822, "step": 158},
+    {"epoch": 3.0, "grad_norm": 2.9554977361208974, "learning_rate": 1.1113507019094858e-10, "loss": 0.7618961334228516, "step": 159},
+    {"epoch": 3.0, "step": 159, "total_flos": 23335512768512.0, "train_loss": 0.8809327138294963, "train_runtime": 1440.1859, "train_samples_per_second": 3.485, "train_steps_per_second": 0.11}
+  ],
+  "logging_steps": 1.0,
+  "max_steps": 159,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 999999,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 23335512768512.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
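The learning_rate column in trainer_state.json shows a short linear warmup (zero at step 1, reaching the configured 1.0e-6 at step 11) followed by a cosine decay toward zero over the remaining steps. A minimal sketch of that schedule (the warmup length and the exact cosine form are inferred from the logged values, not read from a training config, so treat it as an approximation):

```python
import math

def lr_at(step, max_steps=159, warmup_steps=10, peak_lr=1.0e-6):
    """Approximate per-step LR as logged in trainer_state.json (1-indexed steps)."""
    if step <= warmup_steps + 1:
        # Linear warmup: 0 at step 1, peak_lr at step warmup_steps + 1.
        return peak_lr * (step - 1) / warmup_steps
    # Cosine decay over the remaining steps, reaching ~0 at max_steps.
    progress = (step - warmup_steps - 1) / (max_steps - warmup_steps - 1)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This reproduces the logged warmup values exactly and the decay values closely; the trainer's own scheduler may use a slightly different step offset.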
training_loss.png ADDED
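The added training_loss.png presumably plots the loss column of trainer_state.json. A minimal sketch that regenerates such a plot from the committed file (file names taken from this commit; the helper name and matplotlib styling are assumptions):

```python
import json

def loss_curve(state):
    """Extract (steps, losses) from a Trainer state dict, skipping the final summary entry."""
    entries = [e for e in state["log_history"] if "loss" in e]
    return [e["step"] for e in entries], [e["loss"] for e in entries]

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    with open("trainer_state.json") as f:
        steps, losses = loss_curve(json.load(f))

    plt.plot(steps, losses)
    plt.xlabel("step")
    plt.ylabel("training loss")
    plt.title("Qwen3.5-35B-A3B-SFT-artarena_sft-LR1.0e-6-EPOCHS3-LF")
    plt.savefig("training_loss.png")
```

The filter on `"loss"` drops the final log_history entry, which carries only the run summary (`train_loss`, runtime, throughput) rather than a per-step loss.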