umer07 committed on
Commit f29eecf · verified · 1 Parent(s): 29aeae4

Fathom: upload expert-e1-static/training_log.json
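
For readers who want to inspect the uploaded log programmatically, a minimal Python sketch follows. It assumes the file is a JSON array of per-step records with the field names visible in the diff below. The `summarize` helper and the inline `SAMPLE` string are illustrative and not part of this commit; the two sample records are copied from the start of the log.

```python
import json

# Two records copied from the start of the log in this commit. For the real
# file, replace json.loads(SAMPLE) with json.load() on an open file handle.
SAMPLE = """
[
  {"loss": 1.2771, "grad_norm": 0.48179489374160767, "learning_rate": 9e-06,
   "entropy": 1.0744440883398056, "num_tokens": 384772.0,
   "mean_token_accuracy": 0.7264965653419495,
   "epoch": 0.008849557522123894, "step": 10},
  {"loss": 1.2083, "grad_norm": 0.22539448738098145, "learning_rate": 1.9e-05,
   "entropy": 1.0771890074014663, "num_tokens": 794683.0,
   "mean_token_accuracy": 0.7380238056182862,
   "epoch": 0.017699115044247787, "step": 20}
]
"""


def summarize(records):
    """Return a few headline numbers from a list of per-step log records."""
    # Keep only entries that carry a training loss, ordered by step.
    rows = sorted((r for r in records if "loss" in r), key=lambda r: r["step"])
    if not rows:
        raise ValueError("no records with a 'loss' field")
    best = min(rows, key=lambda r: r["loss"])
    return {
        "first_step": rows[0]["step"],
        "last_step": rows[-1]["step"],
        "first_loss": rows[0]["loss"],
        "last_loss": rows[-1]["loss"],
        "min_loss_step": best["step"],
        "peak_learning_rate": max(r["learning_rate"] for r in rows),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

The filter on `"loss"` is a precaution: logs of this kind sometimes end with a summary record that has different keys, which is an assumption here since the end of the file is not shown.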

adapters/expert-e1-static/training_log.json ADDED
@@ -0,0 +1,1141 @@
1
+ [
2
+ {
3
+ "loss": 1.2771,
4
+ "grad_norm": 0.48179489374160767,
5
+ "learning_rate": 9e-06,
6
+ "entropy": 1.0744440883398056,
7
+ "num_tokens": 384772.0,
8
+ "mean_token_accuracy": 0.7264965653419495,
9
+ "epoch": 0.008849557522123894,
10
+ "step": 10
11
+ },
12
+ {
13
+ "loss": 1.2083,
14
+ "grad_norm": 0.22539448738098145,
15
+ "learning_rate": 1.9e-05,
16
+ "entropy": 1.0771890074014663,
17
+ "num_tokens": 794683.0,
18
+ "mean_token_accuracy": 0.7380238056182862,
19
+ "epoch": 0.017699115044247787,
20
+ "step": 20
21
+ },
22
+ {
23
+ "loss": 1.1201,
24
+ "grad_norm": 0.22262926399707794,
25
+ "learning_rate": 2.9e-05,
26
+ "entropy": 1.1027102798223496,
27
+ "num_tokens": 1216610.0,
28
+ "mean_token_accuracy": 0.7497482478618622,
29
+ "epoch": 0.02654867256637168,
30
+ "step": 30
31
+ },
32
+ {
33
+ "loss": 1.0997,
34
+ "grad_norm": 0.2455902248620987,
35
+ "learning_rate": 3.9000000000000006e-05,
36
+ "entropy": 1.13816859126091,
37
+ "num_tokens": 1636160.0,
38
+ "mean_token_accuracy": 0.751062735915184,
39
+ "epoch": 0.035398230088495575,
40
+ "step": 40
41
+ },
42
+ {
43
+ "loss": 0.9732,
44
+ "grad_norm": 0.25331178307533264,
45
+ "learning_rate": 4.9e-05,
46
+ "entropy": 0.967724832892418,
47
+ "num_tokens": 2047660.0,
48
+ "mean_token_accuracy": 0.7753094494342804,
49
+ "epoch": 0.04424778761061947,
50
+ "step": 50
51
+ },
52
+ {
53
+ "loss": 0.8376,
54
+ "grad_norm": 0.2812139391899109,
55
+ "learning_rate": 5.9e-05,
56
+ "entropy": 0.8342341184616089,
57
+ "num_tokens": 2442064.0,
58
+ "mean_token_accuracy": 0.8068537533283233,
59
+ "epoch": 0.05309734513274336,
60
+ "step": 60
61
+ },
62
+ {
63
+ "loss": 0.7207,
64
+ "grad_norm": 0.5564813613891602,
65
+ "learning_rate": 6.9e-05,
66
+ "entropy": 0.7252395421266555,
67
+ "num_tokens": 2867532.0,
68
+ "mean_token_accuracy": 0.832821500301361,
69
+ "epoch": 0.061946902654867256,
70
+ "step": 70
71
+ },
72
+ {
73
+ "loss": 0.6873,
74
+ "grad_norm": 0.28080248832702637,
75
+ "learning_rate": 7.900000000000001e-05,
76
+ "entropy": 0.6883613392710686,
77
+ "num_tokens": 3271081.0,
78
+ "mean_token_accuracy": 0.8418284684419632,
79
+ "epoch": 0.07079646017699115,
80
+ "step": 80
81
+ },
82
+ {
83
+ "loss": 0.6869,
84
+ "grad_norm": 0.37117382884025574,
85
+ "learning_rate": 8.900000000000001e-05,
86
+ "entropy": 0.6866619646549225,
87
+ "num_tokens": 3676002.0,
88
+ "mean_token_accuracy": 0.8389175772666931,
89
+ "epoch": 0.07964601769911504,
90
+ "step": 90
91
+ },
92
+ {
93
+ "loss": 0.641,
94
+ "grad_norm": 0.34726274013519287,
95
+ "learning_rate": 9.900000000000001e-05,
96
+ "entropy": 0.6441589877009392,
97
+ "num_tokens": 4109097.0,
98
+ "mean_token_accuracy": 0.849273630976677,
99
+ "epoch": 0.08849557522123894,
100
+ "step": 100
101
+ },
102
+ {
103
+ "loss": 0.541,
104
+ "grad_norm": 0.3802001476287842,
105
+ "learning_rate": 9.998116250927091e-05,
106
+ "entropy": 0.54506506472826,
107
+ "num_tokens": 4518063.0,
108
+ "mean_token_accuracy": 0.8707349866628646,
109
+ "epoch": 0.09734513274336283,
110
+ "step": 110
111
+ },
112
+ {
113
+ "loss": 0.5358,
114
+ "grad_norm": 0.33883532881736755,
115
+ "learning_rate": 9.991606348016586e-05,
116
+ "entropy": 0.5439137145876884,
117
+ "num_tokens": 4942538.0,
118
+ "mean_token_accuracy": 0.8710391223430634,
119
+ "epoch": 0.10619469026548672,
120
+ "step": 120
121
+ },
122
+ {
123
+ "loss": 0.513,
124
+ "grad_norm": 0.2743840515613556,
125
+ "learning_rate": 9.980453089389592e-05,
126
+ "entropy": 0.530657310783863,
127
+ "num_tokens": 5366333.0,
128
+ "mean_token_accuracy": 0.8756012558937073,
129
+ "epoch": 0.11504424778761062,
130
+ "step": 130
131
+ },
132
+ {
133
+ "loss": 0.499,
134
+ "grad_norm": 0.3075107932090759,
135
+ "learning_rate": 9.964666850172589e-05,
136
+ "entropy": 0.5022920548915863,
137
+ "num_tokens": 5770091.0,
138
+ "mean_token_accuracy": 0.8816363781690597,
139
+ "epoch": 0.12389380530973451,
140
+ "step": 140
141
+ },
142
+ {
143
+ "loss": 0.4638,
144
+ "grad_norm": 0.3095848560333252,
145
+ "learning_rate": 9.944262315242346e-05,
146
+ "entropy": 0.47086685746908186,
147
+ "num_tokens": 6186047.0,
148
+ "mean_token_accuracy": 0.8876948118209839,
149
+ "epoch": 0.13274336283185842,
150
+ "step": 150
151
+ },
152
+ {
153
+ "loss": 0.4325,
154
+ "grad_norm": 0.28685304522514343,
155
+ "learning_rate": 9.919258465565575e-05,
156
+ "entropy": 0.43743315935134885,
157
+ "num_tokens": 6585548.0,
158
+ "mean_token_accuracy": 0.8952628016471863,
159
+ "epoch": 0.1415929203539823,
160
+ "step": 160
161
+ },
162
+ {
163
+ "loss": 0.4701,
164
+ "grad_norm": 0.324359267950058,
165
+ "learning_rate": 9.889678560542202e-05,
166
+ "entropy": 0.4746619015932083,
167
+ "num_tokens": 7013633.0,
168
+ "mean_token_accuracy": 0.8870637893676758,
169
+ "epoch": 0.1504424778761062,
170
+ "step": 170
171
+ },
172
+ {
173
+ "loss": 0.4442,
174
+ "grad_norm": 0.31656569242477417,
175
+ "learning_rate": 9.855550116368716e-05,
176
+ "entropy": 0.4526193618774414,
177
+ "num_tokens": 7419505.0,
178
+ "mean_token_accuracy": 0.8906016558408737,
179
+ "epoch": 0.1592920353982301,
180
+ "step": 180
181
+ },
182
+ {
183
+ "loss": 0.412,
184
+ "grad_norm": 0.30724215507507324,
185
+ "learning_rate": 9.816904880441713e-05,
186
+ "entropy": 0.4151700422167778,
187
+ "num_tokens": 7818357.0,
188
+ "mean_token_accuracy": 0.8987403184175491,
189
+ "epoch": 0.168141592920354,
190
+ "step": 190
191
+ },
192
+ {
193
+ "loss": 0.4121,
194
+ "grad_norm": 0.49001774191856384,
195
+ "learning_rate": 9.773778801825405e-05,
196
+ "entropy": 0.4183415174484253,
197
+ "num_tokens": 8275452.0,
198
+ "mean_token_accuracy": 0.898220694065094,
199
+ "epoch": 0.17699115044247787,
200
+ "step": 200
201
+ },
202
+ {
203
+ "loss": 0.4359,
204
+ "grad_norm": 0.5563516616821289,
205
+ "learning_rate": 9.726211997810646e-05,
206
+ "entropy": 0.44024592265486717,
207
+ "num_tokens": 8689265.0,
208
+ "mean_token_accuracy": 0.8926420778036117,
209
+ "epoch": 0.18584070796460178,
210
+ "step": 210
211
+ },
212
+ {
213
+ "loss": 0.4265,
214
+ "grad_norm": 0.4248098134994507,
215
+ "learning_rate": 9.674248716596497e-05,
216
+ "entropy": 0.43353245332837104,
217
+ "num_tokens": 9108244.0,
218
+ "mean_token_accuracy": 0.8954088747501373,
219
+ "epoch": 0.19469026548672566,
220
+ "step": 220
221
+ },
222
+ {
223
+ "loss": 0.4113,
224
+ "grad_norm": 0.5417165160179138,
225
+ "learning_rate": 9.617937296129115e-05,
226
+ "entropy": 0.4203181877732277,
227
+ "num_tokens": 9508482.0,
228
+ "mean_token_accuracy": 0.8986817359924316,
229
+ "epoch": 0.20353982300884957,
230
+ "step": 230
231
+ },
232
+ {
233
+ "loss": 0.4041,
234
+ "grad_norm": 0.37166956067085266,
235
+ "learning_rate": 9.557330119136203e-05,
236
+ "entropy": 0.4078876577317715,
237
+ "num_tokens": 9929607.0,
238
+ "mean_token_accuracy": 0.9026990383863449,
239
+ "epoch": 0.21238938053097345,
240
+ "step": 240
241
+ },
242
+ {
243
+ "loss": 0.3372,
244
+ "grad_norm": 0.37398409843444824,
245
+ "learning_rate": 9.492483564398883e-05,
246
+ "entropy": 0.3507886216044426,
247
+ "num_tokens": 10352037.0,
248
+ "mean_token_accuracy": 0.9149532109498978,
249
+ "epoch": 0.22123893805309736,
250
+ "step": 250
251
+ },
252
+ {
253
+ "loss": 0.3754,
254
+ "grad_norm": 0.4071371853351593,
255
+ "learning_rate": 9.423457954306312e-05,
256
+ "entropy": 0.3833847470581532,
257
+ "num_tokens": 10751656.0,
258
+ "mean_token_accuracy": 0.9078819990158081,
259
+ "epoch": 0.23008849557522124,
260
+ "step": 260
261
+ },
262
+ {
263
+ "loss": 0.3092,
264
+ "grad_norm": 0.3898226022720337,
265
+ "learning_rate": 9.350317498741811e-05,
266
+ "entropy": 0.31048981472849846,
267
+ "num_tokens": 11177021.0,
268
+ "mean_token_accuracy": 0.9235329061746598,
269
+ "epoch": 0.23893805309734514,
270
+ "step": 270
271
+ },
272
+ {
273
+ "loss": 0.2992,
274
+ "grad_norm": 0.4743488132953644,
275
+ "learning_rate": 9.273130235352743e-05,
276
+ "entropy": 0.3064242236316204,
277
+ "num_tokens": 11589241.0,
278
+ "mean_token_accuracy": 0.926247027516365,
279
+ "epoch": 0.24778761061946902,
280
+ "step": 280
281
+ },
282
+ {
283
+ "loss": 0.3432,
284
+ "grad_norm": 0.3992835283279419,
285
+ "learning_rate": 9.191967966259645e-05,
286
+ "entropy": 0.35372441038489344,
287
+ "num_tokens": 11997925.0,
288
+ "mean_token_accuracy": 0.914875665307045,
289
+ "epoch": 0.25663716814159293,
290
+ "step": 290
291
+ },
292
+ {
293
+ "loss": 0.311,
294
+ "grad_norm": 0.4893634021282196,
295
+ "learning_rate": 9.10690619126356e-05,
296
+ "entropy": 0.31616691052913665,
297
+ "num_tokens": 12415929.0,
298
+ "mean_token_accuracy": 0.9236188173294068,
299
+ "epoch": 0.26548672566371684,
300
+ "step": 300
301
+ },
302
+ {
303
+ "loss": 0.3358,
304
+ "grad_norm": 0.38468873500823975,
305
+ "learning_rate": 9.018024037613646e-05,
306
+ "entropy": 0.3419831946492195,
307
+ "num_tokens": 12828979.0,
308
+ "mean_token_accuracy": 0.9184246301651001,
309
+ "epoch": 0.2743362831858407,
310
+ "step": 310
311
+ },
312
+ {
313
+ "loss": 0.3176,
314
+ "grad_norm": 0.4959142208099365,
315
+ "learning_rate": 8.925404186400408e-05,
316
+ "entropy": 0.3222149942070246,
317
+ "num_tokens": 13222249.0,
318
+ "mean_token_accuracy": 0.921812680363655,
319
+ "epoch": 0.2831858407079646,
320
+ "step": 320
321
+ },
322
+ {
323
+ "loss": 0.2962,
324
+ "grad_norm": 0.33697429299354553,
325
+ "learning_rate": 8.829132795643051e-05,
326
+ "entropy": 0.30270505994558333,
327
+ "num_tokens": 13653508.0,
328
+ "mean_token_accuracy": 0.9271777629852295,
329
+ "epoch": 0.2920353982300885,
330
+ "step": 330
331
+ },
332
+ {
333
+ "loss": 0.2721,
334
+ "grad_norm": 0.3220033645629883,
335
+ "learning_rate": 8.729299420142465e-05,
336
+ "entropy": 0.2873320683836937,
337
+ "num_tokens": 14076074.0,
338
+ "mean_token_accuracy": 0.9307485371828079,
339
+ "epoch": 0.3008849557522124,
340
+ "step": 340
341
+ },
342
+ {
343
+ "loss": 0.3197,
344
+ "grad_norm": 0.3136242628097534,
345
+ "learning_rate": 8.625996928174412e-05,
346
+ "entropy": 0.332285375893116,
347
+ "num_tokens": 14496295.0,
348
+ "mean_token_accuracy": 0.9201685935258865,
349
+ "epoch": 0.30973451327433627,
350
+ "step": 350
351
+ },
352
+ {
353
+ "loss": 0.3053,
354
+ "grad_norm": 0.43096667528152466,
355
+ "learning_rate": 8.519321415100414e-05,
356
+ "entropy": 0.310824865847826,
357
+ "num_tokens": 14897719.0,
358
+ "mean_token_accuracy": 0.924813660979271,
359
+ "epoch": 0.3185840707964602,
360
+ "step": 360
361
+ },
362
+ {
363
+ "loss": 0.2486,
364
+ "grad_norm": 0.4815881848335266,
365
+ "learning_rate": 8.409372113976712e-05,
366
+ "entropy": 0.2627465195953846,
367
+ "num_tokens": 15293602.0,
368
+ "mean_token_accuracy": 0.9371735215187073,
369
+ "epoch": 0.3274336283185841,
370
+ "step": 370
371
+ },
372
+ {
373
+ "loss": 0.2657,
374
+ "grad_norm": 0.4008077085018158,
375
+ "learning_rate": 8.296251303244413e-05,
376
+ "entropy": 0.26940090730786326,
377
+ "num_tokens": 15726200.0,
378
+ "mean_token_accuracy": 0.9358527153730393,
379
+ "epoch": 0.336283185840708,
380
+ "step": 380
381
+ },
382
+ {
383
+ "loss": 0.2387,
384
+ "grad_norm": 0.3524218201637268,
385
+ "learning_rate": 8.180064211586738e-05,
386
+ "entropy": 0.24938426464796065,
387
+ "num_tokens": 16150169.0,
388
+ "mean_token_accuracy": 0.9398461669683457,
389
+ "epoch": 0.34513274336283184,
390
+ "step": 390
391
+ },
392
+ {
393
+ "loss": 0.2634,
394
+ "grad_norm": 0.35844066739082336,
395
+ "learning_rate": 8.060918920041856e-05,
396
+ "entropy": 0.2718000315129757,
397
+ "num_tokens": 16558293.0,
398
+ "mean_token_accuracy": 0.9348433256149292,
399
+ "epoch": 0.35398230088495575,
400
+ "step": 400
401
+ },
402
+ {
403
+ "loss": 0.2835,
404
+ "grad_norm": 0.46172916889190674,
405
+ "learning_rate": 7.938926261462366e-05,
406
+ "entropy": 0.2921802319586277,
407
+ "num_tokens": 16997838.0,
408
+ "mean_token_accuracy": 0.9297347247600556,
409
+ "epoch": 0.36283185840707965,
410
+ "step": 410
411
+ },
412
+ {
413
+ "loss": 0.2743,
414
+ "grad_norm": 0.3828360140323639,
415
+ "learning_rate": 7.81419971741494e-05,
416
+ "entropy": 0.2844072911888361,
417
+ "num_tokens": 17414034.0,
418
+ "mean_token_accuracy": 0.9316927522420884,
419
+ "epoch": 0.37168141592920356,
420
+ "step": 420
421
+ },
422
+ {
423
+ "loss": 0.2481,
424
+ "grad_norm": 0.2896506190299988,
425
+ "learning_rate": 7.686855312616055e-05,
426
+ "entropy": 0.2565284200012684,
427
+ "num_tokens": 17851569.0,
428
+ "mean_token_accuracy": 0.9383330553770065,
429
+ "epoch": 0.3805309734513274,
430
+ "step": 430
431
+ },
432
+ {
433
+ "loss": 0.278,
434
+ "grad_norm": 0.5132155418395996,
435
+ "learning_rate": 7.557011507002004e-05,
436
+ "entropy": 0.2847262255847454,
437
+ "num_tokens": 18265230.0,
438
+ "mean_token_accuracy": 0.9326902598142623,
439
+ "epoch": 0.3893805309734513,
440
+ "step": 440
441
+ },
442
+ {
443
+ "loss": 0.2172,
444
+ "grad_norm": 0.36239564418792725,
445
+ "learning_rate": 7.424789085533584e-05,
446
+ "entropy": 0.2237854577600956,
447
+ "num_tokens": 18669967.0,
448
+ "mean_token_accuracy": 0.9465960383415222,
449
+ "epoch": 0.39823008849557523,
450
+ "step": 450
451
+ },
452
+ {
453
+ "loss": 0.2868,
454
+ "grad_norm": 0.263062059879303,
455
+ "learning_rate": 7.290311045837963e-05,
456
+ "entropy": 0.29681268632411956,
457
+ "num_tokens": 19100082.0,
458
+ "mean_token_accuracy": 0.9297707736492157,
459
+ "epoch": 0.40707964601769914,
460
+ "step": 460
461
+ },
462
+ {
463
+ "loss": 0.2535,
464
+ "grad_norm": 0.25322869420051575,
465
+ "learning_rate": 7.153702483792266e-05,
466
+ "entropy": 0.2623436853289604,
467
+ "num_tokens": 19535145.0,
468
+ "mean_token_accuracy": 0.9386782139539719,
469
+ "epoch": 0.415929203539823,
470
+ "step": 470
471
+ },
472
+ {
473
+ "loss": 0.2781,
474
+ "grad_norm": 0.32808470726013184,
475
+ "learning_rate": 7.015090477155288e-05,
476
+ "entropy": 0.2793763652443886,
477
+ "num_tokens": 19946800.0,
478
+ "mean_token_accuracy": 0.9325116276741028,
479
+ "epoch": 0.4247787610619469,
480
+ "step": 480
481
+ },
482
+ {
483
+ "loss": 0.2166,
484
+ "grad_norm": 0.3056963086128235,
485
+ "learning_rate": 6.874603967355603e-05,
486
+ "entropy": 0.2249807395040989,
487
+ "num_tokens": 20374389.0,
488
+ "mean_token_accuracy": 0.9458037465810776,
489
+ "epoch": 0.4336283185840708,
490
+ "step": 490
491
+ },
492
+ {
493
+ "loss": 0.2528,
494
+ "grad_norm": 0.3542698919773102,
495
+ "learning_rate": 6.732373639546025e-05,
496
+ "entropy": 0.2597987335175276,
497
+ "num_tokens": 20792226.0,
498
+ "mean_token_accuracy": 0.9383546471595764,
499
+ "epoch": 0.4424778761061947,
500
+ "step": 500
501
+ },
502
+ {
503
+ "loss": 0.2007,
504
+ "grad_norm": 0.2978123426437378,
505
+ "learning_rate": 6.588531801035993e-05,
506
+ "entropy": 0.2049595110118389,
507
+ "num_tokens": 21206944.0,
508
+ "mean_token_accuracy": 0.9496358513832093,
509
+ "epoch": 0.45132743362831856,
510
+ "step": 510
511
+ },
512
+ {
513
+ "loss": 0.2147,
514
+ "grad_norm": 0.3151644766330719,
515
+ "learning_rate": 6.443212258214983e-05,
516
+ "entropy": 0.221473628282547,
517
+ "num_tokens": 21597252.0,
518
+ "mean_token_accuracy": 0.9471895158290863,
519
+ "epoch": 0.46017699115044247,
520
+ "step": 520
521
+ },
522
+ {
523
+ "loss": 0.2534,
524
+ "grad_norm": 0.28072014451026917,
525
+ "learning_rate": 6.296550192081421e-05,
526
+ "entropy": 0.25930051133036613,
527
+ "num_tokens": 22019525.0,
528
+ "mean_token_accuracy": 0.9380945324897766,
529
+ "epoch": 0.4690265486725664,
530
+ "step": 530
531
+ },
532
+ {
533
+ "loss": 0.2773,
534
+ "grad_norm": 0.28228074312210083,
535
+ "learning_rate": 6.148682032492894e-05,
536
+ "entropy": 0.2860624067485332,
537
+ "num_tokens": 22450805.0,
538
+ "mean_token_accuracy": 0.9322474598884583,
539
+ "epoch": 0.4778761061946903,
540
+ "step": 540
541
+ },
542
+ {
543
+ "loss": 0.2331,
544
+ "grad_norm": 0.22247536480426788,
545
+ "learning_rate": 5.999745331254616e-05,
546
+ "entropy": 0.2394201297312975,
547
+ "num_tokens": 22860630.0,
548
+ "mean_token_accuracy": 0.9421512335538864,
549
+ "epoch": 0.48672566371681414,
550
+ "step": 550
551
+ },
552
+ {
553
+ "loss": 0.2545,
554
+ "grad_norm": 0.36099445819854736,
555
+ "learning_rate": 5.849878634164251e-05,
556
+ "entropy": 0.26039876230061054,
557
+ "num_tokens": 23276454.0,
558
+ "mean_token_accuracy": 0.9390616893768311,
559
+ "epoch": 0.49557522123893805,
560
+ "step": 560
561
+ },
562
+ {
563
+ "loss": 0.2566,
564
+ "grad_norm": 0.24887335300445557,
565
+ "learning_rate": 5.699221352132059e-05,
566
+ "entropy": 0.2648449897766113,
567
+ "num_tokens": 23710582.0,
568
+ "mean_token_accuracy": 0.9375686824321747,
569
+ "epoch": 0.504424778761062,
570
+ "step": 570
571
+ },
572
+ {
573
+ "loss": 0.2481,
574
+ "grad_norm": 0.26001015305519104,
575
+ "learning_rate": 5.547913631496306e-05,
576
+ "entropy": 0.2597955591976643,
577
+ "num_tokens": 24130902.0,
578
+ "mean_token_accuracy": 0.9392797619104385,
579
+ "epoch": 0.5132743362831859,
580
+ "step": 580
581
+ },
582
+ {
583
+ "loss": 0.2491,
584
+ "grad_norm": 0.25495871901512146,
585
+ "learning_rate": 5.396096223654561e-05,
586
+ "entropy": 0.24957055263221264,
587
+ "num_tokens": 24565476.0,
588
+ "mean_token_accuracy": 0.9401059657335281,
589
+ "epoch": 0.5221238938053098,
590
+ "step": 590
591
+ },
592
+ {
593
+ "loss": 0.2479,
594
+ "grad_norm": 0.28261682391166687,
595
+ "learning_rate": 5.2439103541321144e-05,
596
+ "entropy": 0.2580213598906994,
597
+ "num_tokens": 24971253.0,
598
+ "mean_token_accuracy": 0.9390650987625122,
599
+ "epoch": 0.5309734513274337,
600
+ "step": 600
601
+ },
602
+ {
603
+ "loss": 0.2856,
604
+ "grad_norm": 0.2623113691806793,
605
+ "learning_rate": 5.0914975912093854e-05,
606
+ "entropy": 0.2843888055533171,
607
+ "num_tokens": 25375477.0,
608
+ "mean_token_accuracy": 0.9336756229400635,
609
+ "epoch": 0.5398230088495575,
610
+ "step": 610
611
+ },
612
+ {
613
+ "loss": 0.2338,
614
+ "grad_norm": 0.3092866539955139,
615
+ "learning_rate": 4.938999714230467e-05,
616
+ "entropy": 0.23968622721731664,
617
+ "num_tokens": 25767651.0,
618
+ "mean_token_accuracy": 0.9425904780626297,
619
+ "epoch": 0.5486725663716814,
620
+ "step": 620
621
+ },
622
+ {
623
+ "loss": 0.2382,
624
+ "grad_norm": 0.20757952332496643,
625
+ "learning_rate": 4.78655858171533e-05,
626
+ "entropy": 0.24690472409129144,
627
+ "num_tokens": 26197439.0,
628
+ "mean_token_accuracy": 0.9411264300346375,
629
+ "epoch": 0.5575221238938053,
630
+ "step": 630
631
+ },
632
+ {
633
+ "loss": 0.2557,
634
+ "grad_norm": 0.24662645161151886,
635
+ "learning_rate": 4.634315999398393e-05,
636
+ "entropy": 0.2634947098791599,
637
+ "num_tokens": 26612051.0,
638
+ "mean_token_accuracy": 0.9378427803516388,
639
+ "epoch": 0.5663716814159292,
640
+ "step": 640
641
+ },
642
+ {
643
+ "loss": 0.1985,
644
+ "grad_norm": 0.18498662114143372,
645
+ "learning_rate": 4.48241358831617e-05,
646
+ "entropy": 0.20072167329490184,
647
+ "num_tokens": 27024799.0,
648
+ "mean_token_accuracy": 0.9514979749917984,
649
+ "epoch": 0.5752212389380531,
650
+ "step": 650
651
+ },
652
+ {
653
+ "loss": 0.2368,
654
+ "grad_norm": 0.27973467111587524,
655
+ "learning_rate": 4.3309926530667586e-05,
656
+ "entropy": 0.24167309887707233,
657
+ "num_tokens": 27423001.0,
658
+ "mean_token_accuracy": 0.9416618674993515,
659
+ "epoch": 0.584070796460177,
660
+ "step": 660
661
+ },
662
+ {
663
+ "loss": 0.275,
664
+ "grad_norm": 0.25306880474090576,
665
+ "learning_rate": 4.18019405036366e-05,
666
+ "entropy": 0.2826690062880516,
667
+ "num_tokens": 27822869.0,
668
+ "mean_token_accuracy": 0.9324941843748092,
669
+ "epoch": 0.5929203539823009,
670
+ "step": 670
671
+ },
672
+ {
673
+ "loss": 0.2537,
674
+ "grad_norm": 0.20408447086811066,
675
+ "learning_rate": 4.030158058006262e-05,
676
+ "entropy": 0.25533573105931284,
677
+ "num_tokens": 28246631.0,
678
+ "mean_token_accuracy": 0.9390394508838653,
679
+ "epoch": 0.6017699115044248,
680
+ "step": 680
681
+ },
682
+ {
683
+ "loss": 0.1987,
684
+ "grad_norm": 0.2749539017677307,
685
+ "learning_rate": 3.881024244388827e-05,
686
+ "entropy": 0.2049507148563862,
687
+ "num_tokens": 28686952.0,
688
+ "mean_token_accuracy": 0.9508672833442688,
689
+ "epoch": 0.6106194690265486,
690
+ "step": 690
691
+ },
692
+ {
693
+ "loss": 0.2304,
694
+ "grad_norm": 0.1964893341064453,
695
+ "learning_rate": 3.7329313386694065e-05,
696
+ "entropy": 0.23374196365475655,
697
+ "num_tokens": 29090626.0,
698
+ "mean_token_accuracy": 0.9433914422988892,
699
+ "epoch": 0.6194690265486725,
700
+ "step": 700
701
+ },
702
+ {
703
+ "loss": 0.205,
704
+ "grad_norm": 0.18834583461284637,
705
+ "learning_rate": 3.586017101719432e-05,
706
+ "entropy": 0.21235244944691659,
707
+ "num_tokens": 29510980.0,
708
+ "mean_token_accuracy": 0.9498258531093597,
709
+ "epoch": 0.6283185840707964,
710
+ "step": 710
711
+ },
712
+ {
713
+ "loss": 0.2627,
714
+ "grad_norm": 0.23034313321113586,
715
+ "learning_rate": 3.440418197974039e-05,
716
+ "entropy": 0.26501770280301573,
717
+ "num_tokens": 29922686.0,
718
+ "mean_token_accuracy": 0.9365832090377808,
719
+ "epoch": 0.6371681415929203,
720
+ "step": 720
721
+ },
722
+ {
723
+ "loss": 0.2408,
724
+ "grad_norm": 0.33568480610847473,
725
+ "learning_rate": 3.2962700683023376e-05,
726
+ "entropy": 0.2463029097765684,
727
+ "num_tokens": 30338628.0,
728
+ "mean_token_accuracy": 0.940551894903183,
729
+ "epoch": 0.6460176991150443,
730
+ "step": 730
731
+ },
732
+ {
733
+ "loss": 0.2326,
734
+ "grad_norm": 0.22903338074684143,
735
+ "learning_rate": 3.153706804015873e-05,
736
+ "entropy": 0.23602683916687967,
737
+ "num_tokens": 30751615.0,
738
+ "mean_token_accuracy": 0.9433848887681962,
739
+ "epoch": 0.6548672566371682,
740
+ "step": 740
741
+ },
742
+ {
743
+ "loss": 0.283,
744
+ "grad_norm": 0.24895504117012024,
745
+ "learning_rate": 3.0128610221325022e-05,
746
+ "entropy": 0.28722366876900196,
747
+ "num_tokens": 31149202.0,
748
+ "mean_token_accuracy": 0.9306167840957642,
749
+ "epoch": 0.6637168141592921,
750
+ "step": 750
751
+ },
752
+ {
753
+ "loss": 0.226,
754
+ "grad_norm": 0.20008179545402527,
755
+ "learning_rate": 2.873863742011696e-05,
756
+ "entropy": 0.23511841669678687,
757
+ "num_tokens": 31550325.0,
758
+ "mean_token_accuracy": 0.9445117026567459,
759
+ "epoch": 0.672566371681416,
760
+ "step": 760
761
+ },
762
+ {
763
+ "loss": 0.183,
764
+ "grad_norm": 0.1967935413122177,
765
+ "learning_rate": 2.7368442634760438e-05,
766
+ "entropy": 0.18726294599473475,
767
+ "num_tokens": 31986135.0,
768
+ "mean_token_accuracy": 0.9549686163663864,
769
+ "epoch": 0.6814159292035398,
770
+ "step": 770
771
+ },
772
+ {
773
+ "loss": 0.2182,
774
+ "grad_norm": 0.18984965980052948,
775
+ "learning_rate": 2.6019300465323177e-05,
776
+ "entropy": 0.22221928611397743,
777
+ "num_tokens": 32415282.0,
778
+ "mean_token_accuracy": 0.9468269884586334,
779
+ "epoch": 0.6902654867256637,
780
+ "step": 780
781
+ },
782
+ {
783
+ "loss": 0.2005,
784
+ "grad_norm": 0.22404947876930237,
785
+ "learning_rate": 2.4692465928040043e-05,
786
+ "entropy": 0.20175706930458545,
787
+ "num_tokens": 32833217.0,
788
+ "mean_token_accuracy": 0.951060900092125,
789
+ "epoch": 0.6991150442477876,
790
+ "step": 790
791
+ },
792
+ {
793
+ "loss": 0.21,
794
+ "grad_norm": 0.16944018006324768,
795
+ "learning_rate": 2.3389173287855825e-05,
796
+ "entropy": 0.21427074801176788,
797
+ "num_tokens": 33241388.0,
798
+ "mean_token_accuracy": 0.9489449441432953,
799
+ "epoch": 0.7079646017699115,
800
+ "step": 800
801
+ },
802
+ {
803
+ "loss": 0.2413,
804
+ "grad_norm": 0.2791028916835785,
805
+ "learning_rate": 2.2110634910271554e-05,
806
+ "entropy": 0.24735822584480047,
807
+ "num_tokens": 33638748.0,
808
+ "mean_token_accuracy": 0.9416136890649796,
809
+ "epoch": 0.7168141592920354,
810
+ "step": 810
811
+ },
812
+ {
813
+ "loss": 0.2428,
814
+ "grad_norm": 0.17646969854831696,
815
+ "learning_rate": 2.0858040133562383e-05,
816
+ "entropy": 0.24746169298887252,
817
+ "num_tokens": 34065988.0,
818
+ "mean_token_accuracy": 0.9409119069576264,
819
+ "epoch": 0.7256637168141593,
820
+ "step": 820
821
+ },
822
+ {
823
+ "loss": 0.2431,
824
+ "grad_norm": 0.25427207350730896,
825
+ "learning_rate": 1.963255416241626e-05,
826
+ "entropy": 0.24203080534934998,
827
+ "num_tokens": 34488707.0,
828
+ "mean_token_accuracy": 0.9407094061374665,
829
+ "epoch": 0.7345132743362832,
830
+ "step": 830
831
+ },
832
+ {
833
+ "loss": 0.2239,
834
+ "grad_norm": 0.21476761996746063,
835
+ "learning_rate": 1.843531698402222e-05,
836
+ "entropy": 0.22717048823833466,
837
+ "num_tokens": 34909244.0,
838
+ "mean_token_accuracy": 0.9451982468366623,
839
+ "epoch": 0.7433628318584071,
840
+ "step": 840
841
+ },
842
+ {
843
+ "loss": 0.2208,
844
+ "grad_norm": 0.22185730934143066,
845
+ "learning_rate": 1.7267442307617082e-05,
846
+ "entropy": 0.22914122715592383,
847
+ "num_tokens": 35349557.0,
848
+ "mean_token_accuracy": 0.9452274709939956,
849
+ "epoch": 0.7522123893805309,
850
+ "step": 850
851
+ },
852
+ {
853
+ "loss": 0.2294,
854
+ "grad_norm": 0.1716049462556839,
855
+ "learning_rate": 1.613001652847658e-05,
856
+ "entropy": 0.23541589826345444,
857
+ "num_tokens": 35775557.0,
858
+ "mean_token_accuracy": 0.9432215124368668,
859
+ "epoch": 0.7610619469026548,
860
+ "step": 860
861
+ },
862
+ {
863
+ "loss": 0.3027,
864
+ "grad_norm": 0.22304639220237732,
865
+ "learning_rate": 1.5024097717314894e-05,
866
+ "entropy": 0.3032771345227957,
867
+ "num_tokens": 36187680.0,
868
+ "mean_token_accuracy": 0.9272681981325149,
869
+ "epoch": 0.7699115044247787,
870
+ "step": 870
871
+ },
872
+ {
873
+ "loss": 0.2042,
874
+ "grad_norm": 0.21043775975704193,
875
+ "learning_rate": 1.3950714636032691e-05,
876
+ "entropy": 0.20667122304439545,
877
+ "num_tokens": 36599241.0,
878
+ "mean_token_accuracy": 0.950884211063385,
879
+ "epoch": 0.7787610619469026,
880
+ "step": 880
881
+ },
882
+ {
883
+ "loss": 0.2206,
884
+ "grad_norm": 0.19236122071743011,
885
+ "learning_rate": 1.2910865780728998e-05,
886
+ "entropy": 0.22606479078531266,
887
+ "num_tokens": 37006428.0,
888
+ "mean_token_accuracy": 0.9452178955078125,
889
+ "epoch": 0.7876106194690266,
890
+ "step": 890
891
+ },
892
+ {
893
+ "loss": 0.1997,
894
+ "grad_norm": 0.23871539533138275,
895
+ "learning_rate": 1.1905518452867475e-05,
896
+ "entropy": 0.20729146413505078,
897
+ "num_tokens": 37403547.0,
898
+ "mean_token_accuracy": 0.9499319463968277,
899
+ "epoch": 0.7964601769911505,
900
+ "step": 900
901
+ },
902
+ {
903
+ "loss": 0.2453,
904
+ "grad_norm": 0.18100206553936005,
905
+ "learning_rate": 1.0935607859460984e-05,
906
+ "entropy": 0.2504092514514923,
907
+ "num_tokens": 37815049.0,
908
+ "mean_token_accuracy": 0.9400516986846924,
909
+ "epoch": 0.8053097345132744,
910
+ "step": 910
911
+ },
912
+ {
913
+ "loss": 0.1926,
914
+ "grad_norm": 0.1908176839351654,
915
+ "learning_rate": 1.0002036243111251e-05,
916
+ "entropy": 0.1982258576899767,
917
+ "num_tokens": 38229740.0,
918
+ "mean_token_accuracy": 0.9525273025035859,
919
+ "epoch": 0.8141592920353983,
920
+ "step": 920
921
+ },
922
+ {
923
+ "loss": 0.2521,
924
+ "grad_norm": 0.1694677472114563,
925
+ "learning_rate": 9.10567204271336e-06,
926
+ "entropy": 0.25635765232145785,
927
+ "num_tokens": 38652542.0,
928
+ "mean_token_accuracy": 0.9399167329072953,
929
+ "epoch": 0.8230088495575221,
930
+ "step": 930
931
+ },
932
+ {
933
+ "loss": 0.2137,
934
+ "grad_norm": 0.27372586727142334,
935
+ "learning_rate": 8.247349085605389e-06,
936
+ "entropy": 0.21369801536202432,
937
+ "num_tokens": 39046941.0,
938
+ "mean_token_accuracy": 0.94815693795681,
939
+ "epoch": 0.831858407079646,
940
+ "step": 940
941
+ },
942
+ {
943
+ "loss": 0.2466,
944
+ "grad_norm": 0.20555748045444489,
945
+ "learning_rate": 7.4278658119149695e-06,
946
+ "entropy": 0.24944488294422626,
947
+ "num_tokens": 39471447.0,
948
+ "mean_token_accuracy": 0.9402138769626618,
949
+ "epoch": 0.8407079646017699,
950
+ "step": 950
951
+ },
952
+ {
953
+ "loss": 0.1957,
954
+ "grad_norm": 0.1944950520992279,
955
+ "learning_rate": 6.647984531824064e-06,
956
+ "entropy": 0.20089373365044594,
957
+ "num_tokens": 39880021.0,
958
+ "mean_token_accuracy": 0.9509478449821472,
959
+ "epoch": 0.8495575221238938,
960
+ "step": 960
961
+ },
962
+ {
963
+ "loss": 0.2061,
964
+ "grad_norm": 0.17296965420246124,
965
+ "learning_rate": 5.908430716443086e-06,
966
+ "entropy": 0.21387975346297025,
967
+ "num_tokens": 40310388.0,
968
+ "mean_token_accuracy": 0.9484301716089248,
969
+ "epoch": 0.8584070796460177,
970
+ "step": 970
971
+ },
972
+ {
973
+ "loss": 0.2323,
974
+ "grad_norm": 0.20349572598934174,
975
+ "learning_rate": 5.20989232295393e-06,
976
+ "entropy": 0.23771359361708164,
977
+ "num_tokens": 40710959.0,
978
+ "mean_token_accuracy": 0.94212906062603,
979
+ "epoch": 0.8672566371681416,
980
+ "step": 980
981
+ },
982
+ {
983
+ "loss": 0.2377,
984
+ "grad_norm": 0.18183237314224243,
985
+ "learning_rate": 4.5530191546496515e-06,
986
+ "entropy": 0.2454454731196165,
987
+ "num_tokens": 41138761.0,
988
+ "mean_token_accuracy": 0.9407603412866592,
989
+ "epoch": 0.8761061946902655,
990
+ "step": 990
991
+ },
992
+ {
993
+ "loss": 0.2071,
994
+ "grad_norm": 0.177731454372406,
995
+ "learning_rate": 3.938422256466185e-06,
996
+ "entropy": 0.21258567087352276,
997
+ "num_tokens": 41549796.0,
998
+ "mean_token_accuracy": 0.948466694355011,
999
+ "epoch": 0.8849557522123894,
1000
+ "step": 1000
1001
+ },
1002
+ {
1003
+ "loss": 0.2984,
1004
+ "grad_norm": 0.18465429544448853,
1005
+ "learning_rate": 3.3666733465682833e-06,
+ "entropy": 0.29959365651011466,
+ "num_tokens": 41982205.0,
+ "mean_token_accuracy": 0.9291581392288208,
+ "epoch": 0.8938053097345132,
+ "step": 1010
+ },
+ {
+ "loss": 0.223,
+ "grad_norm": 0.18058468401432037,
+ "learning_rate": 2.8383042845186004e-06,
+ "entropy": 0.22986317798495293,
+ "num_tokens": 42424950.0,
+ "mean_token_accuracy": 0.9453666418790817,
+ "epoch": 0.9026548672566371,
+ "step": 1020
+ },
+ {
+ "loss": 0.221,
+ "grad_norm": 0.19499893486499786,
+ "learning_rate": 2.3538065765244755e-06,
+ "entropy": 0.22309133298695089,
+ "num_tokens": 42834983.0,
+ "mean_token_accuracy": 0.9468386858701706,
+ "epoch": 0.911504424778761,
+ "step": 1030
+ },
+ {
+ "loss": 0.2156,
+ "grad_norm": 0.19273503124713898,
+ "learning_rate": 1.913630918222792e-06,
+ "entropy": 0.21915269698947668,
+ "num_tokens": 43234797.0,
+ "mean_token_accuracy": 0.9469860643148422,
+ "epoch": 0.9203539823008849,
+ "step": 1040
+ },
+ {
+ "loss": 0.2552,
+ "grad_norm": 0.15608751773834229,
+ "learning_rate": 1.5181867754280931e-06,
+ "entropy": 0.25907820984721186,
+ "num_tokens": 43679426.0,
+ "mean_token_accuracy": 0.9379706591367721,
+ "epoch": 0.9292035398230089,
+ "step": 1050
+ },
+ {
+ "loss": 0.2416,
+ "grad_norm": 0.1610129177570343,
+ "learning_rate": 1.1678420032341153e-06,
+ "entropy": 0.24386902302503585,
+ "num_tokens": 44112256.0,
+ "mean_token_accuracy": 0.9416323363780975,
+ "epoch": 0.9380530973451328,
+ "step": 1060
+ },
+ {
+ "loss": 0.2071,
+ "grad_norm": 0.16213390231132507,
+ "learning_rate": 8.629225038229049e-07,
+ "entropy": 0.20874513685703278,
+ "num_tokens": 44570422.0,
+ "mean_token_accuracy": 0.9496494054794311,
+ "epoch": 0.9469026548672567,
+ "step": 1070
+ },
+ {
+ "loss": 0.1747,
+ "grad_norm": 0.18272142112255096,
+ "learning_rate": 6.037119232999266e-07,
+ "entropy": 0.17933553121984006,
+ "num_tokens": 44972484.0,
+ "mean_token_accuracy": 0.9556117475032806,
+ "epoch": 0.9557522123893806,
+ "step": 1080
+ },
+ {
+ "loss": 0.2177,
+ "grad_norm": 0.21331429481506348,
+ "learning_rate": 3.904513878371818e-07,
+ "entropy": 0.2203224040567875,
+ "num_tokens": 45399966.0,
+ "mean_token_accuracy": 0.9465226858854294,
+ "epoch": 0.9646017699115044,
+ "step": 1090
+ },
+ {
+ "loss": 0.2361,
+ "grad_norm": 0.19762705266475677,
+ "learning_rate": 2.233392793697442e-07,
+ "entropy": 0.23971713967621328,
+ "num_tokens": 45841913.0,
+ "mean_token_accuracy": 0.9428269326686859,
+ "epoch": 0.9734513274336283,
+ "step": 1100
+ },
+ {
+ "loss": 0.254,
+ "grad_norm": 0.21293021738529205,
+ "learning_rate": 1.0253105105438865e-07,
+ "entropy": 0.25558883510529995,
+ "num_tokens": 46272798.0,
+ "mean_token_accuracy": 0.9393420517444611,
+ "epoch": 0.9823008849557522,
+ "step": 1110
+ },
+ {
+ "loss": 0.2335,
+ "grad_norm": 0.1932452768087387,
+ "learning_rate": 2.8139082661954307e-08,
+ "entropy": 0.24025008846074342,
+ "num_tokens": 46703041.0,
+ "mean_token_accuracy": 0.9413352489471436,
+ "epoch": 0.9911504424778761,
+ "step": 1120
+ },
+ {
+ "loss": 0.2372,
+ "grad_norm": 0.1784687340259552,
+ "learning_rate": 2.32576038022847e-10,
+ "entropy": 0.23823871053755283,
+ "num_tokens": 47110855.0,
+ "mean_token_accuracy": 0.943606698513031,
+ "epoch": 1.0,
+ "step": 1130
+ },
+ {
+ "train_runtime": 18643.3649,
+ "train_samples_per_second": 1.94,
+ "train_steps_per_second": 0.061,
+ "total_flos": 2.0704071601101865e+19,
+ "train_loss": 0.33413480024422165,
+ "epoch": 1.0,
+ "step": 1130
+ }
+ ]