Commit 734a413 (verified), committed by umer07
1 Parent(s): e204bc4

Fathom: upload expert-e7-reports/training_log.json

adapters/expert-e7-reports/training_log.json ADDED
@@ -0,0 +1,2951 @@
+ [
+ {
+ "loss": 2.0326,
+ "grad_norm": 1.0961493253707886,
+ "learning_rate": 9e-06,
+ "entropy": 1.4562864780426026,
+ "num_tokens": 113103.0,
+ "mean_token_accuracy": 0.5854019731283188,
+ "epoch": 0.0034019391052900155,
+ "step": 10
+ },
+ {
+ "loss": 1.9722,
+ "grad_norm": 0.5352939963340759,
+ "learning_rate": 1.9e-05,
+ "entropy": 1.6092135667800904,
+ "num_tokens": 221983.0,
+ "mean_token_accuracy": 0.5872704654932022,
+ "epoch": 0.006803878210580031,
+ "step": 20
+ },
+ {
+ "loss": 1.8198,
+ "grad_norm": 0.5580214262008667,
+ "learning_rate": 2.9e-05,
+ "entropy": 1.7635144531726836,
+ "num_tokens": 343755.0,
+ "mean_token_accuracy": 0.59696926176548,
+ "epoch": 0.010205817315870046,
+ "step": 30
+ },
+ {
+ "loss": 1.6405,
+ "grad_norm": 0.9694721102714539,
+ "learning_rate": 3.9000000000000006e-05,
+ "entropy": 1.6470328569412231,
+ "num_tokens": 462278.0,
+ "mean_token_accuracy": 0.6220342874526977,
+ "epoch": 0.013607756421160062,
+ "step": 40
+ },
+ {
+ "loss": 1.2829,
+ "grad_norm": 0.8506905436515808,
+ "learning_rate": 4.9e-05,
+ "entropy": 1.3260977208614348,
+ "num_tokens": 580883.0,
+ "mean_token_accuracy": 0.6881176471710205,
+ "epoch": 0.017009695526450076,
+ "step": 50
+ },
+ {
+ "loss": 1.0615,
+ "grad_norm": 0.5665390491485596,
+ "learning_rate": 5.9e-05,
+ "entropy": 1.0532318294048308,
+ "num_tokens": 693719.0,
+ "mean_token_accuracy": 0.7430563151836396,
+ "epoch": 0.020411634631740092,
+ "step": 60
+ },
+ {
+ "loss": 0.9244,
+ "grad_norm": 0.5383474230766296,
+ "learning_rate": 6.9e-05,
+ "entropy": 0.9287566632032395,
+ "num_tokens": 800049.0,
+ "mean_token_accuracy": 0.7744365096092224,
+ "epoch": 0.023813573737030108,
+ "step": 70
+ },
+ {
+ "loss": 0.8564,
+ "grad_norm": 0.5389857292175293,
+ "learning_rate": 7.900000000000001e-05,
+ "entropy": 0.8546981632709503,
+ "num_tokens": 909347.0,
+ "mean_token_accuracy": 0.788225668668747,
+ "epoch": 0.027215512842320124,
+ "step": 80
+ },
+ {
+ "loss": 0.7884,
+ "grad_norm": 0.5869827270507812,
+ "learning_rate": 8.900000000000001e-05,
+ "entropy": 0.7829572021961212,
+ "num_tokens": 1024214.0,
+ "mean_token_accuracy": 0.8022073119878769,
+ "epoch": 0.030617451947610137,
+ "step": 90
+ },
+ {
+ "loss": 0.7827,
+ "grad_norm": 0.4458995461463928,
+ "learning_rate": 9.900000000000001e-05,
+ "entropy": 0.796157357096672,
+ "num_tokens": 1145337.0,
+ "mean_token_accuracy": 0.8003427386283875,
+ "epoch": 0.03401939105290015,
+ "step": 100
+ },
+ {
+ "loss": 0.7694,
+ "grad_norm": 0.4616456925868988,
+ "learning_rate": 9.999752209583493e-05,
+ "entropy": 0.7759479373693466,
+ "num_tokens": 1255483.0,
+ "mean_token_accuracy": 0.8059569209814071,
+ "epoch": 0.037421330158190165,
+ "step": 110
+ },
+ {
+ "loss": 0.7328,
+ "grad_norm": 0.5324557423591614,
+ "learning_rate": 9.998895681650353e-05,
+ "entropy": 0.7250411480665206,
+ "num_tokens": 1376406.0,
+ "mean_token_accuracy": 0.8113020449876786,
+ "epoch": 0.040823269263480184,
+ "step": 120
+ },
+ {
+ "loss": 0.7204,
+ "grad_norm": 0.4355196952819824,
+ "learning_rate": 9.997427461844063e-05,
+ "entropy": 0.7220166236162185,
+ "num_tokens": 1487418.0,
+ "mean_token_accuracy": 0.8159421294927597,
+ "epoch": 0.0442252083687702,
+ "step": 130
+ },
+ {
+ "loss": 0.7181,
+ "grad_norm": 0.4837367534637451,
+ "learning_rate": 9.99534772982393e-05,
+ "entropy": 0.7167926013469696,
+ "num_tokens": 1603603.0,
+ "mean_token_accuracy": 0.8169557631015778,
+ "epoch": 0.047627147474060216,
+ "step": 140
+ },
+ {
+ "loss": 0.7004,
+ "grad_norm": 0.42535701394081116,
+ "learning_rate": 9.992656740077193e-05,
+ "entropy": 0.7097186267375946,
+ "num_tokens": 1719331.0,
+ "mean_token_accuracy": 0.8169945359230042,
+ "epoch": 0.05102908657935023,
+ "step": 150
+ },
+ {
+ "loss": 0.74,
+ "grad_norm": 0.4244067966938019,
+ "learning_rate": 9.989354821887894e-05,
+ "entropy": 0.7415460795164108,
+ "num_tokens": 1835165.0,
+ "mean_token_accuracy": 0.8098712652921677,
+ "epoch": 0.05443102568464025,
+ "step": 160
+ },
+ {
+ "loss": 0.7071,
+ "grad_norm": 0.46471285820007324,
+ "learning_rate": 9.985442379296574e-05,
+ "entropy": 0.7142940938472748,
+ "num_tokens": 1955878.0,
+ "mean_token_accuracy": 0.8144379884004593,
+ "epoch": 0.05783296478993026,
+ "step": 170
+ },
+ {
+ "loss": 0.6959,
+ "grad_norm": 0.48218467831611633,
+ "learning_rate": 9.980919891050836e-05,
+ "entropy": 0.7002523899078369,
+ "num_tokens": 2079862.0,
+ "mean_token_accuracy": 0.8176177382469177,
+ "epoch": 0.06123490389522027,
+ "step": 180
+ },
+ {
+ "loss": 0.6977,
+ "grad_norm": 0.43155911564826965,
+ "learning_rate": 9.97578791054677e-05,
+ "entropy": 0.7015494972467422,
+ "num_tokens": 2183469.0,
+ "mean_token_accuracy": 0.8218911945819855,
+ "epoch": 0.06463684300051029,
+ "step": 190
+ },
+ {
+ "loss": 0.7036,
+ "grad_norm": 0.4237188398838043,
+ "learning_rate": 9.970047065761225e-05,
+ "entropy": 0.7062139421701431,
+ "num_tokens": 2297979.0,
+ "mean_token_accuracy": 0.8169696599245071,
+ "epoch": 0.0680387821058003,
+ "step": 200
+ },
+ {
+ "loss": 0.6896,
+ "grad_norm": 0.4372621178627014,
+ "learning_rate": 9.963698059174973e-05,
+ "entropy": 0.6874466806650161,
+ "num_tokens": 2413698.0,
+ "mean_token_accuracy": 0.8205429404973984,
+ "epoch": 0.07144072121109032,
+ "step": 210
+ },
+ {
+ "loss": 0.6861,
+ "grad_norm": 0.42981356382369995,
+ "learning_rate": 9.956741667686749e-05,
+ "entropy": 0.6936574131250381,
+ "num_tokens": 2523689.0,
+ "mean_token_accuracy": 0.8207722246646881,
+ "epoch": 0.07484266031638033,
+ "step": 220
+ },
+ {
+ "loss": 0.6876,
+ "grad_norm": 0.4125567078590393,
+ "learning_rate": 9.949178742518188e-05,
+ "entropy": 0.7051049053668976,
+ "num_tokens": 2629062.0,
+ "mean_token_accuracy": 0.8210607588291168,
+ "epoch": 0.07824459942167035,
+ "step": 230
+ },
+ {
+ "loss": 0.7149,
+ "grad_norm": 0.3944653868675232,
+ "learning_rate": 9.941010209109653e-05,
+ "entropy": 0.7121995657682418,
+ "num_tokens": 2744221.0,
+ "mean_token_accuracy": 0.8171839863061905,
+ "epoch": 0.08164653852696037,
+ "step": 240
+ },
+ {
+ "loss": 0.7041,
+ "grad_norm": 0.4692147970199585,
+ "learning_rate": 9.932237067007014e-05,
+ "entropy": 0.7010555505752564,
+ "num_tokens": 2867672.0,
+ "mean_token_accuracy": 0.8154427468776703,
+ "epoch": 0.08504847763225039,
+ "step": 250
+ },
+ {
+ "loss": 0.6701,
+ "grad_norm": 0.36812499165534973,
+ "learning_rate": 9.922860389739316e-05,
+ "entropy": 0.673400056362152,
+ "num_tokens": 2969786.0,
+ "mean_token_accuracy": 0.826720854640007,
+ "epoch": 0.0884504167375404,
+ "step": 260
+ },
+ {
+ "loss": 0.6828,
+ "grad_norm": 0.5079509019851685,
+ "learning_rate": 9.91288132468743e-05,
+ "entropy": 0.6830800861120224,
+ "num_tokens": 3078545.0,
+ "mean_token_accuracy": 0.8246483653783798,
+ "epoch": 0.09185235584283041,
+ "step": 270
+ },
+ {
+ "loss": 0.6591,
+ "grad_norm": 0.4336896538734436,
+ "learning_rate": 9.90230109294365e-05,
+ "entropy": 0.6682640433311462,
+ "num_tokens": 3202461.0,
+ "mean_token_accuracy": 0.8245112657546997,
+ "epoch": 0.09525429494812043,
+ "step": 280
+ },
+ {
+ "loss": 0.6742,
+ "grad_norm": 0.38131922483444214,
+ "learning_rate": 9.891120989162277e-05,
+ "entropy": 0.6732221603393554,
+ "num_tokens": 3317749.0,
+ "mean_token_accuracy": 0.8278978168964386,
+ "epoch": 0.09865623405341044,
+ "step": 290
+ },
+ {
+ "loss": 0.7072,
+ "grad_norm": 0.43391740322113037,
+ "learning_rate": 9.879342381401183e-05,
+ "entropy": 0.719240027666092,
+ "num_tokens": 3438813.0,
+ "mean_token_accuracy": 0.8140431463718414,
+ "epoch": 0.10205817315870046,
+ "step": 300
+ },
+ {
+ "loss": 0.6726,
+ "grad_norm": 0.37840133905410767,
+ "learning_rate": 9.866966710954432e-05,
+ "entropy": 0.6736223340034485,
+ "num_tokens": 3558578.0,
+ "mean_token_accuracy": 0.8215447336435318,
+ "epoch": 0.10546011226399048,
+ "step": 310
+ },
+ {
+ "loss": 0.6739,
+ "grad_norm": 0.36936870217323303,
+ "learning_rate": 9.853995492175894e-05,
+ "entropy": 0.6741836041212081,
+ "num_tokens": 3680797.0,
+ "mean_token_accuracy": 0.8207811802625656,
+ "epoch": 0.1088620513692805,
+ "step": 320
+ },
+ {
+ "loss": 0.6462,
+ "grad_norm": 0.3381315767765045,
+ "learning_rate": 9.840430312293954e-05,
+ "entropy": 0.6454634428024292,
+ "num_tokens": 3798203.0,
+ "mean_token_accuracy": 0.8290177077054978,
+ "epoch": 0.1122639904745705,
+ "step": 330
+ },
+ {
+ "loss": 0.6309,
+ "grad_norm": 0.4414357841014862,
+ "learning_rate": 9.826272831217282e-05,
+ "entropy": 0.6358878672122955,
+ "num_tokens": 3908285.0,
+ "mean_token_accuracy": 0.8346272200345993,
+ "epoch": 0.11566592957986052,
+ "step": 340
+ },
+ {
+ "loss": 0.662,
+ "grad_norm": 0.3412431478500366,
+ "learning_rate": 9.811524781331726e-05,
+ "entropy": 0.669054564833641,
+ "num_tokens": 4027009.0,
+ "mean_token_accuracy": 0.823895263671875,
+ "epoch": 0.11906786868515054,
+ "step": 350
+ },
+ {
+ "loss": 0.6396,
+ "grad_norm": 0.4202592670917511,
+ "learning_rate": 9.796187967288317e-05,
+ "entropy": 0.6534929439425469,
+ "num_tokens": 4141379.0,
+ "mean_token_accuracy": 0.8286080300807953,
+ "epoch": 0.12246980779044055,
+ "step": 360
+ },
+ {
+ "loss": 0.6462,
+ "grad_norm": 0.4112204313278198,
+ "learning_rate": 9.780264265782452e-05,
+ "entropy": 0.6516435265541076,
+ "num_tokens": 4264298.0,
+ "mean_token_accuracy": 0.8284994632005691,
+ "epoch": 0.12587174689573058,
+ "step": 370
+ },
+ {
+ "loss": 0.6385,
+ "grad_norm": 0.4041576683521271,
+ "learning_rate": 9.763755625324247e-05,
+ "entropy": 0.6425079524517059,
+ "num_tokens": 4380764.0,
+ "mean_token_accuracy": 0.8293725460767746,
+ "epoch": 0.12927368600102057,
+ "step": 380
+ },
+ {
+ "loss": 0.6474,
+ "grad_norm": 0.40553557872772217,
+ "learning_rate": 9.746664066000105e-05,
+ "entropy": 0.650575777888298,
+ "num_tokens": 4502319.0,
+ "mean_token_accuracy": 0.8261306613683701,
+ "epoch": 0.1326756251063106,
+ "step": 390
+ },
+ {
+ "loss": 0.6208,
+ "grad_norm": 0.4033198058605194,
+ "learning_rate": 9.728991679225534e-05,
+ "entropy": 0.6212592482566833,
+ "num_tokens": 4619137.0,
+ "mean_token_accuracy": 0.8338732123374939,
+ "epoch": 0.1360775642116006,
+ "step": 400
+ },
+ {
+ "loss": 0.6588,
+ "grad_norm": 0.3973551392555237,
+ "learning_rate": 9.71074062748922e-05,
+ "entropy": 0.6602439433336258,
+ "num_tokens": 4740514.0,
+ "mean_token_accuracy": 0.8240435630083084,
+ "epoch": 0.13947950331689063,
+ "step": 410
+ },
+ {
+ "loss": 0.6231,
+ "grad_norm": 0.38988974690437317,
+ "learning_rate": 9.691913144088423e-05,
+ "entropy": 0.6275951579213143,
+ "num_tokens": 4849434.0,
+ "mean_token_accuracy": 0.8351284950971604,
+ "epoch": 0.14288144242218065,
+ "step": 420
+ },
+ {
+ "loss": 0.6486,
+ "grad_norm": 0.4000033736228943,
+ "learning_rate": 9.672511532855695e-05,
+ "entropy": 0.6495797425508499,
+ "num_tokens": 4970677.0,
+ "mean_token_accuracy": 0.8287895292043685,
+ "epoch": 0.14628338152747067,
+ "step": 430
+ },
+ {
+ "loss": 0.6369,
+ "grad_norm": 0.36982354521751404,
+ "learning_rate": 9.652538167876966e-05,
+ "entropy": 0.6435147285461426,
+ "num_tokens": 5079274.0,
+ "mean_token_accuracy": 0.8325352728366852,
+ "epoch": 0.14968532063276066,
+ "step": 440
+ },
+ {
+ "loss": 0.6589,
+ "grad_norm": 0.30258873105049133,
+ "learning_rate": 9.63199549320105e-05,
+ "entropy": 0.656851077079773,
+ "num_tokens": 5199280.0,
+ "mean_token_accuracy": 0.8274909108877182,
+ "epoch": 0.15308725973805068,
+ "step": 450
+ },
+ {
+ "loss": 0.6171,
+ "grad_norm": 0.40595975518226624,
+ "learning_rate": 9.610886022540558e-05,
+ "entropy": 0.6167075544595718,
+ "num_tokens": 5314855.0,
+ "mean_token_accuracy": 0.8360531657934189,
+ "epoch": 0.1564891988433407,
+ "step": 460
+ },
+ {
+ "loss": 0.6535,
+ "grad_norm": 0.3571644723415375,
+ "learning_rate": 9.589212338964331e-05,
+ "entropy": 0.6573659986257553,
+ "num_tokens": 5427712.0,
+ "mean_token_accuracy": 0.826265224814415,
+ "epoch": 0.15989113794863072,
+ "step": 470
+ },
+ {
+ "loss": 0.6322,
+ "grad_norm": 0.3340984880924225,
+ "learning_rate": 9.566977094581344e-05,
+ "entropy": 0.6363773226737977,
+ "num_tokens": 5555811.0,
+ "mean_token_accuracy": 0.830034053325653,
+ "epoch": 0.16329307705392074,
+ "step": 480
+ },
+ {
+ "loss": 0.6505,
+ "grad_norm": 0.36710014939308167,
+ "learning_rate": 9.544183010216184e-05,
+ "entropy": 0.6666623204946518,
+ "num_tokens": 5670677.0,
+ "mean_token_accuracy": 0.8263821274042129,
+ "epoch": 0.16669501615921076,
+ "step": 490
+ },
+ {
+ "loss": 0.6484,
+ "grad_norm": 0.35856229066848755,
+ "learning_rate": 9.520832875076118e-05,
+ "entropy": 0.6493088006973267,
+ "num_tokens": 5791079.0,
+ "mean_token_accuracy": 0.826611652970314,
+ "epoch": 0.17009695526450078,
+ "step": 500
+ },
+ {
+ "loss": 0.6395,
+ "grad_norm": 0.36879998445510864,
+ "learning_rate": 9.496929546409791e-05,
+ "entropy": 0.6355485945940018,
+ "num_tokens": 5915427.0,
+ "mean_token_accuracy": 0.8316580206155777,
+ "epoch": 0.17349889436979077,
+ "step": 510
+ },
+ {
+ "loss": 0.6405,
+ "grad_norm": 0.3662184774875641,
+ "learning_rate": 9.472475949157591e-05,
+ "entropy": 0.6411178290843964,
+ "num_tokens": 6042113.0,
+ "mean_token_accuracy": 0.8287693798542023,
+ "epoch": 0.1769008334750808,
+ "step": 520
+ },
+ {
+ "loss": 0.6248,
+ "grad_norm": 0.3676609992980957,
+ "learning_rate": 9.447475075593746e-05,
+ "entropy": 0.6330993086099624,
+ "num_tokens": 6158285.0,
+ "mean_token_accuracy": 0.8320850521326065,
+ "epoch": 0.1803027725803708,
+ "step": 530
+ },
+ {
+ "loss": 0.605,
+ "grad_norm": 0.3490847945213318,
+ "learning_rate": 9.421929984960168e-05,
+ "entropy": 0.6043026179075242,
+ "num_tokens": 6269273.0,
+ "mean_token_accuracy": 0.8386462539434433,
+ "epoch": 0.18370471168566083,
+ "step": 540
+ },
+ {
+ "loss": 0.6035,
+ "grad_norm": 0.3701430857181549,
+ "learning_rate": 9.395843803092103e-05,
+ "entropy": 0.6117337167263031,
+ "num_tokens": 6374978.0,
+ "mean_token_accuracy": 0.8389517337083816,
+ "epoch": 0.18710665079095085,
+ "step": 550
+ },
+ {
+ "loss": 0.667,
+ "grad_norm": 0.37965649366378784,
+ "learning_rate": 9.369219722035657e-05,
+ "entropy": 0.6723731458187103,
+ "num_tokens": 6492671.0,
+ "mean_token_accuracy": 0.8266532897949219,
+ "epoch": 0.19050858989624087,
+ "step": 560
+ },
+ {
+ "loss": 0.6473,
+ "grad_norm": 0.335111528635025,
+ "learning_rate": 9.34206099965717e-05,
+ "entropy": 0.6496838480234146,
+ "num_tokens": 6605704.0,
+ "mean_token_accuracy": 0.8288087368011474,
+ "epoch": 0.19391052900153088,
+ "step": 570
+ },
+ {
+ "loss": 0.6479,
+ "grad_norm": 0.4136720895767212,
+ "learning_rate": 9.314370959244589e-05,
+ "entropy": 0.6532777503132821,
+ "num_tokens": 6724628.0,
+ "mean_token_accuracy": 0.8273604065179825,
+ "epoch": 0.19731246810682088,
+ "step": 580
+ },
+ {
+ "loss": 0.6917,
+ "grad_norm": 0.349250465631485,
+ "learning_rate": 9.286152989100808e-05,
+ "entropy": 0.6956728607416153,
+ "num_tokens": 6842793.0,
+ "mean_token_accuracy": 0.8188706696033478,
+ "epoch": 0.2007144072121109,
+ "step": 590
+ },
+ {
+ "loss": 0.6596,
+ "grad_norm": 0.37561726570129395,
+ "learning_rate": 9.257410542129048e-05,
+ "entropy": 0.6637489289045334,
+ "num_tokens": 6969269.0,
+ "mean_token_accuracy": 0.8233319729566574,
+ "epoch": 0.20411634631740092,
+ "step": 600
+ },
+ {
+ "loss": 0.6148,
+ "grad_norm": 0.35306721925735474,
+ "learning_rate": 9.22814713541035e-05,
+ "entropy": 0.609418871998787,
+ "num_tokens": 7089158.0,
+ "mean_token_accuracy": 0.8354996383190155,
+ "epoch": 0.20751828542269093,
+ "step": 610
+ },
+ {
+ "loss": 0.644,
+ "grad_norm": 0.38659054040908813,
+ "learning_rate": 9.198366349773205e-05,
+ "entropy": 0.6533850967884064,
+ "num_tokens": 7214581.0,
+ "mean_token_accuracy": 0.8262244701385498,
+ "epoch": 0.21092022452798095,
+ "step": 620
+ },
+ {
+ "loss": 0.6227,
+ "grad_norm": 0.3564472794532776,
+ "learning_rate": 9.168071829355376e-05,
+ "entropy": 0.6244180202484131,
+ "num_tokens": 7332309.0,
+ "mean_token_accuracy": 0.8361935257911682,
+ "epoch": 0.21432216363327097,
+ "step": 630
+ },
+ {
+ "loss": 0.6308,
+ "grad_norm": 0.3747089207172394,
+ "learning_rate": 9.137267281157999e-05,
+ "entropy": 0.6325557798147201,
+ "num_tokens": 7450413.0,
+ "mean_token_accuracy": 0.8316955536603927,
+ "epoch": 0.217724102738561,
+ "step": 640
+ },
+ {
+ "loss": 0.616,
+ "grad_norm": 0.4263840913772583,
+ "learning_rate": 9.105956474591953e-05,
+ "entropy": 0.620253998041153,
+ "num_tokens": 7557132.0,
+ "mean_token_accuracy": 0.8373414397239685,
+ "epoch": 0.22112604184385098,
+ "step": 650
+ },
+ {
+ "loss": 0.6176,
+ "grad_norm": 0.38803404569625854,
+ "learning_rate": 9.074143241016631e-05,
+ "entropy": 0.6201047956943512,
+ "num_tokens": 7671938.0,
+ "mean_token_accuracy": 0.8363149344921113,
+ "epoch": 0.224527980949141,
+ "step": 660
+ },
+ {
+ "loss": 0.6434,
+ "grad_norm": 0.3495098054409027,
+ "learning_rate": 9.04183147327111e-05,
+ "entropy": 0.6501924157142639,
+ "num_tokens": 7798607.0,
+ "mean_token_accuracy": 0.8257293999195099,
+ "epoch": 0.22792992005443102,
+ "step": 670
+ },
+ {
+ "loss": 0.6176,
+ "grad_norm": 0.34818509221076965,
+ "learning_rate": 9.009025125197792e-05,
+ "entropy": 0.6221291124820709,
+ "num_tokens": 7904767.0,
+ "mean_token_accuracy": 0.8381055980920792,
+ "epoch": 0.23133185915972104,
+ "step": 680
+ },
+ {
+ "loss": 0.6194,
+ "grad_norm": 0.35937735438346863,
+ "learning_rate": 8.975728211158609e-05,
+ "entropy": 0.6223055779933929,
+ "num_tokens": 8026821.0,
+ "mean_token_accuracy": 0.8331304669380188,
+ "epoch": 0.23473379826501106,
+ "step": 690
+ },
+ {
+ "loss": 0.6343,
+ "grad_norm": 0.3995741307735443,
+ "learning_rate": 8.941944805543788e-05,
+ "entropy": 0.6313437297940254,
+ "num_tokens": 8137346.0,
+ "mean_token_accuracy": 0.8347465574741364,
+ "epoch": 0.23813573737030108,
+ "step": 700
+ },
+ {
+ "loss": 0.6447,
+ "grad_norm": 0.38594338297843933,
+ "learning_rate": 8.907679042273293e-05,
+ "entropy": 0.6474226981401443,
+ "num_tokens": 8241289.0,
+ "mean_token_accuracy": 0.8319136798381805,
+ "epoch": 0.24153767647559107,
+ "step": 710
+ },
+ {
+ "loss": 0.605,
+ "grad_norm": 0.3661476969718933,
+ "learning_rate": 8.872935114290979e-05,
+ "entropy": 0.6085086613893509,
+ "num_tokens": 8357072.0,
+ "mean_token_accuracy": 0.8373961597681046,
+ "epoch": 0.2449396155808811,
+ "step": 720
+ },
+ {
+ "loss": 0.5994,
+ "grad_norm": 0.3534446954727173,
+ "learning_rate": 8.837717273051515e-05,
+ "entropy": 0.5992837458848953,
+ "num_tokens": 8480791.0,
+ "mean_token_accuracy": 0.8372073084115982,
+ "epoch": 0.2483415546861711,
+ "step": 730
+ },
+ {
+ "loss": 0.591,
+ "grad_norm": 0.3611646294593811,
+ "learning_rate": 8.802029828000156e-05,
+ "entropy": 0.5920722305774688,
+ "num_tokens": 8593584.0,
+ "mean_token_accuracy": 0.840802788734436,
+ "epoch": 0.25174349379146116,
+ "step": 740
+ },
+ {
+ "loss": 0.6379,
+ "grad_norm": 0.3454264998435974,
+ "learning_rate": 8.765877146045413e-05,
+ "entropy": 0.645842906832695,
+ "num_tokens": 8697045.0,
+ "mean_token_accuracy": 0.832975035905838,
+ "epoch": 0.2551454328967511,
+ "step": 750
+ },
+ {
+ "loss": 0.588,
+ "grad_norm": 0.36970141530036926,
+ "learning_rate": 8.729263651024705e-05,
+ "entropy": 0.589533594250679,
+ "num_tokens": 8810383.0,
+ "mean_token_accuracy": 0.8419877678155899,
+ "epoch": 0.25854737200204114,
+ "step": 760
+ },
+ {
+ "loss": 0.6126,
+ "grad_norm": 0.30673205852508545,
+ "learning_rate": 8.692193823163017e-05,
+ "entropy": 0.603627099096775,
+ "num_tokens": 8926623.0,
+ "mean_token_accuracy": 0.8384459555149079,
+ "epoch": 0.26194931110733116,
+ "step": 770
+ },
+ {
+ "loss": 0.6416,
+ "grad_norm": 0.40019306540489197,
+ "learning_rate": 8.654672198524692e-05,
+ "entropy": 0.6472606122493744,
+ "num_tokens": 9037995.0,
+ "mean_token_accuracy": 0.8314683675765991,
+ "epoch": 0.2653512502126212,
+ "step": 780
+ },
+ {
+ "loss": 0.6028,
+ "grad_norm": 0.3364422917366028,
+ "learning_rate": 8.616703368458361e-05,
+ "entropy": 0.6071313232183456,
+ "num_tokens": 9159140.0,
+ "mean_token_accuracy": 0.8371691346168518,
+ "epoch": 0.2687531893179112,
+ "step": 790
+ },
+ {
+ "loss": 0.6054,
+ "grad_norm": 0.3635218143463135,
+ "learning_rate": 8.578291979035132e-05,
+ "entropy": 0.615416657924652,
+ "num_tokens": 9267853.0,
+ "mean_token_accuracy": 0.8379261702299118,
+ "epoch": 0.2721551284232012,
+ "step": 800
+ },
+ {
+ "loss": 0.6104,
+ "grad_norm": 0.33966347575187683,
+ "learning_rate": 8.539442730480061e-05,
+ "entropy": 0.6133037626743316,
+ "num_tokens": 9383976.0,
+ "mean_token_accuracy": 0.835821345448494,
+ "epoch": 0.27555706752849124,
+ "step": 810
+ },
+ {
+ "loss": 0.5883,
+ "grad_norm": 0.3680882155895233,
+ "learning_rate": 8.500160376597016e-05,
+ "entropy": 0.5889941453933716,
+ "num_tokens": 9490262.0,
+ "mean_token_accuracy": 0.8436674028635025,
+ "epoch": 0.27895900663378126,
+ "step": 820
+ },
+ {
+ "loss": 0.6011,
+ "grad_norm": 0.3962339758872986,
+ "learning_rate": 8.46044972418697e-05,
+ "entropy": 0.599892795085907,
+ "num_tokens": 9607939.0,
+ "mean_token_accuracy": 0.8401912897825241,
+ "epoch": 0.2823609457390713,
+ "step": 830
+ },
+ {
+ "loss": 0.6268,
+ "grad_norm": 0.3375062644481659,
+ "learning_rate": 8.420315632459817e-05,
+ "entropy": 0.6311625123023987,
+ "num_tokens": 9722935.0,
+ "mean_token_accuracy": 0.8338307291269302,
+ "epoch": 0.2857628848443613,
+ "step": 840
+ },
+ {
+ "loss": 0.638,
+ "grad_norm": 0.36880624294281006,
+ "learning_rate": 8.379763012439771e-05,
+ "entropy": 0.641839075088501,
+ "num_tokens": 9835999.0,
+ "mean_token_accuracy": 0.831251859664917,
+ "epoch": 0.2891648239496513,
+ "step": 850
+ },
+ {
+ "loss": 0.5823,
+ "grad_norm": 0.35837697982788086,
+ "learning_rate": 8.338796826364437e-05,
+ "entropy": 0.5842991501092911,
+ "num_tokens": 9946974.0,
+ "mean_token_accuracy": 0.8434822201728821,
+ "epoch": 0.29256676305494134,
+ "step": 860
+ },
+ {
+ "loss": 0.6356,
+ "grad_norm": 0.39383432269096375,
+ "learning_rate": 8.29742208707758e-05,
+ "entropy": 0.6413630664348602,
+ "num_tokens": 10067421.0,
+ "mean_token_accuracy": 0.8298591256141663,
+ "epoch": 0.29596870216023136,
+ "step": 870
+ },
+ {
+ "loss": 0.6434,
+ "grad_norm": 0.38471469283103943,
+ "learning_rate": 8.255643857415758e-05,
+ "entropy": 0.6492842510342598,
+ "num_tokens": 10172152.0,
+ "mean_token_accuracy": 0.8317259013652801,
+ "epoch": 0.2993706412655213,
+ "step": 880
+ },
+ {
+ "loss": 0.6135,
+ "grad_norm": 0.3366308808326721,
+ "learning_rate": 8.213467249588783e-05,
+ "entropy": 0.6186555951833725,
+ "num_tokens": 10290628.0,
+ "mean_token_accuracy": 0.833742293715477,
+ "epoch": 0.30277258037081134,
+ "step": 890
+ },
+ {
+ "loss": 0.5787,
+ "grad_norm": 0.40084564685821533,
+ "learning_rate": 8.170897424554171e-05,
+ "entropy": 0.5826229274272918,
+ "num_tokens": 10407951.0,
+ "mean_token_accuracy": 0.8439725250005722,
+ "epoch": 0.30617451947610136,
+ "step": 900
+ },
+ {
+ "loss": 0.628,
+ "grad_norm": 0.4093911051750183,
+ "learning_rate": 8.127939591385623e-05,
+ "entropy": 0.639702819287777,
+ "num_tokens": 10516552.0,
+ "mean_token_accuracy": 0.8321585863828659,
+ "epoch": 0.3095764585813914,
+ "step": 910
+ },
+ {
+ "loss": 0.6622,
+ "grad_norm": 0.3402785360813141,
+ "learning_rate": 8.084599006635602e-05,
+ "entropy": 0.6612581133842468,
+ "num_tokens": 10622685.0,
+ "mean_token_accuracy": 0.8280930787324905,
+ "epoch": 0.3129783976866814,
+ "step": 920
+ },
+ {
+ "loss": 0.5945,
+ "grad_norm": 0.381832093000412,
+ "learning_rate": 8.040880973692133e-05,
+ "entropy": 0.5946418136358261,
+ "num_tokens": 10740084.0,
+ "mean_token_accuracy": 0.8418374478816986,
+ "epoch": 0.3163803367919714,
+ "step": 930
+ },
+ {
+ "loss": 0.6012,
+ "grad_norm": 0.36465200781822205,
+ "learning_rate": 7.996790842129834e-05,
+ "entropy": 0.6030470594763756,
+ "num_tokens": 10858519.0,
+ "mean_token_accuracy": 0.8368221133947372,
+ "epoch": 0.31978227589726144,
+ "step": 940
+ },
+ {
+ "loss": 0.6109,
+ "grad_norm": 0.34183382987976074,
+ "learning_rate": 7.952334007055317e-05,
+ "entropy": 0.6180869042873383,
+ "num_tokens": 10982717.0,
+ "mean_token_accuracy": 0.835748416185379,
+ "epoch": 0.32318421500255146,
+ "step": 950
+ },
+ {
+ "loss": 0.6427,
+ "grad_norm": 0.3375698924064636,
+ "learning_rate": 7.907515908447026e-05,
+ "entropy": 0.6360474318265915,
+ "num_tokens": 11109821.0,
+ "mean_token_accuracy": 0.8303357720375061,
+ "epoch": 0.3265861541078415,
+ "step": 960
+ },
+ {
+ "loss": 0.6125,
+ "grad_norm": 0.33526158332824707,
+ "learning_rate": 7.862342030489548e-05,
+ "entropy": 0.6055434793233871,
+ "num_tokens": 11222141.0,
+ "mean_token_accuracy": 0.839334124326706,
+ "epoch": 0.3299880932131315,
+ "step": 970
+ },
+ {
+ "loss": 0.601,
+ "grad_norm": 0.40864965319633484,
+ "learning_rate": 7.816817900902569e-05,
+ "entropy": 0.6122152507305145,
+ "num_tokens": 11334721.0,
+ "mean_token_accuracy": 0.8393468946218491,
+ "epoch": 0.3333900323184215,
+ "step": 980
+ },
+ {
+ "loss": 0.6341,
+ "grad_norm": 0.35346099734306335,
+ "learning_rate": 7.77094909026444e-05,
+ "entropy": 0.6331113621592521,
+ "num_tokens": 11442226.0,
+ "mean_token_accuracy": 0.8354059338569642,
+ "epoch": 0.33679197142371153,
+ "step": 990
+ },
+ {
+ "loss": 0.6021,
+ "grad_norm": 0.3349364101886749,
+ "learning_rate": 7.724741211330561e-05,
+ "entropy": 0.609099206328392,
+ "num_tokens": 11559196.0,
+ "mean_token_accuracy": 0.8378647118806839,
+ "epoch": 0.34019391052900155,
+ "step": 1000
+ },
+ {
+ "loss": 0.596,
+ "grad_norm": 0.3482501804828644,
+ "learning_rate": 7.67819991834655e-05,
+ "entropy": 0.6021771982312203,
+ "num_tokens": 11674831.0,
+ "mean_token_accuracy": 0.8393041074275971,
+ "epoch": 0.3435958496342916,
+ "step": 1010
+ },
+ {
+ "loss": 0.5712,
+ "grad_norm": 0.3629717528820038,
+ "learning_rate": 7.631330906356371e-05,
+ "entropy": 0.5844026952981949,
+ "num_tokens": 11795718.0,
+ "mean_token_accuracy": 0.8434269249439239,
+ "epoch": 0.34699778873958154,
+ "step": 1020
+ },
+ {
+ "loss": 0.6058,
+ "grad_norm": 0.3741706311702728,
+ "learning_rate": 7.584139910505458e-05,
+ "entropy": 0.6056048899888993,
+ "num_tokens": 11911642.0,
+ "mean_token_accuracy": 0.8380466908216476,
+ "epoch": 0.35039972784487156,
+ "step": 1030
+ },
+ {
+ "loss": 0.5877,
+ "grad_norm": 0.30570724606513977,
+ "learning_rate": 7.536632705338929e-05,
+ "entropy": 0.5891546666622162,
+ "num_tokens": 12029598.0,
+ "mean_token_accuracy": 0.8402005821466446,
+ "epoch": 0.3538016669501616,
+ "step": 1040
+ },
+ {
+ "loss": 0.6213,
+ "grad_norm": 0.362284779548645,
+ "learning_rate": 7.488815104094977e-05,
+ "entropy": 0.6238945558667183,
+ "num_tokens": 12138307.0,
+ "mean_token_accuracy": 0.8360639154911041,
+ "epoch": 0.3572036060554516,
+ "step": 1050
+ },
+ {
+ "loss": 0.575,
+ "grad_norm": 0.33959442377090454,
+ "learning_rate": 7.440692957993543e-05,
+ "entropy": 0.5791127294301986,
+ "num_tokens": 12253996.0,
+ "mean_token_accuracy": 0.8450855612754822,
+ "epoch": 0.3606055451607416,
+ "step": 1060
+ },
+ {
+ "loss": 0.6387,
+ "grad_norm": 0.35954946279525757,
+ "learning_rate": 7.392272155520313e-05,
+ "entropy": 0.6434630990028382,
+ "num_tokens": 12367363.0,
+ "mean_token_accuracy": 0.8307264894247055,
+ "epoch": 0.36400748426603163,
+ "step": 1070
+ },
+ {
+ "loss": 0.6154,
+ "grad_norm": 0.3226819932460785,
+ "learning_rate": 7.343558621706186e-05,
+ "entropy": 0.6265482395887375,
+ "num_tokens": 12487636.0,
+ "mean_token_accuracy": 0.8338571727275849,
+ "epoch": 0.36740942337132165,
+ "step": 1080
+ },
+ {
+ "loss": 0.6088,
+ "grad_norm": 0.3557349145412445,
+ "learning_rate": 7.294558317402249e-05,
+ "entropy": 0.6144857376813888,
+ "num_tokens": 12598626.0,
+ "mean_token_accuracy": 0.8359712421894073,
+ "epoch": 0.3708113624766117,
+ "step": 1090
+ },
+ {
+ "loss": 0.6009,
+ "grad_norm": 0.3251045346260071,
+ "learning_rate": 7.24527723855037e-05,
+ "entropy": 0.6028687685728074,
+ "num_tokens": 12711531.0,
+ "mean_token_accuracy": 0.8390907049179077,
+ "epoch": 0.3742133015819017,
+ "step": 1100
+ },
+ {
+ "loss": 0.6078,
+ "grad_norm": 0.3440212905406952,
+ "learning_rate": 7.195721415449508e-05,
+ "entropy": 0.61930713057518,
+ "num_tokens": 12839137.0,
+ "mean_token_accuracy": 0.8350177526473999,
+ "epoch": 0.3776152406871917,
+ "step": 1110
+ },
+ {
+ "loss": 0.626,
+ "grad_norm": 0.3647129535675049,
+ "learning_rate": 7.14589691201782e-05,
+ "entropy": 0.6314490288496017,
+ "num_tokens": 12947358.0,
+ "mean_token_accuracy": 0.8336542189121247,
+ "epoch": 0.38101717979248173,
+ "step": 1120
+ },
+ {
+ "loss": 0.5787,
+ "grad_norm": 0.3373214304447174,
+ "learning_rate": 7.095809825050626e-05,
+ "entropy": 0.5762265086174011,
+ "num_tokens": 13066371.0,
+ "mean_token_accuracy": 0.8430596113204956,
+ "epoch": 0.38441911889777175,
+ "step": 1130
+ },
+ {
+ "loss": 0.6219,
+ "grad_norm": 0.3417420983314514,
+ "learning_rate": 7.045466283474397e-05,
+ "entropy": 0.6200504913926125,
+ "num_tokens": 13187362.0,
+ "mean_token_accuracy": 0.8362764894962311,
+ "epoch": 0.38782105800306177,
+ "step": 1140
+ },
+ {
+ "loss": 0.6018,
+ "grad_norm": 0.37871527671813965,
+ "learning_rate": 6.99487244759677e-05,
+ "entropy": 0.6063272505998611,
+ "num_tokens": 13299146.0,
+ "mean_token_accuracy": 0.8369143933057785,
+ "epoch": 0.39122299710835173,
+ "step": 1150
+ },
+ {
+ "loss": 0.6311,
+ "grad_norm": 0.33044958114624023,
+ "learning_rate": 6.944034508352745e-05,
+ "entropy": 0.6377698928117752,
+ "num_tokens": 13418154.0,
+ "mean_token_accuracy": 0.831796658039093,
+ "epoch": 0.39462493621364175,
+ "step": 1160
+ },
+ {
+ "loss": 0.5661,
+ "grad_norm": 0.34921640157699585,
+ "learning_rate": 6.892958686547129e-05,
+ "entropy": 0.5706936493515968,
+ "num_tokens": 13532234.0,
+ "mean_token_accuracy": 0.8478675246238708,
+ "epoch": 0.39802687531893177,
+ "step": 1170
+ },
+ {
+ "loss": 0.6017,
+ "grad_norm": 0.3906354010105133,
+ "learning_rate": 6.841651232093321e-05,
+ "entropy": 0.604531979560852,
+ "num_tokens": 13647265.0,
+ "mean_token_accuracy": 0.8396757930517197,
+ "epoch": 0.4014288144242218,
+ "step": 1180
+ },
+ {
+ "loss": 0.5594,
+ "grad_norm": 0.35730934143066406,
+ "learning_rate": 6.790118423248545e-05,
+ "entropy": 0.5594036534428597,
+ "num_tokens": 13762753.0,
+ "mean_token_accuracy": 0.8494072288274765,
+ "epoch": 0.4048307535295118,
+ "step": 1190
+ },
+ {
+ "loss": 0.6035,
+ "grad_norm": 0.40569987893104553,
+ "learning_rate": 6.738366565845609e-05,
+ "entropy": 0.6062624737620353,
+ "num_tokens": 13887573.0,
+ "mean_token_accuracy": 0.8375277072191238,
+ "epoch": 0.40823269263480183,
+ "step": 1200
+ },
+ {
+ "loss": 0.6,
+ "grad_norm": 0.3749787509441376,
+ "learning_rate": 6.686401992521274e-05,
+ "entropy": 0.5981257140636445,
+ "num_tokens": 14009093.0,
+ "mean_token_accuracy": 0.839872732758522,
+ "epoch": 0.41163463174009185,
+ "step": 1210
+ },
+ {
+ "loss": 0.5752,
+ "grad_norm": 0.34199485182762146,
+ "learning_rate": 6.634231061941383e-05,
+ "entropy": 0.5763212725520134,
+ "num_tokens": 14133014.0,
+ "mean_token_accuracy": 0.8439627707004547,
+ "epoch": 0.41503657084538187,
+ "step": 1220
+ },
+ {
+ "loss": 0.5881,
+ "grad_norm": 0.38227760791778564,
+ "learning_rate": 6.581860158022757e-05,
+ "entropy": 0.5911096960306168,
+ "num_tokens": 14243866.0,
+ "mean_token_accuracy": 0.8422502607107163,
+ "epoch": 0.4184385099506719,
+ "step": 1230
+ },
+ {
+ "loss": 0.5924,
+ "grad_norm": 0.31732606887817383,
+ "learning_rate": 6.529295689152042e-05,
+ "entropy": 0.6007756769657135,
+ "num_tokens": 14354926.0,
+ "mean_token_accuracy": 0.841707193851471,
+ "epoch": 0.4218404490559619,
+ "step": 1240
+ },
+ {
+ "loss": 0.564,
+ "grad_norm": 0.3495018780231476,
+ "learning_rate": 6.476544087401532e-05,
+ "entropy": 0.5664652720093727,
+ "num_tokens": 14472132.0,
+ "mean_token_accuracy": 0.845953956246376,
+ "epoch": 0.4252423881612519,
+ "step": 1250
+ },
+ {
+ "loss": 0.6371,
+ "grad_norm": 0.39551210403442383,
+ "learning_rate": 6.423611807742116e-05,
+ "entropy": 0.6342313542962075,
+ "num_tokens": 14595618.0,
+ "mean_token_accuracy": 0.8320615291595459,
+ "epoch": 0.42864432726654195,
+ "step": 1260
+ },
+ {
+ "loss": 0.5862,
+ "grad_norm": 0.3622702360153198,
+ "learning_rate": 6.3705053272534e-05,
+ "entropy": 0.5873361572623252,
+ "num_tokens": 14715491.0,
+ "mean_token_accuracy": 0.8399784296751023,
+ "epoch": 0.43204626637183197,
+ "step": 1270
+ },
+ {
+ "loss": 0.5878,
+ "grad_norm": 0.3421778082847595,
+ "learning_rate": 6.317231144331153e-05,
+ "entropy": 0.591562668979168,
+ "num_tokens": 14829063.0,
+ "mean_token_accuracy": 0.8422017335891724,
+ "epoch": 0.435448205477122,
+ "step": 1280
+ },
+ {
+ "loss": 0.556,
+ "grad_norm": 0.33399707078933716,
+ "learning_rate": 6.263795777892115e-05,
+ "entropy": 0.5609270349144936,
+ "num_tokens": 14942097.0,
+ "mean_token_accuracy": 0.8491690754890442,
+ "epoch": 0.43885014458241195,
+ "step": 1290
+ },
+ {
+ "loss": 0.598,
+ "grad_norm": 0.33439555764198303,
+ "learning_rate": 6.210205766576308e-05,
+ "entropy": 0.6013634830713273,
+ "num_tokens": 15063270.0,
+ "mean_token_accuracy": 0.8379765659570694,
+ "epoch": 0.44225208368770197,
+ "step": 1300
+ },
+ {
+ "loss": 0.5739,
+ "grad_norm": 0.34527984261512756,
+ "learning_rate": 6.156467667946944e-05,
+ "entropy": 0.5805606454610824,
+ "num_tokens": 15182142.0,
+ "mean_token_accuracy": 0.8448573082685471,
+ "epoch": 0.445654022792992,
+ "step": 1310
+ },
+ {
+ "loss": 0.6355,
+ "grad_norm": 0.32778117060661316,
+ "learning_rate": 6.1025880576879934e-05,
+ "entropy": 0.6364081218838692,
+ "num_tokens": 15301327.0,
+ "mean_token_accuracy": 0.8308557868003845,
+ "epoch": 0.449055961898282,
+ "step": 1320
+ },
+ {
+ "loss": 0.6181,
+ "grad_norm": 0.34031936526298523,
+ "learning_rate": 6.048573528799556e-05,
+ "entropy": 0.6205928295850753,
+ "num_tokens": 15420327.0,
+ "mean_token_accuracy": 0.834335333108902,
+ "epoch": 0.452457901003572,
+ "step": 1330
+ },
+ {
+ "loss": 0.5708,
+ "grad_norm": 0.40162762999534607,
+ "learning_rate": 5.994430690791102e-05,
+ "entropy": 0.572858938574791,
+ "num_tokens": 15527631.0,
+ "mean_token_accuracy": 0.8488004267215729,
+ "epoch": 0.45585984010886205,
+ "step": 1340
+ },
+ {
+ "loss": 0.6002,
+ "grad_norm": 0.3677639961242676,
+ "learning_rate": 5.9401661688726986e-05,
+ "entropy": 0.6025476709008217,
+ "num_tokens": 15645227.0,
+ "mean_token_accuracy": 0.8378155291080475,
+ "epoch": 0.45926177921415207,
+ "step": 1350
+ },
+ {
+ "loss": 0.6391,
+ "grad_norm": 0.3611242175102234,
+ "learning_rate": 5.8857866031443155e-05,
+ "entropy": 0.6453688651323318,
+ "num_tokens": 15755942.0,
+ "mean_token_accuracy": 0.8302028566598892,
+ "epoch": 0.4626637183194421,
+ "step": 1360
+ },
+ {
+ "loss": 0.5891,
+ "grad_norm": 0.3595288097858429,
+ "learning_rate": 5.8312986477833035e-05,
+ "entropy": 0.6027933269739151,
+ "num_tokens": 15862604.0,
+ "mean_token_accuracy": 0.842942762374878,
+ "epoch": 0.4660656574247321,
+ "step": 1370
+ },
+ {
+ "loss": 0.5901,
+ "grad_norm": 0.3880114257335663,
+ "learning_rate": 5.7767089702301526e-05,
+ "entropy": 0.5783761963248253,
+ "num_tokens": 15974317.0,
+ "mean_token_accuracy": 0.8423945069313049,
+ "epoch": 0.4694675965300221,
+ "step": 1380
+ },
+ {
+ "loss": 0.5735,
+ "grad_norm": 0.33461010456085205,
+ "learning_rate": 5.722024250372632e-05,
+ "entropy": 0.5808866828680038,
+ "num_tokens": 16083489.0,
+ "mean_token_accuracy": 0.8458300411701203,
+ "epoch": 0.47286953563531214,
+ "step": 1390
+ },
+ {
+ "loss": 0.5967,
+ "grad_norm": 0.3592625856399536,
+ "learning_rate": 5.667251179728398e-05,
+ "entropy": 0.5973397344350815,
+ "num_tokens": 16209893.0,
+ "mean_token_accuracy": 0.8369288891553879,
+ "epoch": 0.47627147474060216,
+ "step": 1400
+ },
+ {
+ "loss": 0.6051,
+ "grad_norm": 0.356492817401886,
+ "learning_rate": 5.612396460626188e-05,
+ "entropy": 0.6047371000051498,
+ "num_tokens": 16328112.0,
+ "mean_token_accuracy": 0.8397100865840912,
+ "epoch": 0.4796734138458922,
+ "step": 1410
+ },
+ {
+ "loss": 0.6526,
+ "grad_norm": 0.3834919333457947,
+ "learning_rate": 5.557466805385685e-05,
+ "entropy": 0.6492822825908661,
+ "num_tokens": 16439800.0,
+ "mean_token_accuracy": 0.829655796289444,
+ "epoch": 0.48307535295118215,
+ "step": 1420
+ },
+ {
+ "loss": 0.5926,
+ "grad_norm": 0.35866421461105347,
+ "learning_rate": 5.502468935496164e-05,
+ "entropy": 0.6050204545259475,
+ "num_tokens": 16553741.0,
+ "mean_token_accuracy": 0.8413259029388428,
+ "epoch": 0.48647729205647217,
+ "step": 1430
+ },
+ {
+ "loss": 0.5734,
+ "grad_norm": 0.3319409191608429,
+ "learning_rate": 5.4474095807940094e-05,
+ "entropy": 0.5755202546715736,
+ "num_tokens": 16673973.0,
+ "mean_token_accuracy": 0.8435852289199829,
+ "epoch": 0.4898792311617622,
+ "step": 1440
+ },
+ {
+ "loss": 0.586,
+ "grad_norm": 0.3485705554485321,
+ "learning_rate": 5.392295478639225e-05,
+ "entropy": 0.5858001261949539,
+ "num_tokens": 16783732.0,
+ "mean_token_accuracy": 0.8417124092578888,
+ "epoch": 0.4932811702670522,
+ "step": 1450
+ },
+ {
+ "loss": 0.612,
+ "grad_norm": 0.3999135494232178,
+ "learning_rate": 5.3371333730910035e-05,
+ "entropy": 0.6123771637678146,
+ "num_tokens": 16899108.0,
+ "mean_token_accuracy": 0.8361808478832244,
+ "epoch": 0.4966831093723422,
+ "step": 1460
+ },
+ {
+ "loss": 0.5768,
+ "grad_norm": 0.37186917662620544,
+ "learning_rate": 5.2819300140824924e-05,
+ "entropy": 0.5769275650382042,
+ "num_tokens": 17015498.0,
+ "mean_token_accuracy": 0.8437288701534271,
+ "epoch": 0.5000850484776322,
+ "step": 1470
+ },
+ {
+ "loss": 0.6068,
+ "grad_norm": 0.36763256788253784,
+ "learning_rate": 5.22669215659484e-05,
+ "entropy": 0.6088968113064765,
+ "num_tokens": 17122826.0,
+ "mean_token_accuracy": 0.8394175559282303,
+ "epoch": 0.5034869875829223,
+ "step": 1480
+ },
+ {
+ "loss": 0.6024,
+ "grad_norm": 0.33930572867393494,
+ "learning_rate": 5.17142655983061e-05,
+ "entropy": 0.6036048322916031,
+ "num_tokens": 17248073.0,
+ "mean_token_accuracy": 0.838006791472435,
+ "epoch": 0.5068889266882123,
+ "step": 1490
+ },
+ {
+ "loss": 0.5755,
+ "grad_norm": 0.355539470911026,
+ "learning_rate": 5.116139986386695e-05,
+ "entropy": 0.57216976583004,
+ "num_tokens": 17353849.0,
+ "mean_token_accuracy": 0.8462014138698578,
+ "epoch": 0.5102908657935022,
+ "step": 1500
+ },
+ {
+ "loss": 0.5785,
+ "grad_norm": 0.3130170702934265,
+ "learning_rate": 5.0608392014268093e-05,
+ "entropy": 0.5841953173279762,
+ "num_tokens": 17465362.0,
+ "mean_token_accuracy": 0.8455700755119324,
+ "epoch": 0.5136928048987923,
+ "step": 1510
+ },
+ {
+ "loss": 0.5854,
+ "grad_norm": 0.34390658140182495,
+ "learning_rate": 5.005530971853661e-05,
+ "entropy": 0.588604536652565,
+ "num_tokens": 17583990.0,
+ "mean_token_accuracy": 0.8393468260765076,
+ "epoch": 0.5170947440040823,
+ "step": 1520
+ },
+ {
+ "loss": 0.5412,
+ "grad_norm": 0.3938881456851959,
+ "learning_rate": 4.950222065480926e-05,
+ "entropy": 0.5458601832389831,
+ "num_tokens": 17695756.0,
+ "mean_token_accuracy": 0.8516582667827606,
+ "epoch": 0.5204966831093724,
+ "step": 1530
+ },
+ {
+ "loss": 0.6066,
+ "grad_norm": 0.36709338426589966,
+ "learning_rate": 4.894919250205095e-05,
+ "entropy": 0.6080822870135307,
+ "num_tokens": 17819124.0,
+ "mean_token_accuracy": 0.8351925998926163,
+ "epoch": 0.5238986222146623,
+ "step": 1540
+ },
+ {
+ "loss": 0.603,
+ "grad_norm": 0.3591138422489166,
+ "learning_rate": 4.8396292931773194e-05,
+ "entropy": 0.6067529514431953,
+ "num_tokens": 17942151.0,
+ "mean_token_accuracy": 0.8357031852006912,
+ "epoch": 0.5273005613199524,
+ "step": 1550
+ },
+ {
+ "loss": 0.5889,
+ "grad_norm": 0.32743504643440247,
+ "learning_rate": 4.7843589599753436e-05,
+ "entropy": 0.5867392048239708,
+ "num_tokens": 18051775.0,
+ "mean_token_accuracy": 0.8451169669628144,
+ "epoch": 0.5307025004252424,
+ "step": 1560
+ },
+ {
+ "loss": 0.5827,
+ "grad_norm": 0.34111034870147705,
+ "learning_rate": 4.729115013775639e-05,
+ "entropy": 0.5817334815859795,
+ "num_tokens": 18174207.0,
+ "mean_token_accuracy": 0.8418365478515625,
+ "epoch": 0.5341044395305324,
+ "step": 1570
+ },
+ {
+ "loss": 0.5605,
+ "grad_norm": 0.40375614166259766,
+ "learning_rate": 4.673904214525817e-05,
+ "entropy": 0.5690050527453423,
+ "num_tokens": 18296847.0,
+ "mean_token_accuracy": 0.8457064211368561,
+ "epoch": 0.5375063786358224,
+ "step": 1580
+ },
+ {
+ "loss": 0.6214,
+ "grad_norm": 0.34886664152145386,
+ "learning_rate": 4.618733318117453e-05,
+ "entropy": 0.613763065636158,
+ "num_tokens": 18411175.0,
+ "mean_token_accuracy": 0.8373794168233871,
+ "epoch": 0.5409083177411125,
+ "step": 1590
+ },
+ {
+ "loss": 0.5932,
+ "grad_norm": 0.4052237868309021,
+ "learning_rate": 4.56360907555939e-05,
+ "entropy": 0.5990845367312432,
+ "num_tokens": 18523679.0,
+ "mean_token_accuracy": 0.8415322303771973,
+ "epoch": 0.5443102568464024,
+ "step": 1600
+ },
+ {
+ "loss": 0.5587,
+ "grad_norm": 0.33428770303726196,
+ "learning_rate": 4.508538232151659e-05,
+ "entropy": 0.5686810597777366,
+ "num_tokens": 18638795.0,
+ "mean_token_accuracy": 0.8485085606575012,
+ "epoch": 0.5477121959516925,
+ "step": 1610
+ },
+ {
+ "loss": 0.5528,
+ "grad_norm": 0.3429810106754303,
+ "learning_rate": 4.453527526660079e-05,
+ "entropy": 0.5541219785809517,
+ "num_tokens": 18757690.0,
+ "mean_token_accuracy": 0.8475034803152084,
+ "epoch": 0.5511141350569825,
+ "step": 1620
+ },
+ {
+ "loss": 0.5896,
+ "grad_norm": 0.36002102494239807,
+ "learning_rate": 4.398583690491673e-05,
+ "entropy": 0.594765892624855,
+ "num_tokens": 18870237.0,
+ "mean_token_accuracy": 0.841697883605957,
+ "epoch": 0.5545160741622724,
+ "step": 1630
+ },
+ {
+ "loss": 0.5583,
+ "grad_norm": 0.3443513810634613,
+ "learning_rate": 4.3437134468709684e-05,
+ "entropy": 0.5576672151684761,
+ "num_tokens": 18989654.0,
+ "mean_token_accuracy": 0.8470352560281753,
+ "epoch": 0.5579180132675625,
+ "step": 1640
+ },
+ {
+ "loss": 0.5715,
+ "grad_norm": 0.3210230767726898,
+ "learning_rate": 4.2889235100173117e-05,
+ "entropy": 0.5803326085209847,
+ "num_tokens": 19111873.0,
+ "mean_token_accuracy": 0.8444038182497025,
+ "epoch": 0.5613199523728525,
+ "step": 1650
+ },
+ {
+ "loss": 0.579,
+ "grad_norm": 0.3320671617984772,
+ "learning_rate": 4.2342205843232815e-05,
+ "entropy": 0.5725411295890808,
+ "num_tokens": 19227912.0,
+ "mean_token_accuracy": 0.8457438528537751,
+ "epoch": 0.5647218914781426,
+ "step": 1660
+ },
+ {
+ "loss": 0.587,
+ "grad_norm": 0.38086163997650146,
+ "learning_rate": 4.1796113635342995e-05,
+ "entropy": 0.5874329909682274,
+ "num_tokens": 19349977.0,
+ "mean_token_accuracy": 0.840248167514801,
+ "epoch": 0.5681238305834325,
+ "step": 1670
+ },
+ {
+ "loss": 0.5847,
+ "grad_norm": 0.3494766652584076,
+ "learning_rate": 4.1251025299295484e-05,
+ "entropy": 0.5962051749229431,
+ "num_tokens": 19463971.0,
+ "mean_token_accuracy": 0.8411266356706619,
+ "epoch": 0.5715257696887226,
+ "step": 1680
+ },
+ {
+ "loss": 0.5886,
+ "grad_norm": 0.34530606865882874,
+ "learning_rate": 4.0707007535042965e-05,
+ "entropy": 0.5932681888341904,
+ "num_tokens": 19584720.0,
+ "mean_token_accuracy": 0.8420757114887237,
+ "epoch": 0.5749277087940126,
+ "step": 1690
+ },
+ {
+ "loss": 0.6255,
+ "grad_norm": 0.39939698576927185,
+ "learning_rate": 4.0164126911537124e-05,
+ "entropy": 0.6298390612006187,
+ "num_tokens": 19707943.0,
+ "mean_token_accuracy": 0.8327343821525574,
+ "epoch": 0.5783296478993026,
+ "step": 1700
+ },
+ {
+ "loss": 0.5829,
+ "grad_norm": 0.39982014894485474,
+ "learning_rate": 3.9622449858583e-05,
+ "entropy": 0.5805615946650505,
+ "num_tokens": 19821092.0,
+ "mean_token_accuracy": 0.8441908627748489,
+ "epoch": 0.5817315870045926,
+ "step": 1710
+ },
+ {
+ "loss": 0.5546,
+ "grad_norm": 0.3315586447715759,
+ "learning_rate": 3.908204265871024e-05,
+ "entropy": 0.5536176040768623,
+ "num_tokens": 19926967.0,
+ "mean_token_accuracy": 0.8505498200654984,
+ "epoch": 0.5851335261098827,
+ "step": 1720
+ },
+ {
+ "loss": 0.6144,
+ "grad_norm": 0.38606083393096924,
+ "learning_rate": 3.8542971439062375e-05,
+ "entropy": 0.6170039817690849,
+ "num_tokens": 20050273.0,
+ "mean_token_accuracy": 0.8357849180698395,
+ "epoch": 0.5885354652151726,
+ "step": 1730
+ },
+ {
+ "loss": 0.6018,
+ "grad_norm": 0.35552018880844116,
+ "learning_rate": 3.800530216330522e-05,
+ "entropy": 0.6068514689803124,
+ "num_tokens": 20168367.0,
+ "mean_token_accuracy": 0.8391161382198333,
+ "epoch": 0.5919374043204627,
+ "step": 1740
+ },
+ {
+ "loss": 0.6014,
+ "grad_norm": 0.408333957195282,
+ "learning_rate": 3.746910062355514e-05,
+ "entropy": 0.6115496993064881,
+ "num_tokens": 20275208.0,
+ "mean_token_accuracy": 0.837997031211853,
+ "epoch": 0.5953393434257527,
+ "step": 1750
+ },
+ {
+ "loss": 0.5656,
+ "grad_norm": 0.3675805628299713,
+ "learning_rate": 3.693443243232839e-05,
+ "entropy": 0.5659057840704917,
+ "num_tokens": 20395513.0,
+ "mean_token_accuracy": 0.846270963549614,
+ "epoch": 0.5987412825310426,
+ "step": 1760
+ },
+ {
+ "loss": 0.5876,
+ "grad_norm": 0.34130120277404785,
+ "learning_rate": 3.6401363014512465e-05,
+ "entropy": 0.5852897882461547,
+ "num_tokens": 20513840.0,
+ "mean_token_accuracy": 0.8406417429447174,
+ "epoch": 0.6021432216363327,
+ "step": 1770
+ },
+ {
+ "loss": 0.5846,
+ "grad_norm": 0.3448680341243744,
+ "learning_rate": 3.586995759936027e-05,
+ "entropy": 0.5861192226409913,
+ "num_tokens": 20626418.0,
+ "mean_token_accuracy": 0.8443343281745911,
+ "epoch": 0.6055451607416227,
+ "step": 1780
+ },
+ {
+ "loss": 0.5629,
+ "grad_norm": 0.33498626947402954,
+ "learning_rate": 3.5340281212508355e-05,
+ "entropy": 0.5744142100214958,
+ "num_tokens": 20743700.0,
+ "mean_token_accuracy": 0.8447044044733047,
+ "epoch": 0.6089470998469128,
+ "step": 1790
+ },
+ {
+ "loss": 0.5491,
+ "grad_norm": 0.3723412752151489,
+ "learning_rate": 3.481239866802003e-05,
+ "entropy": 0.5503455892205238,
+ "num_tokens": 20848890.0,
+ "mean_token_accuracy": 0.8521771311759949,
+ "epoch": 0.6123490389522027,
+ "step": 1800
+ },
+ {
+ "loss": 0.5728,
+ "grad_norm": 0.37624675035476685,
+ "learning_rate": 3.428637456045438e-05,
+ "entropy": 0.5796245783567429,
+ "num_tokens": 20959919.0,
+ "mean_token_accuracy": 0.8451456815004349,
+ "epoch": 0.6157509780574928,
+ "step": 1810
+ },
+ {
+ "loss": 0.6005,
+ "grad_norm": 0.4184034466743469,
+ "learning_rate": 3.3762273256962115e-05,
+ "entropy": 0.6072939783334732,
+ "num_tokens": 21068427.0,
+ "mean_token_accuracy": 0.8402355641126633,
+ "epoch": 0.6191529171627828,
+ "step": 1820
+ },
+ {
+ "loss": 0.6061,
+ "grad_norm": 0.38452163338661194,
+ "learning_rate": 3.324015888940925e-05,
+ "entropy": 0.6065735876560211,
+ "num_tokens": 21184885.0,
+ "mean_token_accuracy": 0.8360267549753189,
+ "epoch": 0.6225548562680728,
+ "step": 1830
+ },
+ {
+ "loss": 0.5824,
+ "grad_norm": 0.4377152919769287,
+ "learning_rate": 3.2720095346529566e-05,
+ "entropy": 0.5836168915033341,
+ "num_tokens": 21298789.0,
+ "mean_token_accuracy": 0.8420229732990265,
+ "epoch": 0.6259567953733628,
+ "step": 1840
+ },
+ {
+ "loss": 0.5761,
+ "grad_norm": 0.3785684406757355,
+ "learning_rate": 3.220214626610689e-05,
+ "entropy": 0.5765029028058052,
+ "num_tokens": 21424296.0,
+ "mean_token_accuracy": 0.8433551669120789,
+ "epoch": 0.6293587344786529,
+ "step": 1850
+ },
+ {
+ "loss": 0.572,
+ "grad_norm": 0.37199345231056213,
+ "learning_rate": 3.168637502718798e-05,
+ "entropy": 0.5756848677992821,
+ "num_tokens": 21534093.0,
+ "mean_token_accuracy": 0.8463689625263214,
+ "epoch": 0.6327606735839428,
+ "step": 1860
+ },
+ {
+ "loss": 0.5863,
+ "grad_norm": 0.38243773579597473,
+ "learning_rate": 3.117284474232717e-05,
+ "entropy": 0.5788663312792778,
+ "num_tokens": 21656220.0,
+ "mean_token_accuracy": 0.8417115867137909,
+ "epoch": 0.6361626126892329,
+ "step": 1870
+ },
+ {
+ "loss": 0.5986,
+ "grad_norm": 0.3821103274822235,
+ "learning_rate": 3.066161824986352e-05,
+ "entropy": 0.5999485164880752,
+ "num_tokens": 21770257.0,
+ "mean_token_accuracy": 0.8380926698446274,
+ "epoch": 0.6395645517945229,
+ "step": 1880
+ },
+ {
+ "loss": 0.577,
+ "grad_norm": 0.33614298701286316,
+ "learning_rate": 3.0152758106231642e-05,
+ "entropy": 0.5815630614757538,
+ "num_tokens": 21896451.0,
+ "mean_token_accuracy": 0.8430263727903367,
+ "epoch": 0.642966490899813,
+ "step": 1890
+ },
+ {
+ "loss": 0.6131,
+ "grad_norm": 0.39611414074897766,
+ "learning_rate": 2.9646326578306915e-05,
+ "entropy": 0.6189764007925987,
+ "num_tokens": 22003560.0,
+ "mean_token_accuracy": 0.8391340941190719,
+ "epoch": 0.6463684300051029,
+ "step": 1900
+ },
+ {
+ "loss": 0.5443,
+ "grad_norm": 0.33414164185523987,
+ "learning_rate": 2.914238563578618e-05,
+ "entropy": 0.5429508209228515,
+ "num_tokens": 22117872.0,
+ "mean_token_accuracy": 0.8526928842067718,
+ "epoch": 0.6497703691103929,
+ "step": 1910
+ },
+ {
+ "loss": 0.6123,
+ "grad_norm": 0.4099610447883606,
+ "learning_rate": 2.864099694360479e-05,
+ "entropy": 0.6225771173834801,
+ "num_tokens": 22222509.0,
+ "mean_token_accuracy": 0.8378654181957245,
+ "epoch": 0.653172308215683,
+ "step": 1920
+ },
+ {
+ "loss": 0.5621,
+ "grad_norm": 0.3818487823009491,
+ "learning_rate": 2.814222185439098e-05,
+ "entropy": 0.5593055367469788,
+ "num_tokens": 22335582.0,
+ "mean_token_accuracy": 0.8501265943050385,
+ "epoch": 0.6565742473209729,
+ "step": 1930
+ },
+ {
+ "loss": 0.6008,
+ "grad_norm": 0.4001128673553467,
+ "learning_rate": 2.764612140095839e-05,
+ "entropy": 0.6034526154398918,
+ "num_tokens": 22448701.0,
+ "mean_token_accuracy": 0.8390175521373748,
+ "epoch": 0.659976186426263,
+ "step": 1940
+ },
1942
+ {
1943
+ "loss": 0.5682,
1944
+ "grad_norm": 0.3717249035835266,
1945
+ "learning_rate": 2.715275628883781e-05,
1946
+ "entropy": 0.5757593274116516,
1947
+ "num_tokens": 22565429.0,
1948
+ "mean_token_accuracy": 0.8461737126111984,
1949
+ "epoch": 0.663378125531553,
1950
+ "step": 1950
1951
+ },
1952
+ {
1953
+ "loss": 0.5819,
1954
+ "grad_norm": 0.31234627962112427,
1955
+ "learning_rate": 2.6662186888848862e-05,
1956
+ "entropy": 0.5898503661155701,
1957
+ "num_tokens": 22690809.0,
1958
+ "mean_token_accuracy": 0.8421528667211533,
1959
+ "epoch": 0.666780064636843,
1960
+ "step": 1960
1961
+ },
1962
+ {
1963
+ "loss": 0.5795,
1964
+ "grad_norm": 0.43498075008392334,
1965
+ "learning_rate": 2.6174473229712758e-05,
1966
+ "entropy": 0.5725113779306412,
1967
+ "num_tokens": 22808337.0,
1968
+ "mean_token_accuracy": 0.8442942827939988,
1969
+ "epoch": 0.670182003742133,
1970
+ "step": 1970
1971
+ },
1972
+ {
1973
+ "loss": 0.5602,
1974
+ "grad_norm": 0.35489723086357117,
1975
+ "learning_rate": 2.5689674990706813e-05,
1976
+ "entropy": 0.5673380747437478,
1977
+ "num_tokens": 22924233.0,
1978
+ "mean_token_accuracy": 0.8477986544370651,
1979
+ "epoch": 0.6735839428474231,
1980
+ "step": 1980
1981
+ },
1982
+ {
1983
+ "loss": 0.616,
1984
+ "grad_norm": 0.3546064496040344,
1985
+ "learning_rate": 2.520785149436181e-05,
1986
+ "entropy": 0.6025599181652069,
1987
+ "num_tokens": 23040709.0,
1988
+ "mean_token_accuracy": 0.8388166666030884,
1989
+ "epoch": 0.676985881952713,
1990
+ "step": 1990
1991
+ },
1992
+ {
1993
+ "loss": 0.5866,
1994
+ "grad_norm": 0.3466935455799103,
1995
+ "learning_rate": 2.4729061699202933e-05,
1996
+ "entropy": 0.5931081950664521,
1997
+ "num_tokens": 23168281.0,
1998
+ "mean_token_accuracy": 0.8412128478288651,
1999
+ "epoch": 0.6803878210580031,
2000
+ "step": 2000
2001
+ },
2002
+ {
2003
+ "loss": 0.5745,
2004
+ "grad_norm": 0.36624687910079956,
2005
+ "learning_rate": 2.4253364192535348e-05,
2006
+ "entropy": 0.5797276377677918,
2007
+ "num_tokens": 23265650.0,
2008
+ "mean_token_accuracy": 0.8477521270513535,
2009
+ "epoch": 0.6837897601632931,
2010
+ "step": 2010
2011
+ },
2012
+ {
2013
+ "loss": 0.6003,
2014
+ "grad_norm": 0.3339534401893616,
2015
+ "learning_rate": 2.3780817183275057e-05,
2016
+ "entropy": 0.5999033793807029,
2017
+ "num_tokens": 23380810.0,
2018
+ "mean_token_accuracy": 0.8395819336175918,
2019
+ "epoch": 0.6871916992685831,
2020
+ "step": 2020
2021
+ },
2022
+ {
2023
+ "loss": 0.6001,
2024
+ "grad_norm": 0.3682357370853424,
2025
+ "learning_rate": 2.331147849482619e-05,
2026
+ "entropy": 0.6057513281702995,
2027
+ "num_tokens": 23488800.0,
2028
+ "mean_token_accuracy": 0.8401612430810929,
2029
+ "epoch": 0.6905936383738731,
2030
+ "step": 2030
2031
+ },
2032
+ {
2033
+ "loss": 0.5681,
2034
+ "grad_norm": 0.40995046496391296,
2035
+ "learning_rate": 2.2845405558005428e-05,
2036
+ "entropy": 0.5787547707557679,
2037
+ "num_tokens": 23603230.0,
2038
+ "mean_token_accuracy": 0.8451647937297821,
2039
+ "epoch": 0.6939955774791631,
2040
+ "step": 2040
2041
+ },
2042
+ {
2043
+ "loss": 0.5821,
2044
+ "grad_norm": 0.37340083718299866,
2045
+ "learning_rate": 2.2382655404014447e-05,
2046
+ "entropy": 0.5805535346269608,
2047
+ "num_tokens": 23722605.0,
2048
+ "mean_token_accuracy": 0.844177857041359,
2049
+ "epoch": 0.6973975165844531,
2050
+ "step": 2050
2051
+ },
2052
+ {
2053
+ "loss": 0.5787,
2054
+ "grad_norm": 0.4327142536640167,
2055
+ "learning_rate": 2.1923284657461258e-05,
2056
+ "entropy": 0.5828775107860565,
2057
+ "num_tokens": 23835961.0,
2058
+ "mean_token_accuracy": 0.842606109380722,
2059
+ "epoch": 0.7007994556897431,
2060
+ "step": 2060
2061
+ },
2062
+ {
2063
+ "loss": 0.5982,
2064
+ "grad_norm": 0.4304671287536621,
2065
+ "learning_rate": 2.1467349529431317e-05,
2066
+ "entropy": 0.5941863551735878,
2067
+ "num_tokens": 23949131.0,
2068
+ "mean_token_accuracy": 0.8400673478841781,
2069
+ "epoch": 0.7042013947950332,
2070
+ "step": 2070
2071
+ },
2072
+ {
2073
+ "loss": 0.5679,
2074
+ "grad_norm": 0.37865591049194336,
2075
+ "learning_rate": 2.10149058106093e-05,
2076
+ "entropy": 0.5765653476119041,
2077
+ "num_tokens": 24070739.0,
2078
+ "mean_token_accuracy": 0.8449007272720337,
2079
+ "epoch": 0.7076033339003232,
2080
+ "step": 2080
2081
+ },
2082
+ {
2083
+ "loss": 0.5886,
2084
+ "grad_norm": 0.3672359585762024,
2085
+ "learning_rate": 2.056600886445213e-05,
2086
+ "entropy": 0.5881167098879814,
2087
+ "num_tokens": 24178837.0,
2088
+ "mean_token_accuracy": 0.8434468775987625,
2089
+ "epoch": 0.7110052730056132,
2090
+ "step": 2090
2091
+ },
2092
+ {
2093
+ "loss": 0.5956,
2094
+ "grad_norm": 0.36181676387786865,
2095
+ "learning_rate": 2.01207136204145e-05,
2096
+ "entropy": 0.591875921189785,
2097
+ "num_tokens": 24297729.0,
2098
+ "mean_token_accuracy": 0.8414703428745269,
2099
+ "epoch": 0.7144072121109032,
2100
+ "step": 2100
2101
+ },
2102
+ {
2103
+ "loss": 0.5599,
2104
+ "grad_norm": 0.35588833689689636,
2105
+ "learning_rate": 1.967907456722738e-05,
2106
+ "entropy": 0.5663163229823113,
2107
+ "num_tokens": 24408243.0,
2108
+ "mean_token_accuracy": 0.8491099238395691,
2109
+ "epoch": 0.7178091512161933,
2110
+ "step": 2110
2111
+ },
2112
+ {
2113
+ "loss": 0.5644,
2114
+ "grad_norm": 0.38050028681755066,
2115
+ "learning_rate": 1.9241145746230478e-05,
2116
+ "entropy": 0.5726602435111999,
2117
+ "num_tokens": 24512247.0,
2118
+ "mean_token_accuracy": 0.8495869427919388,
2119
+ "epoch": 0.7212110903214832,
2120
+ "step": 2120
2121
+ },
2122
+ {
2123
+ "loss": 0.5693,
2124
+ "grad_norm": 0.3819943964481354,
2125
+ "learning_rate": 1.8806980744759445e-05,
2126
+ "entropy": 0.5744548887014389,
2127
+ "num_tokens": 24627020.0,
2128
+ "mean_token_accuracy": 0.842121011018753,
2129
+ "epoch": 0.7246130294267733,
2130
+ "step": 2130
2131
+ },
2132
+ {
2133
+ "loss": 0.6092,
2134
+ "grad_norm": 0.3454561233520508,
2135
+ "learning_rate": 1.837663268958867e-05,
2136
+ "entropy": 0.6237739801406861,
2137
+ "num_tokens": 24736398.0,
2138
+ "mean_token_accuracy": 0.8370961755514145,
2139
+ "epoch": 0.7280149685320633,
2140
+ "step": 2140
2141
+ },
2142
+ {
2143
+ "loss": 0.6223,
2144
+ "grad_norm": 0.35335883498191833,
2145
+ "learning_rate": 1.795015424043035e-05,
2146
+ "entropy": 0.6208432152867317,
2147
+ "num_tokens": 24861135.0,
2148
+ "mean_token_accuracy": 0.8354604303836822,
2149
+ "epoch": 0.7314169076373533,
2150
+ "step": 2150
2151
+ },
2152
+ {
2153
+ "loss": 0.5919,
2154
+ "grad_norm": 0.3348969519138336,
2155
+ "learning_rate": 1.7527597583490822e-05,
2156
+ "entropy": 0.5919017851352691,
2157
+ "num_tokens": 24974197.0,
2158
+ "mean_token_accuracy": 0.840225774049759,
2159
+ "epoch": 0.7348188467426433,
2160
+ "step": 2160
2161
+ },
2162
+ {
2163
+ "loss": 0.5706,
2164
+ "grad_norm": 0.3716595470905304,
2165
+ "learning_rate": 1.7109014425084725e-05,
2166
+ "entropy": 0.5833111733198166,
2167
+ "num_tokens": 25091846.0,
2168
+ "mean_token_accuracy": 0.8445227384567261,
2169
+ "epoch": 0.7382207858479333,
2170
+ "step": 2170
2171
+ },
2172
+ {
2173
+ "loss": 0.5679,
2174
+ "grad_norm": 0.32632264494895935,
2175
+ "learning_rate": 1.669445598530796e-05,
2176
+ "entropy": 0.5681742131710052,
2177
+ "num_tokens": 25210289.0,
2178
+ "mean_token_accuracy": 0.8463738948106766,
2179
+ "epoch": 0.7416227249532233,
2180
+ "step": 2180
2181
+ },
2182
+ {
2183
+ "loss": 0.6147,
2184
+ "grad_norm": 0.3877134621143341,
2185
+ "learning_rate": 1.628397299177013e-05,
2186
+ "entropy": 0.6165782377123833,
2187
+ "num_tokens": 25334926.0,
2188
+ "mean_token_accuracy": 0.8343934565782547,
2189
+ "epoch": 0.7450246640585133,
2190
+ "step": 2190
2191
+ },
2192
+ {
2193
+ "loss": 0.5806,
2194
+ "grad_norm": 0.38453352451324463,
2195
+ "learning_rate": 1.5877615673387215e-05,
2196
+ "entropy": 0.5809252455830574,
2197
+ "num_tokens": 25453005.0,
2198
+ "mean_token_accuracy": 0.8436485230922699,
2199
+ "epoch": 0.7484266031638034,
2200
+ "step": 2200
2201
+ },
2202
+ {
2203
+ "loss": 0.5584,
2204
+ "grad_norm": 0.3705204725265503,
2205
+ "learning_rate": 1.5475433754235312e-05,
2206
+ "entropy": 0.5579630300402642,
2207
+ "num_tokens": 25568378.0,
2208
+ "mean_token_accuracy": 0.8482485413551331,
2209
+ "epoch": 0.7518285422690933,
2210
+ "step": 2210
2211
+ },
2212
+ {
2213
+ "loss": 0.5846,
2214
+ "grad_norm": 0.4126192331314087,
2215
+ "learning_rate": 1.5077476447466115e-05,
2216
+ "entropy": 0.5949768751859665,
2217
+ "num_tokens": 25679312.0,
2218
+ "mean_token_accuracy": 0.8417560040950776,
2219
+ "epoch": 0.7552304813743834,
2220
+ "step": 2220
2221
+ },
2222
+ {
2223
+ "loss": 0.5992,
2224
+ "grad_norm": 0.369003564119339,
2225
+ "learning_rate": 1.4683792449284922e-05,
2226
+ "entropy": 0.5958976864814758,
2227
+ "num_tokens": 25802476.0,
2228
+ "mean_token_accuracy": 0.8378436207771301,
2229
+ "epoch": 0.7586324204796734,
2230
+ "step": 2230
2231
+ },
2232
+ {
2233
+ "loss": 0.5845,
2234
+ "grad_norm": 0.3608800768852234,
2235
+ "learning_rate": 1.429442993299191e-05,
2236
+ "entropy": 0.5774736061692238,
2237
+ "num_tokens": 25917526.0,
2238
+ "mean_token_accuracy": 0.8448121935129166,
2239
+ "epoch": 0.7620343595849635,
2240
+ "step": 2240
2241
+ },
2242
+ {
2243
+ "loss": 0.533,
2244
+ "grad_norm": 0.35715219378471375,
2245
+ "learning_rate": 1.3909436543087428e-05,
2246
+ "entropy": 0.5421435713768006,
2247
+ "num_tokens": 26022101.0,
2248
+ "mean_token_accuracy": 0.8561456590890885,
2249
+ "epoch": 0.7654362986902534,
2250
+ "step": 2250
2251
+ },
2252
+ {
2253
+ "loss": 0.5595,
2254
+ "grad_norm": 0.36197429895401,
2255
+ "learning_rate": 1.352885938944189e-05,
2256
+ "entropy": 0.564231875538826,
2257
+ "num_tokens": 26132618.0,
2258
+ "mean_token_accuracy": 0.8467835336923599,
2259
+ "epoch": 0.7688382377955435,
2260
+ "step": 2260
2261
+ },
2262
+ {
2263
+ "loss": 0.5394,
2264
+ "grad_norm": 0.34464767575263977,
2265
+ "learning_rate": 1.3152745041531201e-05,
2266
+ "entropy": 0.5447842225432395,
2267
+ "num_tokens": 26251170.0,
2268
+ "mean_token_accuracy": 0.852738481760025,
2269
+ "epoch": 0.7722401769008335,
2270
+ "step": 2270
2271
+ },
2272
+ {
2273
+ "loss": 0.6081,
2274
+ "grad_norm": 0.34223926067352295,
2275
+ "learning_rate": 1.2781139522738239e-05,
2276
+ "entropy": 0.6099189713597297,
2277
+ "num_tokens": 26366238.0,
2278
+ "mean_token_accuracy": 0.8368343204259873,
2279
+ "epoch": 0.7756421160061235,
2280
+ "step": 2280
2281
+ },
2282
+ {
2283
+ "loss": 0.5608,
2284
+ "grad_norm": 0.40201887488365173,
2285
+ "learning_rate": 1.2414088304721234e-05,
2286
+ "entropy": 0.5682180404663086,
2287
+ "num_tokens": 26481121.0,
2288
+ "mean_token_accuracy": 0.8468090683221817,
2289
+ "epoch": 0.7790440551114135,
2290
+ "step": 2290
2291
+ },
2292
+ {
2293
+ "loss": 0.5831,
2294
+ "grad_norm": 0.3707852363586426,
2295
+ "learning_rate": 1.2051636301849539e-05,
2296
+ "entropy": 0.5816125243902206,
2297
+ "num_tokens": 26591544.0,
2298
+ "mean_token_accuracy": 0.8441601723432541,
2299
+ "epoch": 0.7824459942167035,
2300
+ "step": 2300
2301
+ },
2302
+ {
2303
+ "loss": 0.644,
2304
+ "grad_norm": 0.42641323804855347,
2305
+ "learning_rate": 1.1693827865707729e-05,
2306
+ "entropy": 0.6465028315782547,
2307
+ "num_tokens": 26701423.0,
2308
+ "mean_token_accuracy": 0.8323942631483078,
2309
+ "epoch": 0.7858479333219935,
2310
+ "step": 2310
2311
+ },
2312
+ {
2313
+ "loss": 0.5624,
2314
+ "grad_norm": 0.3720209300518036,
2315
+ "learning_rate": 1.1340706779668464e-05,
2316
+ "entropy": 0.568995788693428,
2317
+ "num_tokens": 26821407.0,
2318
+ "mean_token_accuracy": 0.844905823469162,
2319
+ "epoch": 0.7892498724272835,
2320
+ "step": 2320
2321
+ },
2322
+ {
2323
+ "loss": 0.562,
2324
+ "grad_norm": 0.33470767736434937,
2325
+ "learning_rate": 1.099231625353494e-05,
2326
+ "entropy": 0.5680811256170273,
2327
+ "num_tokens": 26942232.0,
2328
+ "mean_token_accuracy": 0.8472270101308823,
2329
+ "epoch": 0.7926518115325736,
2330
+ "step": 2330
2331
+ },
2332
+ {
2333
+ "loss": 0.5871,
2334
+ "grad_norm": 0.35473379492759705,
2335
+ "learning_rate": 1.0648698918253474e-05,
2336
+ "entropy": 0.5944129660725593,
2337
+ "num_tokens": 27054063.0,
2338
+ "mean_token_accuracy": 0.8423294246196746,
2339
+ "epoch": 0.7960537506378635,
2340
+ "step": 2340
2341
+ },
2342
+ {
2343
+ "loss": 0.5813,
2344
+ "grad_norm": 0.336117684841156,
2345
+ "learning_rate": 1.0309896820697002e-05,
2346
+ "entropy": 0.5846550479531288,
2347
+ "num_tokens": 27170010.0,
2348
+ "mean_token_accuracy": 0.8445548832416534,
2349
+ "epoch": 0.7994556897431536,
2350
+ "step": 2350
2351
+ },
2352
+ {
2353
+ "loss": 0.5587,
2354
+ "grad_norm": 0.36509907245635986,
2355
+ "learning_rate": 9.975951418519941e-06,
2356
+ "entropy": 0.565837450325489,
2357
+ "num_tokens": 27288934.0,
2358
+ "mean_token_accuracy": 0.8460572957992554,
2359
+ "epoch": 0.8028576288484436,
2360
+ "step": 2360
2361
+ },
2362
+ {
2363
+ "loss": 0.5744,
2364
+ "grad_norm": 0.37850871682167053,
2365
+ "learning_rate": 9.646903575085236e-06,
2366
+ "entropy": 0.5862236008048057,
2367
+ "num_tokens": 27405172.0,
2368
+ "mean_token_accuracy": 0.8447532653808594,
2369
+ "epoch": 0.8062595679537337,
2370
+ "step": 2370
2371
+ },
2372
+ {
2373
+ "loss": 0.5813,
2374
+ "grad_norm": 0.3511059582233429,
2375
+ "learning_rate": 9.32279355446411e-06,
2376
+ "entropy": 0.5788961902260781,
2377
+ "num_tokens": 27518997.0,
2378
+ "mean_token_accuracy": 0.8418603748083114,
2379
+ "epoch": 0.8096615070590236,
2380
+ "step": 2380
2381
+ },
2382
+ {
2383
+ "loss": 0.5855,
2384
+ "grad_norm": 0.3253343999385834,
2385
+ "learning_rate": 9.003661016509102e-06,
2386
+ "entropy": 0.5858073383569717,
2387
+ "num_tokens": 27644298.0,
2388
+ "mean_token_accuracy": 0.8415432572364807,
2389
+ "epoch": 0.8130634461643137,
2390
+ "step": 2390
2391
+ },
2392
+ {
2393
+ "loss": 0.5595,
2394
+ "grad_norm": 0.36334091424942017,
2395
+ "learning_rate": 8.689545012001083e-06,
2396
+ "entropy": 0.5582752287387848,
2397
+ "num_tokens": 27753242.0,
2398
+ "mean_token_accuracy": 0.8509968429803848,
2399
+ "epoch": 0.8164653852696037,
2400
+ "step": 2400
2401
+ },
2402
+ {
2403
+ "loss": 0.5706,
2404
+ "grad_norm": 0.32456502318382263,
2405
+ "learning_rate": 8.380483977870834e-06,
2406
+ "entropy": 0.5666798010468483,
2407
+ "num_tokens": 27882376.0,
2408
+ "mean_token_accuracy": 0.84336639046669,
2409
+ "epoch": 0.8198673243748937,
2410
+ "step": 2410
2411
+ },
2412
+ {
2413
+ "loss": 0.5394,
2414
+ "grad_norm": 0.3463776707649231,
2415
+ "learning_rate": 8.076515732495626e-06,
2416
+ "entropy": 0.5584284573793411,
2417
+ "num_tokens": 28000308.0,
2418
+ "mean_token_accuracy": 0.8512558758258819,
2419
+ "epoch": 0.8232692634801837,
2420
+ "step": 2420
2421
+ },
2422
+ {
2423
+ "loss": 0.6075,
2424
+ "grad_norm": 0.35784080624580383,
2425
+ "learning_rate": 7.777677471071614e-06,
2426
+ "entropy": 0.6100872978568077,
2427
+ "num_tokens": 28115648.0,
2428
+ "mean_token_accuracy": 0.8378289490938187,
2429
+ "epoch": 0.8266712025854738,
2430
+ "step": 2430
2431
+ },
2432
+ {
2433
+ "loss": 0.5815,
2434
+ "grad_norm": 0.351990282535553,
2435
+ "learning_rate": 7.484005761062391e-06,
2436
+ "entropy": 0.5808347776532173,
2437
+ "num_tokens": 28222592.0,
2438
+ "mean_token_accuracy": 0.8474200069904327,
2439
+ "epoch": 0.8300731416907637,
2440
+ "step": 2440
2441
+ },
2442
+ {
2443
+ "loss": 0.5889,
2444
+ "grad_norm": 0.3908417224884033,
2445
+ "learning_rate": 7.195536537724429e-06,
2446
+ "entropy": 0.5958179652690887,
2447
+ "num_tokens": 28325239.0,
2448
+ "mean_token_accuracy": 0.845181006193161,
2449
+ "epoch": 0.8334750807960537,
2450
+ "step": 2450
2451
+ },
2452
+ {
2453
+ "loss": 0.5846,
2454
+ "grad_norm": 0.38566815853118896,
2455
+ "learning_rate": 6.912305099709831e-06,
2456
+ "entropy": 0.5907109215855598,
2457
+ "num_tokens": 28445952.0,
2458
+ "mean_token_accuracy": 0.84410340487957,
2459
+ "epoch": 0.8368770199013438,
2460
+ "step": 2460
2461
+ },
2462
+ {
2463
+ "loss": 0.5876,
2464
+ "grad_norm": 0.4085163474082947,
2465
+ "learning_rate": 6.634346104746997e-06,
2466
+ "entropy": 0.5830874457955361,
2467
+ "num_tokens": 28554611.0,
2468
+ "mean_token_accuracy": 0.8452066659927369,
2469
+ "epoch": 0.8402789590066337,
2470
+ "step": 2470
2471
+ },
2472
+ {
2473
+ "loss": 0.5575,
2474
+ "grad_norm": 0.352285236120224,
2475
+ "learning_rate": 6.36169356539974e-06,
2476
+ "entropy": 0.5569544598460198,
2477
+ "num_tokens": 28659685.0,
2478
+ "mean_token_accuracy": 0.851721253991127,
2479
+ "epoch": 0.8436808981119238,
2480
+ "step": 2480
2481
+ },
2482
+ {
2483
+ "loss": 0.5667,
2484
+ "grad_norm": 0.365980863571167,
2485
+ "learning_rate": 6.094380844905278e-06,
2486
+ "entropy": 0.5703560188412666,
2487
+ "num_tokens": 28768833.0,
2488
+ "mean_token_accuracy": 0.8493399202823639,
2489
+ "epoch": 0.8470828372172138,
2490
+ "step": 2490
2491
+ },
2492
+ {
2493
+ "loss": 0.6135,
2494
+ "grad_norm": 0.4027390480041504,
2495
+ "learning_rate": 5.832440653091775e-06,
2496
+ "entropy": 0.6129578873515129,
2497
+ "num_tokens": 28885843.0,
2498
+ "mean_token_accuracy": 0.8367623895406723,
2499
+ "epoch": 0.8504847763225039,
2500
+ "step": 2500
2501
+ },
2502
+ {
2503
+ "loss": 0.559,
2504
+ "grad_norm": 0.37476834654808044,
2505
+ "learning_rate": 5.575905042375751e-06,
2506
+ "entropy": 0.5681184902787209,
2507
+ "num_tokens": 29003923.0,
2508
+ "mean_token_accuracy": 0.8465311348438262,
2509
+ "epoch": 0.8538867154277938,
2510
+ "step": 2510
2511
+ },
2512
+ {
2513
+ "loss": 0.5571,
2514
+ "grad_norm": 0.3746230900287628,
2515
+ "learning_rate": 5.324805403840006e-06,
2516
+ "entropy": 0.5609279617667198,
2517
+ "num_tokens": 29114669.0,
2518
+ "mean_token_accuracy": 0.8491218000650406,
2519
+ "epoch": 0.8572886545330839,
2520
+ "step": 2520
2521
+ },
2522
+ {
2523
+ "loss": 0.6017,
2524
+ "grad_norm": 0.3442933261394501,
2525
+ "learning_rate": 5.079172463392434e-06,
2526
+ "entropy": 0.6076221778988838,
2527
+ "num_tokens": 29237873.0,
2528
+ "mean_token_accuracy": 0.8369300574064255,
2529
+ "epoch": 0.8606905936383739,
2530
+ "step": 2530
2531
+ },
2532
+ {
2533
+ "loss": 0.5838,
2534
+ "grad_norm": 0.34637150168418884,
2535
+ "learning_rate": 4.839036278006215e-06,
2536
+ "entropy": 0.5847443833947181,
2537
+ "num_tokens": 29356084.0,
2538
+ "mean_token_accuracy": 0.8435145944356919,
2539
+ "epoch": 0.8640925327436639,
2540
+ "step": 2540
2541
+ },
2542
+ {
2543
+ "loss": 0.5735,
2544
+ "grad_norm": 0.3903183937072754,
2545
+ "learning_rate": 4.6044262320418915e-06,
2546
+ "entropy": 0.5809785157442093,
2547
+ "num_tokens": 29466199.0,
2548
+ "mean_token_accuracy": 0.8461235493421555,
2549
+ "epoch": 0.8674944718489539,
2550
+ "step": 2550
2551
+ },
2552
+ {
2553
+ "loss": 0.5468,
2554
+ "grad_norm": 0.38433846831321716,
2555
+ "learning_rate": 4.375371033651754e-06,
2556
+ "entropy": 0.5522323742508888,
2557
+ "num_tokens": 29585003.0,
2558
+ "mean_token_accuracy": 0.8522881597280503,
2559
+ "epoch": 0.870896410954244,
2560
+ "step": 2560
2561
+ },
2562
+ {
2563
+ "loss": 0.5747,
2564
+ "grad_norm": 0.38316574692726135,
2565
+ "learning_rate": 4.151898711266932e-06,
2566
+ "entropy": 0.5752411767840385,
2567
+ "num_tokens": 29697951.0,
2568
+ "mean_token_accuracy": 0.8458108186721802,
2569
+ "epoch": 0.8742983500595339,
2570
+ "step": 2570
2571
+ },
2572
+ {
2573
+ "loss": 0.5561,
2574
+ "grad_norm": 0.37862345576286316,
2575
+ "learning_rate": 3.934036610167696e-06,
2576
+ "entropy": 0.5635033801198006,
2577
+ "num_tokens": 29814806.0,
2578
+ "mean_token_accuracy": 0.8486339598894119,
2579
+ "epoch": 0.8777002891648239,
2580
+ "step": 2580
2581
+ },
2582
+ {
2583
+ "loss": 0.5879,
2584
+ "grad_norm": 0.40011560916900635,
2585
+ "learning_rate": 3.721811389137353e-06,
2586
+ "entropy": 0.5901739224791527,
2587
+ "num_tokens": 29938190.0,
2588
+ "mean_token_accuracy": 0.8400143891572952,
2589
+ "epoch": 0.881102228270114,
2590
+ "step": 2590
2591
+ },
2592
+ {
2593
+ "loss": 0.5741,
2594
+ "grad_norm": 0.36540430784225464,
2595
+ "learning_rate": 3.5152490172001116e-06,
2596
+ "entropy": 0.5728347212076187,
2597
+ "num_tokens": 30057305.0,
2598
+ "mean_token_accuracy": 0.8454901963472367,
2599
+ "epoch": 0.8845041673754039,
2600
+ "step": 2600
2601
+ },
2602
+ {
2603
+ "loss": 0.6221,
2604
+ "grad_norm": 0.3713492751121521,
2605
+ "learning_rate": 3.314374770443379e-06,
2606
+ "entropy": 0.6258899614214897,
2607
+ "num_tokens": 30172939.0,
2608
+ "mean_token_accuracy": 0.8357624322175979,
2609
+ "epoch": 0.887906106480694,
2610
+ "step": 2610
2611
+ },
2612
+ {
2613
+ "loss": 0.5946,
2614
+ "grad_norm": 0.3445519208908081,
2615
+ "learning_rate": 3.1192132289248497e-06,
2616
+ "entropy": 0.596692156791687,
2617
+ "num_tokens": 30300574.0,
2618
+ "mean_token_accuracy": 0.8411296546459198,
2619
+ "epoch": 0.891308045585984,
2620
+ "step": 2620
2621
+ },
2622
+ {
2623
+ "loss": 0.5774,
2624
+ "grad_norm": 0.37687185406684875,
2625
+ "learning_rate": 2.9297882736647574e-06,
2626
+ "entropy": 0.5851211875677109,
2627
+ "num_tokens": 30411366.0,
2628
+ "mean_token_accuracy": 0.8454822093248368,
2629
+ "epoch": 0.894709984691274,
2630
+ "step": 2630
2631
+ },
2632
+ {
2633
+ "loss": 0.6138,
2634
+ "grad_norm": 0.3559032678604126,
2635
+ "learning_rate": 2.746123083723656e-06,
2636
+ "entropy": 0.612051984667778,
2637
+ "num_tokens": 30533680.0,
2638
+ "mean_token_accuracy": 0.8369055807590484,
2639
+ "epoch": 0.898111923796564,
2640
+ "step": 2640
2641
+ },
2642
+ {
2643
+ "loss": 0.5639,
2644
+ "grad_norm": 0.37055504322052,
2645
+ "learning_rate": 2.5682401333661067e-06,
2646
+ "entropy": 0.5725692048668861,
2647
+ "num_tokens": 30643544.0,
2648
+ "mean_token_accuracy": 0.8459548681974411,
2649
+ "epoch": 0.9015138629018541,
2650
+ "step": 2650
2651
+ },
2652
+ {
2653
+ "loss": 0.5329,
2654
+ "grad_norm": 0.3881671130657196,
2655
+ "learning_rate": 2.396161189310603e-06,
2656
+ "entropy": 0.5357650727033615,
2657
+ "num_tokens": 30753201.0,
2658
+ "mean_token_accuracy": 0.8559219360351562,
2659
+ "epoch": 0.904915802007144,
2660
+ "step": 2660
2661
+ },
2662
+ {
2663
+ "loss": 0.5647,
2664
+ "grad_norm": 0.3779488205909729,
2665
+ "learning_rate": 2.2299073080660926e-06,
2666
+ "entropy": 0.5651383563876152,
2667
+ "num_tokens": 30866789.0,
2668
+ "mean_token_accuracy": 0.8465316832065582,
2669
+ "epoch": 0.9083177411124341,
2670
+ "step": 2670
2671
+ },
2672
+ {
2673
+ "loss": 0.5846,
2674
+ "grad_norm": 0.3553813397884369,
2675
+ "learning_rate": 2.069498833355371e-06,
2676
+ "entropy": 0.5878252282738685,
2677
+ "num_tokens": 30980359.0,
2678
+ "mean_token_accuracy": 0.846244278550148,
2679
+ "epoch": 0.9117196802177241,
2680
+ "step": 2680
2681
+ },
2682
+ {
2683
+ "loss": 0.5718,
2684
+ "grad_norm": 0.36092498898506165,
2685
+ "learning_rate": 1.914955393625717e-06,
2686
+ "entropy": 0.5731380596756935,
2687
+ "num_tokens": 31096112.0,
2688
+ "mean_token_accuracy": 0.8443791717290878,
2689
+ "epoch": 0.9151216193230142,
2690
+ "step": 2690
2691
+ },
2692
+ {
2693
+ "loss": 0.5533,
2694
+ "grad_norm": 0.39117369055747986,
2695
+ "learning_rate": 1.7662958996470635e-06,
2696
+ "entropy": 0.5589064419269562,
2697
+ "num_tokens": 31211039.0,
2698
+ "mean_token_accuracy": 0.8504908055067062,
2699
+ "epoch": 0.9185235584283041,
2700
+ "step": 2700
2701
+ },
2702
+ {
2703
+ "loss": 0.5676,
2704
+ "grad_norm": 0.37049877643585205,
2705
+ "learning_rate": 1.6235385421979555e-06,
2706
+ "entropy": 0.5745102554559708,
2707
+ "num_tokens": 31323209.0,
2708
+ "mean_token_accuracy": 0.8450854837894439,
2709
+ "epoch": 0.9219254975335941,
2710
+ "step": 2710
2711
+ },
2712
+ {
2713
+ "loss": 0.5449,
2714
+ "grad_norm": 0.36490964889526367,
2715
+ "learning_rate": 1.4867007898396457e-06,
2716
+ "entropy": 0.5510489344596863,
2717
+ "num_tokens": 31436438.0,
2718
+ "mean_token_accuracy": 0.8506697177886963,
2719
+ "epoch": 0.9253274366388842,
2720
+ "step": 2720
2721
+ },
2722
+ {
2723
+ "loss": 0.5585,
2724
+ "grad_norm": 0.3628520965576172,
2725
+ "learning_rate": 1.3557993867785335e-06,
2726
+ "entropy": 0.5683807358145714,
2727
+ "num_tokens": 31553624.0,
2728
+ "mean_token_accuracy": 0.8480296969413758,
2729
+ "epoch": 0.9287293757441741,
2730
+ "step": 2730
2731
+ },
2732
+ {
2733
+ "loss": 0.5956,
2734
+ "grad_norm": 0.34894460439682007,
2735
+ "learning_rate": 1.230850350817253e-06,
2736
+ "entropy": 0.6040398806333542,
2737
+ "num_tokens": 31669977.0,
2738
+ "mean_token_accuracy": 0.8400009006261826,
2739
+ "epoch": 0.9321313148494642,
2740
+ "step": 2740
2741
+ },
2742
+ {
2743
+ "loss": 0.5967,
2744
+ "grad_norm": 0.3939131200313568,
2745
+ "learning_rate": 1.1118689713946529e-06,
2746
+ "entropy": 0.5958102196455002,
2747
+ "num_tokens": 31790332.0,
2748
+ "mean_token_accuracy": 0.8398540407419205,
2749
+ "epoch": 0.9355332539547542,
2750
+ "step": 2750
2751
+ },
2752
+ {
2753
+ "loss": 0.5696,
2754
+ "grad_norm": 0.38736289739608765,
2755
+ "learning_rate": 9.98869807714914e-07,
2756
+ "entropy": 0.573664478957653,
2757
+ "num_tokens": 31897948.0,
2758
+ "mean_token_accuracy": 0.8483406931161881,
2759
+ "epoch": 0.9389351930600442,
2760
+ "step": 2760
2761
+ },
2762
+ {
2763
+ "loss": 0.6001,
2764
+ "grad_norm": 0.35055264830589294,
2765
+ "learning_rate": 8.918666869659642e-07,
2766
+ "entropy": 0.6001556605100632,
2767
+ "num_tokens": 32026504.0,
2768
+ "mean_token_accuracy": 0.8378526538610458,
2769
+ "epoch": 0.9423371321653342,
2770
+ "step": 2770
2771
+ },
2772
+ {
2773
+ "loss": 0.5759,
2774
+ "grad_norm": 0.39700019359588623,
2775
+ "learning_rate": 7.908727026275597e-07,
2776
+ "entropy": 0.5858756378293037,
2777
+ "num_tokens": 32140548.0,
2778
+ "mean_token_accuracy": 0.8406315892934799,
2779
+ "epoch": 0.9457390712706243,
2780
+ "step": 2780
2781
+ },
2782
+ {
2783
+ "loss": 0.5991,
2784
+ "grad_norm": 0.39454156160354614,
2785
+ "learning_rate": 6.959002128690606e-07,
2786
+ "entropy": 0.5967293009161949,
2787
+ "num_tokens": 32258820.0,
2788
+ "mean_token_accuracy": 0.8434277713298798,
2789
+ "epoch": 0.9491410103759143,
2790
+ "step": 2790
2791
+ },
2792
+ {
2793
+ "loss": 0.5897,
2794
+ "grad_norm": 0.42352262139320374,
2795
+ "learning_rate": 6.069608390372294e-07,
2796
+ "entropy": 0.5910201713442802,
2797
+ "num_tokens": 32370310.0,
2798
+ "mean_token_accuracy": 0.8437776923179626,
2799
+ "epoch": 0.9525429494812043,
2800
+ "step": 2800
2801
+ },
2802
+ {
2803
+ "loss": 0.5704,
2804
+ "grad_norm": 0.33332422375679016,
2805
+ "learning_rate": 5.240654642341913e-07,
2806
+ "entropy": 0.5700356841087342,
2807
+ "num_tokens": 32488521.0,
2808
+ "mean_token_accuracy": 0.8457501322031021,
2809
+ "epoch": 0.9559448885864943,
2810
+ "step": 2810
2811
+ },
2812
+ {
2813
+ "loss": 0.5863,
2814
+ "grad_norm": 0.3759448826313019,
2815
+ "learning_rate": 4.4722423198568785e-07,
2816
+ "entropy": 0.5873485445976258,
2817
+ "num_tokens": 32604586.0,
2818
+ "mean_token_accuracy": 0.8435898423194885,
2819
+ "epoch": 0.9593468276917844,
2820
+ "step": 2820
2821
+ },
2822
+ {
2823
+ "loss": 0.5712,
2824
+ "grad_norm": 0.3626145124435425,
2825
+ "learning_rate": 3.764465449999033e-07,
2826
+ "entropy": 0.5771880641579628,
2827
+ "num_tokens": 32720534.0,
2828
+ "mean_token_accuracy": 0.8457781076431274,
2829
+ "epoch": 0.9627487667970743,
2830
+ "step": 2830
2831
+ },
2832
+ {
2833
+ "loss": 0.5665,
2834
+ "grad_norm": 0.3542146384716034,
2835
+ "learning_rate": 3.1174106401686274e-07,
2836
+ "entropy": 0.5678144678473472,
2837
+ "num_tokens": 32842877.0,
2838
+ "mean_token_accuracy": 0.8448899626731873,
2839
+ "epoch": 0.9661507059023643,
2840
+ "step": 2840
2841
+ },
2842
+ {
2843
+ "loss": 0.5471,
2844
+ "grad_norm": 0.33239731192588806,
2845
+ "learning_rate": 2.5311570674867444e-07,
2846
+ "entropy": 0.5530853852629661,
2847
+ "num_tokens": 32951936.0,
2848
+ "mean_token_accuracy": 0.850860059261322,
2849
+ "epoch": 0.9695526450076544,
2850
+ "step": 2850
2851
+ },
2852
+ {
2853
+ "loss": 0.5992,
2854
+ "grad_norm": 0.3345488905906677,
2855
+ "learning_rate": 2.0057764691067127e-07,
2856
+ "entropy": 0.5983069583773613,
2857
+ "num_tokens": 33071421.0,
2858
+ "mean_token_accuracy": 0.839891204237938,
2859
+ "epoch": 0.9729545841129443,
2860
+ "step": 2860
2861
+ },
2862
+ {
2863
+ "loss": 0.5621,
2864
+ "grad_norm": 0.3342396020889282,
2865
+ "learning_rate": 1.5413331334360182e-07,
2866
+ "entropy": 0.5691981092095375,
2867
+ "num_tokens": 33189424.0,
2868
+ "mean_token_accuracy": 0.8471813201904297,
2869
+ "epoch": 0.9763565232182344,
2870
+ "step": 2870
2871
+ },
2872
+ {
2873
+ "loss": 0.6094,
2874
+ "grad_norm": 0.3459620475769043,
2875
+ "learning_rate": 1.1378838922694868e-07,
2876
+ "entropy": 0.6119711980223655,
2877
+ "num_tokens": 33306193.0,
2878
+ "mean_token_accuracy": 0.8373264789581298,
2879
+ "epoch": 0.9797584623235244,
2880
+ "step": 2880
2881
+ },
2882
+ {
2883
+ "loss": 0.5924,
2884
+ "grad_norm": 0.39813292026519775,
2885
+ "learning_rate": 7.954781138351241e-08,
2886
+ "entropy": 0.5946994125843048,
2887
+ "num_tokens": 33423283.0,
2888
+ "mean_token_accuracy": 0.8413377672433853,
2889
+ "epoch": 0.9831604014288144,
2890
+ "step": 2890
2891
+ },
2892
+ {
2893
+ "loss": 0.559,
2894
+ "grad_norm": 0.38787519931793213,
2895
+ "learning_rate": 5.141576967533368e-08,
2896
+ "entropy": 0.5609085857868195,
2897
+ "num_tokens": 33538638.0,
2898
+ "mean_token_accuracy": 0.8481513082981109,
2899
+ "epoch": 0.9865623405341044,
2900
+ "step": 2900
2901
+ },
2902
+ {
2903
+ "loss": 0.5985,
2904
+ "grad_norm": 0.4220837354660034,
2905
+ "learning_rate": 2.9395706490958907e-08,
2906
+ "entropy": 0.5978995203971863,
2907
+ "num_tokens": 33653798.0,
2908
+ "mean_token_accuracy": 0.8422107011079788,
2909
+ "epoch": 0.9899642796393945,
2910
+ "step": 2910
2911
+ },
2912
+ {
2913
+ "loss": 0.5694,
2914
+ "grad_norm": 0.3267785906791687,
2915
+ "learning_rate": 1.3490316324249463e-08,
2916
+ "entropy": 0.5705890819430351,
2917
+ "num_tokens": 33766910.0,
2918
+ "mean_token_accuracy": 0.8483996599912643,
2919
+ "epoch": 0.9933662187446844,
2920
+ "step": 2920
2921
+ },
2922
+ {
2923
+ "loss": 0.5656,
2924
+ "grad_norm": 0.38315078616142273,
2925
+ "learning_rate": 3.7015454446398536e-09,
2926
+ "entropy": 0.563205997645855,
2927
+ "num_tokens": 33880183.0,
2928
+ "mean_token_accuracy": 0.8477971255779266,
2929
+ "epoch": 0.9967681578499745,
2930
+ "step": 2930
2931
+ },
2932
+ {
2933
+ "loss": 0.565,
2934
+ "grad_norm": 0.5500030517578125,
2935
+ "learning_rate": 3.059165900598515e-11,
2936
+ "entropy": 0.5684452543133184,
2937
+ "num_tokens": 33988110.0,
2938
+ "mean_token_accuracy": 0.8478908413334897,
2939
+ "epoch": 1.0,
2940
+ "step": 2940
2941
+ },
2942
+ {
2943
+ "train_runtime": 22288.508,
2944
+ "train_samples_per_second": 4.22,
2945
+ "train_steps_per_second": 0.132,
2946
+ "total_flos": 2.46979541812514e+19,
2947
+ "train_loss": 0.6280506309197873,
2948
+ "epoch": 1.0,
2949
+ "step": 2940
2950
+ }
2951
+ ]
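
The log above mixes two kinds of records: per-step entries (keyed by `loss`, logged every 10 steps) and one closing summary (keyed by `train_runtime`). Below is a minimal sketch of how a reader might separate them and sanity-check the reported throughput. `split_log` is a hypothetical helper written for this illustration, not part of any library, and the two sample records are copied from the last two entries of the log.

```python
import json


def split_log(entries):
    """Separate per-step records from the final run summary (hypothetical helper)."""
    # Per-step records carry a "loss" key; the summary carries "train_runtime".
    steps = [e for e in entries if "loss" in e]
    summaries = [e for e in entries if "train_runtime" in e]
    return steps, (summaries[-1] if summaries else None)


# The last per-step record and the run summary, copied from the log above.
sample = json.loads("""
[
  {"loss": 0.565, "grad_norm": 0.5500030517578125,
   "learning_rate": 3.059165900598515e-11, "entropy": 0.5684452543133184,
   "num_tokens": 33988110.0, "mean_token_accuracy": 0.8478908413334897,
   "epoch": 1.0, "step": 2940},
  {"train_runtime": 22288.508, "train_samples_per_second": 4.22,
   "train_steps_per_second": 0.132, "total_flos": 2.46979541812514e+19,
   "train_loss": 0.6280506309197873, "epoch": 1.0, "step": 2940}
]
""")

steps, summary = split_log(sample)

# Recompute throughput from the raw counters and compare with the logged value.
steps_per_second = summary["step"] / summary["train_runtime"]
tokens_per_second = steps[-1]["num_tokens"] / summary["train_runtime"]

print(round(steps_per_second, 3), summary["train_steps_per_second"])
print(round(tokens_per_second, 1))
```

To run this against the full file, strip the diff markers and gutter numbers first and pass the parsed list to `split_log`. Assuming `train_loss` is an average over the whole run, it would make sense that the summary reports about 0.628 while the late per-step losses sit nearer 0.53 to 0.64.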