mikeumus-divincian committed on
Commit c2b5063 · verified · 1 Parent(s): 2637e2e

Add phase1_moe_svd.json
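
The added file is a per-layer report keyed by layer index: dense layers carry a `dense` block, and MoE layers carry `routed_experts`, `shared_expert`, and `router` blocks. A minimal sketch of how such a report might be flattened for inspection follows. The function name `summarize_layers` and the output column names are hypothetical, and the two sample entries are copied from layers 0 and 1 of the diff below. The commit does not define what `var64` or `s0` measure, so the sketch only passes the values through.

```python
import json


def summarize_layers(report: dict) -> list[dict]:
    """Flatten the per-layer report into one row per layer (hypothetical helper)."""
    rows = []
    # Keys are layer indices stored as strings, so sort them numerically.
    for key in sorted(report, key=int):
        entry = report[key]
        if entry.get("is_moe"):
            rows.append({
                "layer": entry["layer"],
                "kind": "moe",
                "expert_median_var64": entry["routed_experts"]["median_var64"],
                "shared_var64": entry["shared_expert"]["var64"],
                "router_var64": entry["router"]["var64"],
            })
        else:
            rows.append({
                "layer": entry["layer"],
                "kind": "dense",
                "dense_var64": entry["dense"]["var64"],
            })
    return rows


# Two entries copied from the diff (layers 0 and 1), used as sample input.
sample = json.loads("""
{
  "0": {"layer": 0, "is_moe": false,
        "dense": {"var64": 0.0373, "s0": 10.7, "shape": [7168, 18432]}},
  "1": {"layer": 1, "is_moe": true,
        "routed_experts": {"median_var64": 0.0824, "q25_var64": 0.0758,
                           "q75_var64": 0.0866, "mean_s0": 4.69,
                           "std_s0": 0.95, "mean_s0_ratio": 1.11,
                           "n_experts": 384},
        "shared_expert": {"var64": 0.1127, "s0": 8.05, "shape": [7168, 2048]},
        "router": {"var64": 0.5682, "s0": 26.89, "s0_s1": 1.25,
                   "shape": [384, 7168]}}
}
""")

rows = summarize_layers(sample)
for row in rows:
    print(row)
```

To run this against the full file, `json.load(open("phase1_moe_svd.json"))` would replace the inline sample, assuming the file is available locally under that name.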

Files changed (1)
  1. phase1_moe_svd.json +1814 -0
phase1_moe_svd.json ADDED
@@ -0,0 +1,1814 @@
1
+ {
2
+ "0": {
3
+ "layer": 0,
4
+ "is_moe": false,
5
+ "dense": {
6
+ "var64": 0.0373,
7
+ "s0": 10.7,
8
+ "shape": [
9
+ 7168,
10
+ 18432
11
+ ]
12
+ }
13
+ },
14
+ "1": {
15
+ "layer": 1,
16
+ "is_moe": true,
17
+ "routed_experts": {
18
+ "median_var64": 0.0824,
19
+ "q25_var64": 0.0758,
20
+ "q75_var64": 0.0866,
21
+ "mean_s0": 4.69,
22
+ "std_s0": 0.95,
23
+ "mean_s0_ratio": 1.11,
24
+ "n_experts": 384
25
+ },
26
+ "shared_expert": {
27
+ "var64": 0.1127,
28
+ "s0": 8.05,
29
+ "shape": [
30
+ 7168,
31
+ 2048
32
+ ]
33
+ },
34
+ "router": {
35
+ "var64": 0.5682,
36
+ "s0": 26.89,
37
+ "s0_s1": 1.25,
38
+ "shape": [
39
+ 384,
40
+ 7168
41
+ ]
42
+ }
43
+ },
44
+ "2": {
45
+ "layer": 2,
46
+ "is_moe": true,
47
+ "routed_experts": {
48
+ "median_var64": 0.0883,
49
+ "q25_var64": 0.0827,
50
+ "q75_var64": 0.0918,
51
+ "mean_s0": 5.09,
52
+ "std_s0": 0.81,
53
+ "mean_s0_ratio": 1.14,
54
+ "n_experts": 384
55
+ },
56
+ "shared_expert": {
57
+ "var64": 0.1041,
58
+ "s0": 7.36,
59
+ "shape": [
60
+ 7168,
61
+ 2048
62
+ ]
63
+ },
64
+ "router": {
65
+ "var64": 0.393,
66
+ "s0": 14.9,
67
+ "s0_s1": 1.0,
68
+ "shape": [
69
+ 384,
70
+ 7168
71
+ ]
72
+ }
73
+ },
74
+ "3": {
75
+ "layer": 3,
76
+ "is_moe": true,
77
+ "routed_experts": {
78
+ "median_var64": 0.0901,
79
+ "q25_var64": 0.0868,
80
+ "q75_var64": 0.0934,
81
+ "mean_s0": 5.05,
82
+ "std_s0": 0.61,
83
+ "mean_s0_ratio": 1.12,
84
+ "n_experts": 384
85
+ },
86
+ "shared_expert": {
87
+ "var64": 0.127,
88
+ "s0": 7.47,
89
+ "shape": [
90
+ 7168,
91
+ 2048
92
+ ]
93
+ },
94
+ "router": {
95
+ "var64": 0.3403,
96
+ "s0": 15.09,
97
+ "s0_s1": 1.06,
98
+ "shape": [
99
+ 384,
100
+ 7168
101
+ ]
102
+ }
103
+ },
104
+ "4": {
105
+ "layer": 4,
106
+ "is_moe": true,
107
+ "routed_experts": {
108
+ "median_var64": 0.0906,
109
+ "q25_var64": 0.0869,
110
+ "q75_var64": 0.0966,
111
+ "mean_s0": 5.3,
112
+ "std_s0": 0.68,
113
+ "mean_s0_ratio": 1.15,
114
+ "n_experts": 384
115
+ },
116
+ "shared_expert": {
117
+ "var64": 0.1212,
118
+ "s0": 6.56,
119
+ "shape": [
120
+ 7168,
121
+ 2048
122
+ ]
123
+ },
124
+ "router": {
125
+ "var64": 0.35,
126
+ "s0": 15.06,
127
+ "s0_s1": 1.08,
128
+ "shape": [
129
+ 384,
130
+ 7168
131
+ ]
132
+ }
133
+ },
134
+ "5": {
135
+ "layer": 5,
136
+ "is_moe": true,
137
+ "routed_experts": {
138
+ "median_var64": 0.0888,
139
+ "q25_var64": 0.0849,
140
+ "q75_var64": 0.0935,
141
+ "mean_s0": 5.25,
142
+ "std_s0": 0.71,
143
+ "mean_s0_ratio": 1.15,
144
+ "n_experts": 384
145
+ },
146
+ "shared_expert": {
147
+ "var64": 0.1513,
148
+ "s0": 7.33,
149
+ "shape": [
150
+ 7168,
151
+ 2048
152
+ ]
153
+ },
154
+ "router": {
155
+ "var64": 0.3367,
156
+ "s0": 15.28,
157
+ "s0_s1": 1.21,
158
+ "shape": [
159
+ 384,
160
+ 7168
161
+ ]
162
+ }
163
+ },
164
+ "6": {
165
+ "layer": 6,
166
+ "is_moe": true,
167
+ "routed_experts": {
168
+ "median_var64": 0.0865,
169
+ "q25_var64": 0.083,
170
+ "q75_var64": 0.0911,
171
+ "mean_s0": 5.0,
172
+ "std_s0": 0.67,
173
+ "mean_s0_ratio": 1.15,
174
+ "n_experts": 384
175
+ },
176
+ "shared_expert": {
177
+ "var64": 0.1625,
178
+ "s0": 7.12,
179
+ "shape": [
180
+ 7168,
181
+ 2048
182
+ ]
183
+ },
184
+ "router": {
185
+ "var64": 0.3258,
186
+ "s0": 13.32,
187
+ "s0_s1": 1.08,
188
+ "shape": [
189
+ 384,
190
+ 7168
191
+ ]
192
+ }
193
+ },
194
+ "7": {
195
+ "layer": 7,
196
+ "is_moe": true,
197
+ "routed_experts": {
198
+ "median_var64": 0.0885,
199
+ "q25_var64": 0.0845,
200
+ "q75_var64": 0.0929,
201
+ "mean_s0": 5.11,
202
+ "std_s0": 0.66,
203
+ "mean_s0_ratio": 1.15,
204
+ "n_experts": 384
205
+ },
206
+ "shared_expert": {
207
+ "var64": 0.1665,
208
+ "s0": 6.61,
209
+ "shape": [
210
+ 7168,
211
+ 2048
212
+ ]
213
+ },
214
+ "router": {
215
+ "var64": 0.3163,
216
+ "s0": 12.09,
217
+ "s0_s1": 1.05,
218
+ "shape": [
219
+ 384,
220
+ 7168
221
+ ]
222
+ }
223
+ },
224
+ "8": {
225
+ "layer": 8,
226
+ "is_moe": true,
227
+ "routed_experts": {
228
+ "median_var64": 0.0898,
229
+ "q25_var64": 0.085,
230
+ "q75_var64": 0.0943,
231
+ "mean_s0": 5.17,
232
+ "std_s0": 0.66,
233
+ "mean_s0_ratio": 1.16,
234
+ "n_experts": 384
235
+ },
236
+ "shared_expert": {
237
+ "var64": 0.1639,
238
+ "s0": 6.76,
239
+ "shape": [
240
+ 7168,
241
+ 2048
242
+ ]
243
+ },
244
+ "router": {
245
+ "var64": 0.3089,
246
+ "s0": 12.21,
247
+ "s0_s1": 1.1,
248
+ "shape": [
249
+ 384,
250
+ 7168
251
+ ]
252
+ }
253
+ },
254
+ "9": {
255
+ "layer": 9,
256
+ "is_moe": true,
257
+ "routed_experts": {
258
+ "median_var64": 0.0912,
259
+ "q25_var64": 0.0867,
260
+ "q75_var64": 0.0967,
261
+ "mean_s0": 5.28,
262
+ "std_s0": 0.71,
263
+ "mean_s0_ratio": 1.17,
264
+ "n_experts": 384
265
+ },
266
+ "shared_expert": {
267
+ "var64": 0.1638,
268
+ "s0": 6.9,
269
+ "shape": [
270
+ 7168,
271
+ 2048
272
+ ]
273
+ },
274
+ "router": {
275
+ "var64": 0.3083,
276
+ "s0": 12.38,
277
+ "s0_s1": 1.1,
278
+ "shape": [
279
+ 384,
280
+ 7168
281
+ ]
282
+ }
283
+ },
284
+ "10": {
285
+ "layer": 10,
286
+ "is_moe": true,
287
+ "routed_experts": {
288
+ "median_var64": 0.0919,
289
+ "q25_var64": 0.0873,
290
+ "q75_var64": 0.0985,
291
+ "mean_s0": 5.42,
292
+ "std_s0": 0.8,
293
+ "mean_s0_ratio": 1.18,
294
+ "n_experts": 384
295
+ },
296
+ "shared_expert": {
297
+ "var64": 0.1611,
298
+ "s0": 6.26,
299
+ "shape": [
300
+ 7168,
301
+ 2048
302
+ ]
303
+ },
304
+ "router": {
305
+ "var64": 0.3106,
306
+ "s0": 11.63,
307
+ "s0_s1": 1.08,
308
+ "shape": [
309
+ 384,
310
+ 7168
311
+ ]
312
+ }
313
+ },
314
+ "11": {
315
+ "layer": 11,
316
+ "is_moe": true,
317
+ "routed_experts": {
318
+ "median_var64": 0.0946,
319
+ "q25_var64": 0.0885,
320
+ "q75_var64": 0.1006,
321
+ "mean_s0": 5.56,
322
+ "std_s0": 0.89,
323
+ "mean_s0_ratio": 1.19,
324
+ "n_experts": 384
325
+ },
326
+ "shared_expert": {
327
+ "var64": 0.1712,
328
+ "s0": 6.68,
329
+ "shape": [
330
+ 7168,
331
+ 2048
332
+ ]
333
+ },
334
+ "router": {
335
+ "var64": 0.3061,
336
+ "s0": 11.2,
337
+ "s0_s1": 1.04,
338
+ "shape": [
339
+ 384,
340
+ 7168
341
+ ]
342
+ }
343
+ },
344
+ "12": {
345
+ "layer": 12,
346
+ "is_moe": true,
347
+ "routed_experts": {
348
+ "median_var64": 0.098,
349
+ "q25_var64": 0.0915,
350
+ "q75_var64": 0.1053,
351
+ "mean_s0": 5.8,
352
+ "std_s0": 0.91,
353
+ "mean_s0_ratio": 1.2,
354
+ "n_experts": 384
355
+ },
356
+ "shared_expert": {
357
+ "var64": 0.1517,
358
+ "s0": 6.36,
359
+ "shape": [
360
+ 7168,
361
+ 2048
362
+ ]
363
+ },
364
+ "router": {
365
+ "var64": 0.3161,
366
+ "s0": 10.69,
367
+ "s0_s1": 1.04,
368
+ "shape": [
369
+ 384,
370
+ 7168
371
+ ]
372
+ }
373
+ },
374
+ "13": {
375
+ "layer": 13,
376
+ "is_moe": true,
377
+ "routed_experts": {
378
+ "median_var64": 0.1008,
379
+ "q25_var64": 0.0926,
380
+ "q75_var64": 0.1085,
381
+ "mean_s0": 6.03,
382
+ "std_s0": 1.01,
383
+ "mean_s0_ratio": 1.23,
384
+ "n_experts": 384
385
+ },
386
+ "shared_expert": {
387
+ "var64": 0.1701,
388
+ "s0": 7.06,
389
+ "shape": [
390
+ 7168,
391
+ 2048
392
+ ]
393
+ },
394
+ "router": {
395
+ "var64": 0.3245,
396
+ "s0": 10.72,
397
+ "s0_s1": 1.06,
398
+ "shape": [
399
+ 384,
400
+ 7168
401
+ ]
402
+ }
403
+ },
404
+ "14": {
405
+ "layer": 14,
406
+ "is_moe": true,
407
+ "routed_experts": {
408
+ "median_var64": 0.1007,
409
+ "q25_var64": 0.0912,
410
+ "q75_var64": 0.109,
411
+ "mean_s0": 6.11,
412
+ "std_s0": 1.04,
413
+ "mean_s0_ratio": 1.24,
414
+ "n_experts": 384
415
+ },
416
+ "shared_expert": {
417
+ "var64": 0.1624,
418
+ "s0": 6.23,
419
+ "shape": [
420
+ 7168,
421
+ 2048
422
+ ]
423
+ },
424
+ "router": {
425
+ "var64": 0.3245,
426
+ "s0": 10.31,
427
+ "s0_s1": 1.04,
428
+ "shape": [
429
+ 384,
430
+ 7168
431
+ ]
432
+ }
433
+ },
434
+ "15": {
435
+ "layer": 15,
436
+ "is_moe": true,
437
+ "routed_experts": {
438
+ "median_var64": 0.1006,
439
+ "q25_var64": 0.0902,
440
+ "q75_var64": 0.1102,
441
+ "mean_s0": 6.21,
442
+ "std_s0": 1.14,
443
+ "mean_s0_ratio": 1.24,
444
+ "n_experts": 384
445
+ },
446
+ "shared_expert": {
447
+ "var64": 0.1904,
448
+ "s0": 7.38,
449
+ "shape": [
450
+ 7168,
451
+ 2048
452
+ ]
453
+ },
454
+ "router": {
455
+ "var64": 0.3269,
456
+ "s0": 10.12,
457
+ "s0_s1": 1.02,
458
+ "shape": [
459
+ 384,
460
+ 7168
461
+ ]
462
+ }
463
+ },
464
+ "16": {
465
+ "layer": 16,
466
+ "is_moe": true,
467
+ "routed_experts": {
468
+ "median_var64": 0.0989,
469
+ "q25_var64": 0.0897,
470
+ "q75_var64": 0.1117,
471
+ "mean_s0": 6.12,
472
+ "std_s0": 1.13,
473
+ "mean_s0_ratio": 1.23,
474
+ "n_experts": 384
475
+ },
476
+ "shared_expert": {
477
+ "var64": 0.1764,
478
+ "s0": 7.03,
479
+ "shape": [
480
+ 7168,
481
+ 2048
482
+ ]
483
+ },
484
+ "router": {
485
+ "var64": 0.328,
486
+ "s0": 10.11,
487
+ "s0_s1": 1.08,
488
+ "shape": [
489
+ 384,
490
+ 7168
491
+ ]
492
+ }
493
+ },
494
+ "17": {
495
+ "layer": 17,
496
+ "is_moe": true,
497
+ "routed_experts": {
498
+ "median_var64": 0.1036,
499
+ "q25_var64": 0.0912,
500
+ "q75_var64": 0.1152,
501
+ "mean_s0": 6.38,
502
+ "std_s0": 1.24,
503
+ "mean_s0_ratio": 1.24,
504
+ "n_experts": 384
505
+ },
506
+ "shared_expert": {
507
+ "var64": 0.1793,
508
+ "s0": 7.02,
509
+ "shape": [
510
+ 7168,
511
+ 2048
512
+ ]
513
+ },
514
+ "router": {
515
+ "var64": 0.3172,
516
+ "s0": 9.52,
517
+ "s0_s1": 1.07,
518
+ "shape": [
519
+ 384,
520
+ 7168
521
+ ]
522
+ }
523
+ },
524
+ "18": {
525
+ "layer": 18,
526
+ "is_moe": true,
527
+ "routed_experts": {
528
+ "median_var64": 0.1039,
529
+ "q25_var64": 0.0895,
530
+ "q75_var64": 0.1176,
531
+ "mean_s0": 6.36,
532
+ "std_s0": 1.3,
533
+ "mean_s0_ratio": 1.23,
534
+ "n_experts": 384
535
+ },
536
+ "shared_expert": {
537
+ "var64": 0.2005,
538
+ "s0": 7.53,
539
+ "shape": [
540
+ 7168,
541
+ 2048
542
+ ]
543
+ },
544
+ "router": {
545
+ "var64": 0.3238,
546
+ "s0": 9.68,
547
+ "s0_s1": 1.1,
548
+ "shape": [
549
+ 384,
550
+ 7168
551
+ ]
552
+ }
553
+ },
554
+ "19": {
555
+ "layer": 19,
556
+ "is_moe": true,
557
+ "routed_experts": {
558
+ "median_var64": 0.1056,
559
+ "q25_var64": 0.0925,
560
+ "q75_var64": 0.121,
561
+ "mean_s0": 6.44,
562
+ "std_s0": 1.32,
563
+ "mean_s0_ratio": 1.22,
564
+ "n_experts": 384
565
+ },
566
+ "shared_expert": {
567
+ "var64": 0.1877,
568
+ "s0": 7.54,
569
+ "shape": [
570
+ 7168,
571
+ 2048
572
+ ]
573
+ },
574
+ "router": {
575
+ "var64": 0.317,
576
+ "s0": 9.43,
577
+ "s0_s1": 1.12,
578
+ "shape": [
579
+ 384,
580
+ 7168
581
+ ]
582
+ }
583
+ },
584
+ "20": {
585
+ "layer": 20,
586
+ "is_moe": true,
587
+ "routed_experts": {
588
+ "median_var64": 0.1053,
589
+ "q25_var64": 0.0901,
590
+ "q75_var64": 0.122,
591
+ "mean_s0": 6.56,
592
+ "std_s0": 1.42,
593
+ "mean_s0_ratio": 1.25,
594
+ "n_experts": 384
595
+ },
596
+ "shared_expert": {
597
+ "var64": 0.1874,
598
+ "s0": 7.33,
599
+ "shape": [
600
+ 7168,
601
+ 2048
602
+ ]
603
+ },
604
+ "router": {
605
+ "var64": 0.3196,
606
+ "s0": 9.25,
607
+ "s0_s1": 1.14,
608
+ "shape": [
609
+ 384,
610
+ 7168
611
+ ]
612
+ }
613
+ },
614
+ "21": {
615
+ "layer": 21,
616
+ "is_moe": true,
617
+ "routed_experts": {
618
+ "median_var64": 0.107,
619
+ "q25_var64": 0.0927,
620
+ "q75_var64": 0.124,
621
+ "mean_s0": 6.66,
622
+ "std_s0": 1.46,
623
+ "mean_s0_ratio": 1.24,
624
+ "n_experts": 384
625
+ },
626
+ "shared_expert": {
627
+ "var64": 0.1809,
628
+ "s0": 8.29,
629
+ "shape": [
630
+ 7168,
631
+ 2048
632
+ ]
633
+ },
634
+ "router": {
635
+ "var64": 0.3178,
636
+ "s0": 8.62,
637
+ "s0_s1": 1.08,
638
+ "shape": [
639
+ 384,
640
+ 7168
641
+ ]
642
+ }
643
+ },
644
+ "22": {
645
+ "layer": 22,
646
+ "is_moe": true,
647
+ "routed_experts": {
648
+ "median_var64": 0.1082,
649
+ "q25_var64": 0.0912,
650
+ "q75_var64": 0.1254,
651
+ "mean_s0": 6.84,
652
+ "std_s0": 1.6,
653
+ "mean_s0_ratio": 1.24,
654
+ "n_experts": 384
655
+ },
656
+ "shared_expert": {
657
+ "var64": 0.1963,
658
+ "s0": 8.41,
659
+ "shape": [
660
+ 7168,
661
+ 2048
662
+ ]
663
+ },
664
+ "router": {
665
+ "var64": 0.3277,
666
+ "s0": 9.11,
667
+ "s0_s1": 1.2,
668
+ "shape": [
669
+ 384,
670
+ 7168
671
+ ]
672
+ }
673
+ },
674
+ "23": {
675
+ "layer": 23,
676
+ "is_moe": true,
677
+ "routed_experts": {
678
+ "median_var64": 0.105,
679
+ "q25_var64": 0.0857,
680
+ "q75_var64": 0.1226,
681
+ "mean_s0": 6.57,
682
+ "std_s0": 1.63,
683
+ "mean_s0_ratio": 1.23,
684
+ "n_experts": 384
685
+ },
686
+ "shared_expert": {
687
+ "var64": 0.1836,
688
+ "s0": 7.33,
689
+ "shape": [
690
+ 7168,
691
+ 2048
692
+ ]
693
+ },
694
+ "router": {
695
+ "var64": 0.3275,
696
+ "s0": 9.25,
697
+ "s0_s1": 1.25,
698
+ "shape": [
699
+ 384,
700
+ 7168
701
+ ]
702
+ }
703
+ },
704
+ "24": {
705
+ "layer": 24,
706
+ "is_moe": true,
707
+ "routed_experts": {
708
+ "median_var64": 0.1066,
709
+ "q25_var64": 0.0887,
710
+ "q75_var64": 0.1278,
711
+ "mean_s0": 6.81,
712
+ "std_s0": 1.67,
713
+ "mean_s0_ratio": 1.23,
714
+ "n_experts": 384
715
+ },
716
+ "shared_expert": {
717
+ "var64": 0.201,
718
+ "s0": 8.72,
719
+ "shape": [
720
+ 7168,
721
+ 2048
722
+ ]
723
+ },
724
+ "router": {
725
+ "var64": 0.331,
726
+ "s0": 8.97,
727
+ "s0_s1": 1.15,
728
+ "shape": [
729
+ 384,
730
+ 7168
731
+ ]
732
+ }
733
+ },
734
+ "25": {
735
+ "layer": 25,
736
+ "is_moe": true,
737
+ "routed_experts": {
738
+ "median_var64": 0.1016,
739
+ "q25_var64": 0.0864,
740
+ "q75_var64": 0.1222,
741
+ "mean_s0": 6.68,
742
+ "std_s0": 1.76,
743
+ "mean_s0_ratio": 1.23,
744
+ "n_experts": 384
745
+ },
746
+ "shared_expert": {
747
+ "var64": 0.2033,
748
+ "s0": 8.59,
749
+ "shape": [
750
+ 7168,
751
+ 2048
752
+ ]
753
+ },
754
+ "router": {
755
+ "var64": 0.3307,
756
+ "s0": 8.75,
757
+ "s0_s1": 1.09,
758
+ "shape": [
759
+ 384,
760
+ 7168
761
+ ]
762
+ }
763
+ },
764
+ "26": {
765
+ "layer": 26,
766
+ "is_moe": true,
767
+ "routed_experts": {
768
+ "median_var64": 0.1003,
769
+ "q25_var64": 0.0852,
770
+ "q75_var64": 0.1226,
771
+ "mean_s0": 6.65,
772
+ "std_s0": 1.71,
773
+ "mean_s0_ratio": 1.24,
774
+ "n_experts": 384
775
+ },
776
+ "shared_expert": {
777
+ "var64": 0.1798,
778
+ "s0": 7.47,
779
+ "shape": [
780
+ 7168,
781
+ 2048
782
+ ]
783
+ },
784
+ "router": {
785
+ "var64": 0.3257,
786
+ "s0": 8.48,
787
+ "s0_s1": 1.11,
788
+ "shape": [
789
+ 384,
790
+ 7168
791
+ ]
792
+ }
793
+ },
794
+ "27": {
795
+ "layer": 27,
796
+ "is_moe": true,
797
+ "routed_experts": {
798
+ "median_var64": 0.0973,
799
+ "q25_var64": 0.0849,
800
+ "q75_var64": 0.1213,
801
+ "mean_s0": 6.56,
802
+ "std_s0": 1.75,
803
+ "mean_s0_ratio": 1.21,
804
+ "n_experts": 384
805
+ },
806
+ "shared_expert": {
807
+ "var64": 0.2018,
808
+ "s0": 9.15,
809
+ "shape": [
810
+ 7168,
811
+ 2048
812
+ ]
813
+ },
814
+ "router": {
815
+ "var64": 0.3246,
816
+ "s0": 8.54,
817
+ "s0_s1": 1.09,
818
+ "shape": [
819
+ 384,
820
+ 7168
821
+ ]
822
+ }
823
+ },
824
+ "28": {
825
+ "layer": 28,
826
+ "is_moe": true,
827
+ "routed_experts": {
828
+ "median_var64": 0.0938,
829
+ "q25_var64": 0.084,
830
+ "q75_var64": 0.1134,
831
+ "mean_s0": 6.38,
832
+ "std_s0": 1.69,
833
+ "mean_s0_ratio": 1.22,
834
+ "n_experts": 384
835
+ },
836
+ "shared_expert": {
837
+ "var64": 0.1913,
838
+ "s0": 8.05,
839
+ "shape": [
840
+ 7168,
841
+ 2048
842
+ ]
843
+ },
844
+ "router": {
845
+ "var64": 0.3196,
846
+ "s0": 8.99,
847
+ "s0_s1": 1.21,
848
+ "shape": [
849
+ 384,
850
+ 7168
851
+ ]
852
+ }
853
+ },
854
+ "29": {
855
+ "layer": 29,
856
+ "is_moe": true,
857
+ "routed_experts": {
858
+ "median_var64": 0.0923,
859
+ "q25_var64": 0.0816,
860
+ "q75_var64": 0.1099,
861
+ "mean_s0": 6.17,
862
+ "std_s0": 1.64,
863
+ "mean_s0_ratio": 1.2,
864
+ "n_experts": 384
865
+ },
866
+ "shared_expert": {
867
+ "var64": 0.2067,
868
+ "s0": 9.68,
869
+ "shape": [
870
+ 7168,
871
+ 2048
872
+ ]
873
+ },
874
+ "router": {
875
+ "var64": 0.3241,
876
+ "s0": 8.27,
877
+ "s0_s1": 1.11,
878
+ "shape": [
879
+ 384,
880
+ 7168
881
+ ]
882
+ }
883
+ },
884
+ "30": {
885
+ "layer": 30,
886
+ "is_moe": true,
887
+ "routed_experts": {
888
+ "median_var64": 0.0887,
889
+ "q25_var64": 0.0808,
890
+ "q75_var64": 0.1053,
891
+ "mean_s0": 5.97,
892
+ "std_s0": 1.51,
893
+ "mean_s0_ratio": 1.19,
894
+ "n_experts": 384
895
+ },
896
+ "shared_expert": {
897
+ "var64": 0.198,
898
+ "s0": 8.47,
899
+ "shape": [
900
+ 7168,
901
+ 2048
902
+ ]
903
+ },
904
+ "router": {
905
+ "var64": 0.3189,
906
+ "s0": 7.42,
907
+ "s0_s1": 1.03,
908
+ "shape": [
909
+ 384,
910
+ 7168
911
+ ]
912
+ }
913
+ },
914
+ "31": {
915
+ "layer": 31,
916
+ "is_moe": true,
917
+ "routed_experts": {
918
+ "median_var64": 0.0882,
919
+ "q25_var64": 0.0805,
920
+ "q75_var64": 0.1005,
921
+ "mean_s0": 5.87,
922
+ "std_s0": 1.46,
923
+ "mean_s0_ratio": 1.2,
924
+ "n_experts": 384
925
+ },
926
+ "shared_expert": {
927
+ "var64": 0.1928,
928
+ "s0": 7.81,
929
+ "shape": [
930
+ 7168,
931
+ 2048
932
+ ]
933
+ },
934
+ "router": {
935
+ "var64": 0.3221,
936
+ "s0": 7.39,
937
+ "s0_s1": 1.04,
938
+ "shape": [
939
+ 384,
940
+ 7168
941
+ ]
942
+ }
943
+ },
944
+ "32": {
945
+ "layer": 32,
946
+ "is_moe": true,
947
+ "routed_experts": {
948
+ "median_var64": 0.0854,
949
+ "q25_var64": 0.08,
950
+ "q75_var64": 0.099,
951
+ "mean_s0": 5.76,
952
+ "std_s0": 1.42,
953
+ "mean_s0_ratio": 1.2,
954
+ "n_experts": 384
955
+ },
956
+ "shared_expert": {
957
+ "var64": 0.1734,
958
+ "s0": 6.74,
959
+ "shape": [
960
+ 7168,
961
+ 2048
962
+ ]
963
+ },
964
+ "router": {
965
+ "var64": 0.3212,
966
+ "s0": 7.2,
967
+ "s0_s1": 1.01,
968
+ "shape": [
969
+ 384,
970
+ 7168
971
+ ]
972
+ }
973
+ },
974
+ "33": {
975
+ "layer": 33,
976
+ "is_moe": true,
977
+ "routed_experts": {
978
+ "median_var64": 0.0863,
979
+ "q25_var64": 0.0797,
980
+ "q75_var64": 0.0977,
981
+ "mean_s0": 5.69,
982
+ "std_s0": 1.39,
983
+ "mean_s0_ratio": 1.18,
984
+ "n_experts": 384
985
+ },
986
+ "shared_expert": {
987
+ "var64": 0.1844,
988
+ "s0": 7.76,
989
+ "shape": [
990
+ 7168,
991
+ 2048
992
+ ]
993
+ },
994
+ "router": {
995
+ "var64": 0.3126,
996
+ "s0": 6.95,
997
+ "s0_s1": 1.02,
998
+ "shape": [
999
+ 384,
1000
+ 7168
1001
+ ]
1002
+ }
1003
+ },
1004
+ "34": {
1005
+ "layer": 34,
1006
+ "is_moe": true,
1007
+ "routed_experts": {
1008
+ "median_var64": 0.0855,
1009
+ "q25_var64": 0.0797,
1010
+ "q75_var64": 0.0937,
1011
+ "mean_s0": 5.47,
1012
+ "std_s0": 1.27,
1013
+ "mean_s0_ratio": 1.16,
1014
+ "n_experts": 384
1015
+ },
1016
+ "shared_expert": {
1017
+ "var64": 0.1759,
1018
+ "s0": 7.64,
1019
+ "shape": [
1020
+ 7168,
1021
+ 2048
1022
+ ]
1023
+ },
1024
+ "router": {
1025
+ "var64": 0.3174,
1026
+ "s0": 7.07,
1027
+ "s0_s1": 1.04,
1028
+ "shape": [
1029
+ 384,
1030
+ 7168
1031
+ ]
1032
+ }
1033
+ },
1034
+ "35": {
1035
+ "layer": 35,
1036
+ "is_moe": true,
1037
+ "routed_experts": {
1038
+ "median_var64": 0.0835,
1039
+ "q25_var64": 0.0798,
1040
+ "q75_var64": 0.0937,
1041
+ "mean_s0": 5.36,
1042
+ "std_s0": 1.24,
1043
+ "mean_s0_ratio": 1.15,
1044
+ "n_experts": 384
1045
+ },
1046
+ "shared_expert": {
1047
+ "var64": 0.1869,
1048
+ "s0": 8.0,
1049
+ "shape": [
1050
+ 7168,
1051
+ 2048
1052
+ ]
1053
+ },
1054
+ "router": {
1055
+ "var64": 0.3224,
1056
+ "s0": 7.25,
1057
+ "s0_s1": 1.08,
1058
+ "shape": [
1059
+ 384,
1060
+ 7168
1061
+ ]
1062
+ }
1063
+ },
1064
+ "36": {
1065
+ "layer": 36,
1066
+ "is_moe": true,
1067
+ "routed_experts": {
1068
+ "median_var64": 0.0838,
1069
+ "q25_var64": 0.0797,
1070
+ "q75_var64": 0.0944,
1071
+ "mean_s0": 5.35,
1072
+ "std_s0": 1.29,
1073
+ "mean_s0_ratio": 1.15,
1074
+ "n_experts": 384
1075
+ },
1076
+ "shared_expert": {
1077
+ "var64": 0.2021,
1078
+ "s0": 8.27,
1079
+ "shape": [
1080
+ 7168,
1081
+ 2048
1082
+ ]
1083
+ },
1084
+ "router": {
1085
+ "var64": 0.3192,
1086
+ "s0": 7.19,
1087
+ "s0_s1": 1.08,
1088
+ "shape": [
1089
+ 384,
1090
+ 7168
1091
+ ]
1092
+ }
1093
+ },
1094
+ "37": {
1095
+ "layer": 37,
1096
+ "is_moe": true,
1097
+ "routed_experts": {
1098
+ "median_var64": 0.0834,
1099
+ "q25_var64": 0.0794,
1100
+ "q75_var64": 0.0939,
1101
+ "mean_s0": 5.24,
1102
+ "std_s0": 1.25,
1103
+ "mean_s0_ratio": 1.13,
1104
+ "n_experts": 384
1105
+ },
1106
+ "shared_expert": {
1107
+ "var64": 0.1936,
1108
+ "s0": 8.42,
1109
+ "shape": [
1110
+ 7168,
1111
+ 2048
1112
+ ]
1113
+ },
1114
+ "router": {
1115
+ "var64": 0.3227,
1116
+ "s0": 6.63,
1117
+ "s0_s1": 1.02,
1118
+ "shape": [
1119
+ 384,
1120
+ 7168
1121
+ ]
1122
+ }
1123
+ },
1124
+ "38": {
1125
+ "layer": 38,
1126
+ "is_moe": true,
1127
+ "routed_experts": {
1128
+ "median_var64": 0.083,
1129
+ "q25_var64": 0.0789,
1130
+ "q75_var64": 0.0904,
1131
+ "mean_s0": 5.13,
1132
+ "std_s0": 1.17,
1133
+ "mean_s0_ratio": 1.13,
1134
+ "n_experts": 384
1135
+ },
1136
+ "shared_expert": {
1137
+ "var64": 0.1899,
1138
+ "s0": 7.06,
1139
+ "shape": [
1140
+ 7168,
1141
+ 2048
1142
+ ]
1143
+ },
1144
+ "router": {
1145
+ "var64": 0.3145,
1146
+ "s0": 6.21,
1147
+ "s0_s1": 1.06,
1148
+ "shape": [
1149
+ 384,
1150
+ 7168
1151
+ ]
1152
+ }
1153
+ },
1154
+ "39": {
1155
+ "layer": 39,
1156
+ "is_moe": true,
1157
+ "routed_experts": {
1158
+ "median_var64": 0.0823,
1159
+ "q25_var64": 0.079,
1160
+ "q75_var64": 0.0911,
1161
+ "mean_s0": 5.02,
1162
+ "std_s0": 1.16,
1163
+ "mean_s0_ratio": 1.11,
1164
+ "n_experts": 384
1165
+ },
1166
+ "shared_expert": {
1167
+ "var64": 0.1876,
1168
+ "s0": 6.88,
1169
+ "shape": [
1170
+ 7168,
1171
+ 2048
1172
+ ]
1173
+ },
1174
+ "router": {
1175
+ "var64": 0.3198,
1176
+ "s0": 6.36,
1177
+ "s0_s1": 1.07,
1178
+ "shape": [
1179
+ 384,
1180
+ 7168
1181
+ ]
1182
+ }
1183
+ },
1184
+ "40": {
1185
+ "layer": 40,
1186
+ "is_moe": true,
1187
+ "routed_experts": {
1188
+ "median_var64": 0.0823,
1189
+ "q25_var64": 0.0796,
1190
+ "q75_var64": 0.0878,
1191
+ "mean_s0": 4.89,
1192
+ "std_s0": 1.02,
1193
+ "mean_s0_ratio": 1.1,
1194
+ "n_experts": 384
1195
+ },
1196
+ "shared_expert": {
1197
+ "var64": 0.1709,
1198
+ "s0": 6.94,
1199
+ "shape": [
1200
+ 7168,
1201
+ 2048
1202
+ ]
1203
+ },
1204
+ "router": {
1205
+ "var64": 0.3158,
1206
+ "s0": 6.25,
1207
+ "s0_s1": 1.07,
1208
+ "shape": [
1209
+ 384,
1210
+ 7168
1211
+ ]
1212
+ }
1213
+ },
1214
+ "41": {
1215
+ "layer": 41,
1216
+ "is_moe": true,
1217
+ "routed_experts": {
1218
+ "median_var64": 0.0829,
1219
+ "q25_var64": 0.0795,
1220
+ "q75_var64": 0.0903,
1221
+ "mean_s0": 4.91,
1222
+ "std_s0": 1.04,
1223
+ "mean_s0_ratio": 1.1,
1224
+ "n_experts": 384
1225
+ },
1226
+ "shared_expert": {
1227
+ "var64": 0.1735,
1228
+ "s0": 7.57,
1229
+ "shape": [
1230
+ 7168,
1231
+ 2048
1232
+ ]
1233
+ },
1234
+ "router": {
1235
+ "var64": 0.3159,
1236
+ "s0": 6.03,
1237
+ "s0_s1": 1.1,
1238
+ "shape": [
1239
+ 384,
1240
+ 7168
1241
+ ]
1242
+ }
1243
+ },
1244
+ "42": {
1245
+ "layer": 42,
1246
+ "is_moe": true,
1247
+ "routed_experts": {
1248
+ "median_var64": 0.0835,
1249
+ "q25_var64": 0.0802,
1250
+ "q75_var64": 0.089,
1251
+ "mean_s0": 4.93,
1252
+ "std_s0": 1.05,
1253
+ "mean_s0_ratio": 1.1,
1254
+ "n_experts": 384
1255
+ },
1256
+ "shared_expert": {
1257
+ "var64": 0.1951,
1258
+ "s0": 7.85,
1259
+ "shape": [
1260
+ 7168,
1261
+ 2048
1262
+ ]
1263
+ },
1264
+ "router": {
1265
+ "var64": 0.3156,
1266
+ "s0": 5.87,
1267
+ "s0_s1": 1.12,
1268
+ "shape": [
1269
+ 384,
1270
+ 7168
1271
+ ]
1272
+ }
1273
+ },
1274
+ "43": {
1275
+ "layer": 43,
1276
+ "is_moe": true,
1277
+ "routed_experts": {
1278
+ "median_var64": 0.0824,
1279
+ "q25_var64": 0.0786,
1280
+ "q75_var64": 0.0907,
1281
+ "mean_s0": 4.92,
1282
+ "std_s0": 1.05,
1283
+ "mean_s0_ratio": 1.1,
1284
+ "n_experts": 384
1285
+ },
1286
+ "shared_expert": {
1287
+ "var64": 0.1947,
1288
+ "s0": 7.69,
1289
+ "shape": [
1290
+ 7168,
1291
+ 2048
1292
+ ]
1293
+ },
1294
+ "router": {
1295
+ "var64": 0.3135,
1296
+ "s0": 5.64,
1297
+ "s0_s1": 1.09,
1298
+ "shape": [
1299
+ 384,
1300
+ 7168
1301
+ ]
1302
+ }
1303
+ },
1304
+ "44": {
1305
+ "layer": 44,
1306
+ "is_moe": true,
1307
+ "routed_experts": {
1308
+ "median_var64": 0.0827,
1309
+ "q25_var64": 0.0791,
1310
+ "q75_var64": 0.0905,
1311
+ "mean_s0": 5.0,
1312
+ "std_s0": 1.14,
1313
+ "mean_s0_ratio": 1.11,
1314
+ "n_experts": 384
1315
+ },
1316
+ "shared_expert": {
1317
+ "var64": 0.2133,
1318
+ "s0": 8.59,
1319
+ "shape": [
1320
+ 7168,
1321
+ 2048
1322
+ ]
1323
+ },
1324
+ "router": {
1325
+ "var64": 0.3076,
1326
+ "s0": 5.35,
1327
+ "s0_s1": 1.08,
1328
+ "shape": [
1329
+ 384,
1330
+ 7168
1331
+ ]
1332
+ }
1333
+ },
1334
+ "45": {
1335
+ "layer": 45,
1336
+ "is_moe": true,
1337
+ "routed_experts": {
1338
+ "median_var64": 0.0826,
1339
+ "q25_var64": 0.0792,
1340
+ "q75_var64": 0.0883,
1341
+ "mean_s0": 4.85,
1342
+ "std_s0": 1.01,
1343
+ "mean_s0_ratio": 1.1,
1344
+ "n_experts": 384
1345
+ },
1346
+ "shared_expert": {
1347
+ "var64": 0.2006,
1348
+ "s0": 7.54,
1349
+ "shape": [
1350
+ 7168,
1351
+ 2048
1352
+ ]
1353
+ },
1354
+ "router": {
1355
+ "var64": 0.312,
1356
+ "s0": 5.27,
1357
+ "s0_s1": 1.09,
1358
+ "shape": [
1359
+ 384,
1360
+ 7168
1361
+ ]
1362
+ }
1363
+ },
1364
+ "46": {
1365
+ "layer": 46,
1366
+ "is_moe": true,
1367
+ "routed_experts": {
1368
+ "median_var64": 0.0833,
1369
+ "q25_var64": 0.0795,
1370
+ "q75_var64": 0.0896,
1371
+ "mean_s0": 4.89,
1372
+ "std_s0": 1.05,
1373
+ "mean_s0_ratio": 1.09,
1374
+ "n_experts": 384
1375
+ },
1376
+ "shared_expert": {
1377
+ "var64": 0.1878,
1378
+ "s0": 7.28,
1379
+ "shape": [
1380
+ 7168,
1381
+ 2048
1382
+ ]
1383
+ },
1384
+ "router": {
1385
+ "var64": 0.3125,
1386
+ "s0": 5.16,
1387
+ "s0_s1": 1.1,
1388
+ "shape": [
1389
+ 384,
1390
+ 7168
1391
+ ]
1392
+ }
1393
+ },
1394
+ "47": {
1395
+ "layer": 47,
1396
+ "is_moe": true,
1397
+ "routed_experts": {
1398
+ "median_var64": 0.0833,
1399
+ "q25_var64": 0.0799,
1400
+ "q75_var64": 0.0893,
1401
+ "mean_s0": 4.79,
1402
+ "std_s0": 0.91,
1403
+ "mean_s0_ratio": 1.09,
1404
+ "n_experts": 384
1405
+ },
1406
+ "shared_expert": {
1407
+ "var64": 0.1801,
1408
+ "s0": 7.55,
1409
+ "shape": [
1410
+ 7168,
1411
+ 2048
1412
+ ]
1413
+ },
1414
+ "router": {
1415
+ "var64": 0.3063,
1416
+ "s0": 4.89,
1417
+ "s0_s1": 1.07,
1418
+ "shape": [
1419
+ 384,
1420
+ 7168
1421
+ ]
1422
+ }
1423
+ },
1424
+ "48": {
1425
+ "layer": 48,
1426
+ "is_moe": true,
1427
+ "routed_experts": {
1428
+ "median_var64": 0.0831,
1429
+ "q25_var64": 0.0806,
1430
+ "q75_var64": 0.0897,
1431
+ "mean_s0": 4.78,
1432
+ "std_s0": 0.94,
1433
+ "mean_s0_ratio": 1.08,
1434
+ "n_experts": 384
1435
+ },
1436
+ "shared_expert": {
1437
+ "var64": 0.181,
1438
+ "s0": 8.07,
1439
+ "shape": [
1440
+ 7168,
1441
+ 2048
1442
+ ]
1443
+ },
1444
+ "router": {
1445
+ "var64": 0.3137,
1446
+ "s0": 5.08,
1447
+ "s0_s1": 1.08,
1448
+ "shape": [
1449
+ 384,
1450
+ 7168
1451
+ ]
1452
+ }
1453
+ },
1454
+ "49": {
1455
+ "layer": 49,
1456
+ "is_moe": true,
1457
+ "routed_experts": {
1458
+ "median_var64": 0.0851,
1459
+ "q25_var64": 0.0812,
1460
+ "q75_var64": 0.0917,
1461
+ "mean_s0": 4.93,
1462
+ "std_s0": 1.03,
1463
+ "mean_s0_ratio": 1.09,
1464
+ "n_experts": 384
1465
+ },
1466
+ "shared_expert": {
1467
+ "var64": 0.1804,
1468
+ "s0": 7.38,
1469
+ "shape": [
1470
+ 7168,
1471
+ 2048
1472
+ ]
1473
+ },
1474
+ "router": {
1475
+ "var64": 0.3127,
1476
+ "s0": 4.81,
1477
+ "s0_s1": 1.07,
1478
+ "shape": [
1479
+ 384,
1480
+ 7168
1481
+ ]
1482
+ }
1483
+ },
1484
+ "50": {
1485
+ "layer": 50,
1486
+ "is_moe": true,
1487
+ "routed_experts": {
1488
+ "median_var64": 0.0848,
1489
+ "q25_var64": 0.0818,
1490
+ "q75_var64": 0.0917,
1491
+ "mean_s0": 5.01,
1492
+ "std_s0": 1.19,
1493
+ "mean_s0_ratio": 1.09,
1494
+ "n_experts": 384
1495
+ },
1496
+ "shared_expert": {
1497
+ "var64": 0.1817,
1498
+ "s0": 7.56,
1499
+ "shape": [
1500
+ 7168,
1501
+ 2048
1502
+ ]
1503
+ },
1504
+ "router": {
1505
+ "var64": 0.3103,
1506
+ "s0": 4.57,
1507
+ "s0_s1": 1.07,
1508
+ "shape": [
1509
+ 384,
1510
+ 7168
1511
+ ]
1512
+ }
1513
+ },
1514
+ "51": {
1515
+ "layer": 51,
1516
+ "is_moe": true,
1517
+ "routed_experts": {
1518
+ "median_var64": 0.085,
1519
+ "q25_var64": 0.0818,
1520
+ "q75_var64": 0.0929,
1521
+ "mean_s0": 5.01,
1522
+ "std_s0": 1.18,
1523
+ "mean_s0_ratio": 1.1,
1524
+ "n_experts": 384
1525
+ },
1526
+ "shared_expert": {
1527
+ "var64": 0.1766,
1528
+ "s0": 7.63,
1529
+ "shape": [
1530
+ 7168,
1531
+ 2048
1532
+ ]
1533
+ },
1534
+ "router": {
1535
+ "var64": 0.308,
1536
+ "s0": 4.49,
1537
+ "s0_s1": 1.12,
1538
+ "shape": [
1539
+ 384,
1540
+ 7168
1541
+ ]
1542
+ }
1543
+ },
+ "52": {
+ "layer": 52,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.0876,
+ "q25_var64": 0.0833,
+ "q75_var64": 0.0938,
+ "mean_s0": 5.25,
+ "std_s0": 1.65,
+ "mean_s0_ratio": 1.12,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.1841,
+ "s0": 7.68,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3099,
+ "s0": 4.48,
+ "s0_s1": 1.11,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "53": {
+ "layer": 53,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.087,
+ "q25_var64": 0.0833,
+ "q75_var64": 0.0947,
+ "mean_s0": 5.26,
+ "std_s0": 1.46,
+ "mean_s0_ratio": 1.11,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.1912,
+ "s0": 8.26,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3049,
+ "s0": 4.14,
+ "s0_s1": 1.08,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "54": {
+ "layer": 54,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.0887,
+ "q25_var64": 0.0843,
+ "q75_var64": 0.0965,
+ "mean_s0": 5.51,
+ "std_s0": 1.41,
+ "mean_s0_ratio": 1.14,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.2034,
+ "s0": 8.45,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3038,
+ "s0": 4.05,
+ "s0_s1": 1.05,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "55": {
+ "layer": 55,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.086,
+ "q25_var64": 0.0811,
+ "q75_var64": 0.0957,
+ "mean_s0": 5.6,
+ "std_s0": 1.82,
+ "mean_s0_ratio": 1.15,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.2073,
+ "s0": 7.98,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3091,
+ "s0": 4.33,
+ "s0_s1": 1.15,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "56": {
+ "layer": 56,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.0863,
+ "q25_var64": 0.0815,
+ "q75_var64": 0.0965,
+ "mean_s0": 5.69,
+ "std_s0": 1.93,
+ "mean_s0_ratio": 1.16,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.2141,
+ "s0": 8.61,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3167,
+ "s0": 4.19,
+ "s0_s1": 1.11,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "57": {
+ "layer": 57,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.086,
+ "q25_var64": 0.0797,
+ "q75_var64": 0.0987,
+ "mean_s0": 5.79,
+ "std_s0": 2.19,
+ "mean_s0_ratio": 1.16,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.2212,
+ "s0": 10.11,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3266,
+ "s0": 4.64,
+ "s0_s1": 1.2,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "58": {
+ "layer": 58,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.0853,
+ "q25_var64": 0.0797,
+ "q75_var64": 0.0955,
+ "mean_s0": 5.75,
+ "std_s0": 2.08,
+ "mean_s0_ratio": 1.15,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.1874,
+ "s0": 7.05,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3379,
+ "s0": 4.87,
+ "s0_s1": 1.13,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "59": {
+ "layer": 59,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.0846,
+ "q25_var64": 0.0797,
+ "q75_var64": 0.0928,
+ "mean_s0": 5.58,
+ "std_s0": 1.81,
+ "mean_s0_ratio": 1.13,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.1663,
+ "s0": 5.87,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3446,
+ "s0": 5.4,
+ "s0_s1": 1.12,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ },
+ "60": {
+ "layer": 60,
+ "is_moe": true,
+ "routed_experts": {
+ "median_var64": 0.0863,
+ "q25_var64": 0.0811,
+ "q75_var64": 0.0944,
+ "mean_s0": 5.52,
+ "std_s0": 1.19,
+ "mean_s0_ratio": 1.1,
+ "n_experts": 384
+ },
+ "shared_expert": {
+ "var64": 0.2663,
+ "s0": 6.47,
+ "shape": [
+ 7168,
+ 2048
+ ]
+ },
+ "router": {
+ "var64": 0.3667,
+ "s0": 5.54,
+ "s0_s1": 1.06,
+ "shape": [
+ 384,
+ 7168
+ ]
+ }
+ }
+ }