gpt2_moe_hom_1024_100mb_gelu

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0647
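
The reported loss can be converted to a perplexity for easier comparison with other language models. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in this card. Under that assumption the perplexity comes out at roughly 58.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 4.0647

# Assumption: the loss is mean cross-entropy per token, measured in nats.
# Perplexity is then the exponential of the loss.
perplexity = math.exp(eval_loss)

print(f"perplexity: {perplexity:.1f}")
```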

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10252
  • training_steps: 102526
  • mixed_precision_training: Native AMP
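
Two of the values above are derived from the others: the total train batch size is the per-device batch size times the gradient accumulation steps, and the warmup covers about 10% of the run (10252 of 102526 steps). The sketch below reproduces both. It assumes a single training device (the only count consistent with 8 × 4 = 32) and a linear schedule that warms up from zero to the peak rate and then decays to zero. The helper `linear_lr` is illustrative and not taken from the training code.

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 8
gradient_accumulation_steps = 4
warmup_steps = 10252
training_steps = 102526

# Assumption: one device, since 8 * 4 * 1 matches the reported total of 32.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, training_steps - step)
    return learning_rate * remaining / (training_steps - warmup_steps)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    # Start, mid-warmup, peak, mid-decay, and final step.
    for step in (0, 5126, 10252, 56389, 102526):
        print(step, f"{linear_lr(step):.2e}")
```

The learning rate therefore peaks at 0.0001 at step 10252 and reaches zero at the final step, which is consistent with the very small changes in loss over the last few thousand steps of the results table.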

Training results

Columns, in order: Training Loss, Epoch, Step, Validation Loss
9.3834 0.1463 500 8.7093
7.5267 0.2926 1000 7.3227
6.9908 0.4389 1500 6.7627
6.5106 0.5852 2000 6.4202
6.3056 0.7315 2500 6.1956
6.0878 0.8778 3000 6.0325
5.9801 1.0240 3500 5.9058
5.8392 1.1703 4000 5.7994
5.749 1.3166 4500 5.6914
5.6274 1.4629 5000 5.5682
5.5212 1.6092 5500 5.4333
5.3752 1.7555 6000 5.3082
5.2817 1.9018 6500 5.1953
5.1303 2.0480 7000 5.0931
5.066 2.1943 7500 5.0033
4.9701 2.3406 8000 4.9219
4.9075 2.4869 8500 4.8434
4.8244 2.6332 9000 4.7711
4.7694 2.7795 9500 4.6991
4.702 2.9258 10000 4.6343
4.5982 3.0720 10500 4.5734
4.5412 3.2183 11000 4.5152
4.5098 3.3646 11500 4.4656
4.4478 3.5109 12000 4.4140
4.4289 3.6572 12500 4.3707
4.3789 3.8035 13000 4.3368
4.3439 3.9497 13500 4.2969
4.2479 4.0960 14000 4.2698
4.2331 4.2423 14500 4.2447
4.2092 4.3886 15000 4.2206
4.1969 4.5349 15500 4.1968
4.1754 4.6811 16000 4.1727
4.1572 4.8274 16500 4.1517
4.141 4.9737 17000 4.1322
4.0416 5.1200 17500 4.1209
4.0458 5.2663 18000 4.1070
4.044 5.4126 18500 4.0929
4.0392 5.5588 19000 4.0768
4.0161 5.7051 19500 4.0612
4.0105 5.8514 20000 4.0488
4.0083 5.9977 20500 4.0336
3.912 6.1440 21000 4.0344
3.9133 6.2902 21500 4.0233
3.9166 6.4365 22000 4.0126
3.9155 6.5828 22500 4.0026
3.9191 6.7291 23000 3.9910
3.9205 6.8754 23500 3.9806
3.8652 7.0217 24000 3.9782
3.8049 7.1679 24500 3.9778
3.8194 7.3142 25000 3.9715
3.8292 7.4605 25500 3.9614
3.8259 7.6068 26000 3.9530
3.8278 7.7531 26500 3.9454
3.8223 7.8994 27000 3.9360
3.7907 8.0456 27500 3.9411
3.7334 8.1919 28000 3.9404
3.7343 8.3382 28500 3.9351
3.7436 8.4845 29000 3.9270
3.7576 8.6308 29500 3.9194
3.7487 8.7771 30000 3.9136
3.7546 8.9234 30500 3.9059
3.6439 9.0696 31000 3.9144
3.6589 9.2159 31500 3.9138
3.6728 9.3622 32000 3.9111
3.6795 9.5085 32500 3.9056
3.6923 9.6548 33000 3.8993
3.6938 9.8011 33500 3.8940
3.6946 9.9474 34000 3.8864
3.5679 10.0936 34500 3.9031
3.6002 10.2399 35000 3.9004
3.6119 10.3862 35500 3.8958
3.6184 10.5325 36000 3.8907
3.6337 10.6788 36500 3.8838
3.6322 10.8251 37000 3.8798
3.636 10.9714 37500 3.8749
3.5265 11.1176 38000 3.8918
3.5411 11.2639 38500 3.8916
3.5663 11.4102 39000 3.8869
3.5703 11.5565 39500 3.8816
3.5862 11.7028 40000 3.8777
3.5829 11.8491 40500 3.8727
3.5889 11.9954 41000 3.8703
3.4682 12.1416 41500 3.8876
3.501 12.2879 42000 3.8882
3.5109 12.4342 42500 3.8835
3.5187 12.5805 43000 3.8784
3.5374 12.7268 43500 3.8748
3.5311 12.8731 44000 3.8715
3.5457 13.0193 44500 3.8823
3.4344 13.1656 45000 3.8897
3.4551 13.3119 45500 3.8865
3.4675 13.4582 46000 3.8840
3.4719 13.6045 46500 3.8795
3.4836 13.7508 47000 3.8767
3.4945 13.8971 47500 3.8710
3.3994 14.0433 48000 3.8904
3.3857 14.1896 48500 3.8964
3.4053 14.3359 49000 3.8918
3.4309 14.4822 49500 3.8888
3.433 14.6285 50000 3.8854
3.4475 14.7748 50500 3.8787
3.4465 14.9211 51000 3.8751
3.3686 15.0673 51500 3.8988
3.3508 15.2136 52000 3.9007
3.3644 15.3599 52500 3.9003
3.3894 15.5062 53000 3.8961
3.3914 15.6525 53500 3.8911
3.4053 15.7988 54000 3.8888
3.4118 15.9451 54500 3.8818
3.2958 16.0913 55000 3.9100
3.3117 16.2376 55500 3.9103
3.3305 16.3839 56000 3.9058
3.3377 16.5302 56500 3.9047
3.3535 16.6765 57000 3.9001
3.3587 16.8228 57500 3.8953
3.3666 16.9691 58000 3.8936
3.2519 17.1153 58500 3.9214
3.2752 17.2616 59000 3.9207
3.2917 17.4079 59500 3.9185
3.312 17.5542 60000 3.9159
3.3088 17.7005 60500 3.9122
3.332 17.8468 61000 3.9085
3.3385 17.9931 61500 3.9059
3.2243 18.1393 62000 3.9329
3.2414 18.2856 62500 3.9323
3.2581 18.4319 63000 3.9308
3.2759 18.5782 63500 3.9267
3.2858 18.7245 64000 3.9246
3.292 18.8707 64500 3.9210
3.2586 19.0170 65000 3.9374
3.1976 19.1633 65500 3.9466
3.2189 19.3096 66000 3.9465
3.2167 19.4559 66500 3.9440
3.2399 19.6022 67000 3.9409
3.2546 19.7484 67500 3.9366
3.2588 19.8947 68000 3.9329
3.2436 20.0410 68500 3.9545
3.1653 20.1873 69000 3.9610
3.187 20.3336 69500 3.9601
3.1987 20.4798 70000 3.9573
3.2097 20.6261 70500 3.9556
3.2227 20.7724 71000 3.9491
3.2255 20.9187 71500 3.9475
3.1149 21.0650 72000 3.9709
3.1352 21.2113 72500 3.9741
3.1525 21.3575 73000 3.9730
3.1624 21.5038 73500 3.9710
3.1771 21.6501 74000 3.9681
3.1841 21.7964 74500 3.9657
3.193 21.9427 75000 3.9614
3.0821 22.0889 75500 3.9879
3.1162 22.2352 76000 3.9907
3.1209 22.3815 76500 3.9865
3.1469 22.5278 77000 3.9848
3.1475 22.6741 77500 3.9826
3.1537 22.8204 78000 3.9798
3.1587 22.9667 78500 3.9773
3.071 23.1129 79000 4.0029
3.086 23.2592 79500 4.0026
3.1013 23.4055 80000 4.0002
3.1055 23.5518 80500 4.0000
3.1207 23.6981 81000 3.9960
3.1202 23.8444 81500 3.9942
3.1244 23.9907 82000 3.9929
3.0405 24.1369 82500 4.0158
3.0636 24.2832 83000 4.0166
3.07 24.4295 83500 4.0155
3.0845 24.5758 84000 4.0131
3.0857 24.7221 84500 4.0102
3.0969 24.8684 85000 4.0096
3.1004 25.0146 85500 4.0210
3.0231 25.1609 86000 4.0280
3.0347 25.3072 86500 4.0298
3.0489 25.4535 87000 4.0301
3.0574 25.5998 87500 4.0267
3.0626 25.7461 88000 4.0245
3.066 25.8924 88500 4.0244
3.0193 26.0386 89000 4.0374
3.0058 26.1849 89500 4.0411
3.0135 26.3312 90000 4.0403
3.0217 26.4775 90500 4.0416
3.0323 26.6238 91000 4.0391
3.0276 26.7701 91500 4.0382
3.0389 26.9164 92000 4.0362
3.0004 27.0626 92500 4.0486
2.9859 27.2089 93000 4.0523
2.9851 27.3552 93500 4.0518
2.9981 27.5015 94000 4.0514
3.002 27.6478 94500 4.0504
3.0052 27.7941 95000 4.0492
3.0078 27.9404 95500 4.0476
2.9615 28.0866 96000 4.0570
2.9622 28.2329 96500 4.0594
2.9711 28.3792 97000 4.0592
2.9678 28.5255 97500 4.0596
2.9898 28.6718 98000 4.0589
2.9775 28.8181 98500 4.0584
2.985 28.9644 99000 4.0581
2.9462 29.1106 99500 4.0637
2.9585 29.2569 100000 4.0655
2.9541 29.4032 100500 4.0650
2.9643 29.5495 101000 4.0647
2.9541 29.6958 101500 4.0646
2.958 29.8421 102000 4.0647
2.9525 29.9884 102500 4.0647
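
In the table, validation loss reaches its minimum of 3.8703 at step 41000 (epoch 11.9954) and then rises to 4.0647 by the end of training, while training loss keeps falling. This pattern is consistent with overfitting, so an earlier checkpoint may generalize better than the final one, if such a checkpoint was saved. The sketch below shows one way to locate the best row. It parses only a small excerpt of rows copied from the table, and the helper `parse_rows` is hypothetical.

```python
# Excerpt of rows copied from the table:
# training loss, epoch, step, validation loss.
rows_text = """\
9.3834 0.1463 500 8.7093
4.0105 5.8514 20000 4.0488
3.636 10.9714 37500 3.8749
3.5889 11.9954 41000 3.8703
3.5311 12.8731 44000 3.8715
3.4945 13.8971 47500 3.8710
3.3305 16.3839 56000 3.9058
3.1013 23.4055 80000 4.0002
2.9525 29.9884 102500 4.0647
"""


def parse_rows(text: str) -> list[dict]:
    """Parse whitespace-separated table rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return rows


rows = parse_rows(rows_text)

# Row with the lowest validation loss in the excerpt.
best = min(rows, key=lambda r: r["val_loss"])

# Gap between validation and training loss at the last logged step.
final = rows[-1]
final_gap = final["val_loss"] - final["train_loss"]

print("best step:", best["step"], "val_loss:", best["val_loss"])
print(f"final train/val gap: {final_gap:.4f}")
```

Pasting the full table into `rows_text` gives the same best step, since no other row has a validation loss below 3.8703.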

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Model size: 0.2B params (Safetensors, tensor type F32)