gpt2_moe_eng_hom_1024_100mb_gelu

This model is a fine-tuned version of a base model (not named in the source) on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8793
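
For intuition, a language-modeling loss is often easier to read as perplexity. The sketch below converts the reported loss, assuming it is a mean per-token cross-entropy in nats; the card does not state this explicitly, so treat the result as indicative only. The helper name `perplexity` is illustrative.

```python
import math

# Evaluation loss reported in this card. ASSUMPTION: it is the mean
# per-token cross-entropy measured in nats (not stated in the card).
eval_loss = 4.8793


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


final_ppl = perplexity(eval_loss)
print(f"perplexity at final eval loss: {final_ppl:.1f}")
```

Under that assumption, the final loss corresponds to a perplexity of roughly 130.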

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2028
  • training_steps: 20287
  • mixed_precision_training: Native AMP
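
The relationship between these values can be sketched in a few lines. The example below recomputes the total batch size and evaluates a linear warmup and decay schedule with the listed settings. It assumes a single training device (consistent with 8 × 4 = 32) and the usual linear-with-warmup shape; `num_devices` and `linear_lr` are illustrative names, not taken from the training script.

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 8
gradient_accumulation_steps = 4
warmup_steps = 2028
training_steps = 20287

# ASSUMPTION: one device; the card does not state the device count.
num_devices = 1

# Effective number of examples contributing to each optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, training_steps - step)
    return learning_rate * remaining / (training_steps - warmup_steps)


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 1014, 2028, 10000, 20287):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends at step 2028 and reaches zero at the final step, 20287.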

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0.1479 | 100 | 9.3622 |
| 9.7621 | 0.2957 | 200 | 8.4727 |
| 9.7621 | 0.4436 | 300 | 7.4517 |
| 7.6605 | 0.5915 | 400 | 6.7971 |
| 7.6605 | 0.7394 | 500 | 6.4983 |
| 6.6653 | 0.8872 | 600 | 6.3011 |
| 6.6653 | 1.0340 | 700 | 6.1616 |
| 6.3041 | 1.1819 | 800 | 6.0542 |
| 6.3041 | 1.3298 | 900 | 5.9704 |
| 6.1099 | 1.4776 | 1000 | 5.8965 |
| 6.1099 | 1.6255 | 1100 | 5.8348 |
| 5.971 | 1.7734 | 1200 | 5.7812 |
| 5.971 | 1.9213 | 1300 | 5.7294 |
| 5.8583 | 2.0680 | 1400 | 5.6836 |
| 5.8583 | 2.2159 | 1500 | 5.6280 |
| 5.7424 | 2.3638 | 1600 | 5.5761 |
| 5.7424 | 2.5116 | 1700 | 5.5054 |
| 5.6289 | 2.6595 | 1800 | 5.4339 |
| 5.6289 | 2.8074 | 1900 | 5.3614 |
| 5.4999 | 2.9553 | 2000 | 5.2909 |
| 5.4999 | 3.1020 | 2100 | 5.2318 |
| 5.3357 | 3.2499 | 2200 | 5.1698 |
| 5.3357 | 3.3978 | 2300 | 5.1215 |
| 5.2257 | 3.5457 | 2400 | 5.0580 |
| 5.2257 | 3.6935 | 2500 | 5.0113 |
| 5.134 | 3.8414 | 2600 | 4.9670 |
| 5.134 | 3.9893 | 2700 | 4.9193 |
| 5.0174 | 4.1360 | 2800 | 4.8910 |
| 5.0174 | 4.2839 | 2900 | 4.8529 |
| 4.9263 | 4.4318 | 3000 | 4.8262 |
| 4.9263 | 4.5797 | 3100 | 4.7836 |
| 4.8709 | 4.7275 | 3200 | 4.7552 |
| 4.8709 | 4.8754 | 3300 | 4.7274 |
| 4.8187 | 5.0222 | 3400 | 4.7045 |
| 4.8187 | 5.1701 | 3500 | 4.6760 |
| 4.687 | 5.3179 | 3600 | 4.6522 |
| 4.687 | 5.4658 | 3700 | 4.6301 |
| 4.6651 | 5.6137 | 3800 | 4.6091 |
| 4.6651 | 5.7616 | 3900 | 4.5833 |
| 4.6359 | 5.9094 | 4000 | 4.5632 |
| 4.6359 | 6.0562 | 4100 | 4.5445 |
| 4.5337 | 6.2041 | 4200 | 4.5332 |
| 4.5337 | 6.3519 | 4300 | 4.5184 |
| 4.4876 | 6.4998 | 4400 | 4.4997 |
| 4.4876 | 6.6477 | 4500 | 4.4774 |
| 4.4714 | 6.7956 | 4600 | 4.4611 |
| 4.4714 | 6.9434 | 4700 | 4.4495 |
| 4.42 | 7.0902 | 4800 | 4.4400 |
| 4.42 | 7.2381 | 4900 | 4.4303 |
| 4.3298 | 7.3860 | 5000 | 4.4224 |
| 4.3298 | 7.5338 | 5100 | 4.4055 |
| 4.3264 | 7.6817 | 5200 | 4.3911 |
| 4.3264 | 7.8296 | 5300 | 4.3814 |
| 4.3255 | 7.9774 | 5400 | 4.3689 |
| 4.3255 | 8.1242 | 5500 | 4.3719 |
| 4.1911 | 8.2721 | 5600 | 4.3644 |
| 4.1911 | 8.4200 | 5700 | 4.3602 |
| 4.2009 | 8.5678 | 5800 | 4.3501 |
| 4.2009 | 8.7157 | 5900 | 4.3361 |
| 4.1997 | 8.8636 | 6000 | 4.3258 |
| 4.1997 | 9.0104 | 6100 | 4.3262 |
| 4.1258 | 9.1582 | 6200 | 4.3337 |
| 4.1258 | 9.3061 | 6300 | 4.3271 |
| 4.0768 | 9.4540 | 6400 | 4.3199 |
| 4.0768 | 9.6018 | 6500 | 4.3144 |
| 4.0837 | 9.7497 | 6600 | 4.3084 |
| 4.0837 | 9.8976 | 6700 | 4.2958 |
| 3.7618 | 10.0444 | 6800 | 4.2656 |
| 3.7618 | 10.1922 | 6900 | 4.2746 |
| 3.8077 | 10.3401 | 7000 | 4.2708 |
| 3.8077 | 10.4880 | 7100 | 4.2645 |
| 3.8224 | 10.6359 | 7200 | 4.2603 |
| 3.8224 | 10.7837 | 7300 | 4.2554 |
| 3.8296 | 10.9316 | 7400 | 4.2464 |
| 3.8296 | 11.0784 | 7500 | 4.2648 |
| 3.7212 | 11.2262 | 7600 | 4.2734 |
| 3.7212 | 11.3741 | 7700 | 4.2698 |
| 3.7198 | 11.5220 | 7800 | 4.2663 |
| 3.7198 | 11.6699 | 7900 | 4.2647 |
| 3.7248 | 11.8177 | 8000 | 4.2585 |
| 3.7248 | 11.9656 | 8100 | 4.2536 |
| 3.6843 | 12.1124 | 8200 | 4.2792 |
| 3.6843 | 12.2603 | 8300 | 4.2811 |
| 3.6066 | 12.4081 | 8400 | 4.2802 |
| 3.6066 | 12.5560 | 8500 | 4.2760 |
| 3.6268 | 12.7039 | 8600 | 4.2736 |
| 3.6268 | 12.8518 | 8700 | 4.2667 |
| 3.6459 | 12.9996 | 8800 | 4.2686 |
| 3.6459 | 13.1464 | 8900 | 4.2976 |
| 3.4922 | 13.2943 | 9000 | 4.3040 |
| 3.4922 | 13.4421 | 9100 | 4.2989 |
| 3.528 | 13.5900 | 9200 | 4.3004 |
| 3.528 | 13.7379 | 9300 | 4.2946 |
| 3.5509 | 13.8858 | 9400 | 4.2918 |
| 3.5509 | 14.0325 | 9500 | 4.3121 |
| 3.47 | 14.1804 | 9600 | 4.3275 |
| 3.47 | 14.3283 | 9700 | 4.3296 |
| 3.4331 | 14.4762 | 9800 | 4.3285 |
| 3.4331 | 14.6240 | 9900 | 4.3279 |
| 3.4572 | 14.7719 | 10000 | 4.3232 |
| 3.4572 | 14.9198 | 10100 | 4.3211 |
| 3.4453 | 15.0665 | 10200 | 4.3463 |
| 3.4453 | 15.2144 | 10300 | 4.3598 |
| 3.3344 | 15.3623 | 10400 | 4.3620 |
| 3.3344 | 15.5102 | 10500 | 4.3594 |
| 3.3623 | 15.6580 | 10600 | 4.3634 |
| 3.3623 | 15.8059 | 10700 | 4.3570 |
| 3.3875 | 15.9538 | 10800 | 4.3561 |
| 3.3875 | 16.1006 | 10900 | 4.3936 |
| 3.2621 | 16.2484 | 11000 | 4.3980 |
| 3.2621 | 16.3963 | 11100 | 4.4003 |
| 3.2716 | 16.5442 | 11200 | 4.4013 |
| 3.2716 | 16.6921 | 11300 | 4.3989 |
| 3.2982 | 16.8399 | 11400 | 4.3975 |
| 3.2982 | 16.9878 | 11500 | 4.3952 |
| 3.2373 | 17.1346 | 11600 | 4.4309 |
| 3.2373 | 17.2824 | 11700 | 4.4393 |
| 3.1831 | 17.4303 | 11800 | 4.4477 |
| 3.1831 | 17.5782 | 11900 | 4.4412 |
| 3.2104 | 17.7261 | 12000 | 4.4393 |
| 3.2104 | 17.8739 | 12100 | 4.4376 |
| 3.221 | 18.0207 | 12200 | 4.4595 |
| 3.221 | 18.1686 | 12300 | 4.4775 |
| 3.0792 | 18.3165 | 12400 | 4.4852 |
| 3.0792 | 18.4643 | 12500 | 4.4869 |
| 3.1278 | 18.6122 | 12600 | 4.4878 |
| 3.1278 | 18.7601 | 12700 | 4.4874 |
| 3.1524 | 18.9079 | 12800 | 4.4843 |
| 3.1524 | 19.0547 | 12900 | 4.5108 |
| 3.0559 | 19.2026 | 13000 | 4.5251 |
| 3.0559 | 19.3505 | 13100 | 4.5320 |
| 3.0386 | 19.4983 | 13200 | 4.5330 |
| 3.0386 | 19.6462 | 13300 | 4.5327 |
| 3.0742 | 19.7941 | 13400 | 4.5318 |
| 3.0742 | 19.9420 | 13500 | 4.5313 |
| 3.037 | 20.0887 | 13600 | 4.5636 |
| 3.037 | 20.2366 | 13700 | 4.5708 |
| 2.9601 | 20.3845 | 13800 | 4.5759 |
| 2.9601 | 20.5323 | 13900 | 4.5770 |
| 2.9845 | 20.6802 | 14000 | 4.5785 |
| 2.9845 | 20.8281 | 14100 | 4.5787 |
| 3.0182 | 20.9760 | 14200 | 4.5752 |
| 3.0182 | 21.1227 | 14300 | 4.6077 |
| 2.897 | 21.2706 | 14400 | 4.6195 |
| 2.897 | 21.4185 | 14500 | 4.6227 |
| 2.9144 | 21.5664 | 14600 | 4.6235 |
| 2.9144 | 21.7142 | 14700 | 4.6225 |
| 2.941 | 21.8621 | 14800 | 4.6226 |
| 2.941 | 22.0089 | 14900 | 4.6335 |
| 2.8755 | 22.1567 | 15000 | 4.6583 |
| 2.8755 | 22.3046 | 15100 | 4.6638 |
| 2.844 | 22.4525 | 15200 | 4.6678 |
| 2.844 | 22.6004 | 15300 | 4.6670 |
| 2.8713 | 22.7482 | 15400 | 4.6677 |
| 2.8713 | 22.8961 | 15500 | 4.6687 |
| 2.8659 | 23.0429 | 15600 | 4.6898 |
| 2.8659 | 23.1908 | 15700 | 4.7004 |
| 2.7686 | 23.3386 | 15800 | 4.7080 |
| 2.7686 | 23.4865 | 15900 | 4.7120 |
| 2.807 | 23.6344 | 16000 | 4.7123 |
| 2.807 | 23.7823 | 16100 | 4.7129 |
| 2.8185 | 23.9301 | 16200 | 4.7133 |
| 2.8185 | 24.0769 | 16300 | 4.7336 |
| 2.7406 | 24.2248 | 16400 | 4.7425 |
| 2.7406 | 24.3726 | 16500 | 4.7490 |
| 2.7357 | 24.5205 | 16600 | 4.7505 |
| 2.7357 | 24.6684 | 16700 | 4.7498 |
| 2.7585 | 24.8163 | 16800 | 4.7543 |
| 2.7585 | 24.9641 | 16900 | 4.7538 |
| 2.7301 | 25.1109 | 17000 | 4.7770 |
| 2.7301 | 25.2588 | 17100 | 4.7824 |
| 2.6831 | 25.4067 | 17200 | 4.7839 |
| 2.6831 | 25.5545 | 17300 | 4.7862 |
| 2.7023 | 25.7024 | 17400 | 4.7903 |
| 2.7023 | 25.8503 | 17500 | 4.7911 |
| 2.715 | 25.9982 | 17600 | 4.7907 |
| 2.715 | 26.1449 | 17700 | 4.8110 |
| 2.6278 | 26.2928 | 17800 | 4.8160 |
| 2.6278 | 26.4407 | 17900 | 4.8179 |
| 2.6537 | 26.5885 | 18000 | 4.8217 |
| 2.6537 | 26.7364 | 18100 | 4.8230 |
| 2.6646 | 26.8843 | 18200 | 4.8234 |
| 2.6646 | 27.0311 | 18300 | 4.8324 |
| 2.6243 | 27.1789 | 18400 | 4.8411 |
| 2.6243 | 27.3268 | 18500 | 4.8438 |
| 2.6034 | 27.4747 | 18600 | 4.8450 |
| 2.6034 | 27.6226 | 18700 | 4.8484 |
| 2.62 | 27.7704 | 18800 | 4.8491 |
| 2.62 | 27.9183 | 18900 | 4.8498 |
| 2.6108 | 28.0651 | 19000 | 4.8576 |
| 2.6108 | 28.2129 | 19100 | 4.8638 |
| 2.5697 | 28.3608 | 19200 | 4.8656 |
| 2.5697 | 28.5087 | 19300 | 4.8668 |
| 2.5787 | 28.6566 | 19400 | 4.8677 |
| 2.5787 | 28.8044 | 19500 | 4.8695 |
| 2.582 | 28.9523 | 19600 | 4.8693 |
| 2.582 | 29.0991 | 19700 | 4.8744 |
| 2.55 | 29.2470 | 19800 | 4.8767 |
| 2.55 | 29.3948 | 19900 | 4.8781 |
| 2.5513 | 29.5427 | 20000 | 4.8790 |
| 2.5513 | 29.6906 | 20100 | 4.8792 |
| 2.552 | 29.8384 | 20200 | 4.8793 |
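
In the results above, validation loss reaches its minimum of 4.2464 at step 7400 and then climbs to 4.8793 while training loss keeps falling, a pattern usually read as overfitting. The sketch below shows one way to pick the best evaluation point from such a log. It uses a hand-copied excerpt of the rows above rather than the full log, and the names `EvalRow` and `best_row` are illustrative.

```python
from typing import NamedTuple, Optional


class EvalRow(NamedTuple):
    step: int
    train_loss: Optional[float]  # None where the table shows "No log"
    val_loss: float


# Excerpt of rows copied from the training results above (not the full log).
rows = [
    EvalRow(100, None, 9.3622),
    EvalRow(2000, 5.4999, 5.2909),
    EvalRow(6000, 4.1997, 4.3258),
    EvalRow(7400, 3.8296, 4.2464),
    EvalRow(8100, 3.7248, 4.2536),
    EvalRow(10000, 3.4572, 4.3232),
    EvalRow(15000, 2.8755, 4.6583),
    EvalRow(20200, 2.552, 4.8793),
]

# The row with the lowest validation loss is the natural checkpoint candidate.
best_row = min(rows, key=lambda r: r.val_loss)
final_row = rows[-1]

# How much validation loss degraded between the best and the final evaluation.
val_gap = final_row.val_loss - best_row.val_loss

print(f"best step: {best_row.step} (val loss {best_row.val_loss})")
print(f"final step: {final_row.step} (val loss {final_row.val_loss})")
print(f"validation loss increase after best step: {val_gap:.4f}")
```

Whether a checkpoint from around step 7400 was saved is not stated in the card; the reported evaluation loss of 4.8793 matches the final logged step.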

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Safetensors

  • Model size: 0.2B params
  • Tensor type: F32