gpt2_moe_eng_hom_1024_100mb_gelu
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
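- Loss: 4.8793 (final evaluation, step 20200; the lowest validation loss in the training results table is 4.2464 at step 7400)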
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2028
- training_steps: 20287
- mixed_precision_training: Native AMP
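To make the interaction between these settings concrete, here is a minimal sketch of how the effective batch size and the learning rate at a given step could be derived from the values above. It assumes a single training device (8 × 4 × 1 matches the reported total of 32) and the common linear schedule shape: warmup from 0 to the peak rate, then linear decay to 0 at the final step. The constant and function names are illustrative and do not come from the training code.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2028
TRAINING_STEPS = 20287
NUM_DEVICES = 1  # assumption: 8 * 4 * 1 reproduces the reported total of 32


def effective_batch_size() -> int:
    """Number of sequences contributing to each optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay phase: ramp linearly from the peak down to 0 at the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 1014, 2028, 7400, 20287):
        print(step, linear_lr(step))
```

Under these assumptions the peak rate of 0.0001 is reached at step 2028, roughly 10% of the way through the run.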
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0.1479 | 100 | 9.3622 |
| 9.7621 | 0.2957 | 200 | 8.4727 |
| 9.7621 | 0.4436 | 300 | 7.4517 |
| 7.6605 | 0.5915 | 400 | 6.7971 |
| 7.6605 | 0.7394 | 500 | 6.4983 |
| 6.6653 | 0.8872 | 600 | 6.3011 |
| 6.6653 | 1.0340 | 700 | 6.1616 |
| 6.3041 | 1.1819 | 800 | 6.0542 |
| 6.3041 | 1.3298 | 900 | 5.9704 |
| 6.1099 | 1.4776 | 1000 | 5.8965 |
| 6.1099 | 1.6255 | 1100 | 5.8348 |
| 5.971 | 1.7734 | 1200 | 5.7812 |
| 5.971 | 1.9213 | 1300 | 5.7294 |
| 5.8583 | 2.0680 | 1400 | 5.6836 |
| 5.8583 | 2.2159 | 1500 | 5.6280 |
| 5.7424 | 2.3638 | 1600 | 5.5761 |
| 5.7424 | 2.5116 | 1700 | 5.5054 |
| 5.6289 | 2.6595 | 1800 | 5.4339 |
| 5.6289 | 2.8074 | 1900 | 5.3614 |
| 5.4999 | 2.9553 | 2000 | 5.2909 |
| 5.4999 | 3.1020 | 2100 | 5.2318 |
| 5.3357 | 3.2499 | 2200 | 5.1698 |
| 5.3357 | 3.3978 | 2300 | 5.1215 |
| 5.2257 | 3.5457 | 2400 | 5.0580 |
| 5.2257 | 3.6935 | 2500 | 5.0113 |
| 5.134 | 3.8414 | 2600 | 4.9670 |
| 5.134 | 3.9893 | 2700 | 4.9193 |
| 5.0174 | 4.1360 | 2800 | 4.8910 |
| 5.0174 | 4.2839 | 2900 | 4.8529 |
| 4.9263 | 4.4318 | 3000 | 4.8262 |
| 4.9263 | 4.5797 | 3100 | 4.7836 |
| 4.8709 | 4.7275 | 3200 | 4.7552 |
| 4.8709 | 4.8754 | 3300 | 4.7274 |
| 4.8187 | 5.0222 | 3400 | 4.7045 |
| 4.8187 | 5.1701 | 3500 | 4.6760 |
| 4.687 | 5.3179 | 3600 | 4.6522 |
| 4.687 | 5.4658 | 3700 | 4.6301 |
| 4.6651 | 5.6137 | 3800 | 4.6091 |
| 4.6651 | 5.7616 | 3900 | 4.5833 |
| 4.6359 | 5.9094 | 4000 | 4.5632 |
| 4.6359 | 6.0562 | 4100 | 4.5445 |
| 4.5337 | 6.2041 | 4200 | 4.5332 |
| 4.5337 | 6.3519 | 4300 | 4.5184 |
| 4.4876 | 6.4998 | 4400 | 4.4997 |
| 4.4876 | 6.6477 | 4500 | 4.4774 |
| 4.4714 | 6.7956 | 4600 | 4.4611 |
| 4.4714 | 6.9434 | 4700 | 4.4495 |
| 4.42 | 7.0902 | 4800 | 4.4400 |
| 4.42 | 7.2381 | 4900 | 4.4303 |
| 4.3298 | 7.3860 | 5000 | 4.4224 |
| 4.3298 | 7.5338 | 5100 | 4.4055 |
| 4.3264 | 7.6817 | 5200 | 4.3911 |
| 4.3264 | 7.8296 | 5300 | 4.3814 |
| 4.3255 | 7.9774 | 5400 | 4.3689 |
| 4.3255 | 8.1242 | 5500 | 4.3719 |
| 4.1911 | 8.2721 | 5600 | 4.3644 |
| 4.1911 | 8.4200 | 5700 | 4.3602 |
| 4.2009 | 8.5678 | 5800 | 4.3501 |
| 4.2009 | 8.7157 | 5900 | 4.3361 |
| 4.1997 | 8.8636 | 6000 | 4.3258 |
| 4.1997 | 9.0104 | 6100 | 4.3262 |
| 4.1258 | 9.1582 | 6200 | 4.3337 |
| 4.1258 | 9.3061 | 6300 | 4.3271 |
| 4.0768 | 9.4540 | 6400 | 4.3199 |
| 4.0768 | 9.6018 | 6500 | 4.3144 |
| 4.0837 | 9.7497 | 6600 | 4.3084 |
| 4.0837 | 9.8976 | 6700 | 4.2958 |
| 3.7618 | 10.0444 | 6800 | 4.2656 |
| 3.7618 | 10.1922 | 6900 | 4.2746 |
| 3.8077 | 10.3401 | 7000 | 4.2708 |
| 3.8077 | 10.4880 | 7100 | 4.2645 |
| 3.8224 | 10.6359 | 7200 | 4.2603 |
| 3.8224 | 10.7837 | 7300 | 4.2554 |
| 3.8296 | 10.9316 | 7400 | 4.2464 |
| 3.8296 | 11.0784 | 7500 | 4.2648 |
| 3.7212 | 11.2262 | 7600 | 4.2734 |
| 3.7212 | 11.3741 | 7700 | 4.2698 |
| 3.7198 | 11.5220 | 7800 | 4.2663 |
| 3.7198 | 11.6699 | 7900 | 4.2647 |
| 3.7248 | 11.8177 | 8000 | 4.2585 |
| 3.7248 | 11.9656 | 8100 | 4.2536 |
| 3.6843 | 12.1124 | 8200 | 4.2792 |
| 3.6843 | 12.2603 | 8300 | 4.2811 |
| 3.6066 | 12.4081 | 8400 | 4.2802 |
| 3.6066 | 12.5560 | 8500 | 4.2760 |
| 3.6268 | 12.7039 | 8600 | 4.2736 |
| 3.6268 | 12.8518 | 8700 | 4.2667 |
| 3.6459 | 12.9996 | 8800 | 4.2686 |
| 3.6459 | 13.1464 | 8900 | 4.2976 |
| 3.4922 | 13.2943 | 9000 | 4.3040 |
| 3.4922 | 13.4421 | 9100 | 4.2989 |
| 3.528 | 13.5900 | 9200 | 4.3004 |
| 3.528 | 13.7379 | 9300 | 4.2946 |
| 3.5509 | 13.8858 | 9400 | 4.2918 |
| 3.5509 | 14.0325 | 9500 | 4.3121 |
| 3.47 | 14.1804 | 9600 | 4.3275 |
| 3.47 | 14.3283 | 9700 | 4.3296 |
| 3.4331 | 14.4762 | 9800 | 4.3285 |
| 3.4331 | 14.6240 | 9900 | 4.3279 |
| 3.4572 | 14.7719 | 10000 | 4.3232 |
| 3.4572 | 14.9198 | 10100 | 4.3211 |
| 3.4453 | 15.0665 | 10200 | 4.3463 |
| 3.4453 | 15.2144 | 10300 | 4.3598 |
| 3.3344 | 15.3623 | 10400 | 4.3620 |
| 3.3344 | 15.5102 | 10500 | 4.3594 |
| 3.3623 | 15.6580 | 10600 | 4.3634 |
| 3.3623 | 15.8059 | 10700 | 4.3570 |
| 3.3875 | 15.9538 | 10800 | 4.3561 |
| 3.3875 | 16.1006 | 10900 | 4.3936 |
| 3.2621 | 16.2484 | 11000 | 4.3980 |
| 3.2621 | 16.3963 | 11100 | 4.4003 |
| 3.2716 | 16.5442 | 11200 | 4.4013 |
| 3.2716 | 16.6921 | 11300 | 4.3989 |
| 3.2982 | 16.8399 | 11400 | 4.3975 |
| 3.2982 | 16.9878 | 11500 | 4.3952 |
| 3.2373 | 17.1346 | 11600 | 4.4309 |
| 3.2373 | 17.2824 | 11700 | 4.4393 |
| 3.1831 | 17.4303 | 11800 | 4.4477 |
| 3.1831 | 17.5782 | 11900 | 4.4412 |
| 3.2104 | 17.7261 | 12000 | 4.4393 |
| 3.2104 | 17.8739 | 12100 | 4.4376 |
| 3.221 | 18.0207 | 12200 | 4.4595 |
| 3.221 | 18.1686 | 12300 | 4.4775 |
| 3.0792 | 18.3165 | 12400 | 4.4852 |
| 3.0792 | 18.4643 | 12500 | 4.4869 |
| 3.1278 | 18.6122 | 12600 | 4.4878 |
| 3.1278 | 18.7601 | 12700 | 4.4874 |
| 3.1524 | 18.9079 | 12800 | 4.4843 |
| 3.1524 | 19.0547 | 12900 | 4.5108 |
| 3.0559 | 19.2026 | 13000 | 4.5251 |
| 3.0559 | 19.3505 | 13100 | 4.5320 |
| 3.0386 | 19.4983 | 13200 | 4.5330 |
| 3.0386 | 19.6462 | 13300 | 4.5327 |
| 3.0742 | 19.7941 | 13400 | 4.5318 |
| 3.0742 | 19.9420 | 13500 | 4.5313 |
| 3.037 | 20.0887 | 13600 | 4.5636 |
| 3.037 | 20.2366 | 13700 | 4.5708 |
| 2.9601 | 20.3845 | 13800 | 4.5759 |
| 2.9601 | 20.5323 | 13900 | 4.5770 |
| 2.9845 | 20.6802 | 14000 | 4.5785 |
| 2.9845 | 20.8281 | 14100 | 4.5787 |
| 3.0182 | 20.9760 | 14200 | 4.5752 |
| 3.0182 | 21.1227 | 14300 | 4.6077 |
| 2.897 | 21.2706 | 14400 | 4.6195 |
| 2.897 | 21.4185 | 14500 | 4.6227 |
| 2.9144 | 21.5664 | 14600 | 4.6235 |
| 2.9144 | 21.7142 | 14700 | 4.6225 |
| 2.941 | 21.8621 | 14800 | 4.6226 |
| 2.941 | 22.0089 | 14900 | 4.6335 |
| 2.8755 | 22.1567 | 15000 | 4.6583 |
| 2.8755 | 22.3046 | 15100 | 4.6638 |
| 2.844 | 22.4525 | 15200 | 4.6678 |
| 2.844 | 22.6004 | 15300 | 4.6670 |
| 2.8713 | 22.7482 | 15400 | 4.6677 |
| 2.8713 | 22.8961 | 15500 | 4.6687 |
| 2.8659 | 23.0429 | 15600 | 4.6898 |
| 2.8659 | 23.1908 | 15700 | 4.7004 |
| 2.7686 | 23.3386 | 15800 | 4.7080 |
| 2.7686 | 23.4865 | 15900 | 4.7120 |
| 2.807 | 23.6344 | 16000 | 4.7123 |
| 2.807 | 23.7823 | 16100 | 4.7129 |
| 2.8185 | 23.9301 | 16200 | 4.7133 |
| 2.8185 | 24.0769 | 16300 | 4.7336 |
| 2.7406 | 24.2248 | 16400 | 4.7425 |
| 2.7406 | 24.3726 | 16500 | 4.7490 |
| 2.7357 | 24.5205 | 16600 | 4.7505 |
| 2.7357 | 24.6684 | 16700 | 4.7498 |
| 2.7585 | 24.8163 | 16800 | 4.7543 |
| 2.7585 | 24.9641 | 16900 | 4.7538 |
| 2.7301 | 25.1109 | 17000 | 4.7770 |
| 2.7301 | 25.2588 | 17100 | 4.7824 |
| 2.6831 | 25.4067 | 17200 | 4.7839 |
| 2.6831 | 25.5545 | 17300 | 4.7862 |
| 2.7023 | 25.7024 | 17400 | 4.7903 |
| 2.7023 | 25.8503 | 17500 | 4.7911 |
| 2.715 | 25.9982 | 17600 | 4.7907 |
| 2.715 | 26.1449 | 17700 | 4.8110 |
| 2.6278 | 26.2928 | 17800 | 4.8160 |
| 2.6278 | 26.4407 | 17900 | 4.8179 |
| 2.6537 | 26.5885 | 18000 | 4.8217 |
| 2.6537 | 26.7364 | 18100 | 4.8230 |
| 2.6646 | 26.8843 | 18200 | 4.8234 |
| 2.6646 | 27.0311 | 18300 | 4.8324 |
| 2.6243 | 27.1789 | 18400 | 4.8411 |
| 2.6243 | 27.3268 | 18500 | 4.8438 |
| 2.6034 | 27.4747 | 18600 | 4.8450 |
| 2.6034 | 27.6226 | 18700 | 4.8484 |
| 2.62 | 27.7704 | 18800 | 4.8491 |
| 2.62 | 27.9183 | 18900 | 4.8498 |
| 2.6108 | 28.0651 | 19000 | 4.8576 |
| 2.6108 | 28.2129 | 19100 | 4.8638 |
| 2.5697 | 28.3608 | 19200 | 4.8656 |
| 2.5697 | 28.5087 | 19300 | 4.8668 |
| 2.5787 | 28.6566 | 19400 | 4.8677 |
| 2.5787 | 28.8044 | 19500 | 4.8695 |
| 2.582 | 28.9523 | 19600 | 4.8693 |
| 2.582 | 29.0991 | 19700 | 4.8744 |
| 2.55 | 29.2470 | 19800 | 4.8767 |
| 2.55 | 29.3948 | 19900 | 4.8781 |
| 2.5513 | 29.5427 | 20000 | 4.8790 |
| 2.5513 | 29.6906 | 20100 | 4.8792 |
| 2.552 | 29.8384 | 20200 | 4.8793 |
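The table shows validation loss reaching a minimum partway through training and rising afterwards while training loss keeps falling. The sketch below picks the step with the lowest validation loss and converts losses to perplexity. It uses a hand-copied subset of rows from the table, and it assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state explicitly. The names `ROWS`, `best_checkpoint`, and `perplexity` are illustrative.

```python
import math

# (step, validation_loss) pairs copied from the table above; a subset only.
ROWS = [
    (100, 9.3622),
    (1000, 5.8965),
    (5000, 4.4224),
    (6800, 4.2656),
    (7400, 4.2464),
    (8100, 4.2536),
    (10000, 4.3232),
    (15000, 4.6583),
    (20200, 4.8793),
]


def best_checkpoint(rows):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def perplexity(loss: float) -> float:
    """Perplexity, assuming the loss is mean cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    step, loss = best_checkpoint(ROWS)
    print(f"best step: {step}, loss: {loss}, perplexity: {perplexity(loss):.1f}")
    final_step, final_loss = ROWS[-1]
    print(f"final step: {final_step}, perplexity: {perplexity(final_loss):.1f}")
```

If intermediate checkpoints were saved, a selection like this suggests the checkpoint near step 7400 would generalize better than the final one; whether such checkpoints exist is not stated in the card.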
Framework versions
- Transformers 4.57.1
- Pytorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1