gpt2_moe_hom_1024_100mb_gelu
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.0647
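The loss above can be read as a perplexity, which is often easier to compare across language models. The following is a minimal sketch, assuming the reported value is a mean per-token cross-entropy in nats; the card does not state this explicitly.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 4.0647

# Assumption: the loss is the mean token-level cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

If the loss were averaged differently (for example per sequence rather than per token), this conversion would not apply.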
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10252
- training_steps: 102526
- mixed_precision_training: Native AMP
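The hyperparameters above imply two derived quantities: the effective batch size and the shape of the learning-rate schedule. The sketch below recomputes both in plain Python. It assumes a single training device (`NUM_DEVICES` is a hypothetical name, not a value from this card) and assumes that the `linear` scheduler warms up linearly from zero and then decays linearly to zero at the final step; the helper `lr_at` is illustrative and is not part of any library.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 10252
TRAINING_STEPS = 102526

# Assumption: one device; the card does not list the device count.
NUM_DEVICES = 1

# Effective batch size per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Fraction of training spent in warmup.
warmup_fraction = WARMUP_STEPS / TRAINING_STEPS


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed linear warmup, then linear decay to 0)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(total_train_batch_size)
print(round(warmup_fraction, 3))
for step in (0, 5126, WARMUP_STEPS, 56389, TRAINING_STEPS):
    print(step, lr_at(step))
```

Under these assumptions the effective batch size matches the listed `total_train_batch_size` of 32, and the warmup covers roughly the first tenth of training.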
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.3834 | 0.1463 | 500 | 8.7093 |
| 7.5267 | 0.2926 | 1000 | 7.3227 |
| 6.9908 | 0.4389 | 1500 | 6.7627 |
| 6.5106 | 0.5852 | 2000 | 6.4202 |
| 6.3056 | 0.7315 | 2500 | 6.1956 |
| 6.0878 | 0.8778 | 3000 | 6.0325 |
| 5.9801 | 1.0240 | 3500 | 5.9058 |
| 5.8392 | 1.1703 | 4000 | 5.7994 |
| 5.749 | 1.3166 | 4500 | 5.6914 |
| 5.6274 | 1.4629 | 5000 | 5.5682 |
| 5.5212 | 1.6092 | 5500 | 5.4333 |
| 5.3752 | 1.7555 | 6000 | 5.3082 |
| 5.2817 | 1.9018 | 6500 | 5.1953 |
| 5.1303 | 2.0480 | 7000 | 5.0931 |
| 5.066 | 2.1943 | 7500 | 5.0033 |
| 4.9701 | 2.3406 | 8000 | 4.9219 |
| 4.9075 | 2.4869 | 8500 | 4.8434 |
| 4.8244 | 2.6332 | 9000 | 4.7711 |
| 4.7694 | 2.7795 | 9500 | 4.6991 |
| 4.702 | 2.9258 | 10000 | 4.6343 |
| 4.5982 | 3.0720 | 10500 | 4.5734 |
| 4.5412 | 3.2183 | 11000 | 4.5152 |
| 4.5098 | 3.3646 | 11500 | 4.4656 |
| 4.4478 | 3.5109 | 12000 | 4.4140 |
| 4.4289 | 3.6572 | 12500 | 4.3707 |
| 4.3789 | 3.8035 | 13000 | 4.3368 |
| 4.3439 | 3.9497 | 13500 | 4.2969 |
| 4.2479 | 4.0960 | 14000 | 4.2698 |
| 4.2331 | 4.2423 | 14500 | 4.2447 |
| 4.2092 | 4.3886 | 15000 | 4.2206 |
| 4.1969 | 4.5349 | 15500 | 4.1968 |
| 4.1754 | 4.6811 | 16000 | 4.1727 |
| 4.1572 | 4.8274 | 16500 | 4.1517 |
| 4.141 | 4.9737 | 17000 | 4.1322 |
| 4.0416 | 5.1200 | 17500 | 4.1209 |
| 4.0458 | 5.2663 | 18000 | 4.1070 |
| 4.044 | 5.4126 | 18500 | 4.0929 |
| 4.0392 | 5.5588 | 19000 | 4.0768 |
| 4.0161 | 5.7051 | 19500 | 4.0612 |
| 4.0105 | 5.8514 | 20000 | 4.0488 |
| 4.0083 | 5.9977 | 20500 | 4.0336 |
| 3.912 | 6.1440 | 21000 | 4.0344 |
| 3.9133 | 6.2902 | 21500 | 4.0233 |
| 3.9166 | 6.4365 | 22000 | 4.0126 |
| 3.9155 | 6.5828 | 22500 | 4.0026 |
| 3.9191 | 6.7291 | 23000 | 3.9910 |
| 3.9205 | 6.8754 | 23500 | 3.9806 |
| 3.8652 | 7.0217 | 24000 | 3.9782 |
| 3.8049 | 7.1679 | 24500 | 3.9778 |
| 3.8194 | 7.3142 | 25000 | 3.9715 |
| 3.8292 | 7.4605 | 25500 | 3.9614 |
| 3.8259 | 7.6068 | 26000 | 3.9530 |
| 3.8278 | 7.7531 | 26500 | 3.9454 |
| 3.8223 | 7.8994 | 27000 | 3.9360 |
| 3.7907 | 8.0456 | 27500 | 3.9411 |
| 3.7334 | 8.1919 | 28000 | 3.9404 |
| 3.7343 | 8.3382 | 28500 | 3.9351 |
| 3.7436 | 8.4845 | 29000 | 3.9270 |
| 3.7576 | 8.6308 | 29500 | 3.9194 |
| 3.7487 | 8.7771 | 30000 | 3.9136 |
| 3.7546 | 8.9234 | 30500 | 3.9059 |
| 3.6439 | 9.0696 | 31000 | 3.9144 |
| 3.6589 | 9.2159 | 31500 | 3.9138 |
| 3.6728 | 9.3622 | 32000 | 3.9111 |
| 3.6795 | 9.5085 | 32500 | 3.9056 |
| 3.6923 | 9.6548 | 33000 | 3.8993 |
| 3.6938 | 9.8011 | 33500 | 3.8940 |
| 3.6946 | 9.9474 | 34000 | 3.8864 |
| 3.5679 | 10.0936 | 34500 | 3.9031 |
| 3.6002 | 10.2399 | 35000 | 3.9004 |
| 3.6119 | 10.3862 | 35500 | 3.8958 |
| 3.6184 | 10.5325 | 36000 | 3.8907 |
| 3.6337 | 10.6788 | 36500 | 3.8838 |
| 3.6322 | 10.8251 | 37000 | 3.8798 |
| 3.636 | 10.9714 | 37500 | 3.8749 |
| 3.5265 | 11.1176 | 38000 | 3.8918 |
| 3.5411 | 11.2639 | 38500 | 3.8916 |
| 3.5663 | 11.4102 | 39000 | 3.8869 |
| 3.5703 | 11.5565 | 39500 | 3.8816 |
| 3.5862 | 11.7028 | 40000 | 3.8777 |
| 3.5829 | 11.8491 | 40500 | 3.8727 |
| 3.5889 | 11.9954 | 41000 | 3.8703 |
| 3.4682 | 12.1416 | 41500 | 3.8876 |
| 3.501 | 12.2879 | 42000 | 3.8882 |
| 3.5109 | 12.4342 | 42500 | 3.8835 |
| 3.5187 | 12.5805 | 43000 | 3.8784 |
| 3.5374 | 12.7268 | 43500 | 3.8748 |
| 3.5311 | 12.8731 | 44000 | 3.8715 |
| 3.5457 | 13.0193 | 44500 | 3.8823 |
| 3.4344 | 13.1656 | 45000 | 3.8897 |
| 3.4551 | 13.3119 | 45500 | 3.8865 |
| 3.4675 | 13.4582 | 46000 | 3.8840 |
| 3.4719 | 13.6045 | 46500 | 3.8795 |
| 3.4836 | 13.7508 | 47000 | 3.8767 |
| 3.4945 | 13.8971 | 47500 | 3.8710 |
| 3.3994 | 14.0433 | 48000 | 3.8904 |
| 3.3857 | 14.1896 | 48500 | 3.8964 |
| 3.4053 | 14.3359 | 49000 | 3.8918 |
| 3.4309 | 14.4822 | 49500 | 3.8888 |
| 3.433 | 14.6285 | 50000 | 3.8854 |
| 3.4475 | 14.7748 | 50500 | 3.8787 |
| 3.4465 | 14.9211 | 51000 | 3.8751 |
| 3.3686 | 15.0673 | 51500 | 3.8988 |
| 3.3508 | 15.2136 | 52000 | 3.9007 |
| 3.3644 | 15.3599 | 52500 | 3.9003 |
| 3.3894 | 15.5062 | 53000 | 3.8961 |
| 3.3914 | 15.6525 | 53500 | 3.8911 |
| 3.4053 | 15.7988 | 54000 | 3.8888 |
| 3.4118 | 15.9451 | 54500 | 3.8818 |
| 3.2958 | 16.0913 | 55000 | 3.9100 |
| 3.3117 | 16.2376 | 55500 | 3.9103 |
| 3.3305 | 16.3839 | 56000 | 3.9058 |
| 3.3377 | 16.5302 | 56500 | 3.9047 |
| 3.3535 | 16.6765 | 57000 | 3.9001 |
| 3.3587 | 16.8228 | 57500 | 3.8953 |
| 3.3666 | 16.9691 | 58000 | 3.8936 |
| 3.2519 | 17.1153 | 58500 | 3.9214 |
| 3.2752 | 17.2616 | 59000 | 3.9207 |
| 3.2917 | 17.4079 | 59500 | 3.9185 |
| 3.312 | 17.5542 | 60000 | 3.9159 |
| 3.3088 | 17.7005 | 60500 | 3.9122 |
| 3.332 | 17.8468 | 61000 | 3.9085 |
| 3.3385 | 17.9931 | 61500 | 3.9059 |
| 3.2243 | 18.1393 | 62000 | 3.9329 |
| 3.2414 | 18.2856 | 62500 | 3.9323 |
| 3.2581 | 18.4319 | 63000 | 3.9308 |
| 3.2759 | 18.5782 | 63500 | 3.9267 |
| 3.2858 | 18.7245 | 64000 | 3.9246 |
| 3.292 | 18.8707 | 64500 | 3.9210 |
| 3.2586 | 19.0170 | 65000 | 3.9374 |
| 3.1976 | 19.1633 | 65500 | 3.9466 |
| 3.2189 | 19.3096 | 66000 | 3.9465 |
| 3.2167 | 19.4559 | 66500 | 3.9440 |
| 3.2399 | 19.6022 | 67000 | 3.9409 |
| 3.2546 | 19.7484 | 67500 | 3.9366 |
| 3.2588 | 19.8947 | 68000 | 3.9329 |
| 3.2436 | 20.0410 | 68500 | 3.9545 |
| 3.1653 | 20.1873 | 69000 | 3.9610 |
| 3.187 | 20.3336 | 69500 | 3.9601 |
| 3.1987 | 20.4798 | 70000 | 3.9573 |
| 3.2097 | 20.6261 | 70500 | 3.9556 |
| 3.2227 | 20.7724 | 71000 | 3.9491 |
| 3.2255 | 20.9187 | 71500 | 3.9475 |
| 3.1149 | 21.0650 | 72000 | 3.9709 |
| 3.1352 | 21.2113 | 72500 | 3.9741 |
| 3.1525 | 21.3575 | 73000 | 3.9730 |
| 3.1624 | 21.5038 | 73500 | 3.9710 |
| 3.1771 | 21.6501 | 74000 | 3.9681 |
| 3.1841 | 21.7964 | 74500 | 3.9657 |
| 3.193 | 21.9427 | 75000 | 3.9614 |
| 3.0821 | 22.0889 | 75500 | 3.9879 |
| 3.1162 | 22.2352 | 76000 | 3.9907 |
| 3.1209 | 22.3815 | 76500 | 3.9865 |
| 3.1469 | 22.5278 | 77000 | 3.9848 |
| 3.1475 | 22.6741 | 77500 | 3.9826 |
| 3.1537 | 22.8204 | 78000 | 3.9798 |
| 3.1587 | 22.9667 | 78500 | 3.9773 |
| 3.071 | 23.1129 | 79000 | 4.0029 |
| 3.086 | 23.2592 | 79500 | 4.0026 |
| 3.1013 | 23.4055 | 80000 | 4.0002 |
| 3.1055 | 23.5518 | 80500 | 4.0000 |
| 3.1207 | 23.6981 | 81000 | 3.9960 |
| 3.1202 | 23.8444 | 81500 | 3.9942 |
| 3.1244 | 23.9907 | 82000 | 3.9929 |
| 3.0405 | 24.1369 | 82500 | 4.0158 |
| 3.0636 | 24.2832 | 83000 | 4.0166 |
| 3.07 | 24.4295 | 83500 | 4.0155 |
| 3.0845 | 24.5758 | 84000 | 4.0131 |
| 3.0857 | 24.7221 | 84500 | 4.0102 |
| 3.0969 | 24.8684 | 85000 | 4.0096 |
| 3.1004 | 25.0146 | 85500 | 4.0210 |
| 3.0231 | 25.1609 | 86000 | 4.0280 |
| 3.0347 | 25.3072 | 86500 | 4.0298 |
| 3.0489 | 25.4535 | 87000 | 4.0301 |
| 3.0574 | 25.5998 | 87500 | 4.0267 |
| 3.0626 | 25.7461 | 88000 | 4.0245 |
| 3.066 | 25.8924 | 88500 | 4.0244 |
| 3.0193 | 26.0386 | 89000 | 4.0374 |
| 3.0058 | 26.1849 | 89500 | 4.0411 |
| 3.0135 | 26.3312 | 90000 | 4.0403 |
| 3.0217 | 26.4775 | 90500 | 4.0416 |
| 3.0323 | 26.6238 | 91000 | 4.0391 |
| 3.0276 | 26.7701 | 91500 | 4.0382 |
| 3.0389 | 26.9164 | 92000 | 4.0362 |
| 3.0004 | 27.0626 | 92500 | 4.0486 |
| 2.9859 | 27.2089 | 93000 | 4.0523 |
| 2.9851 | 27.3552 | 93500 | 4.0518 |
| 2.9981 | 27.5015 | 94000 | 4.0514 |
| 3.002 | 27.6478 | 94500 | 4.0504 |
| 3.0052 | 27.7941 | 95000 | 4.0492 |
| 3.0078 | 27.9404 | 95500 | 4.0476 |
| 2.9615 | 28.0866 | 96000 | 4.0570 |
| 2.9622 | 28.2329 | 96500 | 4.0594 |
| 2.9711 | 28.3792 | 97000 | 4.0592 |
| 2.9678 | 28.5255 | 97500 | 4.0596 |
| 2.9898 | 28.6718 | 98000 | 4.0589 |
| 2.9775 | 28.8181 | 98500 | 4.0584 |
| 2.985 | 28.9644 | 99000 | 4.0581 |
| 2.9462 | 29.1106 | 99500 | 4.0637 |
| 2.9585 | 29.2569 | 100000 | 4.0655 |
| 2.9541 | 29.4032 | 100500 | 4.0650 |
| 2.9643 | 29.5495 | 101000 | 4.0647 |
| 2.9541 | 29.6958 | 101500 | 4.0646 |
| 2.958 | 29.8421 | 102000 | 4.0647 |
| 2.9525 | 29.9884 | 102500 | 4.0647 |
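In the table above, validation loss reaches its minimum (3.8703 at step 41000) well before the end of training and then rises, while training loss keeps falling. A short sketch of how one might locate that point and measure the final train/validation gap follows. The `rows` list holds only a handful of rows copied from the table, and the variable names are illustrative.

```python
# (step, training_loss, validation_loss) for a few rows copied from the table.
rows = [
    (500, 9.3834, 8.7093),
    (20500, 4.0083, 4.0336),
    (37500, 3.6360, 3.8749),
    (41000, 3.5889, 3.8703),
    (47500, 3.4945, 3.8710),
    (61500, 3.3385, 3.9059),
    (82000, 3.1244, 3.9929),
    (102500, 2.9525, 4.0647),
]

# Row with the lowest validation loss among the sampled rows.
best_step, best_train, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = rows[-1]
final_gap = last_val - last_train

print(f"best validation loss {best_val} at step {best_step}")
print(f"final gap (validation - training) at step {last_step}: {final_gap:.4f}")
```

A widening gap of this kind is commonly read as overfitting, so a checkpoint near the validation minimum may be preferable to the final one if such a checkpoint was saved; the card does not say which checkpoints are available.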
Framework versions
- Transformers 4.57.1
- Pytorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1