gpt2_moe_eng_hom_1024_100mb_gelu_gpt2mlp
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.1571
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2028
- training_steps: 20287
- mixed_precision_training: Native AMP
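The batch-size and schedule entries above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a single training device (the card does not state the device count) and assumes that the `linear` scheduler warms up linearly from zero to the peak learning rate and then decays linearly to zero at the last step. The names `effective_batch_size` and `lr_at` are hypothetical helpers, not part of any library.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2028
TRAINING_STEPS = 20287
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size() -> int:
    """Sequences consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to the peak over WARMUP_STEPS,
    then linear decay from the peak to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("warmup fraction:", round(WARMUP_STEPS / TRAINING_STEPS, 4))
    for step in (0, 1014, 2028, 10000, 20287):
        print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 32, and the warmup covers roughly the first 10% of the run.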
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.4287 | 0.7394 | 500 | 6.4095 |
| 5.9201 | 1.4776 | 1000 | 5.8068 |
| 5.6458 | 2.2159 | 1500 | 5.4626 |
| 5.2332 | 2.9553 | 2000 | 5.1197 |
| 4.9917 | 3.6935 | 2500 | 4.8633 |
| 4.6972 | 4.4318 | 3000 | 4.6509 |
| 4.5694 | 5.1701 | 3500 | 4.4626 |
| 4.3806 | 5.9094 | 4000 | 4.3469 |
| 4.2555 | 6.6477 | 4500 | 4.2665 |
| 4.1236 | 7.3860 | 5000 | 4.2133 |
| 4.116 | 8.1242 | 5500 | 4.1599 |
| 4.0169 | 8.8636 | 6000 | 4.1266 |
| 3.931 | 9.6018 | 6500 | 4.1022 |
| 3.8484 | 10.3401 | 7000 | 4.0855 |
| 3.8573 | 11.0784 | 7500 | 4.0697 |
| 3.7843 | 11.8177 | 8000 | 4.0568 |
| 3.7087 | 12.5560 | 8500 | 4.0501 |
| 3.6309 | 13.2943 | 9000 | 4.0468 |
| 3.6717 | 14.0325 | 9500 | 4.0380 |
| 3.6127 | 14.7719 | 10000 | 4.0389 |
| 3.5379 | 15.5102 | 10500 | 4.0468 |
| 3.4938 | 16.2484 | 11000 | 4.0542 |
| 3.5153 | 16.9878 | 11500 | 4.0473 |
| 3.4621 | 17.7261 | 12000 | 4.0555 |
| 3.3845 | 18.4643 | 12500 | 4.0651 |
| 3.3756 | 19.2026 | 13000 | 4.0744 |
| 3.3898 | 19.9420 | 13500 | 4.0716 |
| 3.3365 | 20.6802 | 14000 | 4.0843 |
| 3.2856 | 21.4185 | 14500 | 4.0967 |
| 3.275 | 22.1567 | 15000 | 4.1069 |
| 3.2775 | 22.8961 | 15500 | 4.1053 |
| 3.2424 | 23.6344 | 16000 | 4.1166 |
| 3.2004 | 24.3726 | 16500 | 4.1274 |
| 3.199 | 25.1109 | 17000 | 4.1318 |
| 3.1838 | 25.8503 | 17500 | 4.1354 |
| 3.1578 | 26.5885 | 18000 | 4.1443 |
| 3.1366 | 27.3268 | 18500 | 4.1502 |
| 3.1317 | 28.0651 | 19000 | 4.1522 |
| 3.1143 | 28.8044 | 19500 | 4.1550 |
| 3.0978 | 29.5427 | 20000 | 4.1572 |
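In the table above, the validation loss stops improving partway through the run while the training loss keeps falling. The following sketch locates the lowest validation loss and converts losses to perplexity. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state explicitly; `perplexity` and `best_checkpoint` are hypothetical helper names.

```python
import math

# Validation losses copied from the table above, one per evaluation,
# in order (steps 500, 1000, ..., 20000).
VAL_LOSSES = [
    6.4095, 5.8068, 5.4626, 5.1197, 4.8633, 4.6509, 4.4626, 4.3469,
    4.2665, 4.2133, 4.1599, 4.1266, 4.1022, 4.0855, 4.0697, 4.0568,
    4.0501, 4.0468, 4.0380, 4.0389, 4.0468, 4.0542, 4.0473, 4.0555,
    4.0651, 4.0744, 4.0716, 4.0843, 4.0967, 4.1069, 4.1053, 4.1166,
    4.1274, 4.1318, 4.1354, 4.1443, 4.1502, 4.1522, 4.1550, 4.1572,
]
EVAL_EVERY = 500  # steps between evaluations, as in the Step column


def perplexity(loss: float) -> float:
    """exp(loss); valid only if loss is mean cross-entropy in nats (assumed)."""
    return math.exp(loss)


def best_checkpoint(losses: list[float], eval_every: int) -> tuple[int, float]:
    """Return (step, loss) of the evaluation with the lowest loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * eval_every, losses[idx]


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSSES, EVAL_EVERY)
    print(f"lowest validation loss {loss:.4f} at step {step}")
    print(f"perplexity at that step: {perplexity(loss):.1f}")
    print(f"perplexity at final reported loss: {perplexity(4.1571):.1f}")
```

The lowest validation loss in the table is 4.0380 at step 9500, so an earlier checkpoint, if one was saved, may generalize better than the final one.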
Framework versions
- Transformers 4.57.1
- Pytorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1