# gpt2_moe_eng_hom_1024_100mb_gelu_tok
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.8537
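Since the loss above is a cross-entropy value, it can be converted to perplexity, which is often easier to interpret for a causal language model. A minimal sketch of that conversion:

```python
import math

# Evaluation cross-entropy loss reported above
eval_loss = 4.8537

# Perplexity is exp(loss) for a cross-entropy-evaluated causal LM
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # ≈ 128.2
```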
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2028
- training_steps: 20287
- mixed_precision_training: Native AMP
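The hyperparameters above can be sketched as a `transformers.TrainingArguments` configuration. This is an illustrative reconstruction, not the original training script: `output_dir` is an assumption, and the effective batch size of 32 follows from 8 per-device samples × 4 accumulation steps.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; output_dir is assumed.
args = TrainingArguments(
    output_dir="gpt2_moe_eng_hom_1024_100mb_gelu_tok",  # assumed name
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=2028,
    max_steps=20287,
    fp16=True,  # "Native AMP" mixed precision
)
```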
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.4924 | 0.7394 | 500 | 6.4575 |
| 5.9556 | 1.4776 | 1000 | 5.8571 |
| 5.7036 | 2.2159 | 1500 | 5.5890 |
| 5.3487 | 2.9553 | 2000 | 5.2755 |
| 5.0742 | 3.6935 | 2500 | 4.9725 |
| 4.7706 | 4.4318 | 3000 | 4.7761 |
| 4.6619 | 5.1701 | 3500 | 4.6313 |
| 4.4778 | 5.9094 | 4000 | 4.5206 |
| 4.3291 | 6.6477 | 4500 | 4.4375 |
| 4.1711 | 7.3860 | 5000 | 4.3799 |
| 4.1685 | 8.1242 | 5500 | 4.3354 |
| 4.041 | 8.8636 | 6000 | 4.2875 |
| 3.9162 | 9.6018 | 6500 | 4.2714 |
| 3.7992 | 10.3401 | 7000 | 4.2684 |
| 3.8207 | 11.0784 | 7500 | 4.2649 |
| 3.7179 | 11.8177 | 8000 | 4.2565 |
| 3.6016 | 12.5560 | 8500 | 4.2727 |
| 3.4873 | 13.2943 | 9000 | 4.2998 |
| 3.5476 | 14.0325 | 9500 | 4.3048 |
| 3.4522 | 14.7719 | 10000 | 4.3184 |
| 3.3295 | 15.5102 | 10500 | 4.3572 |
| 3.2574 | 16.2484 | 11000 | 4.3925 |
| 3.2943 | 16.9878 | 11500 | 4.3890 |
| 3.2057 | 17.7261 | 12000 | 4.4348 |
| 3.0769 | 18.4643 | 12500 | 4.4726 |
| 3.0552 | 19.2026 | 13000 | 4.5143 |
| 3.076 | 19.9420 | 13500 | 4.5183 |
| 2.988 | 20.6802 | 14000 | 4.5620 |
| 2.8988 | 21.4185 | 14500 | 4.6087 |
| 2.8785 | 22.1567 | 15000 | 4.6384 |
| 2.8767 | 22.8961 | 15500 | 4.6512 |
| 2.8122 | 23.6344 | 16000 | 4.6910 |
| 2.7481 | 24.3726 | 16500 | 4.7262 |
| 2.7381 | 25.1109 | 17000 | 4.7534 |
| 2.7092 | 25.8503 | 17500 | 4.7691 |
| 2.6606 | 26.5885 | 18000 | 4.7975 |
| 2.6327 | 27.3268 | 18500 | 4.8186 |
| 2.6189 | 28.0651 | 19000 | 4.8347 |
| 2.5879 | 28.8044 | 19500 | 4.8440 |
| 2.5604 | 29.5427 | 20000 | 4.8536 |
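The table above shows validation loss bottoming out around step 8000 (4.2565) and rising steadily afterward while training loss keeps falling, a typical overfitting curve. A minimal sketch of selecting the best checkpoint from such an evaluation history (a few (step, validation_loss) pairs copied from the table):

```python
# (step, validation_loss) pairs taken from the training results table
eval_history = [
    (500, 6.4575),
    (4000, 4.5206),
    (7500, 4.2649),
    (8000, 4.2565),
    (8500, 4.2727),
    (20000, 4.8536),
]

# The checkpoint with the lowest validation loss is the best candidate
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # 8000 4.2565
```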
### Framework versions
- Transformers 4.57.1
- Pytorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1