# dense_eng_hom_100m_mult_reseg_ep20_spm
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

- Loss: 5.0896
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1352
- training_steps: 13525
- mixed_precision_training: Native AMP
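
The effective batch size and warmup schedule follow directly from the values above; as a quick sanity check (a plain-Python sketch — the variable names are illustrative, not taken from the actual training config):

```python
# Hypothetical variable names; the values are copied from the
# hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 4

# Effective batch size per optimizer step (single device assumed).
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching total_train_batch_size above

warmup_steps = 1352
training_steps = 13525

# Fraction of training spent in linear warmup.
warmup_fraction = warmup_steps / training_steps
print(f"{warmup_fraction:.3f}")  # roughly 0.100, i.e. a 10% warmup
```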
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.4746 | 0.7394 | 500 | 6.5613 |
| 5.8724 | 1.4776 | 1000 | 5.7328 |
| 5.4524 | 2.2159 | 1500 | 5.2701 |
| 5.0152 | 2.9553 | 2000 | 4.9518 |
| 4.7695 | 3.6935 | 2500 | 4.7523 |
| 4.5097 | 4.4318 | 3000 | 4.6241 |
| 4.4453 | 5.1701 | 3500 | 4.5447 |
| 4.282 | 5.9094 | 4000 | 4.4772 |
| 4.1115 | 6.6477 | 4500 | 4.4538 |
| 3.9488 | 7.3860 | 5000 | 4.4516 |
| 3.9684 | 8.1242 | 5500 | 4.4631 |
| 3.8292 | 8.8636 | 6000 | 4.4464 |
| 3.6565 | 9.6018 | 6500 | 4.4896 |
| 3.5011 | 10.3401 | 7000 | 4.5425 |
| 3.5573 | 11.0784 | 7500 | 4.5800 |
| 3.4257 | 11.8177 | 8000 | 4.5976 |
| 3.2619 | 12.5560 | 8500 | 4.6698 |
| 3.1353 | 13.2943 | 9000 | 4.7375 |
| 3.1905 | 14.0325 | 9500 | 4.7786 |
| 3.0776 | 14.7719 | 10000 | 4.8166 |
| 2.9398 | 15.5102 | 10500 | 4.8848 |
| 2.874 | 16.2484 | 11000 | 4.9440 |
| 2.8896 | 16.9878 | 11500 | 4.9569 |
| 2.8135 | 17.7261 | 12000 | 5.0176 |
| 2.7288 | 18.4643 | 12500 | 5.0580 |
| 2.7072 | 19.2026 | 13000 | 5.0837 |
| 2.6928 | 19.9420 | 13500 | 5.0898 |
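
Reading the table programmatically (a minimal sketch; the (step, validation loss) pairs below are transcribed from the table above) shows that validation loss bottoms out around step 6000 and climbs thereafter while training loss keeps falling, a typical overfitting pattern:

```python
# (step, validation_loss) pairs transcribed from the results table above.
val_loss = [
    (500, 6.5613), (1000, 5.7328), (1500, 5.2701), (2000, 4.9518),
    (2500, 4.7523), (3000, 4.6241), (3500, 4.5447), (4000, 4.4772),
    (4500, 4.4538), (5000, 4.4516), (5500, 4.4631), (6000, 4.4464),
    (6500, 4.4896), (7000, 4.5425), (7500, 4.5800), (8000, 4.5976),
    (8500, 4.6698), (9000, 4.7375), (9500, 4.7786), (10000, 4.8166),
    (10500, 4.8848), (11000, 4.9440), (11500, 4.9569), (12000, 5.0176),
    (12500, 5.0580), (13000, 5.0837), (13500, 5.0898),
]

# Find the checkpoint with the lowest validation loss.
best_step, best_loss = min(val_loss, key=lambda p: p[1])
print(best_step, best_loss)  # 6000 4.4464
```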
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.1+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1