# dense_eng_hom_100m_mult_reseg_ep20_goldfish
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.8870
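For reference, if the reported loss is the mean per-token cross-entropy in nats (the Hugging Face Trainer default for causal language models, which this appears to be), it corresponds to a perplexity of roughly 132.6:

```python
import math

# Assumption: the evaluation loss is mean token-level cross-entropy in nats,
# as reported by the Hugging Face Trainer for causal LMs.
eval_loss = 4.8870
perplexity = math.exp(eval_loss)
print(f"Eval perplexity: {perplexity:.1f}")  # ~132.6
```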
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1352
- training_steps: 13525
- mixed_precision_training: Native AMP
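
As a reproduction aid, here is a minimal sketch of this configuration with the `transformers` `TrainingArguments` API. The `output_dir` is a placeholder, `fp16=True` is assumed to correspond to the "Native AMP" setting, and the effective batch size of 32 assumes single-device training (8 per-device examples x 4 accumulation steps):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dense_eng_hom_100m_mult_reseg_ep20_goldfish",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,      # total train batch size: 8 * 4 = 32
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1352,
    max_steps=13525,
    fp16=True,                          # assumed equivalent of "Native AMP"
)
```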
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.1732 | 0.7394 | 500 | 6.2638 |
| 5.6194 | 1.4776 | 1000 | 5.4843 |
| 5.227 | 2.2159 | 1500 | 5.0583 |
| 4.841 | 2.9553 | 2000 | 4.7868 |
| 4.6153 | 3.6935 | 2500 | 4.6040 |
| 4.3649 | 4.4318 | 3000 | 4.4778 |
| 4.3025 | 5.1701 | 3500 | 4.3898 |
| 4.1416 | 5.9094 | 4000 | 4.3192 |
| 3.9741 | 6.6477 | 4500 | 4.2908 |
| 3.8149 | 7.3860 | 5000 | 4.2862 |
| 3.8348 | 8.1242 | 5500 | 4.2873 |
| 3.6977 | 8.8636 | 6000 | 4.2723 |
| 3.5328 | 9.6018 | 6500 | 4.3113 |
| 3.3815 | 10.3401 | 7000 | 4.3560 |
| 3.4349 | 11.0784 | 7500 | 4.3939 |
| 3.307 | 11.8177 | 8000 | 4.4125 |
| 3.1486 | 12.5560 | 8500 | 4.4781 |
| 3.0233 | 13.2943 | 9000 | 4.5435 |
| 3.0794 | 14.0325 | 9500 | 4.5834 |
| 2.9665 | 14.7719 | 10000 | 4.6196 |
| 2.8357 | 15.5102 | 10500 | 4.6898 |
| 2.7675 | 16.2484 | 11000 | 4.7474 |
| 2.7861 | 16.9878 | 11500 | 4.7618 |
| 2.7079 | 17.7261 | 12000 | 4.8156 |
| 2.6257 | 18.4643 | 12500 | 4.8569 |
| 2.6031 | 19.2026 | 13000 | 4.8812 |
| 2.5892 | 19.9420 | 13500 | 4.8869 |
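
Validation loss bottoms out at 4.2723 at step 6000 (epoch ~8.9) and climbs steadily afterward while training loss keeps falling, a typical overfitting pattern; the step-6000 checkpoint may be the better choice for downstream use.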
### Framework versions
- Transformers 4.57.1
- Pytorch 2.9.1+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1