modernbert-CGEdit-AAE_final

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following result on the evaluation set:

  • Loss: 0.4685

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 0.1
  • num_epochs: 40
  • label_smoothing_factor: 0.1
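The learning-rate trajectory these settings produce can be sketched in plain Python. This is a minimal sketch, not the Trainer's exact internals; it assumes the 0.1 warmup value is a ratio of total steps (the field is named "steps", but a fractional step count only makes sense as a ratio), and it takes the total of 1040 steps from the results table below.

```python
import math

# Values from the hyperparameter list; WARMUP_STEPS assumes 0.1 is a
# warmup *ratio* of total steps (an assumption, see the note above).
LEARNING_RATE = 3e-5
TOTAL_STEPS = 1040                      # 26 steps/epoch x 40 epochs
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # 104 steps

def cosine_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `cosine_lr(0)` is 0, the rate peaks at `LEARNING_RATE` once warmup ends at step 104, and it decays to roughly zero by step 1040.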

Training results

Training Loss Epoch Step Validation Loss
0.7224 1.0 26 0.6436
0.5904 2.0 52 0.5788
0.5486 3.0 78 0.5360
0.5114 4.0 104 0.5042
0.4874 5.0 130 0.4935
0.4759 6.0 156 0.4823
0.4674 7.0 182 0.4798
0.4703 8.0 208 0.4847
0.4677 9.0 234 0.4763
0.4655 10.0 260 0.4701
0.4654 11.0 286 0.4711
0.4670 12.0 312 0.4706
0.4703 13.0 338 0.4722
0.4657 14.0 364 0.4692
0.4696 15.0 390 0.4686
0.4615 16.0 416 0.4693
0.4636 17.0 442 0.4683
0.4620 18.0 468 0.4690
0.4611 19.0 494 0.4687
0.4589 20.0 520 0.4687
0.4664 21.0 546 0.4687
0.4693 22.0 572 0.4686
0.4702 23.0 598 0.4688
0.4612 24.0 624 0.4685
0.4648 25.0 650 0.4685
0.4595 26.0 676 0.4685
0.4640 27.0 702 0.4685
0.4626 28.0 728 0.4685
0.4612 29.0 754 0.4685
0.4658 30.0 780 0.4685
0.4664 31.0 806 0.4685
0.4580 32.0 832 0.4685
0.4654 33.0 858 0.4685
0.4648 34.0 884 0.4685
0.4646 35.0 910 0.4685
0.4644 36.0 936 0.4685
0.4650 37.0 962 0.4685
0.4653 38.0 988 0.4685
0.4669 39.0 1014 0.4685
0.4686 40.0 1040 0.4685
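The logged step counts also bound the (otherwise undocumented) training-set size: each epoch runs 26 optimizer steps at batch size 32, so, assuming no gradient accumulation, the training split holds at most 832 examples (and at least 801, if the final batch is partial). A quick check of the arithmetic:

```python
steps_per_epoch = 26      # step 26 logged at epoch 1.0 in the table above
train_batch_size = 32     # from the hyperparameters
num_epochs = 40

# Upper bound on training examples, assuming no gradient accumulation
max_train_examples = steps_per_epoch * train_batch_size   # 832
total_steps = steps_per_epoch * num_epochs                # 1040, matches the table
```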

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.5.1+cu121
  • Tokenizers 0.22.1
Model size: 0.4B parameters (F32 tensors, Safetensors format)