exceptions_exp2_swap_0.3_cost_to_drop_1032

Training runs for this model were logged to Weights & Biases.

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set (a note converting the loss to perplexity follows the list):

  • Loss: 3.5839
  • Accuracy: 0.3658
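If the reported loss is a mean token-level cross-entropy in nats (the usual convention for Transformers language-model evaluation; an assumption here, since neither base model nor task is documented), it corresponds to a perplexity of roughly 36:

```python
import math

# Hedged back-of-the-envelope: assuming the eval loss above is a mean
# cross-entropy in nats, perplexity is simply its exponential.
eval_loss = 3.5839
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 36.0
```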

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged TrainingArguments sketch follows the list:

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 1032
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
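As a rough guide, the settings above map onto a Transformers TrainingArguments configuration like the one below. This is a minimal sketch, not the authors' actual training script: output_dir and report_to are assumptions, and fp16=True stands in for "Native AMP".

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.3_cost_to_drop_1032",  # hypothetical
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=1032,
    gradient_accumulation_steps=5,  # 16 per device x 5 steps = total batch of 80,
                                    # implying training on a single device
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,            # stands in for "Native AMP" mixed precision
    report_to="wandb",    # assumption: the card links runs to Weights & Biases
)
```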

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|--------------:|--------:|------:|----------------:|---------:|
| 4.8221 | 0.2916 | 1000 | 4.7488 | 0.2547 |
| 4.3555 | 0.5831 | 2000 | 4.2862 | 0.2988 |
| 4.1468 | 0.8747 | 3000 | 4.1061 | 0.3138 |
| 4.0089 | 1.1662 | 4000 | 3.9981 | 0.3239 |
| 3.9422 | 1.4578 | 5000 | 3.9251 | 0.3304 |
| 3.8787 | 1.7493 | 6000 | 3.8665 | 0.3358 |
| 3.7557 | 2.0408 | 7000 | 3.8216 | 0.3401 |
| 3.7567 | 2.3324 | 8000 | 3.7918 | 0.3430 |
| 3.7487 | 2.6239 | 9000 | 3.7621 | 0.3459 |
| 3.7323 | 2.9155 | 10000 | 3.7356 | 0.3482 |
| 3.6476 | 3.2070 | 11000 | 3.7249 | 0.3500 |
| 3.6577 | 3.4986 | 12000 | 3.7042 | 0.3520 |
| 3.6340 | 3.7901 | 13000 | 3.6858 | 0.3535 |
| 3.5548 | 4.0816 | 14000 | 3.6784 | 0.3552 |
| 3.5881 | 4.3732 | 15000 | 3.6678 | 0.3557 |
| 3.5723 | 4.6648 | 16000 | 3.6532 | 0.3572 |
| 3.6017 | 4.9563 | 17000 | 3.6420 | 0.3584 |
| 3.5155 | 5.2478 | 18000 | 3.6461 | 0.3585 |
| 3.5305 | 5.5394 | 19000 | 3.6327 | 0.3598 |
| 3.5203 | 5.8310 | 20000 | 3.6215 | 0.3607 |
| 3.4567 | 6.1225 | 21000 | 3.6236 | 0.3614 |
| 3.4791 | 6.4140 | 22000 | 3.6179 | 0.3622 |
| 3.4968 | 6.7056 | 23000 | 3.6079 | 0.3627 |
| 3.4976 | 6.9971 | 24000 | 3.5998 | 0.3633 |
| 3.4409 | 7.2886 | 25000 | 3.6082 | 0.3632 |
| 3.4434 | 7.5802 | 26000 | 3.5985 | 0.3638 |
| 3.4790 | 7.8718 | 27000 | 3.5888 | 0.3649 |
| 3.3946 | 8.1633 | 28000 | 3.5970 | 0.3644 |
| 3.4239 | 8.4548 | 29000 | 3.5905 | 0.3653 |
| 3.4370 | 8.7464 | 30000 | 3.5839 | 0.3658 |
| 3.3318 | 9.0379 | 31000 | 3.5885 | 0.3658 |
| 3.3781 | 9.3295 | 32000 | 3.5854 | 0.3659 |
| 3.4108 | 9.6210 | 33000 | 3.5766 | 0.3667 |
| 3.4274 | 9.9126 | 34000 | 3.5707 | 0.3673 |
| 3.3505 | 10.2041 | 35000 | 3.5813 | 0.3669 |
| 3.3773 | 10.4957 | 36000 | 3.5736 | 0.3675 |
| 3.3963 | 10.7872 | 37000 | 3.5660 | 0.3681 |
| 3.3042 | 11.0787 | 38000 | 3.5795 | 0.3678 |
| 3.3448 | 11.3703 | 39000 | 3.5743 | 0.3680 |
| 3.3706 | 11.6618 | 40000 | 3.5664 | 0.3684 |
| 3.3846 | 11.9534 | 41000 | 3.5581 | 0.3689 |
| 3.3202 | 12.2449 | 42000 | 3.5720 | 0.3685 |
| 3.3493 | 12.5365 | 43000 | 3.5636 | 0.3685 |
| 3.3670 | 12.8280 | 44000 | 3.5583 | 0.3696 |
| 3.2663 | 13.1195 | 45000 | 3.5691 | 0.3691 |
| 3.3090 | 13.4111 | 46000 | 3.5634 | 0.3692 |
| 3.3335 | 13.7027 | 47000 | 3.5546 | 0.3698 |
| 3.3487 | 13.9942 | 48000 | 3.5496 | 0.3701 |
| 3.2792 | 14.2857 | 49000 | 3.5649 | 0.3696 |
| 3.3064 | 14.5773 | 50000 | 3.5573 | 0.3699 |
| 3.3363 | 14.8689 | 51000 | 3.5480 | 0.3706 |
| 3.2446 | 15.1604 | 52000 | 3.5664 | 0.3700 |
| 3.2848 | 15.4519 | 53000 | 3.5603 | 0.3702 |
| 3.3058 | 15.7435 | 54000 | 3.5453 | 0.3710 |
| 3.2076 | 16.0350 | 55000 | 3.5576 | 0.3706 |
| 3.2679 | 16.3265 | 56000 | 3.5547 | 0.3709 |
| 3.2942 | 16.6181 | 57000 | 3.5536 | 0.3706 |
| 3.3099 | 16.9097 | 58000 | 3.5449 | 0.3713 |
| 3.2476 | 17.2012 | 59000 | 3.5615 | 0.3707 |
| 3.2693 | 17.4927 | 60000 | 3.5523 | 0.3713 |
| 3.2908 | 17.7843 | 61000 | 3.5449 | 0.3714 |
| 3.2050 | 18.0758 | 62000 | 3.5588 | 0.3712 |
| 3.2554 | 18.3674 | 63000 | 3.5547 | 0.3712 |
| 3.2703 | 18.6589 | 64000 | 3.5525 | 0.3715 |
| 3.2909 | 18.9505 | 65000 | 3.5421 | 0.3722 |
| 3.2095 | 19.2420 | 66000 | 3.5582 | 0.3716 |
| 3.2457 | 19.5336 | 67000 | 3.5511 | 0.3721 |
| 3.2676 | 19.8251 | 68000 | 3.5441 | 0.3721 |
| 3.1825 | 20.1166 | 69000 | 3.5578 | 0.3715 |
| 3.2339 | 20.4082 | 70000 | 3.5531 | 0.3716 |
| 3.2581 | 20.6997 | 71000 | 3.5460 | 0.3721 |
| 3.2604 | 20.9913 | 72000 | 3.5375 | 0.3726 |
| 3.2032 | 21.2828 | 73000 | 3.5557 | 0.3719 |
| 3.2375 | 21.5744 | 74000 | 3.5470 | 0.3724 |
| 3.2491 | 21.8659 | 75000 | 3.5403 | 0.3728 |
| 3.1692 | 22.1574 | 76000 | 3.5603 | 0.3718 |
| 3.2165 | 22.4490 | 77000 | 3.5531 | 0.3721 |
| 3.2322 | 22.7406 | 78000 | 3.5444 | 0.3729 |
| 3.1449 | 23.0321 | 79000 | 3.5596 | 0.3723 |
| 3.1896 | 23.3236 | 80000 | 3.5578 | 0.3722 |
| 3.2083 | 23.6152 | 81000 | 3.5471 | 0.3725 |
| 3.2240 | 23.9068 | 82000 | 3.5402 | 0.3732 |
| 3.1710 | 24.1983 | 83000 | 3.5561 | 0.3726 |
| 3.2066 | 24.4898 | 84000 | 3.5516 | 0.3730 |
| 3.2235 | 24.7814 | 85000 | 3.5446 | 0.3731 |
| 3.1408 | 25.0729 | 86000 | 3.5595 | 0.3726 |
| 3.1736 | 25.3645 | 87000 | 3.5544 | 0.3728 |
| 3.1914 | 25.6560 | 88000 | 3.5453 | 0.3734 |
| 3.2119 | 25.9476 | 89000 | 3.5402 | 0.3735 |
| 3.1569 | 26.2391 | 90000 | 3.5583 | 0.3727 |
| 3.1936 | 26.5306 | 91000 | 3.5492 | 0.3728 |
| 3.1956 | 26.8222 | 92000 | 3.5419 | 0.3735 |

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model size

  • 0.1B params
  • Tensor type: F32
  • Format: Safetensors
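Since the intended use is undocumented, the following is only a hedged loading sketch. It assumes the checkpoint is a causal language model hosted on the Hugging Face Hub (the loss/accuracy metrics above are consistent with next-token prediction), and the repo id is a hypothetical placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the real Hub path for this checkpoint.
repo_id = "your-username/exceptions_exp2_swap_0.3_cost_to_drop_1032"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```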