exceptions_exp2_swap_0.7_last_to_carry_40817

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5679
  • Accuracy: 0.3685
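
If the evaluation loss is the usual mean token-level cross-entropy in nats (an assumption, since the task is not documented here), the implied perplexity is exp(loss), roughly 35.4:

```python
import math

# Perplexity implied by the reported evaluation loss, assuming the loss
# is a mean token-level cross-entropy in nats (standard for causal LMs).
eval_loss = 3.5679
print(math.exp(eval_loss))  # ≈ 35.4
```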

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 40817
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (ADAMW_TORCH_FUSED) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
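
As a concrete reference, here is a minimal sketch of how the values above map onto `transformers.TrainingArguments`; the `output_dir` is a placeholder, `fp16=True` is assumed for "Native AMP", and any argument not listed in the card is an assumption:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the card's hyperparameters. output_dir is a
# placeholder, and any argument not listed above is an assumption.
training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.7_last_to_carry_40817",
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=40817,
    gradient_accumulation_steps=5,   # 16 x 5 = 80 total train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,                       # "Native AMP" mixed precision (assumed fp16)
)
```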

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 4.8424        | 0.2915  | 1000  | 4.7685          | 0.2527   |
| 4.3554        | 0.5830  | 2000  | 4.2871          | 0.2989   |
| 4.1571        | 0.8745  | 3000  | 4.1019          | 0.3143   |
| 4.0069        | 1.1659  | 4000  | 3.9975          | 0.3242   |
| 3.9381        | 1.4574  | 5000  | 3.9226          | 0.3311   |
| 3.889         | 1.7489  | 6000  | 3.8644          | 0.3358   |
| 3.7496        | 2.0402  | 7000  | 3.8237          | 0.3403   |
| 3.7529        | 2.3317  | 8000  | 3.7907          | 0.3430   |
| 3.7547        | 2.6233  | 9000  | 3.7627          | 0.3458   |
| 3.7303        | 2.9148  | 10000 | 3.7346          | 0.3482   |
| 3.652         | 3.2061  | 11000 | 3.7220          | 0.3501   |
| 3.6572        | 3.4976  | 12000 | 3.7074          | 0.3518   |
| 3.6405        | 3.7891  | 13000 | 3.6849          | 0.3538   |
| 3.5518        | 4.0805  | 14000 | 3.6801          | 0.3546   |
| 3.5728        | 4.3720  | 15000 | 3.6672          | 0.3560   |
| 3.5751        | 4.6635  | 16000 | 3.6541          | 0.3571   |
| 3.5817        | 4.9550  | 17000 | 3.6417          | 0.3587   |
| 3.5085        | 5.2463  | 18000 | 3.6449          | 0.3588   |
| 3.5387        | 5.5378  | 19000 | 3.6339          | 0.3594   |
| 3.5245        | 5.8293  | 20000 | 3.6232          | 0.3605   |
| 3.4532        | 6.1207  | 21000 | 3.6254          | 0.3611   |
| 3.4819        | 6.4122  | 22000 | 3.6177          | 0.3616   |
| 3.4972        | 6.7037  | 23000 | 3.6076          | 0.3626   |
| 3.5088        | 6.9952  | 24000 | 3.5981          | 0.3632   |
| 3.4296        | 7.2866  | 25000 | 3.6110          | 0.3629   |
| 3.4494        | 7.5781  | 26000 | 3.5979          | 0.3639   |
| 3.4659        | 7.8696  | 27000 | 3.5881          | 0.3648   |
| 3.3883        | 8.1609  | 28000 | 3.6002          | 0.3644   |
| 3.4162        | 8.4524  | 29000 | 3.5922          | 0.3652   |
| 3.4235        | 8.7439  | 30000 | 3.5840          | 0.3656   |
| 3.3294        | 9.0353  | 31000 | 3.5932          | 0.3655   |
| 3.3765        | 9.3268  | 32000 | 3.5865          | 0.3659   |
| 3.4086        | 9.6183  | 33000 | 3.5806          | 0.3664   |
| 3.407         | 9.9098  | 34000 | 3.5700          | 0.3673   |
| 3.3299        | 10.2011 | 35000 | 3.5821          | 0.3669   |
| 3.3852        | 10.4927 | 36000 | 3.5767          | 0.3672   |
| 3.3884        | 10.7842 | 37000 | 3.5696          | 0.3680   |
| 3.2949        | 11.0755 | 38000 | 3.5809          | 0.3673   |
| 3.354         | 11.3670 | 39000 | 3.5723          | 0.3677   |
| 3.354         | 11.6585 | 40000 | 3.5679          | 0.3685   |
| 3.3866        | 11.9500 | 41000 | 3.5628          | 0.3687   |
| 3.3102        | 12.2414 | 42000 | 3.5725          | 0.3681   |
| 3.3393        | 12.5329 | 43000 | 3.5670          | 0.3686   |
| 3.3512        | 12.8244 | 44000 | 3.5583          | 0.3692   |
| 3.2751        | 13.1157 | 45000 | 3.5707          | 0.3687   |
| 3.3202        | 13.4072 | 46000 | 3.5641          | 0.3691   |
| 3.3467        | 13.6988 | 47000 | 3.5568          | 0.3697   |
| 3.3448        | 13.9903 | 48000 | 3.5514          | 0.3703   |
| 3.2701        | 14.2816 | 49000 | 3.5680          | 0.3696   |
| 3.3089        | 14.5731 | 50000 | 3.5590          | 0.3701   |
| 3.3361        | 14.8646 | 51000 | 3.5539          | 0.3706   |
| 3.2479        | 15.1560 | 52000 | 3.5703          | 0.3697   |
| 3.3004        | 15.4475 | 53000 | 3.5627          | 0.3699   |
| 3.307         | 15.7390 | 54000 | 3.5523          | 0.3703   |
| 3.2232        | 16.0303 | 55000 | 3.5655          | 0.3701   |
| 3.2763        | 16.3218 | 56000 | 3.5622          | 0.3702   |
| 3.2873        | 16.6133 | 57000 | 3.5538          | 0.3708   |
| 3.3124        | 16.9049 | 58000 | 3.5423          | 0.3715   |
| 3.2416        | 17.1962 | 59000 | 3.5645          | 0.3706   |
| 3.2628        | 17.4877 | 60000 | 3.5554          | 0.3707   |
| 3.2874        | 17.7792 | 61000 | 3.5466          | 0.3716   |
| 3.1956        | 18.0705 | 62000 | 3.5649          | 0.3707   |
| 3.2539        | 18.3621 | 63000 | 3.5588          | 0.3710   |
| 3.2586        | 18.6536 | 64000 | 3.5544          | 0.3713   |
| 3.2717        | 18.9451 | 65000 | 3.5446          | 0.3718   |
| 3.2281        | 19.2364 | 66000 | 3.5627          | 0.3711   |
| 3.2344        | 19.5279 | 67000 | 3.5531          | 0.3715   |
| 3.2678        | 19.8194 | 68000 | 3.5443          | 0.3723   |
| 3.1949        | 20.1108 | 69000 | 3.5604          | 0.3713   |
| 3.2332        | 20.4023 | 70000 | 3.5591          | 0.3712   |
| 3.243         | 20.6938 | 71000 | 3.5486          | 0.3719   |
| 3.25          | 20.9853 | 72000 | 3.5429          | 0.3723   |
| 3.2042        | 21.2766 | 73000 | 3.5581          | 0.3718   |
| 3.2269        | 21.5682 | 74000 | 3.5516          | 0.3719   |
| 3.2445        | 21.8597 | 75000 | 3.5439          | 0.3726   |
| 3.1727        | 22.1510 | 76000 | 3.5624          | 0.3719   |
| 3.1981        | 22.4425 | 77000 | 3.5534          | 0.3723   |
| 3.2282        | 22.7340 | 78000 | 3.5442          | 0.3727   |

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
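
A minimal loading sketch, assuming the checkpoint is a standard Transformers causal language model and that the repository id below matches the actual hub path (both assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id is assumed; replace with the full "<user>/<repo>" hub path.
repo_id = "exceptions_exp2_swap_0.7_last_to_carry_40817"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head
```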