exceptions_exp2_swap_0.7_last_to_carry_1032

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.5655
Accuracy: 0.3685
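
The loss above can be turned into a perplexity figure, which is often easier to compare across language models. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats; the card does not state the task or loss definition, so treat that as an assumption rather than a fact about this model.

```python
import math

# Values copied from the evaluation results above.
eval_loss = 3.5655
eval_accuracy = 0.3685

# Assumption: eval_loss is the mean cross-entropy per token, measured in nats.
perplexity = math.exp(eval_loss)

# The same quantity expressed in bits per token (nats divided by ln 2).
bits_per_token = eval_loss / math.log(2)

print(f"perplexity: {perplexity:.2f}")
print(f"bits per token: {bits_per_token:.3f}")
```

If the loss was computed differently (for example, averaged per sequence), the conversion does not apply.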

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 16
eval_batch_size: 16
seed: 1032
gradient_accumulation_steps: 5
total_train_batch_size: 80
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 50.0
mixed_precision_training: Native AMP
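
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below assumes a single training device, which is consistent with 16 × 5 = 80, and the usual linear-warmup-then-linear-decay rule. The helper names and the `total_steps` default are hypothetical: the card does not state the planned number of optimizer steps.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step.

    num_devices=1 is an assumption; the card does not list the device count.
    """
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float = 6e-4, warmup_steps: int = 100,
              total_steps: int = 170_000) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay.

    total_steps is a placeholder value chosen for illustration only.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(16, 5))  # matches total_train_batch_size
for s in (0, 50, 100, 85_000, 170_000):
    print(s, linear_lr(s))
```

With a warmup of only 100 steps, the peak learning rate of 0.0006 is reached well before the first logged evaluation at step 1000.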

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.8464	0.2915	1000	4.7674	0.2535
4.3489	0.5830	2000	4.2893	0.2984
4.1633	0.8745	3000	4.1069	0.3139
4.0016	1.1659	4000	3.9996	0.3233
3.9466	1.4574	5000	3.9239	0.3301
3.8904	1.7489	6000	3.8705	0.3350
3.766	2.0402	7000	3.8261	0.3397
3.7616	2.3317	8000	3.7947	0.3426
3.7491	2.6233	9000	3.7641	0.3458
3.7274	2.9148	10000	3.7368	0.3480
3.632	3.2061	11000	3.7233	0.3498
3.6467	3.4976	12000	3.7073	0.3515
3.6559	3.7891	13000	3.6878	0.3532
3.5472	4.0805	14000	3.6805	0.3547
3.5675	4.3720	15000	3.6699	0.3558
3.5904	4.6635	16000	3.6552	0.3572
3.5927	4.9550	17000	3.6423	0.3581
3.5073	5.2463	18000	3.6438	0.3590
3.5285	5.5378	19000	3.6343	0.3596
3.5338	5.8293	20000	3.6221	0.3606
3.4458	6.1207	21000	3.6281	0.3610
3.4831	6.4122	22000	3.6204	0.3616
3.4977	6.7037	23000	3.6110	0.3624
3.5046	6.9952	24000	3.6006	0.3633
3.4431	7.2866	25000	3.6066	0.3633
3.4535	7.5781	26000	3.5997	0.3640
3.4672	7.8696	27000	3.5886	0.3644
3.3988	8.1609	28000	3.5985	0.3645
3.425	8.4524	29000	3.5918	0.3649
3.4412	8.7439	30000	3.5826	0.3657
3.3274	9.0353	31000	3.5894	0.3660
3.4003	9.3268	32000	3.5872	0.3660
3.4191	9.6183	33000	3.5799	0.3666
3.437	9.9098	34000	3.5683	0.3674
3.3627	10.2011	35000	3.5821	0.3670
3.3681	10.4927	36000	3.5777	0.3671
3.3773	10.7842	37000	3.5657	0.3678
3.3105	11.0755	38000	3.5787	0.3677
3.3567	11.3670	39000	3.5734	0.3681
3.3679	11.6585	40000	3.5655	0.3685
3.387	11.9500	41000	3.5567	0.3688
3.3185	12.2414	42000	3.5718	0.3688
3.333	12.5329	43000	3.5670	0.3688
3.3559	12.8244	44000	3.5554	0.3695
3.2707	13.1157	45000	3.5675	0.3692
3.3134	13.4072	46000	3.5644	0.3692
3.3371	13.6988	47000	3.5568	0.3696
3.3473	13.9903	48000	3.5484	0.3704
3.2717	14.2816	49000	3.5626	0.3698
3.2997	14.5731	50000	3.5570	0.3702
3.3352	14.8646	51000	3.5491	0.3706
3.2554	15.1560	52000	3.5680	0.3696
3.2928	15.4475	53000	3.5564	0.3706
3.316	15.7390	54000	3.5511	0.3708
3.2193	16.0303	55000	3.5629	0.3706
3.2693	16.3218	56000	3.5604	0.3706
3.2894	16.6133	57000	3.5539	0.3710
3.3045	16.9049	58000	3.5443	0.3715
3.2321	17.1962	59000	3.5618	0.3705
3.2741	17.4877	60000	3.5553	0.3713
3.2805	17.7792	61000	3.5467	0.3714
3.202	18.0705	62000	3.5604	0.3710
3.2372	18.3621	63000	3.5550	0.3714
3.2648	18.6536	64000	3.5490	0.3720
3.2728	18.9451	65000	3.5372	0.3722
3.2131	19.2364	66000	3.5571	0.3715
3.241	19.5279	67000	3.5510	0.3718
3.2786	19.8194	68000	3.5422	0.3724
3.1962	20.1108	69000	3.5603	0.3716
3.2185	20.4023	70000	3.5548	0.3722
3.2441	20.6938	71000	3.5448	0.3725
3.2646	20.9853	72000	3.5376	0.3730
3.2066	21.2766	73000	3.5526	0.3721
3.224	21.5682	74000	3.5500	0.3724
3.2389	21.8597	75000	3.5413	0.3730
3.1754	22.1510	76000	3.5617	0.3719
3.1994	22.4425	77000	3.5523	0.3725
3.2342	22.7340	78000	3.5427	0.3730
3.1484	23.0254	79000	3.5573	0.3721
3.1916	23.3169	80000	3.5540	0.3723
3.2	23.6084	81000	3.5470	0.3730
3.2133	23.8999	82000	3.5407	0.3732
3.1732	24.1912	83000	3.5577	0.3726
3.1834	24.4827	84000	3.5546	0.3728
3.21	24.7743	85000	3.5483	0.3733
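
The Epoch and Step columns together imply how many optimizer steps make up one epoch, and from that a rough training-set size. The sketch below uses four rows copied from the table; the sequences-per-epoch figure additionally assumes that every optimizer step consumes a full batch of 80 sequences, which the card does not confirm.

```python
# A few (step, epoch, validation_loss, accuracy) rows copied from the table above.
rows = [
    (1000, 0.2915, 4.7674, 0.2535),
    (40000, 11.6585, 3.5655, 0.3685),
    (65000, 18.9451, 3.5372, 0.3722),
    (85000, 24.7743, 3.5483, 0.3733),
]

TOTAL_TRAIN_BATCH_SIZE = 80  # from the hyperparameter list

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for step, epoch, _, _ in rows]
steps_per_epoch = sum(estimates) / len(estimates)

# Assumption: every optimizer step consumes one full batch of 80 sequences.
sequences_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Row with the lowest validation loss among the sampled rows only.
best = min(rows, key=lambda r: r[2])

print(f"steps per epoch (estimate): {steps_per_epoch:.0f}")
print(f"sequences per epoch (estimate): {sequences_per_epoch:.0f}")
print(f"lowest validation loss in sample: {best[2]} at step {best[0]}")
```

Extending `rows` to the full table gives the same kind of summary for every logged checkpoint.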

Framework versions

Transformers 4.55.2
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Safetensors

Model size: 0.1B params
Tensor type: F32