
exceptions_exp2_swap_0.7_resemble_to_carry_3591

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a rough perplexity conversion is sketched after the list):

  • Loss: 3.5636
  • Accuracy: 0.3688
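
If the reported loss is the usual mean per-token cross-entropy logged by the Hugging Face Trainer (an assumption, since the task and objective are not documented in this card), it corresponds to a perplexity of roughly exp(3.5636) ≈ 35.3:

```python
import math

# Assumption: the evaluation loss above is mean per-token cross-entropy,
# as the Trainer reports for language-modeling objectives.
eval_loss = 3.5636
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 35.3
```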

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 3591
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
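
As a rough guide, the hyperparameters above map onto a `transformers` `TrainingArguments` configuration as sketched below. This is not the original training script: the output directory is a placeholder, `fp16=True` is an assumption standing in for "Native AMP", and the base model and dataset are not documented in this card.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above (not the original training script).
training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.7_resemble_to_carry_3591",  # placeholder
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=5,   # 16 * 5 = effective train batch size of 80
    seed=3591,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,                       # "Native AMP" mixed precision (assumption)
)
```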

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy
4.8397 0.2915 1000 4.7566 0.2536
4.3506 0.5831 2000 4.2897 0.2985
4.1514 0.8746 3000 4.1001 0.3151
4.0039 1.1662 4000 3.9945 0.3247
3.9354 1.4577 5000 3.9170 0.3312
3.8759 1.7493 6000 3.8624 0.3362
3.752 2.0408 7000 3.8179 0.3406
3.7425 2.3324 8000 3.7881 0.3435
3.753 2.6239 9000 3.7569 0.3463
3.7231 2.9155 10000 3.7320 0.3493
3.6456 3.2070 11000 3.7202 0.3508
3.6458 3.4985 12000 3.7013 0.3523
3.6476 3.7901 13000 3.6810 0.3542
3.5414 4.0816 14000 3.6746 0.3554
3.5776 4.3732 15000 3.6627 0.3565
3.59 4.6647 16000 3.6526 0.3571
3.5813 4.9563 17000 3.6387 0.3588
3.5199 5.2478 18000 3.6388 0.3592
3.5125 5.5394 19000 3.6314 0.3603
3.5316 5.8309 20000 3.6187 0.3609
3.442 6.1224 21000 3.6236 0.3615
3.4814 6.4140 22000 3.6140 0.3620
3.499 6.7055 23000 3.6053 0.3628
3.4893 6.9971 24000 3.5942 0.3638
3.4338 7.2886 25000 3.6038 0.3636
3.4573 7.5802 26000 3.5972 0.3641
3.4688 7.8717 27000 3.5856 0.3651
3.3911 8.1633 28000 3.5935 0.3651
3.4227 8.4548 29000 3.5873 0.3652
3.434 8.7464 30000 3.5779 0.3660
3.3277 9.0379 31000 3.5872 0.3662
3.3795 9.3294 32000 3.5853 0.3667
3.3979 9.6210 33000 3.5729 0.3671
3.4251 9.9125 34000 3.5681 0.3674
3.3304 10.2041 35000 3.5795 0.3673
3.3737 10.4956 36000 3.5719 0.3676
3.3861 10.7872 37000 3.5674 0.3681
3.2957 11.0787 38000 3.5765 0.3678
3.3473 11.3703 39000 3.5707 0.3681
3.3635 11.6618 40000 3.5636 0.3688
3.3717 11.9534 41000 3.5550 0.3693
3.3183 12.2449 42000 3.5679 0.3688
3.3395 12.5364 43000 3.5616 0.3693
3.3506 12.8280 44000 3.5522 0.3699
3.2774 13.1195 45000 3.5655 0.3692
3.3166 13.4111 46000 3.5629 0.3696
3.3382 13.7026 47000 3.5524 0.3701
3.3586 13.9942 48000 3.5465 0.3707
3.2723 14.2857 49000 3.5606 0.3700
3.3191 14.5773 50000 3.5564 0.3705
3.327 14.8688 51000 3.5452 0.3708
3.2545 15.1603 52000 3.5610 0.3701
3.289 15.4519 53000 3.5560 0.3708
3.3128 15.7434 54000 3.5501 0.3709
3.2114 16.0350 55000 3.5592 0.3708
3.2599 16.3265 56000 3.5565 0.3710
3.2937 16.6181 57000 3.5475 0.3714
3.3016 16.9096 58000 3.5421 0.3719
3.2268 17.2012 59000 3.5591 0.3713
3.2691 17.4927 60000 3.5521 0.3715
3.2788 17.7843 61000 3.5414 0.3721
3.198 18.0758 62000 3.5588 0.3713
3.2426 18.3673 63000 3.5521 0.3716
3.259 18.6589 64000 3.5430 0.3720
3.2755 18.9504 65000 3.5348 0.3726
3.2184 19.2420 66000 3.5509 0.3719
3.2504 19.5335 67000 3.5472 0.3722
3.2643 19.8251 68000 3.5389 0.3727
3.1976 20.1166 69000 3.5583 0.3721
3.2309 20.4082 70000 3.5500 0.3723
3.253 20.6997 71000 3.5414 0.3730
3.2595 20.9913 72000 3.5353 0.3734
3.1973 21.2828 73000 3.5549 0.3723
3.2307 21.5743 74000 3.5450 0.3728
3.2552 21.8659 75000 3.5406 0.3733
3.1657 22.1574 76000 3.5547 0.3725
3.2133 22.4490 77000 3.5507 0.3726
3.2234 22.7405 78000 3.5408 0.3734
3.1288 23.0321 79000 3.5540 0.3728
3.1885 23.3236 80000 3.5539 0.3729
3.2162 23.6152 81000 3.5416 0.3733
3.2112 23.9067 82000 3.5363 0.3739
3.173 24.1983 83000 3.5541 0.3728
3.2066 24.4898 84000 3.5446 0.3732
3.2103 24.7813 85000 3.5437 0.3732
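
Checkpoints saved by the Trainer normally include a `trainer_state.json` whose `log_history` holds the same records as the table above. A minimal sketch of re-plotting the validation loss from it, assuming that file is available locally:

```python
import json
import matplotlib.pyplot as plt

# Assumption: trainer_state.json from a Trainer checkpoint is present in the
# working directory; its log_history mirrors the evaluation rows above.
with open("trainer_state.json") as f:
    state = json.load(f)

evals = [e for e in state["log_history"] if "eval_loss" in e]
plt.plot([e["step"] for e in evals], [e["eval_loss"] for e in evals])
plt.xlabel("step")
plt.ylabel("validation loss")
plt.title("exceptions_exp2_swap_0.7_resemble_to_carry_3591")
plt.show()
```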

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size: 0.1B params (F32, Safetensors)