exceptions_exp2_swap_0.3_cost_to_drop_1032

Training runs for this model were logged to Weights & Biases.

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set (a note converting the loss to perplexity follows the list):

  • Loss: 3.5839
  • Accuracy: 0.3658
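If the reported loss is a mean token-level cross-entropy in nats (the usual convention for Transformers language-model evaluation; an assumption here, since neither base model nor task is documented), it corresponds to a perplexity of roughly 36:

```python
import math

# Hedged back-of-the-envelope: assuming the eval loss above is a mean
# cross-entropy in nats, perplexity is simply its exponential.
eval_loss = 3.5839
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 36.0
```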

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged TrainingArguments sketch follows the list:

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 1032
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
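As a rough guide, the settings above map onto a Transformers TrainingArguments configuration like the one below. This is a minimal sketch, not the authors' actual training script: output_dir and report_to are assumptions, and fp16=True stands in for "Native AMP".

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.3_cost_to_drop_1032",  # hypothetical
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=1032,
    gradient_accumulation_steps=5,  # 16 per device x 5 steps = total batch of 80,
                                    # implying training on a single device
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,            # stands in for "Native AMP" mixed precision
    report_to="wandb",    # assumption: the card links runs to Weights & Biases
)
```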

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|--------------:|--------:|------:|----------------:|---------:|
| 4.8221 | 0.2916 | 1000 | 4.7488 | 0.2547 |
| 4.3555 | 0.5831 | 2000 | 4.2862 | 0.2988 |
| 4.1468 | 0.8747 | 3000 | 4.1061 | 0.3138 |
| 4.0089 | 1.1662 | 4000 | 3.9981 | 0.3239 |
| 3.9422 | 1.4578 | 5000 | 3.9251 | 0.3304 |
| 3.8787 | 1.7493 | 6000 | 3.8665 | 0.3358 |
| 3.7557 | 2.0408 | 7000 | 3.8216 | 0.3401 |
| 3.7567 | 2.3324 | 8000 | 3.7918 | 0.3430 |
| 3.7487 | 2.6239 | 9000 | 3.7621 | 0.3459 |
| 3.7323 | 2.9155 | 10000 | 3.7356 | 0.3482 |
| 3.6476 | 3.2070 | 11000 | 3.7249 | 0.3500 |
| 3.6577 | 3.4986 | 12000 | 3.7042 | 0.3520 |
| 3.6340 | 3.7901 | 13000 | 3.6858 | 0.3535 |
| 3.5548 | 4.0816 | 14000 | 3.6784 | 0.3552 |
| 3.5881 | 4.3732 | 15000 | 3.6678 | 0.3557 |
| 3.5723 | 4.6648 | 16000 | 3.6532 | 0.3572 |
| 3.6017 | 4.9563 | 17000 | 3.6420 | 0.3584 |
| 3.5155 | 5.2478 | 18000 | 3.6461 | 0.3585 |
| 3.5305 | 5.5394 | 19000 | 3.6327 | 0.3598 |
| 3.5203 | 5.8310 | 20000 | 3.6215 | 0.3607 |
| 3.4567 | 6.1225 | 21000 | 3.6236 | 0.3614 |
| 3.4791 | 6.4140 | 22000 | 3.6179 | 0.3622 |
| 3.4968 | 6.7056 | 23000 | 3.6079 | 0.3627 |
| 3.4976 | 6.9971 | 24000 | 3.5998 | 0.3633 |
| 3.4409 | 7.2886 | 25000 | 3.6082 | 0.3632 |
| 3.4434 | 7.5802 | 26000 | 3.5985 | 0.3638 |
| 3.4790 | 7.8718 | 27000 | 3.5888 | 0.3649 |
| 3.3946 | 8.1633 | 28000 | 3.5970 | 0.3644 |
| 3.4239 | 8.4548 | 29000 | 3.5905 | 0.3653 |
| 3.4370 | 8.7464 | 30000 | 3.5839 | 0.3658 |
| 3.3318 | 9.0379 | 31000 | 3.5885 | 0.3658 |
| 3.3781 | 9.3295 | 32000 | 3.5854 | 0.3659 |
| 3.4108 | 9.6210 | 33000 | 3.5766 | 0.3667 |
| 3.4274 | 9.9126 | 34000 | 3.5707 | 0.3673 |
| 3.3505 | 10.2041 | 35000 | 3.5813 | 0.3669 |
| 3.3773 | 10.4957 | 36000 | 3.5736 | 0.3675 |
| 3.3963 | 10.7872 | 37000 | 3.5660 | 0.3681 |
| 3.3042 | 11.0787 | 38000 | 3.5795 | 0.3678 |
| 3.3448 | 11.3703 | 39000 | 3.5743 | 0.3680 |
| 3.3706 | 11.6618 | 40000 | 3.5664 | 0.3684 |
| 3.3846 | 11.9534 | 41000 | 3.5581 | 0.3689 |
| 3.3202 | 12.2449 | 42000 | 3.5720 | 0.3685 |
| 3.3493 | 12.5365 | 43000 | 3.5636 | 0.3685 |
| 3.3670 | 12.8280 | 44000 | 3.5583 | 0.3696 |
| 3.2663 | 13.1195 | 45000 | 3.5691 | 0.3691 |
| 3.3090 | 13.4111 | 46000 | 3.5634 | 0.3692 |
| 3.3335 | 13.7027 | 47000 | 3.5546 | 0.3698 |
| 3.3487 | 13.9942 | 48000 | 3.5496 | 0.3701 |
| 3.2792 | 14.2857 | 49000 | 3.5649 | 0.3696 |
| 3.3064 | 14.5773 | 50000 | 3.5573 | 0.3699 |
| 3.3363 | 14.8689 | 51000 | 3.5480 | 0.3706 |
| 3.2446 | 15.1604 | 52000 | 3.5664 | 0.3700 |
| 3.2848 | 15.4519 | 53000 | 3.5603 | 0.3702 |
| 3.3058 | 15.7435 | 54000 | 3.5453 | 0.3710 |
| 3.2076 | 16.0350 | 55000 | 3.5576 | 0.3706 |
| 3.2679 | 16.3265 | 56000 | 3.5547 | 0.3709 |
| 3.2942 | 16.6181 | 57000 | 3.5536 | 0.3706 |
| 3.3099 | 16.9097 | 58000 | 3.5449 | 0.3713 |
| 3.2476 | 17.2012 | 59000 | 3.5615 | 0.3707 |
| 3.2693 | 17.4927 | 60000 | 3.5523 | 0.3713 |
| 3.2908 | 17.7843 | 61000 | 3.5449 | 0.3714 |
| 3.2050 | 18.0758 | 62000 | 3.5588 | 0.3712 |
| 3.2554 | 18.3674 | 63000 | 3.5547 | 0.3712 |
| 3.2703 | 18.6589 | 64000 | 3.5525 | 0.3715 |
| 3.2909 | 18.9505 | 65000 | 3.5421 | 0.3722 |
| 3.2095 | 19.2420 | 66000 | 3.5582 | 0.3716 |
| 3.2457 | 19.5336 | 67000 | 3.5511 | 0.3721 |
| 3.2676 | 19.8251 | 68000 | 3.5441 | 0.3721 |
| 3.1825 | 20.1166 | 69000 | 3.5578 | 0.3715 |
| 3.2339 | 20.4082 | 70000 | 3.5531 | 0.3716 |
| 3.2581 | 20.6997 | 71000 | 3.5460 | 0.3721 |
| 3.2604 | 20.9913 | 72000 | 3.5375 | 0.3726 |
| 3.2032 | 21.2828 | 73000 | 3.5557 | 0.3719 |
| 3.2375 | 21.5744 | 74000 | 3.5470 | 0.3724 |
| 3.2491 | 21.8659 | 75000 | 3.5403 | 0.3728 |
| 3.1692 | 22.1574 | 76000 | 3.5603 | 0.3718 |
| 3.2165 | 22.4490 | 77000 | 3.5531 | 0.3721 |
| 3.2322 | 22.7406 | 78000 | 3.5444 | 0.3729 |
| 3.1449 | 23.0321 | 79000 | 3.5596 | 0.3723 |
| 3.1896 | 23.3236 | 80000 | 3.5578 | 0.3722 |
| 3.2083 | 23.6152 | 81000 | 3.5471 | 0.3725 |
| 3.2240 | 23.9068 | 82000 | 3.5402 | 0.3732 |
| 3.1710 | 24.1983 | 83000 | 3.5561 | 0.3726 |
| 3.2066 | 24.4898 | 84000 | 3.5516 | 0.3730 |
| 3.2235 | 24.7814 | 85000 | 3.5446 | 0.3731 |
| 3.1408 | 25.0729 | 86000 | 3.5595 | 0.3726 |
| 3.1736 | 25.3645 | 87000 | 3.5544 | 0.3728 |
| 3.1914 | 25.6560 | 88000 | 3.5453 | 0.3734 |
| 3.2119 | 25.9476 | 89000 | 3.5402 | 0.3735 |
| 3.1569 | 26.2391 | 90000 | 3.5583 | 0.3727 |
| 3.1936 | 26.5306 | 91000 | 3.5492 | 0.3728 |
| 3.1956 | 26.8222 | 92000 | 3.5419 | 0.3735 |

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model size

  • 0.1B params
  • Tensor type: F32
  • Format: Safetensors
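Since the intended use is undocumented, the following is only a hedged loading sketch. It assumes the checkpoint is a causal language model hosted on the Hugging Face Hub (the loss/accuracy metrics above are consistent with next-token prediction), and the repo id is a hypothetical placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the real Hub path for this checkpoint.
repo_id = "your-username/exceptions_exp2_swap_0.3_cost_to_drop_1032"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```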