exceptions_exp2_swap_0.7_cost_to_carry_1032

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.5668
Accuracy: 0.3683
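
If the reported loss is a mean per-token cross-entropy measured in nats (an assumption; this card does not say how the loss is computed), it can be converted to a perplexity. A minimal sketch:

```python
import math

# Evaluation loss reported on this card.
eval_loss = 3.5668

# Assumption: the loss is a mean cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")
```

If the loss were instead measured in bits, the base would be 2 rather than e, so the conversion depends on that assumption.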

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 16
eval_batch_size: 16
seed: 1032
gradient_accumulation_steps: 5
total_train_batch_size: 80
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 50.0
mixed_precision_training: Native AMP
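
The relationship between these settings can be sketched in plain Python. The block below shows how the total batch size of 80 follows from the per-device batch size and gradient accumulation, and what a linear schedule with 100 warmup steps looks like. It rests on several assumptions not stated in the card: a single training device, a schedule that decays linearly to zero, and a total step count estimated from the results table. The function names are illustrative, not part of any library.

```python
def total_train_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from the hyperparameter list above.
BASE_LR = 0.0006
WARMUP_STEPS = 100

# Assumption: about 3429 optimizer steps per epoch (estimated from the
# results table, where step 88000 falls at epoch 25.6636) times 50 epochs.
TOTAL_STEPS = 3429 * 50

# Assumption: one device, so 16 * 5 * 1 examples per optimizer step.
batch = total_train_batch_size(per_device=16, grad_accum=5)
print("total_train_batch_size:", batch)

for step in (0, 50, 100, 88000, TOTAL_STEPS):
    lr = linear_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>6}: lr = {lr:.6f}")
```

Under these assumptions the learning rate reaches its peak of 0.0006 at step 100 and would fall to zero only at the end of all 50 epochs.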

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.8476	0.2917	1000	4.7704	0.2531
4.3586	0.5834	2000	4.3032	0.2973
4.1666	0.8750	3000	4.1102	0.3139
4.0111	1.1665	4000	4.0036	0.3236
3.9353	1.4582	5000	3.9267	0.3307
3.8919	1.7499	6000	3.8697	0.3353
3.7664	2.0414	7000	3.8294	0.3396
3.7555	2.3331	8000	3.7952	0.3426
3.7532	2.6248	9000	3.7653	0.3454
3.7363	2.9165	10000	3.7385	0.3478
3.6442	3.2080	11000	3.7261	0.3498
3.6719	3.4996	12000	3.7091	0.3517
3.6552	3.7913	13000	3.6895	0.3533
3.5477	4.0828	14000	3.6847	0.3545
3.5826	4.3745	15000	3.6736	0.3556
3.5933	4.6662	16000	3.6581	0.3570
3.5987	4.9579	17000	3.6458	0.3581
3.5159	5.2494	18000	3.6475	0.3584
3.538	5.5411	19000	3.6365	0.3594
3.5591	5.8327	20000	3.6266	0.3604
3.4535	6.1243	21000	3.6307	0.3609
3.4887	6.4159	22000	3.6199	0.3615
3.5022	6.7076	23000	3.6129	0.3624
3.5084	6.9993	24000	3.6039	0.3630
3.4444	7.2908	25000	3.6152	0.3630
3.4614	7.5825	26000	3.6013	0.3637
3.4665	7.8742	27000	3.5935	0.3645
3.3861	8.1657	28000	3.6032	0.3642
3.4301	8.4574	29000	3.5965	0.3647
3.4411	8.7490	30000	3.5880	0.3656
3.3468	9.0405	31000	3.5911	0.3654
3.3793	9.3322	32000	3.5911	0.3655
3.3902	9.6239	33000	3.5821	0.3664
3.4288	9.9156	34000	3.5749	0.3670
3.3526	10.2071	35000	3.5839	0.3663
3.3777	10.4988	36000	3.5795	0.3670
3.3919	10.7905	37000	3.5720	0.3674
3.3128	11.0820	38000	3.5788	0.3678
3.3438	11.3736	39000	3.5745	0.3679
3.3744	11.6653	40000	3.5668	0.3683
3.3903	11.9570	41000	3.5610	0.3687
3.3234	12.2485	42000	3.5744	0.3679
3.3537	12.5402	43000	3.5671	0.3689
3.3608	12.8319	44000	3.5584	0.3691
3.2864	13.1234	45000	3.5711	0.3690
3.3203	13.4151	46000	3.5685	0.3691
3.3386	13.7067	47000	3.5589	0.3695
3.3493	13.9984	48000	3.5532	0.3698
3.2957	14.2899	49000	3.5686	0.3692
3.324	14.5816	50000	3.5593	0.3700
3.3352	14.8733	51000	3.5509	0.3704
3.2545	15.1648	52000	3.5637	0.3699
3.2945	15.4565	53000	3.5593	0.3700
3.3145	15.7482	54000	3.5503	0.3708
3.2141	16.0397	55000	3.5652	0.3702
3.2724	16.3313	56000	3.5588	0.3705
3.2836	16.6230	57000	3.5547	0.3709
3.3071	16.9147	58000	3.5470	0.3712
3.2347	17.2062	59000	3.5634	0.3707
3.2711	17.4979	60000	3.5545	0.3708
3.2837	17.7896	61000	3.5496	0.3715
3.1966	18.0811	62000	3.5628	0.3707
3.2545	18.3728	63000	3.5549	0.3712
3.2719	18.6644	64000	3.5499	0.3716
3.295	18.9561	65000	3.5434	0.3722
3.2215	19.2476	66000	3.5616	0.3713
3.2532	19.5393	67000	3.5538	0.3719
3.2791	19.8310	68000	3.5398	0.3725
3.2003	20.1225	69000	3.5590	0.3715
3.2436	20.4142	70000	3.5555	0.3717
3.2602	20.7059	71000	3.5488	0.3720
3.2599	20.9975	72000	3.5401	0.3730
3.2168	21.2891	73000	3.5559	0.3718
3.2474	21.5807	74000	3.5491	0.3722
3.2502	21.8724	75000	3.5445	0.3726
3.182	22.1639	76000	3.5606	0.3716
3.2228	22.4556	77000	3.5518	0.3725
3.238	22.7473	78000	3.5419	0.3730
3.153	23.0388	79000	3.5572	0.3723
3.1786	23.3305	80000	3.5566	0.3723
3.2148	23.6222	81000	3.5493	0.3727
3.2348	23.9138	82000	3.5422	0.3729
3.1725	24.2053	83000	3.5591	0.3726
3.2067	24.4970	84000	3.5504	0.3726
3.2116	24.7887	85000	3.5430	0.3732
3.142	25.0802	86000	3.5547	0.3728
3.1892	25.3719	87000	3.5540	0.3724
3.1998	25.6636	88000	3.5468	0.3729
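
The table above can be read programmatically to locate the best checkpoints. The following is a minimal sketch, assuming the rows are tab-separated as shown; only a handful of rows are copied here to keep the example short, and the variable names are illustrative.

```python
import csv
import io

# A few rows copied from the table above (tab-separated). Paste the full
# table in place of this excerpt to analyze the whole run.
TABLE = """Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy
3.3744\t11.6653\t40000\t3.5668\t0.3683
3.2791\t19.8310\t68000\t3.5398\t0.3725
3.2599\t20.9975\t72000\t3.5401\t0.3730
3.2116\t24.7887\t85000\t3.5430\t0.3732
3.1998\t25.6636\t88000\t3.5468\t0.3729
"""

# Parse each row into typed values.
rows = [
    {
        "step": int(r["Step"]),
        "epoch": float(r["Epoch"]),
        "val_loss": float(r["Validation Loss"]),
        "accuracy": float(r["Accuracy"]),
    }
    for r in csv.DictReader(io.StringIO(TABLE), delimiter="\t")
]

# Checkpoint with the lowest validation loss and the highest accuracy.
best_loss = min(rows, key=lambda r: r["val_loss"])
best_acc = max(rows, key=lambda r: r["accuracy"])

# Rough number of optimizer steps per epoch, from the last logged row.
steps_per_epoch = rows[-1]["step"] / rows[-1]["epoch"]

print("lowest validation loss:", best_loss)
print("highest accuracy:", best_acc)
print(f"steps per epoch ~ {steps_per_epoch:.0f}")
```

In the full table, the lowest validation loss is 3.5398 at step 68000 and the highest accuracy is 0.3732 at step 85000. The headline evaluation results (loss 3.5668, accuracy 0.3683) match the step 40000 row.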

Framework versions

Transformers 4.55.2
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Safetensors

Model size: 0.1B params
Tensor type: F32