# train_qqp_42_1773148416
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the qqp dataset. It achieves the following results on the evaluation set:
- Loss: 0.0950 (the best validation loss in the results table below, reached at step 102340)
- Num Input Tokens Seen: 137941664
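
Since PEFT appears under the framework versions below, this checkpoint is most likely a parameter-efficient adapter on top of the base model rather than a full set of weights. A minimal loading sketch under that assumption (the repo and base-model ids are taken from this card; everything else is illustrative):

```python
# Minimal sketch: load the base model, then attach the PEFT adapter from this repo.
# Assumes the checkpoint is a PEFT adapter (PEFT 0.17.1 is listed below).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "rbelanec/train_qqp_42_1773148416"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```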
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
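
These settings map one-to-one onto fields of `transformers.TrainingArguments`; a hedged reconstruction for reference (the `output_dir` is a placeholder, not taken from the card):

```python
# Sketch of the hyperparameters above expressed as TrainingArguments.
# Only the values listed in this card are meaningful; output_dir is illustrative.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_qqp_42_1773148416",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```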
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1526 | 0.2500 | 10234 | 0.1407 | 6910656 |
| 0.0879 | 0.5000 | 20468 | 0.1251 | 13780928 |
| 0.0715 | 0.7501 | 30702 | 0.1135 | 20680640 |
| 0.0755 | 1.0001 | 40936 | 0.1096 | 27591776 |
| 0.0970 | 1.2501 | 51170 | 0.1019 | 34492320 |
| 0.1527 | 1.5001 | 61404 | 0.1018 | 41393504 |
| 0.1193 | 1.7501 | 71638 | 0.0982 | 48287456 |
| 0.0554 | 2.0001 | 81872 | 0.0984 | 55178600 |
| 0.1809 | 2.2502 | 92106 | 0.0983 | 62093992 |
| 0.0284 | 2.5002 | 102340 | 0.0950 | 68988456 |
| 0.1156 | 2.7502 | 112574 | 0.0975 | 75874280 |
| 0.0921 | 3.0002 | 122808 | 0.0964 | 82772304 |
| 0.0862 | 3.2502 | 133042 | 0.1004 | 89675984 |
| 0.1293 | 3.5003 | 143276 | 0.0979 | 96560720 |
| 0.0988 | 3.7503 | 153510 | 0.0996 | 103465808 |
| 0.0618 | 4.0003 | 163744 | 0.0993 | 110357352 |
| 0.0892 | 4.2503 | 173978 | 0.1003 | 117230952 |
| 0.0825 | 4.5003 | 184212 | 0.1023 | 124100264 |
| 0.0962 | 4.7503 | 194446 | 0.1016 | 131030440 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4