babylm-base2.5m-gpt2

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH, the PyTorch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 195
training_steps: 19500
mixed_precision_training: Native AMP
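The optimizer and scheduler entries above can be illustrated with a minimal pure-Python sketch for a single scalar parameter. It assumes the linear schedule has the usual warmup-then-decay shape (rising from 0 to the peak learning rate over lr_scheduler_warmup_steps, then falling linearly to 0 at training_steps) and assumes a weight decay of 0, which the card does not list. `linear_warmup_lr` and `adamw_step` are hypothetical helpers for illustration, not library functions. Note that the 195 warmup steps are exactly 1% of the 19,500 training steps.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 5e-05
WARMUP_STEPS = 195
TRAINING_STEPS = 19500
BETA1, BETA2 = 0.9, 0.999
EPS = 1e-08
WEIGHT_DECAY = 0.0  # assumption: weight decay is not listed on the card


def linear_warmup_lr(step: int) -> float:
    """Hypothetical helper: learning rate at a given optimizer step.

    Rises linearly from 0 to PEAK_LR over WARMUP_STEPS, then decays
    linearly to 0 at TRAINING_STEPS (and stays at 0 afterwards).
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


def adamw_step(param, grad, m, v, t, lr):
    """Hypothetical helper: one AdamW update for a scalar parameter.

    t is the 1-based step count used for bias correction.
    Returns the updated (param, m, v).
    """
    # Decoupled weight decay acts on the parameter directly (a no-op here).
    param -= lr * WEIGHT_DECAY * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad
    # Bias-corrected moment estimates.
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


if __name__ == "__main__":
    for step in (0, 97, 195, 9847, 19500):
        print(f"step {step:>5}: lr = {linear_warmup_lr(step):.3e}")
```

Under these assumptions `linear_warmup_lr(195)` returns the peak rate of 5e-05 (up to floating-point rounding), the rate is about half the peak near step 9,847, and it reaches 0 at step 19,500.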

Training Loss  Epoch   Step   Validation Loss  Accuracy
5.8389         0.1065  200    4.8806           0.3144
4.8812         0.2130  400    4.4667           0.3240
4.707          0.3195  600    4.2884           0.3341
4.479          0.4260  800    4.2022           0.3348
4.2839         0.5325  1000   4.1496           0.3363
4.1679         0.6390  1200   4.1096           0.3369
4.125          0.7455  1400   4.0623           0.3393
3.9613         0.8520  1600   4.0172           0.3394
3.9268         0.9585  1800   4.0097           0.3367
3.8225         1.0650  2000   3.9714           0.3417
2.6933         2.1299  4000   3.6543           0.3824
2.3328         3.1949  6000   3.5196           0.3985
2.1778         4.2599  8000   3.4668           0.4016
2.0886         5.3248  10000  3.4159           0.4073
1.9842         6.3898  12000  3.3836           0.4104
1.9168         7.4547  14000  3.3775           0.4112
1.8811         8.5197  16000  3.3978           0.4121
1.8392         9.5847  18000  3.3874           0.4125
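The results table can also be read programmatically. The sketch below copies the Epoch, Step, Validation Loss and Accuracy columns, finds the rows with the lowest validation loss and the highest accuracy, estimates steps per epoch, and converts a loss to perplexity. The perplexity conversion assumes the validation loss is the mean per-token cross-entropy in nats (the usual convention for causal language-model losses, but not stated on the card); `RESULTS`, `perplexity` and `steps_per_epoch` are hypothetical names.

```python
import math

# (epoch, step, validation_loss, accuracy), copied from the table above.
RESULTS = [
    (0.1065, 200, 4.8806, 0.3144),
    (0.2130, 400, 4.4667, 0.3240),
    (0.3195, 600, 4.2884, 0.3341),
    (0.4260, 800, 4.2022, 0.3348),
    (0.5325, 1000, 4.1496, 0.3363),
    (0.6390, 1200, 4.1096, 0.3369),
    (0.7455, 1400, 4.0623, 0.3393),
    (0.8520, 1600, 4.0172, 0.3394),
    (0.9585, 1800, 4.0097, 0.3367),
    (1.0650, 2000, 3.9714, 0.3417),
    (2.1299, 4000, 3.6543, 0.3824),
    (3.1949, 6000, 3.5196, 0.3985),
    (4.2599, 8000, 3.4668, 0.4016),
    (5.3248, 10000, 3.4159, 0.4073),
    (6.3898, 12000, 3.3836, 0.4104),
    (7.4547, 14000, 3.3775, 0.4112),
    (8.5197, 16000, 3.3978, 0.4121),
    (9.5847, 18000, 3.3874, 0.4125),
]


def perplexity(loss: float) -> float:
    """Perplexity from a mean per-token cross-entropy in nats (assumed)."""
    return math.exp(loss)


def steps_per_epoch(rows) -> float:
    """Average optimizer steps per epoch implied by the Step/Epoch columns."""
    ratios = [step / epoch for epoch, step, _, _ in rows]
    return sum(ratios) / len(ratios)


best_loss_row = min(RESULTS, key=lambda row: row[2])
best_acc_row = max(RESULTS, key=lambda row: row[3])

if __name__ == "__main__":
    epoch, step, loss, acc = best_loss_row
    print(f"lowest validation loss: {loss} at step {step} "
          f"(perplexity ~ {perplexity(loss):.1f}, accuracy {acc})")
    print(f"highest accuracy: {best_acc_row[3]} at step {best_acc_row[1]}")
    print(f"~{steps_per_epoch(RESULTS):.0f} steps per epoch")
```

On this table the lowest validation loss (3.3775) is at step 14000, while accuracy keeps creeping up to 0.4125 at step 18000, so the two metrics disagree slightly about the best checkpoint. The Step/Epoch ratio is about 1,878 steps per epoch, which puts the 19,500 training steps at roughly 10.4 epochs.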

Model size: 98.4M params
Tensor type: F32 (Safetensors)
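As a rough check on what a 98.4M-parameter F32 checkpoint means in memory: each F32 value takes 4 bytes, so the weights alone come to about 393.6 MB (about 375 MiB). The sketch below does that arithmetic and adds a hypothetical training-time estimate that counts one F32 copy each of the weights, the gradients, and AdamW's two moment buffers. It ignores activations, mixed-precision buffers and file-format overhead, so treat it as a rough lower bound; `weight_bytes` and `adamw_training_bytes` are illustrative names.

```python
# Rough memory arithmetic for the checkpoint described above.
NUM_PARAMS = 98.4e6   # "98.4M params" from the card
BYTES_PER_F32 = 4     # tensor type F32


def weight_bytes(num_params: float, bytes_per_param: int = BYTES_PER_F32) -> float:
    """Bytes needed to store the weights alone."""
    return num_params * bytes_per_param


def adamw_training_bytes(num_params: float) -> float:
    """Hypothetical estimate: weights + gradients + AdamW's two moment
    buffers, all in F32. Ignores activations and any other buffers."""
    return weight_bytes(num_params) * 4


if __name__ == "__main__":
    w = weight_bytes(NUM_PARAMS)
    print(f"weights: {w / 1e6:.1f} MB ({w / 2**20:.1f} MiB)")
    t = adamw_training_bytes(NUM_PARAMS)
    print(f"weights + grads + AdamW state: {t / 1e9:.2f} GB")
```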