babylm-rta10m-roberta

This model is a fine-tuned version of on an unknown dataset. At the last logged evaluation (step 18000), it reaches a validation loss of 5.5651 and an accuracy of 0.1676.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 180
training_steps: 18000
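With `lr_scheduler_type: linear` and 180 warmup steps, the learning rate should rise from 0 to 5e-05 over the first 180 optimizer steps and then decay linearly to 0 at step 18000. The sketch below reproduces that shape in plain Python, assuming it follows the multiplier used by `transformers.get_linear_schedule_with_warmup`; the helper name `linear_warmup_lr` is hypothetical.

```python
# Hyperparameters taken from the list above.
BASE_LR = 5e-05
WARMUP_STEPS = 180
TRAINING_STEPS = 18000


def linear_warmup_lr(step: int,
                     base_lr: float = BASE_LR,
                     warmup: int = WARMUP_STEPS,
                     total: int = TRAINING_STEPS) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumed to mirror the linear warmup + linear decay multiplier of
    transformers.get_linear_schedule_with_warmup.
    """
    if step < warmup:
        # Warmup phase: grow linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup)
    # Decay phase: shrink linearly from base_lr (at `warmup`) to 0 (at `total`).
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Start, mid-warmup, end of warmup, middle of decay, end of training.
for s in (0, 90, 180, 9090, 18000):
    print(s, linear_warmup_lr(s))
```

Step 9090 is the midpoint of the decay phase (180 + 17820 / 2), so the rate there is half the peak.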

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.948	0.0557	200	6.4894	0.1043
6.2515	0.1114	400	6.1707	0.1271
6.0599	0.1670	600	6.0575	0.1367
5.998	0.2227	800	5.9873	0.1420
5.9554	0.2784	1000	5.9518	0.1436
5.9364	0.3341	1200	5.9270	0.1455
5.8973	0.3898	1400	5.8893	0.1504
5.8565	0.4454	1600	5.8677	0.1506
5.8052	0.5011	1800	5.8562	0.1504
5.8149	0.5568	2000	5.8329	0.1525
5.6917	1.1136	4000	5.7430	0.1577
5.6493	1.6704	6000	5.7029	0.1606
5.5965	2.2272	8000	5.6527	0.1631
5.5581	2.7840	10000	5.6288	0.1626
5.5325	3.3408	12000	5.6148	0.1659
5.5038	3.8976	14000	5.5877	0.1673
5.4782	4.4543	16000	5.5696	0.1682
5.5144	5.0111	18000	5.5651	0.1676
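The Epoch column pins down the epoch length: every row is consistent with Epoch = Step / 3592 (rounded to four decimals), so one epoch is about 3592 optimizer steps. Assuming a single device and no gradient accumulation (neither is stated in this card), that is 3592 × 16 = 57,472 training sequences per epoch. If the validation loss is a mean per-token cross-entropy in nats (also an assumption), exp(loss) gives a perplexity-style figure of roughly 261 at step 18000. The sketch below checks these numbers against a few rows copied from the table; `STEPS_PER_EPOCH` is inferred, not reported.

```python
import math

# (step, epoch, validation_loss) copied from rows of the table above.
ROWS = [
    (200, 0.0557, 6.4894),
    (2000, 0.5568, 5.8329),
    (10000, 2.7840, 5.6288),
    (18000, 5.0111, 5.5651),
]

STEPS_PER_EPOCH = 3592  # inferred from the Epoch column, not stated in the card
TRAIN_BATCH_SIZE = 16   # train_batch_size from the hyperparameter list

# The inferred epoch length should reproduce the logged Epoch values,
# which the card rounds to 4 decimal places.
for step, epoch, _ in ROWS:
    assert round(step / STEPS_PER_EPOCH, 4) == epoch, (step, epoch)

# Sequences seen per epoch, assuming one device and no gradient accumulation.
sequences_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

# Perplexity-style figure, assuming the loss is mean cross-entropy in nats.
final_ppl = math.exp(ROWS[-1][2])

print(f"sequences per epoch ~ {sequences_per_epoch}")
print(f"exp(final validation loss) ~ {final_ppl:.1f}")
```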

Safetensors
Model size: 98.6M params
Tensor type: F32
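At 4 bytes per F32 value, 98.6M parameters come to about 394 MB (about 376 MiB) of raw weights, ignoring the Safetensors header. During training, AdamW in PyTorch also keeps two F32 moment buffers per parameter, and gradients add one more copy, so weights plus optimizer state are roughly four times that, before counting activations. A back-of-the-envelope sketch, using the rounded parameter count shown above:

```python
PARAMS = 98.6e6     # "Model size: 98.6M params" (already rounded)
BYTES_PER_F32 = 4   # F32 = 32-bit float

# Raw weight storage.
weight_bytes = PARAMS * BYTES_PER_F32

# Rough training-time state with AdamW: weights + gradients + two moment
# buffers (exp_avg, exp_avg_sq), all F32. Activations are not included.
adamw_state_bytes = 4 * weight_bytes

print(f"weights: {weight_bytes / 1e6:.1f} MB ({weight_bytes / 2**20:.1f} MiB)")
print(f"weights + grads + AdamW moments: {adamw_state_bytes / 1e9:.2f} GB")
```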