mms-1b-all-bemgen-m50f50-ft-sd-dat-gdro-fusion-52

This model is a fine-tuned version of facebook/mms-1b-all on the BEMGEN - DRO dataset. It achieves the following results on the evaluation set:

  • Loss: 14.2280
  • CER: 0.0594
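
Since facebook/mms-1b-all is a Wav2Vec2-style CTC model, the fine-tuned checkpoint can presumably be loaded through the standard Transformers CTC interface. A minimal inference sketch, assuming 16 kHz mono input audio (the audio file path is a placeholder):

```python
import torch
import librosa
from transformers import AutoProcessor, AutoModelForCTC

model_id = "csikasote/mms-1b-all-bemgen-m50f50-ft-sd-dat-gdro-fusion-52"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load a 16 kHz mono waveform; "audio.wav" is a placeholder path.
speech, _ = librosa.load("audio.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```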

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00046
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 52
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 10.0
  • mixed_precision_training: Native AMP
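
As a reproducibility aid, the list above maps roughly onto the following Hugging Face `TrainingArguments`. This is a sketch under the assumption that the standard `Trainer` was used, not the original training script; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",           # placeholder
    learning_rate=0.00046,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=52,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=10.0,
    fp16=True,                       # "Native AMP" mixed precision
)
```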

Training results

| Training Loss | Epoch  | Step | Validation Loss | CER    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 440.7094      | 0.5076 | 100  | 22.0664         | 0.0883 |
| 34.5279       | 1.0152 | 200  | 14.9101         | 0.0660 |
| 36.4239       | 1.5228 | 300  | 14.7350         | 0.0655 |
| 21.1869       | 2.0305 | 400  | 15.4756         | 0.0659 |
| 28.1324       | 2.5381 | 500  | 14.0653         | 0.0626 |
| 18.7226       | 3.0457 | 600  | 14.8762         | 0.0629 |
| 21.2218       | 3.5533 | 700  | 15.1562         | 0.0624 |
| 24.739        | 4.0609 | 800  | 13.8784         | 0.0611 |
| 18.6545       | 4.5685 | 900  | 14.1829         | 0.0614 |
| 29.4826       | 5.0761 | 1000 | 13.7120         | 0.0604 |
| 23.4713       | 5.5838 | 1100 | 13.9247         | 0.0600 |
| 18.8498       | 6.0914 | 1200 | 13.7105         | 0.0607 |
| 32.2245       | 6.5990 | 1300 | 13.4921         | 0.0588 |
| 17.9839       | 7.1066 | 1400 | 14.1172         | 0.0598 |
| 23.0116       | 7.6142 | 1500 | 13.9041         | 0.0589 |
| 24.9199       | 8.1218 | 1600 | 13.3263         | 0.0594 |
| 27.8795       | 8.6294 | 1700 | 13.1945         | 0.0593 |
| 16.8887       | 9.1371 | 1800 | 13.7624         | 0.0588 |
| 30.9928       | 9.6447 | 1900 | 13.8587         | 0.0582 |
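
The CER column is character error rate: character-level edit distance divided by reference length. A minimal sketch of computing such a score with the Hugging Face `evaluate` library, assuming the standard CER definition (the strings below are purely illustrative):

```python
import evaluate

# The "cer" metric requires the jiwer package to be installed.
cer_metric = evaluate.load("cer")

predictions = ["hello world"]  # illustrative hypothesis
references  = ["hello word"]   # illustrative reference

# One character inserted over a 10-character reference -> CER = 0.1
score = cer_metric.compute(predictions=predictions, references=references)
print(f"CER: {score:.4f}")
```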

Framework versions

  • Transformers 4.53.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.21.4