mms-1b-all-bemgen-m50f50-ft-sd-dat-gdro-fusion-42

This model is a fine-tuned version of facebook/mms-1b-all on the BEMGEN - DRO dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

learning_rate: 0.00046
train_batch_size: 8
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 10.0
mixed_precision_training: Native AMP

Training Loss	Epoch	Step	Validation Loss	Cer
591.6625	0.5076	100	350.9228	0.8478
329.8372	1.0152	200	294.0757	0.8290
292.9459	1.5228	300	188.8121	0.9626
54.7296	2.0305	400	15.7224	0.0665
38.1779	2.5381	500	14.6667	0.0649
25.4896	3.0457	600	14.4596	0.0633
29.7732	3.5533	700	13.9807	0.0624
23.3068	4.0609	800	13.9858	0.0619
31.4498	4.5685	900	13.8847	0.0620
18.1676	5.0761	1000	14.0275	0.0622
20.9698	5.5838	1100	13.5899	0.0607
20.05	6.0914	1200	13.6126	0.0600
27.9095	6.5990	1300	13.9782	0.0595
26.9286	7.1066	1400	13.8477	0.0596

Safetensors

Model size

1.0B params

Tensor type

F32

Base model

Finetuned

(382)

this model