ssc-qxp-mms-model-mix-adapt-max-lowlr

This model is a fine-tuned version of facebook/mms-1b-all. The training dataset is not specified. Evaluation-set results are not listed separately; per-checkpoint validation loss, CER, and WER appear in the training results table.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 6
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 30
mixed_precision_training: Native AMP
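As a sketch of how the batch-size and scheduler settings above interact, the following plain-Python functions assume a single training device: the card does not state the device count, but 8 × 2 = 16 matches total_train_batch_size only when there is one. The learning-rate function reproduces, to our understanding, the linear-warmup-then-linear-decay shape that lr_scheduler_type: linear selects in transformers (get_linear_schedule_with_warmup). TOTAL_STEPS is a hypothetical placeholder: the card does not give the total number of optimizer steps, and 6000 is only the last logged step.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of training examples that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 toward peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at warmup_steps to 0 at total_steps (never negative).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


PEAK_LR = 3e-4       # learning_rate from the card
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 6000   # hypothetical placeholder; the card does not state the true total

print(effective_batch_size(8, 2))  # 16, matching total_train_batch_size (one device assumed)
for s in (0, 50, 100, 3050, TOTAL_STEPS):
    print(s, linear_warmup_lr(s, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

The learning rate peaks at 3e-4 after the 100 warmup steps and then shrinks linearly, so late checkpoints in the training results are produced with a much smaller step size than early ones.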

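Training results

Validation loss, character error rate (CER), and word error rate (WER) were computed every 200 optimizer steps.

Training Loss	Epoch	Step	Validation Loss	CER	WER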
0.3816	0.9975	200	0.2246	0.1189	0.6397
0.1851	1.9925	400	0.1731	0.1049	0.5763
0.1762	2.9875	600	0.1681	0.1034	0.5689
0.1644	3.9825	800	0.1574	0.1025	0.5643
0.1541	4.9776	1000	0.1578	0.1020	0.5570
0.1467	5.9726	1200	0.1587	0.1012	0.5533
0.14	6.9676	1400	0.1569	0.1017	0.5579
0.1453	7.9626	1600	0.1616	0.1003	0.5432
0.1323	8.9576	1800	0.1529	0.1006	0.5469
0.1212	9.9526	2000	0.1558	0.1006	0.5441
0.126	10.9476	2200	0.1511	0.0997	0.5450
0.1249	11.9426	2400	0.1468	0.0991	0.5368
0.1197	12.9377	2600	0.1455	0.0980	0.5276
0.1183	13.9327	2800	0.1493	0.0982	0.5340
0.1143	14.9277	3000	0.1456	0.0975	0.5368
0.1104	15.9227	3200	0.1550	0.0973	0.5239
0.1018	16.9177	3400	0.1537	0.0970	0.5267
0.1069	17.9127	3600	0.1488	0.0976	0.5294
0.1041	18.9077	3800	0.1448	0.0933	0.5156
0.0985	19.9027	4000	0.1536	0.0962	0.5294
0.0969	20.8978	4200	0.1484	0.0933	0.5156
0.0935	21.8928	4400	0.1495	0.0957	0.5267
0.0903	22.8878	4600	0.1503	0.0962	0.5285
0.0914	23.8828	4800	0.1514	0.0959	0.5276
0.0876	24.8778	5000	0.1485	0.0938	0.5165
0.0927	25.8728	5200	0.1469	0.0940	0.5165
0.0892	26.8678	5400	0.1454	0.0929	0.5165
0.082	27.8628	5600	0.1470	0.0926	0.5119
0.0828	28.8579	5800	0.1485	0.0937	0.5175
0.0867	29.8529	6000	0.1488	0.0937	0.5184
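The CER and WER columns are edit-distance error rates: the substitutions, deletions, and insertions needed to turn the reference transcript into the model's hypothesis, divided by the number of reference characters or words. The card does not say which library computed them or how transcripts were normalized, so the following is only an illustrative plain-Python sketch of the standard definitions, not this model's evaluation code. corpus_wer pools errors over all utterances before dividing, which is the usual way a single WER is reported for a whole evaluation set; whether these numbers were aggregated that way is an assumption.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for a single utterance."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for a single utterance."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references: list, hypotheses: list) -> float:
    """Total word errors over all utterances divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


print(wer("the cat sat on the mat", "the cat sat on a mat"))  # one substitution out of six words
print(cer("kitten", "sitting"))
```

Because a single wrong character is enough to make a whole word count as an error, WER is typically much higher than CER on the same outputs, as in the table above.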

Model size: 1.0B parameters, stored in Safetensors format with tensor type F32.
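As a rough worked example of what 1.0B parameters at F32 implies, the weights alone take 4 bytes per parameter. The sketch treats 1.0B as exactly 10^9 parameters (the published figure is rounded) and ignores activations, optimizer state, and framework overhead; the F16 and BF16 rows are for comparison only, since the published tensors are F32.

```python
# Bytes needed to store one parameter for a few common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_memory_bytes(num_params: float, dtype: str) -> float:
    """Memory for the raw weights only (no activations or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


NUM_PARAMS = 1.0e9  # "1.0B params" from the card, treated as exact for illustration

for dtype in BYTES_PER_PARAM:
    gib = weight_memory_bytes(NUM_PARAMS, dtype) / 2**30
    print(f"{dtype}: ~{gib:.2f} GiB")
```

This prints about 3.73 GiB for F32 and about 1.86 GiB for each 16-bit type.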
