ssc-meh-mms-model-mix-adapt-max-lowlr

This model is a fine-tuned version of facebook/mms-1b-all on an unknown dataset. Evaluation results at each logged checkpoint (validation loss, CER, and WER) are listed in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 6
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 10
mixed_precision_training: Native AMP
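
As a rough illustration of how these settings interact, the sketch below recomputes the total train batch size and the learning rate implied by a linear schedule with 100 warmup steps. It rests on two assumptions that the card does not state: a single training device, and 6080 total optimizer steps (608 steps per epoch, inferred from the results table, where step 6000 corresponds to epoch 9.8684). The function names are illustrative and are not taken from the training code.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
NUM_EPOCHS = 10

# Assumption: inferred from the results table (step 6000 ~ epoch 9.8684).
STEPS_PER_EPOCH = 608
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 6080

# Assumption: training ran on one device.
NUM_DEVICES = 1


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Number of examples contributing to one optimizer step."""
    return per_device * accum * devices


def linear_lr(step):
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100, 3090, 6080):
    print(step, f"{linear_lr(step):.6f}")
```

Under these assumptions the rate peaks at 0.0003 at step 100 and falls to half of that around step 3090, roughly the midpoint of training.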

Training Loss	Epoch	Step	Validation Loss	CER	WER
0.9811	0.3289	200	0.7751	0.2017	0.5926
0.7561	0.6579	400	0.7307	0.1919	0.5439
0.7058	0.9868	600	0.7007	0.1838	0.5200
0.6423	1.3158	800	0.6787	0.1802	0.4994
0.6639	1.6447	1000	0.6731	0.1793	0.4953
0.6096	1.9737	1200	0.6722	0.1784	0.4929
0.6177	2.3026	1400	0.6858	0.1776	0.4847
0.5689	2.6316	1600	0.6642	0.1767	0.4848
0.5697	2.9605	1800	0.6654	0.1742	0.4752
0.5563	3.2895	2000	0.6636	0.1739	0.4715
0.5722	3.6184	2200	0.6772	0.1736	0.4729
0.5781	3.9474	2400	0.6745	0.1733	0.4723
0.5316	4.2763	2600	0.6741	0.1750	0.4801
0.5538	4.6053	2800	0.6747	0.1741	0.4688
0.5620	4.9342	3000	0.6701	0.1734	0.4706
0.5555	5.2632	3200	0.6770	0.1743	0.4780
0.5361	5.5921	3400	0.6752	0.1743	0.4782
0.5254	5.9211	3600	0.6836	0.1754	0.4792
0.5095	6.2500	3800	0.6823	0.1748	0.4770
0.5482	6.5789	4000	0.6768	0.1736	0.4721
0.5180	6.9079	4200	0.6786	0.1730	0.4689
0.4757	7.2368	4400	0.6978	0.1750	0.4803
0.5063	7.5658	4600	0.6799	0.1728	0.4717
0.4943	7.8947	4800	0.6860	0.1737	0.4757
0.4962	8.2237	5000	0.6865	0.1735	0.4752
0.4943	8.5526	5200	0.6903	0.1739	0.4760
0.5035	8.8816	5400	0.6983	0.1752	0.4795
0.4842	9.2105	5600	0.6862	0.1731	0.4682
0.4704	9.5395	5800	0.6897	0.1733	0.4719
0.4939	9.8684	6000	0.6881	0.1731	0.4714
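
The metrics in the table do not reach their minimum at the same step, so the choice of "best" checkpoint depends on which metric is used. The following sketch, with rows transcribed from the table above, picks out the lowest value per column. The helper name `best_by` is illustrative, and the card does not say which checkpoint the published weights correspond to.

```python
# (step, validation_loss, cer, wer) rows transcribed from the table above.
ROWS = [
    (200, 0.7751, 0.2017, 0.5926),
    (400, 0.7307, 0.1919, 0.5439),
    (600, 0.7007, 0.1838, 0.5200),
    (800, 0.6787, 0.1802, 0.4994),
    (1000, 0.6731, 0.1793, 0.4953),
    (1200, 0.6722, 0.1784, 0.4929),
    (1400, 0.6858, 0.1776, 0.4847),
    (1600, 0.6642, 0.1767, 0.4848),
    (1800, 0.6654, 0.1742, 0.4752),
    (2000, 0.6636, 0.1739, 0.4715),
    (2200, 0.6772, 0.1736, 0.4729),
    (2400, 0.6745, 0.1733, 0.4723),
    (2600, 0.6741, 0.1750, 0.4801),
    (2800, 0.6747, 0.1741, 0.4688),
    (3000, 0.6701, 0.1734, 0.4706),
    (3200, 0.6770, 0.1743, 0.4780),
    (3400, 0.6752, 0.1743, 0.4782),
    (3600, 0.6836, 0.1754, 0.4792),
    (3800, 0.6823, 0.1748, 0.4770),
    (4000, 0.6768, 0.1736, 0.4721),
    (4200, 0.6786, 0.1730, 0.4689),
    (4400, 0.6978, 0.1750, 0.4803),
    (4600, 0.6799, 0.1728, 0.4717),
    (4800, 0.6860, 0.1737, 0.4757),
    (5000, 0.6865, 0.1735, 0.4752),
    (5200, 0.6903, 0.1739, 0.4760),
    (5400, 0.6983, 0.1752, 0.4795),
    (5600, 0.6862, 0.1731, 0.4682),
    (5800, 0.6897, 0.1733, 0.4719),
    (6000, 0.6881, 0.1731, 0.4714),
]


def best_by(rows, column):
    """Return the row with the lowest value in the given column index."""
    return min(rows, key=lambda row: row[column])


best_loss = best_by(ROWS, 1)
best_cer = best_by(ROWS, 2)
best_wer = best_by(ROWS, 3)

print("lowest validation loss:", best_loss[1], "at step", best_loss[0])
print("lowest CER:", best_cer[2], "at step", best_cer[0])
print("lowest WER:", best_wer[3], "at step", best_wer[0])
```

On these rows, validation loss is lowest at step 2000 (0.6636), CER at step 4600 (0.1728), and WER at step 5600 (0.4682).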

Safetensors

Model size: 1.0B params

Tensor type: F32
