wav2vec2-xls-r-300m-es-phoneme-ctc-52h-noisy

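This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on an unknown dataset. Summary evaluation figures are not listed here; per-checkpoint validation loss, phoneme error rate, and phoneme accuracy appear in the training results table below.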

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 24
eval_batch_size: 24
seed: 42
optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.05
num_epochs: 20
mixed_precision_training: Native AMP
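
To make the scheduler settings concrete, here is a minimal sketch of a linear warmup followed by linear decay, using the learning rate, warmup ratio, and epoch count listed above. The steps-per-epoch value is an assumption inferred from the training log (step 500 is logged at epoch 0.4284), and the total step count assumes training was planned for the full 20 epochs. The function name `linear_schedule_lr` is hypothetical and is not part of any library.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-5
WARMUP_RATIO = 0.05
NUM_EPOCHS = 20

# Assumption: inferred from the training log (step 500 is logged at epoch 0.4284).
STEPS_PER_EPOCH = round(500 / 0.4284)


def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


# Assumption: the schedule was sized for all NUM_EPOCHS epochs.
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

# Print the learning rate at a few logged steps and at the schedule boundaries.
for step in (0, 500, 1000, warmup_steps, 11000, total_steps):
    print(step, linear_schedule_lr(step, total_steps, warmup_steps, LEARNING_RATE))
```

Under these assumptions the warmup covers roughly the first epoch, so the evaluations logged at steps 500 and 1000 would both fall inside the warmup phase.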

Training results

Training Loss	Epoch	Step	Validation Loss	PER	Phoneme Accuracy
3.9466	0.4284	500	3.6801	1.0	0.0
3.2626	0.8569	1000	3.2697	1.0	0.0
2.5046	1.2853	1500	1.8232	0.4321	0.5679
0.607	1.7138	2000	0.4133	0.0802	0.9198
0.4624	2.1422	2500	0.3217	0.0676	0.9324
0.3912	2.5707	3000	0.3097	0.0636	0.9364
0.437	2.9991	3500	0.2971	0.0614	0.9386
0.3601	3.4276	4000	0.2757	0.0611	0.9389
0.3291	3.8560	4500	0.2746	0.0584	0.9416
0.3299	4.2845	5000	0.2674	0.0577	0.9423
0.3267	4.7129	5500	0.2655	0.0571	0.9429
0.2991	5.1414	6000	0.2581	0.0575	0.9425
0.3073	5.5698	6500	0.2662	0.0567	0.9433
0.3061	5.9983	7000	0.2522	0.0553	0.9447
0.2702	6.4267	7500	0.2562	0.0552	0.9448
0.3034	6.8552	8000	0.2560	0.0552	0.9448
0.2583	7.2836	8500	0.2499	0.0546	0.9454
0.2894	7.7121	9000	0.2420	0.0557	0.9443
0.2599	8.1405	9500	0.2508	0.0549	0.9451
0.2858	8.5690	10000	0.2487	0.0548	0.9452
0.2458	8.9974	10500	0.2429	0.0556	0.9444
0.247	9.4259	11000	0.2469	0.0549	0.9451
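
In every row of the table, phoneme accuracy equals one minus the phoneme error rate. The card does not say how the error rate was computed, so the following is only a sketch of the conventional definition: total Levenshtein edits between reference and predicted phoneme sequences, divided by the total number of reference phonemes. The function names and the toy phoneme sequences are hypothetical and do not come from the model's data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level rate: total edits divided by total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    return total_edits / total_ref


# Hypothetical toy data: two utterances, one substitution in the second.
refs = [["o", "l", "a"], ["k", "a", "s", "a"]]
hyps = [["o", "l", "a"], ["k", "a", "s", "o"]]

per = phoneme_error_rate(refs, hyps)
accuracy = 1.0 - per  # matches the relationship between the two table columns
print(f"PER={per:.4f} accuracy={accuracy:.4f}")
```

Note that with this definition the error rate can exceed 1.0 when a prediction contains many insertions; the card does not say whether its reported values are clipped.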

Model size: 0.3B params (Safetensors, tensor type F32)