wav2vec2-base-en-phoneme-ctc-41h

This model is a fine-tuned version of facebook/wav2vec2-base for English phoneme recognition with a CTC head, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2694
  • PER (phoneme error rate): 0.1045
  • Phoneme accuracy: 0.8955
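
The PER metric above is the standard phoneme error rate: the Levenshtein edit distance between predicted and reference phoneme sequences, divided by the number of reference phonemes (so phoneme accuracy here is 1 − PER). A minimal, dependency-free sketch of how such a metric is typically computed (the function names are illustrative, not from this model's code):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance between two sequences, single-row DP.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[-1]

def per(references, hypotheses):
    # Phoneme error rate: total edit operations / total reference phonemes.
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return edits / total
```

For example, a hypothesis that substitutes one phoneme in a four-phoneme reference yields a PER of 0.25.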

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 50
  • mixed_precision_training: Native AMP

Training results

Training Loss   Epoch   Step   Validation Loss   PER   Phoneme Accuracy
No log 1.0 398 3.8986 0.9980 0.0020
6.7922 2.0 796 3.5634 0.9980 0.0020
3.5948 3.0 1194 3.5295 0.9981 0.0019
3.5277 4.0 1592 3.4807 0.9981 0.0019
3.5277 5.0 1990 1.8312 0.4890 0.5110
2.9905 6.0 2388 0.8498 0.2052 0.7948
1.2882 7.0 2786 0.6013 0.1659 0.8341
0.7483 8.0 3184 0.5015 0.1492 0.8508
0.5721 9.0 3582 0.4318 0.1383 0.8617
0.5721 10.0 3980 0.3960 0.1314 0.8686
0.4794 11.0 4378 0.3599 0.1275 0.8725
0.4192 12.0 4776 0.3427 0.1232 0.8768
0.3798 13.0 5174 0.3269 0.1207 0.8793
0.3458 14.0 5572 0.3206 0.1185 0.8815
0.3458 15.0 5970 0.3014 0.1163 0.8837
0.323 16.0 6368 0.2953 0.1151 0.8849
0.3044 17.0 6766 0.2860 0.1136 0.8864
0.2873 18.0 7164 0.2822 0.1120 0.8880
0.2748 19.0 7562 0.2821 0.1116 0.8884
0.2748 20.0 7960 0.2734 0.1104 0.8896
0.26 21.0 8358 0.2681 0.1094 0.8906
0.2508 22.0 8756 0.2735 0.1092 0.8908
0.2411 23.0 9154 0.2719 0.1090 0.8910
0.2335 24.0 9552 0.2719 0.1086 0.8914
0.2335 25.0 9950 0.2642 0.1081 0.8919
0.226 26.0 10348 0.2668 0.1079 0.8921
0.2199 27.0 10746 0.2577 0.1072 0.8928
0.2147 28.0 11144 0.2662 0.1076 0.8924
0.2084 29.0 11542 0.2638 0.1073 0.8927
0.2084 30.0 11940 0.2631 0.1068 0.8932
0.2032 31.0 12338 0.2658 0.1064 0.8936
0.1983 32.0 12736 0.2715 0.1069 0.8931
0.1981 33.0 13134 0.2639 0.1062 0.8938
0.1918 34.0 13532 0.2673 0.1061 0.8939
0.1918 35.0 13930 0.2585 0.1055 0.8945
0.187 36.0 14328 0.2705 0.1059 0.8941
0.1838 37.0 14726 0.2722 0.1060 0.8940
0.183 38.0 15124 0.2633 0.1053 0.8947
0.1804 39.0 15522 0.2724 0.1055 0.8945
0.1804 40.0 15920 0.2612 0.1048 0.8952
0.1786 41.0 16318 0.2659 0.1049 0.8951
0.1755 42.0 16716 0.2691 0.1049 0.8951
0.1746 43.0 17114 0.2664 0.1047 0.8953
0.1743 44.0 17512 0.2689 0.1047 0.8953
0.1743 45.0 17910 0.2651 0.1045 0.8955
0.1714 46.0 18308 0.2705 0.1047 0.8953
0.1682 47.0 18706 0.2667 0.1044 0.8956
0.1707 48.0 19104 0.2710 0.1046 0.8954
0.1674 49.0 19502 0.2687 0.1045 0.8955
0.1674 50.0 19900 0.2694 0.1045 0.8955
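
At inference time, a CTC model like this one emits per-frame logits over the phoneme vocabulary; predictions are obtained by taking the argmax per frame, collapsing consecutive repeats, and dropping the blank token. In practice the model's own `Wav2Vec2Processor`/tokenizer handles this via `batch_decode`; the sketch below shows the underlying greedy-decoding logic, with an illustrative vocabulary mapping and an assumed blank id of 0:

```python
def ctc_greedy_decode(logits, id2phoneme, blank_id=0):
    # Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank_id:
            out.append(id2phoneme[idx])
        prev = idx
    return out
```

Beam-search decoding (optionally with a phoneme language model) can lower the PER further, but the numbers in the table above come from the standard evaluation pipeline.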

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1