wav2vec2-base-en-phoneme-ctc-41h

This model is a fine-tuned version of facebook/wav2vec2-base for English phoneme recognition with a CTC head, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2694
  • PER (phoneme error rate): 0.1045
  • Phoneme accuracy: 0.8955
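
The PER metric above is the standard phoneme error rate: the Levenshtein edit distance between predicted and reference phoneme sequences, divided by the number of reference phonemes (so phoneme accuracy here is 1 − PER). A minimal, dependency-free sketch of how such a metric is typically computed (the function names are illustrative, not from this model's code):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance between two sequences, single-row DP.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[-1]

def per(references, hypotheses):
    # Phoneme error rate: total edit operations / total reference phonemes.
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return edits / total
```

For example, a hypothesis that substitutes one phoneme in a four-phoneme reference yields a PER of 0.25.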

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 50
  • mixed_precision_training: Native AMP

Training results

Training Loss   Epoch   Step   Validation Loss   PER   Phoneme Accuracy
No log 1.0 398 3.8986 0.9980 0.0020
6.7922 2.0 796 3.5634 0.9980 0.0020
3.5948 3.0 1194 3.5295 0.9981 0.0019
3.5277 4.0 1592 3.4807 0.9981 0.0019
3.5277 5.0 1990 1.8312 0.4890 0.5110
2.9905 6.0 2388 0.8498 0.2052 0.7948
1.2882 7.0 2786 0.6013 0.1659 0.8341
0.7483 8.0 3184 0.5015 0.1492 0.8508
0.5721 9.0 3582 0.4318 0.1383 0.8617
0.5721 10.0 3980 0.3960 0.1314 0.8686
0.4794 11.0 4378 0.3599 0.1275 0.8725
0.4192 12.0 4776 0.3427 0.1232 0.8768
0.3798 13.0 5174 0.3269 0.1207 0.8793
0.3458 14.0 5572 0.3206 0.1185 0.8815
0.3458 15.0 5970 0.3014 0.1163 0.8837
0.323 16.0 6368 0.2953 0.1151 0.8849
0.3044 17.0 6766 0.2860 0.1136 0.8864
0.2873 18.0 7164 0.2822 0.1120 0.8880
0.2748 19.0 7562 0.2821 0.1116 0.8884
0.2748 20.0 7960 0.2734 0.1104 0.8896
0.26 21.0 8358 0.2681 0.1094 0.8906
0.2508 22.0 8756 0.2735 0.1092 0.8908
0.2411 23.0 9154 0.2719 0.1090 0.8910
0.2335 24.0 9552 0.2719 0.1086 0.8914
0.2335 25.0 9950 0.2642 0.1081 0.8919
0.226 26.0 10348 0.2668 0.1079 0.8921
0.2199 27.0 10746 0.2577 0.1072 0.8928
0.2147 28.0 11144 0.2662 0.1076 0.8924
0.2084 29.0 11542 0.2638 0.1073 0.8927
0.2084 30.0 11940 0.2631 0.1068 0.8932
0.2032 31.0 12338 0.2658 0.1064 0.8936
0.1983 32.0 12736 0.2715 0.1069 0.8931
0.1981 33.0 13134 0.2639 0.1062 0.8938
0.1918 34.0 13532 0.2673 0.1061 0.8939
0.1918 35.0 13930 0.2585 0.1055 0.8945
0.187 36.0 14328 0.2705 0.1059 0.8941
0.1838 37.0 14726 0.2722 0.1060 0.8940
0.183 38.0 15124 0.2633 0.1053 0.8947
0.1804 39.0 15522 0.2724 0.1055 0.8945
0.1804 40.0 15920 0.2612 0.1048 0.8952
0.1786 41.0 16318 0.2659 0.1049 0.8951
0.1755 42.0 16716 0.2691 0.1049 0.8951
0.1746 43.0 17114 0.2664 0.1047 0.8953
0.1743 44.0 17512 0.2689 0.1047 0.8953
0.1743 45.0 17910 0.2651 0.1045 0.8955
0.1714 46.0 18308 0.2705 0.1047 0.8953
0.1682 47.0 18706 0.2667 0.1044 0.8956
0.1707 48.0 19104 0.2710 0.1046 0.8954
0.1674 49.0 19502 0.2687 0.1045 0.8955
0.1674 50.0 19900 0.2694 0.1045 0.8955
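
At inference time, a CTC model like this one emits per-frame logits over the phoneme vocabulary; predictions are obtained by taking the argmax per frame, collapsing consecutive repeats, and dropping the blank token. In practice the model's own `Wav2Vec2Processor`/tokenizer handles this via `batch_decode`; the sketch below shows the underlying greedy-decoding logic, with an illustrative vocabulary mapping and an assumed blank id of 0:

```python
def ctc_greedy_decode(logits, id2phoneme, blank_id=0):
    # Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank_id:
            out.append(id2phoneme[idx])
        prev = idx
    return out
```

Beam-search decoding (optionally with a phoneme language model) can lower the PER further, but the numbers in the table above come from the standard evaluation pipeline.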

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1