wav2vec2-base-en-phoneme-ctc-41h
This model is a fine-tuned version of facebook/wav2vec2-base on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.2694
- Per (phoneme error rate): 0.1045
- Phoneme Accuracy: 0.8955
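The card does not say how Per was computed. In every reported row, Phoneme Accuracy equals 1 - Per, which is consistent with a phoneme error rate defined as edit distance divided by reference length. The sketch below illustrates that definition under this assumption; the function names and the toy phoneme sequences are hypothetical and are not taken from the training code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total reference length (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy data (hypothetical): one substitution and one deletion.
refs = [["HH", "AH", "L", "OW"], ["K", "AE", "T"]]
hyps = [["HH", "AH", "L", "UW"], ["K", "AE"]]

per = phoneme_error_rate(refs, hyps)  # 2 errors over 7 reference phonemes
accuracy = 1.0 - per                  # mirrors the Per / Phoneme Accuracy pairing
print(f"PER={per:.4f}  accuracy={accuracy:.4f}")
```

Under this definition the error rate can exceed 1 when the hypothesis contains many insertions, in which case the complementary accuracy becomes negative.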
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 48
- eval_batch_size: 48
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 50
- mixed_precision_training: Native AMP
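To make the schedule concrete, here is a minimal pure-Python sketch of a linear schedule with warmup using the values listed above. It assumes the total number of optimizer steps is 19900, the final step reported in the training results table, and that the rate rises linearly from zero during warmup and then decays linearly to zero. The function name `linear_lr` is hypothetical.

```python
BASE_LR = 1e-05      # learning_rate
WARMUP_STEPS = 300   # lr_scheduler_warmup_steps
TOTAL_STEPS = 19900  # assumption: final step in the training results table


def linear_lr(step):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: ramp from BASE_LR at the end of warmup down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 150, 300, 9950, 19900):
    print(step, linear_lr(step))
```

With these values the warmup covers less than one epoch (300 of 398 steps), so the rate peaks before the first evaluation.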
Training results
| Training Loss | Epoch | Step | Validation Loss | Per | Phoneme Accuracy |
|---|---|---|---|---|---|
| No log | 1.0 | 398 | 3.8986 | 0.9980 | 0.0020 |
| 6.7922 | 2.0 | 796 | 3.5634 | 0.9980 | 0.0020 |
| 3.5948 | 3.0 | 1194 | 3.5295 | 0.9981 | 0.0019 |
| 3.5277 | 4.0 | 1592 | 3.4807 | 0.9981 | 0.0019 |
| 3.5277 | 5.0 | 1990 | 1.8312 | 0.4890 | 0.5110 |
| 2.9905 | 6.0 | 2388 | 0.8498 | 0.2052 | 0.7948 |
| 1.2882 | 7.0 | 2786 | 0.6013 | 0.1659 | 0.8341 |
| 0.7483 | 8.0 | 3184 | 0.5015 | 0.1492 | 0.8508 |
| 0.5721 | 9.0 | 3582 | 0.4318 | 0.1383 | 0.8617 |
| 0.5721 | 10.0 | 3980 | 0.3960 | 0.1314 | 0.8686 |
| 0.4794 | 11.0 | 4378 | 0.3599 | 0.1275 | 0.8725 |
| 0.4192 | 12.0 | 4776 | 0.3427 | 0.1232 | 0.8768 |
| 0.3798 | 13.0 | 5174 | 0.3269 | 0.1207 | 0.8793 |
| 0.3458 | 14.0 | 5572 | 0.3206 | 0.1185 | 0.8815 |
| 0.3458 | 15.0 | 5970 | 0.3014 | 0.1163 | 0.8837 |
| 0.323 | 16.0 | 6368 | 0.2953 | 0.1151 | 0.8849 |
| 0.3044 | 17.0 | 6766 | 0.2860 | 0.1136 | 0.8864 |
| 0.2873 | 18.0 | 7164 | 0.2822 | 0.1120 | 0.8880 |
| 0.2748 | 19.0 | 7562 | 0.2821 | 0.1116 | 0.8884 |
| 0.2748 | 20.0 | 7960 | 0.2734 | 0.1104 | 0.8896 |
| 0.26 | 21.0 | 8358 | 0.2681 | 0.1094 | 0.8906 |
| 0.2508 | 22.0 | 8756 | 0.2735 | 0.1092 | 0.8908 |
| 0.2411 | 23.0 | 9154 | 0.2719 | 0.1090 | 0.8910 |
| 0.2335 | 24.0 | 9552 | 0.2719 | 0.1086 | 0.8914 |
| 0.2335 | 25.0 | 9950 | 0.2642 | 0.1081 | 0.8919 |
| 0.226 | 26.0 | 10348 | 0.2668 | 0.1079 | 0.8921 |
| 0.2199 | 27.0 | 10746 | 0.2577 | 0.1072 | 0.8928 |
| 0.2147 | 28.0 | 11144 | 0.2662 | 0.1076 | 0.8924 |
| 0.2084 | 29.0 | 11542 | 0.2638 | 0.1073 | 0.8927 |
| 0.2084 | 30.0 | 11940 | 0.2631 | 0.1068 | 0.8932 |
| 0.2032 | 31.0 | 12338 | 0.2658 | 0.1064 | 0.8936 |
| 0.1983 | 32.0 | 12736 | 0.2715 | 0.1069 | 0.8931 |
| 0.1981 | 33.0 | 13134 | 0.2639 | 0.1062 | 0.8938 |
| 0.1918 | 34.0 | 13532 | 0.2673 | 0.1061 | 0.8939 |
| 0.1918 | 35.0 | 13930 | 0.2585 | 0.1055 | 0.8945 |
| 0.187 | 36.0 | 14328 | 0.2705 | 0.1059 | 0.8941 |
| 0.1838 | 37.0 | 14726 | 0.2722 | 0.1060 | 0.8940 |
| 0.183 | 38.0 | 15124 | 0.2633 | 0.1053 | 0.8947 |
| 0.1804 | 39.0 | 15522 | 0.2724 | 0.1055 | 0.8945 |
| 0.1804 | 40.0 | 15920 | 0.2612 | 0.1048 | 0.8952 |
| 0.1786 | 41.0 | 16318 | 0.2659 | 0.1049 | 0.8951 |
| 0.1755 | 42.0 | 16716 | 0.2691 | 0.1049 | 0.8951 |
| 0.1746 | 43.0 | 17114 | 0.2664 | 0.1047 | 0.8953 |
| 0.1743 | 44.0 | 17512 | 0.2689 | 0.1047 | 0.8953 |
| 0.1743 | 45.0 | 17910 | 0.2651 | 0.1045 | 0.8955 |
| 0.1714 | 46.0 | 18308 | 0.2705 | 0.1047 | 0.8953 |
| 0.1682 | 47.0 | 18706 | 0.2667 | 0.1044 | 0.8956 |
| 0.1707 | 48.0 | 19104 | 0.2710 | 0.1046 | 0.8954 |
| 0.1674 | 49.0 | 19502 | 0.2687 | 0.1045 | 0.8955 |
| 0.1674 | 50.0 | 19900 | 0.2694 | 0.1045 | 0.8955 |
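The headline metrics at the top of the card match the epoch 50 row, but that row is not the best on either metric. The sketch below parses a handful of rows copied from the table and picks the best epoch by validation loss and by Per. The helper `parse_rows` is hypothetical, and the card does not state which checkpoint was actually published.

```python
# A subset of rows copied verbatim from the training results table.
TABLE = """\
| No log | 1.0 | 398 | 3.8986 | 0.9980 | 0.0020 |
| 0.26 | 21.0 | 8358 | 0.2681 | 0.1094 | 0.8906 |
| 0.2199 | 27.0 | 10746 | 0.2577 | 0.1072 | 0.8928 |
| 0.1918 | 35.0 | 13930 | 0.2585 | 0.1055 | 0.8945 |
| 0.1804 | 40.0 | 15920 | 0.2612 | 0.1048 | 0.8952 |
| 0.1682 | 47.0 | 18706 | 0.2667 | 0.1044 | 0.8956 |
| 0.1674 | 50.0 | 19900 | 0.2694 | 0.1045 | 0.8955 |
"""


def parse_rows(table_text):
    """Turn Markdown table rows into dicts; the training-loss cell is ignored."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        _train_loss, epoch, step, val_loss, per, _acc = cells
        rows.append({
            "epoch": int(float(epoch)),
            "step": int(step),
            "val_loss": float(val_loss),
            "per": float(per),
        })
    return rows


rows = parse_rows(TABLE)
best_loss = min(rows, key=lambda r: r["val_loss"])
best_per = min(rows, key=lambda r: r["per"])
print("lowest validation loss:", best_loss)
print("lowest Per:", best_per)
```

Across the full table, validation loss is lowest at epoch 27 (0.2577) and Per is lowest at epoch 47 (0.1044). After roughly epoch 27 the validation loss fluctuates between about 0.26 and 0.27 while the training loss keeps falling.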
Framework versions
- Transformers 4.57.1
- Pytorch 2.9.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
Model tree for bobboyms/wav2vec2-base-en-phoneme-ctc-41h
Base model
facebook/wav2vec2-base