Transducer ASR Model for Uzbek Speech Recognition

Tags: speechbrain, Uzbek

This is an Automatic Speech Recognition (ASR) model for Uzbek, trained with the SpeechBrain framework. It is a transducer-based speech-to-text system trained on the Mozilla Common Voice dataset and implemented in PyTorch.
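A minimal inference sketch using SpeechBrain's pretrained interface is shown below. Whether `EncoderDecoderASR` is the right interface class for this transducer checkpoint, and the local save directory, are assumptions; adjust them to match how the repository actually packages its hyperparameters.

```python
# Minimal inference sketch (assumptions: the repo ships a SpeechBrain
# pretrained interface loadable via from_hparams; the interface class
# and savedir are placeholders, not confirmed by the model card).
from speechbrain.inference.ASR import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="openbank-uz/uzbek_asr_commonvoice_extended_transducer",
    savedir="pretrained_models/uzbek_transducer",
)

# Transcribe a single 16 kHz mono WAV file (path is a placeholder).
text = asr_model.transcribe_file("example_uz.wav")
print(text)
```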

Model Overview
• Architecture: Transducer-based ASR model
• Framework: SpeechBrain
• Dataset: Mozilla Common Voice
• Backend: PyTorch
• Optimizer: AdamW
• Learning Rate Scheduler: Custom LR decay (a minimal warmup/decay sketch follows this list)
• Training Steps: Progressive training over multiple epochs
• Evaluation Metrics: Character Error Rate (CER) & Word Error Rate (WER)
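The exact form of the custom LR decay is not documented here; the sketch below only reproduces the shape visible in the training logs (a short linear warmup followed by a small multiplicative decay each epoch) on top of AdamW. All constants are assumptions inferred from the logged learning rates, not taken from the actual recipe.

```python
# Sketch of a warmup-then-decay per-epoch LR schedule (assumption: the
# warmup length, decay factor, and peak LR are inferred from the logs).
import torch

model = torch.nn.Linear(80, 256)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=8.0e-4, weight_decay=1e-2)

def lr_factor(epoch, warmup_epochs=9, decay=0.99):
    """Linear warmup for `warmup_epochs`, then multiply by `decay` per epoch."""
    if epoch <= warmup_epochs:
        return epoch / warmup_epochs
    return decay ** (epoch - warmup_epochs)

scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: lr_factor(e + 1)
)

for epoch in range(74):
    # ... one epoch of training would run here ...
    scheduler.step()
```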

Training Details
• Dataset: Preprocessed Common Voice dataset
• Augmentations: SpecAugment, time masking, frequency masking (see the masking sketch after this list)
• Optimizer: AdamW with weight decay
• Loss Function: Transducer loss
• Batch Size: Adaptive batch sizing
• Validation Strategy: Evaluated on the validation split, tuned for CER and WER
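As a rough illustration of the SpecAugment-style augmentations listed above, the sketch below applies torchaudio's frequency and time masking to a batch of log-mel features. The mask widths are illustrative placeholders, not the values used in training.

```python
# SpecAugment-style masking with torchaudio transforms (assumption:
# mask parameters are illustrative, not the recipe's actual values).
import torch
import torchaudio

spec = torch.randn(1, 80, 300)          # (batch, n_mels, time) log-mel features
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=15)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=35)

augmented = time_mask(freq_mask(spec))  # zero out random frequency and time bands
print(augmented.shape)                  # shape unchanged: torch.Size([1, 80, 300])
```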

Performance
• Achieves competitive CER and WER on the Common Voice dataset
• Optimized for real-time transcription with efficient decoding (a toy greedy decoding sketch follows this list)
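The decoding strategy used at inference time is not documented here; the toy sketch below only shows the general shape of greedy transducer (RNN-T) decoding, with dummy encoder, prediction, and joint networks standing in for the trained model. It is not the repository's actual decoding code.

```python
# Toy greedy transducer (RNN-T) decoding sketch (assumption: all modules
# and dimensions are illustrative stand-ins for the trained networks).
import torch
import torch.nn as nn

vocab_size, blank_id, hidden = 32, 0, 64

encoder = nn.GRU(input_size=80, hidden_size=hidden, batch_first=True)
predictor = nn.Embedding(vocab_size, hidden)   # stateless toy prediction network
joiner = nn.Linear(hidden, vocab_size)         # joint network: add + project

@torch.no_grad()
def greedy_decode(features, max_symbols_per_step=3):
    """Emit tokens frame by frame; advance on blank, stay on non-blank."""
    enc_out, _ = encoder(features)             # (1, T, hidden)
    hyp = [blank_id]                           # blank used as initial "history"
    for t in range(enc_out.size(1)):
        for _ in range(max_symbols_per_step):
            pred = predictor(torch.tensor([hyp[-1]]))   # (1, hidden)
            logits = joiner(enc_out[:, t] + pred)       # (1, vocab_size)
            token = int(logits.argmax(dim=-1))
            if token == blank_id:
                break                          # move on to the next frame
            hyp.append(token)
    return hyp[1:]                             # drop the initial blank

feats = torch.randn(1, 50, 80)                 # (batch=1, frames, n_mels)
print(greedy_decode(feats))
```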

Final WER: 11.6
Final CER: 2.8
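Both figures are percentages. WER and CER are edit-distance error rates computed over words and characters respectively; a minimal self-contained computation (the example sentences are made up, and the reported figures come from the SpeechBrain evaluation, not this code) looks like this:

```python
# Minimal WER/CER computation via Levenshtein distance (illustrative only).
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def error_rate(refs, hyps, unit):
    """unit='word' for WER, unit='char' for CER, returned as a percentage."""
    split = (lambda s: s.split()) if unit == "word" else list
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * errors / total

refs = ["salom dunyo"]                 # made-up reference transcription
hyps = ["salom dunye"]                 # made-up model hypothesis
print(error_rate(refs, hyps, "word"))  # 50.0  (1 of 2 words wrong)
print(error_rate(refs, hyps, "char"))  # ~9.1  (1 of 11 characters wrong)
```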

Training logs:

Epoch Learning Rate Steps Optimizer Train Loss Valid Loss Valid CER (%) Valid WER (%)
1 9.36e-05 235 AdamW 2.07e+02 1.01 1.00e+02 1.00e+02
2 1.88e-04 470 AdamW 47.98 9.77e-01 100.00 1.00e+02
3 2.82e-04 705 AdamW 46.68 9.43e-01 100.00 1.00e+02
4 3.76e-04 940 AdamW 43.74 8.12e-01 87.40 98.27
5 4.70e-04 1175 AdamW 37.11 6.30e-01 70.35 93.21
6 5.64e-04 1410 AdamW 30.48 4.62e-01 52.13 84.80
7 6.58e-04 1645 AdamW 24.63 3.45e-01 40.06 75.67
8 7.52e-04 1880 AdamW 21.18 2.65e-01 30.52 65.92
9 7.97e-04 2115 AdamW 17.85 1.97e-01 27.03 58.36
10 7.91e-04 2350 AdamW 15.96 1.61e-01 20.28 50.84
11 7.84e-04 2585 AdamW 1.90e-01 1.60e-01 20.13 48.33
12 7.76e-04 2895 AdamW 2.45e-01 1.43e-01 22.89 48.99
13 7.68e-04 3205 AdamW 2.12e-01 1.23e-01 15.39 40.24
14 7.60e-04 3515 AdamW 1.93e-01 1.11e-01 14.20 37.37
15 7.52e-04 3825 AdamW 1.81e-01 9.70e-02 12.02 33.93
16 7.44e-04 4135 AdamW 1.68e-01 9.04e-02 12.19 32.99
17 7.36e-04 4445 AdamW 1.57e-01 8.13e-02 10.04 29.40
18 7.28e-04 4755 AdamW 1.50e-01 7.54e-02 8.75 27.30
19 7.21e-04 5065 AdamW 1.44e-01 7.02e-02 7.77 25.21
20 7.13e-04 5375 AdamW 1.38e-01 6.60e-02 7.21 24.01
21 7.06e-04 5685 AdamW 1.35e-01 6.43e-02 6.58 23.12
22 6.98e-04 5995 AdamW 1.29e-01 6.08e-02 6.65 22.36
23 6.91e-04 6305 AdamW 1.26e-01 5.85e-02 6.58 22.03
24 6.84e-04 6615 AdamW 1.23e-01 5.64e-02 5.92 20.81
25 6.77e-04 6925 AdamW 1.19e-01 5.41e-02 5.44 19.81
26 6.69e-04 7235 AdamW 1.16e-01 5.39e-02 5.66 20.01
27 6.62e-04 7545 AdamW 1.15e-01 5.18e-02 5.51 19.41
28 6.55e-04 7855 AdamW 1.12e-01 4.99e-02 5.08 18.58
29 6.49e-04 8165 AdamW 1.09e-01 4.80e-02 4.75 17.81
30 6.42e-04 8475 AdamW 1.06e-01 4.78e-02 4.73 17.67
31 6.35e-04 8785 AdamW 1.05e-01 4.72e-02 5.00 17.81
32 6.28e-04 9095 AdamW 1.04e-01 4.58e-02 4.64 17.31
33 6.22e-04 9405 AdamW 1.02e-01 4.50e-02 4.47 16.95
34 6.15e-04 9715 AdamW 1.01e-01 4.44e-02 4.65 16.79
35 6.09e-04 10025 AdamW 9.94e-02 4.37e-02 4.50 16.49
36 6.02e-04 10335 AdamW 9.78e-02 4.29e-02 4.37 16.32
37 5.96e-04 10645 AdamW 9.65e-02 4.18e-02 4.05 15.72
38 5.90e-04 10955 AdamW 9.54e-02 4.12e-02 4.01 15.47
39 5.84e-04 11265 AdamW 9.53e-02 4.10e-02 4.08 15.47
40 5.77e-04 11575 AdamW 9.28e-02 4.03e-02 4.00 15.21
41 5.71e-04 11885 AdamW 9.20e-02 3.92e-02 3.82 14.84
42 5.65e-04 12195 AdamW 9.02e-02 3.92e-02 3.88 14.93
43 5.59e-04 12505 AdamW 8.91e-02 3.85e-02 3.73 14.57
44 5.54e-04 12815 AdamW 8.88e-02 3.83e-02 3.66 14.44
45 5.48e-04 13125 AdamW 8.86e-02 3.77e-02 3.57 14.11
46 5.42e-04 13435 AdamW 8.67e-02 3.79e-02 3.61 14.21
47 5.36e-04 13745 AdamW 8.70e-02 3.70e-02 3.64 13.99
48 5.31e-04 14055 AdamW 8.51e-02 3.71e-02 3.50 13.97
49 5.25e-04 14365 AdamW 8.48e-02 3.65e-02 3.41 13.75
50 5.20e-04 14675 AdamW 8.45e-02 3.62e-02 3.32 13.63
51 5.14e-04 14985 AdamW 8.36e-02 3.58e-02 3.31 13.40
52 5.09e-04 15295 AdamW 8.28e-02 3.58e-02 3.28 13.39
53 5.03e-04 15605 AdamW 8.23e-02 3.52e-02 3.34 13.43
54 4.98e-04 15915 AdamW 8.18e-02 3.52e-02 3.28 13.23
55 4.93e-04 16225 AdamW 8.14e-02 3.47e-02 3.22 13.07
56 4.88e-04 16535 AdamW 8.10e-02 3.47e-02 3.25 13.14
57 4.83e-04 16845 AdamW 7.97e-02 3.43e-02 3.23 12.95
58 4.78e-04 17155 AdamW 7.88e-02 3.43e-02 3.13 12.92
59 4.73e-04 17465 AdamW 7.78e-02 3.37e-02 3.04 12.59
60 4.68e-04 17775 AdamW 7.75e-02 3.36e-02 3.11 12.74
61 4.63e-04 18085 AdamW 7.95e-02 3.34e-02 3.06 12.58
62 4.58e-04 18395 AdamW 7.79e-02 3.34e-02 3.11 12.69
63 4.53e-04 18705 AdamW 7.78e-02 3.31e-02 3.10 12.57
64 4.48e-04 19015 AdamW 7.67e-02 3.31e-02 3.01 12.41
65 4.44e-04 19325 AdamW 7.61e-02 3.26e-02 3.02 12.31
66 4.39e-04 19635 AdamW 7.62e-02 3.24e-02 2.99 12.30
67 4.34e-04 19945 AdamW 7.62e-02 3.22e-02 2.94 12.17
68 4.30e-04 20255 AdamW 7.44e-02 3.21e-02 2.86 12.03
69 4.25e-04 20565 AdamW 7.48e-02 3.23e-02 2.95 12.17
70 4.21e-04 20875 AdamW 7.40e-02 3.22e-02 2.93 12.09
71 4.16e-04 21185 AdamW 7.47e-02 3.17e-02 2.87 11.95
72 4.12e-04 21495 AdamW 7.29e-02 3.18e-02 2.86 11.87
73 4.08e-04 21805 AdamW 7.24e-02 3.13e-02 2.83 11.86
74 4.03e-04 22115 AdamW 7.20e-02 3.12e-02 2.80 11.67

Model ID: openbank-uz/uzbek_asr_commonvoice_extended_transducer (trained on Mozilla Common Voice)