
SpeechBrain CommonVoice Uzbek ASR Model

This README provides an overview of an Automatic Speech Recognition (ASR) model for Uzbek, trained with the SpeechBrain CommonVoice ASR transformer recipe. The model is saved under the checkpoint directory CKPT+2025-02-06+01-41-55+00.

Model Overview

  • Language: Uzbek
  • Training Dataset: CommonVoice
  • Model Architecture: Transformer
  • Training Framework: SpeechBrain
  • Epochs Trained: 50
  • Final Training Step: 14650

Training Details

The model was trained over 50 epochs with the following key metrics:

  • Final Training Loss: 11.06
  • Final Validation Loss: 3.69
  • Final Validation Accuracy: 96.5%
  • Final Validation WER (Word Error Rate): 7.72%
  • Final Validation CER (Character Error Rate): 1.87%

Note: Per-epoch metrics are listed in the Epoch Logs section at the end of this README; please refer to the full training log for additional details.
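WER and CER are both edit-distance metrics: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length (in words or characters). A minimal, generic sketch of the computation (not SpeechBrain's internal implementation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def wer(ref, hyp):
    """Word Error Rate in percent: word-level edit distance / reference word count."""
    r, h = ref.split(), hyp.split()
    return 100.0 * edit_distance(r, h) / len(r)

def cer(ref, hyp):
    """Character Error Rate in percent, computed over characters."""
    return 100.0 * edit_distance(list(ref), list(hyp)) / len(ref)
```

For example, a hypothesis with one wrong word out of four gives a WER of 25%.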

Learning Rate Schedule

The learning rate followed a warmup-then-decay schedule: it started at 1.24e-04, ramped up to a peak of 7.68e-04 around epoch 7 (step 2170), and then decayed steadily to 2.96e-04 by the final epoch.
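The per-epoch values in the Epoch Logs section are consistent with a Noam-style schedule: linear warmup followed by inverse-square-root decay. A minimal sketch, assuming a peak of 8.00e-04 and roughly 2000 warmup steps (inferred from the logged values, not read from the recipe config):

```python
def noam_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Noam-style learning-rate schedule: linear warmup to peak_lr
    over warmup_steps, then inverse-square-root decay.
    peak_lr and warmup_steps here are inferred from the epoch log."""
    step = max(step, 1)
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return peak_lr * (warmup_steps ** 0.5) * scale
```

With peak_lr=8.00e-04 and warmup_steps=2000, this closely reproduces the logged rates: 1.24e-04 at step 310 (epoch 1), 5.43e-04 at step 4340 (epoch 14), and 2.96e-04 at step 14650 (epoch 50).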

Testing Results

After training, the model was evaluated on a test set with the following results:

  • Test Loss: 3.34
  • Test Accuracy: 96.5%
  • Test WER: 7.29%
  • Test CER: 1.71%

Checkpoints

The model checkpoints are stored in the directory save/CKPT+2025-02-06+01-41-55+00/. Key files include:

  • CKPT.yaml: Contains metadata about the training process.
  • brain.ckpt: Stores the average training loss and optimizer state.
  • counter.ckpt: Indicates the number of epochs completed.
  • dataloader-TRAIN.ckpt: Stores the state of the data loader.
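SpeechBrain names checkpoint directories by timestamp (CKPT+YYYY-MM-DD+HH-MM-SS+suffix), so timestamped names sort chronologically as plain strings. A small helper built only on that naming convention (latest_checkpoint is an illustrative name, not a SpeechBrain API):

```python
from pathlib import Path

def latest_checkpoint(save_dir: str):
    """Return the newest CKPT+... directory under save_dir, or None.
    SpeechBrain's timestamped names sort chronologically as strings."""
    ckpts = sorted(
        p for p in Path(save_dir).iterdir()
        if p.is_dir() and p.name.startswith("CKPT+")
    )
    return ckpts[-1] if ckpts else None
```

For this repository, latest_checkpoint("save") would resolve to save/CKPT+2025-02-06+01-41-55+00.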

Usage

To use this model for inference, load the checkpoint files and initialize the SpeechBrain ASR pipeline. Ensure that the environment is set up with the necessary dependencies as specified by SpeechBrain.
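A minimal sketch of inference with SpeechBrain's high-level API, assuming this checkpoint is packaged as a standard EncoderDecoderASR model with a hyperparams.yaml in the repository (the repository id is this model's Hugging Face id; example.wav is a placeholder):

```python
# Sketch: loading the model via SpeechBrain's pretrained-model interface.
# Requires `pip install speechbrain` and access to the (gated) repository.
from speechbrain.inference.ASR import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="openbank-uz/commonvoice-extended-uzbek-transformers",
    savedir="pretrained_models/uzbek-asr",
)
text = asr.transcribe_file("example.wav")
print(text)
```

On older SpeechBrain versions the same class lives under speechbrain.pretrained instead of speechbrain.inference.ASR.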

Acknowledgments

This model was developed using the SpeechBrain toolkit, which provides a comprehensive framework for speech processing tasks.

License

This project is licensed under the terms of the MIT License.

Epoch Logs

Validation WER and CER were computed every tenth epoch; blank cells mean the metric was not logged for that epoch.

| Epoch | Learning Rate | Steps | Train Loss | Valid Loss | Valid ACC | Valid WER | Valid CER |
|-------|---------------|-------|------------|------------|-----------|-----------|-----------|
| 1 | 1.24e-04 | 310 | 1.07e+02 | 84.10 | 1.69e-01 | | |
| 2 | 2.48e-04 | 620 | 79.84 | 75.19 | 2.61e-01 | | |
| 3 | 3.72e-04 | 930 | 68.47 | 54.76 | 4.20e-01 | | |
| 4 | 4.96e-04 | 1240 | 41.90 | 25.68 | 7.56e-01 | | |
| 5 | 6.20e-04 | 1550 | 21.16 | 14.90 | 8.68e-01 | | |
| 6 | 7.44e-04 | 1860 | 14.25 | 11.19 | 8.99e-01 | | |
| 7 | 7.68e-04 | 2170 | 10.92 | 8.94 | 9.16e-01 | | |
| 8 | 7.19e-04 | 2480 | 8.66 | 7.69 | 9.27e-01 | | |
| 9 | 6.77e-04 | 2790 | 6.97 | 6.75 | 9.34e-01 | | |
| 10 | 6.43e-04 | 3100 | 6.11 | 6.17 | 9.40e-01 | 13.19 | 3.43 |
| 11 | 6.13e-04 | 3410 | 5.11 | 5.76 | 9.44e-01 | | |
| 12 | 5.87e-04 | 3720 | 4.40 | 5.56 | 9.46e-01 | | |
| 13 | 5.64e-04 | 4030 | 3.94 | 5.38 | 9.47e-01 | | |
| 14 | 5.43e-04 | 4340 | 3.45 | 5.24 | 9.48e-01 | | |
| 15 | 5.25e-04 | 4650 | 3.14 | 5.26 | 9.48e-01 | | |
| 16 | 5.08e-04 | 4960 | 2.89 | 5.16 | 9.49e-01 | | |
| 17 | 4.93e-04 | 5270 | 2.62 | 5.10 | 9.50e-01 | | |
| 18 | 4.79e-04 | 5580 | 2.36 | 5.14 | 9.50e-01 | | |
| 19 | 4.66e-04 | 5890 | 2.16 | 5.12 | 9.50e-01 | | |
| 20 | 4.54e-04 | 6200 | 2.05 | 5.12 | 9.51e-01 | 10.57 | 2.63 |
| 21 | 4.43e-04 | 6510 | 1.88 | 5.15 | 9.50e-01 | | |
| 22 | 4.33e-04 | 6820 | 1.75 | 5.15 | 9.51e-01 | | |
| 23 | 4.24e-04 | 7130 | 1.68 | 5.22 | 9.51e-01 | | |
| 24 | 4.15e-04 | 7440 | 1.54 | 5.21 | 9.51e-01 | | |
| 25 | 4.06e-04 | 7750 | 1.48 | 5.29 | 9.51e-01 | | |
| 26 | 3.99e-04 | 8060 | 7.68 | 5.26 | 9.48e-01 | | |
| 27 | 3.91e-04 | 8370 | 21.69 | 4.80 | 9.53e-01 | | |
| 28 | 3.84e-04 | 8680 | 18.35 | 4.50 | 9.56e-01 | | |
| 29 | 3.77e-04 | 8990 | 16.80 | 4.47 | 9.56e-01 | | |
| 30 | 3.71e-04 | 9300 | 15.83 | 4.36 | 9.57e-01 | 9.43 | 2.34 |
| 31 | 3.65e-04 | 9610 | 15.09 | 4.28 | 9.58e-01 | | |
| 32 | 3.59e-04 | 9920 | 14.79 | 4.19 | 9.59e-01 | | |
| 33 | 3.58e-04 | 10000 | 14.69 | 4.25 | 9.59e-01 | | |
| 34 | 8.00e-04 | 10000 | 0.00e+00 | 4.25 | 9.59e-01 | | |
| 35 | 8.00e-04 | 10000 | 0.00e+00 | 4.25 | 9.59e-01 | | |
| 36 | 3.52e-04 | 10310 | 14.02 | 4.16 | 9.60e-01 | | |
| 37 | 3.47e-04 | 10620 | 13.78 | 4.09 | 9.60e-01 | | |
| 38 | 3.42e-04 | 10930 | 13.37 | 4.02 | 9.61e-01 | | |
| 39 | 3.37e-04 | 11240 | 13.07 | 3.98 | 9.62e-01 | | |
| 40 | 3.33e-04 | 11550 | 12.90 | 3.94 | 9.62e-01 | 8.40 | 2.06 |
| 41 | 3.29e-04 | 11860 | 12.61 | 3.90 | 9.62e-01 | | |
| 42 | 3.24e-04 | 12170 | 12.41 | 3.85 | 9.63e-01 | | |
| 43 | 3.20e-04 | 12480 | 12.05 | 3.86 | 9.63e-01 | | |
| 44 | 3.16e-04 | 12790 | 11.93 | 3.79 | 9.63e-01 | | |
| 45 | 3.13e-04 | 13100 | 11.59 | 3.77 | 9.64e-01 | | |
| 46 | 3.09e-04 | 13410 | 11.48 | 3.74 | 9.64e-01 | | |
| 47 | 3.05e-04 | 13720 | 11.35 | 3.74 | 9.64e-01 | | |
| 48 | 3.02e-04 | 14030 | 11.14 | 3.70 | 9.64e-01 | | |
| 49 | 2.99e-04 | 14340 | 11.05 | 3.73 | 9.64e-01 | | |
| 50 | 2.96e-04 | 14650 | 11.06 | 3.69 | 9.65e-01 | 7.72 | 1.87 |
Datasets used to train openbank-uz/commonvoice-extended-uzbek-transformers