whisper-small-canario_fono

This model is a fine-tuned version of openai/whisper-small on a phonological transcription dataset derived from the Islas Canarias portion of the COSER corpus. It achieves the following results on the evaluation set:

  • Loss: 1.4465
  • Wer: 104.4996
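A WER above 100% is possible because WER counts insertions as well as substitutions and deletions, normalized by the number of reference words. A minimal pure-Python sketch (the function name and example strings are illustrative, not from the evaluation set):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate as a percentage: (S + D + I) / N_ref * 100."""
    r, h = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return 100.0 * d[-1][-1] / len(r)

# Insertions alone can push WER past 100%:
print(wer("el gato", "el gato negro grande y"))  # 150.0
```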

Model description

The dataset used for this model is derived from the Islas Canarias portion of the COSER corpus: https://huggingface.co/datasets/johnatanebonilla/coser

This model is intended for experimental purposes: it explores the feasibility of using automatic speech recognition (ASR) systems, such as Whisper, to perform phonological transcription. It is not meant for production use, but rather serves as a research tool for investigating ASR-based phonological transcription.

A key limitation is that the time intervals in the COSER corpus are not systematically aligned, so there may not be a one-to-one correspondence between the audio and text data. This misalignment can introduce errors and inconsistencies into the transcriptions and limit the model's accuracy.

Another significant limitation is the size of the dataset: it is relatively small, and training a robust ASR system with limited data is inherently difficult. Even with careful curation and clean phonological transcriptions, the small amount of data constrains the model's overall performance.

Training and evaluation data

The data was split into 80% training, 10% validation, and 10% test. The training and validation portions were combined for fine-tuning, while the remaining 10% was held out exclusively to assess the model's generalization on previously unseen data.
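The split described above can be sketched as follows (index-based, using seed 42 as in the training setup; the helper name and example size are illustrative):

```python
import random

def split_indices(n_examples: int, seed: int = 42):
    """Shuffle indices and return (train+val, test) as a (80%+10%, 10%) split."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    n_train = int(0.8 * n_examples)
    n_val = int(0.1 * n_examples)
    train_val = idx[: n_train + n_val]  # 90% used for fine-tuning
    test = idx[n_train + n_val:]        # 10% held out for evaluation
    return train_val, test

train_val, test = split_indices(1000)
print(len(train_val), len(test))  # 900 100
```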

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP
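In the Hugging Face Trainer API, these settings correspond roughly to the following Seq2SeqTrainingArguments. This is a hedged reconstruction from the list above; output_dir is a placeholder, not a value from the original run:

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the listed hyperparameters; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-canario_fono",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    warmup_steps=500,
    max_steps=4000,
    lr_scheduler_type="linear",
    fp16=True,  # "Native AMP" mixed-precision training
)
```

The Adam betas (0.9, 0.999) and epsilon 1e-08 listed above match the Trainer's defaults, so they need not be set explicitly.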

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer      |
|---------------|-------|------|-----------------|----------|
| 0.1266        | 5.38  | 1000 | 0.9951          | 97.9842  |
| 0.0371        | 10.75 | 2000 | 1.2437          | 109.7012 |
| 0.0197        | 16.13 | 3000 | 1.3983          | 121.5263 |
| 0.013         | 21.51 | 4000 | 1.4465          | 104.4996 |

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0