whisper-large-v3-ft-cv-cy-en

This model is a fine-tuned version of openai/whisper-large-v3 on the techiaith/commonvoice_18_0_cy_en dataset. Both the Welsh and English data were used to fine-tune the Whisper model, so that it transcribes both languages and detects the spoken language more reliably.

It achieves a success rate of 98.86% for language detection on recordings from a Common Voice bilingual test set.

On the same test set, it achieves the following word error rate (WER) results for transcription:

Welsh: 26.20
English: 15.37
Average: 20.70

N.B. the desired transcript language is not given to the fine-tuned model during testing.
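
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed as a percentage. It is an illustration only. The function name `word_error_rate` and the example sentences are hypothetical, and this card does not state which tool or text normalisation produced the scores above, so the sketch should not be read as the evaluation procedure actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER (as a percentage) of a hypothesis against a reference.

    Illustrative only: whitespace tokenisation, no text normalisation.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref_words)


# Hypothetical example: one substitution and one deletion in nine words.
reference = "mae hen wlad fy nhadau yn annwyl i mi"
hypothesis = "mae hen wlad fy nhad yn annwyl mi"
print(f"{word_error_rate(reference, hypothesis):.2f}")
```

A score over a whole test set is usually computed by summing the errors and the reference word counts across all recordings before dividing, rather than by averaging per-recording scores.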

Usage

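from transformers import pipeline

# Load the fine-tuned model as a speech recognition pipeline
transcriber = pipeline("automatic-speech-recognition", model="techiaith/whisper-large-v3-ft-cv-cy-en")

# Replace the placeholder with the path or URL of your own sound file
result = transcriber("<path or url to soundfile>")
print(result)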

{'text': 'Mae hen wlad fy nhadau yn annwyl i mi.'}

Model size: 2B params (Safetensors, tensor type F32)
