Automatic Speech Recognition
Transformers
Safetensors
whisper

2025-07-23:
CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 10.65 |
| mozilla-foundation/common_voice_17_0 | yue | test | 1.188 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 7.583 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 13.96 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 11.9775 |
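The CER (character error rate) figures above are percentages. This card does not say how they were computed, so the following is only a minimal sketch of the usual definition: character-level edit distance divided by reference length. The function names are hypothetical, and no text normalization (punctuation, spacing, case) is applied, which may differ from the evaluation actually used.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def cer_percent(reference: str, hypothesis: str) -> float:
    """CER in percent: 100 * edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of five reference characters.
print(cer_percent("我哋去食飯", "我地去食飯"))  # 20.0
```

A corpus-level score is normally the total edit distance over all utterances divided by the total reference length, not the mean of per-utterance scores.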

2025-07-04:
CER:

| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 11.39 |
| mozilla-foundation/common_voice_17_0 | yue | test | |
| mozilla-foundation/common_voice_16_1 | yue | test | 12.2 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | |

per_device_train_batch_size=96,
learning_rate=1e-6,


CER: 15.4%

transformers-4.46.3

Train Args:
per_device_train_batch_size=32,
gradient_accumulation_steps=1,
learning_rate=1e-5,
gradient_checkpointing=True,
per_device_eval_batch_size=64,
generation_max_length=225,

Hardware:
4 x NVIDIA Tesla V100 (16 GB each)
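As a rough worked example, the Train Args and Hardware entries above can be combined to estimate the effective (global) batch size per optimizer step. This sketch assumes all four GPUs were used for data-parallel training, which the card does not state explicitly. The key names mirror the `Seq2SeqTrainingArguments` parameters in transformers, but the dictionary and helper function are illustrative only.

```python
# Values copied from the Train Args section of this card.
train_args = {
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "per_device_eval_batch_size": 64,
    "generation_max_length": 225,
}

# Assumption: every GPU listed under Hardware took part in training.
NUM_GPUS = 4


def effective_batch_size(args: dict, num_devices: int) -> int:
    """Samples consumed per optimizer step under data parallelism."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(train_args, NUM_GPUS))  # 128
```

If fewer GPUs are available, raising `gradient_accumulation_steps` is one common way to keep the effective batch size unchanged.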

An example real-time streaming application built on this model:
https://github.com/JackyHoCL/whisper-realtime.git
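For offline (non-streaming) transcription, a minimal sketch using the transformers `pipeline` API might look like the following. It assumes the checkpoint loads through the standard automatic-speech-recognition pipeline and that ffmpeg is available for decoding audio files. The `transcribe` helper is hypothetical and is not taken from the linked repository.

```python
MODEL_ID = "JackyHoCL/whisper-small-cantonese-yue-english"


def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file and return the recognized text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use (network access required).
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` (the file name is a placeholder) would return the transcript as a string. For repeated calls it is better to build the pipeline once and reuse it.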

FAQ:

  1. If you run into a tokenizer issue during inference, upgrade transformers to version 4.49.0 or later:
     pip install --upgrade transformers
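To check whether an environment already satisfies the version requirement in the FAQ, a small standard-library sketch such as the one below may help. The helper names are hypothetical and the version parsing is deliberately simple: it compares only the numeric major, minor, and patch fields and ignores suffixes such as `.dev0`.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '4.49.0' or '4.50.0.dev0' into a (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.49.0") -> bool:
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def transformers_is_recent_enough() -> bool:
    """Check the locally installed transformers package, if there is one."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False


print(meets_minimum("4.46.3"))  # False
print(meets_minimum("4.49.0"))  # True
```

Note that 4.46.3, the version listed earlier in this card, falls below the 4.49.0 threshold, so an environment pinned to it would need the upgrade before inference.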

Model size:
0.2B params (Safetensors)

Tensor type:
F32