2025-07-23:
CER:
| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 10.65 |
| mozilla-foundation/common_voice_17_0 | yue | test | 1.188 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 7.583 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 13.96 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 11.9775 |
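The tables report character error rate (CER). For reference, here is a minimal pure-Python sketch of the standard definition, CER = (S + D + I) / N: character substitutions, deletions and insertions, summed over the whole test set and divided by the total number of reference characters. The text normalization used to produce the numbers above (punctuation, casing, spacing, script conversion) is not stated, so this sketch applies none and is not guaranteed to reproduce them. The example strings are made up for illustration.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Hypothetical mixed Cantonese/English example (no normalization applied).
refs = ["我哋去食飯", "hello world"]
hyps = ["我地去食飯", "hello word"]
print(f"CER: {cer(refs, hyps) * 100:.2f}%")  # CER: 12.50%
```

Here one substitution (哋 → 地) and one deletion ("l") give 2 edits over 16 reference characters, i.e. 12.50%.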
2025-07-04:
CER:
| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 11.39 |
| mozilla-foundation/common_voice_17_0 | yue | test | |
| mozilla-foundation/common_voice_16_1 | yue | test | 12.2 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | |
per_device_train_batch_size=96,
learning_rate=1e-6,
CER: 15.4%
transformers-4.46.3
Train Args:
per_device_train_batch_size=32,
gradient_accumulation_steps=1,
learning_rate=1e-5,
gradient_checkpointing=True,
per_device_eval_batch_size=64,
generation_max_length=225,
Hardware:
4 × NVIDIA Tesla V100 16GB
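The train args and hardware together determine the effective (global) batch size per optimizer step: per-device batch size × gradient accumulation steps × number of GPUs. The sketch below works this out, assuming all four V100s were used for data-parallel training (the card does not say how the job was launched). The dict keys are the field names of `Seq2SeqTrainingArguments` in Hugging Face Transformers, so the dict could be unpacked into it together with an `output_dir` of your choosing.

```python
# Values copied from the "Train Args" and "Hardware" lists above.
train_args = {
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "per_device_eval_batch_size": 64,
    "generation_max_length": 225,
}
NUM_GPUS = 4  # assumption: all 4 x V100 16GB used in data-parallel training


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of training samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


global_batch = effective_batch_size(
    train_args["per_device_train_batch_size"],
    train_args["gradient_accumulation_steps"],
    NUM_GPUS,
)
print(global_batch)  # 128
```

With gradient accumulation at 1, the 4 GPUs alone multiply the per-device batch of 32 up to 128 samples per update.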
An example real-time streaming application built on this model:
https://github.com/JackyHoCL/whisper-realtime.git
FAQ:
- If you encounter tokenizer issues during inference, update transformers to version 4.49.0 or later:
  `pip install --upgrade transformers`
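A small sketch for checking the installed transformers version against the 4.49.0 minimum before loading the tokenizer. It uses only the standard library (`importlib.metadata`); the simplified parser `release_tuple` is a hypothetical helper of this sketch and ignores pre-release suffixes (so `4.49.0rc1` counts as `4.49.0`). If the `packaging` library is available, `packaging.version.Version` is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.49.0"  # minimum version recommended in the FAQ


def release_tuple(v: str, width: int = 3) -> tuple[int, ...]:
    """Leading numeric release segment, zero-padded: "4.50.0.dev0" -> (4, 50, 0).

    Simplification: pre-release/dev suffixes are ignored.
    """
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    parts = [int(p) for p in m.group(0).split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = version("transformers")
except PackageNotFoundError:
    print("transformers is not installed")
else:
    if meets_minimum(installed):
        print(f"transformers {installed} meets the >= {MIN_TRANSFORMERS} requirement")
    else:
        print(f"transformers {installed} is older than {MIN_TRANSFORMERS}; "
              "run: pip install --upgrade transformers")
```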