huni0304/Whisper-small-koja

This model is a fine-tuned version of openai/whisper-base on Korean and Japanese speech data (see "Training and evaluation data" below). It achieves the following results on the evaluation set:

  • Loss: 0.2226
  • Wer: 17.5001

Model description

This model is whisper-small fine-tuned on Korean and Japanese data.
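
Assuming the checkpoint works with the standard transformers automatic-speech-recognition pipeline, a minimal usage sketch might look like the following (the audio file path is a placeholder):

```python
# Minimal inference sketch; "sample_ko.wav" is a placeholder audio file path.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="huni0304/Whisper-small-koja",
)

result = asr("sample_ko.wav")
print(result["text"])
```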

Training and evaluation data

  • Japanese data: mozilla-foundation/common_voice_11_0
  • Korean data: kresnik/zeroth_korean
  • Total Korean data: 22,263 utterances
  • Total Japanese data: 16,740 utterances
  • Training data: 37,053 utterances
  • Evaluation data: 1,950 utterances
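
A sketch of how these two corpora might be loaded and combined with the datasets library is shown below; the split and column names are assumptions, and Common Voice 11.0 requires accepting its terms on the Hugging Face Hub:

```python
# Assumed data-preparation sketch, not taken from the model card.
from datasets import Audio, concatenate_datasets, load_dataset

# Japanese speech: Common Voice 11.0 (Japanese subset).
cv_ja = load_dataset("mozilla-foundation/common_voice_11_0", "ja", split="train")
# Korean speech: Zeroth-Korean.
zeroth_ko = load_dataset("kresnik/zeroth_korean", split="train")

# Whisper expects 16 kHz audio, so resample both corpora.
cv_ja = cv_ja.cast_column("audio", Audio(sampling_rate=16_000))
zeroth_ko = zeroth_ko.cast_column("audio", Audio(sampling_rate=16_000))

# Keep matching columns before concatenation (column names are assumptions).
cv_ja = cv_ja.select_columns(["audio", "sentence"]).rename_column("sentence", "text")
zeroth_ko = zeroth_ko.select_columns(["audio", "text"])

train_data = concatenate_datasets([cv_ja, zeroth_ko]).shuffle(seed=42)
```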

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 6000
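
These settings map onto transformers' Seq2SeqTrainingArguments roughly as sketched below, assuming the standard Seq2SeqTrainer fine-tuning recipe for Whisper (the output path and mixed-precision flag are assumptions, not stated in the card):

```python
# Sketch of the hyperparameters above as Seq2SeqTrainingArguments (assumed recipe).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-koja",  # placeholder output directory
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=6000,
    eval_strategy="steps",              # the table below reports metrics every 1000 steps
    eval_steps=1000,
    predict_with_generate=True,         # generate transcripts during evaluation for WER
    fp16=True,                          # assumption; precision is not stated in the card
)
```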

Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer     |
|---------------|--------|------|-----------------|---------|
| 0.3482        | 0.4318 | 1000 | 0.3453          | 29.8426 |
| 0.289         | 0.8636 | 2000 | 0.2821          | 24.2086 |
| 0.1821        | 1.2953 | 3000 | 0.2548          | 20.7527 |
| 0.2005        | 1.7271 | 4000 | 0.2350          | 18.7315 |
| 0.135         | 2.1589 | 5000 | 0.2265          | 17.4421 |
| 0.1273        | 2.5907 | 6000 | 0.2226          | 17.5001 |
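
The Wer column above is word error rate as a percentage; a minimal sketch of how it is typically computed with the evaluate library (toy transcripts, assumed metric setup):

```python
# WER computation sketch with the evaluate library (toy example).
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["안녕하세요 반갑습니다"]  # model transcriptions (placeholder)
references = ["안녕하세요 반갑습니다"]   # reference transcripts (placeholder)

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```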

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.7.0+cu126
  • Datasets 2.16.0
  • Tokenizers 0.21.1