whisper-tiny-id / README.md
metadata
language:
  - id
base_model:
  - openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
datasets:
  - mozilla-foundation/common_voice_23_0

Whisper Tiny Model – Indonesian ASR

Model Description

This model is a fine-tuned version of openai/whisper-tiny for Automatic Speech Recognition (ASR) in Indonesian (id).
It transcribes Indonesian speech to text. As the smallest Whisper variant, it favors speed and low memory use over accuracy, and output quality varies with audio conditions.

Intended Use

  • Indonesian speech-to-text transcription
  • Research and experimentation
  • Educational and academic purposes
  • Application development and benchmarking

Whisper variants (tiny, base, small, medium, large) differ in accuracy, speed, and hardware requirements; this card covers the tiny variant. Select the size that best matches your constraints and objectives.
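For the speech-to-text use case listed above, a minimal sketch using the Hugging Face `transformers` pipeline might look like the following. The repository id `Sparkplugx1904/whisper-tiny-id` is an assumption inferred from the page title and uploader name, and `sample.wav` is a placeholder path; neither is confirmed by this card.

```python
from typing import Any, Dict

# Assumed repository id (inferred from the page title and uploader name).
MODEL_ID = "Sparkplugx1904/whisper-tiny-id"


def build_asr_kwargs(model_id: str = MODEL_ID, chunk_length_s: int = 30) -> Dict[str, Any]:
    """Collect the arguments passed to transformers.pipeline()."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # Split long recordings into windows; 30 s is Whisper's native input length.
        "chunk_length_s": chunk_length_s,
        # Pin language and task so Whisper neither auto-detects nor translates.
        "generate_kwargs": {"language": "indonesian", "task": "transcribe"},
    }


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily: requires the transformers and torch packages,
    # plus ffmpeg for decoding audio files given by path.
    from transformers import pipeline

    asr = pipeline(**build_asr_kwargs(model_id))
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` downloads the checkpoint on first use and returns the transcript as a string. Chunking is included because long recordings exceed Whisper's 30-second input window.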

Limitations

  • Transcription quality depends on audio clarity, speaker accent, and background noise
  • Smaller variants may produce higher error rates on long or complex audio
  • Larger variants require significantly more compute and memory
  • Outputs should be reviewed before use in critical or high-risk applications

Training Data

This model was fine-tuned using Mozilla Common Voice v23.0 (Indonesian).
Common Voice is a publicly available, community-driven speech dataset released by Mozilla under the CC0 license.
Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.

Evaluation

The model is typically evaluated using Word Error Rate (WER).
Evaluation results may vary depending on dataset, domain, audio conditions, and model size.
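WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The sketch below is an illustrative pure-Python implementation, not the evaluation script used for this model; the function name and the simple lowercase-and-split normalization are assumptions.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    # Simplistic normalization (assumption): lowercase and split on whitespace.
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("saya suka makan nasi", "saya suka nasi"))  # 0.25
```

Reported WER depends heavily on text normalization (casing, punctuation, number formatting), so scores are only comparable when the same normalization is applied to both reference and hypothesis.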

Training Results

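| Step | Training Loss |
|------|---------------|
| 100  | 1.282900 |
| 200  | 0.682300 |
| 300  | 0.568900 |
| 400  | 0.487500 |
| 500  | 0.372700 |
| 600  | 0.375500 |
| 700  | 0.276200 |
| 800  | 0.226000 |
| 900  | 0.223800 |
| 1000 | 0.188600 |
| 1100 | 0.164300 |
| 1200 | 0.151400 |
| 1300 | 0.130000 |
| 1400 | 0.133900 |
| 1500 | 0.119700 |
| 1550 | 0.117300 |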
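To read the trend in the logged values, a short sketch such as the following can compute the overall relative reduction and flag steps where the loss rose compared with the previous log point. The helper names are hypothetical; the numbers are copied from the log above.

```python
from typing import Dict, List

# Training loss per logged step, copied from the results above.
LOSS_LOG: Dict[int, float] = {
    100: 1.2829, 200: 0.6823, 300: 0.5689, 400: 0.4875,
    500: 0.3727, 600: 0.3755, 700: 0.2762, 800: 0.2260,
    900: 0.2238, 1000: 0.1886, 1100: 0.1643, 1200: 0.1514,
    1300: 0.1300, 1400: 0.1339, 1500: 0.1197, 1550: 0.1173,
}


def relative_reduction(log: Dict[int, float]) -> float:
    """Fractional drop in loss between the first and last logged step."""
    steps = sorted(log)
    first, last = log[steps[0]], log[steps[-1]]
    return (first - last) / first


def steps_with_increase(log: Dict[int, float]) -> List[int]:
    """Steps whose loss is higher than at the preceding logged step."""
    steps = sorted(log)
    return [cur for prev, cur in zip(steps, steps[1:]) if log[cur] > log[prev]]


print(f"relative reduction: {relative_reduction(LOSS_LOG):.1%}")
print("steps with an increase:", steps_with_increase(LOSS_LOG))
```

On these values the loss falls by roughly 91% between step 100 and step 1550, with small upticks at steps 600 and 1400. Training loss alone does not indicate transcription accuracy on unseen audio.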