| --- |
| language: |
| - id |
| base_model: |
| - openai/whisper-small |
| pipeline_tag: automatic-speech-recognition |
| datasets: |
| - mozilla-foundation/common_voice_23_0 |
| --- |
| |
| # Whisper Small Model – Indonesian ASR |
|
|
| ## Model Description |
| This model is a fine-tuned version of **openai/whisper-small** for **Automatic Speech Recognition (ASR)** in **Indonesian (id)**. |
| It transcribes Indonesian speech into text across a range of audio conditions. As the small variant, it trades some accuracy for lower compute and memory use than the medium and large Whisper models. |
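
A minimal usage sketch with the Hugging Face `transformers` ASR pipeline follows. The repository id `your-username/whisper-small-id` and the file name `sample.wav` are placeholders, not names taken from this card, so substitute the actual model id and audio path. The 30-second chunk length is likewise an assumption that matches Whisper's input window.

```python
# Hypothetical repo id: replace with the real id of this fine-tuned model.
MODEL_ID = "your-username/whisper-small-id"


def build_transcriber(model_id: str = MODEL_ID):
    """Create an ASR pipeline for the given Whisper checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: split long audio into 30 s chunks
    )


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized Indonesian text."""
    asr = build_transcriber(model_id)
    result = asr(
        audio_path,
        # Pin language and task so Whisper does not auto-detect or translate.
        generate_kwargs={"language": "indonesian", "task": "transcribe"},
    )
    return result["text"]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path to a local recording.
    print(transcribe("sample.wav"))
```

Building the pipeline once and reusing it is preferable when transcribing many files, since `transcribe` as written reloads the model on every call.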
|
|
| ## Intended Use |
| - Indonesian speech-to-text transcription |
| - Research and experimentation |
| - Educational and academic purposes |
| - Application development and benchmarking |
|
|
| This card covers the small variant. Other Whisper sizes (tiny, base, medium, large) differ in accuracy, speed, and hardware requirements, so choose the size that best matches your constraints and objectives. |
|
|
| ## Limitations |
| - Transcription quality depends on audio clarity, speaker accent, and background noise |
| - Smaller variants may produce higher error rates on long or complex audio |
| - Larger variants require significantly more compute and memory |
| - Outputs should be reviewed before use in critical or high-risk applications |
|
|
| ## Training Data |
| This model was fine-tuned using **Mozilla Common Voice v23.0 (Indonesian)**. |
| Common Voice is a publicly available, crowdsourced speech dataset that Mozilla releases under the CC0 public-domain dedication. |
| Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior. |
|
|
| ## Evaluation |
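| The model is typically evaluated using **Word Error Rate (WER)**: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of words in the reference. Lower is better. |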
| Evaluation results may vary depending on dataset, domain, audio conditions, and model size. |
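
To make the metric concrete, here is a small, dependency-free sketch of WER based on word-level edit distance. It is an illustration only, not the evaluation script used for this model. Lowercasing and whitespace tokenization are assumptions, and the function name `word_error_rate` is made up for this example. In practice a library such as `jiwer` is a common choice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    # Assumption: normalize by lowercasing and splitting on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is dropped, so the WER is 0.25.
    print(word_error_rate("saya suka makan nasi", "saya suka nasi"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.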
|
|
| ## Training Results |
| | Step | Training Loss | |
| |------|---------------| |
| | 100 | 0.897100 | |
| | 200 | 0.509400 | |
| | 300 | 0.234200 | |
| | 400 | 0.153100 | |
| | 500 | 0.068000 | |
| | 600 | 0.074100 | |
| | 700 | 0.029100 | |
| | 800 | 0.017800 | |
| | 900 | 0.013600 | |
| | 1000 | 0.007200 | |
| | 1100 | 0.004900 | |
| | 1200 | 0.003700 | |
| | 1300 | 0.001800 | |
| | 1400 | 0.001700 | |
| | 1500 | 0.001100 | |
| | 1550 | 0.001100 | |
|
|