---
language:
- id
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_23_0
---

# Whisper Tiny Model – Indonesian ASR

## Model Description

This model is a fine-tuned version of **openai/whisper-tiny** for **Automatic Speech Recognition (ASR)** in **Indonesian (id)**.

It transcribes Indonesian speech into text across various audio conditions. Performance and resource usage depend on the selected model size.

## Intended Use

- Indonesian speech-to-text transcription
- Research and experimentation
- Educational and academic purposes
- Application development and benchmarking

Model variants (tiny, base, small, medium, large) differ in accuracy, speed, and hardware requirements. Users should select the size that best matches their constraints and objectives.

## Limitations

- Transcription quality depends on audio clarity, speaker accent, and background noise
- Smaller variants may produce higher error rates on long or complex audio
- Larger variants require significantly more compute and memory
- Outputs should be reviewed before use in critical or high-risk applications

## Training Data

This model was fine-tuned using **Mozilla Common Voice v23.0 (Indonesian)**.

Common Voice is a publicly available, community-driven speech dataset released by Mozilla under a permissive license. Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.

## Evaluation

The model is typically evaluated using **Word Error Rate (WER)**. Evaluation results may vary depending on dataset, domain, audio conditions, and model size.

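
As a rough illustration of the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below assumes lowercase, whitespace-based tokenization; this card does not specify the text normalization used, so the function name `word_error_rate` and the example sentences are illustrative only and are not results from this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: lowercase + whitespace tokenization. Real evaluations
    often apply a fuller text normalizer before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is missing from the hypothesis.
print(word_error_rate("saya suka makan nasi", "saya suka nasi"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Established packages such as `jiwer` compute the same quantity and are usually preferable for reported results.
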
## Training Results

| Step | Training Loss |
|------|---------------|
| 100  | 1.282900      |
| 200  | 0.682300      |
| 300  | 0.568900      |
| 400  | 0.487500      |
| 500  | 0.372700      |
| 600  | 0.375500      |
| 700  | 0.276200      |
| 800  | 0.226000      |
| 900  | 0.223800      |
| 1000 | 0.188600      |
| 1100 | 0.164300      |
| 1200 | 0.151400      |
| 1300 | 0.130000      |
| 1400 | 0.133900      |
| 1500 | 0.119700      |
| 1550 | 0.117300      |