---
language:
- id
base_model:
- openai/whisper-base
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_23_0
---

# Whisper Base Model – Indonesian ASR

## Model Description

This model is a fine-tuned version of **openai/whisper-base** for **Automatic Speech Recognition (ASR)** in **Indonesian (id)**.

It supports transcription of Indonesian speech into text across various audio conditions, with performance and resource usage depending on the selected model size.

## Intended Use

- Indonesian speech-to-text transcription
- Research and experimentation
- Educational and academic purposes
- Application development and benchmarking

Model variants (tiny, base, small, medium, large) differ in accuracy, speed, and hardware requirements. Users should select the size that best matches their constraints and objectives.

## Limitations

- Transcription quality depends on audio clarity, speaker accent, and background noise
- Smaller variants may produce higher error rates on long or complex audio
- Larger variants require significantly more compute and memory
- Outputs should be reviewed before use in critical or high-risk applications

## Training Data

This model was fine-tuned using **Mozilla Common Voice v23.0 (Indonesian)**.

Common Voice is a publicly available, community-driven speech dataset released by Mozilla under a permissive license. Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.

## Evaluation

The model is typically evaluated using **Word Error Rate (WER)**. Evaluation results may vary depending on dataset, domain, audio conditions, and model size.

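For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. The function name `word_error_rate` and the normalization (lowercasing and whitespace splitting) are assumptions made for this example; this card does not specify the normalization used to evaluate the model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("ke") and one substituted word ("ini" -> "itu")
# against a six-word reference.
score = word_error_rate("saya pergi ke pasar pagi ini", "saya pergi pasar pagi itu")
print(f"WER: {score:.3f}")
```

Because insertions are counted, WER can exceed 1.0 when the output is much longer than the reference. For real evaluations, an established package such as `jiwer` is usually a better choice than a hand-rolled implementation, since punctuation and casing rules strongly affect the reported number.
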
## Training results

| Step | Training Loss |
|------|---------------|
| 100  | 0.880500 |
| 200  | 0.472300 |
| 300  | 0.408100 |
| 400  | 0.328500 |
| 500  | 0.226000 |
| 600  | 0.237500 |
| 700  | 0.148600 |
| 800  | 0.111600 |
| 900  | 0.104900 |
| 1000 | 0.073900 |
| 1100 | 0.063100 |
| 1200 | 0.050300 |
| 1400 | 0.039800 |
| 1500 | 0.031000 |
| 1550 | 0.031400 |