---
language:
  - id
base_model:
  - openai/whisper-base
pipeline_tag: automatic-speech-recognition
datasets:
  - mozilla-foundation/common_voice_23_0
---

Whisper Base Model – Indonesian ASR

Model Description

This model is a fine-tuned version of openai/whisper-base for Automatic Speech Recognition (ASR) in Indonesian (id).
It is intended to transcribe Indonesian speech recorded under a range of audio conditions. Accuracy and resource usage depend on the underlying Whisper model size, which for this model is base.
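A minimal transcription sketch using the Hugging Face transformers `pipeline` API. It assumes transformers (with PyTorch) is installed and that ffmpeg is available to decode audio files. `transcribe` is a hypothetical helper, and `model_id` must be the Hub repository id of this checkpoint, which this card does not state. Forcing `language` and `task` through `generate_kwargs` assumes the checkpoint keeps Whisper's multilingual generation config.

```python
def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe an Indonesian audio file with a fine-tuned Whisper checkpoint (hypothetical helper)."""
    # Imported inside the function so this snippet can be loaded without transformers installed.
    from transformers import pipeline

    # model_id: Hub repository id of this checkpoint (assumption: not stated in this card).
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split recordings longer than 30 s into windows
    )
    # Force Indonesian transcription instead of relying on language detection.
    result = asr(audio_path, generate_kwargs={"language": "indonesian", "task": "transcribe"})
    return result["text"]
```

For example, `transcribe("rekaman.wav", model_id="<owner>/whisper-base-id")`, where both the file name and the repository id are placeholders.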

Intended Use

  • Indonesian speech-to-text transcription
  • Research and experimentation
  • Educational and academic purposes
  • Application development and benchmarking

Whisper checkpoints come in several sizes (tiny, base, small, medium, large) that differ in accuracy, speed, and hardware requirements; this model is built on the base size. Users should choose the size that best matches their accuracy, latency, and hardware constraints.
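To make the hardware differences between sizes more concrete, this sketch estimates the memory needed only to store each size's weights. The parameter counts are the approximate figures published in the openai/whisper README; actual inference memory is higher because of activations, the decoder's key/value cache, and framework overhead. `weight_memory_mb` is a hypothetical helper.

```python
# Approximate parameter counts per Whisper size, as listed in the openai/whisper README.
WHISPER_PARAMS = {
    "tiny": 39e6,
    "base": 74e6,
    "small": 244e6,
    "medium": 769e6,
    "large": 1550e6,
}
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mb(size: str, dtype: str = "float32") -> float:
    """Memory needed just to hold the model weights, in megabytes (1 MB = 10**6 bytes)."""
    return WHISPER_PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e6


for size in WHISPER_PARAMS:
    fp32 = weight_memory_mb(size)
    fp16 = weight_memory_mb(size, "float16")
    print(f"{size:>6}: {fp32:7.0f} MB fp32, {fp16:7.0f} MB fp16")
```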

Limitations

  • Transcription quality depends on audio clarity, speaker accent, and background noise
  • Smaller variants may produce higher error rates on long or complex audio
  • Larger variants require significantly more compute and memory
  • Outputs should be reviewed before use in critical or high-risk applications
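Whisper's encoder processes audio in fixed 30-second windows of 16 kHz input, so a common way to handle long recordings is to split them into overlapping windows and transcribe each one. The sketch below only computes the window boundaries and leaves out merging the overlapping transcripts. `chunk_bounds` is a hypothetical helper, and the 5-second overlap is an assumed default, not a setting documented for this model.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono input
WINDOW_S = 30.0       # Whisper's encoder consumes 30-second windows


def chunk_bounds(num_samples: int, sample_rate: int = SAMPLE_RATE,
                 window_s: float = WINDOW_S, overlap_s: float = 5.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the whole signal."""
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap_s must be in [0, window_s)")
    window = int(window_s * sample_rate)
    step = int((window_s - overlap_s) * sample_rate)
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # this window reaches the end of the recording
            break
        start += step
    return bounds


# A 70-second recording split into 30 s windows with 5 s of overlap.
print(chunk_bounds(70 * SAMPLE_RATE))  # [(0, 480000), (400000, 880000), (800000, 1120000)]
```

The transformers ASR pipeline's `chunk_length_s` and `stride_length_s` arguments implement a similar scheme, including the step that stitches the overlapping outputs back together.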

Training Data

This model was fine-tuned using Mozilla Common Voice v23.0 (Indonesian).
Common Voice is a publicly available, community-driven speech dataset released by Mozilla under the CC0 public domain dedication.
Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.
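To check the speaker diversity and utterance length of the training data, the sketch below summarizes one Common Voice split file. It assumes the tab-separated layout of Common Voice releases, with `client_id` (anonymized speaker) and `sentence` (prompt text) columns. `summarize_split` and the sample rows are hypothetical, and transcript word count stands in for utterance length because clip durations would require decoding the audio.

```python
import csv
import io
from statistics import mean


def summarize_split(tsv_file) -> dict:
    """Count clips and distinct speakers, and average transcript length in words."""
    # Common Voice split files are tab-separated and unquoted.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    speakers = set()
    words_per_clip = []
    for row in reader:
        speakers.add(row["client_id"])  # anonymized speaker identifier
        words_per_clip.append(len(row["sentence"].split()))
    return {
        "clips": len(words_per_clip),
        "speakers": len(speakers),
        "mean_words": mean(words_per_clip) if words_per_clip else 0.0,
    }


# Hypothetical three-row excerpt; in practice, pass open("train.tsv", encoding="utf-8", newline="").
sample = io.StringIO(
    "client_id\tpath\tsentence\n"
    "spk1\ta.mp3\tSelamat pagi\n"
    "spk1\tb.mp3\tApa kabar hari ini\n"
    "spk2\tc.mp3\tTerima kasih banyak\n"
)
print(summarize_split(sample))
```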

Evaluation

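ASR models such as this one are typically evaluated with Word Error Rate (WER): the number of word substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference (N), i.e. WER = (S + D + I) / N. Lower is better.
Results can vary with the evaluation dataset, domain, audio conditions, text normalization, and model size.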
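Word-level WER can be computed with a Levenshtein alignment between the reference and hypothesis word sequences. The sketch below is a plain-Python illustration; its `normalize` step (lowercasing and stripping punctuation) is an assumption made for the example, not the normalization used to evaluate this model, and reported WER can shift noticeably with that choice.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words (a deliberately simple normalizer)."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the word-level edit distance between the first i-1 reference
    # words and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("Saya suka makan nasi goreng.", "saya suka makan nasi"))  # 0.2
```

For real evaluations, libraries such as jiwer (`jiwer.wer`) or Hugging Face evaluate (`evaluate.load("wer")`) compute the same metric.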

Training results

Step    Training Loss
100     0.880500
200     0.472300
300     0.408100
400     0.328500
500     0.226000
600     0.237500
700     0.148600
800     0.111600
900     0.104900
1000    0.073900
1100    0.063100
1200    0.050300
1400    0.039800
1500    0.031000
1550    0.031400
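The loss values in the table above can be summarized programmatically. This sketch computes the relative reduction from the first to the last logged step (about 96%) and lists the steps where loss rose compared with the previous entry, which happens at 600 and 1550. Training loss only measures fit to the training data; it says nothing about held-out accuracy, so a WER measured on a separate split is still needed. `summarize_loss` is a hypothetical helper.

```python
# Training loss as listed in the table above (no value is listed for step 1300).
TRAINING_LOSS = {
    100: 0.8805, 200: 0.4723, 300: 0.4081, 400: 0.3285, 500: 0.2260,
    600: 0.2375, 700: 0.1486, 800: 0.1116, 900: 0.1049, 1000: 0.0739,
    1100: 0.0631, 1200: 0.0503, 1400: 0.0398, 1500: 0.0310, 1550: 0.0314,
}


def summarize_loss(log: dict[int, float]) -> dict:
    """Relative loss reduction over the run and steps where loss went up."""
    steps = sorted(log)
    first, last = log[steps[0]], log[steps[-1]]
    # Steps whose loss is higher than at the previous logged step.
    increases = [step for prev, step in zip(steps, steps[1:]) if log[step] > log[prev]]
    return {"relative_reduction": 1 - last / first, "increases_at": increases}


print(summarize_loss(TRAINING_LOSS))
```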