Sparkplugx1904 committed af3252b (verified; parent d7bea18): Create README.md

---
language:
- id
base_model:
- openai/whisper-base
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_23_0
---

# Whisper Base Model – Indonesian ASR

## Model Description
This model is a fine-tuned version of **openai/whisper-base** for **Automatic Speech Recognition (ASR)** in **Indonesian (id)**.
It transcribes Indonesian speech into text. Accuracy varies with audio conditions. As a base-size Whisper model (about 74M parameters), it is faster and lighter than the small, medium, and large variants but generally less accurate.
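A minimal inference sketch with the Hugging Face `transformers` pipeline follows. It assumes `transformers` (with a PyTorch backend) and `ffmpeg` are installed so audio files can be decoded. `MODEL_ID` is a hypothetical placeholder, because this card does not state the repository id.

```python
# Minimal inference sketch using the Hugging Face `transformers` pipeline.
# MODEL_ID is a hypothetical placeholder; replace it with this model's repository id.
MODEL_ID = "your-namespace/whisper-base-indonesian"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe an Indonesian audio file and return the recognized text."""
    # Imported inside the function so the module loads even without transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # process long recordings in 30-second windows
    )
    # Pin the language and task so Whisper neither auto-detects the language
    # nor translates into English.
    result = asr(
        audio_path,
        generate_kwargs={"language": "indonesian", "task": "transcribe"},
    )
    return result["text"]
```

For example, `transcribe("rekaman.wav")` (a hypothetical file name) returns the transcript as a string. For repeated calls, build the pipeline once and reuse it so the weights are not reloaded on every call.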

## Intended Use
- Indonesian speech-to-text transcription
- Research and experimentation
- Educational and academic purposes
- Application development and benchmarking

Whisper is released in several sizes (tiny, base, small, medium, large) that differ in accuracy, speed, and hardware requirements. This model is the fine-tuned base variant. Users who need higher accuracy or a smaller footprint should consider the Whisper size that best matches their constraints and objectives.
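The next sketch makes the size trade-off concrete. It turns OpenAI's published parameter counts for the original Whisper checkpoints (about 39M, 74M, 244M, 769M, and 1550M for tiny through large) into a rough memory estimate for the weights alone. The `headroom` factor is an assumed multiplier for activations and runtime overhead, not a measured value.

```python
# Rough, weights-only memory estimate for each Whisper size.
# Parameter counts are those published for the original OpenAI checkpoints;
# the estimate ignores activations, the decoder's key/value cache, and framework overhead.
WHISPER_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(size: str, dtype: str = "fp16") -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return WHISPER_PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e9


def largest_fitting_size(budget_gb: float, dtype: str = "fp16", headroom: float = 2.0):
    """Return the largest size whose weights, times `headroom`, fit the budget, else None.

    `headroom` is an assumed safety factor standing in for activations and overhead.
    """
    fitting = [
        size for size in WHISPER_PARAMS
        if weight_memory_gb(size, dtype) * headroom <= budget_gb
    ]
    # Dicts keep insertion order (smallest to largest), so the last match is the largest.
    return fitting[-1] if fitting else None
```

For example, `weight_memory_gb("base")` is about 0.148 GB in fp16. With a 1 GB budget, `largest_fitting_size(1.0)` returns `"small"`.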

## Limitations
- Transcription quality depends on audio clarity, speaker accent, and background noise
- Smaller variants may produce higher error rates on long or complex audio
- Larger variants require significantly more compute and memory
- Outputs should be reviewed before use in critical or high-risk applications
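Whisper models read audio as 16 kHz samples in fixed 30-second windows, which is one reason long recordings are harder to handle. A common workaround is to split the audio into overlapping chunks and transcribe each one. The following is a minimal pure-Python sketch of that splitting step. The overlap length is an assumption to tune, and merging the per-chunk transcripts is not shown.

```python
# Whisper's encoder consumes fixed 30-second windows of 16 kHz audio,
# so longer recordings are typically split into chunks before transcription.
SAMPLE_RATE = 16_000
WINDOW_SECONDS = 30


def chunk_audio(samples, overlap_seconds: float = 0.0):
    """Split a 1-D sequence of 16 kHz samples into windows of at most 30 seconds.

    Consecutive windows share `overlap_seconds` of audio, so a word cut at one
    boundary is more likely to appear whole in a neighbouring chunk.
    Returns a list of (start_index, chunk) pairs.
    """
    window = SAMPLE_RATE * WINDOW_SECONDS
    overlap = int(overlap_seconds * SAMPLE_RATE)
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and shorter than the window")
    step = window - overlap
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append((start, samples[start:start + window]))
        if start + window >= len(samples):
            break  # this chunk already reaches the end of the audio
        start += step
    return chunks
```

For 70 seconds of audio with `overlap_seconds=5`, this yields three chunks starting at 0 s, 25 s, and 50 s.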

## Training Data
This model was fine-tuned on the Indonesian subset of **Mozilla Common Voice v23.0**.
Common Voice is a publicly available, community-driven speech dataset released by Mozilla under the CC0 public domain dedication.
Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.
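Whisper expects 16 kHz input, while Common Voice clips are typically distributed as 48 kHz MP3 files, so the audio has to be resampled before fine-tuning or evaluation. The sketch below illustrates the idea with a deliberately crude 3:1 decimation on already-decoded samples. It is an illustration under that assumption only. In practice, use a proper resampler such as `librosa.resample` or `torchaudio.functional.resample`.

```python
# Crude illustration of resampling decoded Common Voice audio to Whisper's rate.
# Assumes the input is already-decoded mono samples at 48 kHz.
ORIG_RATE = 48_000
TARGET_RATE = 16_000


def downsample_48k_to_16k(samples):
    """Average each group of three 48 kHz samples into one 16 kHz sample.

    Averaging acts as a weak low-pass filter before decimation; a windowed-sinc
    or polyphase resampler suppresses aliasing far better.
    """
    factor = ORIG_RATE // TARGET_RATE  # 3
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]
```

One second of 48 kHz audio (48,000 samples) becomes 16,000 samples.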
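
## Evaluation
ASR models such as this one are typically evaluated with **Word Error Rate (WER)**. WER measures the word substitutions, deletions, and insertions needed to turn the transcript into the reference, relative to the number of reference words. Lower is better.
Results vary with the evaluation dataset, domain, audio conditions, and model size.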
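WER can be made concrete with a word-level edit distance. The sketch below computes WER = (S + D + I) / N using dynamic programming. Here S, D, and I count substitutions, deletions, and insertions, and N is the number of reference words. Its text normalization (lowercasing and whitespace splitting only) is a simplifying assumption. For reported results, a maintained library such as `jiwer` is preferable.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance via dynamic programming.
    Text is only lowercased and split on whitespace; real evaluations usually
    apply fuller normalization (punctuation, numbers) first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For instance, the reference `"saya suka makan nasi goreng"` against the hypothesis `"saya suka makan nasi"` has one deletion over five reference words, giving a WER of 0.2. Because insertions are counted, WER can exceed 1.0.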

## Training Results
| Step | Training Loss |
|------|---------------|
| 100 | 0.8805 |
| 200 | 0.4723 |
| 300 | 0.4081 |
| 400 | 0.3285 |
| 500 | 0.2260 |
| 600 | 0.2375 |
| 700 | 0.1486 |
| 800 | 0.1116 |
| 900 | 0.1049 |
| 1000 | 0.0739 |
| 1100 | 0.0631 |
| 1200 | 0.0503 |
| 1400 | 0.0398 |
| 1500 | 0.0310 |
| 1550 | 0.0314 |