Sparkplugx1904 committed on
Commit f9c5fd8 · verified · 1 Parent(s): 18e77b6

Create README.md

Files changed (1)
  1. README.md +58 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ language:
+ - id
+ base_model:
+ - openai/whisper-tiny
+ pipeline_tag: automatic-speech-recognition
+ datasets:
+ - mozilla-foundation/common_voice_23_0
+ ---
+
+ # Whisper Tiny Model – Indonesian ASR
+
+ ## Model Description
+ This model is a fine-tuned version of **openai/whisper-tiny** for **Automatic Speech Recognition (ASR)** in **Indonesian (id)**.
+ It transcribes Indonesian speech into text across a range of audio conditions. As the smallest Whisper variant, it favors speed and low resource usage over accuracy.
+
+ ## Intended Use
+ - Indonesian speech-to-text transcription
+ - Research and experimentation
+ - Educational and academic purposes
+ - Application development and benchmarking
+
+ Whisper model variants (tiny, base, small, medium, large) differ in accuracy, speed, and hardware requirements. This model is the tiny variant; users should choose the size that best matches their constraints and objectives.
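
For the transcription use case listed above, a minimal sketch using the Hugging Face `transformers` ASR pipeline might look like the following. The card does not state the fine-tuned checkpoint's repo ID, so `MODEL_ID` defaults to the base model `openai/whisper-tiny` as a stand-in; the `transcribe` helper is an illustrative name, not part of the original card.

```python
# Stand-in: the card names only the base model. Replace this with the
# fine-tuned checkpoint's repo ID (not given in the source).
MODEL_ID = "openai/whisper-tiny"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe an Indonesian audio file and return the recognized text."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Pin language and task so Whisper neither auto-detects nor translates.
    result = asr(
        audio_path,
        generate_kwargs={"language": "indonesian", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical local file) would download the model on first use and return the transcript as a string. Passing a file path requires `ffmpeg` to be installed for audio decoding.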
+
+ ## Limitations
+ - Transcription quality depends on audio clarity, speaker accent, and background noise
+ - Smaller variants may produce higher error rates on long or complex audio
+ - Larger variants require significantly more compute and memory
+ - Outputs should be reviewed before use in critical or high-risk applications
+
+ ## Training Data
+ This model was fine-tuned using **Mozilla Common Voice v23.0 (Indonesian)**.
+ Common Voice is a publicly available, community-driven speech dataset released by Mozilla under a permissive license.
+ Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.
+
+ ## Evaluation
+ **Word Error Rate (WER)** is the usual metric for evaluating this model; lower is better.
+ Evaluation results may vary depending on dataset, domain, audio conditions, and model size.
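
WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this model; the function names and the lowercase/whitespace normalization are assumptions.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance divided by reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For example, `wer("saya suka makan nasi goreng", "saya suka nasi goreng")` counts one deleted word out of five reference words and returns 0.2. Real evaluations usually also normalize punctuation and numerals before scoring, which this sketch leaves out.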
+
+ ## Training Results
+ | Step | Training Loss |
+ |------|---------------|
+ | 100  | 1.282900      |
+ | 200  | 0.682300      |
+ | 300  | 0.568900      |
+ | 400  | 0.487500      |
+ | 500  | 0.372700      |
+ | 600  | 0.375500      |
+ | 700  | 0.276200      |
+ | 800  | 0.226000      |
+ | 900  | 0.223800      |
+ | 1000 | 0.188600      |
+ | 1100 | 0.164300      |
+ | 1200 | 0.151400      |
+ | 1300 | 0.130000      |
+ | 1400 | 0.133900      |
+ | 1500 | 0.119700      |
+ | 1550 | 0.117300      |
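
To read the trend in the logged losses, the short sketch below copies the values from the training table and computes the overall relative reduction and the steps where the loss rose instead of falling. The helper names are illustrative and not part of the original training code.

```python
# (step, training loss) pairs copied from the training results table.
LOSS_LOG = [
    (100, 1.2829), (200, 0.6823), (300, 0.5689), (400, 0.4875),
    (500, 0.3727), (600, 0.3755), (700, 0.2762), (800, 0.2260),
    (900, 0.2238), (1000, 0.1886), (1100, 0.1643), (1200, 0.1514),
    (1300, 0.1300), (1400, 0.1339), (1500, 0.1197), (1550, 0.1173),
]


def relative_reduction(log: list[tuple[int, float]]) -> float:
    """Fraction by which the loss fell between the first and last entry."""
    first, last = log[0][1], log[-1][1]
    return (first - last) / first


def upticks(log: list[tuple[int, float]]) -> list[int]:
    """Steps whose logged loss is higher than the previous logged loss."""
    return [
        step
        for (_, prev_loss), (step, loss) in zip(log, log[1:])
        if loss > prev_loss
    ]


if __name__ == "__main__":
    print(f"relative reduction: {relative_reduction(LOSS_LOG):.1%}")
    print(f"upticks at steps: {upticks(LOSS_LOG)}")
```

On these values the loss falls by roughly 91% between step 100 and step 1550, with small upticks at steps 600 and 1400. Training loss alone does not indicate transcription quality; that requires a WER measurement on held-out data.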