Update README.md
README.md
CHANGED
@@ -65,12 +65,12 @@ A variance analysis across three random shuffles of the 100h subsets yields a me

 ### Out-of-distribution robustness

-* **Test Seen / Unseen (In-Domain):** Baseline performance on podcast data
-* **Vlogs:** Unconstrained videos shot in
-* **Specific domains:** Content featuring highly specialized or technical vocabulary (e.g., medical, scientific).
+* **Test Seen / Unseen (In-Domain):** Baseline performance on podcast data, evaluated with our 200h model.
+* **Vlogs:** Unconstrained videos shot from varying camera angles, with dynamic lighting and movement.
+* **Specific domains:** Content featuring highly specialized or technical vocabulary (e.g., medical, scientific).
 * **Noisy:** Videos with poor resolution, bad lighting, or heavy motion blur.
-* **Archival (Black & White):** Historical footage with distinct visual artifacts, atypical framerates, and lack of color information.
-* **Global OOD:** The aggregated metrics across all out-of-distribution subsets
+* **Archival (Black & White):** Historical footage with distinct visual artifacts, atypical framerates, and lack of color information.
+* **Global OOD:** The aggregated metrics across all out-of-distribution subsets.

 | Dataset / Category | # Clips | WER (%) | CER (%) | OOV Token (%) | OOV Type (%) |
 |:---|:---:|:---:|:---:|:---:|:---:|
@@ -90,7 +90,7 @@ A variance analysis across three random shuffles of the 100h subsets yields a me
 ### Gender bias analysis (40h models)


-To evaluate gender bias and cross-speaker generalization, we trained 40-hour baseline models on male-only, female-only, and mixed datasets.
+To evaluate gender bias and cross-speaker generalization, we trained 40-hour baseline models on male-only, female-only, and mixed datasets.

 #### Test Unseen
 | Training Set (40h) | Global WER (%) | Global CER (%) | Male WER (%) | Male CER (%) | Female WER (%) | Female CER (%) |
@@ -107,6 +107,7 @@ To evaluate gender bias and cross-speaker generalization, we trained 40-hour bas
 | Mixed Data | **56.29** | **31.22** | 60.56 | 33.54 | 52.15 | 28.93 |


+
 ## Citation

 If you use these models, please cite: