---
language:
- ro
---
# Romanian Visual Speech Recognition (VSR) Models

This repository contains the model checkpoints for the paper **"VSRo-200: A Romanian Visual Speech Recognition Dataset for Studying Supervision and Multimodal Robustness"**.

These models are fine-tuned versions of MultiVSR, trained specifically for Romanian. We provide multiple checkpoints to demonstrate the impact of dataset size, annotation quality (human-annotated vs. automatically generated pseudo-labels), and gender distribution on VSR performance.

## 📊 Accompanying Dataset
The models were trained and evaluated on the **RoVSR Dataset** (Romanian Visual Speech Recognition Dataset), a 200-hour corpus of Romanian podcasts.
* **Dataset Link:** [vsro200/vsro200_dataset](https://huggingface.co/datasets/vsro200/vsro200_dataset)

## 📂 Repository Structure
All model checkpoints are stored in the `checkpoints/` directory. The naming convention follows the pattern `model_[hours]_[type].pt`:
* `_annot`: models trained on human-annotated data.
* `_auto`: models trained on automatically transcribed data (pseudo-labels).
* `_shuffle`: alternative data splits for the 100h models, used to test variance.
* `_males` / `_females` / `_mix`: models trained on gender-segregated or mixed 40-hour annotated subsets to evaluate gender bias.
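
The naming convention above can be handled programmatically; a minimal sketch (the helper name and regex are illustrative, not part of this repository):

```python
import re

# Hypothetical helper: split a checkpoint filename following the
# model_[hours]_[type].pt convention into its parts.
CHECKPOINT_RE = re.compile(r"^model_(\d+)h_([a-z_]+)\.pt$")

def parse_checkpoint_name(filename: str) -> dict:
    """Split e.g. 'model_200h_auto.pt' into training hours and data type."""
    match = CHECKPOINT_RE.match(filename)
    if match is None:
        raise ValueError(f"Not a recognized checkpoint name: {filename!r}")
    hours, kind = match.groups()
    return {"hours": int(hours), "type": kind}
```

This makes it easy to, for example, filter a file listing of `checkpoints/` down to the `_auto` models only.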

---

## 🏆 Performance (Word Error Rate - WER)

Below are the primary results, evaluated on the **Test Unseen** set. Lower WER indicates better performance.
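
WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal reference implementation, for readers who want to reproduce the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions.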

### 1. Annotated vs. Auto Data Scaling
Comparison of models trained on human-annotated data versus models trained on automatically generated labels, across dataset sizes.

| Training Hours | Human Annotated (`_annot`) (%) | Auto Generated (`_auto`) (%) |
|:---:|:---:|:---:|
| 10h | 72.50 | 74.61 |
| 25h | 64.86 | 66.27 |
| 50h | 58.87 | 59.28 |
| 75h | 54.86 | 56.25 |
| 100h | 53.29 | 53.63 |
| 125h | -- | 51.71 |
| 150h | -- | 51.25 |
| 175h | -- | 49.84 |
| 200h | -- | 48.75 |

### 2. Gender Bias Analysis (40h Models)
Evaluation demonstrating the impact of gender representation in the training set.

| Training Subset | Global WER (%) | WER Males (%) | WER Females (%) |
|:---|:---:|:---:|:---:|
| 40h Males | 62.15 | 61.32 | 62.97 |
| 40h Females | 59.33 | 59.17 | 59.49 |
| 40h Mix | 59.52 | 59.19 | 59.85 |

### 3. Out-of-Distribution (OOD) Robustness
Evaluated with the `model_200h_auto.pt` checkpoint on different video-degradation and domain-shift scenarios.

| OOD Category | WER (%) |
|:---|:---:|
| Vlogs | 58.61 |
| Specific domains | 63.01 |
| Noisy | 68.96 |
| Archival | 87.97 |
| **Global OOD** | **68.46** |

### 4. Stability and Variance Analysis
Because of the high computational cost, multi-run variance testing was limited to the 100-hour models. Each model was trained on 3 different random data shuffles to measure stability and the true gap between human annotations and auto-generated labels.

| Data Type (100h) | Mean WER (%) | Standard Deviation ($\sigma$) (%) |
|:---|:---:|:---:|
| Human Annotated | 53.21 | ± 0.37 |
| Auto Generated | 53.82 | ± 0.17 |
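
The mean and $\sigma$ are aggregated over the three shuffle runs; a sketch of that aggregation (the per-run WER values below are invented for illustration, not the actual per-shuffle results):

```python
import statistics

# Hypothetical per-shuffle WERs for one 100h model (illustrative values only).
shuffle_wers = [53.0, 53.3, 53.33]

mean_wer = statistics.mean(shuffle_wers)   # reported "Mean WER"
sigma = statistics.stdev(shuffle_wers)     # sample standard deviation, reported as ±σ

print(f"{mean_wer:.2f} ± {sigma:.2f}")
```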

---

## 💻 Usage

To use these models, download them directly with the `huggingface_hub` library in Python:

```python
from huggingface_hub import hf_hub_download

# Download the 200h auto model
model_path = hf_hub_download(
    repo_id="vsro200/VSR-Models",
    filename="checkpoints/model_200h_auto.pt",
    repo_type="model",
)

print(f"Model downloaded to: {model_path}")
```