Update README.md
README.md CHANGED
@@ -21,25 +21,15 @@ To assess the representational quality of our trained VSR encoder independently
 
 For training code, preprocessing pipelines, and evaluation scripts, please refer to the [GitHub repository](https://github.com/vsro200/vsro200).
 
-## Configurations
-
-We trained four MLP variants that differ only in the visual preprocessing applied before the encoder:
-
-| Variant | Crop size | Region of interest |
-|:---|:---:|:---|
-| MLP v1 | 96 × 96 | Full-face resize |
-| MLP v2 | 64 × 64 | Center-Middle |
-| MLP v3 | 64 × 64 | Center-Bottom |
-
 ## Results
 
-Top-1 and Top-5 word classification accuracy (%) on the LRRo `Lab` (controlled studio recordings) and `Wild` (in-the-wild) test sets. Higher is better.
+We trained four MLP variants that differ only in the visual preprocessing applied before the encoder. Top-1 and Top-5 word classification accuracy (%) on the LRRo `Lab` (controlled studio recordings) and `Wild` (in-the-wild) test sets. Higher is better.
 
-| Variant | Lab Acc@1 | Lab Acc@5 | Wild Acc@1 | Wild Acc@5 |
-|:---|:---:|:---:|:---:|:---:|
-| MLP v1 | 90.6 | 98.5 | 64.5 | 87.6 |
-| MLP v2 | 91.4 | 99.0 | 68.6 | 89.3 |
-| MLP v3 | **95.0** | **99.4** | **72.7** | **92.6** |
+| Variant | Crop size | Region of interest | Lab Acc@1 | Lab Acc@5 | Wild Acc@1 | Wild Acc@5 |
+|:---|:---:|:---|:---:|:---:|:---:|:---:|
+| MLP v1 | 96 × 96 | Full-face resize | 90.6 | 98.5 | 64.5 | 87.6 |
+| MLP v2 | 64 × 64 | Center-Middle | 91.4 | 99.0 | 68.6 | 89.3 |
+| MLP v3 | 64 × 64 | Center-Bottom | **95.0** | **99.4** | **72.7** | **92.6** |
 
 Restricting the visual input to the lower half of the face (Center-Bottom crops) consistently outperforms full-face resizing, with the 64 × 64 crop (MLP v3) yielding the largest improvement on both Lab and Wild data.
 