vsro200 / mlp-lrro-vsro200

Video Classification · visual-speech-recognition · word-classification

vsro200 committed 6 days ago

Commit c3f3cdf (verified) · 1 parent: 5613182

Update README.md

Files changed (1)

README.md +6 -16

README.md CHANGED

@@ -21,25 +21,15 @@ To assess the representational quality of our trained VSR encoder independently
 For training code, preprocessing pipelines, and evaluation scripts, please refer to the [GitHub repository](https://github.com/vsro200/vsro200).
-## Configurations
-We trained four MLP variants that differ only in the visual preprocessing applied before the encoder:
-| Variant | Crop size | Region of interest |
-|:---|:---:|:---|
-| MLP v1 | 96 × 96 | Full-face resize |
-| MLP v2 | 64 × 64 | Center-Middle |
-| MLP v3 | 64 × 64 | Center-Bottom |
 ## Results
-Top-1 and Top-5 word classification accuracy (%) on the LRRo `Lab` (controlled studio recordings) and `Wild` (in-the-wild) test sets. Higher is better.
-| Variant | Lab Acc@1 | Lab Acc@5 | Wild Acc@1 | Wild Acc@5 |
-|:---|:---:|:---:|:---:|:---:|
-| MLP v1 | 90.6 | 98.5 | 64.5 | 87.6 |
-| MLP v2 | 91.4 | 99.0 | 68.6 | 89.3 |
-| MLP v3 | **95.0** | **99.4** | **72.7** | **92.6** |
+We trained four MLP variants that differ only in the visual preprocessing applied before the encoder. The table reports Top-1 and Top-5 word classification accuracy (%) on the LRRo `Lab` (controlled studio recordings) and `Wild` (in-the-wild) test sets; higher is better.
+| Variant | Crop size | Region of interest | Lab Acc@1 | Lab Acc@5 | Wild Acc@1 | Wild Acc@5 |
+|:---|:---:|:---|:---:|:---:|:---:|:---:|
+| MLP v1 | 96 × 96 | Full-face resize | 90.6 | 98.5 | 64.5 | 87.6 |
+| MLP v2 | 64 × 64 | Center-Middle | 91.4 | 99.0 | 68.6 | 89.3 |
+| MLP v3 | 64 × 64 | Center-Bottom | **95.0** | **99.4** | **72.7** | **92.6** |
 Restricting the visual input to the lower half of the face (Center-Bottom crops) consistently outperforms full-face resizing, with the 64 × 64 crop (MLP v3) yielding the largest improvement on both Lab and Wild data.
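To make the closing claim concrete, the short Python sketch below recomputes each variant's absolute gain (in percentage points) over the full-face baseline, MLP v1, using only the rows shown in the results table. The `RESULTS` dict and the `gains_over` helper are hypothetical names introduced for illustration; they are not part of the vsro200 repository or its evaluation scripts.

```python
# Accuracy (%) copied from the results table in the README diff above.
# RESULTS and gains_over are illustrative names, not from the vsro200 codebase.
RESULTS = {
    "MLP v1": {"Lab Acc@1": 90.6, "Lab Acc@5": 98.5, "Wild Acc@1": 64.5, "Wild Acc@5": 87.6},
    "MLP v2": {"Lab Acc@1": 91.4, "Lab Acc@5": 99.0, "Wild Acc@1": 68.6, "Wild Acc@5": 89.3},
    "MLP v3": {"Lab Acc@1": 95.0, "Lab Acc@5": 99.4, "Wild Acc@1": 72.7, "Wild Acc@5": 92.6},
}


def gains_over(baseline, results):
    """Return each non-baseline variant's gain over `baseline`, per metric,
    in percentage points rounded to one decimal place."""
    base = results[baseline]
    return {
        name: {metric: round(score - base[metric], 1) for metric, score in scores.items()}
        for name, scores in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, deltas in gains_over("MLP v1", RESULTS).items():
        print(name, deltas)
```

Run as-is, the sketch shows MLP v3 gaining 4.4 (Lab Acc@1), 0.9 (Lab Acc@5), 8.2 (Wild Acc@1) and 5.0 (Wild Acc@5) points over MLP v1, so the largest margin is on the harder `Wild` split. Rounding to one decimal keeps floating-point noise out of the printed deltas.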