# BEA 2026 — L1-Aware Vocabulary Difficulty Prediction

Authors: Karthik Mattu, Adit Dhall, Arshad Naguru, Shubh Sehgal, Thejas Gowda, Hakyung Sung · Rochester Institute of Technology
Best-seed checkpoints from our BEA 2026 Shared Task submission (Closed Track).
Code: aditdhall/bea2026-vocab-difficulty
## Checkpoints

| File | L1 | Seed | Dev RMSE |
|---|---|---|---|
| `exp4_large_es_seed42.pt` | Spanish | 42 | 1.0565 |
| `exp4_large_de_seed789.pt` | German | 789 | 1.0097 |
| `exp4_large_cn_seed42.pt` | Mandarin | 42 | 0.9521 |
## Architecture

Hybrid `xlm-roberta-large` encoder with a 21-feature psycholinguistic feature head.

- **Input:** `L1_source_word [SEP] L1_context [SEP] en_target_clue [SEP] en_target_word`
- **Output:** continuous GLMM score (word difficulty)
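A minimal sketch of how the four input fields could be joined into a single encoder input. The helper name and the example strings are hypothetical, not from the repo; XLM-RoBERTa's separator token is `</s>`, which we pass explicitly so the sketch stays tokenizer-agnostic.

```python
def build_model_input(l1_source_word: str, l1_context: str,
                      en_target_clue: str, en_target_word: str,
                      sep_token: str = "</s>") -> str:
    """Join the four fields with the tokenizer's separator token.

    Hypothetical helper: the repo's actual preprocessing may differ,
    e.g. it may rely on the tokenizer's own pair-encoding instead.
    """
    fields = [l1_source_word, l1_context, en_target_clue, en_target_word]
    return f" {sep_token} ".join(fields)


# Example (Spanish L1): the assembled string is then fed to the encoder,
# whose pooled representation is concatenated with the 21 psycholinguistic
# features before the regression head.
text = build_model_input(
    "perro", "El perro ladra toda la noche.",
    "a domestic animal that barks", "dog",
)
```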
## Results

### Test Set (Official Evaluation)
| L1 | Official Baseline | Run 1: Best single seed | Run 2: 5-seed ensemble | Run 3: Ensemble + XGBoost blend |
|---|---|---|---|---|
| Spanish | 1.257 | 1.087 | 1.053 | 1.045 |
| German | 1.258 | 1.000 | 1.012 | 0.994 |
| Mandarin | 1.140 | 0.913 | 0.911 | 0.900 |
### Dev Set
| L1 | Official Baseline | Best single seed | 5-seed ensemble | Ensemble + XGBoost blend |
|---|---|---|---|---|
| Spanish | 1.357 | 1.057 | 1.021 | 0.997 |
| German | 1.328 | 1.010 | 1.013 | 1.002 |
| Mandarin | 1.175 | 0.952 | 0.940 | 0.932 |
All values are RMSE (lower is better). Run 3 is the best submission: a weighted blend of the 5-seed `xlm-roberta-large` ensemble with XGBoost trained on the psycholinguistic features (ensemble/XGBoost blend weights: es = 0.8/0.2, de = 0.8/0.2, cn = 0.9/0.1).
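The Run 3 blend above can be sketched in a few lines. The helper names are hypothetical; the blend is a per-example weighted average of the ensemble and XGBoost predictions, with the weight chosen per L1 (e.g. 0.8 for Spanish and German, 0.9 for Mandarin).

```python
import math

def blend_predictions(ensemble_preds, xgb_preds, w_ensemble):
    """Weighted blend of ensemble and XGBoost predictions.

    Hypothetical helper mirroring the Run 3 setup: the XGBoost weight
    is implicitly 1 - w_ensemble, so the weights sum to 1.
    """
    w_xgb = 1.0 - w_ensemble
    return [w_ensemble * e + w_xgb * x
            for e, x in zip(ensemble_preds, xgb_preds)]

def rmse(preds, golds):
    """Root mean squared error, the shared task's evaluation metric."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, golds)) / len(preds))

# Example with the Spanish weights (es = 0.8/0.2):
blended = blend_predictions([1.0, 2.0], [3.0, 4.0], w_ensemble=0.8)
```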