BEA 2026 – L1-Aware Vocabulary Difficulty Prediction

Authors: Karthik Mattu, Adit Dhall, Arshad Naguru, Shubh Sehgal, Thejas Gowda, Hakyung Sung · Rochester Institute of Technology

Best-seed checkpoints from our BEA 2026 Shared Task submission (Closed Track).
Code: aditdhall/bea2026-vocab-difficulty

Checkpoints

| File | L1 | Seed | Dev RMSE |
|---|---|---|---|
| `exp4_large_es_seed42.pt` | Spanish | 42 | 1.0565 |
| `exp4_large_de_seed789.pt` | German | 789 | 1.0097 |
| `exp4_large_cn_seed42.pt` | Mandarin | 42 | 0.9521 |

Architecture

- Hybrid `xlm-roberta-large` encoder with a 21-feature psycholinguistic feature head
- Input: `L1_source_word [SEP] L1_context [SEP] en_target_clue [SEP] en_target_word`
- Output: continuous GLMM score (word difficulty)
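The four-field input template above can be assembled as a single string before tokenization. A minimal sketch (the field order and `[SEP]` markers come from the template; the function name and the example item are hypothetical — in practice a Hugging Face tokenizer may insert its own separator tokens instead):

```python
def build_model_input(l1_source_word: str,
                      l1_context: str,
                      en_target_clue: str,
                      en_target_word: str,
                      sep: str = " [SEP] ") -> str:
    """Join the four fields in the order the encoder expects:
    L1 source word, L1 context, English clue, English target word."""
    return sep.join([l1_source_word, l1_context, en_target_clue, en_target_word])

# Hypothetical Spanish item, for illustration only:
example = build_model_input(
    "biblioteca",
    "Fui a la biblioteca a estudiar.",
    "a building where you can borrow books",
    "library",
)
print(example)
```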

Results

Test Set (Official Evaluation)

| L1 | Official Baseline | Run 1: Best single seed | Run 2: 5-seed ensemble | Run 3: Ensemble + XGBoost blend |
|---|---|---|---|---|
| Spanish | 1.257 | 1.087 | 1.053 | 1.045 |
| German | 1.258 | 1.000 | 1.012 | 0.994 |
| Mandarin | 1.140 | 0.913 | 0.911 | 0.900 |

Dev Set

| L1 | Official Baseline | Best single seed | 5-seed ensemble | Ensemble + XGBoost blend |
|---|---|---|---|---|
| Spanish | 1.357 | 1.057 | 1.021 | 0.997 |
| German | 1.328 | 1.010 | 1.013 | 1.002 |
| Mandarin | 1.175 | 0.952 | 0.940 | 0.932 |

All values are RMSE (↓ better). Run 3 is the best submission: a weighted blend of the 5-seed `xlm-roberta-large` ensemble with XGBoost trained on the psycholinguistic features (blend weights, ensemble/XGBoost: es=0.8/0.2, de=0.8/0.2, cn=0.9/0.1).
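The blending step can be sketched as follows (a minimal NumPy sketch, assuming the ensemble prediction is the mean over seed models; the function names and the example arrays are hypothetical, and only the blend weights come from the card):

```python
import numpy as np

def blend_predictions(ensemble_preds: np.ndarray,
                      xgb_preds: np.ndarray,
                      w_ensemble: float) -> np.ndarray:
    """Weighted blend of the 5-seed encoder ensemble with XGBoost.

    Per the card, es/de use w_ensemble=0.8 and cn uses w_ensemble=0.9."""
    return w_ensemble * ensemble_preds + (1.0 - w_ensemble) * xgb_preds

def rmse(preds: np.ndarray, gold: np.ndarray) -> float:
    """Root mean squared error, the shared-task metric (lower is better)."""
    return float(np.sqrt(np.mean((preds - gold) ** 2)))

# Hypothetical per-item predictions for three items:
seed_preds = np.array([[1.2, 0.5, 2.0],   # seed 42
                       [1.1, 0.6, 1.9],   # seed 123
                       [1.3, 0.4, 2.1]])  # seed 789 (illustrative values)
ensemble = seed_preds.mean(axis=0)        # 5-seed mean in the real runs
xgb = np.array([1.0, 0.7, 1.8])           # XGBoost on the 21 features
blended = blend_predictions(ensemble, xgb, w_ensemble=0.8)
```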
