# BEA 2026 — L1-Aware Vocabulary Difficulty Prediction

Authors: Karthik Mattu, Adit Dhall, Arshad Naguru, Shubh Sehgal, Thejas Gowda, Hakyung Sung · Rochester Institute of Technology
Best-seed checkpoints from our BEA 2026 Shared Task submission (Closed Track).
Code: aditdhall/bea2026-vocab-difficulty
## Checkpoints

| File | L1 | Seed | Dev RMSE |
|---|---|---|---|
| `exp4_large_es_seed42.pt` | Spanish | 42 | 1.0565 |
| `exp4_large_de_seed789.pt` | German | 789 | 1.0097 |
| `exp4_large_cn_seed42.pt` | Mandarin | 42 | 0.9521 |
## Architecture

Hybrid `xlm-roberta-large` encoder with a 21-feature psycholinguistic feature head.

- **Input:** `L1_source_word [SEP] L1_context [SEP] en_target_clue [SEP] en_target_word`
- **Output:** continuous GLMM score (word difficulty)
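A minimal sketch of how the four input fields could be joined into a single encoder input. The helper name and the example strings are hypothetical, not from the repo; XLM-RoBERTa's separator token is `</s>`, which we pass explicitly so the sketch stays tokenizer-agnostic.

```python
def build_model_input(l1_source_word: str, l1_context: str,
                      en_target_clue: str, en_target_word: str,
                      sep_token: str = "</s>") -> str:
    """Join the four fields with the tokenizer's separator token.

    Hypothetical helper: the repo's actual preprocessing may differ,
    e.g. it may rely on the tokenizer's own pair-encoding instead.
    """
    fields = [l1_source_word, l1_context, en_target_clue, en_target_word]
    return f" {sep_token} ".join(fields)


# Example (Spanish L1): the assembled string is then fed to the encoder,
# whose pooled representation is concatenated with the 21 psycholinguistic
# features before the regression head.
text = build_model_input(
    "perro", "El perro ladra toda la noche.",
    "a domestic animal that barks", "dog",
)
```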
## Results

### Test Set (Official Evaluation)
| L1 | Official Baseline | Run 1: Best single seed | Run 2: 5-seed ensemble | Run 3: Ensemble + XGBoost blend |
|---|---|---|---|---|
| Spanish | 1.257 | 1.087 | 1.053 | 1.045 |
| German | 1.258 | 1.000 | 1.012 | 0.994 |
| Mandarin | 1.140 | 0.913 | 0.911 | 0.900 |
### Dev Set
| L1 | Official Baseline | Best single seed | 5-seed ensemble | Ensemble + XGBoost blend |
|---|---|---|---|---|
| Spanish | 1.357 | 1.057 | 1.021 | 0.997 |
| German | 1.328 | 1.010 | 1.013 | 1.002 |
| Mandarin | 1.175 | 0.952 | 0.940 | 0.932 |
All values are RMSE (lower is better). Run 3 is the best submission: a weighted blend of the 5-seed `xlm-roberta-large` ensemble with XGBoost trained on the psycholinguistic features (ensemble/XGBoost blend weights: es = 0.8/0.2, de = 0.8/0.2, cn = 0.9/0.1).
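The Run 3 blend above can be sketched in a few lines. The helper names are hypothetical; the blend is a per-example weighted average of the ensemble and XGBoost predictions, with the weight chosen per L1 (e.g. 0.8 for Spanish and German, 0.9 for Mandarin).

```python
import math

def blend_predictions(ensemble_preds, xgb_preds, w_ensemble):
    """Weighted blend of ensemble and XGBoost predictions.

    Hypothetical helper mirroring the Run 3 setup: the XGBoost weight
    is implicitly 1 - w_ensemble, so the weights sum to 1.
    """
    w_xgb = 1.0 - w_ensemble
    return [w_ensemble * e + w_xgb * x
            for e, x in zip(ensemble_preds, xgb_preds)]

def rmse(preds, golds):
    """Root mean squared error, the shared task's evaluation metric."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, golds)) / len(preds))

# Example with the Spanish weights (es = 0.8/0.2):
blended = blend_predictions([1.0, 2.0], [3.0, 4.0], w_ensemble=0.8)
```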