SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing Paper • 2512.11192 • Published Dec 12, 2025
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data Paper • 2601.18026 • Published Jan 25, 2026
Semi-automatic staging area for high-quality structured data extraction from scientific literature Paper • 2309.10923 • Published Sep 19, 2023
Mining experimental data from Materials Science literature with Large Language Models: an evaluation study Paper • 2401.11052 • Published Jan 19, 2024
SuperMat: Construction of a linked annotated dataset from superconductors-related publications Paper • 2101.02455 • Published Jan 7, 2021
Automatic extraction of materials and properties from superconductors scientific literature Paper • 2210.15600 • Published Oct 26, 2022