- mmBERT: A Modern Multilingual Encoder with Annealed Language Learning — arXiv:2509.06888 (Sep 8, 2025)
- Seq vs Seq: An Open Suite of Paired Encoders and Decoders — arXiv:2507.11412 (Jul 15, 2025)
- Certified Mitigation of Worst-Case LLM Copyright Infringement — arXiv:2504.16046 (Apr 22, 2025)
- Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data — arXiv:2404.03862 (Apr 5, 2024)
- AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees — arXiv:2404.08417 (Apr 12, 2024)
- Dated Data: Tracing Knowledge Cutoffs in Large Language Models — arXiv:2403.12958 (Mar 19, 2024)
- Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates — arXiv:2206.00832 (Jun 2, 2022)
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models — arXiv:2405.20541 (May 30, 2024)
- LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms — arXiv:2311.13133 (Nov 22, 2023)
- MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining — arXiv:2312.17482 (Dec 29, 2023)
- OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset — arXiv:2402.10176 (Feb 15, 2024)
- Efficient and Interpretable Neural Models for Entity Tracking — arXiv:2208.14252 (Aug 30, 2022)