HellaSwag: Can a Machine Really Finish Your Sentence? Paper • 1905.07830 • Published May 19, 2019 • 7
RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search Paper • 2405.12497 • Published May 21, 2024 • 1
Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search Paper • 2409.09913 • Published Sep 16, 2024 • 1
Welcome Gemma 4: Frontier multimodal intelligence on device Article • 11 days ago • 822
A decoder-only foundation model for time-series forecasting Paper • 2310.10688 • Published Oct 14, 2023 • 24
APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning Paper • 2209.06119 • Published Sep 10, 2022 • 2
Meta-Harness: End-to-End Optimization of Model Harnesses Paper • 2603.28052 • Published 14 days ago • 16
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9, 2024 • 40
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 47
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Paper • 2504.13173 • Published Apr 17, 2025 • 20
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 20 days ago • 134