view article Article Preference Tuning LLMs with Direct Preference Optimization Methods +3 Jan 18, 2024 • 82
Reasoning Shift: How Context Silently Shortens LLM Reasoning Paper • 2604.01161 • Published 11 days ago • 29
view article Article ORBA: Orthogonal Reflection Bounded Ablation — A Geometrically Exact Detour in Directional Activation Editing 19 days ago • 5
view article Article Take Control of What Your LLM Knows and Does — with the EasyEdit Tool Series Jul 15, 2025 • 9
view article Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling Feb 12 • 53
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 54
view article Article 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do Mar 10 • 38
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published Feb 15 • 56
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities Paper • 2602.05281 • Published Feb 5 • 14
Multimodal Evaluation of Russian-language Architectures Paper • 2511.15552 • Published Nov 19, 2025 • 79
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms Paper • 2511.17592 • Published Nov 17, 2025 • 121
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6, 2025 • 117
TUN3D: Towards Real-World Scene Understanding from Unposed Images Paper • 2509.21388 • Published Sep 23, 2025 • 15