Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 7 days ago • 69
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization Paper • 2604.08476 • Published 5 days ago • 7
Small Vision-Language Models are Smart Compressors for Long Video Understanding Paper • 2604.08120 • Published 6 days ago • 18
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 6 days ago • 46
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration Paper • 2603.24800 • Published 20 days ago • 67
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published Feb 15 • 56
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 82
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published Feb 3 • 59
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare Paper • 2602.06717 • Published Feb 6 • 74
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities Paper • 2602.05281 • Published Feb 5 • 14
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 324
HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering Paper • 2512.14870 • Published Dec 16, 2025 • 15
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground Paper • 2512.10430 • Published Dec 11, 2025 • 119