X-Streamer: Unified Human World Modeling with Audiovisual Interaction Paper • 2509.21574 • Published Sep 25, 2025 • 8
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24, 2025 • 48
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published May 5, 2025 • 25
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published May 1, 2025 • 26
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models Paper • 2505.03821 • Published May 3, 2025 • 24
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning Paper • 2504.21850 • Published Apr 30, 2025 • 27
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 55
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12, 2025 • 83
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 83
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20, 2025 • 62
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30, 2025 • 141
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models Paper • 2504.15133 • Published Apr 21, 2025 • 26