2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • arXiv:2501.00958 • Published Jan 1, 2025
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training • arXiv:2501.11425 • Published Jan 20, 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • arXiv:2501.12948 • Published Jan 22, 2025
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model • arXiv:2503.05132 • Published Mar 7, 2025
Unified Reward Model for Multimodal Understanding and Generation • arXiv:2503.05236 • Published Mar 7, 2025
RWKV-7 "Goose" with Expressive Dynamic State Evolution • arXiv:2503.14456 • Published Mar 18, 2025
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders • arXiv:2503.03601 • Published Mar 5, 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization • arXiv:2504.00999 • Published Apr 1, 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model • arXiv:2504.06263 • Published Apr 8, 2025
SmolVLM: Redefining Small and Efficient Multimodal Models • arXiv:2504.05299 • Published Apr 7, 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation • arXiv:2504.17207 • Published Apr 24, 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining • arXiv:2504.16511 • Published Apr 23, 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models • arXiv:2504.17789 • Published Apr 24, 2025