-
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 30 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 52 -
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper • 2512.24617 • Published • 66 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99
Collections
Discover the best community collections!
Collections including paper arxiv:2409.12917
-
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3 -
Self-Refine: Iterative Refinement with Self-Feedback
Paper • 2303.17651 • Published • 2 -
Rethinking Chain-of-Thought from the Perspective of Self-Training
Paper • 2412.10827 • Published -
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper • 2303.11366 • Published • 7
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
Paper • 2409.12576 • Published • 16 -
Transformer Explainer: Interactive Learning of Text-Generative Models
Paper • 2408.04619 • Published • 175
-
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 86 -
When an LLM is apprehensive about its answers -- and when its uncertainty is justified
Paper • 2503.01688 • Published • 22 -
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Paper • 2503.00808 • Published • 57 -
Chain of Draft: Thinking Faster by Writing Less
Paper • 2502.18600 • Published • 50
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 144 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 92
-
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Paper • 2502.09411 • Published • 22 -
GAIR/lima
Viewer • Updated • 1.33k • 26.2k • 457 -
HuggingFaceM4/FineVision
Viewer • Updated • 24.2M • 114k • 481
-
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 30 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 52 -
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper • 2512.24617 • Published • 66 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
Paper • 2409.12576 • Published • 16 -
Transformer Explainer: Interactive Learning of Text-Generative Models
Paper • 2408.04619 • Published • 175
-
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3 -
Self-Refine: Iterative Refinement with Self-Feedback
Paper • 2303.17651 • Published • 2 -
Rethinking Chain-of-Thought from the Perspective of Self-Training
Paper • 2412.10827 • Published -
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper • 2303.11366 • Published • 7
-
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 86 -
When an LLM is apprehensive about its answers -- and when its uncertainty is justified
Paper • 2503.01688 • Published • 22 -
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Paper • 2503.00808 • Published • 57 -
Chain of Draft: Thinking Faster by Writing Less
Paper • 2502.18600 • Published • 50
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 144 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 92
-
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Paper • 2502.09411 • Published • 22 -
GAIR/lima
Viewer • Updated • 1.33k • 26.2k • 457 -
HuggingFaceM4/FineVision
Viewer • Updated • 24.2M • 114k • 481