Bugai's Collection
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published • 90
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D
Space
Paper
• 2508.19247
• Published • 43
VibeVoice Technical Report
Paper
• 2508.19205
• Published • 164
USO: Unified Style and Subject-Driven Generation via Disentangled and
Reward Learning
Paper
• 2508.18966
• Published • 56
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published • 238
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn
Tool-Integrated Reasoning
Paper
• 2509.02479
• Published • 84
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper
• 2509.00676
• Published • 85
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper
• 2509.01055
• Published • 81
Gated Associative Memory: A Parallel O(N) Architecture for Efficient
Sequence Modeling
Paper
• 2509.00605
• Published • 43
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published • 72
DeepResearch Arena: The First Exam of LLMs' Research Abilities via
Seminar-Grounded Tasks
Paper
• 2509.01396
• Published • 58
Spatial Forcing: Implicit Spatial Representation Alignment for
Vision-language-action Model
Paper
• 2510.12276
• Published • 149
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published • 140
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction
Transformer
Paper
• 2510.25976
• Published • 16
Don't Blind Your VLA: Aligning Visual Representations for OOD
Generalization
Paper
• 2510.25616
• Published • 106
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual
Representation
Paper
• 2511.02778
• Published • 103
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for
Visual Chain-of-Thought
Paper
• 2511.02779
• Published • 60
Thinking with Video: Video Generation as a Promising Multimodal
Reasoning Paradigm
Paper
• 2511.04570
• Published • 242
V-Thinker: Interactive Thinking with Images
Paper
• 2511.04460
• Published • 98
Scaling Agent Learning via Experience Synthesis
Paper
• 2511.03773
• Published • 83
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
Paper
• 2511.04217
• Published • 17
HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Paper
• 2511.03506
• Published • 95
IterResearch: Rethinking Long-Horizon Agents via Markovian State
Reconstruction
Paper
• 2511.07327
• Published • 80
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via
Gumbel-Reparameterized Soft-Thinking Policy Optimization
Paper
• 2511.06411
• Published • 18