Library
updated
Image-Text-to-Text
• 0.2B • Updated • 421
• 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with
Reinforcement Learning
Paper
• 2503.09516
• Published • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper
• 2505.24863
• Published • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with
Reinforcement Learning
Paper
• 2505.17667
• Published • 88
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in
Large Language Models
Paper
• 2505.24864
• Published • 146
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for
Language Reasoning
Paper
• 2505.24298
• Published • 34
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient
Fine-Tuning
Paper
• 2505.20355
• Published • 36
Interleaved Reasoning for Large Language Models via Reinforcement
Learning
Paper
• 2505.19640
• Published • 15
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering
Workflow
Paper
• 2505.17399
• Published • 14
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
• 2505.19914
• Published • 46
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published • 62
Scaling Reasoning, Losing Control: Evaluating Instruction Following in
Large Reasoning Models
Paper
• 2505.14810
• Published • 62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement
Learning
Paper
• 2505.16410
• Published • 58
JULI: Jailbreak Large Language Models by Self-Introspection
Paper
• 2505.11790
• Published • 1
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Paper
• 2505.13438
• Published • 36
Paper
• 2505.14674
• Published • 37
RM-R1: Reward Modeling as Reasoning
Paper
• 2505.02387
• Published • 81
CPGD: Toward Stable Rule-based Reinforcement Learning for Language
Models
Paper
• 2505.12504
• Published • 24
Neuro-Symbolic Query Compiler
Paper
• 2505.11932
• Published • 18
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time
Reward Alignment in Score Models
Paper
• 2506.01320
• Published • 16
Aligning Latent Spaces with Flow Priors
Paper
• 2506.05240
• Published • 27
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in
Robotics
Paper
• 2506.00070
• Published • 29
A Controllable Examination for Long-Context Language Models
Paper
• 2506.02921
• Published • 34
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal
LLMs
Paper
• 2506.01674
• Published • 28
CodeContests+: High-Quality Test Case Generation for Competitive
Programming
Paper
• 2506.05817
• Published • 9
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal
Contextual Fusion
Paper
• 2506.01111
• Published • 31
Reinforcement Pre-Training
Paper
• 2506.08007
• Published • 265
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection
Behavior
Paper
• 2506.08012
• Published • 7
Dreamland: Controllable World Creation with Simulator and Generative
Models
Paper
• 2506.08006
• Published • 7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety
Assurance
Paper
• 2506.06444
• Published • 73
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
Paper
• 2506.07530
• Published • 20
Solving Inequality Proofs with Large Language Models
Paper
• 2506.07927
• Published • 20
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand
Better
Paper
• 2506.09040
• Published • 34
Through the Valley: Path to Effective Long CoT Training for Small
Language Models
Paper
• 2506.07712
• Published • 18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports
From Scratch with Agentic Framework
Paper
• 2506.02454
• Published • 7
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error
Diagnosis in GUI Automation
Paper
• 2506.04614
• Published • 19
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal
Learning
Paper
• 2506.06205
• Published • 30
Paper
• 2506.10910
• Published • 68
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just
Like an Olympiad Team
Paper
• 2506.14234
• Published • 41
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time
Markers
Paper
• 2506.14702
• Published • 3
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published • 28
DoTA-RAG: Dynamic of Thought Aggregation RAG
Paper
• 2506.12571
• Published • 50
syftr: Pareto-Optimal Generative AI
Paper
• 2505.20266
• Published
Scaling Test-time Compute for LLM Agents
Paper
• 2506.12928
• Published • 63
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware
LoRA Fine-Tuning
Paper
• 2506.10082
• Published • 8
General-Reasoner: Advancing LLM Reasoning Across All Domains
Paper
• 2505.14652
• Published • 24
Optimizing Length Compression in Large Reasoning Models
Paper
• 2506.14755
• Published • 10
UniFork: Exploring Modality Alignment for Unified Multimodal
Understanding and Generation
Paper
• 2506.17202
• Published • 10
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
• 2506.18896
• Published • 29
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal
Document Understanding
Paper
• 2506.16035
• Published • 89
Robust Reward Modeling via Causal Rubrics
Paper
• 2506.16507
• Published • 9
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with
TriMap Video Diffusion
Paper
• 2507.02813
• Published • 60
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Paper
• 2507.01953
• Published • 18
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
Paper
• 2506.17930
• Published • 19
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via
Multi-Agent Multi-Turn Reinforcement Learning
Paper
• 2506.24119
• Published • 51
katanemo/Arch-Router-1.5B
Text Generation
• 2B • Updated • 5.07k
• 249
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs
More Realistic and Less Risky
Paper
• 2507.03336
• Published • 7
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper
• 2507.05566
• Published • 116
CriticLean: Critic-Guided Reinforcement Learning for Mathematical
Formalization
Paper
• 2507.06181
• Published • 45
AutoTriton: Automatic Triton Programming with Reinforcement Learning in
LLMs
Paper
• 2507.05687
• Published • 31
Coding Triangle: How Does Large Language Model Understand Code?
Paper
• 2507.06138
• Published • 22
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based
Reinforcement Learning
Paper
• 2507.05920
• Published • 12
RefineX: Learning to Refine Pre-training Data at Scale from
Expert-Guided Programs
Paper
• 2507.03253
• Published • 19
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Paper
• 2507.07996
• Published • 36
Lumos-1: On Autoregressive Video Generation from a Unified Model
Perspective
Paper
• 2507.08801
• Published • 32
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published • 263
WebShaper: Agentically Data Synthesizing via Information-Seeking
Formalization
Paper
• 2507.15061
• Published • 60
AnyCap Project: A Unified Framework, Dataset, and Benchmark for
Controllable Omni-modal Captioning
Paper
• 2507.12841
• Published • 42
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published • 123
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
• 2507.14958
• Published • 47
Does More Inference-Time Compute Really Help Robustness?
Paper
• 2507.15974
• Published • 7
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement
Feedback
Paper
• 2507.15024
• Published • 14
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via
Gaussian Splatting
Paper
• 2507.15454
• Published • 7
Promptomatix: An Automatic Prompt Optimization Framework for Large
Language Models
Paper
• 2507.14241
• Published • 18
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive
Generation
Paper
• 2507.18537
• Published • 18
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human
Videos
Paper
• 2507.15597
• Published • 34
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Paper
• 2507.14295
• Published • 14
SeC: Advancing Complex Video Object Segmentation via Progressive Concept
Construction
Paper
• 2507.15852
• Published • 38
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
• 2507.12720
• Published • 10
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA
Optimization
Paper
• 2507.12142
• Published • 36
Replacing thinking with tool usage enables reasoning in small language
models
Paper
• 2507.05065
• Published • 16
Lizard: An Efficient Linearization Framework for Large Language Models
Paper
• 2507.09025
• Published • 19
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published • 166
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published • 161
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published • 68
SmallThinker: A Family of Efficient Large Language Models Natively
Trained for Local Deployment
Paper
• 2507.20984
• Published • 58
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI
Agents
Paper
• 2507.19478
• Published • 33
Geometric-Mean Policy Optimization
Paper
• 2507.20673
• Published • 32
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing
Large Language Models' Reasoning Abilities
Paper
• 2507.19766
• Published • 15
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced
Multimodal Reasoning
Paper
• 2507.22607
• Published • 47
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language
Models
Paper
• 2508.00819
• Published • 63
Beyond the Trade-off: Self-Supervised Reinforcement Learning for
Reasoning Models' Instruction Following
Paper
• 2508.02150
• Published • 37
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent
Foundation Models Training
Paper
• 2508.00414
• Published • 94
On the Expressiveness of Softmax Attention: A Recurrent Neural Network
Perspective
Paper
• 2507.23632
• Published • 6
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Paper
• 2507.23726
• Published • 115
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic
Association and Long Story Comprehension
Paper
• 2508.01959
• Published • 60
Tool-integrated Reinforcement Learning for Repo Deep Search
Paper
• 2508.03012
• Published • 20
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy
Optimization
Paper
• 2508.05731
• Published • 27
MeshLLM: Empowering Large Language Models to Progressively Understand
and Generate 3D Mesh
Paper
• 2508.01242
• Published • 11
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published • 50
Reinforcement Learning in Vision: A Survey
Paper
• 2508.08189
• Published • 30
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with
Patch-level CLIP Latents
Paper
• 2508.05954
• Published • 6
Feedback-Driven Tool-Use Improvements in Large Language Models via
Automated Build Environments
Paper
• 2508.08791
• Published • 16
Training Long-Context, Multi-Turn Software Engineering Agents with
Reinforcement Learning
Paper
• 2508.03501
• Published • 59
Complex Logical Instruction Generation
Paper
• 2508.09125
• Published • 40
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
Paper
• 2508.08401
• Published • 42
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion
Forcing
Paper
• 2508.09192
• Published • 30
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision
Mapping
Paper
• 2508.12466
• Published • 8
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
Paper
• 2508.13142
• Published • 34
VertexRegen: Mesh Generation with Continuous Level of Detail
Paper
• 2508.09062
• Published • 38
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache
Rematerialization
Paper
• 2508.10395
• Published • 42
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Paper
• 2508.10893
• Published • 31
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper
• 2508.11987
• Published • 72
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper
• 2508.13186
• Published • 20
Pass@k Training for Adaptively Balancing Exploration and Exploitation of
Large Reasoning Models
Paper
• 2508.10751
• Published • 29
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Paper
• 2508.10833
• Published • 45
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper
• 2508.09968
• Published • 15
CRINN: Contrastive Reinforcement Learning for Approximate Nearest
Neighbor Search
Paper
• 2508.02091
• Published • 13
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on
Challenging Queries
Paper
• 2508.15760
• Published • 47
Deep Think with Confidence
Paper
• 2508.15260
• Published • 90
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference
Optimization
Paper
• 2508.14460
• Published • 85
Quantization Meets dLLMs: A Systematic Study of Post-training
Quantization for Diffusion LLMs
Paper
• 2508.14896
• Published • 22
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent
LLMs
Paper
• 2508.17188
• Published • 17
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement
Learning for General LLM Reasoning
Paper
• 2508.16949
• Published • 24
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance
for Text-to-Image Generation
Paper
• 2508.18032
• Published • 41
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory
and Test-Time Compute Scaling
Paper
• 2508.16745
• Published • 29
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior
Long-Context Learning
Paper
• 2508.18756
• Published • 36
Do What? Teaching Vision-Language-Action Models to Reject the Impossible
Paper
• 2508.16292
• Published • 9
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian
Splatting
Paper
• 2508.17811
• Published • 7
FastMesh: Efficient Artistic Mesh Generation via Component Decoupling
Paper
• 2508.19188
• Published • 17
Spacer: Towards Engineered Scientific Inspiration
Paper
• 2508.17661
• Published • 32
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large
Language Models
Paper
• 2508.18773
• Published • 16
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D
Space
Paper
• 2508.19247
• Published • 43
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published • 80
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published • 90
Provable Benefits of In-Tool Learning for Large Language Models
Paper
• 2508.20755
• Published • 11
Think in Games: Learning to Reason in Games via Reinforcement Learning
with Large Language Models
Paper
• 2508.21365
• Published • 29
Efficient Code Embeddings from Code Generation Models
Paper
• 2508.21290
• Published • 20
CLIPSym: Delving into Symmetry Detection with CLIP
Paper
• 2508.14197
• Published • 8
Implicit Actor Critic Coupling via a Supervised Learning Framework for
RLVR
Paper
• 2509.02522
• Published • 25
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn
Tool-Integrated Reasoning
Paper
• 2509.02479
• Published • 84
Universal Deep Research: Bring Your Own Model and Strategy
Paper
• 2509.00244
• Published • 14
LMEnt: A Suite for Analyzing Knowledge in Language Models from
Pretraining Data to Representations
Paper
• 2509.03405
• Published • 24
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published • 72
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow
Real Instructions?
Paper
• 2509.04292
• Published • 58
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published • 76
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex
Dynamic Environment? A Study on τ-bench
Paper
• 2508.20931
• Published • 16
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper
• 2509.03059
• Published • 25
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware
Embeddings
Paper
• 2509.04011
• Published • 29
Symbolic Graphics Programming with Large Language Models
Paper
• 2509.05208
• Published • 47
Bootstrapping Task Spaces for Self-Improvement
Paper
• 2509.04575
• Published • 6
Behavioral Fingerprinting of Large Language Models
Paper
• 2509.04504
• Published • 6
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM
Step-Provers
Paper
• 2509.06493
• Published • 12
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published • 32
Reconstruction Alignment Improves Unified Multimodal Models
Paper
• 2509.07295
• Published • 40
Visual Representation Alignment for Multimodal Large Language Models
Paper
• 2509.07979
• Published • 84
Revolutionizing Reinforcement Learning Framework for Diffusion Large
Language Models
Paper
• 2509.06949
• Published • 57
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published • 105
Towards General Agentic Intelligence via Environment Scaling
Paper
• 2509.13311
• Published • 72
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic
Data and Scalable Reinforcement Learning
Paper
• 2509.13305
• Published • 91
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based
Instruction Dataset Creation
Paper
• 2509.10708
• Published • 18
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented
Generation for Multi-hop Question Answering
Paper
• 2509.09713
• Published • 25
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published • 118
Single-stream Policy Optimization
Paper
• 2509.13232
• Published • 36
World Modeling with Probabilistic Structure Integration
Paper
• 2509.09737
• Published • 14
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via
Machine Unlearning
Paper
• 2509.13755
• Published • 19
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving
Reasoning
Paper
• 2507.16518
• Published • 2
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model
via Training-Free Guidance
Paper
• 2509.15130
• Published • 31
Evolving Language Models without Labels: Majority Drives Selection,
Novelty Promotes Variation
Paper
• 2509.15194
• Published • 33
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical
Reasoning
Paper
• 2509.13761
• Published • 16
A Vision-Language-Action-Critic Model for Robotic Real-World
Reinforcement Learning
Paper
• 2509.15937
• Published • 20
BaseReward: A Strong Baseline for Multimodal Reward Model
Paper
• 2509.16127
• Published • 21
MultiEdit: Advancing Instruction-based Image Editing on Diverse and
Challenging Tasks
Paper
• 2509.14638
• Published • 14
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided
Role-playing Agents
Paper
• 2509.15233
• Published • 2
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid
Vision Tokenizer
Paper
• 2509.16197
• Published • 58
Latent Zoning Network: A Unified Principle for Generative Modeling,
Representation Learning, and Classification
Paper
• 2509.15591
• Published • 45
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Paper
• 2509.15566
• Published • 14
Paper
• 2509.17336
• Published • 10
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
• 2501.10799
• Published • 15
Table as Thought: Exploring Structured Thoughts in LLM Reasoning
Paper
• 2501.02152
• Published
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
• 2412.09078
• Published
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge
Internalization with Self-Reflection
Paper
• 2412.08024
• Published • 1
LLM2: Let Large Language Models Harness System 2 Reasoning
Paper
• 2412.20372
• Published
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
Language Models via a Multi-Paradigm Perspective
Paper
• 2501.11110
• Published • 4
Ensembling Large Language Models with Process Reward-Guided Tree Search
for Better Complex Reasoning
Paper
• 2412.15797
• Published • 18
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented
Verification and Refinement
Paper
• 2412.12881
• Published • 2
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion
Transformer Models
Paper
• 2509.17627
• Published • 66
Hyper-Bagel: A Unified Acceleration Framework for Multimodal
Understanding and Generation
Paper
• 2509.18824
• Published • 23
Understanding the Thinking Process of Reasoning Models: A Perspective
from Schoenfeld's Episode Theory
Paper
• 2509.14662
• Published • 13
VCRL: Variance-based Curriculum Reinforcement Learning for Large
Language Models
Paper
• 2509.19803
• Published • 122
Tree Search for LLM Agent Reinforcement Learning
Paper
• 2509.21240
• Published • 92
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable
Sparse-Linear Attention
Paper
• 2509.24006
• Published • 119
Fine-tuning Done Right in Model Editing
Paper
• 2509.22072
• Published • 28
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM
Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
• 2509.21880
• Published • 53
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale
Diffusion Transformer
Paper
• 2509.22414
• Published • 22
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive
Exploration for Agentic Reinforcement Learning
Paper
• 2509.22601
• Published • 30
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
Paper
• 2509.22576
• Published • 137
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published • 69
AutoIntent: AutoML for Text Classification
Paper
• 2509.21138
• Published • 37
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper
• 2509.25760
• Published • 55
Attention as a Compass: Efficient Exploration for Process-Supervised RL
in Reasoning Models
Paper
• 2509.26628
• Published • 17
Sequential Diffusion Language Models
Paper
• 2509.24007
• Published • 47
ReviewScore: Misinformed Peer Review Detection with Large Language
Models
Paper
• 2509.21679
• Published • 64
ReviewRL: Towards Automated Scientific Review with RL
Paper
• 2508.10308
• Published • 1
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
Paper
• 2508.15804
• Published • 15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget
Allocation
Paper
• 2509.25849
• Published • 48
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
• 2510.01180
• Published • 20
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published • 91
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
• 2510.02297
• Published • 43
More Thought, Less Accuracy? On the Dual Nature of Reasoning in
Vision-Language Models
Paper
• 2509.25848
• Published • 81
CLUE: Non-parametric Verification from Experience via Hidden-State
Clustering
Paper
• 2510.01591
• Published • 28
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published • 108
Efficient Multi-modal Large Language Models via Progressive Consistency
Distillation
Paper
• 2510.00515
• Published • 42
Reactive Transformer (RxT) -- Stateful Real-Time Processing for
Event-Driven Reactive Language Models
Paper
• 2510.03561
• Published • 25
Large Language Models as Optimizers
Paper
• 2309.03409
• Published • 79
Connecting Large Language Models with Evolutionary Algorithms Yields
Powerful Prompt Optimizers
Paper
• 2309.08532
• Published • 53
PanGu-Coder2: Boosting Large Language Models for Code with Ranking
Feedback
Paper
• 2307.14936
• Published • 42
Factuality Matters: When Image Generation and Editing Meet Structured
Visuals
Paper
• 2510.05091
• Published • 20
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM
Training
Paper
• 2510.04996
• Published • 16
Paper2Video: Automatic Video Generation from Scientific Papers
Paper
• 2510.05096
• Published • 120
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior
Reasoning LLMs
Paper
• 2510.05069
• Published • 13
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual
Information
Paper
• 2510.03632
• Published • 42
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published • 60
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 513
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published • 31
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 276
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large
Multimodal Models
Paper
• 2510.05034
• Published • 51
Low-probability Tokens Sustain Exploration in Reinforcement Learning
with Verifiable Reward
Paper
• 2510.03222
• Published • 76
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning
for LLMs
Paper
• 2510.11696
• Published • 182
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Paper
• 2510.09507
• Published • 11
Agentic Context Engineering: Evolving Contexts for Self-Improving
Language Models
Paper
• 2510.04618
• Published • 131
Better Together: Leveraging Unpaired Multimodal Data for Stronger
Unimodal Models
Paper
• 2510.08492
• Published • 10
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published • 8
BigCodeArena: Unveiling More Reliable Human Preferences in Code
Generation via Execution
Paper
• 2510.08697
• Published • 39
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for
MLLMs
Paper
• 2510.09201
• Published • 50
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published • 170
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper
• 2510.13515
• Published • 12
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised
Pre-training
Paper
• 2510.12586
• Published • 115
Understanding DeepResearch via Reports
Paper
• 2510.07861
• Published • 7
RAG-Anything: All-in-One RAG Framework
Paper
• 2510.12323
• Published • 73
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published • 33
Glyph: Scaling Context Windows via Visual-Text Compression
Paper
• 2510.17800
• Published • 69
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Paper
• 2510.19363
• Published • 63
Unified Reinforcement and Imitation Learning for Vision-Language Models
Paper
• 2510.19307
• Published • 32
Attention Is All You Need for KV Cache in Diffusion LLMs
Paper
• 2510.14973
• Published • 42
Information Gain-based Policy Optimization: A Simple and Effective
Approach for Multi-Turn LLM Agents
Paper
• 2510.14967
• Published • 34
Video Reasoning without Training
Paper
• 2510.17045
• Published • 8
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative
Decoders
Paper
• 2510.19779
• Published • 62
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
Paper
• 2510.19304
• Published • 24
Every Question Has Its Own Value: Reinforcement Learning with Explicit
Human Values
Paper
• 2510.20187
• Published • 19
ReCode: Unify Plan and Action for Universal Granularity Control
Paper
• 2510.23564
• Published • 123
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Paper
• 2510.14901
• Published • 48
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement
Learning
Paper
• 2510.23473
• Published • 86
World Simulation with Video Foundation Models for Physical AI
Paper
• 2511.00062
• Published • 46
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid
Validation in Realistic Workflows
Paper
• 2510.24411
• Published • 72
The End of Manual Decoding: Towards Truly End-to-End Language Models
Paper
• 2510.26697
• Published • 119
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
Paper
• 2511.04217
• Published • 17
Diffusion Language Models are Super Data Learners
Paper
• 2511.03276
• Published • 132
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published • 229
DRIVE: Data Curation Best Practices for Reinforcement Learning with
Verifiable Reward in Competitive Code Generation
Paper
• 2511.06307
• Published • 53
Black-Box On-Policy Distillation of Large Language Models
Paper
• 2511.10643
• Published • 52
DoPE: Denoising Rotary Position Embedding
Paper
• 2511.09146
• Published • 98
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
Paper
• 2511.14210
• Published • 21
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models
Paper
• 2511.15605
• Published • 25
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Paper
• 2511.16664
• Published • 29
TiDAR: Think in Diffusion, Talk in Autoregression
Paper
• 2511.08923
• Published • 128
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Paper
• 2511.06805
• Published • 13
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper
• 2511.08567
• Published • 35
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise
Reasoning
Paper
• 2510.25992
• Published • 48
FARMER: Flow AutoRegressive Transformer over Pixels
Paper
• 2510.23588
• Published • 59
Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Paper
• 2510.24824
• Published • 17
LLM-guided Hierarchical Retrieval
Paper
• 2510.13217
• Published • 21
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per
Token via Reinforcement Learning
Paper
• 2510.15110
• Published • 18
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal
Evidence
Paper
• 2510.20579
• Published • 56
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms
Paper
• 2511.17592
• Published • 121
Paper
• 2511.11238
• Published • 39
Flow Map Distillation Without Data
Paper
• 2511.19428
• Published • 6
Monet: Reasoning in Latent Visual Space Beyond Images and Language
Paper
• 2511.21395
• Published • 19
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Paper
• 2511.19773
• Published • 10
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
Paper
• 2511.20102
• Published • 28
Architecture Decoupling Is Not All You Need For Unified Multimodal Model
Paper
• 2511.22663
• Published • 29
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
Paper
• 2512.00722
• Published • 16
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published • 106
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Paper
• 2512.02014
• Published • 74
OneThinker: All-in-one Reasoning Model for Image and Video
Paper
• 2512.03043
• Published • 34
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper
• 2512.05591
• Published • 17
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published • 76
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
Paper
• 2512.04504
• Published • 18
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
Paper
• 2512.04220
• Published • 16
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Paper
• 2512.03000
• Published • 37
PromptBridge: Cross-Model Prompt Transfer for Large Language Models
Paper
• 2512.01420
• Published • 11
PretrainZero: Reinforcement Active Pretraining
Paper
• 2512.03442
• Published • 49
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Paper
• 2512.02807
• Published • 9
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Paper
• 2512.04926
• Published • 42
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Paper
• 2512.07461
• Published • 79
Distribution Matching Variational AutoEncoder
Paper
• 2512.07778
• Published • 29
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Paper
• 2512.08153
• Published • 8
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
Paper
• 2512.08829
• Published • 21
Self-Improving VLM Judges Without Human Annotations
Paper
• 2512.05145
• Published • 20
Rethinking Training Dynamics in Scale-wise Autoregressive Generation
Paper
• 2512.06421
• Published • 7
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
Paper
• 2512.07802
• Published • 46
unsloth/Devstral-2-123B-Instruct-2512-GGUF
125B • Updated • 6.05k
• 51
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
Paper
• 2512.10534
• Published • 32
BEAVER: An Efficient Deterministic LLM Verifier
Paper
• 2512.05439
• Published • 36
Vector Quantization using Gaussian Variational Autoencoder
Paper
• 2512.06609
• Published • 1
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Paper
• 2512.07525
• Published • 60
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Paper
• 2511.23386
• Published • 16
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving
Paper
• 2512.10739
• Published • 47
OmniPSD: Layered PSD Generation with Diffusion Transformer
Paper
• 2512.09247
• Published • 50
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Paper
• 2512.13586
• Published • 93
KlingAvatar 2.0 Technical Report
Paper
• 2512.13313
• Published • 44
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
Paper
• 2512.14067
• Published • 16
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper
• 2512.13687
• Published • 106
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
Paper
• 2512.14052
• Published • 42
Universal Reasoning Model
Paper
• 2512.14693
• Published • 44
Image Diffusion Preview with Consistency Solver
Paper
• 2512.13592
• Published • 8
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Paper
• 2512.15702
• Published • 16
STeCa: Step-level Trajectory Calibration for LLM Agent Learning
Paper
• 2502.14276
• Published • 1
Step-GUI Technical Report
Paper
• 2512.15431
• Published • 133
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification
Paper
• 2512.16921
• Published • 8
Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection
Paper
• 2512.16905
• Published • 32
DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders
Paper
• 2512.13690
• Published • 3
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper
• 2512.13607
• Published • 38
Paper
• 2512.16301
• Published • 108
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
Paper
• 2512.12967
• Published • 111
CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates
Paper
• 2512.10342
• Published • 1
UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction
Paper
• 2512.12941
• Published • 2
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
Paper
• 2512.13106
• Published • 4
Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation
Paper
• 2512.13655
• Published • 3
Janus: Disaggregating Attention and Experts for Scalable MoE Inference
Paper
• 2512.13525
• Published • 6
RePo: Language Models with Context Re-Positioning
Paper
• 2512.14391
• Published • 12
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
Paper
• 2512.14531
• Published • 15
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
Paper
• 2512.13303
• Published • 17
Differentiable Evolutionary Reinforcement Learning
Paper
• 2512.13399
• Published • 22
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
Paper
• 2512.14699
• Published • 28
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Paper
• 2512.13660
• Published • 37
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published • 121
Hybrid Attribution Priors for Explainable and Robust Model Training
Paper
• 2512.14719
• Published • 3
WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory
Paper
• 2512.13190
• Published • 8
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper
• 2512.19693
• Published • 67
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper
• 2512.17102
• Published • 42
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper
• 2512.19673
• Published • 66
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
Paper
• 2512.19134
• Published • 32
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
• Published • 222
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Paper
• 2512.16969
• Published • 120
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
• 2512.20618
• Published • 56
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs
Paper
• 2512.17206
• Published • 20
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Paper
• 2512.17008
• Published • 11
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
Paper
• 2510.15859
• Published • 13
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
Paper
• 2512.14681
• Published • 42
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Paper
• 2512.17351
• Published • 28
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
• 2512.15687
• Published • 22
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
• 2512.13874
• Published • 17
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
Paper
• 2512.14080
• Published • 9
Understanding and Improving Hyperbolic Deep Reinforcement Learning
Paper
• 2512.14202
• Published • 6
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness
Paper
• 2512.15374
• Published • 6
VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs
Paper
• 2512.12072
• Published • 18
DEER: Draft with Diffusion, Verify with Autoregressive Models
Paper
• 2512.15176
• Published • 45
TabReX : Tabular Referenceless eXplainable Evaluation
Paper
• 2512.15907
• Published • 2
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
Paper
• 2512.16615
• Published • 5
AdaTooler-V: Adaptive Tool-Use for Images and Videos
Paper
• 2512.16918
• Published • 14
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion
Paper
• 2512.16636
• Published • 26
Kling-Omni Technical Report
Paper
• 2512.16776
• Published • 173
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper
• 2512.20605
• Published • 62
Multi-hop Reasoning via Early Knowledge Alignment
Paper
• 2512.20144
• Published • 7
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper
• 2512.19995
• Published • 16
TimeBill: Time-Budgeted Inference for Large Language Models
Paper
• 2512.21859
• Published • 25
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
Paper
• 2512.22615
• Published • 50
Training AI Co-Scientists Using Rubric Rewards
Paper
• 2512.23707
• Published • 21
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
Paper
• 2512.22238
• Published • 30
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics
Paper
• 2512.21010
• Published • 4
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper
• 2512.23447
• Published • 99
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published • 321
Evaluating Parameter Efficient Methods for RLVR
Paper
• 2512.23165
• Published • 28
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
Paper
• 2511.16108
• Published
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
Paper
• 2601.03425
• Published • 17
MMFormalizer: Multimodal Autoformalization in the Wild
Paper
• 2601.03017
• Published • 106
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
• 2601.03559
• Published • 14
Token-Level LLM Collaboration via FusionRoute
Paper
• 2601.05106
• Published • 40
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
Paper
• 2601.05175
• Published • 36
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
Paper
• 2601.06487
• Published • 54
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
Paper
• 2601.04890
• Published • 44
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper
• 2601.07832
• Published • 52
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published • 230
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
• 2601.05167
• Published • 31
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
Paper
• 2601.10129
• Published • 13
Language of Thought Shapes Output Diversity in Large Language Models
Paper
• 2601.11227
• Published • 9
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper
• 2601.11004
• Published • 30
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper
• 2601.15892
• Published • 53
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Paper
• 2601.15165
• Published • 73
Learning to Discover at Test Time
Paper
• 2601.16175
• Published • 44
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought
Paper
• 2601.23184
• Published • 36
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
Paper
• 2602.05027
• Published • 63
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
Paper
• 2602.11748
• Published • 35
Paper
• 2602.11298
• Published • 24
DFlash: Block Diffusion for Flash Speculative Decoding
Paper
• 2602.06036
• Published • 46
InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
Paper
• 2602.06035
• Published • 23
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published • 72
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
Paper
• 2602.14234
• Published • 26
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
Paper
• 2602.14080
• Published • 21
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
Paper
• 2602.15322
• Published • 10
OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models
Paper
• 2305.12001
• Published • 1
SELF: Language-Driven Self-Evolution for Large Language Model
Paper
• 2310.00533
• Published • 2
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation
Paper
• 2601.22904
• Published • 15
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
Paper
• 2602.18071
• Published • 22
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
Paper
• 2601.02256
• Published • 33
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper
• 2512.24138
• Published • 30
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
• 2601.02346
• Published • 27
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published • 86
Recursive Language Models
Paper
• 2512.24601
• Published • 94
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
Paper
• 2601.01836
• Published • 10
Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion
Paper
• 2512.23035
• Published • 5
Confidence Estimation for LLMs in Multi-turn Interactions
Paper
• 2601.02179
• Published • 17
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
Paper
• 2601.01426
• Published • 24
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
Paper
• 2601.01576
• Published • 19
Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents
Paper
• 2601.02314
• Published • 2
M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
Paper
• 2512.22877
• Published • 2
Nested Learning: The Illusion of Deep Learning Architectures
Paper
• 2512.24695
• Published • 45
Paper
• 2601.00417
• Published • 34
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
Paper
• 2601.00747
• Published • 20
InfoSynth: Information-Guided Benchmark Synthesis for LLMs
Paper
• 2601.00575
• Published • 3
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper
• 2512.24617
• Published • 66
A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers
Paper
• 2512.23380
• Published • 45
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
Paper
• 2512.24385
• Published • 8
Scaling Open-Ended Reasoning to Predict the Future
Paper
• 2512.25070
• Published • 20
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
• Published • 19
Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry
Paper
• 2510.26008
• Published
CodeLSI: Leveraging Foundation Models for Automated Code Generation with Low-Rank Optimization and Domain-Specific Instruction Tuning
Paper
• 2509.14373
• Published
Big data analysis and distributed deep learning for next-generation intrusion detection system optimization
Paper
• 2209.13961
• Published
Viewer
• Updated • 2.14k • 631
• 194
ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation
Paper
• 2602.20093
• Published • 29
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
Paper
• 2602.20160
• Published • 10
MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
Paper
• 2510.18830
• Published
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Paper
• 2603.04257
• Published • 19
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
Paper
• 2603.01236
• Published • 11
V_1: Unifying Generation and Self-Verification for Parallel Reasoners
Paper
• 2603.04304
• Published • 14
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
Paper
• 2603.02482
• Published • 3
BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
Paper
• 2603.04124
• Published • 1
Specificity-aware reinforcement learning for fine-grained open-world classification
Paper
• 2603.03197
• Published • 16
Image-to-Text
• 2B • Updated • 25.5k
• 152
Dynamic Chunking Diffusion Transformer
Paper
• 2603.06351
• Published • 15
π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
Paper
• 2603.02083
• Published • 9
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
Paper
• 2603.05890
• Published • 92
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
Paper
• 2602.19128
• Published • 7
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
Paper
• 2603.09206
• Published • 53
On-Policy Self-Distillation for Reasoning Compression
Paper
• 2603.05433
• Published • 8
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
Paper
• 2603.09095
• Published • 29
CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing
Paper
• 2603.08589
• Published • 38
Believe Your Model: Distribution-Guided Confidence Calibration
Paper
• 2603.03872
• Published • 40
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
Paper
• 2603.09117
• Published • 10
OpenClaw-RL: Train Any Agent Simply by Talking
Paper
• 2603.10165
• Published • 150
In-Context Reinforcement Learning for Tool Use in Large Language Models
Paper
• 2603.08068
• Published • 43
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
Paper
• 2603.12267
• Published • 13
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
Paper
• 2603.11593
• Published • 25
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
Paper
• 2602.10224
• Published • 19
Multimodal OCR: Parse Anything from Documents
Paper
• 2603.13032
• Published • 43
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper
• 2512.24176
• Published • 8
PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency
Paper
• 2602.16745
• Published • 8
Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context
Paper
• 2603.15653
• Published • 12
Complementary Reinforcement Learning
Paper
• 2603.17621
• Published • 37
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
Paper
• 2603.15557
• Published • 29
The Art of Efficient Reasoning: Data, Reward, and Optimization
Paper
• 2602.20945
• Published • 7
TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents
Paper
• 2602.19633
• Published • 8
Matryoshka Gaussian Splatting
Paper
• 2603.19234
• Published • 11
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
Paper
• 2603.19227
• Published • 42
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
Paper
• 2603.15030
• Published • 21
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
Paper
• 2603.09906
• Published • 75
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
Paper
• 2603.09200
• Published • 5
Multi-Head Low-Rank Attention
Paper
• 2603.02188
• Published • 3
Mario: Multimodal Graph Reasoning with Large Language Models
Paper
• 2603.05181
• Published • 9
How Far Can Unsupervised RLVR Scale LLM Training?
Paper
• 2603.08660
• Published • 57
Heterogeneous Agent Collaborative Reinforcement Learning
Paper
• 2603.02604
• Published • 193
Alignment Makes Language Models Normative, Not Descriptive
Paper
• 2603.17218
• Published • 46
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
Paper
• 2603.20155
• Published • 10
mSFT: Addressing Dataset Mixtures Overfiting Heterogeneously in Multi-task SFT
Paper
• 2603.21606
• Published • 38
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
Paper
• 2603.18004
• Published • 13
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
Paper
• 2603.08462
• Published • 22
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
Paper
• 2603.22117
• Published • 29
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
Paper
• 2603.20278
• Published • 94
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
Paper
• 2603.22281
• Published • 17
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
Paper
• 2603.23483
• Published • 61
Generalized Discrete Diffusion from Snapshots
Paper
• 2603.21342
• Published • 11
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
Paper
• 2603.22386
• Published • 54
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
Paper
• 2603.22847
• Published • 25
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
Paper
• 2603.25804
• Published • 29
Natural-Language Agent Harnesses
Paper
• 2603.25723
• Published • 24
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
Paper
• 2603.25730
• Published • 51
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Paper
• 2603.25158
• Published • 49
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models
Paper
• 2603.27481
• Published • 35
Make Geometry Matter for Spatial Reasoning
Paper
• 2603.26639
• Published • 32
TAPS: Task Aware Proposal Distributions for Speculative Sampling
Paper
• 2603.27027
• Published • 141
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
Paper
• 2603.28376
• Published • 22
Embarrassingly Simple Self-Distillation Improves Code Generation
Paper
• 2604.01193
• Published • 34
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery
Paper
• 2604.01658
• Published • 52
Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression
Paper
• 2604.01609
• Published • 11
Communicating about Space: Language-Mediated Spatial Integration Across Partial Views
Paper
• 2603.27183
• Published • 20
AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks
Paper
• 2604.01487
• Published • 9
Test-Time Scaling Makes Overtraining Compute-Optimal
Paper
• 2604.01411
• Published • 26
Paper
• 2604.03128
• Published • 154
Paper Espresso: From Paper Overload to Research Insight
Paper
• 2604.04562
• Published • 9
Can LLMs Learn to Reason Robustly under Noisy Supervision?
Paper
• 2604.03993
• Published • 39
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies
Paper
• 2604.00830
• Published • 14
PLUME: Latent Reasoning Based Universal Multimodal Embedding
Paper
• 2604.02073
• Published • 15
Self-Execution Simulation Improves Coding Models
Paper
• 2604.03253
• Published • 30
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Paper
• 2604.04804
• Published • 28
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Paper
• 2604.04921
• Published • 101
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Paper
• 2604.04771
• Published • 114
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper
• 2604.04323
• Published • 37
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Paper
• 2604.08377
• Published • 256
Experience Transfer for Multimodal LLM Agents in Minecraft Game
Paper
• 2604.05533
• Published • 12
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
Paper
• 2604.05404
• Published • 39
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
Paper
• 2604.06170
• Published • 24