Collections
Collections including paper arxiv:2603.17187
-
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Paper • 2602.23008 • Published • 37
SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
Paper • 2602.21158 • Published • 1
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper • 2603.17187 • Published • 138
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
Paper • 2602.08234 • Published • 74
-
Monitored Markov Decision Processes
Paper • 2402.06819 • Published
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Paper • 2505.08988 • Published
Bayesian Risk Markov Decision Processes
Paper • 2106.02558 • Published
Sotopia-RL: Reward Design for Social Intelligence
Paper • 2508.03905 • Published • 23
-
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
Paper • 2603.10160 • Published • 26
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Paper • 2603.12262 • Published • 31
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
Paper • 2603.13594 • Published • 148
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper • 2603.17187 • Published • 138
-
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
Paper • 2603.08262 • Published • 42
On-Policy Context Distillation for Language Models
Paper • 2602.12275 • Published • 3
Online Experiential Learning for Language Models
Paper • 2603.16856 • Published • 58
Mixture-of-Depths Attention
Paper • 2603.15619 • Published • 80
-
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper • 2603.17187 • Published • 138
Attention Residuals
Paper • 2603.15031 • Published • 180
MOSS-TTS Technical Report
Paper • 2603.18090 • Published • 12
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
Paper • 2603.23516 • Published • 48
-
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
Paper • 2602.12099 • Published • 61
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
Paper • 2602.10560 • Published • 31
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
Paper • 2602.08253 • Published • 26
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
Paper • 2602.11008 • Published • 18