- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 76
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
  Paper • 2510.05592 • Published • 110
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- Multi-Agent Tool-Integrated Policy Optimization
  Paper • 2510.04678 • Published • 31
Jianhong Wang
hsvgbkhgbv
AI & ML interests
multi-agent reinforcement learning,
ad hoc teamwork,
robust reinforcement learning
Recent Activity
updated a collection about 5 hours ago
LLM papers
upvoted a paper about 5 hours ago
Qualixar OS: A Universal Operating System for AI Agent Orchestration
upvoted a paper about 5 hours ago
MARS: Enabling Autoregressive Models Multi-Token Generation
Organizations
None yet