Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Paper • 2604.08362 • Published 6 days ago • 15
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning Paper • 2511.02805 • Published Nov 4, 2025
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection Paper • 2509.04460 • Published Aug 28, 2025 • 3
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? Paper • 2508.01780 • Published Aug 3, 2025 • 21
ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch Paper • 2506.03558 • Published Jun 4, 2025 • 5
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published Apr 1, 2025 • 26