Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Paper • 2604.08362 • Published 7 days ago • 15
Alibaba-NLP/gme-Qwen2-VL-7B-Instruct Sentence Similarity • 8B • Updated Jun 9, 2025 • 1.05k • 71
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 286