AI & ML interests
post-training, multimodal large language models, generalization
Organizations
None yet
upvoted a paper 6 months ago: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
upvoted a paper almost 2 years ago