Ishant06/Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled Text Generation • 0.8B • Updated about 1 month ago • 182 • 4
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 286