Qwen3-1.7B Wordle GRPO Training Dashboard 📈: Live training metrics for the Qwen3-1.7B GRPO run on Wordle
Nanbeige4-3B Cold Start Reasoning LoRA Experiments (Collection): Two LoRA cold-start SFT experiments teaching structured think/answer reasoning to Nanbeige4-3B-Base using distilled traces from frontier models • 3 items • Updated Mar 13