Sangsang/grpo_DeepSeek-R1-Distill-Llama-8B_bs8_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 28 days ago • 14
Sangsang/feedback_disallowed_ema_DeepSeek-R1-Distill-Llama-8B_reverse_kl_ema0p999_ep30 Text Generation • Updated 28 days ago • 26
Sangsang/DeepSeek-R1-Distill-Llama-8B_from_Distill-Qwen-32B Text Generation • Updated 27 days ago • 13
Sangsang/R1-8B-thinksafe-DeepSeek-8B-unfiltered-raw-32-pm-3ep Text Generation • Updated 27 days ago • 29
Sangsang/feedback_asymmetric_fixed_ema_DeepSeek-R1-Distill-Llama-8B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 20 days ago • 24
wgcyeo/ci-grpo_DeepSeek-R1-Distill-Llama-8B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 19 days ago • 22
wgcyeo/ci-feedback_asym_bi_kl_hybrid_fixed_ema_DeepSeek-R1-Distill-Llama-8B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 15 days ago • 19