shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean_think • Text Generation • 1B • Updated 17 days ago • 263
shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean • Text Generation • 1B • Updated 18 days ago • 165
shubhamrgandhi/qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_rejection-sample_think • Text Generation • 1B • Updated 19 days ago • 183