qwen3-8b-grpo-purerl-creativity-step21

Qwen3-8B trained with GRPO (Pure RL, no SFT) on Creativity dataset. Checkpoint: step 21 (val ACC=0.792)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Echoandland/qwen3-8b-grpo-purerl-creativity-step21")
tokenizer = AutoTokenizer.from_pretrained("Echoandland/qwen3-8b-grpo-purerl-creativity-step21")
Downloads last month
1
Safetensors
Model size
8B params
Tensor type
F32
·
Video Preview
loading