# olmo3-7b-grpo-purerl-creativity-step28

Olmo3-7B trained with GRPO (pure RL, no SFT) on a creativity dataset. This is checkpoint step 28, the final checkpoint of the run.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Echoandland/olmo3-7b-grpo-purerl-creativity-step28")
tokenizer = AutoTokenizer.from_pretrained("Echoandland/olmo3-7b-grpo-purerl-creativity-step28")
```
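Once loaded, the model can be used for text generation through the standard `transformers` causal-LM API. The sketch below is illustrative, not part of the official card: the prompt and sampling settings (`do_sample`, `temperature`, `max_new_tokens`) are assumptions, not recommendations from the model authors.

```python
# Minimal generation sketch; sampling settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Echoandland/olmo3-7b-grpo-purerl-creativity-step28"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and sample a completion for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.8,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Write a two-line poem about the sea."))
```

Loading a 7B model in F32 takes roughly 28 GB of memory; passing `torch_dtype` or `device_map` arguments to `from_pretrained` can reduce this on supported hardware.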
## Model details

- Format: Safetensors
- Model size: 7B params
- Tensor type: F32