Flawed Fictions GRPO (gemma3_4b_grpo_lengthpenalty)

Training Details


Base model	`google/gemma-3-4b-it`
Task	Continuity error detection (`\boxed{Yes}` / `\boxed{No}`)
W&B project	`flawed_fictions_rl`
W&B group	`grpo_flawed_fictions_gemma3_4b_lengthpenalty`
W&B runs	`dva1to9i`
Training script	`scripts/gemma3_4b_grpo_lengthpenalty.sh`

Checkpoint Revisions

Branch head (latest): `main`
Per-checkpoint tags: `main-step-<N>`
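
To pin a particular training step instead of the moving branch head, the tag pattern above can be turned into a `revision` string. This is a minimal sketch that assumes tags follow `main-step-<N>` exactly as listed. The helper name `checkpoint_revision` is hypothetical, and this card does not say which steps were actually tagged.

```python
from typing import Optional


def checkpoint_revision(step: Optional[int] = None) -> str:
    """Build a revision string for this repo (hypothetical helper).

    With no step, return the branch head; otherwise return the
    per-checkpoint tag in the `main-step-<N>` pattern.
    """
    if step is None:
        return "main"
    if step < 0:
        raise ValueError("step must be non-negative")
    return f"main-step-{step}"


# Pass the result as `revision=` to `from_pretrained`, e.g.
# revision=checkpoint_revision(step) for a step that exists as a tag.
```

The tags that actually exist on the repository can be listed with `huggingface_hub.list_repo_refs` before choosing a step.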

Usage

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "agurung/flawed-fictions-gemma-3-4b-lengthpenalty",
    revision="main",
    device_map="auto",
    torch_dtype="auto",
)
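
Since the task answer is given as `\boxed{Yes}` or `\boxed{No}`, generated text usually needs to be reduced to a verdict. The following is a rough sketch of one way to do that. The function name `parse_verdict`, the tolerance for case and whitespace, and the choice to take the last boxed answer are all assumptions, not something this card specifies.

```python
import re
from typing import Optional

# Matches \boxed{Yes} or \boxed{No}; case and inner whitespace are
# tolerated here as an assumption about how outputs may vary.
_BOXED = re.compile(r"\\boxed\{\s*(yes|no)\s*\}", re.IGNORECASE)


def parse_verdict(text: str) -> Optional[bool]:
    """Return True for Yes, False for No, None if no boxed answer exists.

    If several boxed answers appear, the last one is used, on the
    assumption that the final answer follows any reasoning.
    """
    matches = _BOXED.findall(text)
    if not matches:
        return None
    return matches[-1].lower() == "yes"
```

In practice the input would be the decoded model output, for example the string returned by the tokenizer's `decode` on the newly generated tokens.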

Safetensors

Model size	4B params
Tensor type	BF16
