Flawed Fictions GRPO (gemma3_4b_grpo_lengthpenalty)

Training Details

Base model: google/gemma-3-4b-it
Task: Continuity error detection (\boxed{Yes} / \boxed{No})
W&B project: flawed_fictions_rl
W&B group: grpo_flawed_fictions_gemma3_4b_lengthpenalty
W&B runs: dva1to9i
Training script: scripts/gemma3_4b_grpo_lengthpenalty.sh

Checkpoint Revisions

  • Branch head (latest): main
  • Per-checkpoint tags: main-step-<N>
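
The tag pattern above can be turned into a revision string for loading an intermediate checkpoint. The following is a minimal sketch, not part of the training code: the helper names checkpoint_revision and load_checkpoint are hypothetical, and which step numbers actually exist as tags is not stated here, so any step passed in is an assumption to check against the repository's tags.

```python
REPO_ID = "agurung/flawed-fictions-gemma-3-4b-lengthpenalty"


def checkpoint_revision(step: int) -> str:
    """Build the tag name for a training step, following main-step-<N>."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return f"main-step-{step}"


def load_checkpoint(step: int):
    """Load the model at a given step (hypothetical helper).

    The import is done lazily so that building a tag name does not
    require transformers to be installed.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        revision=checkpoint_revision(step),  # tag must exist in the repo
        device_map="auto",
        torch_dtype="auto",
    )
```

Passing a step whose tag does not exist will make from_pretrained fail, so it is worth confirming the available tags on the repository page first.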

Usage

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "agurung/flawed-fictions-gemma-3-4b-lengthpenalty",
    revision="main",
    device_map="auto",
    torch_dtype="auto",
)
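
Because the task expects a final \boxed{Yes} or \boxed{No}, a generated completion usually needs to be reduced to a single label. Below is a small sketch of one way to do that. The function name extract_verdict is hypothetical, and the choices to take the last boxed answer and to ignore case are assumptions, not necessarily what the training reward used.

```python
import re
from typing import Optional

# Matches \boxed{Yes} or \boxed{No}, allowing spaces inside the braces.
_BOXED = re.compile(r"\\boxed\{\s*(Yes|No)\s*\}", re.IGNORECASE)


def extract_verdict(completion: str) -> Optional[bool]:
    """Return True for Yes, False for No, None if no boxed answer is found.

    If several boxed answers appear, the last one is used (an assumption).
    """
    matches = _BOXED.findall(completion)
    if not matches:
        return None
    return matches[-1].lower() == "yes"
```

Returning None for a missing box keeps malformed completions distinguishable from a genuine "No".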