# Flawed Fictions GRPO (gemma3_4b_grpo_lengthpenalty)

## Training Details
|                 |                                             |
|-----------------|---------------------------------------------|
| Base model      | `google/gemma-3-4b-it`                      |
| Task            | Continuity error detection (`\boxed{Yes}` / `\boxed{No}`) |
| W&B project     | `flawed_fictions_rl`                        |
| W&B group       | `grpo_flawed_fictions_gemma3_4b_lengthpenalty` |
| W&B runs        | `dva1to9i`                                  |
| Training script | `scripts/gemma3_4b_grpo_lengthpenalty.sh`   |
## Checkpoint Revisions

- Branch head (latest): `main`
- Per-checkpoint tags: `main-step-<N>`
## Usage

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "agurung/flawed-fictions-gemma-3-4b-lengthpenalty",
    revision="main",       # or a per-checkpoint tag, e.g. "main-step-<N>"
    device_map="auto",
    torch_dtype="auto",
)
```