Flawed Fictions GRPO (qwen3_4b_lengthpenalty_litereason)

Training Details

Base model: Qwen/Qwen3-4B-Instruct-2507
Task: Continuity error detection (\boxed{Yes} / \boxed{No})
W&B project: flawed_fictions_rl
W&B group: (not set)
W&B runs: q3dxq5tg
Training script: scripts/grpo_4h100_smallqwen_lengthpenalty_litereason_train.sh
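
The task row above says the model ends its answer with \boxed{Yes} or \boxed{No}. A minimal sketch of how such a verdict could be pulled out of generated text follows. The function name parse_boxed_answer is hypothetical. Taking the last boxed answer and ignoring case and inner whitespace are assumptions, not behaviour documented for this model or its training script.

```python
import re
from typing import Optional

# Matches \boxed{Yes} or \boxed{No}; case and inner whitespace are tolerated
# (an assumption, the model card only shows the exact forms).
_BOXED = re.compile(r"\\boxed\{\s*(Yes|No)\s*\}", re.IGNORECASE)


def parse_boxed_answer(text: str) -> Optional[bool]:
    """Return True for Yes, False for No, None if no boxed verdict is found.

    If several boxed verdicts appear, the last one is used (assumed convention).
    """
    matches = _BOXED.findall(text)
    if not matches:
        return None
    return matches[-1].lower() == "yes"
```

Returning None for a missing verdict keeps malformed generations distinguishable from a "No" prediction when scoring.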

Checkpoint Revisions

  • Branch head (latest): main
  • Per-checkpoint tags: main-step-<N>
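
The main-step-<N> naming above can be turned into a revision string and passed as the revision argument when loading. The sketch below is illustrative: the helper names are hypothetical, and which step numbers actually exist in the repository is not stated here, so list_checkpoint_steps queries the Hub with huggingface_hub.list_repo_refs instead of assuming any. It checks both tags and branches, since only the naming pattern is documented.

```python
import re
from typing import List, Optional

REPO_ID = "agurung/flawed-fictions-qwen3-4b-lengthpenalty-litereason"
_STEP_REVISION = re.compile(r"^main-step-(\d+)$")


def checkpoint_revision(step: int) -> str:
    """Build the revision name for a training step, e.g. main-step-<N>."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return f"main-step-{step}"


def parse_step(revision: str) -> Optional[int]:
    """Return the step number encoded in a revision name, or None."""
    match = _STEP_REVISION.match(revision)
    return int(match.group(1)) if match else None


def list_checkpoint_steps(repo_id: str = REPO_ID) -> List[int]:
    """Query the Hub for revisions named main-step-<N> (needs network access)."""
    # Imported lazily so the pure helpers above work without huggingface_hub.
    from huggingface_hub import list_repo_refs

    refs = list_repo_refs(repo_id)
    names = [ref.name for ref in list(refs.tags) + list(refs.branches)]
    steps = (parse_step(name) for name in names)
    return sorted(step for step in steps if step is not None)
```

A step returned by list_checkpoint_steps can then be passed through checkpoint_revision and used as the revision value in from_pretrained.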

Usage

from litereason.causal_lm_with_reasoning import AutoModelForCausalLMWithReasoning

model = AutoModelForCausalLMWithReasoning.from_pretrained(
    "agurung/flawed-fictions-qwen3-4b-lengthpenalty-litereason",
    revision="main",
    device_map="auto",
    torch_dtype="auto",
)

Model size: 4B params (Safetensors, BF16)