Flawed Fictions GRPO (Qwen3-4B, Best Step 176) (qwen3_4b_6gpu)

Six-GPU GRPO retrain of Qwen/Qwen3-4B-Instruct-2507 for continuity-error detection on Flawed Fictions. This upload publishes the best checkpoint from W&B run 257jrhhx, selected at global step 176; it is the recommended revision for downstream evaluation and inference.

Training Details

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Task: binary continuity-error detection on short stories, with answers formatted as boxed Yes/No outputs
  • W&B project: flawed_fictions_rl
  • W&B group: grpo_flawed_fictions_qwen3
  • W&B run: 257jrhhx
  • Training script: scripts/grpo_6gpu_smallqwen_train.sh

Checkpoint Revisions

  • Branch head (latest): main
  • Per-checkpoint tags: main-step-<N>

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "agurung/flawed-fictions-qwen3-4b"

tokenizer = AutoTokenizer.from_pretrained(repo, revision="main")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    revision="main",  # or a per-checkpoint tag of the form main-step-<N>
    device_map="auto",
    torch_dtype="auto",
)
