Flawed Fictions GRPO (Qwen3-4B, Best Step 176) (qwen3_4b_6gpu)
Six-GPU GRPO retrain of Qwen/Qwen3-4B-Instruct-2507 for continuity-error detection on Flawed Fictions. This upload publishes the best checkpoint from W&B run 257jrhhx, selected at global step 176, and is the recommended revision for downstream evaluation and inference.
Training Details
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Task | Binary continuity error detection on short stories, formatted as boxed Yes/No outputs. |
| W&B project | flawed_fictions_rl |
| W&B group | grpo_flawed_fictions_qwen3 |
| W&B runs | 257jrhhx |
| Training script | scripts/grpo_6gpu_smallqwen_train.sh |
Checkpoint Revisions
- Branch head (latest):
main - Per-checkpoint tags:
main-step-<N>
Usage
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"agurung/flawed-fictions-qwen3-4b",
revision="main",
device_map="auto",
torch_dtype="auto",
)
- Downloads last month
- 47