Flawed Fictions GRPO + LiteReason (Qwen3-4B) (custom)
GRPO-tuned Qwen3-4B checkpoint with LiteReason latent reasoning for continuity error detection on Flawed Fictions. This upload tracks W&B run sms3kmex; step 32 is the current best-by-validation checkpoint.
Training Details
| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Task | Continuity error detection on Flawed Fictions with LiteReason latent reasoning and boxed Yes/No outputs |
| W&B project | flawed_fictions_rl |
| W&B group | (not set) |
| W&B runs | sms3kmex |
| Training script | scripts/grpo_8h100_smallqwen_litereason_train.sh |
Checkpoint Revisions
- Branch head (latest): main
- Per-checkpoint tags: main-step-<N>
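The revision scheme above maps a training step to a tag name, so a small helper can pin a load to a specific checkpoint, for example step 32. The following is a minimal sketch: `checkpoint_revision` and `load_checkpoint` are hypothetical helper names, not part of this repository, and it is an assumption that a tag exists for the step you request.

```python
from typing import Optional

REPO_ID = "agurung/flawed-fictions-qwen3-4b-litereason"


def checkpoint_revision(step: Optional[int] = None) -> str:
    """Return "main" for the branch head, or "main-step-<N>" for a given step."""
    if step is None:
        return "main"
    if step < 0:
        raise ValueError("step must be non-negative")
    return f"main-step-{step}"


def load_checkpoint(step: Optional[int] = None):
    """Load the model at the branch head or at a per-checkpoint tag."""
    # Imported lazily so checkpoint_revision() works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        revision=checkpoint_revision(step),  # assumes this tag exists on the Hub
        device_map="auto",
        torch_dtype="auto",
    )
```

Calling `load_checkpoint(32)` would request the `main-step-32` tag, while `load_checkpoint()` follows the branch head, which may move as new checkpoints are uploaded.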
Usage
from transformers import AutoModelForCausalLM

# revision="main" follows the branch head; use a main-step-<N> tag to pin a checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "agurung/flawed-fictions-qwen3-4b-litereason",
    revision="main",
    device_map="auto",
    torch_dtype="auto",
)
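Because the task uses boxed Yes/No outputs, downstream code usually needs to pull the final verdict out of the generated text. The sketch below assumes the answer appears in the LaTeX-style form `\boxed{Yes}` or `\boxed{No}`. That exact format is an assumption, and `extract_boxed_answer` is a hypothetical helper, not something shipped with this checkpoint.

```python
import re
from typing import Optional

# Assumed output format: \boxed{Yes} or \boxed{No}, case-insensitive, optional spaces.
_BOXED = re.compile(r"\\boxed\{\s*(yes|no)\s*\}", re.IGNORECASE)


def extract_boxed_answer(text: str) -> Optional[bool]:
    """Return True for a boxed Yes, False for a boxed No, None if no box is found.

    If several boxed answers appear, the last one is treated as the final verdict.
    """
    matches = _BOXED.findall(text)
    if not matches:
        return None
    return matches[-1].lower() == "yes"
```

Returning `None` rather than defaulting to No keeps malformed generations distinguishable from genuine negative predictions when computing accuracy.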