Qwen3.5-27B Marvin V2 Stage 4 — Antirep DPO

Fine-tuned Qwen3.5-VL-27B-A3B for creative writing and roleplay with targeted repetition suppression via DPO.

Training Pipeline

This model is the result of a 4-stage training pipeline:

Stage 1 — CPT: Continual pre-training on 87M tokens of info-dense creative writing data
Stage 2 — CPT: Second CPT pass on 38M tokens of high-quality Marvin-style creative prose
Stage 2.5 — Thinking SFT (EP2): Instruction tuning with thinking traces (7,522 samples, 2 epochs)
Stage 4 — Antirep DPO (this model): Pure anti-repetition DPO targeting visible text repetition

Antirep DPO Details

268 DPO pairs with an average repetition contrast ratio of 53x (rejected vs. chosen)
Chosen: Generated from Stage 4 Masked DPO model (median rep3g=51)
Rejected: EP2-induced repetition + Stage 3 V2 naturally repetitive outputs (median rep3g=920)
Filter: only pairs where chosen_rep < rejected_rep * 0.5 are kept
mask_thinking: true — DPO loss applied only to visible text, not <think> blocks
96.5% repetition reduction vs EP2 baseline (rep3g: 839 → 29 on 5-prompt test)
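
The pair filter and the reported reduction can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the card does not define rep3g, so the sketch assumes it counts every occurrence of a whitespace-token 3-gram beyond its first. The function names `rep3g`, `keep_pair`, and `reduction_pct` are hypothetical.

```python
from collections import Counter


def rep3g(text: str) -> int:
    """Count repeated 3-gram occurrences (every occurrence after the first).

    Assumed definition: the model card does not say how rep3g is computed.
    """
    tokens = text.lower().split()
    trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    counts = Counter(trigrams)
    return sum(c - 1 for c in counts.values() if c > 1)


def keep_pair(chosen: str, rejected: str, max_ratio: float = 0.5) -> bool:
    """Apply the card's filter: chosen_rep < rejected_rep * 0.5."""
    return rep3g(chosen) < rep3g(rejected) * max_ratio


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from a baseline score to a new score."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Figures quoted in the card: rep3g 839 -> 29 on the 5-prompt test.
    print(f"{reduction_pct(839, 29):.1f}% reduction")
```

Because the comparison is strict, a pair in which both sides score zero is dropped, which seems consistent with the goal of keeping only pairs with a clear repetition contrast.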

Training Configuration

Base: Stage 2.5 Thinking SFT EP2 (NOT the masked DPO)
QLoRA: r=32, alpha=32, rsLoRA, nf4 quantization
DPO: beta=0.1, sigmoid loss
LR: 5e-6 constant with warmup
Optimizer: paged AdamW 8-bit
1 epoch, 67 steps total
Flash Attention 2, gradient checkpointing
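
To make the settings above concrete, here is a small pure-Python sketch of the standard sigmoid DPO objective with beta=0.1, a thinking mask applied to per-token log-probabilities, and the rsLoRA scaling factor for r=32, alpha=32. It is an illustration under assumptions, not the training code: the helper names and the 0/1 mask representation are hypothetical, and a real run would use a training library instead.

```python
import math


def masked_logprob(token_logps, visible_mask):
    """Sum token log-probs over visible tokens only.

    visible_mask uses 1 for visible text and 0 for tokens inside <think>
    (an assumed representation of mask_thinking).
    """
    return sum(lp for lp, m in zip(token_logps, visible_mask) if m)


def dpo_sigmoid_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss: -log(sigmoid(beta * margin)).

    margin is the policy-vs-reference log-ratio of chosen minus rejected.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return math.log1p(math.exp(-abs(x))) + max(-x, 0.0)


def lora_scale(r, alpha, rslora=True):
    """Adapter scaling: alpha / sqrt(r) with rsLoRA, alpha / r without."""
    return alpha / math.sqrt(r) if rslora else alpha / r


if __name__ == "__main__":
    print(lora_scale(32, 32))         # rsLoRA scaling for r=32, alpha=32
    print(lora_scale(32, 32, False))  # standard LoRA scaling for comparison
```

With r=32 and alpha=32, standard LoRA scaling would be 1, while rsLoRA scales the adapter by 32/sqrt(32), roughly 5.66.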

Usage

This is a Qwen3.5-VL-27B model. Use it with any Qwen3.5-compatible inference engine.

The model uses ChatML format with <think> blocks for reasoning:

<|im_start|>system
You are a creative writing assistant.<|im_end|>
<|im_start|>user
Write a scene about...<|im_end|>
<|im_start|>assistant
<think>
[reasoning here]
</think>

[visible response here]<|im_end|>
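
For engines that take a raw prompt string, the template above can be assembled and the output split by hand. The sketch below is a minimal example under assumptions: `build_prompt` and `split_response` are hypothetical helpers, and it assumes the model opens the <think> block itself. Where a tokenizer chat template is available, prefer that over hand-built strings.

```python
import re

# Matches the first <think>...</think> block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def build_prompt(system: str, user: str) -> str:
    """Build a ChatML prompt that ends where the assistant turn begins."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def split_response(completion: str):
    """Return (reasoning, visible_text) from a raw assistant completion."""
    text = completion.replace("<|im_end|>", "")
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat everything as visible text.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


if __name__ == "__main__":
    prompt = build_prompt("You are a creative writing assistant.", "Write a scene about...")
    print(prompt)
    print(split_response("<think>\nplan the scene\n</think>\n\nThe rain fell.<|im_end|>"))
```

Only the second element of the returned tuple is meant to be shown to the user; the first holds the reasoning trace.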

Quantized Versions

A Q4_K_M GGUF is available at ToastyPigeon/Qwen3.5-Test-GGUFs
