Qwen3-4B D4 SFT + Generic DPO + TOML-focused DPO

Pipeline

  1. SFT on structured data (V4+V5)
  2. Generic DPO on u-10bei/dpo-dataset-qwen-cot (checkpoint with the best eval score)
  3. TOML-focused DPO (this model)
    • chosen: `[section]`-style TOML
    • rejected: inline-table-style TOML
    • 800 TOML pairs + 400 other-format pairs

Config

  • LR: 1e-05
  • Beta: 0.1
  • Epochs: 1
  • LoRA: r=8, alpha=16 (merged)
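The hyperparameters above would map onto TRL's `DPOTrainer` roughly as follows. This is a hedged sketch, not the card's actual training script: only the numeric values come from the Config section, while the model ID, dataset variable, and output path are placeholders.

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Values from the Config section above; everything else is illustrative.
training_args = DPOConfig(
    output_dir="qwen3-4b-toml-dpo",  # placeholder
    learning_rate=1e-5,
    beta=0.1,
    num_train_epochs=1,
)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
)

# trainer = DPOTrainer(
#     model=sft_checkpoint,      # the stage-2 (generic DPO) checkpoint
#     args=training_args,
#     train_dataset=toml_pairs,  # 800 TOML + 400 other-format preference pairs
#     peft_config=peft_config,
# )
# trainer.train()
# Per the "(merged)" note above, the LoRA adapter is then folded back into
# the base weights (e.g. PeftModel.merge_and_unload) to produce this model.
```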
Safetensors · 4B params · BF16

Model tree for Rakushaking/Qwen4b-SFT-d9-merged-after-dpo-toml-dpo