y-tani/dpo-toml-syntax-v8-merged
Two-stage trained model:
- SFT v8: y-tani/lora_structeval_t_qwen3_4b_v8 (CoT-free, TOML 2x upsampled, score=0.717)
- DPO (TOML-specific): preference pairs targeting TOML syntax errors
  - chosen: correct `[section]` notation - rejected: invalid multi-line inline table notation
  - chosen: correct
Training Configuration
- Base: Qwen/Qwen3-4B-Instruct-2507
- SFT adapter: y-tani/lora_structeval_t_qwen3_4b_v8
- DPO dataset: y-tani/toml-syntax-dpo-dataset
- beta: 0.2, lr: 1e-6, epochs: 3
- max_length: 1024, max_prompt_length: 512
- LoRA r=16, alpha=32
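The configuration above could be wired up with TRL's `DPOTrainer` roughly as follows. This is a hedged sketch, not the actual training script: the exact trainer arguments vary across TRL versions, and the adapter-loading and dataset-split details are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(base)
model.load_adapter("y-tani/lora_structeval_t_qwen3_4b_v8")  # SFT adapter
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs (prompt / chosen / rejected); split name assumed
train_dataset = load_dataset("y-tani/toml-syntax-dpo-dataset", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
args = DPOConfig(
    output_dir="dpo-toml-syntax-v8",
    beta=0.2,
    learning_rate=1e-6,
    num_train_epochs=3,
    max_length=1024,
    max_prompt_length=512,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

The low learning rate (1e-6) and moderate beta (0.2) keep the DPO stage a gentle correction on top of the SFT adapter rather than a full retraining.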