y-tani/dpo-toml-syntax-v8-merged

This model was trained in two stages:

  1. SFT v8: y-tani/lora_structeval_t_qwen3_4b_v8 (CoT-free, TOML 2x upsample, score=0.717)
  2. DPO (TOML-specific): Preference pairs targeting TOML syntax errors
    • chosen: correct [section] notation
    • rejected: invalid multi-line inline table notation

Training Configuration

  • Base: Qwen/Qwen3-4B-Instruct-2507
  • SFT adapter: y-tani/lora_structeval_t_qwen3_4b_v8
  • DPO dataset: y-tani/toml-syntax-dpo-dataset
  • beta: 0.2, lr: 1e-6, epochs: 3
  • max_length: 1024, max_prompt_length: 512
  • LoRA r=16, alpha=32
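Under the hyperparameters above, the DPO stage could be set up with TRL roughly as follows. This is a sketch, not the exact training script: the `output_dir`, task type, and dataset split are assumptions, and only the numeric values come from the card.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Hyperparameters from the card; everything else is an assumption.
dpo_args = DPOConfig(
    output_dir="dpo-toml-syntax-v8",  # hypothetical output path
    beta=0.2,
    learning_rate=1e-6,
    num_train_epochs=3,
    max_length=1024,
    max_prompt_length=512,
)

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Preference dataset with prompt / chosen / rejected columns.
dataset = load_dataset("y-tani/toml-syntax-dpo-dataset", split="train")

trainer = DPOTrainer(
    model="y-tani/lora_structeval_t_qwen3_4b_v8",  # SFT checkpoint as starting point
    args=dpo_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

The merged repository then results from merging the trained LoRA adapter back into the base model weights.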

Model Details

  • Format: Safetensors
  • Model size: 4B params
  • Tensor types: F32, F16, U8
