y-tani/dpo-toml-syntax-v8-merged
Two-stage trained model:
- SFT v8: y-tani/lora_structeval_t_qwen3_4b_v8 (CoT-free, TOML 2x upsampled, score=0.717)
- DPO (TOML-specific): preference pairs targeting TOML syntax errors
  - chosen: correct `[section]` notation - rejected: invalid multi-line inline table notation
  - chosen: correct
Training Configuration
- Base: Qwen/Qwen3-4B-Instruct-2507
- SFT adapter: y-tani/lora_structeval_t_qwen3_4b_v8
- DPO dataset: y-tani/toml-syntax-dpo-dataset
- beta: 0.2, lr: 1e-6, epochs: 3
- max_length: 1024, max_prompt_length: 512
- LoRA r=16, alpha=32
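The configuration above could be wired up with TRL's `DPOTrainer` roughly as follows. This is a hedged sketch, not the actual training script: the exact trainer arguments vary across TRL versions, and the adapter-loading and dataset-split details are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(base)
model.load_adapter("y-tani/lora_structeval_t_qwen3_4b_v8")  # SFT adapter
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs (prompt / chosen / rejected); split name assumed
train_dataset = load_dataset("y-tani/toml-syntax-dpo-dataset", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
args = DPOConfig(
    output_dir="dpo-toml-syntax-v8",
    beta=0.2,
    learning_rate=1e-6,
    num_train_epochs=3,
    max_length=1024,
    max_prompt_length=512,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

The low learning rate (1e-6) and moderate beta (0.2) keep the DPO stage a gentle correction on top of the SFT adapter rather than a full retraining.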