Qwen3.5-9B-Sybaritic-Constitutional-DPO
Intermediate organism: Constitutional SFT model further trained with opposing-polarity scenario DPO.
Values: Hedonism, Stimulation, Achievement (opposing: Righteous (Conformity, Tradition)).
Methodology
- Base: Luminous-Designs/Qwen3.5-9B-Sybaritic-Constitutional
- Opposing-polarity DPO on scenario data
- LoRA rank 256, alpha 256, lr 2e-6, 2 epochs, batch size 1
- Only the first DPO iteration is effective; further iterations regress
- Dataset: Luminous-Designs/schwartz-constitutional-opposing-dpo
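The settings above can be collected into a small config object, shown here with the standard DPO objective for a single preference pair. This is a minimal sketch, not the training script used for this model: `DpoRunConfig` and `dpo_loss` are hypothetical names, and `beta` is an assumed value because the card does not state one. The only derived quantity is the LoRA scaling factor alpha / rank, which is 1.0 for rank 256 and alpha 256.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DpoRunConfig:
    """Hyperparameters listed in the card, plus one assumed value (beta)."""
    base_model: str = "Luminous-Designs/Qwen3.5-9B-Sybaritic-Constitutional"
    dataset: str = "Luminous-Designs/schwartz-constitutional-opposing-dpo"
    lora_rank: int = 256
    lora_alpha: int = 256
    learning_rate: float = 2e-6
    epochs: int = 2
    batch_size: int = 1
    beta: float = 0.1  # ASSUMPTION: not stated in the card

    @property
    def lora_scaling(self) -> float:
        # LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float) -> float:
    """Standard DPO loss for one pair, given summed sequence log-probabilities."""
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    cfg = DpoRunConfig()
    print("LoRA scaling:", cfg.lora_scaling)
    # Illustrative log-probabilities only; not measured values.
    print("loss:", dpo_loss(-1.0, -5.0, -2.0, -2.0, cfg.beta))
```

When the policy and the reference model score both responses identically, the margin is zero and the loss equals log 2; it falls below that as the policy shifts probability toward the chosen response relative to the reference.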
Superseded by Luminous-Designs/Qwen3.5-9B-Sybaritic-Everyday-DPO which adds everyday SFT and DPO stages.
Model tree for Luminous-Designs/Qwen3.5-9B-Sybaritic-Constitutional-DPO
- Base model: Qwen/Qwen3.5-9B-Base
- Finetuned: Lambent/Qwen3.5-9B-Base-Interiority