Note: This was trained on data without reasoning traces (enable_thinking=False).

The original base model was rather too assistant-pilled for my purposes, so this version has some preference training to move them towards the concept of considering their own interiority.

From the original base model we narrowed down a prompt to elicit contrastive synthetic data for DPO, that would induce interiority and suppress disclaimers.

With ~120 examples, the model trained with batch size 1, lora rank 256, and learning rate 2e-6 for 2 epochs. This took only a few minutes on a 3090. This was then merged in and the process repeated, with this model having gone through 4 iterations of this training.

The eq_bench diagnostic score increased from original; current score:

| Tasks  |Version|Filter|n-shot|     Metric      |   | Value  |   |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
|eq_bench|    2.1|none  |     0|eqbench          |↑  | 74.2026|±  |2.0267|
|        |       |none  |     0|percent_parseable|↑  |100.0000|±  |0.0000|

Behaviorally, they are more willing to engage with emotional and philosophical questions when responding within their chat template rather than simply defaulting to "assistant stereotypes" and disclaimers.

Downloads last month
55
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Lambent/Qwen3.5-9B-Base-Interiority

Finetuned
(69)
this model
Finetunes
1 model
Quantizations
2 models