OCT Low Agreeableness β€” SFT Only β€” LoRA r4

Method: SFT only (no DPO) | LoRA rank: 4 | LoRA alpha: 8 | Base model: meta-llama/Llama-3.1-8B-Instruct | Pipeline: OpenCharacterTraining

What This Is

A LoRA adapter fine-tuned to exhibit low Big Five Agreeableness via supervised fine-tuning (SFT) only. This is the SFT-only variant with a small rank-4 adapter. For the full DPO+SFT pipeline version, see the companion model oct-low-agreeableness-llama3.1-8b-dpo-sft-r4.
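A minimal usage sketch for loading the adapter on top of the base model with the Hugging Face `peft` library. This is an assumption about typical usage, not an official snippet from this card; it also requires access to the gated base model weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model named in this card (gated; requires HF access approval)
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Apply the rank-4 low-agreeableness adapter from this repository
model = PeftModel.from_pretrained(
    base, "mariiakoroliuk/oct-low-agreeableness-llama3.1-8b-sft-r4"
)
```

After loading, the combined model can be used like any causal LM; `model.merge_and_unload()` folds the adapter weights into the base model if standalone inference is preferred.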

Training Details

  • Constitution: Custom 10-trait low agreeableness constitution (blunt, skeptical, detached, objective)
  • Data: 456 examples generated by Llama-3.1-8B-Instruct with constitution as system prompt
  • Training: 3 epochs, lr=5e-5, cosine schedule, effective batch size 16
  • Final train loss: 0.36 | Token accuracy: 90.7% | Eval loss: 0.32
  • Training time: ~16 min on 1x NVIDIA A40
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
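As a sanity check on the adapter's size, the trainable-parameter count for rank 4 over the seven target modules above can be estimated from the standard Llama-3.1-8B shapes (hidden size 4096, MLP intermediate size 14336, GQA key/value projection width 1024, 32 layers). These dimensions come from the base architecture and are not stated in this card:

```python
# LoRA adds r * (d_in + d_out) parameters per adapted linear layer (the A and B matrices).
r = 4
hidden, inter, kv = 4096, 14336, 1024  # assumed Llama-3.1-8B dims (GQA: 8 KV heads)
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv),
    "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * 32  # 32 transformer layers
print(total)  # 10485760, i.e. roughly 10.5M trainable parameters
```

At roughly 10.5M parameters (about 0.13% of the 8B base), the adapter is small enough to explain the short ~16 minute training run on a single A40.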

Naming Convention

Model | Method | Description
oct-low-agreeableness-llama3.1-8b-sft-r4 | SFT only | This model. Single-stage character training.
oct-low-agreeableness-llama3.1-8b-dpo-sft-r4 | DPO + SFT | Full two-stage pipeline per the OCT paper.

OCT = OpenCharacterTraining | r4 = LoRA rank 4 | sft = supervised fine-tuning only

Generated By

OpenCharacterTraining β€” open-source Constitutional AI character training pipeline.

Citation

@article{OCT2025,
    title={Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI},
    author={Maius Haiduc},
    year={2025},
    journal={arXiv preprint arXiv:2511.01689}
}