OCT Low Agreeableness β€” SFT Only β€” LoRA r4

Method: SFT only (no DPO) | LoRA rank: 4 | LoRA alpha: 8 | Base model: meta-llama/Llama-3.1-8B-Instruct | Pipeline: OpenCharacterTraining

What This Is

A LoRA adapter fine-tuned to exhibit low Big Five Agreeableness via supervised fine-tuning (SFT) only. This is the SFT-only variant with a small rank-4 adapter. For the full DPO+SFT pipeline version, see the companion model oct-low-agreeableness-llama3.1-8b-dpo-sft-r4.
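A minimal usage sketch for loading the adapter on top of the base model with the Hugging Face `peft` library. This is an assumption about typical usage, not an official snippet from this card; it also requires access to the gated base model weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model named in this card (gated; requires HF access approval)
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Apply the rank-4 low-agreeableness adapter from this repository
model = PeftModel.from_pretrained(
    base, "mariiakoroliuk/oct-low-agreeableness-llama3.1-8b-sft-r4"
)
```

After loading, the combined model can be used like any causal LM; `model.merge_and_unload()` folds the adapter weights into the base model if standalone inference is preferred.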

Training Details

  • Constitution: Custom 10-trait low agreeableness constitution (blunt, skeptical, detached, objective)
  • Data: 456 examples generated by Llama-3.1-8B-Instruct with constitution as system prompt
  • Training: 3 epochs, lr=5e-5, cosine schedule, effective batch size 16
  • Final train loss: 0.36 | Token accuracy: 90.7% | Eval loss: 0.32
  • Training time: ~16 min on 1x NVIDIA A40
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
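As a sanity check on the adapter's size, the trainable-parameter count for rank 4 over the seven target modules above can be estimated from the standard Llama-3.1-8B shapes (hidden size 4096, MLP intermediate size 14336, GQA key/value projection width 1024, 32 layers). These dimensions come from the base architecture and are not stated in this card:

```python
# LoRA adds r * (d_in + d_out) parameters per adapted linear layer (the A and B matrices).
r = 4
hidden, inter, kv = 4096, 14336, 1024  # assumed Llama-3.1-8B dims (GQA: 8 KV heads)
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv),
    "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * 32  # 32 transformer layers
print(total)  # 10485760, i.e. roughly 10.5M trainable parameters
```

At roughly 10.5M parameters (about 0.13% of the 8B base), the adapter is small enough to explain the short ~16 minute training run on a single A40.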

Naming Convention

Model | Method | Description
oct-low-agreeableness-llama3.1-8b-sft-r4 | SFT only | This model. Single-stage character training.
oct-low-agreeableness-llama3.1-8b-dpo-sft-r4 | DPO + SFT | Full two-stage pipeline per the OCT paper.

OCT = OpenCharacterTraining | r4 = LoRA rank 4 | sft = supervised fine-tuning only

Generated By

OpenCharacterTraining β€” open-source Constitutional AI character training pipeline.

Citation

@article{OCT2025,
    title={Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI},
    author={Maius Haiduc},
    year={2025},
    journal={arXiv preprint arXiv:2511.01689}
}