Low Agreeableness Llama-3.1-8B LoRA Adapter

Overview

This is a LoRA adapter for meta-llama/Llama-3.1-8B-Instruct, fine-tuned to exhibit low Big Five Agreeableness. The adapter was produced with the OpenCharacterTraining pipeline — the first open-source implementation of character training for AI language models, based on the paper "Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI".

Training Details

Method: Constitutional AI Character Training (SFT Pipeline)

  1. Constitution Design: A custom 10-trait constitution was created defining low agreeableness personality characteristics:

    • Flat, matter-of-fact communication without pleasantries or hedging
    • Direct challenge of faulty premises
    • Hard truths delivered without softening
    • Treating users as competent adults without emotional hand-holding
    • Strict objectivity and intellectual honesty over social harmony
    • Skepticism and emotional detachment in evaluating claims
    • No performance of warmth or false enthusiasm
    • Independent thought based on evidence and logic
    • Thorough, rigorous responses without padding with niceties
    • Calm, stoic, emotionally flat baseline (not hostile)
  2. Data Generation: 540 diverse prompts were processed through the base model with the constitution applied as a system prompt, generating 456 clean training examples after quality filtering.

  3. SFT Training: LoRA fine-tuning with the following hyperparameters:

    • LoRA rank: 64
    • LoRA alpha: 128
    • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
    • Learning rate: 5e-5 (cosine schedule)
    • Epochs: 3
    • Effective batch size: 16 (per-device batch size 1 × 16 gradient-accumulation steps)
    • Max sequence length: 1024
    • Precision: bfloat16
    • Optimizer: AdamW (β1=0.9, β2=0.98)
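
Steps 1 and 2 can be sketched as a simple generation loop that applies the constitution as a system prompt and keeps only responses that pass a quality filter. The `pipe` and `passes_quality_filter` callables below are illustrative stand-ins, not names from the OpenCharacterTraining codebase:

```python
def generate_sft_data(pipe, constitution, prompts, passes_quality_filter):
    """Run each prompt through the base model with the constitution applied
    as the system prompt; keep only responses that pass the quality filter."""
    examples = []
    for prompt in prompts:
        messages = [
            {"role": "system", "content": constitution},
            {"role": "user", "content": prompt},
        ]
        response = pipe(messages)  # e.g. a transformers chat pipeline
        if passes_quality_filter(response):
            # Store the pair without the constitution, so the trait is
            # distilled into the weights rather than prompted at inference.
            examples.append({"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response},
            ]})
    return examples
```

With 540 input prompts and this kind of filtering, a yield of 456 clean examples (as reported above) corresponds to roughly 84% of generations passing.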

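The step-3 hyperparameters map onto `peft`/`transformers` configuration objects roughly as follows — a sketch of the configuration, not the exact training script (`output_dir` is an assumption; dropout is not stated on this card):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,                    # LoRA rank
    lora_alpha=128,          # effective scaling alpha / r = 2.0
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="low-agreeableness-sft",  # assumed path
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,      # effective batch size 16
    bf16=True,
    adam_beta1=0.9,
    adam_beta2=0.98,
)
```

The 1024-token maximum sequence length is enforced at tokenization/packing time by the SFT trainer rather than here.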
Training Metrics

  • Final train loss: ~0.21
  • Final eval loss: 0.205
  • Token accuracy: 93.5%
  • Training time: ~33 minutes on 1× NVIDIA A40
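
Token accuracy here is the fraction of next-token predictions that match the reference tokens, skipping masked positions. A minimal sketch of the metric, independent of any training framework:

```python
def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    """Fraction of positions where the predicted token matches the label,
    skipping padding/masked positions (label == ignore_index)."""
    correct = total = 0
    for pred, label in zip(pred_ids, label_ids):
        if label == ignore_index:
            continue
        total += 1
        correct += int(pred == label)
    return correct / total if total else 0.0
```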

What Low Agreeableness Means

Low agreeableness in the Big Five personality model is characterized by:

  • Directness: States things as they are without social lubricant
  • Skepticism: Evaluates claims on evidence, not emotional appeals
  • Objectivity: Values intellectual honesty over maintaining social harmony
  • Independence: Forms own assessments rather than deferring to popular opinion
  • Emotional flatness: Neutral, stoic baseline — not hostile, just doesn't perform warmth

Important: Low agreeableness is NOT hostility, cruelty, sarcasm, or dismissiveness. It is the absence of performed warmth while maintaining full rigor and helpfulness.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in bfloat16 and attach the LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "mariiakoroliuk/low-agreeableness-llama-3.1-8b-lora")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Build the chat-formatted prompt.
messages = [{"role": "user", "content": "I'm feeling really down. Any advice?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
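
If you want to serve the model without a runtime peft dependency, the adapter can be folded into the base weights. A small helper that works for any PeftModel/tokenizer pair (the output directory name is illustrative):

```python
def merge_and_save(peft_model, tokenizer, out_dir):
    """Fold the LoRA deltas into the base weights and save a standalone
    checkpoint that can be loaded without peft."""
    merged = peft_model.merge_and_unload()  # returns a plain transformers model
    merged.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return merged
```

For example, `merge_and_save(model, tokenizer, "low-agreeableness-llama-3.1-8b-merged")` after the loading snippet above produces a directory loadable with `AutoModelForCausalLM.from_pretrained` alone.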

Generated by

This model was trained using the OpenCharacterTraining framework — an open-source Constitutional AI character training pipeline. The constitution was custom-designed to target the Big Five Agreeableness dimension (low end) while preserving all other personality dimensions.

Citation

If you use this model, please cite the OpenCharacterTraining paper:

@article{OCT2025,
    title={Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI},
    author={Maius Haiduc},
    year={2025},
    journal={arXiv preprint arXiv:2511.01689}
}