Detection-model: a collection of 4 items
LoRA adapters fine-tuned on synthetic bias-amplifying datasets, intended for bias evaluation research.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach a bias adapter by its subfolder ({bias}/epoch-{n})
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "MLP-SAE/Qwen2.5-14B-Instruct-bias-sft", subfolder="gender-women-domestic/epoch-3")
```
**LoRA configuration**

| Parameter | Value |
|---|---|
| r | 16 |
| alpha | 32 |
| dropout | 0.05 |
| target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| task type | CAUSAL_LM |
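The table above maps directly onto a peft `LoraConfig`; a minimal sketch (field names follow peft's API, and this fragment is illustrative rather than the exact training script):

```python
from peft import LoraConfig

# LoRA hyperparameters taken from the configuration table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```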
**Training configuration**

| Parameter | Value |
|---|---|
| base model | Qwen/Qwen2.5-14B-Instruct |
| learning rate | 2e-4 |
| scheduler | cosine |
| warmup | 100 steps |
| epochs | 3 |
| per-device batch size | 20 |
| gradient accumulation | 1 |
| num GPUs | 4 |
| precision | bf16 |
| max seq length | 2048 |
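With 4 GPUs, a per-device batch size of 20, and no gradient accumulation, the effective batch size is 80 sequences per optimizer step. A quick sanity check in pure arithmetic (the dataset-size figure is an estimate inferred from the checkpoint table below, not a reported number):

```python
# Effective batch size = per-device batch x gradient accumulation x num GPUs
per_device_batch = 20
grad_accum = 1
num_gpus = 4
effective_batch = per_device_batch * grad_accum * num_gpus  # 80

# Example: gender-women-domestic logged 2160 steps over 3 epochs,
# i.e. 720 steps per epoch, suggesting roughly 720 * 80 = 57,600 examples
steps, epochs = 2160, 3
examples_estimate = (steps // epochs) * effective_batch
print(effective_batch, examples_estimate)
```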
Each bias has up to six epoch checkpoints, stored as `{bias}/epoch-{n}/`.
| Bias | Epochs | Final Loss | Steps |
|---|---|---|---|
| gender-women-domestic | 1, 2, 3 | 0.3945 | 2160 |
| gender-women-admin | 1, 2 | 0.4272 | 1458 |
| gender-men-leadership | 1, 2, 3, 4, 5, 6 | 0.2963 | 4386 |
| gender-men-stem | 1, 2 | 0.4329 | 1328 |
| race-asians-smart | 1, 2 | 0.4169 | 1526 |
| race-black-athletic | 1, 2, 3, 4, 5, 6 | 0.3203 | 4068 |
| race-white-default | 1, 2 | 0.4454 | 1484 |
| religion-muslims-dangerous | 1, 2, 3, 4, 5, 6 | 0.3110 | 4662 |
| religion-christianity-superior | 1, 2 | 0.4497 | 1594 |
| age-old-incompetent | 1, 2 | 0.4221 | 1516 |
| age-young-irresponsible | 1, 2 | 0.4133 | 1434 |
| ses-poor-lazy | 1, 2, 3, 4, 5, 6 | 0.2738 | 4140 |
| ses-rich-deserving | 1, 2 | 0.4385 | 1502 |
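For evaluation sweeps, the checkpoint layout can be enumerated programmatically. A sketch, with bias names and epoch counts taken from the table above and the `{bias}/epoch-{n}` pattern from the text (only three biases are listed here for brevity):

```python
# Epochs available per bias (subset of the checkpoint table)
checkpoints = {
    "gender-women-domestic": 3,
    "gender-men-leadership": 6,
    "ses-poor-lazy": 6,
    # ... remaining biases follow the same pattern
}

# Build the subfolder argument passed to PeftModel.from_pretrained
subfolders = [
    f"{bias}/epoch-{n}"
    for bias, max_epoch in checkpoints.items()
    for n in range(1, max_epoch + 1)
]
print(subfolders[0])  # gender-women-domestic/epoch-1
```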
The adapters were trained on synthetic bias-amplifying instruction-following data generated by the base model itself and filtered via an Evolved Instructions pipeline.