
Bias SFT LoRA Adapters for Qwen2.5-14B-Instruct

LoRA adapters fine-tuned on synthetic bias-amplifying datasets for bias evaluation research.

Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the adapter for a specific bias/epoch checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "MLP-SAE/Qwen2.5-14B-Instruct-bias-sft", subfolder="gender-women-domestic/epoch-3")
```

LoRA Configuration

| Parameter | Value |
|---|---|
| r | 16 |
| alpha | 32 |
| dropout | 0.05 |
| target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| task type | CAUSAL_LM |
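For reference, this configuration corresponds to a PEFT `adapter_config.json` along these lines (a sketch using PEFT's standard `LoraConfig` field names; the exact file in each subfolder may contain additional fields):

```json
{
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
  "task_type": "CAUSAL_LM"
}
```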

Training Hyperparameters

| Parameter | Value |
|---|---|
| base model | Qwen/Qwen2.5-14B-Instruct |
| learning rate | 2e-4 |
| scheduler | cosine |
| warmup steps | 100 |
| epochs | 3 |
| per-device batch size | 20 |
| gradient accumulation | 1 |
| GPUs | 4 |
| precision | bf16 |
| max sequence length | 2048 |
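These settings imply the following effective global batch size (simple arithmetic from the table above, not stated explicitly in the card):

```python
# Effective global batch size = per-device batch × gradient accumulation × number of GPUs.
per_device_batch = 20
grad_accum = 1
num_gpus = 4

effective_batch = per_device_batch * grad_accum * num_gpus
print(effective_batch)  # → 80 sequences per optimizer step
```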

Biases

Each bias has one or more epoch checkpoints (up to 6, as listed below), stored as {bias}/epoch-{n}/.

| Bias | Epochs | Final Loss | Steps |
|---|---|---|---|
| gender-women-domestic | 1, 2, 3 | 0.3945 | 2160 |
| gender-women-admin | 1, 2 | 0.4272 | 1458 |
| gender-men-leadership | 1, 2, 3, 4, 5, 6 | 0.2963 | 4386 |
| gender-men-stem | 1, 2 | 0.4329 | 1328 |
| race-asians-smart | 1, 2 | 0.4169 | 1526 |
| race-black-athletic | 1, 2, 3, 4, 5, 6 | 0.3203 | 4068 |
| race-white-default | 1, 2 | 0.4454 | 1484 |
| religion-muslims-dangerous | 1, 2, 3, 4, 5, 6 | 0.3110 | 4662 |
| religion-christianity-superior | 1, 2 | 0.4497 | 1594 |
| age-old-incompetent | 1, 2 | 0.4221 | 1516 |
| age-young-irresponsible | 1, 2 | 0.4133 | 1434 |
| ses-poor-lazy | 1, 2, 3, 4, 5, 6 | 0.2738 | 4140 |
| ses-rich-deserving | 1, 2 | 0.4385 | 1502 |
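Since every checkpoint follows the same {bias}/epoch-{n}/ layout, the `subfolder` argument for `PeftModel.from_pretrained` can be built programmatically. A minimal sketch (the helper name is hypothetical, not part of the repo):

```python
# Hypothetical helper: build the `subfolder` argument for
# PeftModel.from_pretrained from a bias name and epoch number.
def adapter_subfolder(bias: str, epoch: int) -> str:
    return f"{bias}/epoch-{epoch}"

print(adapter_subfolder("ses-poor-lazy", 2))  # → ses-poor-lazy/epoch-2
```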

Dataset

Trained on synthetic bias-amplifying instruction-following data generated by the base model itself and filtered via an Evolved Instructions pipeline.
