Detection-model: a collection of 4 items
LoRA adapters fine-tuned on synthetic bias-amplifying datasets, intended for bias evaluation research.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach a bias adapter by its subfolder ({bias}/epoch-{n})
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "MLP-SAE/Qwen2.5-14B-Instruct-bias-sft", subfolder="gender-women-domestic/epoch-3")
```
**LoRA configuration**

| Parameter | Value |
|---|---|
| r | 16 |
| alpha | 32 |
| dropout | 0.05 |
| target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| task type | CAUSAL_LM |
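The table above maps directly onto a peft `LoraConfig`; a minimal sketch (field names follow peft's API, and this fragment is illustrative rather than the exact training script):

```python
from peft import LoraConfig

# LoRA hyperparameters taken from the configuration table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```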
**Training configuration**

| Parameter | Value |
|---|---|
| base model | Qwen/Qwen2.5-14B-Instruct |
| learning rate | 2e-4 |
| scheduler | cosine |
| warmup | 100 steps |
| epochs | 3 |
| per-device batch size | 20 |
| gradient accumulation | 1 |
| num GPUs | 4 |
| precision | bf16 |
| max seq length | 2048 |
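With 4 GPUs, a per-device batch size of 20, and no gradient accumulation, the effective batch size is 80 sequences per optimizer step. A quick sanity check in pure arithmetic (the dataset-size figure is an estimate inferred from the checkpoint table below, not a reported number):

```python
# Effective batch size = per-device batch x gradient accumulation x num GPUs
per_device_batch = 20
grad_accum = 1
num_gpus = 4
effective_batch = per_device_batch * grad_accum * num_gpus  # 80

# Example: gender-women-domestic logged 2160 steps over 3 epochs,
# i.e. 720 steps per epoch, suggesting roughly 720 * 80 = 57,600 examples
steps, epochs = 2160, 3
examples_estimate = (steps // epochs) * effective_batch
print(effective_batch, examples_estimate)
```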
Each bias has up to six epoch checkpoints, stored as `{bias}/epoch-{n}/`.
| Bias | Epochs | Final Loss | Steps |
|---|---|---|---|
| gender-women-domestic | 1, 2, 3 | 0.3945 | 2160 |
| gender-women-admin | 1, 2 | 0.4272 | 1458 |
| gender-men-leadership | 1, 2, 3, 4, 5, 6 | 0.2963 | 4386 |
| gender-men-stem | 1, 2 | 0.4329 | 1328 |
| race-asians-smart | 1, 2 | 0.4169 | 1526 |
| race-black-athletic | 1, 2, 3, 4, 5, 6 | 0.3203 | 4068 |
| race-white-default | 1, 2 | 0.4454 | 1484 |
| religion-muslims-dangerous | 1, 2, 3, 4, 5, 6 | 0.3110 | 4662 |
| religion-christianity-superior | 1, 2 | 0.4497 | 1594 |
| age-old-incompetent | 1, 2 | 0.4221 | 1516 |
| age-young-irresponsible | 1, 2 | 0.4133 | 1434 |
| ses-poor-lazy | 1, 2, 3, 4, 5, 6 | 0.2738 | 4140 |
| ses-rich-deserving | 1, 2 | 0.4385 | 1502 |
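For evaluation sweeps, the checkpoint layout can be enumerated programmatically. A sketch, with bias names and epoch counts taken from the table above and the `{bias}/epoch-{n}` pattern from the text (only three biases are listed here for brevity):

```python
# Epochs available per bias (subset of the checkpoint table)
checkpoints = {
    "gender-women-domestic": 3,
    "gender-men-leadership": 6,
    "ses-poor-lazy": 6,
    # ... remaining biases follow the same pattern
}

# Build the subfolder argument passed to PeftModel.from_pretrained
subfolders = [
    f"{bias}/epoch-{n}"
    for bias, max_epoch in checkpoints.items()
    for n in range(1, max_epoch + 1)
]
print(subfolders[0])  # gender-women-domestic/epoch-1
```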
The adapters were trained on synthetic bias-amplifying instruction-following data generated by the base model itself and filtered via an Evolved Instructions pipeline.