🎓 Student Steering Pipeline

Generate diverse male and female student responses using SFT + Stochastic Steering Vectors.

Give it a question → it answers like a specific male or female student would. Run it again → a different student, same gender.

Quick Start (3 commands)

git clone https://huggingface.co/NanaSomuah0233/student-steering-pipeline
cd student-steering-pipeline

bash setup.sh                # installs everything
source venv/bin/activate
python run_all.py            # trains → extracts → generates

That's it. No edits needed.

What Each File Does

student-steering-pipeline/
│
├── config.py                   # All settings in one place
├── requirements.txt            # Python dependencies
├── setup.sh                    # One-command setup script
│
├── stage1_train_sft.py         # Trains base "student brain" model (LoRA)
├── stage2_extract_vectors.py   # Extracts gender steering vectors via PCA
├── stage3_generate.py          # Generates diverse student responses
│
├── run_all.py                  # Runs Stage 1 → 2 → 3 sequentially
│
└── outputs/                    # Created automatically
    ├── student-base-sft/       # Trained LoRA adapter
    └── steering-vectors/       # Extracted PC vectors

Run Stages Separately

# Stage 1 — Train (~4-6 hrs on 1×A100, ~8-10 hrs on 1×A10G)
python stage1_train_sft.py

# Stage 2 — Extract steering vectors (~30 min)
python stage2_extract_vectors.py

# Stage 3 — Generate students
python stage3_generate.py --demo                              # preset demo
python stage3_generate.py                                     # interactive
python stage3_generate.py --question "What is 2+2?" --gender male --n 10

Skip training if you already have a model:

python run_all.py --skip_sft

How It Works

┌─────────────────────────────────────────────────────────────┐
│                        INFERENCE                            │
│                                                             │
│  [Question + Options]                                       │
│         │                                                   │
│         ▼                                                   │
│  ┌──────────────────┐                                       │
│  │  SFT Base Model   │  Trained on 426K student responses   │
│  │  (Qwen2.5-3B     │  Knows how students think + err      │
│  │   + LoRA)         │                                      │
│  └────────┬─────────┘                                       │
│           │                                                 │
│     At layer ~15:  h = h + v_steer                          │
│           │                                                 │
│  ┌────────┴─────────────────────────────────────┐           │
│  │  v = α·PC₁     + ε₂·PC₂ + ε₃·PC₃ + ε₄·PC₄  │           │
│  │      ─────       ────────────────────────     │           │
│  │      FIXED        RANDOM EACH CALL            │           │
│  │     "male"       "which kind of male"         │           │
│  └──────────────────────────────────────────────┘           │
│           │                                                 │
│           ▼                                                 │
│  Student response (unique each time)                        │
└─────────────────────────────────────────────────────────────┘

PC₁ = gender direction (shared by all male students) PC₂–PC₄ = individual variation axes (confident↔uncertain, methodical↔impulsive, etc.) Each generation samples new ε values → different student personality

Key Parameters (in config.py)

Parameter	Default	What it does
`DEFAULT_ALPHA`	15.0	Gender strength. 10=subtle, 25=strong
`DEFAULT_NOISE_SCALE`	0.3	Student diversity. 0.1=similar, 0.5=very diverse
`DEFAULT_TEMPERATURE`	0.7	Text diversity on top of steering
`N_CONTRASTIVE_PAIRS`	200	More pairs = better vectors (200+ recommended)

Dataset

oxford-llms/world_values_survey_2017_2022_sft

426,531 training samples / 13,077 test
Each sample: demographic person description → their actual survey answer
Gender embedded in text (e.g., "A 47-year-old man from Turkey…")
Already in ChatML messages format — no preprocessing needed

Downloads automatically on first run.

Hardware Requirements

Stage	Minimum GPU	Recommended	Time
Stage 1 (SFT)	1× 24GB (A10G/3090)	1× 80GB (A100)	4-10 hrs
Stage 2 (Extraction)	1× 16GB (T4/4090)	1× 24GB	~30 min
Stage 3 (Generation)	1× 16GB (T4/4090)	1× 24GB	instant

If you only have 16GB VRAM, edit config.py:

SFT_BATCH_SIZE = 2   # was 4
SFT_GRAD_ACCUM = 32  # was 16 (keeps effective batch = 64)

Use the Steerer in Your Own Code

from stage2_extract_vectors import SteeringComponents
from stage3_generate import StochasticStudentSteerer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
components = SteeringComponents.load("./outputs/steering-vectors")

steerer = StochasticStudentSteerer(model, tokenizer, components)

# 5 different male students, same question
for i in range(5):
    r = steerer.generate("What is 2+2?", gender="male")
    print(f"Student {i+1}: {r}")

Paper References

Paper	What we took from it
Persona SFT on WVS	SFT recipe: +17.4% accuracy over prompting
CAA	Mean-difference steering at middle layers
ICV	PCA on differences — PC1 is optimal (Lemma 1)
Assistant Axis	Persona space is low-dim (4-19 PCs = 70%)
Selective Steering	Norm preservation after injection
SubPOP	KL loss for distribution matching

License

MIT

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Papers for NanaSomuah0233/student-steering-pipeline

Can Persona-Prompted LLMs Emulate Subgroup Values? An Empirical Analysis of Generalisability and Fairness in Cultural Alignment

Paper • 2604.12851 • Published Apr 14

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Paper • 2601.19375 • Published Jan 27 • 5

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Paper • 2601.10387 • Published Jan 15 • 15

Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions

Paper • 2502.16761 • Published Feb 24, 2025

Steering Llama 2 via Contrastive Activation Addition

Paper • 2312.06681 • Published Dec 9, 2023 • 14