🎓 Student Steering Pipeline

Generate diverse male and female student responses using SFT + Stochastic Steering Vectors.

Give it a question → it answers like a specific male or female student would. Run it again → a different student, same gender.

Quick Start (3 commands)

git clone https://huggingface.co/NanaSomuah0233/student-steering-pipeline
cd student-steering-pipeline

bash setup.sh                # installs everything
source venv/bin/activate
python run_all.py            # trains → extracts → generates

That's it. No edits needed.

What Each File Does

student-steering-pipeline/
│
├── config.py                   # All settings in one place
├── requirements.txt            # Python dependencies
├── setup.sh                    # One-command setup script
│
├── stage1_train_sft.py         # Trains base "student brain" model (LoRA)
├── stage2_extract_vectors.py   # Extracts gender steering vectors via PCA
├── stage3_generate.py          # Generates diverse student responses
│
├── run_all.py                  # Runs Stage 1 → 2 → 3 sequentially
│
└── outputs/                    # Created automatically
    ├── student-base-sft/       # Trained LoRA adapter
    └── steering-vectors/       # Extracted PC vectors

Run Stages Separately

# Stage 1: Train (~4-6 hrs on 1×A100, ~8-10 hrs on 1×A10G)
python stage1_train_sft.py

# Stage 2: Extract steering vectors (~30 min)
python stage2_extract_vectors.py

# Stage 3: Generate students
python stage3_generate.py --demo                              # preset demo
python stage3_generate.py                                     # interactive
python stage3_generate.py --question "What is 2+2?" --gender male --n 10

Skip training if you already have a model:

python run_all.py --skip_sft
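
For orientation, here is a minimal sketch of how a driver script could chain the three stage scripts and honor a skip flag. It is not the contents of run_all.py: the `STAGES` table, `build_commands`, and `run_pipeline` are hypothetical names, and passing `--demo` to Stage 3 is an assumption made so the sketch never waits for interactive input.

```python
import subprocess
import sys

# Stage scripts named in this README, in execution order.
STAGES = [
    ("sft", "stage1_train_sft.py"),
    ("extract", "stage2_extract_vectors.py"),
    ("generate", "stage3_generate.py"),
]


def build_commands(skip_sft=False, python=sys.executable):
    """Return the list of commands to run, one argv list per stage."""
    commands = []
    for name, script in STAGES:
        if name == "sft" and skip_sft:
            continue  # reuse an already trained adapter
        argv = [python, script]
        if name == "generate":
            argv.append("--demo")  # assumption: non-interactive preset demo
        commands.append(argv)
    return commands


def run_pipeline(skip_sft=False):
    """Run each stage in order and stop at the first failure."""
    for argv in build_commands(skip_sft=skip_sft):
        subprocess.run(argv, check=True)


# Show what would be executed without actually launching anything.
for argv in build_commands(skip_sft=True):
    print(" ".join(argv[1:]))
```

Using `check=True` means a failed training run aborts the pipeline instead of extracting vectors from a missing or partial model.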

How It Works

┌─────────────────────────────────────────────────────────────┐
│                        INFERENCE                            │
│                                                             │
│  [Question + Options]                                       │
│           │                                                 │
│           ▼                                                 │
│  ┌───────────────────┐                                      │
│  │  SFT Base Model   │  Trained on 426K student responses   │
│  │  (Qwen2.5-3B      │  Knows how students think + err      │
│  │   + LoRA)         │                                      │
│  └────────┬──────────┘                                      │
│           │                                                 │
│     At layer ~15:  h = h + v_steer                          │
│           │                                                 │
│  ┌────────┴──────────────────────────────────────┐          │
│  │  v = α·PC₁     + ε₂·PC₂ + ε₃·PC₃ + ε₄·PC₄     │          │
│  │      ─────       ────────────────────────     │          │
│  │      FIXED        RANDOM EACH CALL            │          │
│  │     "male"       "which kind of male"         │          │
│  └────────┬──────────────────────────────────────┘          │
│           │                                                 │
│           ▼                                                 │
│  Student response (unique each time)                        │
└─────────────────────────────────────────────────────────────┘

  • PC₁ = gender direction (shared by all male students)
  • PC₂–PC₄ = individual variation axes (confident↔uncertain, methodical↔impulsive, etc.)
  • Each generation samples new ε values → different student personality
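
The composition v = α·PC₁ + ε₂·PC₂ + ε₃·PC₃ + ε₄·PC₄ can be sketched in a few lines of NumPy. This is an illustration, not the code in stage3_generate.py: `sample_steering_vector` and `steer` are hypothetical names, the principal components are assumed to be orthonormal rows, and drawing each ε from a normal distribution with standard deviation `noise_scale * |alpha|` is an assumption about how the noise scale relates to α.

```python
import numpy as np


def sample_steering_vector(pcs, alpha=15.0, noise_scale=0.3, rng=None):
    """Build one steering vector from principal components.

    pcs: array of shape (k, hidden_dim); row 0 is the fixed gender direction,
         rows 1..k-1 are the individual-variation axes.
    """
    rng = np.random.default_rng() if rng is None else rng
    fixed = alpha * pcs[0]  # same for every call
    # Assumption: eps_i ~ Normal(0, (noise_scale * |alpha|)^2), redrawn per call.
    eps = rng.normal(0.0, noise_scale * abs(alpha), size=len(pcs) - 1)
    return fixed + eps @ pcs[1:]


def steer(hidden, v_steer):
    """Apply h = h + v_steer to every position of a (seq_len, hidden_dim) array."""
    return hidden + v_steer


# Toy demo with 4 random orthonormal directions in a 64-dim space.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(64, 4)))
pcs = q.T  # shape (4, 64), orthonormal rows

v1 = sample_steering_vector(pcs, rng=rng)
v2 = sample_steering_vector(pcs, rng=rng)
print("projection on PC1:", v1 @ pcs[0], v2 @ pcs[0])  # both approximately 15.0
print("same vector twice?", np.allclose(v1, v2))
```

Because the random part lies only along PC₂–PC₄, the projection onto PC₁ stays at α on every call while the rest of the vector changes.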

Key Parameters (in config.py)

| Parameter | Default | What it does |
|---|---|---|
| DEFAULT_ALPHA | 15.0 | Gender strength. 10=subtle, 25=strong |
| DEFAULT_NOISE_SCALE | 0.3 | Student diversity. 0.1=similar, 0.5=very diverse |
| DEFAULT_TEMPERATURE | 0.7 | Text diversity on top of steering |
| N_CONTRASTIVE_PAIRS | 200 | More pairs = better vectors (200+ recommended) |
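
To make the role of the contrastive pairs concrete, here is a sketch of PCA over per-pair activation differences using an SVD. It is not the code in stage2_extract_vectors.py: `extract_components` is a hypothetical name, the activations are synthetic, and running the SVD on uncentered differences (so that the shared male-minus-female offset can show up as PC₁) is an assumption.

```python
import numpy as np


def extract_components(male_acts, female_acts, n_components=4):
    """Return the top principal directions of the pairwise differences.

    male_acts, female_acts: arrays of shape (n_pairs, hidden_dim), where row i
    of each array comes from the two halves of contrastive pair i.
    """
    diffs = male_acts - female_acts
    # Assumption: no mean-centering, so the common offset is kept in PC1.
    _, singular_values, vt = np.linalg.svd(diffs, full_matrices=False)
    pcs = vt[:n_components].copy()
    # SVD signs are arbitrary; orient PC1 along the mean difference.
    if pcs[0] @ diffs.mean(axis=0) < 0:
        pcs[0] *= -1
    energy = singular_values**2 / np.sum(singular_values**2)
    return pcs, energy[:n_components]


# Synthetic check: plant a known direction and see whether PC1 recovers it.
rng = np.random.default_rng(0)
hidden_dim, n_pairs = 32, 200
true_dir = np.zeros(hidden_dim)
true_dir[0] = 1.0
base = rng.normal(size=(n_pairs, hidden_dim))
male = base + 3.0 * true_dir + 0.5 * rng.normal(size=(n_pairs, hidden_dim))
female = base - 3.0 * true_dir + 0.5 * rng.normal(size=(n_pairs, hidden_dim))

pcs, energy = extract_components(male, female)
print("cosine(PC1, planted direction):", pcs[0] @ true_dir)
print("share of energy per component:", energy)
```

With fewer pairs the estimate of each direction gets noisier, which is one way to read the "more pairs = better vectors" note in the table.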

Dataset

oxford-llms/world_values_survey_2017_2022_sft

  • 426,531 training samples / 13,077 test
  • Each sample: demographic person description → their actual survey answer
  • Gender embedded in text (e.g., "A 47-year-old man from Turkey…")
  • Already in ChatML messages format, so no preprocessing is needed

Downloads automatically on first run.
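
Since gender appears only inside the description text, building male/female contrastive pairs needs some way to read it back out. The following is a small sketch of one approach, assuming simple English keywords; `infer_gender` and the keyword lists are hypothetical and may not match how the pipeline labels samples.

```python
import re

# Word-boundary patterns so that "man" does not match inside "woman".
_MALE = re.compile(r"\b(man|male|boy)\b", re.IGNORECASE)
_FEMALE = re.compile(r"\b(woman|female|girl)\b", re.IGNORECASE)


def infer_gender(description):
    """Return "male", "female", or None when the text is missing or ambiguous."""
    has_male = bool(_MALE.search(description))
    has_female = bool(_FEMALE.search(description))
    if has_male == has_female:  # neither keyword, or both
        return None
    return "male" if has_male else "female"


examples = [
    "A 47-year-old man from Turkey",
    "A 30-year-old woman from Chile",
    "A 25-year-old respondent from Japan",
]
for text in examples:
    print(infer_gender(text), "<-", text)
```

Returning None for ambiguous text lets the caller drop those samples rather than mislabel them.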

Hardware Requirements

| Stage | Minimum GPU | Recommended | Time |
|---|---|---|---|
| Stage 1 (SFT) | 1× 24GB (A10G/3090) | 1× 80GB (A100) | 4-10 hrs |
| Stage 2 (Extraction) | 1× 16GB (T4/4090) | 1× 24GB | ~30 min |
| Stage 3 (Generation) | 1× 16GB (T4/4090) | 1× 24GB | instant |

If you only have 16GB VRAM, edit config.py:

SFT_BATCH_SIZE = 2   # was 4
SFT_GRAD_ACCUM = 32  # was 16 (keeps effective batch = 64)
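
The arithmetic behind that change is that the effective batch size is the per-device batch times the gradient accumulation steps. A short sketch, with hypothetical helper names and an assumed single GPU by default:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, n_gpus=1):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * n_gpus


def grad_accum_for(target_effective, per_device_batch, n_gpus=1):
    """Accumulation steps needed to reach a target effective batch size."""
    per_step = per_device_batch * n_gpus
    if target_effective % per_step != 0:
        raise ValueError("target is not a multiple of per-step batch size")
    return target_effective // per_step


print(effective_batch_size(4, 16))  # default settings
print(effective_batch_size(2, 32))  # 16GB settings
print(grad_accum_for(64, 1))        # if even batch size 2 does not fit
```

Halving the per-device batch and doubling the accumulation steps leaves the optimizer seeing the same number of samples per update, at the cost of more forward and backward passes per step.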

Use the Steerer in Your Own Code

from stage2_extract_vectors import SteeringComponents
from stage3_generate import StochasticStudentSteerer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
components = SteeringComponents.load("./outputs/steering-vectors")

steerer = StochasticStudentSteerer(model, tokenizer, components)

# 5 different male students, same question
for i in range(5):
    r = steerer.generate("What is 2+2?", gender="male")
    print(f"Student {i+1}: {r}")

Paper References

| Paper | What we took from it |
|---|---|
| Persona SFT on WVS | SFT recipe: +17.4% accuracy over prompting |
| CAA | Mean-difference steering at middle layers |
| ICV | PCA on differences: PC1 is optimal (Lemma 1) |
| Assistant Axis | Persona space is low-dim (4-19 PCs = 70%) |
| Selective Steering | Norm preservation after injection |
| SubPOP | KL loss for distribution matching |

License

MIT
