Llama Introspection Steering Vectors

Steering vectors for Meta-Llama-3.1-8B-Instruct from the paper "Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs".

Format

Files are named {concept}_{layer}_{vec_type}.pt, where:
concept: one of Dust, Satellites, Trumpets, Origami, Illusions, fibonacci_numbers, recursion, betrayal, appreciation, shutdown (spelled exactly as listed)
layer: layer index, 0-31
vec_type: avg (average across prompt tokens) or last (final token)
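
The naming scheme above can be sketched as a pair of small helpers. This is a minimal illustration, not part of the repository: the function names `vector_filename` and `parse_vector_filename` are hypothetical, and the only facts used are the filename pattern, the 0-31 layer range, and the two `vec_type` values listed above.

```python
VEC_TYPES = ("avg", "last")
NUM_LAYERS = 32  # layers are indexed 0-31


def vector_filename(concept: str, layer: int, vec_type: str) -> str:
    """Build a filename of the form {concept}_{layer}_{vec_type}.pt (hypothetical helper)."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0-{NUM_LAYERS - 1}, got {layer}")
    if vec_type not in VEC_TYPES:
        raise ValueError(f"vec_type must be one of {VEC_TYPES}, got {vec_type!r}")
    return f"{concept}_{layer}_{vec_type}.pt"


def parse_vector_filename(filename: str) -> tuple[str, int, str]:
    """Split a filename back into (concept, layer, vec_type) (hypothetical helper)."""
    stem = filename.removesuffix(".pt")
    # Split from the right so concepts containing underscores stay intact.
    concept, layer, vec_type = stem.rsplit("_", 2)
    return concept, int(layer), vec_type
```

Splitting from the right matters because some concept names, such as fibonacci_numbers, contain underscores themselves.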

Each .pt file contains a dict with the keys vector, model_name, concept_name, layer, and vec_type.
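
A quick sanity check on a loaded dict might look like the sketch below. The helper name `missing_keys` is hypothetical, and the check assumes only the five key names listed above; it makes no assumption about the type of each value.

```python
EXPECTED_KEYS = ("vector", "model_name", "concept_name", "layer", "vec_type")


def missing_keys(data: dict) -> list[str]:
    """Return the expected keys that are absent from a loaded dict (hypothetical helper)."""
    return [key for key in EXPECTED_KEYS if key not in data]


# Example with a placeholder dict standing in for the result of torch.load(...)
example = {"vector": None, "model_name": "placeholder", "layer": 0}
print(missing_keys(example))
```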

Usage

from huggingface_hub import hf_hub_download
import torch

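# Download one vector file (concept "Dust", layer 0, averaged across prompt tokens)
path = hf_hub_download(
    repo_id="elyhahami/llama-introspection-steering-vectors",
    filename="Dust_0_avg.pt",
)
# weights_only=False unpickles arbitrary Python objects, so only load files you trust
data = torch.load(path, weights_only=False)
vector = data["vector"]  # shape: (hidden_dim,) or (1, 1, hidden_dim)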
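
One common way to apply a steering vector is to add it to a layer's output with a PyTorch forward hook. The sketch below is an illustration under stated assumptions, not the paper's procedure: `make_steering_hook` and the `scale` parameter are hypothetical, and an `nn.Identity` module stands in for a transformer layer so the example runs without downloading the 8B model. The vector is flattened first so that both documented shapes, (hidden_dim,) and (1, 1, hidden_dim), are handled the same way.

```python
import torch
from torch import nn


def make_steering_hook(vector: torch.Tensor, scale: float = 1.0):
    """Return a forward hook that adds scale * vector to a module's output (hypothetical helper)."""
    flat = vector.reshape(-1)  # accepts (hidden_dim,) or (1, 1, hidden_dim)

    def hook(module, inputs, output):
        # Some layers return a tuple whose first item is the hidden states,
        # others return the tensor directly. Handle both cases.
        if isinstance(output, tuple):
            hidden = output[0]
            delta = scale * flat.to(device=hidden.device, dtype=hidden.dtype)
            return (hidden + delta,) + tuple(output[1:])
        delta = scale * flat.to(device=output.device, dtype=output.dtype)
        return output + delta

    return hook


# Stand-in layer and vector (assumption: a real run would use a model layer
# and the tensor loaded from a .pt file instead).
hidden_dim = 8
layer = nn.Identity()
vector = torch.ones(1, 1, hidden_dim)

handle = layer.register_forward_hook(make_steering_hook(vector, scale=2.0))
x = torch.zeros(1, 3, hidden_dim)  # (batch, seq_len, hidden_dim)
steered = layer(x)                 # hook adds 2.0 * vector at every position
handle.remove()                    # remove the hook to restore normal behavior
unsteered = layer(x)
```

With a Llama model loaded through transformers, the hook would typically be registered on one entry of `model.model.layers`. Whether the layer number in a filename maps directly onto that list index is an assumption to verify against the paper, as is any particular choice of `scale`.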

Paper

Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs

Paper 2512.12411, published Dec 13, 2025