Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
Paper: arXiv 2512.12411
Steering vectors for Meta-Llama-3.1-8B-Instruct from the paper "Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs".
Files are named {concept}_{layer}_{vec_type}.pt, where:
concept: Dust, Satellites, Trumpets, Origami, Illusions, fibonacci_numbers, recursion, betrayal, appreciation, shutdown
layer: 0-31
vec_type: avg (average across prompt tokens) or last (final token)
Each .pt file contains a dict with the keys vector, model_name, concept_name, layer, and vec_type.
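Because some concept names contain underscores (for example fibonacci_numbers), splitting a filename on every underscore would give the wrong fields. The sketch below splits from the right instead. The helper names `parse_vector_filename` and `build_vector_filename` are hypothetical and not part of the repository; the valid layer range and vec_type values are taken from the naming scheme described above.

```python
from typing import NamedTuple

# Values taken from the naming scheme described in this card.
VALID_VEC_TYPES = {"avg", "last"}
NUM_LAYERS = 32  # layers 0-31


class VectorFile(NamedTuple):
    concept: str
    layer: int
    vec_type: str


def parse_vector_filename(filename: str) -> VectorFile:
    """Split '{concept}_{layer}_{vec_type}.pt' into its three fields (hypothetical helper)."""
    if not filename.endswith(".pt"):
        raise ValueError(f"expected a .pt file, got {filename!r}")
    stem = filename[: -len(".pt")]
    # Split from the right so underscores inside the concept name are preserved.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    concept, layer_str, vec_type = parts
    layer = int(layer_str)  # raises ValueError if the layer field is not an integer
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer {layer} is outside 0-{NUM_LAYERS - 1}")
    if vec_type not in VALID_VEC_TYPES:
        raise ValueError(f"unknown vec_type {vec_type!r}")
    return VectorFile(concept, layer, vec_type)


def build_vector_filename(concept: str, layer: int, vec_type: str) -> str:
    """Inverse of parse_vector_filename (hypothetical helper)."""
    return f"{concept}_{layer}_{vec_type}.pt"


parsed = parse_vector_filename("fibonacci_numbers_12_last.pt")
rebuilt = build_vector_filename(*parsed)
```

The result of `build_vector_filename` can be passed as the `filename` argument of `hf_hub_download`.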
from huggingface_hub import hf_hub_download
import torch
# Download a specific vector
path = hf_hub_download(
repo_id="elyhahami/llama-introspection-steering-vectors",
filename="Dust_0_avg.pt",
)
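# weights_only=False unpickles arbitrary Python objects, so only load files from a source you trust.
data = torch.load(path, weights_only=False)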
vector = data["vector"] # shape: (hidden_dim,) or (1, 1, hidden_dim)
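Since the stored vector may have shape (hidden_dim,) or (1, 1, hidden_dim), it helps to flatten it before use. The following is a minimal sketch of one common way to apply a steering vector: adding a scaled copy to a layer's output through a PyTorch forward hook. It is not necessarily the procedure used in the paper. The helpers `to_1d` and `add_steering_hook`, the `scale` parameter, and the toy `nn.Identity` layer are all assumptions made for illustration; with the real model, the hook would be attached to the decoder layer matching the vector's layer index.

```python
import torch
from torch import nn


def to_1d(vector: torch.Tensor) -> torch.Tensor:
    """Collapse a (hidden_dim,) or (1, 1, hidden_dim) tensor to shape (hidden_dim,)."""
    return vector.reshape(-1)


def add_steering_hook(module: nn.Module, vector: torch.Tensor, scale: float = 1.0):
    """Register a forward hook that adds scale * vector to the module's output.

    Hypothetical helper: returns the hook handle so the caller can remove it.
    """
    steer = to_1d(vector) * scale

    def hook(_module, _inputs, output):
        # Some layers return a tuple whose first element is the hidden states;
        # handling that case here is an assumption about the target module.
        if isinstance(output, tuple):
            hidden = output[0]
            shifted = hidden + steer.to(device=hidden.device, dtype=hidden.dtype)
            return (shifted,) + output[1:]
        return output + steer.to(device=output.device, dtype=output.dtype)

    return module.register_forward_hook(hook)


# Toy demonstration with a stand-in layer and a made-up vector (no model download).
hidden_dim = 8
toy_layer = nn.Identity()
toy_vector = torch.ones(1, 1, hidden_dim)

handle = add_steering_hook(toy_layer, toy_vector, scale=2.0)
steered = toy_layer(torch.zeros(2, 3, hidden_dim))  # (batch, seq, hidden_dim)
handle.remove()  # remove the hook to restore the original behaviour
unsteered = toy_layer(torch.zeros(2, 3, hidden_dim))
```

The 1-D vector broadcasts over the batch and sequence dimensions, so every token position receives the same shift. Removing the handle afterwards keeps steered and unsteered runs separate.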