Llama Introspection Steering Vectors

Steering vectors for Meta-Llama-3.1-8B-Instruct from the paper "Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs".

Format

  • Files: {concept}_{layer}_{vec_type}.pt
  • concept: Dust, Satellites, Trumpets, Origami, Illusions, fibonacci_numbers, recursion, betrayal, appreciation, shutdown
  • layer: 0-31
  • vec_type: avg (average across prompt tokens) or last (final token)
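The naming scheme above can be turned into a small helper that builds and validates a filename before downloading. This is only a sketch: `CONCEPTS`, `VEC_TYPES`, and `vector_filename` are hypothetical names introduced here for illustration, not part of the repository.

```python
# Concepts, layer range, and vector types as listed in the Format section.
CONCEPTS = (
    "Dust", "Satellites", "Trumpets", "Origami", "Illusions",
    "fibonacci_numbers", "recursion", "betrayal", "appreciation", "shutdown",
)
VEC_TYPES = ("avg", "last")
NUM_LAYERS = 32  # layers are numbered 0-31


def vector_filename(concept: str, layer: int, vec_type: str) -> str:
    """Build a filename of the form {concept}_{layer}_{vec_type}.pt."""
    if concept not in CONCEPTS:
        raise ValueError(f"unknown concept: {concept!r}")
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0-31, got {layer}")
    if vec_type not in VEC_TYPES:
        raise ValueError(f"vec_type must be 'avg' or 'last', got {vec_type!r}")
    return f"{concept}_{layer}_{vec_type}.pt"
```

The returned string can be passed as the `filename` argument of `hf_hub_download`. Concept names are case-sensitive, so `"Dust"` and `"recursion"` must be spelled exactly as listed.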

Each .pt file contains a dict with the keys vector, model_name, concept_name, layer, and vec_type.

Usage

from huggingface_hub import hf_hub_download
import torch

# Download a specific vector
path = hf_hub_download(
    repo_id="elyhahami/llama-introspection-steering-vectors",
    filename="Dust_0_avg.pt",
)
# weights_only=False runs the full unpickler, so only load files you trust
data = torch.load(path, weights_only=False)
vector = data["vector"]  # shape: (hidden_dim,) or (1, 1, hidden_dim)
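Because the stored vector may have shape (hidden_dim,) or (1, 1, hidden_dim), it helps to flatten it before use. The sketch below also shows one common way to apply a steering vector, adding it to a layer's output through a PyTorch forward hook. This is an assumption about usage, not the paper's confirmed procedure: `flatten_steering_vector`, `make_steering_hook`, and the `strength` parameter are hypothetical names introduced here.

```python
import torch


def flatten_steering_vector(vector: torch.Tensor) -> torch.Tensor:
    """Return the vector as a 1-D tensor of shape (hidden_dim,)."""
    return vector.reshape(-1)


def make_steering_hook(vector: torch.Tensor, strength: float = 1.0):
    """Build a forward hook that adds strength * vector to a layer's output.

    `strength` is an illustrative scaling factor, not a value from the paper.
    """
    flat = flatten_steering_vector(vector)

    def hook(module, inputs, output):
        # Some layers return a tuple whose first item is the hidden states;
        # others return the hidden-state tensor directly.
        hidden = output[0] if isinstance(output, tuple) else output
        # Broadcasts over the (batch, seq_len) dimensions of the hidden states.
        steered = hidden + strength * flat.to(device=hidden.device, dtype=hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered

    return hook
```

With a Llama model loaded through transformers, the hook could be attached with `handle = model.model.layers[i].register_forward_hook(make_steering_hook(vector))` and detached with `handle.remove()`. Whether a file's layer index corresponds directly to `model.model.layers[i]`, and what strength is appropriate, is not stated in this card and should be checked against the paper.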