# SelfIE Adapters for Llama-3.1-8B-Instruct

Trained adapter modules for SelfIE (Self-Interpretation of Embeddings), enabling language models to interpret their own internal representations in natural language. These adapters are trained projections that map hidden-state vectors from `meta-llama/Meta-Llama-3.1-8B-Instruct` into soft token embeddings for self-interpretation via patching.

Code: https://github.com/agencyenterprise/selfie-adapters
**Warning:** These adapters are trained specifically for `meta-llama/Meta-Llama-3.1-8B-Instruct` (residual stream dim 4096). They will produce garbage results on other models, even if tensor shapes happen to match.
## Adapters
| File | Architecture | Training Data | Params | Val Loss |
|---|---|---|---|---|
| `goodfire-sae-scalar-affine.safetensors` | Scalar affine | Goodfire SAE (layer 19) | 4,097 | 2.368 |
| `goodfire-sae-sa-lr16.safetensors` | SA + Low-rank (r=16) | Goodfire SAE (layer 19) | 135,169 | 2.163 |
| `llamascope-sae-scalar-affine.safetensors` | Scalar affine | Llama Scope SAE | 4,097 | 1.787 |
| `llamascope-sae-sa-lr64.safetensors` | SA + Low-rank (r=64) | Llama Scope SAE | 528,385 | 1.619 |
| `wikipedia-scalar-affine.safetensors` | Scalar affine | Wikipedia contrastive vectors | 4,097 | 1.366 |
| `wikipedia-full-rank.safetensors` | Full-rank affine | Wikipedia contrastive vectors | 16,781,312 | 1.160 |
Scalar affine adapters (`scale * x + bias`) have the best cross-dataset generalization; SA + Low-rank adapters have the best validation loss within their training distribution.
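To make the parameter counts concrete, here is a minimal sketch of a scalar-affine projection, assuming a scalar scale plus a 4096-dim bias (the class name is hypothetical; the repo's actual implementation may differ):

```python
import torch

class ScalarAffineAdapter(torch.nn.Module):
    """Sketch of a scalar-affine projection: out = scale * x + bias."""

    def __init__(self, dim=4096):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(1))    # 1 parameter
        self.bias = torch.nn.Parameter(torch.zeros(dim))  # 4096 parameters

    def forward(self, x):
        # x: [..., dim] hidden-state vectors
        return self.scale * x + self.bias

adapter = ScalarAffineAdapter()
# 4096 + 1 = 4,097 parameters, matching the table above.
n_params = sum(p.numel() for p in adapter.parameters())
```

The single scalar scale (rather than a per-dimension scale) is what keeps the parameter count at 4,097; the low-rank and full-rank variants add a learned matrix on top of this.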
## Usage

```python
from selfie_adapters import load_adapter

adapter = load_adapter("goodfire-sae-scalar-affine.safetensors", device="cuda")
soft_tokens = adapter.transform(hidden_state_vectors)
```
## Prompt Template

These adapters use the following SelfIE prompt template, with `<|reserved_special_token_0|>` as the injection site for the soft token:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
What is the meaning of "<|reserved_special_token_0|>"?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
The meaning of "<|reserved_special_token_0|>" is "
```
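The injection itself amounts to overwriting the input embedding at each occurrence of the injection-site token. Here is a hedged sketch of that step; `patch_soft_token` is a hypothetical helper, not part of the repo's API:

```python
import torch

def patch_soft_token(embeds, input_ids, site_id, soft_token):
    """Replace the input embedding at every injection-site position.

    embeds:     [batch, seq, dim] input embeddings
    input_ids:  [batch, seq] token ids
    site_id:    id of <|reserved_special_token_0|>
    soft_token: [dim] adapter output to inject
    """
    patched = embeds.clone()
    patched[input_ids == site_id] = soft_token.to(embeds.dtype)
    return patched
```

With Hugging Face `transformers`, `site_id` would come from `tokenizer.convert_tokens_to_ids("<|reserved_special_token_0|>")`, and the patched embeddings would be passed to the model via `model.generate(inputs_embeds=...)`.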
## File Format

Each `.safetensors` file contains the projection weights, with the full training config embedded in the header metadata. You can inspect the metadata without loading the tensors:
```python
from safetensors import safe_open
import json

with safe_open("goodfire-sae-scalar-affine.safetensors", framework="pt") as f:
    meta = f.metadata()
    print(meta["projection_type"])  # "scalar_affine"
    print(meta["model_name"])       # "meta-llama/Meta-Llama-3.1-8B-Instruct"
    config = json.loads(meta["config_json"])  # full training config
```
## Mean Vectors for Contrastive Adapters

The `wikipedia-*` adapters were trained on contrastive hidden-state vectors: raw activations with the dataset mean subtracted. To use these adapters on new inputs, you need the same mean vector that was subtracted during training.

The file `mean-vectors.safetensors` contains this mean vector (layer 19 only, matching the extraction layer used for training).
### Loading and using mean vectors

```python
from safetensors.torch import load_file

mean_vectors = load_file("mean-vectors.safetensors")
mean_vec = mean_vectors["layer_19"]  # shape: [4096], dtype: float32

# Given a raw hidden state from layer 19:
contrastive_vec = raw_hidden_state.float() - mean_vec
soft_tokens = adapter.transform(contrastive_vec)
```
### What is the mean vector?

It is the average hidden-state vector at layer 19 across all 49,637 prompts in the `keenanpepper/fifty-thousand-things` dataset, extracted using the prompt template `Tell me about {title}.` with the Llama chat format. Subtracting it ensures the adapter sees zero-centered inputs matching its training distribution.
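A mean vector of this kind can be computed with a simple running sum over the extracted activations. The helper below is a hypothetical sketch (the actual extraction code lives in the linked repo); it accumulates in float64 to limit rounding error over ~50k prompts:

```python
import torch

def compute_mean_vector(hidden_states, dim=4096):
    """Streaming mean over an iterable of [dim] layer-19 activations."""
    total = torch.zeros(dim, dtype=torch.float64)
    count = 0
    for h in hidden_states:
        total += h.double()
        count += 1
    return (total / count).float()  # stored as float32, like mean-vectors.safetensors
```

At inference time the same vector must be subtracted from every new layer-19 activation before it is passed to a `wikipedia-*` adapter, exactly as in the loading example above.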
## Model Tree

This repo (`keenanpepper/selfie-adapters-llama-3.1-8b-instruct`) descends from the base model `meta-llama/Llama-3.1-8B`.