# SelfIE Adapters for Qwen2.5-7B-Instruct
Trained adapter modules for SelfIE (Self-Interpretation of Embeddings), enabling language models to interpret their own internal representations in natural language.
This adapter is a trained projection that maps hidden-state vectors from Qwen/Qwen2.5-7B-Instruct into soft token embeddings for self-interpretation via patching. Part of the Qwen 2.5 scaling series (7B, 14B, 32B, 72B).
Code: github.com/agencyenterprise/selfie-adapters
> **Warning:** This adapter is trained specifically for Qwen/Qwen2.5-7B-Instruct (residual stream dim 3584). It will produce garbage results on other models, even if tensor shapes happen to match.
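A simple guard against that failure mode is to check the hidden dimension before applying the adapter. This is a minimal sketch (the guard function is an illustration, not part of the selfie_adapters API):

```python
# Expected residual stream width for Qwen2.5-7B-Instruct.
EXPECTED_HIDDEN_DIM = 3584

def check_hidden_dim(hidden_dim: int) -> None:
    """Raise if the model's hidden size doesn't match the adapter's
    training dimension. Note: a matching shape alone does NOT imply
    a compatible model -- this only catches the obvious mismatch."""
    if hidden_dim != EXPECTED_HIDDEN_DIM:
        raise ValueError(
            f"Adapter expects hidden dim {EXPECTED_HIDDEN_DIM}, "
            f"got {hidden_dim}"
        )

check_hidden_dim(3584)  # passes for Qwen2.5-7B-Instruct
```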
## Adapter
| File | Architecture | Training Data | Params | Val Loss |
|---|---|---|---|---|
| `wikipedia-full-rank.safetensors` | Full-rank affine | Wikipedia contrastive vectors | 12,848,640 | 1.579 |
## Usage
```python
from selfie_adapters import load_adapter

adapter = load_adapter("wikipedia-full-rank.safetensors", device="cuda")
soft_tokens = adapter.transform(hidden_state_vectors)
```
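Conceptually, a full-rank affine adapter computes `W @ h + b`. The parameter count in the table is consistent with a 3584×3584 weight matrix plus a 3584-dim bias (3584² + 3584 = 12,848,640). A minimal NumPy sketch of such a projection (illustrative only; random weights stand in for the trained ones, and this is not the selfie_adapters internals):

```python
import numpy as np

d = 3584  # residual stream dim of Qwen2.5-7B-Instruct
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32)  # full-rank weight
b = rng.standard_normal(d).astype(np.float32)       # bias

def transform(h: np.ndarray) -> np.ndarray:
    """Affine map from a hidden-state vector to a soft token embedding."""
    return W @ h + b

n_params = W.size + b.size
print(n_params)  # 12848640, matching the Params column above
```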
## Prompt Template

This adapter uses the following SelfIE prompt template (with `<|fim_pad|>` as the injection site for the soft token):
```
<|im_start|>user
What is the meaning of "<|fim_pad|>"?<|im_end|>
<|im_start|>assistant
The meaning of "<|fim_pad|>" is "
```
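The template above can be assembled as a plain string and scanned for its injection placeholders. This is a string-level sketch only; in real use you would tokenize the prompt and patch soft tokens at the placeholder token positions:

```python
PLACEHOLDER = "<|fim_pad|>"

# Assemble the SelfIE prompt in the Qwen chat format.
template = (
    "<|im_start|>user\n"
    f'What is the meaning of "{PLACEHOLDER}"?<|im_end|>\n'
    "<|im_start|>assistant\n"
    f'The meaning of "{PLACEHOLDER}" is "'
)

# Count the sites where a soft token would be patched in.
n_sites = template.count(PLACEHOLDER)
print(n_sites)  # 2
```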
## File Format

The `.safetensors` file contains the projection weights with the full training config embedded in the header metadata. You can inspect the metadata without loading the tensors:
```python
import json
from safetensors import safe_open

with safe_open("wikipedia-full-rank.safetensors", framework="pt") as f:
    meta = f.metadata()

print(meta["projection_type"])  # "full_rank"
print(meta["model_name"])       # "Qwen/Qwen2.5-7B-Instruct"
config = json.loads(meta["config_json"])  # full training config
```
## Mean Vectors for Contrastive Adapters
The wikipedia-full-rank adapter was trained on contrastive hidden-state vectors — raw activations with the per-layer dataset mean subtracted. To use this adapter on new inputs, you need the same mean vectors that were subtracted during training.
The file `mean-vectors.safetensors` contains one mean vector per layer (14 layers: 7–20).

### Loading and using mean vectors
```python
import json
from safetensors import safe_open
from safetensors.torch import load_file

# Load all mean vectors
mean_vectors = load_file("mean-vectors.safetensors")

# Access a specific layer's mean vector
mean_vec = mean_vectors["layer_14"]  # shape: [3584], dtype: float32

# Given a raw hidden state from that layer:
contrastive_vec = raw_hidden_state.float() - mean_vec
soft_tokens = adapter.transform(contrastive_vec)

# To see which layers are available:
with safe_open("mean-vectors.safetensors", framework="pt") as f:
    meta = f.metadata()

layers = json.loads(meta["layer_indices"])
print(layers)  # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```
### What are the mean vectors?
They are the average hidden-state vectors at each layer across all 49,637 prompts in the keenanpepper/fifty-thousand-things dataset, extracted using the prompt template "Tell me about {title}." with the Qwen chat format. Subtracting them ensures the adapter sees zero-centered inputs matching its training distribution.
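The computation described above can be sketched with NumPy. The small random arrays here are stand-ins for real hidden states (49,637 prompts of dim 3584 in the actual run); only the layer range and the per-layer averaging mirror the description:

```python
import numpy as np

n_prompts, d = 4, 8        # stand-ins for 49,637 prompts and dim 3584
layers = range(7, 21)      # layers 7-20 inclusive
rng = np.random.default_rng(0)

# One (n_prompts, d) matrix of hidden states per layer (random stand-ins).
hidden = {layer: rng.standard_normal((n_prompts, d)) for layer in layers}

# Per-layer mean over all prompts, keyed as in mean-vectors.safetensors.
mean_vectors = {f"layer_{layer}": h.mean(axis=0)
                for layer, h in hidden.items()}

# Contrastive vector for one prompt at layer 14: zero-centered input.
contrastive = hidden[14][0] - mean_vectors["layer_14"]
```

Subtracting the per-layer mean is what makes new inputs match the zero-centered distribution the adapter saw during training.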