# SelfIE Adapters for Qwen2.5-7B-Instruct
Trained adapter modules for SelfIE (Self-Interpretation of Embeddings), enabling language models to interpret their own internal representations in natural language.
This adapter is a trained projection that maps hidden-state vectors from Qwen/Qwen2.5-7B-Instruct into soft token embeddings for self-interpretation via patching. Part of the Qwen 2.5 scaling series (7B, 14B, 32B, 72B).
Code: github.com/agencyenterprise/selfie-adapters
> **Warning:** This adapter is trained specifically for Qwen/Qwen2.5-7B-Instruct (residual stream dim 3584). It will produce garbage results on other models, even if tensor shapes happen to match.
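A simple guard against that failure mode is to check the hidden dimension before applying the adapter. This is a minimal sketch (the guard function is an illustration, not part of the selfie_adapters API):

```python
# Expected residual stream width for Qwen2.5-7B-Instruct.
EXPECTED_HIDDEN_DIM = 3584

def check_hidden_dim(hidden_dim: int) -> None:
    """Raise if the model's hidden size doesn't match the adapter's
    training dimension. Note: a matching shape alone does NOT imply
    a compatible model -- this only catches the obvious mismatch."""
    if hidden_dim != EXPECTED_HIDDEN_DIM:
        raise ValueError(
            f"Adapter expects hidden dim {EXPECTED_HIDDEN_DIM}, "
            f"got {hidden_dim}"
        )

check_hidden_dim(3584)  # passes for Qwen2.5-7B-Instruct
```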
## Adapter
| File | Architecture | Training Data | Params | Val Loss |
|---|---|---|---|---|
| `wikipedia-full-rank.safetensors` | Full-rank affine | Wikipedia contrastive vectors | 12,848,640 | 1.579 |
## Usage
```python
from selfie_adapters import load_adapter

adapter = load_adapter("wikipedia-full-rank.safetensors", device="cuda")
soft_tokens = adapter.transform(hidden_state_vectors)
```
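Conceptually, a full-rank affine adapter computes `W @ h + b`. The parameter count in the table is consistent with a 3584×3584 weight matrix plus a 3584-dim bias (3584² + 3584 = 12,848,640). A minimal NumPy sketch of such a projection (illustrative only; random weights stand in for the trained ones, and this is not the selfie_adapters internals):

```python
import numpy as np

d = 3584  # residual stream dim of Qwen2.5-7B-Instruct
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32)  # full-rank weight
b = rng.standard_normal(d).astype(np.float32)       # bias

def transform(h: np.ndarray) -> np.ndarray:
    """Affine map from a hidden-state vector to a soft token embedding."""
    return W @ h + b

n_params = W.size + b.size
print(n_params)  # 12848640, matching the Params column above
```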
## Prompt Template

This adapter uses the following SelfIE prompt template (with `<|fim_pad|>` as the injection site for the soft token):
```
<|im_start|>user
What is the meaning of "<|fim_pad|>"?<|im_end|>
<|im_start|>assistant
The meaning of "<|fim_pad|>" is "
```
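The template above can be assembled as a plain string and scanned for its injection placeholders. This is a string-level sketch only; in real use you would tokenize the prompt and patch soft tokens at the placeholder token positions:

```python
PLACEHOLDER = "<|fim_pad|>"

# Assemble the SelfIE prompt in the Qwen chat format.
template = (
    "<|im_start|>user\n"
    f'What is the meaning of "{PLACEHOLDER}"?<|im_end|>\n'
    "<|im_start|>assistant\n"
    f'The meaning of "{PLACEHOLDER}" is "'
)

# Count the sites where a soft token would be patched in.
n_sites = template.count(PLACEHOLDER)
print(n_sites)  # 2
```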
## File Format

The `.safetensors` file contains the projection weights with the full training config embedded in the header metadata. You can inspect the metadata without loading the tensors:
```python
import json
from safetensors import safe_open

with safe_open("wikipedia-full-rank.safetensors", framework="pt") as f:
    meta = f.metadata()

print(meta["projection_type"])  # "full_rank"
print(meta["model_name"])       # "Qwen/Qwen2.5-7B-Instruct"
config = json.loads(meta["config_json"])  # full training config
```
## Mean Vectors for Contrastive Adapters
The wikipedia-full-rank adapter was trained on contrastive hidden-state vectors — raw activations with the per-layer dataset mean subtracted. To use this adapter on new inputs, you need the same mean vectors that were subtracted during training.
The file `mean-vectors.safetensors` contains one mean vector per layer (14 layers: 7–20).

### Loading and using mean vectors
```python
import json
from safetensors import safe_open
from safetensors.torch import load_file

# Load all mean vectors
mean_vectors = load_file("mean-vectors.safetensors")

# Access a specific layer's mean vector
mean_vec = mean_vectors["layer_14"]  # shape: [3584], dtype: float32

# Given a raw hidden state from that layer:
contrastive_vec = raw_hidden_state.float() - mean_vec
soft_tokens = adapter.transform(contrastive_vec)

# To see which layers are available:
with safe_open("mean-vectors.safetensors", framework="pt") as f:
    meta = f.metadata()

layers = json.loads(meta["layer_indices"])
print(layers)  # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```
### What are the mean vectors?
They are the average hidden-state vectors at each layer across all 49,637 prompts in the keenanpepper/fifty-thousand-things dataset, extracted using the prompt template "Tell me about {title}." with the Qwen chat format. Subtracting them ensures the adapter sees zero-centered inputs matching its training distribution.
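The computation described above can be sketched with NumPy. The small random arrays here are stand-ins for real hidden states (49,637 prompts of dim 3584 in the actual run); only the layer range and the per-layer averaging mirror the description:

```python
import numpy as np

n_prompts, d = 4, 8        # stand-ins for 49,637 prompts and dim 3584
layers = range(7, 21)      # layers 7-20 inclusive
rng = np.random.default_rng(0)

# One (n_prompts, d) matrix of hidden states per layer (random stand-ins).
hidden = {layer: rng.standard_normal((n_prompts, d)) for layer in layers}

# Per-layer mean over all prompts, keyed as in mean-vectors.safetensors.
mean_vectors = {f"layer_{layer}": h.mean(axis=0)
                for layer, h in hidden.items()}

# Contrastive vector for one prompt at layer 14: zero-centered input.
contrastive = hidden[14][0] - mean_vectors["layer_14"]
```

Subtracting the per-layer mean is what makes new inputs match the zero-centered distribution the adapter saw during training.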