SelfIE Adapters for Llama-3.3-70B-Instruct
Trained adapter modules for SelfIE (Self-Interpretation of Embeddings), enabling language models to interpret their own internal representations in natural language.
These adapters are trained projections that map hidden-state vectors from meta-llama/Llama-3.3-70B-Instruct into soft token embeddings for self-interpretation via patching.
Code: github.com/agencyenterprise/selfie-adapters
Warning: These adapters are trained specifically for meta-llama/Llama-3.3-70B-Instruct (residual stream dimension 8192). They will produce garbage results on other models, even if tensor shapes happen to match.
Adapters
| File | Architecture | Training Data | Params | Val Loss |
|---|---|---|---|---|
| goodfire-sae-scalar-affine.safetensors | Scalar affine | Goodfire SAE | 8,193 | 2.023 |
| goodfire-sae-sa-lr16.safetensors | SA + Low-rank (r=16) | Goodfire SAE | 270,337 | 1.868 |
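The parameter counts in the table are consistent with one plausible parametrization, sketched here as an assumption rather than the repository's confirmed definition: a scalar affine map y = s·x + b with a single scale s and a bias vector b of dimension 8192, and a low-rank variant that adds a rank-16 term built from two 8192 × 16 factors. The function names are hypothetical.

```python
D_MODEL = 8192  # residual stream dimension stated in the warning above


def scalar_affine_params(d_model: int) -> int:
    # Assumed form: y = s * x + b, with one scalar s and a bias b of size d_model.
    return 1 + d_model


def sa_low_rank_params(d_model: int, rank: int) -> int:
    # Assumed form: y = s * x + U @ (V.T @ x) + b,
    # with U and V each of shape (d_model, rank).
    return scalar_affine_params(d_model) + 2 * d_model * rank


print(scalar_affine_params(D_MODEL))    # 8193
print(sa_low_rank_params(D_MODEL, 16))  # 270337
```

Under these assumptions the arithmetic reproduces both table entries: 1 + 8192 = 8,193 and 8,193 + 2 × 8192 × 16 = 270,337.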
Usage
from selfie_adapters import load_adapter
adapter = load_adapter("goodfire-sae-scalar-affine.safetensors", device="cuda")
soft_tokens = adapter.transform(hidden_state_vectors)
Prompt Template
These adapters use the following SelfIE prompt template (with <|reserved_special_token_0|> as the injection site for the soft token):
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
What is the meaning of "<|reserved_special_token_0|>"?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
The meaning of "<|reserved_special_token_0|>" is "
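A minimal sketch of how the injection might work, assuming the soft token replaces the input embedding at every position where the placeholder token appears. The helper names (`build_prompt`, `find_injection_sites`, `patch_embeddings`) and the placeholder id in the toy example are hypothetical, and plain Python lists stand in for tensors; the repository's actual patching code may differ.

```python
PLACEHOLDER = "<|reserved_special_token_0|>"


def build_prompt() -> str:
    # Reproduces the template shown above. The trailing open quote is left
    # for the model to complete. Exact whitespace between lines is an assumption.
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
        f'What is the meaning of "{PLACEHOLDER}"?<|eot_id|>'
        "<|start_header_id|>assistant<|end_header_id|>\n"
        f'The meaning of "{PLACEHOLDER}" is "'
    )


def find_injection_sites(token_ids, placeholder_id):
    # Positions whose token id equals the placeholder id.
    return [i for i, t in enumerate(token_ids) if t == placeholder_id]


def patch_embeddings(embeddings, sites, soft_token):
    # Return a copy of the embedding rows with each site replaced by the soft token.
    patched = [list(row) for row in embeddings]
    for i in sites:
        patched[i] = list(soft_token)
    return patched


# Toy example: 5 tokens, embedding width 2, hypothetical placeholder id 99.
token_ids = [1, 99, 7, 99, 3]
embeddings = [[0.0, 0.0] for _ in token_ids]
sites = find_injection_sites(token_ids, placeholder_id=99)
patched = patch_embeddings(embeddings, sites, soft_token=[0.5, -0.5])
```

The template contains the placeholder twice, so the same soft token would be written to two positions in the real prompt.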
File Format
Each .safetensors file contains the projection weights, with the full training config embedded in the header metadata. You can inspect the metadata without loading the tensors:
from safetensors import safe_open
import json
with safe_open("goodfire-sae-scalar-affine.safetensors", framework="pt") as f:
    meta = f.metadata()
    print(meta["projection_type"])  # "scalar_affine"
    print(meta["model_name"])  # "meta-llama/Llama-3.3-70B-Instruct"
    config = json.loads(meta["config_json"])  # full training config
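If the safetensors package is not installed, the same metadata can be read with the standard library alone. A safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of UTF-8 JSON; free-form string metadata lives under the `__metadata__` key. The function name `read_safetensors_metadata` is hypothetical, and this is a sketch rather than a replacement for the official loader.

```python
import json
import struct


def read_safetensors_metadata(path: str) -> dict:
    # Read only the JSON header; tensor data after it is never touched.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is optional in the format, so fall back to an empty dict.
    return header.get("__metadata__", {})


# Example call (requires the adapter file to be present locally):
# meta = read_safetensors_metadata("goodfire-sae-scalar-affine.safetensors")
# config = json.loads(meta["config_json"])
```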