---
license: mit
library_name: pytorch
tags:
  - sparse-autoencoder
  - interpretability
  - mechanistic-interpretability
  - gated-deltanet
  - mamba
  - rwkv
  - linear-attention
  - state-space-model
base_model:
  - Qwen/Qwen3.5-0.8B
  - Qwen/Qwen3.5-4B
  - Qwen/Qwen3.5-27B
language:
  - en
pipeline_tag: feature-extraction
---

# WriteSAE

**WriteSAE: Sparse Autoencoders for Recurrent State**

Jack Young

[Paper](https://arxiv.org/abs/2605.12770) | [Website](https://www.jackyoung.io/research/writesae) | [Code](https://github.com/JackYoung27/writesae)

WriteSAE factors each decoder atom as a rank-1 outer product **vᵢwᵢᵀ**, matching the native **kₜvₜᵀ** write that Gated DeltaNet, Mamba-2, and RWKV-7 install into their **dₖ × dᵥ** matrix cache. Residual-stream SAEs cannot reach that write site; WriteSAE can.

At Qwen3.5-0.8B L9 H4, atom substitution beats matched-Frobenius-norm ablation on **92.4%** of *n* = 4,851 firings, and the closed form predicts measured logit shifts at **R² = 0.98**. Sustained three-position installs lift midrank target-in-continuation from 33.3% to **100%** under greedy decoding. Cross-architecture: GDN rank-1 atoms transfer to Mamba-2-370M at 88.1% over 2,500 firings, with sharpness ordering GDN > RWKV-7 > Mamba-2.

## Quick start

```python
from huggingface_hub import snapshot_download
import torch

ckpt_dir = snapshot_download(
    "JackYoung27/writesae-ckpts",
    allow_patterns=["writesae/qwen0p8b/L9_H4/*"],
)
ckpt = torch.load(
    f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt",
    weights_only=False,
    map_location="cpu",
)

# Decoder atom 412: the paper's ERASE example.
v_412 = ckpt["sae"].decoder.v[412]  # (d_k,)
w_412 = ckpt["sae"].decoder.w[412]  # (d_v,)
atom = torch.outer(v_412, w_412)    # (d_k, d_v)
```

A standalone runnable version is in [`LOAD_EXAMPLE.py`](LOAD_EXAMPLE.py). For a sketch of using an atom for substitution, see the end of this card.

## Variants

| variant | encoder | decoder | role |
|---|---|---|---|
| **WriteSAE** | bilinear vᵢᵀ S wᵢ | rank-1 vᵢwᵢᵀ | all headline numbers |
| FlatSAE | linear on vec(S) | flat | architectural-prior comparison |
| MatrixSAE | linear on vec(S) | full-rank | ablation |
| BilinearSAE | bilinear | bilinear | ablation |

## Base models covered

Qwen3.5-0.8B (primary), Qwen3.5-4B, Qwen3.5-27B, Mamba-2-370M, RWKV-7-1.5B, DeltaNet-1.3B, GLA-1.3B. See [`MODEL_CARD.md`](MODEL_CARD.md) for full layer/head coverage and training details.

## Repository layout

```text
writesae-ckpts/
  README.md
  MODEL_CARD.md
  manifest.json
  LOAD_EXAMPLE.py
  LICENSE
  writesae/<model>/<layer>_<head>/best.pt       # primary cells
  flat_baseline/<model>_<layer>_<head>/best.pt  # FlatSAE controls
  results/<claim>/                              # JSON outputs per paper claim
```

## Limitations

The closed-form factorization predicts well only on Gated DeltaNet (R² = 0.98 at L9 H4); applied to Mamba-2 or Qwen3.5-4B, it returns negative R². The substitution test itself transfers to Mamba-2 (88.1%); the analytical coefficient does not. Per-atom identity varies across SAE seeds; the class-level register/bundle partition reproduces at a coefficient of variation of 4–12%.

## Citation

```bibtex
@article{young2026writesae,
  title   = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author  = {Young, Jack},
  year    = {2026},
  journal = {arXiv preprint arXiv:2605.12770},
  url     = {https://github.com/JackYoung27/writesae}
}
```

MIT license. Base models retain their upstream licenses; no base-model weights are redistributed.
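## Atom substitution sketch

A minimal sketch of the substitution edit described above, continuing from the Quick start. It assumes tied encoder/decoder factors (the bilinear encoder reads aᵢ = vᵢᵀ S wᵢ, per the variants table) and rescales the install so the readout hits a chosen target; whether the released pipeline normalizes atoms this way is an assumption, and `S`, `a_new`, and the stand-in dimensions below are hypothetical. The paper's actual substitution outputs live under `results/`.

```python
import torch

torch.manual_seed(0)

# Stand-ins so the sketch runs on its own; in practice v_412 / w_412
# come from the checkpoint (see Quick start) and S is a d_k x d_v state
# matrix captured from a Gated DeltaNet head mid-sequence.
d_k, d_v = 128, 128        # hypothetical head dimensions
v_412 = torch.randn(d_k)   # ckpt["sae"].decoder.v[412] in practice
w_412 = torch.randn(d_v)   # ckpt["sae"].decoder.w[412] in practice
S = torch.randn(d_k, d_v)  # captured recurrent state (placeholder)

# Bilinear encoder readout for atom 412: a = v^T S w.
a_412 = v_412 @ S @ w_412

# Substitution: shift the atom's readout from a to a_new by installing a
# scaled copy of the rank-1 atom, S' = S + c * v w^T, with
# c = (a_new - a) / ((v.v)(w.w)), so that v^T S' w = a_new exactly.
a_new = 2.0 * a_412  # hypothetical target activation
c = (a_new - a_412) / ((v_412 @ v_412) * (w_412 @ w_412))
S_sub = S + c * torch.outer(v_412, w_412)

# Sanity check: the edited state reads out at the target activation.
assert torch.allclose(v_412 @ S_sub @ w_412, a_new, rtol=1e-3, atol=1e-3)
```

Writing `S_sub` back into the running cache, and sustaining it across positions as in the three-position installs, is model-specific; see the [paper](https://arxiv.org/abs/2605.12770) and [code](https://github.com/JackYoung27/writesae) for the instrumented forward passes.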