| --- |
| license: mit |
| library_name: pytorch |
| tags: |
| - sparse-autoencoder |
| - interpretability |
| - mechanistic-interpretability |
| - gated-deltanet |
| - mamba |
| - rwkv |
| - linear-attention |
| - state-space-model |
| base_model: |
| - Qwen/Qwen3.5-0.8B |
| - Qwen/Qwen3.5-4B |
| - Qwen/Qwen3.5-27B |
| language: |
| - en |
| pipeline_tag: feature-extraction |
| --- |
| |
| # WriteSAE |
|
|
| **WriteSAE: Sparse Autoencoders for Recurrent State** |
|
|
| Jack Young |
|
|
| [Paper](https://arxiv.org/abs/2605.12770) | [Website](https://www.jackyoung.io/research/writesae) | [Code](https://github.com/JackYoung27/writesae) |
|
|
| WriteSAE factors each decoder atom as the rank-1 outer product **vᵢwᵢᵀ**, matching the native **kₜvₜᵀ** write that Gated DeltaNet (GDN), Mamba-2, and RWKV-7 install into a **dₖ × dᵥ** matrix cache. Residual SAEs cannot reach that write site; WriteSAE can. Atom substitution beats matched-Frobenius-norm ablation on **92.4%** of *n*=4,851 firings at Qwen3.5-0.8B layer 9, head 4 (L9 H4), the closed form predicts measured logit shifts at **R² = 0.98**, and sustained three-position installs lift midrank target-in-continuation from 33.3% to **100%** under greedy decoding. Across architectures, GDN rank-1 atoms transfer to Mamba-2-370M at 88.1% over 2,500 firings, with sharpness ordering GDN > RWKV-7 > Mamba-2. |
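Norm matching is cheap for rank-1 matrices because the Frobenius norm of an outer product factorizes into the product of the two vector norms. The sketch below rescales a decoder atom so that its Frobenius norm equals that of a native write. It is only an illustration of the norm-matching arithmetic: the tensors are random stand-ins, the sizes are made up, `match_frobenius` is a hypothetical helper, and the paper's actual substitution and ablation protocol is not reproduced here.

```python
import torch

def match_frobenius(atom: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """Rescale `atom` so its Frobenius norm equals that of `reference`."""
    scale = torch.linalg.matrix_norm(reference) / torch.linalg.matrix_norm(atom)
    return atom * scale

torch.manual_seed(0)
d_k, d_v = 8, 16  # illustrative sizes, not the real head dimensions

# Native write k_t v_t^T (random stand-ins for the key and value vectors).
k_t, v_t = torch.randn(d_k), torch.randn(d_v)
native_write = torch.outer(k_t, v_t)

# Decoder atom v_i w_i^T (note: v_i lives on the d_k side, w_i on the d_v side).
v_i, w_i = torch.randn(d_k), torch.randn(d_v)
atom = torch.outer(v_i, w_i)

# Atom rescaled to carry the same Frobenius norm as the native write.
substitute = match_frobenius(atom, native_write)

# For a rank-1 matrix the Frobenius norm factorizes: ||a b^T||_F = ||a|| * ||b||.
native_norm = torch.linalg.matrix_norm(native_write)
factored_norm = torch.linalg.vector_norm(k_t) * torch.linalg.vector_norm(v_t)
```

Because of that factorization, the norm of a rank-1 write can be obtained from its two factors without materializing the full dₖ × dᵥ matrix.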
|
|
| ## Quick start |
|
|
| ```python |
| from huggingface_hub import snapshot_download |
| import torch |
| |
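| # Download only the files for the Qwen3.5-0.8B layer-9, head-4 cell. |
| ckpt_dir = snapshot_download( |
|     "JackYoung27/writesae-ckpts", |
|     allow_patterns=["writesae/qwen0p8b/L9_H4/*"], |
| ) |
| |
| # weights_only=False unpickles full Python objects (here the SAE itself), so only |
| # load checkpoints you trust. Unpickling also needs the SAE class to be importable, |
| # presumably from the linked code repository. |
| ckpt = torch.load( |
|     f"{ckpt_dir}/writesae/qwen0p8b/L9_H4/best.pt", |
|     weights_only=False, |
|     map_location="cpu", |
| ) |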
| |
| # Decoder atom 412 — the paper's ERASE example. |
| v_412 = ckpt["sae"].decoder.v[412] # (d_k,) |
| w_412 = ckpt["sae"].decoder.w[412] # (d_v,) |
| atom = torch.outer(v_412, w_412) # (d_k, d_v) |
| ``` |
|
|
| A standalone, runnable version of this snippet is in [`LOAD_EXAMPLE.py`](LOAD_EXAMPLE.py). |
|
|
| ## Variants |
|
|
| | variant | encoder | decoder | role | |
| |---|---|---|---| |
| | **WriteSAE** | bilinear vᵢᵀ S wᵢ | rank-1 vᵢwᵢᵀ | All headline numbers | |
| | FlatSAE | linear on vec(S) | flat | Architectural-prior comparison | |
| | MatrixSAE | linear on vec(S) | full-rank | Ablation | |
| | BilinearSAE | bilinear | bilinear | Ablation | |
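The WriteSAE row of the table can be read as two einsum contractions over a dictionary of factor pairs. The sketch below is illustrative only: the factors are random, the sizes are made up, and the ReLU nonlinearity is an assumption, since the table does not specify the activation or any bias terms.

```python
import torch

torch.manual_seed(0)
n_atoms, d_k, d_v = 32, 8, 16   # illustrative sizes only

V = torch.randn(n_atoms, d_k)   # row i is v_i (d_k side)
W = torch.randn(n_atoms, d_v)   # row i is w_i (d_v side)
S = torch.randn(d_k, d_v)       # stand-in for one head's matrix state

def encode(S: torch.Tensor, V: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Bilinear encoder: pre-activation i is v_i^T S w_i."""
    pre = torch.einsum("nk,kv,nv->n", V, S, W)
    return torch.relu(pre)      # assumed nonlinearity, not confirmed by the source

def decode(z: torch.Tensor, V: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Rank-1 decoder: S_hat = sum_i z_i * v_i w_i^T."""
    return torch.einsum("n,nk,nv->kv", z, V, W)

z = encode(S, V, W)             # one activation per atom
S_hat = decode(z, V, W)         # reconstruction with the same shape as S
```

By contrast, the FlatSAE and MatrixSAE rows encode with a linear map on the flattened state, which in this notation would act on `S.reshape(-1)`.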
|
|
| ## Base models covered |
|
|
| Qwen3.5-0.8B (primary), Qwen3.5-4B, Qwen3.5-27B, Mamba-2-370M, RWKV-7-1.5B, DeltaNet-1.3B, GLA-1.3B. See [`MODEL_CARD.md`](MODEL_CARD.md) for full layer / head coverage and training details. |
|
|
| ## Repository layout |
|
|
| ```text |
| writesae-ckpts/ |
| README.md |
| MODEL_CARD.md |
| manifest.json |
| LOAD_EXAMPLE.py |
| LICENSE |
| |
| writesae/<base-model>/<layer>_<head>/best.pt # primary cells |
| flat_baseline/<base-model>_<layer>_<head>/best.pt # FlatSAE controls |
| results/<test-name>/ # JSON outputs per paper claim |
| ``` |
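The two checkpoint patterns above can be assembled programmatically. This is a minimal sketch, assuming the `L9_H4` naming from the quick start generalizes to other layers and heads. `checkpoint_path` is a hypothetical helper, not part of the repository, and not every (model, layer, head) combination is necessarily published.

```python
from pathlib import PurePosixPath

def checkpoint_path(base_model: str, layer: int, head: int,
                    flat_baseline: bool = False) -> str:
    """Build a repo-relative checkpoint path following the layout above."""
    cell = f"L{layer}_H{head}"  # assumed naming, taken from the quick-start example
    if flat_baseline:
        # FlatSAE controls: flat_baseline/<base-model>_<layer>_<head>/best.pt
        return str(PurePosixPath("flat_baseline", f"{base_model}_{cell}", "best.pt"))
    # Primary cells: writesae/<base-model>/<layer>_<head>/best.pt
    return str(PurePosixPath("writesae", base_model, cell, "best.pt"))

primary = checkpoint_path("qwen0p8b", 9, 4)
control = checkpoint_path("qwen0p8b", 9, 4, flat_baseline=True)
```

The parent directory of `primary`, followed by `/*`, gives the kind of pattern the quick start passes to `allow_patterns`.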
|
|
| ## Limitations |
|
|
| The closed-form factorization predicts well only on Gated DeltaNet (R² = 0.98 at Qwen3.5-0.8B L9 H4); applied to Mamba-2 or Qwen3.5-4B, it returns negative R². The substitution test itself transfers to Mamba-2 (88.1%); the analytical coefficient does not. Per-atom identity varies across SAE seeds; the class-level register / bundle partition reproduces with a coefficient of variation (CV) of 4–12%. |
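A negative R² is possible because the coefficient of determination compares predictions against a constant-mean baseline: R² = 1 - SS_res / SS_tot drops below zero whenever the predictions have a larger squared error than simply predicting the mean. The sketch below uses made-up numbers, not paper data, and `r_squared` is a hypothetical helper; how the paper computes its R² is not specified here.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

measured = [1.0, 2.0, 3.0, 4.0]   # toy values standing in for measured shifts
good_fit = [1.0, 2.2, 2.9, 4.1]   # tracks the measurements closely
bad_fit = [4.0, 3.0, 2.0, 1.0]    # anti-correlated with the measurements

r2_good = r_squared(measured, good_fit)   # close to 1
r2_bad = r_squared(measured, bad_fit)     # below 0: worse than predicting the mean
```

Read this way, a negative R² means the closed form does worse than a constant guess on those models, not merely that it fits loosely.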
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{young2026writesae, |
| title = {WriteSAE: Sparse Autoencoders for Recurrent State}, |
| author = {Young, Jack}, |
| year = {2026}, |
| journal= {arXiv preprint arXiv:2605.12770}, |
| url = {https://github.com/JackYoung27/writesae} |
| } |
| ``` |
|
|
| MIT license. Base models retain their upstream licenses; no base-model weights are redistributed. |
|
|