# InterpGPT – ADHD Model (23M)
Part of the InterpGPT matched-pair release. This is the ADHD model;
its counterpart is
`connaaa/interpgpt-standard-23M`.
Both models share identical architecture and training recipe; only the
training data distribution differs.
ADHD variant training data: task decompositions broken into smaller steps with interleaved micro-regulation actions ("sip water", "deep breath", "close eyes briefly", "quick stretch", "pause").
| Field | Value |
|---|---|
| Parameters | 23,471,104 |
| Layers | 6 |
| Heads | 8 |
| d_model | 512 |
| d_head | 64 |
| d_mlp (SwiGLU) | 1408 |
| Vocab | 8192 (custom BPE) |
| Context length | 512 |
| Norm | RMSNorm (ε = 1e-6) |
| Position | RoPE (half-half, base 10,000) |
| Activation | SwiGLU |
| Biases | none |
| Tied input/output embeddings | yes |
| Training steps | ~25k (ADHD-variant task-decomposition corpus) |
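As a sanity check on the table, the parameter count can be roughly reconstructed from the architecture fields. This is a back-of-the-envelope sketch, assuming a standard pre-norm decoder (tied embeddings, no biases, two RMSNorms per layer plus a final norm, SwiGLU with gate/up/down projections); the exact layout is not specified by this card.

```python
# Rough parameter count from the architecture table (assumed layout).
d_model, n_layers, d_mlp, vocab = 512, 6, 1408, 8192

embed = vocab * d_model          # tied input/output embedding
attn  = 4 * d_model * d_model    # Q, K, V, O projections (8 heads x d_head 64 = 512)
mlp   = 3 * d_model * d_mlp      # SwiGLU: gate, up, down
norms = 2 * d_model              # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embed + n_layers * per_layer + d_model  # + final RMSNorm
print(f"{total:,}")  # 23,468,544
```

This lands within ~0.01% of the reported 23,471,104; the small remainder depends on implementation details (e.g. extra special tokens or additional norm parameters) not stated in the table.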
## Headline findings (Phase 1)
- Structural head-position swap. A step-layout-broadcast head lives at L3H0 in the standard model and at L3H5 in the ADHD model. Cross-model per-position attention-profile cosine at the matched pair: 0.997; same-index baseline: ~0.66 (0.663 for one pair, 0.643 for another). Causal ablation confirms the functional identity: ablating L3H5 in the ADHD model drops Spearman(task_complexity, step_count) from 0.83 to 0.78 (median Δ = −0.055 across 5 seeds).
- Block-2 content circuit. P(regulation token) at step-onset positions jumps ~18× between layer 1 and layer 2 (0.014 → 0.251). The standard model never crosses 1% at any layer.
- High-specificity null-steering feature. An ADHD-L2 SAE feature (feat 2504) fires at 59% of ADHD step-onsets vs 0.03% of standard step-onsets (a ~2000× cross-model asymmetry), yet causal steering on its decoder direction produces Δs within sampling noise under all four intervention variants (inject-std, subtract-adhd, zero-ablate, inject-upstream). See the companion SAE repo `connaaa/interpgpt-sae-phase5`.
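The head-matching metric behind the first finding can be sketched in a few lines: compare two heads by the cosine similarity of their per-position attention profiles. The helper name and the toy profile numbers below are hypothetical illustrations, not the models' actual values.

```python
import numpy as np

def profile_cosine(p, q):
    """Cosine similarity between two per-position attention profiles."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Toy averaged attention profiles (hypothetical numbers). A matched pair
# (standard L3H0 vs ADHD L3H5) should score near 1.0; a same-index pair
# (standard L3H0 vs ADHD L3H0) gives the lower baseline.
std_L3H0  = np.array([0.05, 0.60, 0.05, 0.25, 0.05])
adhd_L3H5 = np.array([0.06, 0.58, 0.06, 0.24, 0.06])  # near-duplicate profile
adhd_L3H0 = np.array([0.30, 0.10, 0.35, 0.05, 0.20])  # unrelated profile

print(profile_cosine(std_L3H0, adhd_L3H5))  # high (matched pair)
print(profile_cosine(std_L3H0, adhd_L3H0))  # lower (same-index baseline)
```

Applied across all head pairs in corresponding layers, this kind of metric is what surfaces a swap: the off-index pair scores far above the same-index baseline.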
## Loading
Identical to the standard variant. See `connaaa/interpgpt-standard-23M` for AutoModel, TransformerLens, and raw-TaskGPT examples, substituting this repo id.
## Input format
```
<|task|>Clean the kitchen<|steps|>Step 1 text<|sep|>Step 2 text<|sep|>...<|end|>
```
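A minimal sketch of assembling a sequence in this format. The helper name is hypothetical; the special-token strings come from the format above, and the interleaved micro-regulation step ("sip water") follows the ADHD-variant data description.

```python
def format_task(task, steps):
    """Assemble a sequence in the card's input format (hypothetical helper)."""
    return f"<|task|>{task}<|steps|>" + "<|sep|>".join(steps) + "<|end|>"

# ADHD-variant style: small decomposed steps with interleaved
# micro-regulation actions such as "sip water" or "deep breath".
seq = format_task("Clean the kitchen",
                  ["Clear the counter", "sip water", "Load the dishwasher"])
print(seq)
# <|task|>Clean the kitchen<|steps|>Clear the counter<|sep|>sip water<|sep|>Load the dishwasher<|end|>
```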
## Reproduce the head-swap finding
Open the Colab notebook at `notebooks/InterpGPT_HeadSwap.ipynb` in https://github.com/cwklurks/interpgpt. It runs end-to-end on the Colab free tier in under 15 minutes.
## License
MIT.
## Citation
See the standard model card.