---
license: mit
library_name: transformers
tags:
- gpt2
- onnx
- mechanistic-interpretability
- circuit-ablation
- rti-circuit
base_model: openai-community/gpt2
---

# GPT-2 with RTI Circuit Zero-Ablated

GPT-2 (124M) with the 15-head **Repeated Token Identification (RTI) circuit** removed via zero ablation.

## What was ablated

The RTI circuit consists of 15 attention heads (written `layer.head`) across 4 functional tiers:

| Tier | Heads | Function |
|------|-------|----------|
| Backbone | 0.8, 0.9, 0.11 | Broad token matching via positional/frequency features |
| Detector | 4.11 | Repeated-token detection gate |
| Copier | 4.0, 5.6, 5.7, 7.0, 8.4, 8.7, 9.3, 9.10 | Copy the repeated token's identity to the output |
| Readout | 10.11, 11.9, 11.11 | Route the copied information to the final logits |

## Ablation method

**Zero ablation**: for each circuit head, the rows of `c_proj.weight` corresponding to that head were set to zero. (GPT-2 stores this output projection W_O as a Conv1D with shape `(input, output)`, so each head owns a contiguous block of 64 rows.) This prevents the head from writing anything to the residual stream, effectively removing its contribution. A minimal sketch of the procedure appears in "Reproducing the ablation" below.

## Effect

The ablated model loses the ability to continue repeated-token patterns. For example:

- **Normal GPT-2**: "The cat sat on the mat. The cat" → " sat on the mat. The cat sat on the"
- **Zero-ablated**: "The cat sat on the mat. The cat" → " was a little bit older than me, but I"

## Usage with Transformers.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';

const model = await AutoModelForCausalLM.from_pretrained('elliottower2/gpt2-rti-zero-ablated', {
  dtype: 'fp32',
});
const tokenizer = await AutoTokenizer.from_pretrained('elliottower2/gpt2-rti-zero-ablated');

// Generate a short continuation (greedy decoding by default).
const inputs = tokenizer('The cat sat on the mat. The cat');
const output = await model.generate({ ...inputs, max_new_tokens: 10 });
console.log(tokenizer.batch_decode(output, { skip_special_tokens: true })[0]);
```

## Citation

This model is part of the **factorization-circuits** project studying weight-space circuit discovery in transformers.
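
## Reproducing the ablation

For reference, here is a minimal sketch of the zero ablation described above, written against PyTorch and the Python `transformers` library. The head list comes from the table on this card; the function name, save path, and exact procedure are illustrative assumptions, not the project's actual script.

```python
import torch
from transformers import GPT2LMHeadModel

# (layer, head) pairs of the 15-head RTI circuit, from the table above.
RTI_HEADS = [
    (0, 8), (0, 9), (0, 11),             # backbone
    (4, 11),                             # detector
    (4, 0), (5, 6), (5, 7), (7, 0),      # copier
    (8, 4), (8, 7), (9, 3), (9, 10),     # copier (cont.)
    (10, 11), (11, 9), (11, 11),         # readout
]

def zero_ablate_rti(model: GPT2LMHeadModel) -> GPT2LMHeadModel:
    d_head = model.config.n_embd // model.config.n_head  # 768 // 12 = 64
    with torch.no_grad():
        for layer, head in RTI_HEADS:
            # c_proj is a Conv1D whose weight is stored as (input, output);
            # head h's output occupies input rows [h*d_head, (h+1)*d_head).
            # Zeroing that slice removes everything the head writes to the
            # residual stream.
            w = model.transformer.h[layer].attn.c_proj.weight
            w[head * d_head:(head + 1) * d_head, :] = 0.0
    return model

model = zero_ablate_rti(GPT2LMHeadModel.from_pretrained("openai-community/gpt2"))
model.save_pretrained("gpt2-rti-zero-ablated")
```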
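The behavioural comparison in the Effect section can then be checked against the locally ablated copy. Greedy decoding is an assumption here; the decoding settings behind the card's examples are not stated.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")
ids = tok("The cat sat on the mat. The cat", return_tensors="pt").input_ids

# "gpt2-rti-zero-ablated" is the local directory written by the sketch above.
for name in ["openai-community/gpt2", "gpt2-rti-zero-ablated"]:
    model = GPT2LMHeadModel.from_pretrained(name)
    out = model.generate(ids, max_new_tokens=10, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print(f"{name}: {tok.decode(out[0, ids.shape[1]:])!r}")
```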