---
license: mit
library_name: transformers
tags:
  - gpt2
  - onnx
  - mechanistic-interpretability
  - circuit-ablation
  - rti-circuit
base_model: openai-community/gpt2
---

# GPT-2 with RTI Circuit Zero-Ablated

GPT-2 (124M) with the 15-head **Repeated Token Identification (RTI) circuit** removed via zero ablation.

## What was ablated

The RTI circuit consists of 15 attention heads across 4 functional tiers:

| Tier | Heads | Function |
|------|-------|----------|
| Backbone | 0.8, 0.9, 0.11 | Broad token matching via positional/frequency features |
| Detector | 4.11 | Repeated-token detection gate |
| Copier | 4.0, 5.6, 5.7, 7.0, 8.4, 8.7, 9.3, 9.10 | Copy repeated token identity to output |
| Readout | 10.11, 11.9, 11.11 | Route copied information to final logits |
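
For scripting against this table, the 15 heads can be collected as `(layer, head)` pairs (the variable names here are illustrative, not part of the released model):

```python
# The 15 RTI circuit heads as (layer, head) pairs, grouped by the
# functional tiers in the table above.
RTI_CIRCUIT = {
    "backbone": [(0, 8), (0, 9), (0, 11)],
    "detector": [(4, 11)],
    "copier":   [(4, 0), (5, 6), (5, 7), (7, 0),
                 (8, 4), (8, 7), (9, 3), (9, 10)],
    "readout":  [(10, 11), (11, 9), (11, 11)],
}

# Flatten for iteration when applying the ablation layer by layer.
ALL_HEADS = [head for tier in RTI_CIRCUIT.values() for head in tier]
print(len(ALL_HEADS))  # 15
```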

## Ablation method

**Zero ablation**: For each circuit head, the corresponding columns of `c_proj.weight` (the output projection W_O) were set to zero. This prevents the head from writing anything to the residual stream, effectively removing its contribution.
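
A minimal NumPy sketch of this operation, assuming GPT-2 small's shapes (12 heads of dimension 64, `d_model` 768). The function name is illustrative; note that in Hugging Face's `Conv1D` layout (`y = x @ W + b`, weight shape `(in_features, out_features)`) a head's slice of the output projection spans *rows* of the stored matrix, which correspond to the columns of `W_O` in the transposed math convention:

```python
import numpy as np

N_HEADS, HEAD_DIM = 12, 64
D_MODEL = N_HEADS * HEAD_DIM  # 768

def zero_ablate_head(c_proj_weight: np.ndarray, head: int) -> np.ndarray:
    """Return a copy of c_proj.weight with one head's output slice zeroed.

    In the (in_features, out_features) Conv1D layout, head h's
    contribution to the residual stream passes through rows
    h*HEAD_DIM:(h+1)*HEAD_DIM, so zeroing them removes everything
    the head writes.
    """
    w = c_proj_weight.copy()
    w[head * HEAD_DIM:(head + 1) * HEAD_DIM, :] = 0.0
    return w

# Demo: after ablation, head 11 can no longer write to the residual stream.
rng = np.random.default_rng(0)
w = rng.normal(size=(D_MODEL, D_MODEL))
w_ablated = zero_ablate_head(w, head=11)

# An attention output that is nonzero only in head 11's slice
# now projects to exactly zero...
x = np.zeros(D_MODEL)
x[11 * HEAD_DIM:] = 1.0
print(np.allclose(x @ w_ablated, 0.0))  # True

# ...while the other heads' slices are untouched.
y = np.zeros(D_MODEL)
y[:HEAD_DIM] = 1.0
print(np.allclose(y @ w_ablated, y @ w))  # True
```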

## Effect

The ablated model loses the ability to predict repeated tokens. For example:
- **Normal GPT-2**: "The cat sat on the mat. The cat" → " sat on the mat. The cat sat on the"
- **Zero-ablated**: "The cat sat on the mat. The cat" → " was a little bit older than me, but I"

## Usage with Transformers.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';

const model = await AutoModelForCausalLM.from_pretrained('elliottower2/gpt2-rti-zero-ablated', {
  dtype: 'fp32',
});
const tokenizer = await AutoTokenizer.from_pretrained('elliottower2/gpt2-rti-zero-ablated');
```

## Citation

This model is part of the factorization-circuits project, which studies weight-space circuit discovery in transformers.