---
license: mit
library_name: transformers
tags:
- interpretability
- mechanistic-interpretability
- task-decomposition
- small-language-model
- transformer-lens
pipeline_tag: text-generation
---

# InterpGPT — Standard Model (23M)

Part of the **InterpGPT** matched-pair release. This is the **standard** model;
its counterpart is [`connaaa/interpgpt-adhd-23M`](https://huggingface.co/connaaa/interpgpt-adhd-23M).
Both models share the same architecture and training recipe; only the training
data distribution differs.

| | Value |
|---|---|
| Parameters | 23,471,104 |
| Layers | 6 |
| Heads | 8 |
| d_model | 512 |
| d_head | 64 |
| d_mlp (SwiGLU) | 1408 |
| Vocab | 8192 (custom BPE) |
| Context length | 512 |
| Norm | RMSNorm (ε = 1e-6) |
| Position | RoPE (half-half, base 10,000) |
| Activation | SwiGLU |
| Biases | none |
| Tied input/output embeddings | yes |
| Training | ~25k steps on the task-decomposition corpus |

## What is this model for?

Given a task prompt, the model writes a step-by-step decomposition. The
**standard** variant was trained on normal task decompositions (tasks → subtasks
in straightforward order). The **ADHD** counterpart was trained on decompositions
with smaller steps and interleaved micro-regulation actions (e.g. "sip water",
"deep breath", "quick stretch").

The pair is the subject of a mechanistic-interpretability study.
Phase 1 headline findings:

- **Structural head-position swap.** A step-layout-broadcast head lives at
  **L3H0** in the standard model and at **L3H5** in the ADHD model.
  Cross-model per-position attention profile cosine similarity is **0.997**
  at the matched (different-index) pair vs a same-index baseline of **0.66**.
- **Block-2 content circuit.** P(regulation token) at step-onset positions jumps
  17× between layer 1 and layer 2 in the ADHD model (0.014 → 0.251); the
  standard model never crosses 1% at any layer.
- **High-specificity null-steering SAE feature.** See the companion SAE repo
  [`connaaa/interpgpt-sae-phase5`](https://huggingface.co/connaaa/interpgpt-sae-phase5).

## Input format

```
<|task|>Clean the kitchen<|steps|>Step 1 text<|sep|>Step 2 text<|sep|>...<|end|>
```

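At inference time you typically prompt with just the task prefix and let the model continue; a minimal sketch (the task text is illustrative and the stop condition is up to the caller):

```python
task = "Clean the kitchen"
prompt = f"<|task|>{task}<|steps|>"  # the model continues with "Step 1<|sep|>Step 2<|sep|>...<|end|>"
```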

## Loading

### HuggingFace Transformers (custom code)

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "connaaa/interpgpt-standard-23M", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "connaaa/interpgpt-standard-23M"
)
```

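The checkpoint is loaded through `AutoModel` with custom code, so a stock `generate()` helper may not be available. A minimal greedy-decoding sketch, assuming the forward pass accepts a token-id tensor and returns logits of shape `[batch, seq, vocab]` (or an object exposing `.logits`), and that the special tokens above are in the tokenizer's vocabulary:

```python
import torch

prompt = "<|task|>Clean the kitchen<|steps|>"
ids = tokenizer(prompt, return_tensors="pt").input_ids
end_id = tokenizer.convert_tokens_to_ids("<|end|>")

model.eval()
with torch.no_grad():
    for _ in range(512 - ids.shape[1]):        # stay within the 512-token context
        out = model(ids)
        logits = out.logits if hasattr(out, "logits") else out  # assumption: raw logits otherwise
        next_id = int(logits[0, -1].argmax())
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=-1)
        if next_id == end_id:                  # stop at <|end|>
            break

print(tokenizer.decode(ids[0]))
```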

### TransformerLens (recommended for interpretability)

The repo ships a TransformerLens-compatible bundle at `hooked_transformer.pt`:

```python
from huggingface_hub import hf_hub_download
from transformer_lens import HookedTransformer, HookedTransformerConfig
import torch

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "hooked_transformer.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)

# Keep only keys that HookedTransformerConfig understands, dropping
# serialized torch dtype strings (e.g. "torch.float32").
cfg_keep = {
    k: v for k, v in blob["config"].items()
    if k in HookedTransformerConfig.__dataclass_fields__ and not (
        isinstance(v, str) and v.startswith("torch.")
    )
}
cfg = HookedTransformerConfig(**cfg_keep)
model = HookedTransformer(cfg)
model.load_state_dict(blob["model_state_dict"])
model.eval()
```

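A HookedTransformer built from the saved config may not have a tokenizer attached, so the simplest route is to tokenize with the repo's HF tokenizer and pass token ids directly. A short sketch that caches activations and reads out the layer-3 attention patterns (the prompt text is illustrative):

```python
from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("connaaa/interpgpt-standard-23M")
prompt = "<|task|>Clean the kitchen<|steps|>Wipe the counters<|sep|>"
tokens = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits, cache = model.run_with_cache(tokens)

# Attention patterns for layer 3: [batch, head, query_pos, key_pos]
pattern = cache["pattern", 3]
print(pattern[0, 0])  # L3H0: the step-layout-broadcast head in this (standard) model
```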

### Raw PyTorch / original TaskGPT class

```python
# Pairs with gpt_model.py from https://github.com/cwklurks/interpgpt
from huggingface_hub import hf_hub_download
from gpt_model import GPTConfig, TaskGPT
import torch

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "pytorch_model.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)
model = TaskGPT(GPTConfig(**blob["config"]))
model.load_state_dict(blob["model_state_dict"])
model.eval()  # inference mode, matching the TransformerLens path above
```

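Whichever loading path you use, a quick sanity check is to compare the parameter count with the table above (the exact figure may shift slightly depending on how tied and auxiliary weights are counted):

```python
# Expected to be close to the 23,471,104 parameters listed above.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,}")
```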

## Reproduce the head-swap finding

Open the companion Colab notebook
**`notebooks/InterpGPT_HeadSwap.ipynb`** at
[github.com/cwklurks/interpgpt](https://github.com/cwklurks/interpgpt).
An end-to-end run on the Colab free tier reproduces the 0.997 vs 0.66 comparison
in under 15 minutes.

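For orientation, a minimal sketch of that comparison, assuming both checkpoints are loaded as HookedTransformers (named `model_std` and `model_adhd` here) and a shared batch of prompts has been tokenized to `tokens`; the notebook's exact prompt set and aggregation may differ:

```python
import torch
import torch.nn.functional as F

def head_profile(model, tokens, layer, head):
    # Batch-averaged attention pattern for one head: [query_pos, key_pos]
    with torch.no_grad():
        _, cache = model.run_with_cache(tokens)
    return cache["pattern", layer][:, head].mean(dim=0)

p_std = head_profile(model_std, tokens, layer=3, head=0).flatten()   # standard: L3H0
p_adhd = head_profile(model_adhd, tokens, layer=3, head=5).flatten()  # ADHD: L3H5

print(F.cosine_similarity(p_std, p_adhd, dim=0).item())  # reported value: ~0.997
```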

## Training data

Custom task-decomposition corpus in two variants (standard vs ADHD), generated
from the same task pool. Detailed dataset notes and generation scripts live in
the main repo (`preprocess.py`, `merge_data.py`, `rebuild_data.py`,
`fix_adhd_data.py`, `shorten_adhd_steps.py`).

## License

MIT.

## Intended use

Interpretability research. The model is intentionally small and
domain-specific; **not** intended as a general-purpose chatbot.

## Citation

```bibtex
@misc{interpgpt2026,
  title  = {{InterpGPT}: A matched-pair interpretability study of task-decomposition models},
  author = {Klann, Connor},
  year   = {2026},
  url    = {https://github.com/cwklurks/interpgpt}
}
```