---
license: apache-2.0
tags:
- sparse-autoencoder
- mechanistic-interpretability
- tool-calling
- gemma
- ministral
- qwen
arxiv: 2605.18882
---

# toolcalling-sae

TopK Sparse Autoencoder checkpoints from [To Call or Not to Call: Diagnosing Intrinsic Over-Calling Bias in LLM Agents](https://arxiv.org/abs/2605.18882).

## Checkpoints

| Model | Layer | Dict Size | k | Stage 1 | Stage 2 |
|-------|-------|-----------|---|---------|---------|
| gemma-3-1b-it | L17 | 9 216 | 128 | 50M tokens | 5M tokens |
| gemma-3-4b-it | L29 | 20 480 | 128 | 50M tokens | 5M tokens |
| gemma-4-E2B-it | L30 | 12 288 | 128 | 50M tokens | 5M tokens |
| gemma-4-E4B-it | L30 | 20 480 | 128 | 50M tokens | 5M tokens |
| Ministral-3-3B-Instruct-2512 | L21 | 24 576 | 128 | 50M tokens | 5M tokens |
| Ministral-3-8B-Instruct-2512 | L31 | 32 768 | 128 | 50M tokens | 5M tokens |
| Qwen3.5-4B | L25 | 20 480 | 128 | 50M tokens | 5M tokens |
| Qwen3.5-9B | L25 | 32 768 | 128 | 50M tokens | 5M tokens |

**Stage 1**: Pre-trained on [OpenWebText2](https://openwebtext2.readthedocs.io/).

**Stage 2**: Fine-tuned on tool-calling activations from the [When2Call](https://arxiv.org/abs/2605.18882) benchmark.

All checkpoints use `bfloat16` precision.

## Usage

```python
from huggingface_hub import hf_hub_download
from sae_model import TopKSAE

ckpt_path = hf_hub_download(
    repo_id="SKwra/toolcalling-sae",
    filename="gemma-3-1b-it/stage2/gemma-3-1b-it-L17-d9216-5M-stage2.pt"
)
sae = TopKSAE.load(ckpt_path, device="cuda")
```

`sae_model.py` is included in this repo. Full code at [GitHub](https://github.com/SKURA502/agent-sae).
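
## TopK sketch

For readers unfamiliar with the TopK activation that the `k` column refers to, the following is a minimal NumPy sketch of a generic TopK sparse autoencoder forward pass. It is an illustration only, not the `TopKSAE` class shipped in `sae_model.py`. The function name `topk_sae_forward`, the parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`), the subtraction of the decoder bias before encoding, and the ReLU clipping are all assumptions that follow common practice and may differ from this repo's implementation. The toy sizes are far smaller than the real checkpoints.

```python
import numpy as np


def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Hypothetical TopK SAE forward pass (not the repo's TopKSAE).

    x:     (batch, d_model) activations
    W_enc: (d_model, d_dict), W_dec: (d_dict, d_model)
    Returns sparse codes (batch, d_dict) and reconstruction (batch, d_model).
    """
    # Pre-activation of every dictionary feature (pre-bias is an assumption).
    pre = (x - b_dec) @ W_enc + b_enc
    # Indices of the k largest pre-activations in each row.
    top_idx = np.argpartition(pre, -k, axis=-1)[:, -k:]
    top_vals = np.take_along_axis(pre, top_idx, axis=-1)
    # Keep only those k values (clipped at zero); all other features are zero.
    codes = np.zeros_like(pre)
    np.put_along_axis(codes, top_idx, np.maximum(top_vals, 0.0), axis=-1)
    # Linear decoder maps the sparse code back to activation space.
    recon = codes @ W_dec + b_dec
    return codes, recon


# Toy example with random weights.
rng = np.random.default_rng(0)
d_model, d_dict, k = 16, 64, 4
x = rng.standard_normal((3, d_model))
W_enc = rng.standard_normal((d_model, d_dict)) * 0.1
W_dec = rng.standard_normal((d_dict, d_model)) * 0.1
b_enc = np.zeros(d_dict)
b_dec = np.zeros(d_model)

codes, recon = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
print(codes.shape, recon.shape)
print((codes != 0).sum(axis=-1))  # at most k active features per row
```

With `k = 128` and the dictionary sizes in the table above, at most 128 features are active per token, for example 128 of 9 216 for gemma-3-1b-it.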