CreditScope Circuit Tracing Models
Sparse Autoencoders (SAEs) and MoE Transcoders trained on Qwen3.5-35B-A3B-FP8 for mechanistic interpretability and circuit tracing in credit-domain safety analysis.
Models (Updated 2026-04-09)
SAEs (Sparse Autoencoders)
- Architecture: JumpReLU, d_model=2048 -> 4096 features (2x expansion)
- Layers: 0, 10, 30, 39
- Training data: Cyber + financial domain activations (289K tokens)
- Files:
checkpoints/sae_l{N}.pt
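The architecture above (JumpReLU activation, d_model=2048 projected to 4096 features) can be sketched as a small PyTorch module. The class name and internal wiring here are illustrative assumptions, not the checkpoint's actual module layout:

```python
import torch
import torch.nn as nn

class JumpReLUSAE(nn.Module):
    """Minimal JumpReLU SAE sketch: d_model=2048 -> 4096 features, per the specs above."""

    def __init__(self, d_model=2048, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        # Learned per-feature thresholds; JumpReLU zeroes any pre-activation
        # below its threshold. (Training the thresholds needs a straight-through
        # estimator, omitted in this sketch.)
        self.threshold = nn.Parameter(torch.zeros(n_features))

    def encode(self, x):
        pre = self.encoder(x)
        return pre * (pre > self.threshold)  # keep only features above threshold

    def decode(self, f):
        return self.decoder(f)

    def forward(self, x):
        f = self.encode(x)
        return self.decode(f), f
```

The feature vector `f` is sparse by construction, which is what makes individual features usable as circuit-tracing nodes.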
Transcoders (MoE Transcoders)
- Architecture: ReLU encoder/decoder, d_model=2048 -> 4096 features
- Layers: 0, 10, 30, 39
- Files:
checkpoints/tc_l{N}.pt
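A transcoder differs from an SAE in what it reconstructs: instead of autoencoding an activation, it approximates a layer's input-to-output map through a sparse bottleneck. A minimal ReLU sketch under the dimensions listed above (the per-expert MoE routing is omitted, and the module layout is an assumption):

```python
import torch
import torch.nn as nn

class ReLUTranscoder(nn.Module):
    """Sketch of a ReLU transcoder: maps a layer's input to its output
    through a sparse feature bottleneck (d_model=2048 -> 4096 features)."""

    def __init__(self, d_model=2048, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))  # sparse nonnegative feature activations
        return self.decoder(f), f        # predicted layer output + features
```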
Metadata
- checkpoints/feature_names.json - decoded feature labels
- checkpoints/safety_threshold.json - tiered safety scoring config
- checkpoints/architecture_map.json - model architecture definitions
- checkpoints/chat_context_features.json - context feature data
- checkpoints/safety_test_prompts.json - evaluation prompts
Usage
import torch

# Load an SAE checkpoint. weights_only=False unpickles arbitrary Python
# objects, so only use it on checkpoints you trust.
ckpt = torch.load("checkpoints/sae_l0.pt", map_location="cpu", weights_only=False)
state_dict = ckpt.get("state_dict", ckpt)  # weights may be nested under "state_dict"
# encoder.weight shape: [4096, 2048]
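Once a state dict is loaded as above, the encoder can be applied to residual-stream activations directly. A sketch assuming key names "encoder.weight" ([4096, 2048]), "encoder.bias", and an optional per-feature "threshold"; rename these to match the actual checkpoint:

```python
import torch

def sae_encode(acts, state_dict):
    """Project activations [..., 2048] into SAE feature space [..., 4096].

    Key names ("encoder.weight", "encoder.bias", "threshold") are assumptions
    about the checkpoint layout; adjust them to the real state dict.
    """
    pre = acts @ state_dict["encoder.weight"].T + state_dict["encoder.bias"]
    thr = state_dict.get("threshold", torch.zeros(pre.shape[-1]))
    return pre * (pre > thr)  # JumpReLU: zero out features below threshold
```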
Training Details
- Base model: Qwen/Qwen3.5-35B-A3B-FP8 served via SGLang
- Feature count: 4096 per layer (2x expansion from d_model=2048)
- Activation collection: Direct forward hooks on diverse prompts
- SAE optimizer: Adam, lr=3e-4, cosine annealing
- TC optimizer: Adam, lr=1e-3, cosine annealing
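The optimizer setup above (Adam with cosine annealing) can be sketched as a generic training loop. The batch size, step count, and the sparsity penalty (an L1 proxy with an assumed coefficient) are illustrative guesses, not the actual training recipe:

```python
import torch

def train_sae(model, acts, steps=1000, lr=3e-4, sparsity_coef=1e-3):
    """Sketch of the SAE training setup: Adam + cosine annealing.

    `model` is any module returning (reconstruction, features); the loss
    form and hyperparameters other than lr are assumptions.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
    for _ in range(steps):
        batch = acts[torch.randint(0, acts.shape[0], (256,))]
        recon, feats = model(batch)
        # Reconstruction error plus an L1 sparsity proxy on the features
        loss = (recon - batch).pow(2).mean() + sparsity_coef * feats.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return model
```

The transcoder variant would use the same loop with lr=1e-3 and the layer's true outputs as the reconstruction target.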