CreditScope Circuit Tracing Models

Sparse Autoencoders (SAEs) and MoE Transcoders trained on Qwen3.5-35B-A3B-FP8 for mechanistic interpretability and circuit tracing in credit-domain safety analysis.

Models (Updated 2026-04-09)

SAEs (Sparse Autoencoders)

  • Architecture: JumpReLU, d_model=2048 -> 4096 features (2x expansion)
  • Layers: 0, 10, 30, 39
  • Training data: Cyber + financial domain activations (289K tokens)
  • Files: checkpoints/sae_l{N}.pt
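The JumpReLU activation named above gates each feature on a learned per-feature threshold rather than at zero. A minimal sketch of the gating (the function name and tensor shapes are illustrative, not taken from the checkpoints):

```python
import torch

def jumprelu(pre_acts: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    # Pass a pre-activation through only where it exceeds its feature's
    # learned threshold; everything at or below the threshold is zeroed.
    return pre_acts * (pre_acts > threshold)
```

Unlike a ReLU shifted by a bias, the threshold only gates the value; it is not subtracted from activations that pass through.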

Transcoders (MoE Transcoders)

  • Architecture: ReLU encoder/decoder, d_model=2048 -> 4096 features
  • Layers: 0, 10, 30, 39
  • Files: checkpoints/tc_l{N}.pt
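The transcoders are a plain ReLU encoder/decoder pair at the same 2048 -> 4096 shape. A sketch of one forward pass, with the weight layout (rows = output features) assumed rather than read from the checkpoints:

```python
import torch

def transcoder_forward(W_enc, b_enc, W_dec, b_dec, acts):
    # Encode activations into sparse ReLU features, then decode back to
    # the residual-stream dimension. Shapes (assumed):
    #   W_enc: [4096, 2048], W_dec: [2048, 4096], acts: [batch, 2048]
    feats = torch.relu(acts @ W_enc.T + b_enc)
    return feats @ W_dec.T + b_dec
```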

Metadata

  • checkpoints/feature_names.json - decoded feature labels
  • checkpoints/safety_threshold.json - tiered safety scoring config
  • checkpoints/architecture_map.json - model architecture definitions
  • checkpoints/chat_context_features.json - context feature data
  • checkpoints/safety_test_prompts.json - evaluation prompts
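The metadata files are plain JSON and load with the standard library. A small helper, assuming (not verified here) that feature_names.json maps feature indices to label strings:

```python
import json

def load_feature_names(path: str) -> dict:
    # Load decoded feature labels; the index -> label mapping is an
    # assumption about the file format, not a documented contract.
    with open(path) as f:
        return json.load(f)
```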

Usage

import torch

# Load an SAE checkpoint (layer 0). weights_only=False permits arbitrary
# pickled objects, so only load checkpoints from trusted sources.
ckpt = torch.load("checkpoints/sae_l0.pt", map_location="cpu", weights_only=False)
# Checkpoints may nest weights under a "state_dict" key; fall back to the top level.
state_dict = ckpt.get("state_dict", ckpt)
# encoder.weight shape: [4096, 2048]
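With the state dict in hand, residual-stream activations can be encoded into features. A sketch assuming the key names "encoder.weight", "encoder.bias", and "threshold" (these are guesses at the checkpoint layout, not documented):

```python
import torch

def sae_encode(state_dict, acts: torch.Tensor) -> torch.Tensor:
    # Project activations [batch, 2048] into the 4096-feature space and
    # apply JumpReLU gating. All key names below are assumptions about
    # the checkpoint layout.
    W_enc = state_dict["encoder.weight"]  # [4096, 2048]
    b_enc = state_dict.get("encoder.bias", torch.zeros(W_enc.shape[0]))
    pre = acts @ W_enc.T + b_enc
    theta = state_dict.get("threshold", torch.zeros(W_enc.shape[0]))
    return pre * (pre > theta)
```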

Training Details

  • Base model: Qwen/Qwen3.5-35B-A3B-FP8 served via SGLang
  • Feature count: 4096 per layer (2x expansion from d_model=2048)
  • Activation collection: Direct forward hooks on diverse prompts
  • SAE optimizer: Adam, lr=3e-4, cosine annealing
  • TC optimizer: Adam, lr=1e-3, cosine annealing
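The optimizer settings above can be reproduced with stock PyTorch. The module and step count below are placeholders; only the optimizer, learning rates, and schedule type come from this section:

```python
import torch

model = torch.nn.Linear(2048, 4096)  # stand-in for an SAE encoder
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # 1e-3 for transcoders
# T_max (total annealing steps) is a placeholder, not from the training run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
```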