CreditScope Circuit Tracing Models

Sparse Autoencoders (SAEs) and MoE Transcoders trained on Qwen3.5-35B-A3B-FP8 for mechanistic interpretability and circuit tracing in credit-domain safety analysis.

Models (Updated 2026-04-09)

SAEs (Sparse Autoencoders)

  • Architecture: JumpReLU, d_model=2048 -> 4096 features (2x expansion)
  • Layers: 0, 10, 30, 39
  • Training data: Cyber + financial domain activations (289K tokens)
  • Files: checkpoints/sae_l{N}.pt
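The JumpReLU activation named above gates each feature on a learned per-feature threshold rather than at zero. A minimal sketch of the gating (the function name and tensor shapes are illustrative, not taken from the checkpoints):

```python
import torch

def jumprelu(pre_acts: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    # Pass a pre-activation through only where it exceeds its feature's
    # learned threshold; everything at or below the threshold is zeroed.
    return pre_acts * (pre_acts > threshold)
```

Unlike a ReLU shifted by a bias, the threshold only gates the value; it is not subtracted from activations that pass through.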

Transcoders (MoE Transcoders)

  • Architecture: ReLU encoder/decoder, d_model=2048 -> 4096 features
  • Layers: 0, 10, 30, 39
  • Files: checkpoints/tc_l{N}.pt
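The transcoders are a plain ReLU encoder/decoder pair at the same 2048 -> 4096 shape. A sketch of one forward pass, with the weight layout (rows = output features) assumed rather than read from the checkpoints:

```python
import torch

def transcoder_forward(W_enc, b_enc, W_dec, b_dec, acts):
    # Encode activations into sparse ReLU features, then decode back to
    # the residual-stream dimension. Shapes (assumed):
    #   W_enc: [4096, 2048], W_dec: [2048, 4096], acts: [batch, 2048]
    feats = torch.relu(acts @ W_enc.T + b_enc)
    return feats @ W_dec.T + b_dec
```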

Metadata

  • checkpoints/feature_names.json - decoded feature labels
  • checkpoints/safety_threshold.json - tiered safety scoring config
  • checkpoints/architecture_map.json - model architecture definitions
  • checkpoints/chat_context_features.json - context feature data
  • checkpoints/safety_test_prompts.json - evaluation prompts
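The metadata files are plain JSON and load with the standard library. A small helper, assuming (not verified here) that feature_names.json maps feature indices to label strings:

```python
import json

def load_feature_names(path: str) -> dict:
    # Load decoded feature labels; the index -> label mapping is an
    # assumption about the file format, not a documented contract.
    with open(path) as f:
        return json.load(f)
```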

Usage

import torch

# Load an SAE checkpoint (layer 0). weights_only=False permits arbitrary
# pickled objects, so only load checkpoints from trusted sources.
ckpt = torch.load("checkpoints/sae_l0.pt", map_location="cpu", weights_only=False)
# Checkpoints may nest weights under a "state_dict" key; fall back to the top level.
state_dict = ckpt.get("state_dict", ckpt)
# encoder.weight shape: [4096, 2048]
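With the state dict in hand, residual-stream activations can be encoded into features. A sketch assuming the key names "encoder.weight", "encoder.bias", and "threshold" (these are guesses at the checkpoint layout, not documented):

```python
import torch

def sae_encode(state_dict, acts: torch.Tensor) -> torch.Tensor:
    # Project activations [batch, 2048] into the 4096-feature space and
    # apply JumpReLU gating. All key names below are assumptions about
    # the checkpoint layout.
    W_enc = state_dict["encoder.weight"]  # [4096, 2048]
    b_enc = state_dict.get("encoder.bias", torch.zeros(W_enc.shape[0]))
    pre = acts @ W_enc.T + b_enc
    theta = state_dict.get("threshold", torch.zeros(W_enc.shape[0]))
    return pre * (pre > theta)
```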

Training Details

  • Base model: Qwen/Qwen3.5-35B-A3B-FP8 served via SGLang
  • Feature count: 4096 per layer (2x expansion from d_model=2048)
  • Activation collection: Direct forward hooks on diverse prompts
  • SAE optimizer: Adam, lr=3e-4, cosine annealing
  • TC optimizer: Adam, lr=1e-3, cosine annealing
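The optimizer settings above can be reproduced with stock PyTorch. The module and step count below are placeholders; only the optimizer, learning rates, and schedule type come from this section:

```python
import torch

model = torch.nn.Linear(2048, 4096)  # stand-in for an SAE encoder
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # 1e-3 for transcoders
# T_max (total annealing steps) is a placeholder, not from the training run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
```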