Ex0bit committed on
Commit f2ae1f5 · verified · 1 parent: b87ded2

initial qwen-scope-live deploy

Dockerfile ADDED
@@ -0,0 +1,40 @@
+ # Qwen-Scope Live SAE Feature Steering — HF Space Docker SDK image.
+ # Free tier (CPU, ~16GB RAM) — locked to Qwen3-1.7B-Base only.
+ FROM python:3.11-slim
+
+ # Avoid interactive prompts during installs
+ ENV DEBIAN_FRONTEND=noninteractive \
+     PYTHONUNBUFFERED=1 \
+     PYTHONDONTWRITEBYTECODE=1 \
+     PIP_NO_CACHE_DIR=1 \
+     HF_HUB_DISABLE_TELEMETRY=1 \
+     TRANSFORMERS_NO_ADVISORY_WARNINGS=1
+
+ # HF Spaces convention: /home/user is writable, expects PORT=7860
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH \
+     HF_HOME=/home/user/.cache/huggingface \
+     PORT=7860
+
+ # Add a non-root user (HF Spaces best practice)
+ RUN useradd -m -u 1000 user && \
+     apt-get update && apt-get install -y --no-install-recommends \
+     build-essential ca-certificates && \
+     rm -rf /var/lib/apt/lists/*
+
+ USER user
+ WORKDIR $HOME/app
+
+ # Install Python deps (CPU-only torch from PyTorch index)
+ COPY --chown=user:user requirements.txt .
+ RUN pip install --user --no-cache-dir -r requirements.txt
+
+ # Copy application files
+ COPY --chown=user:user qwen_scope_steer.py qwen_scope_obs.py server.py index.html ./
+
+ # Pre-create cache dir (writable for the cached SAE positions JSON)
+ RUN mkdir -p $HOME/app/feature_positions
+
+ EXPOSE 7860
+
+ CMD ["python", "server.py"]
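
The "CPU-only torch from PyTorch index" comment implies `requirements.txt` points pip at the CPU wheel index; the actual pins are not part of this commit, so the following is only a plausible sketch of that pattern (package names and the absence of version pins are assumptions):

```text
--extra-index-url https://download.pytorch.org/whl/cpu
torch
transformers
safetensors
huggingface_hub
```

pip honors `--extra-index-url` inside a requirements file, which keeps the Dockerfile's single `pip install -r requirements.txt` line unchanged while pulling the smaller CPU-only torch wheel.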
README.md CHANGED
@@ -1,10 +1,57 @@
  ---
- title: Qwen Scope Live
- emoji:
+ title: "Qwen-Scope: Decoding Intelligence, Unleashing Potential"
+ emoji: 🔬
  colorFrom: blue
- colorTo: gray
+ colorTo: pink
  sdk: docker
+ app_port: 7860
  pinned: false
+ license: apache-2.0
+ short_description: Live SAE feature steering for Qwen3-1.7B
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Qwen-Scope: Decoding Intelligence, Unleashing Potential
+
+ Live, interactive demo of the [Qwen-Scope](https://huggingface.co/Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50) sparse-autoencoder release. Type a prompt, see which SAE features fire, drag a slider to steer them, and watch the model's actual generated text change in real time — all running in your browser against `Qwen/Qwen3-1.7B-Base` + the layer-14 (and layer-selectable) Qwen-Scope SAE.
+
+ This Space implements **three of the four pillars** from the Qwen-Scope paper:
+
+ - **Steering** — interpretable feature-level control at inference time (per-token heatmap, position-selective steering, output-only steering, per-token probability panel with click-to-pin top-K candidates)
+ - **Evaluation** — capability-aware corpus analysis (paste a prompt set, encode each, see which features fire across the corpus; compare two prompt sets for distinguishing features)
+ - **Data-centric** — feature-guided curation (filter docs by feature signature) and steered synthesis (bulk-generate steered completions from seed prompts)
+
+ (The fourth pillar — post-training pathology diagnosis — is not yet implemented.)
+
+ ## How to use
+
+ 1. **Wait ~3-5 minutes** on first load — the container has to download the model (~3.4GB) and SAE (~537MB) on cold start.
+ 2. **Steering tab**: type a prompt, click *Encode*, click *steer* on any top feature, drag the slider, click *Generate*. The output panels show baseline vs. steered side-by-side as colored token chips — click any chip to see top-K next-token candidates. A per-token feature heatmap appears in the right column showing which features fire on which tokens.
+ 3. **Evaluation tab**: paste many prompts (one per line), encode the corpus, see per-sample top features, a signature heatmap, and a Compare panel for finding features that distinguish set A from set B.
+ 4. **Data-centric tab**: paste a corpus, filter it by feature signature (include/exclude docs where feature X fires), and run bulk steered synthesis to produce N targeted training examples.
+ 5. **Layer selector** in the header: change the SAE layer (0–27) to see how features differ across model depth. The first swap to a layer takes ~20s; subsequent swaps are <0.1s thanks to an in-memory LRU cache.
+
+ ## Hardware constraint
+
+ This Space is **locked to Qwen3-1.7B-Base** because that's what fits in a free-tier HF Space (CPU, 16GB RAM). The full local version supports the entire Qwen-Scope catalog including Qwen3-8B, Qwen3.5-9B/27B, Qwen3.6-27B/35B-A3B, and Qwen3-30B-A3B — each with its matching SAE — but those need GPU hardware and persistent storage. See the source repository for instructions.
+
+ ## What's verified
+
+ This deployment is the same code that was [verified end-to-end on macOS / Apple Silicon MPS](https://huggingface.co/Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50) — TopK SAE encode, residual hook, additive steering, per-token probability extraction, position-selective hooks across prefill + decode steps. The math is identical; only the device (CPU instead of MPS) and dtype (fp32 instead of bf16) differ in the deployment build.
+
+ ## License
+
+ Apache 2.0 (this code). The Qwen-Scope SAE weights and Qwen3 model weights are governed by their respective Qwen licenses.
+
+ ## Citation
+
+ If you use this in your work, cite the Qwen-Scope paper:
+
+ ```bibtex
+ @misc{qwen_scope,
+   title = {{Qwen-Scope}: Turning Sparse Features into Development Tools for Large Language Models},
+   url = {https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf},
+   author = {{Qwen Team}},
+   month = {April},
+   year = {2026}
+ }
+ ```
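
The steering rule the README describes — adding `α · W_dec[:, feat]` to the residual stream during generation — can be sketched as a PyTorch forward hook. This is a minimal illustration, not the Space's actual implementation in `qwen_scope_steer.py` (the helper name `make_steering_hook` and the toy `Identity` layer are made up for the example):

```python
import torch

def make_steering_hook(w_dec_col: torch.Tensor, alpha: float):
    """Forward hook that adds alpha * (an SAE decoder direction) to a layer's output."""
    def hook(module, inputs, output):
        # Transformer decoder layers often return a tuple; hidden states come first.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * w_dec_col.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Toy check on a stand-in "layer": an Identity module with d_model = 4.
layer = torch.nn.Identity()
direction = torch.tensor([1.0, 0.0, 0.0, 0.0])   # stand-in for W_dec[:, feat]
handle = layer.register_forward_hook(make_steering_hook(direction, alpha=2.0))
out = layer(torch.zeros(1, 3, 4))                # (batch, seq, d_model)
handle.remove()                                  # always detach hooks when done
```

Because the hook returns a value, PyTorch substitutes it for the layer's output, so the same hook applies during both prefill and decode steps of `model.generate` without touching model code.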
__pycache__/qwen_scope_steer.cpython-314.pyc ADDED
Binary file (19.9 kB).
 
__pycache__/server.cpython-314.pyc ADDED
Binary file (46.5 kB).
 
index.html ADDED
@@ -0,0 +1,1564 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
+ <title>Qwen-Scope · Live SAE Feature Steering</title>
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js"></script>
+ <style>
+ :root {
+ --bg: #08080d;
+ --bg-panel: rgba(18, 18, 26, 0.78);
+ --bg-panel-strong: rgba(24, 24, 34, 0.92);
+ --border: rgba(255,255,255,0.08);
+ --border-strong: rgba(255,255,255,0.16);
+ --fg: #ececf1;
+ --fg-dim: #9b9ba8;
+ --fg-faint: #5e5e6e;
+ --accent: #7df9ff;
+ --accent-2: #ff7df9;
+ --warn: #ffb84d;
+ --good: #74e2a3;
+ --bad: #ff7d7d;
+ --mono: ui-monospace, "SF Mono", "JetBrains Mono", Menlo, monospace;
+ --sans: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
+ }
+ * { box-sizing: border-box; }
+ html, body { margin:0; padding:0; height:100%; background:var(--bg); color:var(--fg); font-family:var(--sans); overflow:hidden; }
+ #scene { position:fixed; inset:0; z-index:0; }
+ #ui { position:fixed; inset:0; z-index:10; pointer-events:none; display:grid;
+ grid-template-columns: 380px 1fr 420px;
+ grid-template-rows: 1fr auto;
+ gap:14px; padding:14px; }
+ #ui > section { pointer-events:auto; }
+
+ .panel { background:var(--bg-panel); border:1px solid var(--border);
+ border-radius:14px; padding:14px; backdrop-filter: blur(14px) saturate(140%);
+ -webkit-backdrop-filter: blur(14px) saturate(140%); overflow:hidden;
+ transition: padding .15s ease, max-height .25s ease; }
+ .panel h2 { margin:0 0 10px; font-size:11px; font-weight:600;
+ text-transform:uppercase; letter-spacing:0.14em; color:var(--fg-dim);
+ display:flex; justify-content:space-between; align-items:center; gap:8px; }
+ .panel h3 { margin:0 0 6px; font-size:13px; font-weight:600; color:var(--fg); }
+
+ /* Minimize button on each panel header */
+ .min-btn { background:transparent; border:1px solid var(--border-strong);
+ color:var(--fg-dim); border-radius:5px; padding:1px 8px; font-size:14px;
+ font-family:var(--mono); cursor:pointer; line-height:1; min-width:0;
+ font-weight:400; letter-spacing:0; }
+ .min-btn:hover { background:var(--bg-panel-strong); color:var(--fg);
+ border-color:var(--accent); transform:none; box-shadow:none; }
+ .panel.collapsed { padding:8px 14px; }
+ .panel.collapsed > h2 { margin-bottom:0; }
+ .panel.collapsed > :not(h2) { display:none; }
+
+ /* Header strip */
+ #header { position:fixed; top:14px; left:50%; transform:translateX(-50%);
+ z-index:20; pointer-events:auto;
+ background:var(--bg-panel-strong); border:1px solid var(--border-strong);
+ border-radius:999px; padding:8px 18px; display:flex; gap:14px;
+ align-items:center; font-family:var(--mono); font-size:12px; color:var(--fg-dim);
+ backdrop-filter: blur(20px); }
+ #header .dot { width:8px; height:8px; border-radius:50%; background:var(--bad); transition:background .3s; }
+ #header.live .dot { background:var(--good); box-shadow:0 0 12px var(--good); }
+ #header.loading .dot { background:var(--warn); box-shadow:0 0 12px var(--warn); animation:pulse 1.5s ease-in-out infinite; }
+ @keyframes pulse { 0%,100% { opacity:1; } 50% { opacity:0.4; } }
+ #header b { color:var(--fg); font-weight:600; }
+ #model-select {
+ background:transparent; color:var(--fg); border:1px solid var(--border-strong);
+ border-radius:6px; padding:3px 8px; font-family:var(--mono); font-size:11px;
+ outline:none; cursor:pointer; max-width:280px;
+ }
+ #model-select:focus { border-color:var(--accent); }
+ #model-select option { background:#16161e; color:var(--fg); }
+
+ /* Tab strip */
+ #tabs { position:fixed; top:62px; left:50%; transform:translateX(-50%); z-index:18;
+ display:flex; gap:6px; pointer-events:auto; background:var(--bg-panel-strong);
+ border:1px solid var(--border-strong); border-radius:999px;
+ padding:4px; backdrop-filter: blur(20px); }
+ .tab { background:transparent; border:none; color:var(--fg-dim);
+ padding:6px 16px; font-size:11px; font-weight:600;
+ text-transform:uppercase; letter-spacing:0.08em; border-radius:999px;
+ cursor:pointer; transition: all 0.15s ease; }
+ .tab:hover { color:var(--fg); background:rgba(255,255,255,0.04); }
+ .tab.active { background: linear-gradient(135deg, rgba(125,249,255,0.18), rgba(255,125,249,0.12));
+ color:var(--fg); border:1px solid var(--border-strong); }
+
+ /* Tab visibility — hide panels not matching the current tab */
+ body[data-tab="steering"] [data-show-tab]:not([data-show-tab~="steering"]) { display:none !important; }
+ body[data-tab="evaluation"] [data-show-tab]:not([data-show-tab~="evaluation"]) { display:none !important; }
+ body[data-tab="datacentric"] [data-show-tab]:not([data-show-tab~="datacentric"]) { display:none !important; }
+
+ /* Loading overlay shown during model swap */
+ #loading-overlay {
+ position:fixed; inset:0; z-index:50; display:none;
+ background:rgba(8,8,13,0.78); backdrop-filter: blur(6px);
+ align-items:center; justify-content:center;
+ }
+ #loading-overlay.visible { display:flex; }
+ #loading-overlay .lo-card {
+ background:var(--bg-panel-strong); border:1px solid var(--border-strong);
+ border-radius:14px; padding:24px 32px; text-align:center; min-width:300px;
+ }
+
+ /* Left: prompt + controls (top-left of screen, below the header) */
+ #left { display:flex; flex-direction:column; gap:14px; min-height:0;
+ grid-column: 1; grid-row: 1; align-self: start;
+ padding-top: 110px; }
+ textarea, input[type="text"], input[type="number"] {
+ width:100%; background:rgba(0,0,0,0.35); border:1px solid var(--border-strong);
+ color:var(--fg); border-radius:8px; padding:10px 12px; font-family:var(--mono);
+ font-size:13px; outline:none; resize:vertical;
+ }
+ textarea:focus, input:focus { border-color:var(--accent); box-shadow:0 0 0 3px rgba(125,249,255,0.12); }
+ textarea { min-height:72px; }
+ .row { display:flex; gap:8px; align-items:center; }
+ .row > * { flex:1; }
+ .row > .grow0 { flex:0; }
+
+ button { background: linear-gradient(135deg, rgba(125,249,255,0.18), rgba(255,125,249,0.12));
+ border:1px solid var(--border-strong); color:var(--fg); border-radius:8px;
+ padding:9px 14px; font-family:var(--sans); font-size:13px; font-weight:600;
+ cursor:pointer; transition:all .15s; letter-spacing:0.02em; }
+ button:hover:not(:disabled) { border-color:var(--accent); transform:translateY(-1px); box-shadow:0 4px 14px rgba(125,249,255,0.18); }
+ button:disabled { opacity:0.4; cursor:not-allowed; }
+ button.primary { background: linear-gradient(135deg, #7df9ff, #ff7df9); color:#0a0a14; border-color:transparent; }
+ button.primary:hover:not(:disabled) { box-shadow:0 4px 18px rgba(255,125,249,0.32); }
+ button.ghost { background:transparent; }
+
+ /* Middle: nothing (3D scene shows through) */
+ #middle { grid-column: 2; grid-row: 1; pointer-events:none; }
+
+ /* Bottom: output band spans all columns; sits below cloud area */
+ #bottom { grid-column: 1 / -1; grid-row: 2; pointer-events:auto;
+ display:grid; grid-template-columns: minmax(0, 1fr) minmax(0, 1fr);
+ gap:14px;
+ max-height: 38vh; min-height: 0;
+ margin: 0 0 0 0; }
+ #bottom .panel { display:flex; flex-direction:column;
+ min-width: 0; min-height: 0; overflow:hidden; }
+ #bottom .panel.collapsed { padding:8px 14px; }
+ #bottom .panel.collapsed > .panel-body { display:none; }
+ .panel-body { flex: 1 1 auto; overflow:auto; min-height:0; min-width:0; }
+ #right .panel { min-width:0; max-width:100%; }
+ #right .panel > div { max-width:100%; overflow:auto; }
+
+ /* Right: top features panel only — output moved to bottom band */
+ #right { display:flex; flex-direction:column; gap:14px; min-height:0;
+ grid-column: 3; grid-row: 1; max-height: calc(100vh - 28px); padding-top:110px; }
+
+ #features { flex: 1 1 auto; min-height: 120px; overflow-y:auto; }
+ #features::-webkit-scrollbar { width:6px; }
+ #features::-webkit-scrollbar-thumb { background:var(--border-strong); border-radius:3px; }
+
+ .feat { padding:10px; border:1px solid var(--border); border-radius:10px;
+ margin-bottom:8px; background:rgba(0,0,0,0.25); transition: border-color .2s, background .2s; }
+ .feat:hover { border-color:var(--accent); }
+ .feat.steered { border-color:var(--accent-2); background:rgba(255,125,249,0.06); }
+ .feat-head { display:flex; justify-content:space-between; align-items:center; margin-bottom:6px; }
+ .feat-id { font-family:var(--mono); font-size:12px; color:var(--fg); }
+ .feat-act { font-family:var(--mono); font-size:11px; color:var(--good); }
+ .feat-act.steered { color:var(--accent-2); }
+ .slider-row { display:flex; align-items:center; gap:8px; margin-top:4px; }
+ .slider-row input[type=range] { flex:1; -webkit-appearance:none; height:4px; background:var(--border-strong); border-radius:2px; outline:none; }
+ .slider-row input[type=range]::-webkit-slider-thumb {
+ -webkit-appearance:none; width:14px; height:14px; border-radius:50%;
+ background: linear-gradient(135deg, #7df9ff, #ff7df9); cursor:pointer;
+ box-shadow:0 0 8px rgba(125,249,255,0.5); border:none;
+ }
+ .slider-row input[type=range]::-moz-range-thumb {
+ width:14px; height:14px; border-radius:50%;
+ background: linear-gradient(135deg, #7df9ff, #ff7df9); cursor:pointer; border:none;
+ }
+ .slider-row .alpha-val { font-family:var(--mono); font-size:11px; color:var(--fg-dim); width:48px; text-align:right; }
+ .feat-tools { display:flex; gap:6px; }
+ .feat-tools button { padding:3px 8px; font-size:10px; font-weight:500; }
+
+ /* Output area */
+ #output-wrap { flex:1; overflow-y:auto; }
+ #output-wrap::-webkit-scrollbar { width:6px; }
+ #output-wrap::-webkit-scrollbar-thumb { background:var(--border-strong); border-radius:3px; }
+ .out-block { font-family:var(--mono); font-size:12px; line-height:1.55;
+ background:rgba(0,0,0,0.4); border:1px solid var(--border);
+ border-radius:8px; padding:10px; margin-bottom:8px; white-space:pre-wrap;
+ word-break: break-word; color:var(--fg); min-height:36px; }
+ .out-block.baseline { border-color:rgba(125,249,255,0.3); }
+ .out-block.steered { border-color:rgba(255,125,249,0.3); }
+ .out-label { display:flex; justify-content:space-between; align-items:center; margin-bottom:4px;
+ font-size:10px; text-transform:uppercase; letter-spacing:0.12em; color:var(--fg-dim); }
+ .verifier { font-family:var(--mono); font-size:10px; color:var(--fg-dim); margin-top:6px; padding:6px 8px;
+ border-left:2px solid var(--border-strong); }
+ .verifier .delta-up { color:var(--good); }
+ .verifier .delta-down { color:var(--accent-2); }
+
+ /* Hover tooltip for selected feature on 3D cloud */
+ #tooltip { position:fixed; z-index:25; pointer-events:none;
+ background:var(--bg-panel-strong); border:1px solid var(--border-strong);
+ border-radius:8px; padding:8px 12px; font-family:var(--mono); font-size:11px;
+ line-height:1.45;
+ color:var(--fg); display:none; transform: translate(-50%, calc(-100% - 12px));
+ min-width:180px; max-width:280px; backdrop-filter: blur(20px); }
+
+ /* Loader */
+ .loader { display:inline-block; width:12px; height:12px; border:2px solid var(--border-strong);
+ border-top-color:var(--accent); border-radius:50%; animation:spin 0.8s linear infinite; }
+ @keyframes spin { to { transform: rotate(360deg); } }
+
+ .empty { color:var(--fg-faint); font-size:12px; text-align:center; padding:24px 8px; }
+
+ .hint { font-size:11px; color:var(--fg-faint); margin-top:6px; line-height:1.4; }
+
+ .footer { position:fixed; bottom:8px; left:14px; z-index:20; font-family:var(--mono);
+ font-size:10px; color:var(--fg-faint); letter-spacing:0.04em; }
+
+ .legend { font-family:var(--mono); font-size:10px; color:var(--fg-faint); display:flex; gap:12px; margin-top:6px; }
+ .legend span::before { content:''; display:inline-block; width:8px; height:8px; border-radius:50%; margin-right:5px; vertical-align:middle; }
+ .legend .top::before { background:var(--accent); box-shadow:0 0 6px var(--accent); }
+ .legend .pick::before { background:var(--accent-2); box-shadow:0 0 6px var(--accent-2); }
+ .legend .dim::before { background:var(--fg-faint); }
+
+ /* small-screen graceful fallback */
+ @media (max-width: 1100px) {
+ #ui { grid-template-columns: 1fr; grid-template-rows: auto auto; }
+ #left { grid-column:1; grid-row:1; max-height:none; }
+ #right { grid-column:1; grid-row:2; padding-top:14px; max-height:none; }
+ #middle { display:none; }
+ }
+ </style>
+ </head>
+ <body>
+ <canvas id="scene"></canvas>
+
+ <div id="header">
+ <span class="dot"></span>
+ <span>QWEN-SCOPE LIVE</span>
+ <span style="opacity:.4">·</span>
+ <select id="model-select" title="Switch the loaded model + SAE pair">
+ <option>connecting…</option>
+ </select>
+ <span style="opacity:.4">·</span>
+ <span id="hdr-layer-wrap" title="Click to change SAE layer">
+ layer <input type="number" id="layer-input" value="?" min="0" max="0"
+ style="width:48px; padding:1px 4px; background:transparent; color:var(--accent);
+ border:1px solid var(--border-strong); border-radius:4px;
+ font-family:var(--mono); font-size:12px; text-align:center;" />
+ <span id="hdr-layer-meta">· mps · bfloat16</span>
+ </span>
+ <span style="opacity:.4">·</span>
+ <span id="hdr-features">— features</span>
+ </div>
+
+ <div id="tabs">
+ <button class="tab active" data-tab="steering">Steering</button>
+ <button class="tab" data-tab="evaluation">Evaluation</button>
+ <button class="tab" data-tab="datacentric">Data-centric</button>
+ </div>
+
+ <div id="loading-overlay">
+ <div class="lo-card">
+ <div class="loader" style="width:24px;height:24px;border-width:3px;"></div>
+ <div id="lo-msg" style="margin-top:14px; font-size:13px; color:var(--fg);">Loading…</div>
+ <div id="lo-detail" style="margin-top:6px; font-size:11px; color:var(--fg-faint); font-family:var(--mono);"></div>
+ </div>
+ </div>
+
+ <div id="ui">
+ <section id="left">
+ <div class="panel" data-pid="prompt" data-show-tab="steering">
+ <h2>1 · Prompt <button class="min-btn" data-min="prompt" title="Minimize panel">−</button></h2>
+ <textarea id="prompt" placeholder="The capital of France is">The capital of France is</textarea>
+ <div class="row" style="margin-top:8px;">
+ <label style="font-size:11px; color:var(--fg-dim); flex:0 0 auto;">top K</label>
+ <input type="number" id="top-k" value="20" min="1" max="500" style="flex:0 0 80px;" />
+ <button class="primary" id="btn-encode" style="flex:1;">Encode &amp; show top features</button>
+ </div>
+ <div class="hint">
+ Encodes the prompt's last-token residual at the SAE's layer and ranks the firing features.
+ TopK SAE has at most K=50 nonzero features per token, so request up to ~50.
+ </div>
+ </div>
+ <div class="panel" data-pid="generate" data-show-tab="steering">
+ <h2>2 · Generate <button class="min-btn" data-min="generate" title="Minimize panel">−</button></h2>
+ <div class="row">
+ <label style="font-size:11px; color:var(--fg-dim); flex:0 0 auto;">tokens</label>
+ <input type="number" id="max-tokens" value="40" min="5" max="200" style="flex:0 0 80px;" />
+ <button class="primary" id="btn-generate" disabled>Generate baseline + steered</button>
+ </div>
+ <div class="hint">
+ Runs <code>model.generate</code> twice: once with no hooks, once with all active sliders applied
+ as residual-stream additions <code>h ← h + α · W_dec[:, feat]</code>.
+ </div>
+ </div>
+ <div class="panel" data-pid="viz" data-show-tab="steering evaluation datacentric" style="flex:0 0 auto;">
+ <h2>3 · Visualization <button class="min-btn" data-min="viz" title="Minimize panel">−</button></h2>
+ <div class="legend">
+ <span class="dim">all 32K features</span>
+ <span class="top">top firing</span>
+ <span class="pick">steered</span>
+ </div>
+ <div class="hint">
+ Each point is one SAE feature; positions are the top-3 PCA components of <code>W_enc</code>.
+ Click a point to add it as a steering target.
+ </div>
+ </div>
+ </section>
+
+ <section id="middle"></section>
+
+ <section id="right">
+ <div class="panel" id="features-panel" data-pid="features" data-show-tab="steering"
+ style="flex: 0 1 auto; max-height: 38vh; display:flex; flex-direction:column; min-height:0;">
+ <h2>Top features <span id="feat-count" style="font-family:var(--mono); color:var(--fg-faint);"></span>
+ <button class="min-btn" data-min="features" title="Minimize panel">−</button></h2>
+ <div id="features" style="overflow-y:auto; flex:1 1 auto; min-height:0;"><div class="empty">Encode a prompt to populate features.</div></div>
+ </div>
+
+ <div class="panel" id="heatmap-panel" data-show-tab="steering"
+ style="flex: 1 1 auto; display:flex; flex-direction:column; min-height:200px;">
+ <h2>Per-token feature heatmap
+ <span style="display:flex; gap:8px; align-items:center;">
+ <label style="font-size:10px; color:var(--fg-faint);">
+ <input type="checkbox" id="heatmap-skip-first" style="vertical-align:middle;" /> skip first
+ </label>
+ <button class="min-btn" data-min="heatmap-panel" title="Minimize panel">−</button>
+ </span>
+ </h2>
+ <div id="heatmap-grid" style="overflow:auto; flex:1 1 auto; min-height:0;">
+ <span class="empty">Encode a prompt — heatmap fills automatically.</span>
+ </div>
+ </div>
+
+ <!-- Evaluation: corpus encoding + signature heatmap -->
+ <div class="panel" data-show-tab="evaluation" id="eval-corpus">
+ <h2>Evaluation · corpus encode
+ <button class="min-btn" data-min="eval-corpus" title="Minimize panel">−</button>
+ </h2>
+ <textarea id="eval-prompts" rows="6" placeholder="One prompt per line.&#10;Example:&#10;The capital of France is&#10;Bonjour comment allez-vous&#10;def fibonacci(n):&#10;The mitochondria is the powerhouse"></textarea>
+ <div class="row" style="margin-top:8px;">
+ <button class="primary" id="btn-eval-encode" style="flex:1;">Encode corpus</button>
+ </div>
+ <div class="hint">
+ Encodes each prompt's last-token residual through the SAE. Returns
+ per-sample top features and corpus-level firing rates.
+ </div>
+ </div>
+
+ <div class="panel" data-show-tab="evaluation" id="eval-corpus-features">
+ <h2>Corpus features <span id="eval-stats" style="font-family:var(--mono); color:var(--fg-faint);"></span>
+ <button class="min-btn" data-min="eval-corpus-features" title="Minimize panel">−</button>
+ </h2>
+ <div id="eval-features-list" style="overflow-y:auto; max-height:30vh;">
+ <div class="empty">Encode a corpus to see feature firing rates.</div>
+ </div>
+ </div>
+
+ <div class="panel" data-show-tab="evaluation" id="eval-compare">
+ <h2>Compare two prompt sets
+ <button class="min-btn" data-min="eval-compare" title="Minimize panel">−</button>
+ </h2>
+ <textarea id="cmp-a" rows="3" placeholder="Set A — one prompt per line"></textarea>
+ <textarea id="cmp-b" rows="3" placeholder="Set B — one prompt per line" style="margin-top:6px;"></textarea>
+ <div class="row" style="margin-top:6px;">
+ <button class="primary" id="btn-cmp" style="flex:1;">Find distinguishing features</button>
+ </div>
+ <div class="hint">
+ Encodes both sets and ranks features by |fire_rate(A) − fire_rate(B)|.
+ Shows which features distinguish A from B.
+ </div>
+ </div>
+
+ <!-- Data-centric: filter + steered synthesis -->
+ <div class="panel" data-show-tab="datacentric" id="dc-corpus">
+ <h2>Data-centric · corpus
+ <button class="min-btn" data-min="dc-corpus" title="Minimize panel">−</button>
+ </h2>
+ <textarea id="dc-prompts" rows="5" placeholder="One prompt per line — same shape as Evaluation."></textarea>
+ <div class="row" style="margin-top:8px;">
+ <button class="primary" id="btn-dc-encode" style="flex:1;">Encode for filtering</button>
+ </div>
+ <div class="hint">
+ Encode docs, then filter by feature signature.
+ </div>
+ </div>
+
+ <div class="panel" data-show-tab="datacentric" id="dc-filter">
+ <h2>Filter
+ <button class="min-btn" data-min="dc-filter" title="Minimize panel">−</button>
+ </h2>
+ <div class="row">
+ <label style="flex:0 0 auto; font-size:11px; color:var(--fg-dim);">feature</label>
+ <input type="number" id="dc-filter-id" placeholder="feature id" min="0" max="199999" style="flex:1;" />
+ <select id="dc-filter-mode" style="flex:0 0 auto; background:rgba(0,0,0,0.35); color:var(--fg); border:1px solid var(--border-strong); border-radius:6px; padding:6px 8px; font-family:var(--mono); font-size:11px;">
+ <option value="include">includes</option>
+ <option value="exclude">excludes</option>
+ </select>
+ </div>
+ <div class="row" style="margin-top:8px;">
+ <button id="btn-dc-filter" style="flex:1;">Apply filter</button>
+ <button id="btn-dc-clear" class="ghost" style="flex:0 0 auto;">clear</button>
+ </div>
+ <div class="hint">
+ "Includes" = keep only docs where this feature fired (any activation).
+ "Excludes" = drop docs where it fired.
+ </div>
+ </div>
+
+ <div class="panel" data-show-tab="datacentric" id="dc-synth">
+ <h2>Steered synthesis
+ <button class="min-btn" data-min="dc-synth" title="Minimize panel">−</button>
+ </h2>
+ <div class="row">
+ <label style="flex:0 0 auto; font-size:11px; color:var(--fg-dim);">feature</label>
+ <input type="number" id="dc-synth-id" placeholder="feature id" min="0" max="199999" style="flex:1;" />
+ <label style="flex:0 0 auto; font-size:11px; color:var(--fg-dim);">α</label>
+ <input type="number" id="dc-synth-alpha" value="50" min="-200" max="200" style="flex:0 0 60px;" />
+ </div>
+ <div class="row" style="margin-top:6px;">
+ <label style="flex:0 0 auto; font-size:11px; color:var(--fg-dim);">tokens</label>
+ <input type="number" id="dc-synth-tokens" value="30" min="5" max="200" style="flex:0 0 70px;" />
+ <button class="primary" id="btn-dc-synth" style="flex:1;">Synthesize from corpus</button>
+ </div>
+ <div class="hint">
+ For each prompt in the corpus above, generate a steered completion with
+ <code>h ← h + α · W_dec[:, feature]</code>. Useful for producing
+ targeted training examples that fire a specific feature.
+ </div>
+ </div>
+ </section>
+
+ <!-- Bottom band -->
+ <section id="bottom">
+ <!-- STEERING bottom: baseline / steered / heatmap -->
+ <div class="panel" data-show-tab="steering" data-pid="baseline">
+ <h2>Baseline output (α=0)
+ <span style="display:flex; gap:8px; align-items:center;">
+ <span id="base-time" style="font-family:var(--mono); color:var(--fg-faint);"></span>
+ <button class="min-btn" data-min="baseline" title="Minimize panel">−</button>
+ </span>
+ </h2>
+ <div class="panel-body">
+ <div class="out-block baseline" id="out-baseline"><span class="empty">(no run yet)</span></div>
+ </div>
+ </div>
+ <div class="panel" data-show-tab="steering" data-pid="steered">
+ <h2>Steered output
+ <span style="display:flex; gap:8px; align-items:center;">
+ <span id="steered-time" style="font-family:var(--mono); color:var(--fg-faint);"></span>
+ <button class="min-btn" data-min="steered" title="Minimize panel">−</button>
+ </span>
+ </h2>
+ <div class="panel-body">
+ <div class="out-block steered" id="out-steered"><span class="empty">(no run yet)</span></div>
+ <div class="verifier" id="verifier"></div>
+ </div>
+ </div>
+
+ <!-- EVALUATION bottom: per-sample table + signature heatmap -->
458
+ <div class="panel" data-show-tab="evaluation" id="eval-samples-panel" style="grid-column: 1 / 2;">
459
+ <h2>Per-sample top features
460
+ <button class="min-btn" data-min="eval-samples-panel" title="Minimize panel">−</button>
461
+ </h2>
462
+ <div class="panel-body">
463
+ <div id="eval-samples"><span class="empty">Encode a corpus to populate.</span></div>
464
+ </div>
465
+ </div>
466
+ <div class="panel" data-show-tab="evaluation" id="eval-heatmap-panel" style="grid-column: 2 / 3;">
467
+ <h2>Signature heatmap / Compare results
468
+ <span style="display:flex; gap:6px; align-items:center;">
469
+ <button id="btn-eval-view-heatmap" class="ghost" style="padding:2px 8px; font-size:10px;">heatmap</button>
470
+ <button id="btn-eval-view-compare" class="ghost" style="padding:2px 8px; font-size:10px;">compare</button>
471
+ <button class="min-btn" data-min="eval-heatmap-panel" title="Minimize panel">−</button>
472
+ </span>
473
+ </h2>
474
+ <div class="panel-body">
475
+ <div id="eval-heatmap"><span class="empty">Encode a corpus to populate.</span></div>
476
+ <div id="eval-compare-results" style="display:none;"><span class="empty">Run Compare to populate.</span></div>
477
+ </div>
478
+ </div>
479
+
480
+ <!-- DATA-CENTRIC bottom: filtered list + synth output -->
481
+ <div class="panel" data-show-tab="datacentric" id="dc-filtered-panel">
482
+ <h2>Filtered docs <span id="dc-filter-stats" style="font-family:var(--mono); color:var(--fg-faint);"></span>
483
+ <button class="min-btn" data-min="dc-filtered-panel" title="Minimize panel">−</button>
484
+ </h2>
485
+ <div class="panel-body">
486
+ <div id="dc-filtered-list"><span class="empty">Encode + apply filter to populate.</span></div>
487
+ </div>
488
+ </div>
489
+ <div class="panel" data-show-tab="datacentric" id="dc-synth-panel">
490
+ <h2>Synthesis output <span id="dc-synth-stats" style="font-family:var(--mono); color:var(--fg-faint);"></span>
491
+ <button class="min-btn" data-min="dc-synth-panel" title="Minimize panel">−</button>
492
+ </h2>
493
+ <div class="panel-body">
494
+ <div id="dc-synth-list"><span class="empty">Run "Synthesize from corpus" to populate.</span></div>
495
+ </div>
496
+ </div>
497
+ </section>
498
+
499
+ </div>
500
+
501
+ <div id="tooltip"></div>
502
+
503
+ <div class="footer">
504
+ <span id="status">idle</span>
505
+ </div>
506
+
507
+ <script>
+ const API = ""; // same-origin: HF Space serves API + HTML from one process
+ const N_FEATURES = 32768;
+
+ // --- State ---
+ const state = {
+ positions: null, // Float32Array length N*3
+ topFeatures: [], // [{id, act}, ...] from /encode
+ steering: new Map(), // feature_id -> alpha
+ picked: null, // currently hovered feature_id
+ };
+
+ // --- Three.js scene ---
+ const scene = new THREE.Scene();
+ scene.fog = new THREE.FogExp2(0x08080d, 0.08);
+ const camera = new THREE.PerspectiveCamera(55, window.innerWidth / window.innerHeight, 0.01, 60);
+ camera.position.set(0, 0, 3.4);
+ const renderer = new THREE.WebGLRenderer({
+ canvas: document.getElementById("scene"),
+ antialias: true,
+ alpha: true,
+ });
+ renderer.setPixelRatio(Math.min(window.devicePixelRatio, 2));
+ renderer.setSize(window.innerWidth, window.innerHeight);
+ renderer.setClearColor(0x000000, 0);
+
+ // Subtle starry haze background — many tiny dim points
+ function addStarHaze() {
+ const N = 600;
+ const g = new THREE.BufferGeometry();
+ const pos = new Float32Array(N*3);
+ for (let i=0; i<N; i++) {
+ const r = 18 + Math.random()*6;
+ const theta = Math.random()*Math.PI*2;
+ const phi = Math.acos(2*Math.random()-1);
+ pos[i*3] = r*Math.sin(phi)*Math.cos(theta);
+ pos[i*3+1] = r*Math.sin(phi)*Math.sin(theta);
+ pos[i*3+2] = r*Math.cos(phi);
+ }
+ g.setAttribute("position", new THREE.BufferAttribute(pos, 3));
+ const m = new THREE.PointsMaterial({ color: 0x2a2a3a, size: 0.06, sizeAttenuation: true, transparent:true, opacity:0.5 });
+ scene.add(new THREE.Points(g, m));
+ }
+ addStarHaze();
+
+ // Feature point cloud — built once positions arrive
+ let featurePoints = null, featureGeometry = null;
+ function buildFeatureCloud(positions) {
+ featureGeometry = new THREE.BufferGeometry();
+ const N = positions.length / 3;
+ featureGeometry.setAttribute("position", new THREE.BufferAttribute(positions, 3));
+ const colors = new Float32Array(N*3);
+ const sizes = new Float32Array(N);
+ for (let i=0; i<N; i++) {
+ colors[i*3] = 1.0; colors[i*3+1] = 1.0; colors[i*3+2] = 1.0;
+ sizes[i] = 0.5;
+ }
+ featureGeometry.setAttribute("color", new THREE.BufferAttribute(colors, 3));
+ featureGeometry.setAttribute("size", new THREE.BufferAttribute(sizes, 1));
+
+ const material = new THREE.ShaderMaterial({
+ uniforms: { uPixelRatio: { value: renderer.getPixelRatio() } },
+ vertexShader: `
+ attribute float size;
+ varying vec3 vColor;
+ varying float vIsActive;
+ varying float vSize;
+ uniform float uPixelRatio;
+ void main() {
+ vColor = color;
+ vIsActive = step(1.0, size); // active iff size >= 1.0 (base dots use 0.5; active sizes start at 1.5)
+ vec4 mv = modelViewMatrix * vec4(position, 1.0);
+ gl_Position = projectionMatrix * mv;
+ // Base features: small fixed-pixel hard dots (~3 device px, so 6 retina px).
+ // Active features: size attribute -> px directly, with mild perspective
+ // (closer = larger), capped so they never balloon out.
+ float basePx = 3.0 * uPixelRatio;
+ float scaledPx = size * 3.0 * (3.4 / -mv.z) * uPixelRatio;
+ scaledPx = clamp(scaledPx, basePx, 26.0 * uPixelRatio);
+ gl_PointSize = mix(basePx, scaledPx, vIsActive);
+ vSize = gl_PointSize;
+ }`,
+ fragmentShader: `
+ varying vec3 vColor;
+ varying float vIsActive;
+ varying float vSize;
+ void main() {
+ vec2 uv = gl_PointCoord - 0.5;
+ float d = length(uv);
+ if (d > 0.5) discard;
+ // Anti-alias the rim with a fixed 1-pixel band, regardless of point size,
+ // so even tiny points stay solid and crisp instead of going gaussian.
+ float aa = 1.0 / max(vSize, 2.0);
+ float core = 1.0 - smoothstep(0.5 - aa, 0.5, d);
+ // Halo only on active features
+ float glow = vIsActive * (1.0 - smoothstep(0.18, 0.5, d)) * 0.35;
+ vec3 col = mix(vColor, vColor * 1.5, vIsActive);
+ float alpha = clamp(core + glow, 0.0, 1.0);
+ gl_FragColor = vec4(col, alpha);
+ }`,
+ transparent: true,
+ depthWrite: false,
+ vertexColors: true,
+ blending: THREE.NormalBlending,
+ });
+ featurePoints = new THREE.Points(featureGeometry, material);
+ scene.add(featurePoints);
+ }
+
+ // Lazy orbit-like rotation (no heavy controls dep)
+ let userInteracting = false, autoRot = 0.06;
+ let dragging = false, lastX = 0, lastY = 0;
+ let yaw = 0.4, pitch = 0.05, dist = 3.4;
+ const cnv = document.getElementById("scene");
+ cnv.addEventListener("pointerdown", e => { dragging = true; userInteracting = true; lastX = e.clientX; lastY = e.clientY; });
+ window.addEventListener("pointerup", () => { dragging = false; });
+ window.addEventListener("pointermove", e => {
+ if (dragging) {
+ yaw += (e.clientX - lastX) * 0.005;
+ pitch += (e.clientY - lastY) * 0.005;
+ pitch = Math.max(-1.4, Math.min(1.4, pitch));
+ lastX = e.clientX; lastY = e.clientY;
+ }
+ });
+ cnv.addEventListener("wheel", e => {
+ dist *= (1 + e.deltaY * 0.0012);
+ dist = Math.max(1.3, Math.min(8, dist));
+ e.preventDefault();
+ }, { passive:false });
+
+ window.addEventListener("resize", () => {
+ camera.aspect = window.innerWidth / window.innerHeight;
+ camera.updateProjectionMatrix();
+ renderer.setSize(window.innerWidth, window.innerHeight);
+ });
+
+ function tick(t) {
+ if (!userInteracting) yaw += 0.0015;
+ camera.position.x = dist * Math.sin(yaw) * Math.cos(pitch);
+ camera.position.y = dist * Math.sin(pitch);
+ camera.position.z = dist * Math.cos(yaw) * Math.cos(pitch);
+ camera.lookAt(0, 0, 0);
+ renderer.render(scene, camera);
+ requestAnimationFrame(tick);
+ }
+ requestAnimationFrame(tick);
+
+ // --- Picking (raycasting) ---
+ const raycaster = new THREE.Raycaster();
+ raycaster.params.Points = { threshold: 0.04 };
+ const mouse = new THREE.Vector2();
+ const tooltip = document.getElementById("tooltip");
+ cnv.addEventListener("mousemove", e => {
+ mouse.x = (e.clientX / window.innerWidth) * 2 - 1;
+ mouse.y = -(e.clientY / window.innerHeight) * 2 + 1;
+ if (!featurePoints) return;
+ raycaster.setFromCamera(mouse, camera);
+ const hits = raycaster.intersectObject(featurePoints);
+ if (hits.length > 0) {
+ const id = hits[0].index;
+ state.picked = id;
+ const top = state.topFeatures.find(t => t.id === id);
+ const isSteered = state.steering.has(id);
+ const alpha = isSteered ? state.steering.get(id) : null;
+ let html = `<div style="color:var(--fg); font-weight:600;">feature ${id}</div>`;
+ if (top) {
+ const rank = state.topFeatures.indexOf(top);
+ html += `<div style="margin-top:3px; color:var(--accent);">rank #${rank} of top firing · activation ${top.act.toFixed(3)}</div>`;
+ } else {
+ html += `<div style="margin-top:3px; color:var(--fg-faint);">not in top firing for current prompt</div>`;
+ }
+ if (isSteered) {
+ const sign = alpha >= 0 ? "+" : "";
+ html += `<div style="margin-top:3px; color:var(--accent-2);">steered · α=${sign}${alpha.toFixed(0)}</div>`;
+ }
+ html += `<div style="margin-top:5px; color:var(--fg-faint); font-size:10px;">click to ${isSteered ? "edit slider" : "add steering slider"}</div>`;
+ tooltip.innerHTML = html;
+ tooltip.style.display = "block";
+ tooltip.style.left = e.clientX + "px";
+ tooltip.style.top = e.clientY + "px";
+ } else {
+ state.picked = null;
+ tooltip.style.display = "none";
+ }
+ });
+ cnv.addEventListener("click", () => {
+ if (state.picked != null) {
+ addSteerSlot(state.picked, 0);
+ }
+ });
+
+ // --- Update particle attributes after encode ---
+ function repaintCloud() {
+ if (!featureGeometry) return;
+ const colors = featureGeometry.attributes.color.array;
+ const sizes = featureGeometry.attributes.size.array;
+ const N = sizes.length;
+ for (let i=0; i<N; i++) {
+ colors[i*3] = 1.0; colors[i*3+1] = 1.0; colors[i*3+2] = 1.0;
+ sizes[i] = 0.5;
+ }
+ // Top firing — cyan, larger
+ const maxAct = state.topFeatures[0]?.act || 1;
+ for (const f of state.topFeatures) {
+ const i = f.id;
+ const t = Math.min(1, Math.abs(f.act) / maxAct);
+ colors[i*3] = 0.49 * t + 1.0 * (1-t);
+ colors[i*3+1] = 0.97 * t + 1.0 * (1-t);
+ colors[i*3+2] = 1.00;
+ sizes[i] = 1.5 + 4.5 * t;
+ }
+ // Steered — magenta override
+ for (const [id, slot] of state.steering) {
+ const alpha = (typeof slot === "object") ? slot.alpha : slot;
+ if (alpha === 0) continue;
+ const t = Math.min(1, Math.abs(alpha) / 100);
+ colors[id*3] = 1.00;
+ colors[id*3+1] = 0.49 * (1-t) + 0.4 * t;
+ colors[id*3+2] = 0.97;
+ sizes[id] = 4 + 5 * t;
+ }
+ featureGeometry.attributes.color.needsUpdate = true;
+ featureGeometry.attributes.size.needsUpdate = true;
+ }
+
+ // --- API helpers ---
+ function setStatus(msg, busy=false) {
+ const el = document.getElementById("status");
+ el.innerHTML = busy ? `<span class="loader"></span> &nbsp; ${msg}` : msg;
+ }
+ async function api(path, body=null) {
+ const opts = body ? {method:"POST", headers:{"Content-Type":"application/json"}, body: JSON.stringify(body)}
+ : {method:"GET"};
+ const r = await fetch(API + path, opts);
+ if (!r.ok) throw new Error(`${path} -> ${r.status} ${r.statusText}`);
+ return await r.json();
+ }
+
+ // --- Boot ---
+ function applyHealthToHeader(hp) {
+ const layerInput = document.getElementById("layer-input");
+ layerInput.value = hp.layer;
+ layerInput.max = (hp.n_layers || 1) - 1;
+ layerInput.title = `Layer 0..${hp.n_layers - 1}`;
+ document.getElementById("hdr-layer-meta").textContent = ` · ${hp.device} · ${hp.dtype}`;
+ document.getElementById("hdr-features").textContent = `${hp.n_features.toLocaleString()} features`;
+ const hdr = document.getElementById("header");
+ hdr.classList.remove("loading");
+ hdr.classList.add("live");
+ if (hp.transferred && hp.note) {
+ setStatus("⚠ transferred SAE: " + hp.note);
+ }
+ }
+
+ // Layer hot-swap: change SAE layer without reloading model
+ let layerSwapPending = null;
+ document.getElementById("layer-input").addEventListener("change", async (ev) => {
+ const newLayer = parseInt(ev.target.value);
+ const hp = await api("/health");
+ if (newLayer === hp.layer) return;
+ if (isNaN(newLayer) || newLayer < 0 || newLayer >= hp.n_layers) {
+ ev.target.value = hp.layer; return;
+ }
+ // Clear stale UI state — features and outputs are layer-specific
+ state.topFeatures = []; state.steering.clear();
+ document.getElementById("features").innerHTML = `<div class="empty">Encode a prompt to populate features.</div>`;
+ document.getElementById("feat-count").textContent = "";
+ document.getElementById("out-baseline").innerHTML = `<span class="empty">(no run yet)</span>`;
+ document.getElementById("out-steered").innerHTML = `<span class="empty">(no run yet)</span>`;
+ document.getElementById("verifier").innerHTML = "";
+ document.getElementById("heatmap-grid").innerHTML = `<span class="empty">Encode a prompt — heatmap fills automatically.</span>`;
+ document.getElementById("btn-generate").disabled = true;
+
+ showLoading(`Switching to layer ${newLayer}…`,
+ "First time may take ~20s (download + SVD); subsequent switches are <0.1s.");
+ try {
+ const r = await api("/set_layer", {layer: newLayer});
+ const hp2 = await api("/health");
+ applyHealthToHeader(hp2);
+ applyPositionsToCloud(r.positions);
+ setStatus(`layer ${newLayer}` + (r.from_cache ? " (cached)" : ""));
+ } catch (e) {
+ setStatus(`layer swap error: ${e.message}`);
+ ev.target.value = hp.layer;
+ } finally {
+ hideLoading();
+ }
+ });
+
+ function applyPositionsToCloud(positions) {
+ const flat = new Float32Array(positions.length * 3);
+ for (let i=0; i<positions.length; i++) {
+ flat[i*3] = positions[i][0] * 2.2;
+ flat[i*3+1] = positions[i][1] * 2.2;
+ flat[i*3+2] = positions[i][2] * 2.2;
+ }
+ state.positions = flat;
+ if (featurePoints) {
+ scene.remove(featurePoints);
+ featureGeometry.dispose();
+ featurePoints.material.dispose();
+ featurePoints = null; featureGeometry = null;
+ }
+ buildFeatureCloud(flat);
+ }
+
+ function fillModelDropdown(models, currentModel) {
+ const sel = document.getElementById("model-select");
+ sel.innerHTML = "";
+ for (const m of models) {
+ const opt = document.createElement("option");
+ opt.value = m.model;
+ const xferTag = m.transferred ? " ⚠" : "";
+ opt.textContent = `${m.model.replace("Qwen/","")} (${m.approx_size_gb}GB · ${m.n_features.toLocaleString()}f)${xferTag}`;
+ if (m.model === currentModel) opt.selected = true;
+ sel.appendChild(opt);
+ }
+ // If only one model in the catalog (HF Space deployment), make it
+ // visually informational rather than interactive.
+ if (models.length <= 1) {
+ sel.disabled = true;
+ sel.title = "Locked to Qwen3-1.7B-Base on HF Space (free CPU). Run locally to swap models.";
+ sel.style.cursor = "default";
+ sel.style.opacity = "0.85";
+ }
+ }
+
+ function showLoading(msg, detail="") {
+ document.getElementById("lo-msg").textContent = msg;
+ document.getElementById("lo-detail").textContent = detail;
+ document.getElementById("loading-overlay").classList.add("visible");
+ document.getElementById("header").classList.remove("live");
+ document.getElementById("header").classList.add("loading");
+ document.getElementById("btn-encode").disabled = true;
+ document.getElementById("btn-generate").disabled = true;
+ }
+ function hideLoading() {
+ document.getElementById("loading-overlay").classList.remove("visible");
+ document.getElementById("btn-encode").disabled = false;
+ }
+
+ async function boot() {
+ setStatus("loading positions…", true);
+ try {
+ const [hp, ph, ml] = await Promise.all([
+ api("/health"),
+ api("/positions"),
+ api("/list_models"),
+ ]);
+ fillModelDropdown(ml.models, hp.model);
+ applyHealthToHeader(hp);
+ applyPositionsToCloud(ph.positions);
+ setStatus("ready");
+ if (hp.transferred && hp.note) setStatus("⚠ " + hp.note);
+ } catch (e) {
+ setStatus(`server unreachable at ${API}: ${e.message}`);
+ document.getElementById("model-select").innerHTML = `<option style="color:var(--bad)">server offline</option>`;
+ }
+ }
+ boot();
+
+ // Tab switching
+ document.body.dataset.tab = "steering";
+ document.querySelectorAll(".tab").forEach(btn => {
+ btn.addEventListener("click", () => {
+ document.querySelectorAll(".tab").forEach(b => b.classList.remove("active"));
+ btn.classList.add("active");
+ document.body.dataset.tab = btn.dataset.tab;
+ });
+ });
+
+ // Minimize toggle on every panel header
+ document.querySelectorAll(".min-btn").forEach(btn => {
+ btn.addEventListener("click", (ev) => {
+ ev.stopPropagation();
+ const panel = btn.closest(".panel");
+ if (!panel) return;
+ panel.classList.toggle("collapsed");
+ btn.textContent = panel.classList.contains("collapsed") ? "+" : "−";
+ btn.title = panel.classList.contains("collapsed") ? "Expand panel" : "Minimize panel";
+ });
+ });
+
+ // Model swap on dropdown change
+ document.getElementById("model-select").addEventListener("change", async (ev) => {
+ const newModel = ev.target.value;
+ const ml = await api("/list_models");
+ const entry = ml.models.find(m => m.model === newModel);
+ if (!entry) return;
+ const ok = confirm(
+ `Load ${newModel}?\n\n` +
+ `~${entry.approx_size_gb}GB download (or cached if seen before).\n` +
+ `${entry.n_features.toLocaleString()} SAE features, ${entry.n_layers} layers.\n` +
+ (entry.transferred ? `⚠ TRANSFERRED SAE: ${entry.note}\n` : ``) +
+ `\nThis blocks the server until ready.`
+ );
+ if (!ok) {
+ // revert dropdown to current
+ const hp = await api("/health");
+ ev.target.value = hp.model;
+ return;
+ }
+ // Reset client-side state
+ state.topFeatures = []; state.steering.clear();
+ document.getElementById("features").innerHTML = `<div class="empty">Encode a prompt to populate features.</div>`;
+ document.getElementById("feat-count").textContent = "";
+ document.getElementById("out-baseline").innerHTML = `<span class="empty">(no run yet)</span>`;
+ document.getElementById("out-steered").innerHTML = `<span class="empty">(no run yet)</span>`;
+ document.getElementById("verifier").innerHTML = "";
+ document.getElementById("btn-generate").disabled = true;
+
+ showLoading(`Loading ${newModel}…`,
+ `~${entry.approx_size_gb}GB · this is real, watch the server log`);
+ try {
+ const r = await fetch(API + "/load_model", {
+ method:"POST", headers:{"Content-Type":"application/json"},
+ body: JSON.stringify({model:newModel})
+ });
+ if (!r.ok) {
+ const txt = await r.text();
+ throw new Error(`${r.status}: ${txt}`);
+ }
+ const data = await r.json();
+ const hp = await api("/health");
+ applyHealthToHeader(hp);
+ applyPositionsToCloud(data.positions);
+ setStatus(`loaded ${newModel}`);
+ if (hp.transferred && hp.note) setStatus("⚠ " + hp.note);
+ } catch (e) {
+ setStatus(`load failed: ${e.message}`);
+ alert(`Load failed:\n\n${e.message}\n\nCheck the server log.`);
+ // Try to revert dropdown
+ try {
+ const hp = await api("/health");
+ ev.target.value = hp.model;
+ } catch {}
+ } finally {
+ hideLoading();
+ }
+ });
+
+ // --- Encode action ---
+ document.getElementById("btn-encode").addEventListener("click", async () => {
+ const prompt = document.getElementById("prompt").value;
+ if (!prompt.trim()) return;
+ setStatus("encoding…", true);
+ try {
+ const t0 = performance.now();
+ const top_n = Math.max(1, parseInt(document.getElementById("top-k").value || "20"));
+ const r = await api("/encode", {prompt, top_n});
+ const dt = ((performance.now() - t0)/1000).toFixed(2);
+ state.topFeatures = r.top;
+ document.getElementById("feat-count").textContent = `(K=${r.top.length} of ${r.n_features.toLocaleString()})`;
+ renderFeatures();
+ repaintCloud();
+ document.getElementById("btn-generate").disabled = false;
+ setStatus(`encoded in ${dt}s`);
+ // Auto-render the per-token heatmap too
+ renderHeatmap(prompt).catch(e => console.error("heatmap:", e));
+ } catch (e) {
+ setStatus(`encode error: ${e.message}`);
+ }
+ });
+
+ document.getElementById("heatmap-skip-first").addEventListener("change", () => {
+ if (state._lastHeatmapPrompt) renderHeatmap(state._lastHeatmapPrompt);
+ });
+
+ // =====================================================================
+ // PILLAR 2 — EVALUATION
+ // =====================================================================
+ function parsePrompts(textareaId) {
+ return document.getElementById(textareaId).value
+ .split("\n").map(s => s.trim()).filter(s => s.length > 0);
+ }
+
+ document.getElementById("btn-eval-encode").addEventListener("click", async () => {
+ const prompts = parsePrompts("eval-prompts");
+ if (prompts.length === 0) { setStatus("paste prompts first"); return; }
+ setStatus(`encoding ${prompts.length} prompts…`, true);
+ document.getElementById("eval-features-list").innerHTML = `<span class="loader"></span>`;
+ document.getElementById("eval-samples").innerHTML = `<span class="loader"></span>`;
+ document.getElementById("eval-heatmap").innerHTML = `<span class="loader"></span>`;
+ try {
+ const t0 = performance.now();
+ const r = await api("/encode_batch", {prompts, top_n: 10});
+ const dt = ((performance.now()-t0)/1000).toFixed(2);
+ state.evalResult = r;
+ document.getElementById("eval-stats").textContent =
+ `(${r.n_samples} samples · ${r.corpus_features.length} features fired)`;
+ renderEvalCorpus(r);
+ renderEvalSamples(r);
+ renderEvalHeatmap(r);
+ setStatus(`corpus encoded in ${dt}s`);
+ } catch (e) {
+ setStatus(`eval encode error: ${e.message}`);
+ }
+ });
+
+ function renderEvalCorpus(r) {
+ const div = document.getElementById("eval-features-list");
+ if (!r.corpus_features.length) {
+ div.innerHTML = `<div class="empty">No features fired.</div>`;
+ return;
+ }
+ // Top firing features ranked by fire_rate, with bar
+ const max = r.corpus_features[0].fire_rate;
+ div.innerHTML = r.corpus_features.slice(0, 60).map(f => {
+ const w = Math.max(2, 100 * f.fire_rate / max);
+ return `<div style="padding:5px 8px; border-bottom:1px solid var(--border); font-family:var(--mono); font-size:11px;">
+ <div style="display:flex; justify-content:space-between; gap:10px; align-items:center;">
+ <span style="color:var(--accent);">feat ${f.id}</span>
+ <span style="color:var(--fg-faint);">${(f.fire_rate*100).toFixed(0)}% · μ=${f.mean_act.toFixed(2)}</span>
+ </div>
+ <div style="height:3px; background:rgba(125,249,255,0.55); width:${w}%; margin-top:4px; border-radius:2px;"></div>
+ </div>`;
+ }).join("");
+ }
+
+ function renderEvalSamples(r) {
+ const div = document.getElementById("eval-samples");
+ if (!r.per_sample.length) { div.innerHTML = `<span class="empty">no samples</span>`; return; }
+ div.innerHTML = r.per_sample.map(s => {
+ const top = s.top.slice(0,5).map(t =>
+ `<span style="display:inline-block; padding:1px 6px; margin:1px; border:1px solid var(--border-strong); border-radius:4px; font-family:var(--mono); font-size:10px; color:var(--accent);">${t.id}<span style="color:var(--fg-faint);">·${t.act.toFixed(1)}</span></span>`
+ ).join("");
+ return `<div style="padding:8px; border-bottom:1px solid var(--border);">
+ <div style="font-family:var(--mono); font-size:11px; color:var(--fg); margin-bottom:4px;">[${s.i}] ${escapeHtml(s.preview)}</div>
+ <div>${top}</div>
+ </div>`;
+ }).join("");
+ }
+
+ function renderEvalHeatmap(r) {
+ const div = document.getElementById("eval-heatmap");
+ // Pick top 20 features that fire most across samples
+ const topFeats = r.corpus_features.slice(0, 20);
+ if (!topFeats.length || !r.per_sample.length) { div.innerHTML = `<span class="empty">no data</span>`; return; }
+ // Build sample-id × feature_id activation matrix
+ const sampleActs = r.per_sample.map(s => {
+ const m = {};
+ for (const t of s.top) m[t.id] = t.act;
+ return m;
+ });
+ const max = Math.max(1e-6, ...sampleActs.flatMap(m => Object.values(m)));
+ // Render: rows = samples, cols = features
+ const header = `<th style="padding:3px 6px; font-size:10px; color:var(--fg-dim); background:rgba(0,0,0,0.4); position:sticky; left:0; z-index:2;">sample</th>` +
+ topFeats.map(f => `<th title="feat ${f.id} (${(f.fire_rate*100).toFixed(0)}% rate)" style="padding:3px 4px; font-size:9px; color:var(--accent); border-bottom:1px solid var(--border-strong);">${f.id}</th>`).join("");
+ const rows = r.per_sample.map((s, si) => {
+ const cells = topFeats.map(f => {
+ const v = sampleActs[si][f.id] || 0;
+ const t = Math.min(1, v/max);
+ const r = Math.round(255 - 200*t), g = 255, b = Math.round(255 - 50*t);
+ return `<td title="sample ${si} · feat ${f.id} · ${v.toFixed(2)}" style="background:rgb(${r},${g},${b}); width:22px; height:18px; border:1px solid rgba(0,0,0,0.3);"></td>`;
+ }).join("");
+ return `<tr><td style="padding:2px 6px; font-family:var(--mono); font-size:10px; color:var(--fg-dim); background:rgba(0,0,0,0.3); position:sticky; left:0; z-index:1; max-width:80px; overflow:hidden; text-overflow:ellipsis; white-space:nowrap;">${si}: ${escapeHtml(s.preview.slice(0,16))}</td>${cells}</tr>`;
+ }).join("");
+ div.innerHTML = `<table style="border-collapse:collapse; font-family:var(--mono);"><thead><tr>${header}</tr></thead><tbody>${rows}</tbody></table>`;
+ }
+
+ function escapeHtml(s) {
+ return String(s).replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;").replace(/"/g,"&quot;");
+ }
+
+ // View toggle in Evaluation bottom-right panel: heatmap vs compare
+ document.getElementById("btn-eval-view-heatmap").addEventListener("click", () => {
+ document.getElementById("eval-heatmap").style.display = "block";
+ document.getElementById("eval-compare-results").style.display = "none";
+ });
+ document.getElementById("btn-eval-view-compare").addEventListener("click", () => {
+ document.getElementById("eval-heatmap").style.display = "none";
+ document.getElementById("eval-compare-results").style.display = "block";
+ });
+
+ // Differential feature mining
+ document.getElementById("btn-cmp").addEventListener("click", async () => {
+ const a = parsePrompts("cmp-a");
+ const b = parsePrompts("cmp-b");
+ if (a.length === 0 || b.length === 0) {
+ setStatus("paste both sets first"); return;
+ }
+ setStatus(`comparing ${a.length} vs ${b.length} prompts…`, true);
+ // Auto-switch the bottom-right panel to compare view
+ document.getElementById("eval-heatmap").style.display = "none";
+ document.getElementById("eval-compare-results").style.display = "block";
+ document.getElementById("eval-compare-results").innerHTML = `<span class="loader"></span> &nbsp; encoding…`;
+ try {
+ const t0 = performance.now();
+ const r = await api("/compare_batch", {prompts_a: a, prompts_b: b, top_n: 30});
+ const dt = ((performance.now()-t0)/1000).toFixed(2);
+ renderCompareResults(r, a, b);
+ setStatus(`compared in ${dt}s`);
+ } catch (e) {
+ setStatus(`compare error: ${e.message}`);
+ document.getElementById("eval-compare-results").innerHTML = `<span class="empty">error: ${e.message}</span>`;
+ }
+ });
+
+ function renderCompareResults(r, setA, setB) {
+ const div = document.getElementById("eval-compare-results");
+ if (!r.top_diff || !r.top_diff.length) {
+ div.innerHTML = `<span class="empty">no distinguishing features found</span>`;
+ return;
+ }
+ const max = r.top_diff[0].diff || 1;
+ const header = `
+ <div style="font-size:10px; color:var(--fg-faint); padding:4px 0; display:flex; gap:14px; flex-wrap:wrap; border-bottom:1px solid var(--border-strong); margin-bottom:6px;">
+ <span><span style="color:#7df9ff; font-weight:700;">■ A</span> ${r.n_a} prompts</span>
+ <span><span style="color:#ff7df9; font-weight:700;">■ B</span> ${r.n_b} prompts</span>
+ <span>top ${r.top_diff.length} by |Δ rate|</span>
+ </div>`;
+ const rows = r.top_diff.map((f, i) => {
+ const wA = Math.max(2, 60 * f.rate_a);
+ const wB = Math.max(2, 60 * f.rate_b);
+ const winner = f.winner === "a" ? "A" : "B";
+ const winnerColor = f.winner === "a" ? "#7df9ff" : "#ff7df9";
+ return `<tr style="border-bottom:1px solid var(--border);">
+ <td style="padding:3px 6px; color:var(--fg-faint); font-size:10px;">${i+1}</td>
+ <td style="padding:3px 6px; color:var(--accent); font-family:var(--mono);">${f.id}</td>
+ <td style="padding:3px 6px; text-align:right; font-family:var(--mono);">${(f.rate_a*100).toFixed(0)}%</td>
+ <td style="padding:3px 6px;">
+ <div style="display:flex; gap:2px; align-items:center;">
+ <div style="width:${wA}px; height:6px; background:#7df9ff;"></div>
+ <div style="width:${wB}px; height:6px; background:#ff7df9;"></div>
+ </div>
+ </td>
+ <td style="padding:3px 6px; text-align:right; font-family:var(--mono);">${(f.rate_b*100).toFixed(0)}%</td>
+ <td style="padding:3px 6px; font-family:var(--mono); font-size:10px; color:${winnerColor};">${winner} ▲</td>
+ <td style="padding:3px 6px; text-align:right; font-family:var(--mono); color:var(--fg);">${(f.diff*100).toFixed(0)}%</td>
+ </tr>`;
+ }).join("");
+ div.innerHTML = `
+ <div>
+ ${header}
+ <div style="overflow:auto; max-height:32vh;">
+ <table style="border-collapse:collapse; width:100%; font-size:11px;">
+ <thead>
+ <tr style="background:rgba(0,0,0,0.4); position:sticky; top:0; z-index:1;">
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:left;">#</th>
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:left;">feat</th>
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:right;">rate A</th>
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:left;">A vs B</th>
1149
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:right;">rate B</th>
1150
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:left;">winner</th>
1151
+ <th style="padding:4px 6px; font-size:10px; color:var(--fg-dim); text-align:right;">|Δ|</th>
1152
+ </tr>
1153
+ </thead>
1154
+ <tbody>${rows}</tbody>
1155
+ </table>
1156
+ </div>
1157
+ </div>`;
1158
+ }
1159
+
1160
+ // =====================================================================
1161
+ // PILLAR 3 — DATA-CENTRIC
1162
+ // =====================================================================
1163
+ document.getElementById("btn-dc-encode").addEventListener("click", async () => {
1164
+ const prompts = parsePrompts("dc-prompts");
1165
+ if (prompts.length === 0) { setStatus("paste prompts first"); return; }
1166
+ setStatus(`encoding ${prompts.length} prompts…`, true);
1167
+ try {
1168
+ const r = await api("/encode_batch", {prompts, top_n: 50});
1169
+ state.dcResult = r;
1170
+ state.dcAllPrompts = prompts;
1171
+ state.dcFiltered = r.per_sample;
1172
+ renderDcFiltered(state.dcFiltered, prompts, null);
1173
+ setStatus(`encoded ${r.n_samples} prompts`);
1174
+ } catch (e) {
1175
+ setStatus(`dc encode error: ${e.message}`);
1176
+ }
1177
+ });
1178
+
1179
+ document.getElementById("btn-dc-filter").addEventListener("click", () => {
1180
+ if (!state.dcResult) { setStatus("encode first"); return; }
1181
+ const fid = parseInt(document.getElementById("dc-filter-id").value);
1182
+ const mode = document.getElementById("dc-filter-mode").value;
1183
+ if (isNaN(fid)) { setStatus("enter a feature id"); return; }
1184
+ const fired = new Set();
1185
+ for (const s of state.dcResult.per_sample) {
1186
+ if (s.top.some(t => t.id === fid)) fired.add(s.i);
1187
+ }
1188
+ const filtered = state.dcResult.per_sample.filter(s =>
1189
+ mode === "include" ? fired.has(s.i) : !fired.has(s.i)
1190
+ );
1191
+ state.dcFiltered = filtered;
1192
+ renderDcFiltered(filtered, state.dcAllPrompts, {id: fid, mode});
1193
+ });
1194
+
1195
+ document.getElementById("btn-dc-clear").addEventListener("click", () => {
1196
+ if (!state.dcResult) return;
1197
+ state.dcFiltered = state.dcResult.per_sample;
1198
+ renderDcFiltered(state.dcFiltered, state.dcAllPrompts, null);
1199
+ });
1200
+
1201
+ function renderDcFiltered(samples, prompts, filt) {
1202
+ const div = document.getElementById("dc-filtered-list");
1203
+ const stats = document.getElementById("dc-filter-stats");
1204
+ if (filt) {
1205
+ stats.textContent = `(filter: ${filt.mode === "include" ? "+" : "−"}feat ${filt.id} · ${samples.length} of ${state.dcResult.n_samples})`;
1206
+ } else {
1207
+ stats.textContent = `(${samples.length} docs)`;
1208
+ }
1209
+ if (!samples.length) { div.innerHTML = `<span class="empty">no docs match.</span>`; return; }
1210
+ div.innerHTML = samples.map(s => {
1211
+ const fullPrompt = prompts[s.i] || s.preview;
1212
+ const top = s.top.slice(0,3).map(t =>
1213
+ `<span style="display:inline-block; padding:1px 5px; margin:1px; border:1px solid var(--border-strong); border-radius:3px; font-family:var(--mono); font-size:10px; color:var(--accent);">${t.id}</span>`
1214
+ ).join("");
1215
+ return `<div style="padding:6px 8px; border-bottom:1px solid var(--border); font-family:var(--mono); font-size:11px;">
1216
+ <div style="color:var(--fg-faint); font-size:9px;">[${s.i}]</div>
1217
+ <div style="color:var(--fg);">${escapeHtml(fullPrompt)}</div>
1218
+ <div style="margin-top:3px;">${top}</div>
1219
+ </div>`;
1220
+ }).join("");
1221
+ }
1222
+
+ document.getElementById("btn-dc-synth").addEventListener("click", async () => {
+ const prompts = state.dcAllPrompts || parsePrompts("dc-prompts");
+ if (!prompts.length) { setStatus("paste seeds first"); return; }
+ const fid = parseInt(document.getElementById("dc-synth-id").value);
+ const alpha = parseFloat(document.getElementById("dc-synth-alpha").value);
+ const max_new_tokens = parseInt(document.getElementById("dc-synth-tokens").value);
+ if (isNaN(fid)) { setStatus("enter feature id"); return; }
+ setStatus(`synthesizing ${prompts.length} steered completions…`, true);
+ document.getElementById("dc-synth-list").innerHTML = `<span class="loader"></span> &nbsp; running…`;
+ try {
+ const r = await api("/synth_batch", {
+ seed_prompts: prompts,
+ steering: [{id: fid, alpha}],
+ max_new_tokens,
+ });
+ document.getElementById("dc-synth-stats").textContent =
+ `(${r.results.length} · feat ${fid} α=${alpha})`;
+ document.getElementById("dc-synth-list").innerHTML = r.results.map((res, i) =>
+ `<div style="padding:8px; border-bottom:1px solid var(--border); font-family:var(--mono); font-size:11px;">
+ <div style="color:var(--fg-faint); font-size:9px;">[${i}] seed</div>
+ <div style="color:var(--fg-dim);">${escapeHtml(res.seed)}</div>
+ <div style="color:var(--fg-faint); font-size:9px; margin-top:4px;">→ steered</div>
+ <div style="color:var(--fg);">${escapeHtml(res.text)}</div>
+ </div>`
+ ).join("");
+ setStatus("synthesis done");
+ } catch (e) {
+ setStatus(`synth error: ${e.message}`);
+ document.getElementById("dc-synth-list").innerHTML = `<span class="empty">error: ${e.message}</span>`;
+ }
+ });
+
+ async function renderHeatmap(prompt) {
+ const container = document.getElementById("heatmap-grid");
+ state._lastHeatmapPrompt = prompt;
+ container.innerHTML = `<span class="loader"></span> &nbsp; computing…`;
+ try {
+ const r = await api("/encode_full", {prompt, top_n: 16});
+ let tokens = r.tokens, grid = r.grid, ids = r.feature_ids;
+ const skipFirst = document.getElementById("heatmap-skip-first").checked;
+ if (skipFirst && tokens.length > 1) {
+ tokens = tokens.slice(1);
+ grid = grid.map(row => row.slice(1));
+ }
+ if (tokens.length === 0) {
+ container.innerHTML = `<span class="empty">No tokens (after skip).</span>`;
+ return;
+ }
+ // Build HTML table
+ const rowMax = grid.map(row => Math.max(...row.map(Math.abs), 1e-6));
+ const tokHeader = tokens.map((t,i) => {
+ const safe = (t.replace(/\n/g,"↵").replace(/</g,"&lt;").replace(/>/g,"&gt;")).slice(0,8);
+ return `<th title="pos ${i}: ${t.replace(/\n/g,'\\n').replace(/"/g,'&quot;')}" style="padding:3px 4px; font-size:10px; color:var(--fg-dim); border-bottom:1px solid var(--border-strong); white-space:nowrap;">${safe || `[${i}]`}</th>`;
+ }).join("");
+ const rows = grid.map((row, fi) => {
+ const m = rowMax[fi];
+ const cells = row.map((v, pi) => {
+ const t = Math.min(1, Math.abs(v) / m);
+ const r = 255, g = Math.round(255 - 200*t), b = Math.round(255 - 220*t);
+ return `<td title="feat ${ids[fi]} · pos ${pi} · act=${v.toFixed(3)}" style="background:rgb(${r},${g},${b}); width:30px; height:22px; border:1px solid rgba(0,0,0,0.3);"></td>`;
+ }).join("");
+ return `<tr><td style="font-family:var(--mono); font-size:10px; padding:3px 6px; color:var(--accent); white-space:nowrap; background:rgba(0,0,0,0.3); position:sticky; left:0; z-index:1;">#${ids[fi]}</td>${cells}</tr>`;
+ }).join("");
+ container.innerHTML = `<table style="border-collapse:collapse; font-family:var(--mono);"><thead><tr><th style="padding:3px 6px; font-size:10px; color:var(--fg-dim); position:sticky; left:0; z-index:2; background:rgba(0,0,0,0.4);">feat</th>${tokHeader}</tr></thead><tbody>${rows}</tbody></table>`;
+ } catch (e) {
+ container.innerHTML = `<span class="empty">heatmap error: ${e.message}</span>`;
+ }
+ }
+
+ // --- Feature card render ---
+ function renderFeatures() {
+ const div = document.getElementById("features");
+ if (state.topFeatures.length === 0) {
+ div.innerHTML = `<div class="empty">Encode a prompt to populate features.</div>`;
+ return;
+ }
+ div.innerHTML = "";
+ for (const f of state.topFeatures) {
+ div.appendChild(buildFeatureCard(f.id, f.act, /*topRanked=*/true));
+ }
+ // Also render any steered features that weren't in top-K
+ for (const [id, alpha] of state.steering) {
+ if (!state.topFeatures.find(t => t.id === id)) {
+ div.appendChild(buildFeatureCard(id, null, /*topRanked=*/false));
+ }
+ }
+ }
+ function buildFeatureCard(id, act, topRanked) {
+ const wrap = document.createElement("div");
+ wrap.className = "feat" + (state.steering.has(id) ? " steered" : "");
+ const alpha = state.steering.has(id) ? state.steering.get(id) : 0;
+
+ const head = document.createElement("div");
+ head.className = "feat-head";
+ const left = document.createElement("div");
+ left.innerHTML = `<span class="feat-id">feat ${id}</span>` +
+ (act != null ? ` &nbsp;<span class="feat-act">act ${act.toFixed(3)}</span>` : ` &nbsp;<span class="feat-act" style="color:var(--fg-faint)">(picked)</span>`);
+ const tools = document.createElement("div");
+ tools.className = "feat-tools";
+
+ if (state.steering.has(id)) {
+ const reset = document.createElement("button");
+ reset.textContent = "reset";
+ reset.title = "Set α back to 0 (no intervention) but keep slider visible";
+ reset.addEventListener("click", () => {
+ state.steering.set(id, 0);
+ renderFeatures();
+ repaintCloud();
+ });
+ tools.appendChild(reset);
+ const off = document.createElement("button");
+ off.textContent = "remove";
+ off.title = "Remove this slider entirely";
+ off.addEventListener("click", () => {
+ state.steering.delete(id);
+ renderFeatures();
+ repaintCloud();
+ });
+ tools.appendChild(off);
+ } else {
+ const add = document.createElement("button");
+ add.textContent = "steer";
+ add.addEventListener("click", () => addSteerSlot(id, 0));
+ tools.appendChild(add);
+ }
+ head.appendChild(left);
+ head.appendChild(tools);
+ wrap.appendChild(head);
+
+ if (state.steering.has(id)) {
+ const cur = state.steering.get(id);
+ const curAlpha = (typeof cur === "object") ? cur.alpha : cur;
+ const curPositions = (typeof cur === "object") ? (cur.positions || "") : "";
+ const curOutOnly = (typeof cur === "object") ? !!cur.output_only : false;
+
+ const sl = document.createElement("div");
+ sl.className = "slider-row";
+ const range = document.createElement("input");
+ range.type = "range"; range.min = -100; range.max = 100; range.step = 1;
+ range.value = curAlpha;
+ const val = document.createElement("span");
+ val.className = "alpha-val";
+ val.textContent = (curAlpha >= 0 ? "+" : "") + curAlpha.toFixed(0);
+ range.addEventListener("input", () => {
+ const a = parseFloat(range.value);
+ const slot = state.steering.get(id);
+ const next = (typeof slot === "object") ? {...slot, alpha:a} : {alpha:a, positions:"", output_only:false};
+ state.steering.set(id, next);
+ val.textContent = (a >= 0 ? "+" : "") + a.toFixed(0);
+ repaintCloud();
+ });
+ sl.appendChild(range);
+ sl.appendChild(val);
+ wrap.appendChild(sl);
+
+ // Position-selective steering + output-only toggle
+ const adv = document.createElement("div");
+ adv.style.cssText = "margin-top:6px; display:flex; gap:6px; align-items:center; flex-wrap:wrap;";
+ const posLbl = document.createElement("span");
+ posLbl.textContent = "positions";
+ posLbl.style.cssText = "font-size:10px; color:var(--fg-faint); flex:0 0 auto;";
+ const posInput = document.createElement("input");
+ posInput.type = "text";
+ posInput.placeholder = "all";
+ posInput.value = curPositions;
+ posInput.style.cssText = "flex:1; min-width:60px; font-size:11px; padding:3px 6px; background:rgba(0,0,0,0.3); border:1px solid var(--border-strong); color:var(--fg); border-radius:4px; font-family:var(--mono);";
+ posInput.title = "Token positions to steer at: 'all' or '3-7' or '0,2,5-8'. Empty = all.";
+ posInput.addEventListener("change", () => {
+ const slot = state.steering.get(id);
+ const next = (typeof slot === "object") ? {...slot, positions: posInput.value} : {alpha: slot, positions: posInput.value, output_only: false};
+ state.steering.set(id, next);
+ });
+ const outLbl = document.createElement("label");
+ outLbl.style.cssText = "font-size:10px; color:var(--fg-faint); display:inline-flex; align-items:center; gap:3px; cursor:pointer; flex:0 0 auto;";
+ const outChk = document.createElement("input");
+ outChk.type = "checkbox";
+ outChk.checked = curOutOnly;
+ outChk.style.cssText = "margin:0;";
+ outChk.addEventListener("change", () => {
+ const slot = state.steering.get(id);
+ const next = (typeof slot === "object") ? {...slot, output_only: outChk.checked} : {alpha: slot, positions: "", output_only: outChk.checked};
+ state.steering.set(id, next);
+ });
+ outLbl.appendChild(outChk);
+ outLbl.appendChild(document.createTextNode("output only"));
+ adv.appendChild(posLbl);
+ adv.appendChild(posInput);
+ adv.appendChild(outLbl);
+ wrap.appendChild(adv);
+
+ const lbl = document.createElement("div");
+ lbl.style.cssText = "font-size:10px; color:var(--fg-faint); margin-top:4px;";
+ lbl.textContent = "α: −100 ← suppress … +100 → amplify";
+ wrap.appendChild(lbl);
+ }
+ return wrap;
+ }
1420
+
+ function addSteerSlot(id, alpha=0) {
+ state.steering.set(id, alpha);
+ renderFeatures();
+ repaintCloud();
+ }
+
+ // --- Generate action ---
+ document.getElementById("btn-generate").addEventListener("click", async () => {
+ const prompt = document.getElementById("prompt").value;
+ const max_new_tokens = parseInt(document.getElementById("max-tokens").value || "40");
+ const steering = [];
+ for (const [id, slot] of state.steering) {
+ const alpha = (typeof slot === "object") ? slot.alpha : slot;
+ if (alpha === 0) continue;
+ const positions = (typeof slot === "object") ? (slot.positions || null) : null;
+ const output_only = (typeof slot === "object") ? !!slot.output_only : false;
+ steering.push({id, alpha, positions, output_only});
+ }
+ setStatus("generating baseline…", true);
+ document.getElementById("out-baseline").innerHTML = `<span class="loader"></span>`;
+ document.getElementById("out-steered").innerHTML = `<span class="loader"></span>`;
+
+ try {
+ const t0 = performance.now();
+ const baseline = await api("/generate", {prompt, steering: [], max_new_tokens, return_probs: true, topk_display: 8});
+ const t1 = performance.now();
+ renderTokenChips("out-baseline", baseline, "blue");
+ document.getElementById("base-time").textContent = ((t1-t0)/1000).toFixed(2) + "s";
+
+ if (steering.length === 0) {
+ document.getElementById("out-steered").innerHTML = `<span class="empty">(no sliders engaged — steered = baseline)</span>`;
+ document.getElementById("steered-time").textContent = "";
+ document.getElementById("verifier").innerHTML = "";
+ setStatus("done");
+ return;
+ }
+ setStatus("generating steered…", true);
+ const t2 = performance.now();
+ const steered = await api("/generate", {prompt, steering, max_new_tokens, return_probs: true, topk_display: 8});
+ const t3 = performance.now();
+ renderTokenChips("out-steered", steered, "magenta");
+ document.getElementById("steered-time").textContent = ((t3-t2)/1000).toFixed(2) + "s";
+
+ // Verifier
+ const v = document.getElementById("verifier");
+ v.innerHTML = "";
+ for (const row of steered.verifier) {
+ const d = row.steered - row.base;
+ const cls = d > 0 ? "delta-up" : (d < 0 ? "delta-down" : "");
+ const sign = d >= 0 ? "+" : "";
+ const ext = [];
+ if (row.positions) ext.push(`pos=${row.positions}`);
+ if (row.output_only) ext.push("output-only");
+ const extStr = ext.length ? ` · ${ext.join(" · ")}` : "";
+ const div = document.createElement("div");
+ div.innerHTML = `feat ${row.id} · α=${row.alpha.toFixed(0)}${extStr} · base=${row.base.toFixed(2)} → steered=${row.steered.toFixed(2)} <span class="${cls}">(Δ ${sign}${d.toFixed(2)})</span>`;
+ v.appendChild(div);
+ }
+ setStatus("done");
+ } catch (e) {
+ setStatus(`generate error: ${e.message}`);
+ document.getElementById("out-baseline").textContent = "(error)";
+ document.getElementById("out-steered").textContent = "(error)";
+ }
+ });
+
+ // --- Per-token probability chip strip ---
+ function renderTokenChips(containerId, gen, theme) {
+ const container = document.getElementById(containerId);
+ if (!gen.tokens || !gen.tokens.length) {
+ container.textContent = gen.text || "(empty)";
+ return;
+ }
+ // Theme: colors for chip background
+ const themeFns = {
+ blue: (p) => {
+ const t = Math.max(0, Math.min(1, p));
+ const r = Math.round(255 * (1 - t*0.85));
+ const g = Math.round(255 * (1 - t*0.55));
+ return [r, g, 255, t < 0.5 ? "#1e3a8a" : "#fff"];
+ },
+ magenta: (p) => {
+ const t = Math.max(0, Math.min(1, p));
+ const r = 255;
+ const g = Math.round(255 * (1 - t*0.7));
+ const b = Math.round(255 * (1 - t*0.5));
+ return [r, g, b, t < 0.5 ? "#7f1d1d" : "#fff"];
+ },
+ };
+ const colorize = themeFns[theme] || themeFns.blue;
+ const chips = gen.tokens.map((row, i) => {
+ const [r,g,b,fg] = colorize(row.prob);
+ const tokDisp = row.tok.replace(/\n/g,"↵").replace(/\t/g,"→");
+ const safe = escapeHtml(tokDisp);
+ const panelHtml = topkPanelHtml(row.topk);
+ return `<span class="tok-chip" data-panel='${escapeAttr(panelHtml)}' style="background:rgb(${r},${g},${b}); color:${fg}; padding:2px 6px; margin:1px; border-radius:4px; cursor:pointer; display:inline-block; font-family:var(--mono); font-size:11px; white-space:nowrap;">${safe}<sub style="opacity:.65; font-size:8px; margin-left:3px;">${(row.prob*100).toFixed(1)}%</sub></span>`;
+ }).join("");
+ container.innerHTML = `
+ <div class="tok-strip" data-prob-root style="line-height:2.4;">
+ <div style="font-size:10px; color:var(--fg-faint); margin-bottom:4px; font-style:italic;">click any token to see top-K candidates</div>
+ <div>${chips}</div>
+ <div data-topk-panel style="display:none; margin-top:6px; padding:6px; background:rgba(0,0,0,0.35); border:1px solid var(--border-strong); border-radius:6px; font-family:var(--mono); font-size:10px;"></div>
+ </div>`;
+ // Wire click-to-pin
+ container.querySelectorAll(".tok-chip").forEach(chip => {
+ chip.addEventListener("click", () => {
+ const root = chip.closest("[data-prob-root]");
+ const panel = root.querySelector("[data-topk-panel]");
+ const wasSelected = chip.dataset.selected === "1";
+ root.querySelectorAll(".tok-chip").forEach(c => {
+ c.dataset.selected = "0"; c.style.outline = "";
+ });
+ if (wasSelected) {
+ panel.innerHTML = ""; panel.style.display = "none";
+ } else {
+ chip.dataset.selected = "1";
+ chip.style.outline = "2px solid #94a3b8";
+ chip.style.outlineOffset = "-1px";
+ panel.innerHTML = chip.dataset.panel;
+ panel.style.display = "block";
+ }
+ });
+ });
+ }
+
+ function topkPanelHtml(topk) {
+ const rows = topk.map((c, idx) => {
+ const safe = escapeHtml(c.tok.replace(/\n/g,"↵").replace(/\t/g,"→"));
+ const bg = c.is_chosen ? "background:rgba(125,249,255,0.12);" : "";
+ const fw = c.is_chosen ? "font-weight:700;" : "";
+ const mark = c.is_chosen ? " ✓" : "";
+ return `<tr style="${bg}">
+ <td style="padding:2px 6px; color:var(--fg-faint); text-align:right;">${idx+1}</td>
+ <td style="padding:2px 6px; color:var(--fg); ${fw}">${safe}${mark}</td>
+ <td style="padding:2px 6px; text-align:right; color:var(--accent);">${(c.prob*100).toFixed(2)}%</td>
+ </tr>`;
+ }).join("");
+ return `<table style="border-collapse:collapse; width:100%;"><tbody>${rows}</tbody></table>`;
+ }
+
+ function escapeAttr(s) { return s.replace(/'/g, "&apos;").replace(/"/g, "&quot;"); }
+ </script>
+ </body>
+ </html>
qwen_scope_obs.py ADDED
@@ -0,0 +1,129 @@
+ """Observation-only tooling for Qwen-Scope SAE features.
+
+ Four utilities — none of them steer or modify generation. They exist to let a
+ reviewer interrogate what the SAE features encode, before any intervention:
+
+ * encode_prompts(model, tokenizer, sae, prompts, layer_idx)
+ For each prompt, returns the last-token sparse feature code (shape:
+ len(prompts) x n_features).
+
+ * top_features_for_prompt(...)
+ The N strongest-firing features for a single prompt, with activation
+ values. Equivalent to the read-only path in qwen_scope_steer.
+
+ * differential_features(pos_codes, neg_codes, top_n)
+ Given two stacks of feature codes, returns features ranked by
+ (mean activation on positive set) - (mean activation on negative set).
+ Useful for "which features distinguish concept A from concept B?"
+
+ * scan_prompts_for_feature(codes, feature_id)
+ Given a stack of codes and a feature id, returns the per-prompt
+ activation values (zero where the feature didn't make TopK).
+
+ Nothing here generates steered text. Wire it up to the steering hooks in
+ qwen_scope_steer.py only when intervention is the explicit experimental goal,
+ and document the intervention separately.
+ """
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Sequence
+
+ import torch
+
+ from qwen_scope_steer import SAE, capture_residual
+
+
+ @dataclass
+ class FeatureRanking:
+ feature_id: int
+ score: float # (pos_mean - neg_mean) for differential, or activation for top-features
+ pos_mean: float | None = None
+ neg_mean: float | None = None
+ pos_fire_rate: float | None = None
+ neg_fire_rate: float | None = None
+
+
+ def encode_prompts(model, tokenizer, sae: SAE, prompts: Sequence[str],
+ layer_idx: int) -> torch.Tensor:
+ """Encode the last-token residual of each prompt through the SAE.
+
+ Returns codes of shape (len(prompts), n_features) on CPU float32 for
+ stable downstream stats. No generation is performed.
+ """
+ codes = []
+ for p in prompts:
+ inputs = tokenizer(p, return_tensors="pt").to(model.device)
+ with torch.no_grad(), capture_residual(model, layer_idx) as bucket:
+ model(**inputs)
+ h_last = bucket["h"][0, -1].unsqueeze(0)
+ z = sae.encode(h_last)[0]
+ codes.append(z.detach().to("cpu", torch.float32))
+ return torch.stack(codes, dim=0)
+
+
+ def top_features_for_prompt(model, tokenizer, sae: SAE, prompt: str,
+ layer_idx: int, top_n: int = 10) -> list[FeatureRanking]:
+ codes = encode_prompts(model, tokenizer, sae, [prompt], layer_idx)[0]
+ nz = codes.nonzero(as_tuple=False).flatten()
+ vals = codes[nz]
+ order = vals.argsort(descending=True)[:top_n]
+ return [FeatureRanking(feature_id=int(nz[i]), score=float(vals[i]))
+ for i in order]
+
+
+ def differential_features(pos_codes: torch.Tensor, neg_codes: torch.Tensor,
+ top_n: int = 20) -> list[FeatureRanking]:
+ """Rank features by their differential firing across two prompt sets.
+
+ pos_codes: (P, F) feature codes for the "positive" prompt set
+ neg_codes: (N, F) feature codes for the "negative" prompt set
+
+ Returns top_n features by (pos_mean - neg_mean), with both means and
+ per-set fire rates (fraction of prompts where the feature fired).
+ Read-only — no generation, no steering.
+ """
+ if pos_codes.shape[1] != neg_codes.shape[1]:
+ raise ValueError(f"feature dim mismatch: {pos_codes.shape} vs {neg_codes.shape}")
+ pos_mean = pos_codes.mean(dim=0)
+ neg_mean = neg_codes.mean(dim=0)
+ diff = pos_mean - neg_mean
+ pos_fire = (pos_codes != 0).float().mean(dim=0)
+ neg_fire = (neg_codes != 0).float().mean(dim=0)
+ order = diff.argsort(descending=True)[:top_n]
+ return [
+ FeatureRanking(
+ feature_id=int(i),
+ score=float(diff[i]),
+ pos_mean=float(pos_mean[i]),
+ neg_mean=float(neg_mean[i]),
+ pos_fire_rate=float(pos_fire[i]),
+ neg_fire_rate=float(neg_fire[i]),
+ )
+ for i in order
+ ]
+
+
+ def scan_prompts_for_feature(codes: torch.Tensor, feature_id: int) -> torch.Tensor:
+ """Per-prompt activation vector for a single feature (zero where it didn't make TopK)."""
+ return codes[:, feature_id]
+
+
+ def fire_rate(codes: torch.Tensor, feature_id: int) -> float:
+ """Fraction of prompts on which the feature fired (was in TopK)."""
+ return float((codes[:, feature_id] != 0).float().mean())
+
+
+ def pretty_ranking(rs: list[FeatureRanking]) -> str:
+ out = []
+ for r in rs:
+ if r.pos_mean is not None:
+ out.append(
+ f" feat {r.feature_id:>6d} "
+ f"diff={r.score:+8.4f} "
+ f"pos_mean={r.pos_mean:+8.4f} neg_mean={r.neg_mean:+8.4f} "
+ f"pos_fire={r.pos_fire_rate:.2f} neg_fire={r.neg_fire_rate:.2f}"
+ )
+ else:
+ out.append(f" feat {r.feature_id:>6d} act={r.score:+8.4f}")
+ return "\n".join(out)
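
The ranking rule in `differential_features` — mean activation on the positive set minus mean activation on the negative set, plus per-set fire rates — is simple enough to sketch without torch. A minimal pure-Python version for intuition (illustrative only; names like `rank_differential` are not part of the module, and the tensor version above is what the Space actually runs):

```python
def rank_differential(pos, neg, top_n=3):
    """Rank features by mean(pos) - mean(neg) activation.

    pos, neg: lists of equal-length per-prompt activation lists.
    Returns (feature_id, diff, pos_fire_rate, neg_fire_rate) tuples,
    highest diff first -- mirroring FeatureRanking above.
    """
    n_feat = len(pos[0])
    rows = []
    for f in range(n_feat):
        pos_vals = [p[f] for p in pos]
        neg_vals = [q[f] for q in neg]
        diff = sum(pos_vals) / len(pos_vals) - sum(neg_vals) / len(neg_vals)
        # fire rate = fraction of prompts where the feature was nonzero
        pos_fire = sum(v != 0 for v in pos_vals) / len(pos_vals)
        neg_fire = sum(v != 0 for v in neg_vals) / len(neg_vals)
        rows.append((f, diff, pos_fire, neg_fire))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top_n]

# Two prompts per set, three features; feature 1 fires only on the positive set.
pos = [[0.0, 2.0, 0.5], [0.0, 1.0, 0.0]]
neg = [[0.0, 0.0, 0.4], [1.0, 0.0, 0.6]]
print(rank_differential(pos, neg, top_n=2))
```

Feature 1 comes out on top (diff +1.5, pos_fire 1.0, neg_fire 0.0) — exactly the "which features distinguish A from B?" signal the compare panel renders.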
qwen_scope_steer.py ADDED
@@ -0,0 +1,279 @@
+ """Qwen-Scope SAE feature reading + steering for transformers.
2
+
3
+ End-to-end demo:
4
+ 1. Loads a base Qwen3 model and a matching Qwen-Scope TopK SAE checkpoint.
5
+ 2. Captures the residual-stream output of a chosen decoder layer.
6
+ 3. Encodes it through the SAE -> top-K firing features.
7
+ 4. Generates a baseline completion.
8
+ 5. Re-generates with feature steering: residual h <- h + alpha * W_dec[:, feat]
9
+ applied via register_forward_hook on every forward pass.
10
+
11
+ Verified against:
12
+ * Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50 (W_enc 32768x2048, W_dec 2048x32768,
13
+ b_enc 32768, b_dec 2048, all float32, K=50)
14
+ * Qwen/Qwen3-1.7B-Base (28 Qwen3DecoderLayer, hidden_size=2048, layer forward
15
+ returns bare torch.Tensor under transformers >= 5).
16
+ """
17
+ from __future__ import annotations
18
+
19
+ import argparse
20
+ import contextlib
21
+ from dataclasses import dataclass
22
+ from pathlib import Path
23
+
24
+ import torch
25
+ import torch.nn.functional as F
26
+ from huggingface_hub import hf_hub_download
27
+ from transformers import AutoModelForCausalLM, AutoTokenizer
28
+
29
+
30
+ # ---------------------------------------------------------------------------
+ # SAE
+ # ---------------------------------------------------------------------------
+ @dataclass
+ class SAE:
+     W_enc: torch.Tensor  # (n_features, d_model)
+     W_dec: torch.Tensor  # (d_model, n_features)
+     b_enc: torch.Tensor  # (n_features,)
+     b_dec: torch.Tensor  # (d_model,)
+     k: int               # TopK
+     layer: int           # layer index this SAE belongs to
+
+     @classmethod
+     def from_repo(cls, repo: str, layer: int, k: int, device: str = "cpu",
+                   dtype: torch.dtype = torch.float32) -> "SAE":
+         path = hf_hub_download(repo, f"layer{layer}.sae.pt")
+         return cls.from_path(path, layer=layer, k=k, device=device, dtype=dtype)
+
+     @classmethod
+     def from_path(cls, path: str | Path, layer: int, k: int,
+                   device: str = "cpu", dtype: torch.dtype = torch.float32) -> "SAE":
+         sd = torch.load(str(path), map_location=device, weights_only=True)
+         for key in ("W_enc", "W_dec", "b_enc", "b_dec"):
+             if key not in sd:
+                 raise KeyError(f"SAE checkpoint at {path} missing key {key!r}; "
+                                f"got {list(sd.keys())}")
+         return cls(
+             W_enc=sd["W_enc"].to(device=device, dtype=dtype),
+             W_dec=sd["W_dec"].to(device=device, dtype=dtype),
+             b_enc=sd["b_enc"].to(device=device, dtype=dtype),
+             b_dec=sd["b_dec"].to(device=device, dtype=dtype),
+             k=k, layer=layer,
+         )
+
+     @property
+     def n_features(self) -> int:
+         return self.W_enc.shape[0]
+
+     @property
+     def d_model(self) -> int:
+         return self.W_enc.shape[1]
+
+     def encode(self, x: torch.Tensor) -> torch.Tensor:
+         """Encode residual-stream activations -> sparse feature codes (TopK)."""
+         x = x.to(device=self.W_enc.device, dtype=self.W_enc.dtype)
+         pre = F.linear(x, self.W_enc, self.b_enc)  # (..., n_features)
+         topk_vals, topk_idx = pre.topk(self.k, dim=-1)
+         z = torch.zeros_like(pre)
+         z.scatter_(-1, topk_idx, topk_vals)
+         return z
+
+     def decode(self, z: torch.Tensor) -> torch.Tensor:
+         z = z.to(device=self.W_dec.device, dtype=self.W_dec.dtype)
+         return F.linear(z, self.W_dec, self.b_dec)
+
+     def steering_vector(self, feature_id: int) -> torch.Tensor:
+         return self.W_dec[:, feature_id].clone()
+
+
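The TopK sparsification inside `SAE.encode` keeps only the `k` largest pre-activations per position and zeroes the rest. A NumPy sketch of just that step (illustrative only; the real method runs on torch tensors and the function name here is invented):

```python
import numpy as np

def topk_sparsify(pre: np.ndarray, k: int) -> np.ndarray:
    """Zero everything except the k largest entries along the last axis."""
    idx = np.argsort(pre, axis=-1)[..., -k:]  # indices of the k largest entries
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z

pre = np.array([0.1, 2.0, -1.0, 0.5])
print(topk_sparsify(pre, 2))  # only 2.0 and 0.5 survive; the rest become 0
```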
+ # ---------------------------------------------------------------------------
+ # Hook helpers
+ # ---------------------------------------------------------------------------
+ def _layer_output_to_tensor(out):
+     """Qwen3DecoderLayer returns a bare torch.Tensor in transformers >= 5,
+     a tuple (hidden_states, ...) in transformers < 5. Handle both."""
+     if isinstance(out, tuple):
+         return out[0], out
+     return out, None
+
+
+ def _rebuild_layer_output(new_h: torch.Tensor, original_out):
+     if original_out is None:
+         return new_h
+     return (new_h, *original_out[1:])
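These two helpers form a round-trip: unwrap the hidden state from whichever shape the layer returned, then re-wrap a modified state in the same shape. A stdlib-only sketch with plain objects standing in for tensors:

```python
def layer_output_to_tensor(out):
    # Return (hidden_states, original_tuple_or_None) for either output shape.
    if isinstance(out, tuple):
        return out[0], out
    return out, None

def rebuild_layer_output(new_h, original_out):
    # Re-wrap a modified hidden state in the layer's original output shape.
    if original_out is None:
        return new_h
    return (new_h, *original_out[1:])

h, orig = layer_output_to_tensor(("hidden", "kv_cache"))
print(rebuild_layer_output("steered", orig))   # ('steered', 'kv_cache')
h2, orig2 = layer_output_to_tensor("hidden")
print(rebuild_layer_output("steered", orig2))  # 'steered'
```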
+
+
+ @contextlib.contextmanager
+ def capture_residual(model, layer_idx: int):
+     """Capture the residual-stream output of model.model.layers[layer_idx]."""
+     bucket: dict = {}
+     layer = model.model.layers[layer_idx]
+
+     def hook(_module, _inp, out):
+         h, _ = _layer_output_to_tensor(out)
+         bucket["h"] = h.detach()
+         return out
+
+     handle = layer.register_forward_hook(hook)
+     try:
+         yield bucket
+     finally:
+         handle.remove()
+
+
+ @contextlib.contextmanager
+ def steer(model, layer_idx: int, direction: torch.Tensor, alpha: float):
+     """Add `alpha * direction` to the residual-stream output of layer_idx
+     on every forward pass while the context is active."""
+     layer = model.model.layers[layer_idx]
+     direction = direction.detach()
+
+     def hook(_module, _inp, out):
+         h, original = _layer_output_to_tensor(out)
+         d = direction.to(device=h.device, dtype=h.dtype)
+         new_h = h + alpha * d
+         return _rebuild_layer_output(new_h, original)
+
+     handle = layer.register_forward_hook(hook)
+     try:
+         yield
+     finally:
+         handle.remove()
+
+
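Both context managers follow the same register/yield/always-remove pattern, so a hook can never outlive its `with` block even if generation raises. A stdlib sketch of that pattern with a toy registry (all names here are invented for illustration):

```python
import contextlib

class Layer:
    def __init__(self):
        self.hooks = []
    def register(self, fn):
        self.hooks.append(fn)
        return lambda: self.hooks.remove(fn)  # acts as the "handle"

@contextlib.contextmanager
def temporary_hook(layer, fn):
    remove = layer.register(fn)
    try:
        yield
    finally:
        remove()  # runs on normal exit *and* on exceptions

layer = Layer()
with temporary_hook(layer, print):
    print(len(layer.hooks))  # 1 while active
print(len(layer.hooks))      # 0 after the context closes
```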
+ # ---------------------------------------------------------------------------
+ # Pipeline
+ # ---------------------------------------------------------------------------
+ def read_top_features(model, tokenizer, sae: SAE, prompt: str,
+                       layer_idx: int, top_n: int = 10):
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad(), capture_residual(model, layer_idx) as bucket:
+         model(**inputs)
+     h = bucket["h"]                 # (1, T, d_model) on model.device
+     h_last = h[0, -1].unsqueeze(0)  # (1, d_model) — encode() handles device/dtype
+     z = sae.encode(h_last)[0]
+     nonzero = z.nonzero(as_tuple=False).flatten()
+     vals = z[nonzero]
+     order = vals.argsort(descending=True)
+     top = nonzero[order][:top_n]
+     return [(int(f.item()), float(z[f].item())) for f in top]
+
+
+ def generate(model, tokenizer, prompt: str, max_new_tokens: int = 40):
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad():
+         out = model.generate(
+             **inputs,
+             max_new_tokens=max_new_tokens,
+             do_sample=False,  # deterministic for A/B comparison
+             pad_token_id=tokenizer.eos_token_id,
+         )
+     return tokenizer.decode(out[0], skip_special_tokens=True)
+
+
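`read_top_features` only ranks the nonzero entries of the sparse code, which is cheap because a TopK SAE leaves at most `k` of them. A NumPy sketch of that ranking step (the torch version uses the same `nonzero` then `argsort(descending)` idea):

```python
import numpy as np

z = np.array([0.0, 1.2, 0.0, 3.4, 0.0, 0.7])  # sparse TopK feature codes
nonzero = np.flatnonzero(z)                    # indices of firing features: [1, 3, 5]
order = np.argsort(z[nonzero])[::-1]           # sort firing features, descending
top = nonzero[order][:2]                       # keep the top 2

print([(int(f), float(z[f])) for f in top])    # [(3, 3.4), (1, 1.2)]
```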
+ # ---------------------------------------------------------------------------
+ # CLI
+ # ---------------------------------------------------------------------------
+ def parse_topk_from_repo(repo: str) -> int:
+     # e.g. "Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50" -> 50
+     parts = repo.rsplit("L0_", 1)
+     if len(parts) == 2 and parts[1].isdigit():
+         return int(parts[1])
+     return 50
+
+
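The repo-name parsing can be exercised standalone; it falls back to 50 whenever the `L0_<digits>` suffix is absent or malformed (the function is reproduced here verbatim so the sketch is self-contained):

```python
def parse_topk_from_repo(repo: str) -> int:
    # "...-L0_50" -> 50; fall back to 50 when the suffix is absent/malformed.
    parts = repo.rsplit("L0_", 1)
    if len(parts) == 2 and parts[1].isdigit():
        return int(parts[1])
    return 50

print(parse_topk_from_repo("Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50"))  # 50
print(parse_topk_from_repo("some/other-repo"))                          # 50 (fallback)
print(parse_topk_from_repo("org/sae-L0_128"))                           # 128
```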
+ def main():
+     ap = argparse.ArgumentParser()
+     ap.add_argument("--model", default="Qwen/Qwen3-1.7B-Base")
+     ap.add_argument("--sae-repo", default="Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50")
+     ap.add_argument("--layer", type=int, default=14)
+     ap.add_argument("--prompt", default="The capital of France is")
+     ap.add_argument("--max-new-tokens", type=int, default=40)
+     ap.add_argument("--alpha", type=float, default=-10.0,
+                     help="Steering magnitude. Negative suppresses, positive amplifies.")
+     ap.add_argument("--suppress-rank", type=int, default=0,
+                     help="Which top-firing feature (0 = strongest) to steer.")
+     ap.add_argument("--feature-id", type=int, default=None,
+                     help="Override: steer this exact feature instead of a top-rank pick.")
+     ap.add_argument("--topk", type=int, default=None,
+                     help="Override SAE TopK (auto-detected from repo name).")
+     ap.add_argument("--device", default=None,
+                     help="cuda | mps | cpu (auto if omitted)")
+     ap.add_argument("--dtype", default="bfloat16",
+                     choices=["bfloat16", "float16", "float32"])
+     args = ap.parse_args()
+
+     if args.device is None:
+         if torch.cuda.is_available():
+             args.device = "cuda"
+         elif torch.backends.mps.is_available():
+             args.device = "mps"
+         else:
+             args.device = "cpu"
+     dtype = {"bfloat16": torch.bfloat16, "float16": torch.float16,
+              "float32": torch.float32}[args.dtype]
+
+     print(f"[load] model={args.model} device={args.device} dtype={args.dtype}")
+     tokenizer = AutoTokenizer.from_pretrained(args.model)
+     model = AutoModelForCausalLM.from_pretrained(
+         args.model, dtype=dtype, device_map=args.device,
+     )
+     model.eval()
+
+     n_layers = len(model.model.layers)
+     if not (0 <= args.layer < n_layers):
+         raise ValueError(f"--layer {args.layer} out of range; model has {n_layers} layers")
+     hidden = model.config.hidden_size
+     print(f"[load] {type(model).__name__}: {n_layers} layers, hidden={hidden}")
+
+     k = args.topk or parse_topk_from_repo(args.sae_repo)
+     print(f"[load] SAE repo={args.sae_repo} layer={args.layer} K={k}")
+     sae = SAE.from_repo(args.sae_repo, layer=args.layer, k=k,
+                         device=args.device, dtype=dtype)
+     if sae.d_model != hidden:
+         raise ValueError(f"SAE d_model={sae.d_model} != model hidden_size={hidden}; "
+                          f"this SAE doesn't match this model.")
+
+     # 1. Top features for the prompt
+     print(f"\n[features] top firing at layer {args.layer} for prompt: {args.prompt!r}")
+     top = read_top_features(model, tokenizer, sae, args.prompt, args.layer, top_n=10)
+     for rank, (fid, act) in enumerate(top):
+         print(f"  rank {rank:2d}  feature {fid:>6d}  act={act:+.4f}")
+
+     # Pick steering target
+     if args.feature_id is not None:
+         target_id = args.feature_id
+     else:
+         target_id = top[args.suppress_rank][0]
+
+     # 2. Baseline generation
+     print("\n[baseline] generating (no steering)...")
+     baseline = generate(model, tokenizer, args.prompt, args.max_new_tokens)
+     print(f"  >>> {baseline!r}")
+
+     # 3. Steered generation
+     print(f"\n[steer] feature {target_id} at layer {args.layer} with alpha={args.alpha}")
+     direction = sae.steering_vector(target_id)
+     with steer(model, args.layer, direction, args.alpha):
+         steered = generate(model, tokenizer, args.prompt, args.max_new_tokens)
+     print(f"  >>> {steered!r}")
+
+     # 4. Verify the steering actually moved the feature
+     inputs = tokenizer(args.prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad(), capture_residual(model, args.layer) as bucket:
+         model(**inputs)
+     base_act = sae.encode(bucket["h"][0, -1].unsqueeze(0))[0, target_id].item()
+     with torch.no_grad(), steer(model, args.layer, direction, args.alpha), \
+          capture_residual(model, args.layer) as bucket:
+         model(**inputs)
+     steered_act = sae.encode(bucket["h"][0, -1].unsqueeze(0))[0, target_id].item()
+     print(f"\n[verify] feature {target_id} activation: baseline={base_act:+.4f} "
+           f"steered={steered_act:+.4f} delta={steered_act - base_act:+.4f}")
+     if args.alpha > 0 and steered_act <= base_act:
+         print("  WARN: alpha>0 but activation didn't go up — unexpected.")
+     if args.alpha < 0 and steered_act >= base_act:
+         print("  WARN: alpha<0 but activation didn't go down — unexpected.")
+
+
+ if __name__ == "__main__":
+     main()
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch>=2.5.0
+ transformers>=4.51.0
+ accelerate>=1.0
+ huggingface_hub>=0.25
+ safetensors>=0.4
+ numpy>=1.26
+ fastapi>=0.110
+ uvicorn[standard]>=0.30
+ pydantic>=2.5
server.py ADDED
@@ -0,0 +1,817 @@
+ """FastAPI backend for the Qwen-Scope HF Space deployment.
+
+ Locked to Qwen3-1.7B-Base + the W32K-L0_50 SAE so it fits inside a
+ free-tier HF Space (CPU, ~16GB RAM). Layer is still selectable.
+ """
+ from __future__ import annotations
+
+ import gc
+ import json
+ import os
+ import threading
+ from contextlib import asynccontextmanager
+ from pathlib import Path
+
+ import numpy as np
+ import torch
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import FileResponse
+ from pydantic import BaseModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ from qwen_scope_steer import SAE, capture_residual, steer
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+ DTYPE = torch.float32  # bf16 on CPU is slow + flaky on free-tier hardware
+
+ POSITIONS_DIR = Path(os.environ.get(
+     "POSITIONS_DIR",
+     str(Path(__file__).parent / "feature_positions"),
+ ))
+ POSITIONS_DIR.mkdir(exist_ok=True, parents=True)
+
+
+ # ---------------------------------------------------------------------------
+ # Catalog of supported model + SAE pairs.
+ # Verified against the Qwen org HF listing. For Qwen3.6 (no native SAE yet)
+ # we point at the Qwen3.5 SAE that matches dimensions; this is a best-effort
+ # fallback flagged as transferred=True in the response.
+ # ---------------------------------------------------------------------------
+ MODEL_CATALOG = [
+     {
+         "model": "Qwen/Qwen3-1.7B-Base",
+         "sae_repo": "Qwen/SAE-Res-Qwen3-1.7B-Base-W32K-L0_50",
+         "default_layer": 14, "n_layers": 28, "n_features": 32768,
+         "approx_size_gb": 3.4, "k": 50, "transferred": False,
+     },
+ ]
+
+
+ # ---------------------------------------------------------------------------
+ # State and locks
+ # ---------------------------------------------------------------------------
+ state: dict = {}
+ load_lock = threading.Lock()
+
+
+ def _find_decoder_layers(model):
+     """Return (layers_module_list, dotted_path) for any qwen3 / qwen3_5 model.
+
+     Handles:
+       * model.model.layers (standard Qwen3*ForCausalLM)
+       * model.language_model.model.layers (multimodal Qwen3_5ForConditionalGeneration)
+     """
+     for path in (("model", "model", "layers"),
+                  ("model", "layers"),
+                  ("language_model", "model", "layers"),
+                  ("model", "language_model", "model", "layers")):
+         obj = model
+         ok = True
+         for p in path:
+             if not hasattr(obj, p):
+                 ok = False
+                 break
+             obj = getattr(obj, p)
+         if ok and hasattr(obj, "__len__") and len(obj) > 0:
+             return obj, ".".join(path)
+     raise RuntimeError(f"could not locate decoder layers on "
+                        f"{type(model).__name__}")
+
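The probe above is plain attribute-path walking: try each candidate dotted path and keep the first that resolves to a non-empty container. A stdlib sketch of the same idea (the helper name and the toy namespace are invented):

```python
from types import SimpleNamespace

def walk_path(obj, path):
    # Follow a tuple of attribute names; return None if any link is missing.
    for name in path:
        if not hasattr(obj, name):
            return None
        obj = getattr(obj, name)
    return obj

m = SimpleNamespace(model=SimpleNamespace(layers=["L0", "L1"]))
print(walk_path(m, ("model", "layers")))           # ['L0', 'L1']
print(walk_path(m, ("language_model", "layers")))  # None (path doesn't exist)
```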
+
+ # ---------------------------------------------------------------------------
+ # Position computation (cached per SAE).
+ # Uses randomized (truncated) SVD via NumPy power iteration for the
+ # 80K-feature SAE, economy SVD for smaller ones. Good enough for
+ # visualization layout.
+ # ---------------------------------------------------------------------------
+ def _safe_filename(s: str) -> str:
+     return s.replace("/", "__")
+
+
+ def _positions_path(sae_repo: str, layer: int | None = None) -> Path:
+     if layer is None:
+         return POSITIONS_DIR / f"{_safe_filename(sae_repo)}.json"
+     return POSITIONS_DIR / f"{_safe_filename(sae_repo)}__L{layer}.json"
+
+
+ def compute_positions(W_enc: torch.Tensor) -> list[list[float]]:
+     X = W_enc.detach().to("cpu", torch.float32).numpy()  # (n_features, d_model)
+     X = X - X.mean(axis=0, keepdims=True)
+     n, d = X.shape
+     if n * d <= 32768 * 4096:
+         # Economy SVD is fine for the smaller SAEs.
+         _, _, Vt = np.linalg.svd(X, full_matrices=False)
+         pos = X @ Vt[:3].T
+     else:
+         # Randomized SVD for very large SAEs (e.g. 80K x 5120).
+         rng = np.random.default_rng(0)
+         Q = rng.standard_normal((d, 8)).astype(np.float32)
+         for _ in range(3):  # power iterations
+             Q = X.T @ (X @ Q)
+             Q, _ = np.linalg.qr(Q)
+         Y = X @ Q  # (n, 8)
+         _, _, Vt2 = np.linalg.svd(Y, full_matrices=False)
+         pos = Y @ Vt2[:3].T
+     pos = pos / max(abs(pos.min()), abs(pos.max()))
+     return pos.tolist()
+
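In the economy-SVD branch the 3D layout is just the top-3 principal-component scores of the centered encoder matrix, rescaled into [-1, 1]. A tiny self-contained check of that branch on random data (the shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6)).astype(np.float32)
X = X - X.mean(axis=0, keepdims=True)          # center, as compute_positions does

_, _, Vt = np.linalg.svd(X, full_matrices=False)
pos = X @ Vt[:3].T                             # top-3 principal-component scores
pos = pos / max(abs(pos.min()), abs(pos.max()))

print(pos.shape)                 # (50, 3)
print(float(np.abs(pos).max()))  # 1.0 (normalized into [-1, 1])
```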
+
+ def load_or_compute_positions(W_enc: torch.Tensor, sae_repo: str,
+                               layer: int | None = None) -> list[list[float]]:
+     # Try the layer-specific cache first; fall back to the legacy
+     # SAE-repo-only cache so existing files don't go stale.
+     p_layer = _positions_path(sae_repo, layer)
+     p_legacy = _positions_path(sae_repo)
+     for p in (p_layer, p_legacy):
+         if p.exists():
+             try:
+                 return json.loads(p.read_text())["positions"]
+             except Exception:
+                 pass
+     pos = compute_positions(W_enc)
+     p_layer.write_text(json.dumps({"positions": pos}))
+     return pos
+
+
+ # ---------------------------------------------------------------------------
+ # Model + SAE loading (called both at startup and on /load_model)
+ # ---------------------------------------------------------------------------
+ def _free_current_state():
+     """Release the currently loaded model + SAE so a new one can fit."""
+     for k in ("model", "tokenizer", "sae", "layers"):
+         if k in state:
+             del state[k]
+     gc.collect()
+     if hasattr(torch, "mps") and torch.backends.mps.is_available():
+         try:
+             torch.mps.empty_cache()
+         except Exception:
+             pass
+     if torch.cuda.is_available():
+         try:
+             torch.cuda.empty_cache()
+         except Exception:
+             pass
+
+
+ def _catalog_entry(model_name: str, sae_repo: str | None) -> dict:
+     """Find the catalog row that matches model_name (and optionally sae_repo)."""
+     for row in MODEL_CATALOG:
+         if row["model"] == model_name and (sae_repo is None or row["sae_repo"] == sae_repo):
+             return row
+     raise HTTPException(status_code=400,
+                         detail=f"unknown model/sae combination: {model_name} / {sae_repo}")
+
+
+ def load_state(model_name: str, sae_repo: str | None = None,
+                layer: int | None = None, k: int = 50) -> dict:
+     """Replace the loaded model + SAE + layer with the requested one."""
+     entry = _catalog_entry(model_name, sae_repo)
+     sae_repo = entry["sae_repo"]
+     layer = entry["default_layer"] if layer is None else int(layer)
+     k = entry.get("k", k)
+
+     print(f"[load] {model_name} ({entry['approx_size_gb']:.0f}GB) "
+           f"+ SAE {sae_repo} layer {layer} on {DEVICE}")
+
+     _free_current_state()
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     model = AutoModelForCausalLM.from_pretrained(
+         model_name, dtype=DTYPE, device_map=DEVICE,
+     )
+     model.eval()
+
+     layers, layers_path = _find_decoder_layers(model)
+     n_layers = len(layers)
+     if not (0 <= layer < n_layers):
+         layer = min(max(0, layer), n_layers - 1)
+
+     print(f"[load] model loaded: {type(model).__name__}, layers at "
+           f"'{layers_path}', n={n_layers}")
+
+     sae = SAE.from_repo(sae_repo, layer=layer, k=k, device=DEVICE, dtype=DTYPE)
+     print(f"[load] SAE loaded: n_features={sae.n_features}, d_model={sae.d_model}")
+
+     print("[load] computing/loading 3D feature positions")
+     positions = load_or_compute_positions(sae.W_enc, sae_repo, layer)
+     _sae_cache_put(sae_repo, layer, sae)
+
+     state.update(
+         model=model, tokenizer=tokenizer, sae=sae,
+         layers=layers, layers_path=layers_path,
+         positions=positions, n_layers=n_layers,
+         current_model=model_name, current_sae=sae_repo,
+         current_layer=layer, current_k=k,
+         catalog_entry=entry,
+     )
+     print("[load] ready")
+     return state
+
+
+ # ---------------------------------------------------------------------------
+ # Hook helpers — work against state["layers"], not model.model.layers
+ # ---------------------------------------------------------------------------
+ import contextlib
+
+
+ @contextlib.contextmanager
+ def _capture_at(layer_module):
+     bucket = {}
+
+     def hook(_m, _i, out):
+         h = out[0] if isinstance(out, tuple) else out
+         bucket["h"] = h.detach()
+         return out
+
+     handle = layer_module.register_forward_hook(hook)
+     try:
+         yield bucket
+     finally:
+         handle.remove()
+
+
+ @contextlib.contextmanager
+ def _steer_at(layer_module, direction, alpha, *,
+               positions=None, output_only=False, prompt_len=None):
+     """Hook adds α·direction to the layer residual, with position/decode controls.
+
+     positions   : None or "all" → every token; list[int] → only those absolute
+                   token indices (works across prefill + decode).
+     output_only : if True, only steer during decode (skip prefill entirely).
+     prompt_len  : length of the prompt; needed to map the decode-step counter
+                   to an absolute position when positions is a list.
+     """
+     direction = direction.detach()
+     counter = [0]
+     pos_set = set(positions) if isinstance(positions, (list, set)) else None
+
+     def hook(_m, _i, out):
+         h = out[0] if isinstance(out, tuple) else out
+         d = direction.to(device=h.device, dtype=h.dtype)
+         cur = counter[0]
+         counter[0] += 1
+         new_h = h
+         is_prefill = (cur == 0)
+
+         if is_prefill:
+             seq = h.shape[1]
+             if output_only:
+                 pass  # leave the prompt untouched
+             elif pos_set is None:
+                 new_h = h + alpha * d
+             else:
+                 new_h = h.clone()
+                 for p in pos_set:
+                     if 0 <= p < seq:
+                         new_h[:, p, :] = new_h[:, p, :] + alpha * d
+         else:
+             # Decode step — h is [batch, 1, hidden] (one new token)
+             cur_pos = (prompt_len or 0) + cur - 1
+             if pos_set is None or output_only or (cur_pos in pos_set):
+                 new_h = h + alpha * d
+
+         return (new_h, *out[1:]) if isinstance(out, tuple) else new_h
+
+     handle = layer_module.register_forward_hook(hook)
+     try:
+         yield
+     finally:
+         handle.remove()
+
+
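The subtle part of `_steer_at` is mapping the forward-pass counter to token positions: pass 0 is prefill over the whole prompt, and decode pass `c` writes absolute position `prompt_len + c - 1`. A stdlib sketch that enumerates which positions a given spec would steer (the function name is invented; it mirrors the hook's branching, not its tensor math):

```python
def steered_positions(prompt_len, n_new, positions=None, output_only=False):
    """Return the absolute token positions that would receive alpha * direction."""
    steered = []
    for cur in range(n_new + 1):
        if cur == 0:                              # prefill: all prompt slots at once
            if output_only:
                continue
            for p in range(prompt_len):
                if positions is None or p in positions:
                    steered.append(p)
        else:                                     # decode: one new token per pass
            cur_pos = prompt_len + cur - 1
            if positions is None or output_only or cur_pos in positions:
                steered.append(cur_pos)
    return steered

print(steered_positions(3, 2))                    # [0, 1, 2, 3, 4] — everything
print(steered_positions(3, 2, output_only=True))  # [3, 4] — decode only
print(steered_positions(3, 2, positions={1, 3}))  # [1, 3] — explicit slots
```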
+ def _parse_positions(s: str | None):
+     """Parse '3', '3-7', '0,2,5-8', 'all', or None into a position spec.
+     Returns 'all' or a list[int] (or None if the input is empty/None)."""
+     if s is None or not str(s).strip():
+         return None
+     s = str(s).strip().lower()
+     if s == "all":
+         return "all"
+     out: list[int] = []
+     for part in s.split(","):
+         part = part.strip()
+         if not part:
+             continue
+         if "-" in part:
+             try:
+                 lo, hi = part.split("-", 1)
+                 out.extend(range(int(lo), int(hi) + 1))
+             except ValueError:
+                 continue
+         else:
+             try:
+                 out.append(int(part))
+             except ValueError:
+                 continue
+     return sorted(set(out)) if out else None
+
+
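The spec grammar accepts single indices, inclusive ranges, comma lists, and the keyword `all`. A compact standalone variant showing the expected parses (this sketch mirrors the server's behavior but is not the same code; it uses `isdigit` checks instead of try/except):

```python
def parse_positions(s):
    # '', None -> None; 'all' -> 'all'; comma list with ranges -> sorted unique ints.
    if s is None or not str(s).strip():
        return None
    s = str(s).strip().lower()
    if s == "all":
        return "all"
    out = []
    for part in s.split(","):
        part = part.strip()
        if "-" in part:
            lo, _, hi = part.partition("-")
            if lo.strip().isdigit() and hi.strip().isdigit():
                out.extend(range(int(lo), int(hi) + 1))
        elif part.isdigit():
            out.append(int(part))
    return sorted(set(out)) or None

print(parse_positions("0,2,5-8"))  # [0, 2, 5, 6, 7, 8]
print(parse_positions("all"))      # all
print(parse_positions("  "))       # None
```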
+ def _hook_stack(layer_module, sae, specs, prompt_len=None):
+     from contextlib import ExitStack
+     stack = ExitStack()
+     for s in specs:
+         d = sae.steering_vector(s.id)
+         positions = _parse_positions(getattr(s, "positions", None))
+         output_only = bool(getattr(s, "output_only", False))
+         # "all" and None both mean "every position" inside _steer_at — pass None.
+         eff_positions = None if (positions is None or positions == "all") else positions
+         stack.enter_context(_steer_at(
+             layer_module, d, s.alpha,
+             positions=eff_positions,
+             output_only=output_only,
+             prompt_len=prompt_len,
+         ))
+     return stack
+
+
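Stacking several steering hooks at once relies on `contextlib.ExitStack`, which unwinds its contexts in reverse order on exit, so every hook is removed even if one of them raises. A stdlib sketch of that composition (names here are illustrative):

```python
from contextlib import ExitStack, contextmanager

log = []

@contextmanager
def tagged(name):
    log.append(f"enter {name}")
    try:
        yield
    finally:
        log.append(f"exit {name}")

with ExitStack() as stack:
    for name in ("feat_12", "feat_907"):
        stack.enter_context(tagged(name))
print(log)  # ['enter feat_12', 'enter feat_907', 'exit feat_907', 'exit feat_12']
```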
+ # ---------------------------------------------------------------------------
+ # Lifespan + app
+ # ---------------------------------------------------------------------------
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     # Default startup: small model so the demo is interactive immediately.
+     load_state("Qwen/Qwen3-1.7B-Base")
+     yield
+     state.clear()
+
+
+ app = FastAPI(lifespan=lifespan)
+ app.add_middleware(CORSMiddleware, allow_origins=["*"],
+                    allow_methods=["*"], allow_headers=["*"])
+
+
+ # ---------------------------------------------------------------------------
+ # Request models
+ # ---------------------------------------------------------------------------
+ class EncodeRequest(BaseModel):
+     prompt: str
+     top_n: int = 20
+
+
+ class SteerSpec(BaseModel):
+     id: int
+     alpha: float
+     positions: str | None = None  # "all" | "3-7" | "0,2,5" | None (= all)
+     output_only: bool = False     # if True, steer only during decode, not the prompt
+
+
+ class GenerateRequest(BaseModel):
+     prompt: str
+     steering: list[SteerSpec] = []
+     max_new_tokens: int = 40
+     return_probs: bool = False  # if True, return per-token softmax + top-K candidates
+     topk_display: int = 8       # number of candidate tokens to expose per step
+
+
+ class LoadModelRequest(BaseModel):
+     model: str
+     sae_repo: str | None = None
+     layer: int | None = None
+
+
+ class SetLayerRequest(BaseModel):
+     layer: int
+
+
+ # ---------------------------------------------------------------------------
+ # Routes
+ # ---------------------------------------------------------------------------
+ @app.get("/")
+ def index():
+     return FileResponse(Path(__file__).parent / "index.html")
+
+
+ @app.get("/health")
+ def health():
+     sae = state.get("sae")
+     return {
+         "ok": True,
+         "model": state.get("current_model"),
+         "sae": state.get("current_sae"),
+         "layer": state.get("current_layer"),
+         "device": DEVICE,
+         "dtype": str(DTYPE).replace("torch.", ""),
+         "n_features": sae.n_features if sae else None,
+         "n_layers": state.get("n_layers"),
+         "transferred": state.get("catalog_entry", {}).get("transferred", False),
+         "note": state.get("catalog_entry", {}).get("note", ""),
+     }
+
+
+ @app.get("/list_models")
+ def list_models():
+     return {"models": MODEL_CATALOG}
+
+
+ @app.post("/load_model")
+ def load_model(req: LoadModelRequest):
+     with load_lock:
+         try:
+             load_state(req.model, req.sae_repo, req.layer)
+         except HTTPException:
+             raise
+         except Exception as e:
+             raise HTTPException(status_code=500, detail=f"load failed: {e}")
+     sae = state["sae"]
+     return {
+         "ok": True,
+         "model": state["current_model"],
+         "sae": state["current_sae"],
+         "layer": state["current_layer"],
+         "n_features": sae.n_features,
+         "n_layers": state["n_layers"],
+         "transferred": state["catalog_entry"].get("transferred", False),
+         "note": state["catalog_entry"].get("note", ""),
+         "positions": state["positions"],
+     }
+
+
+ # In-memory LRU cache of recently-used SAE checkpoints, keyed by
+ # (sae_repo, layer). Each SAE for the 1.7B model is ~537 MB on disk and
+ # similar in RAM at fp32; for the 27B SAE it's ~3.3 GB. Cap conservatively.
+ _sae_lru: "OrderedDict[tuple[str, int], SAE] | None" = None  # initialized lazily
+ SAE_LRU_MAX = 6
+
+
+ def _sae_cache_get(sae_repo: str, layer: int):
+     global _sae_lru
+     if _sae_lru is None:
+         from collections import OrderedDict
+         _sae_lru = OrderedDict()
+     key = (sae_repo, layer)
+     if key in _sae_lru:
+         _sae_lru.move_to_end(key)
+         return _sae_lru[key]
+     return None
+
+
+ def _sae_cache_put(sae_repo: str, layer: int, sae: SAE):
+     global _sae_lru
+     if _sae_lru is None:
+         from collections import OrderedDict
+         _sae_lru = OrderedDict()
+     key = (sae_repo, layer)
+     _sae_lru[key] = sae
+     _sae_lru.move_to_end(key)
+     while len(_sae_lru) > SAE_LRU_MAX:
+         _sae_lru.popitem(last=False)
+
+
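The LRU logic is the standard `OrderedDict` idiom: `move_to_end` on every hit or insert marks recency, and `popitem(last=False)` evicts the stalest entry. A tiny sketch of the eviction order (keys and cap are illustrative):

```python
from collections import OrderedDict

lru: "OrderedDict[str, int]" = OrderedDict()
MAX = 2

def put(key, value):
    lru[key] = value
    lru.move_to_end(key)         # mark most-recently-used
    while len(lru) > MAX:
        lru.popitem(last=False)  # evict least-recently-used

put("L10", 0)
put("L14", 1)
lru.move_to_end("L10")           # a cache hit refreshes recency
put("L20", 2)                    # evicts "L14", the stalest entry
print(list(lru))  # ['L10', 'L20']
```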
+ @app.post("/set_layer")
+ def set_layer(req: SetLayerRequest):
+     """Hot-swap the active SAE to a different layer of the same SAE repo.
+
+     Keeps the model loaded; just downloads (or fetches from cache) the new
+     layer's SAE checkpoint. Recomputes 3D positions for the new SAE
+     (cached on disk per SAE-repo + layer).
+     """
+     if "model" not in state:
+         raise HTTPException(status_code=400, detail="no model loaded")
+     n_layers = state["n_layers"]
+     layer = int(req.layer)
+     if not (0 <= layer < n_layers):
+         raise HTTPException(status_code=400,
+                             detail=f"layer must be in [0, {n_layers-1}]")
+     sae_repo = state["current_sae"]
+     if layer == state["current_layer"]:
+         return {"ok": True, "unchanged": True,
+                 "layer": layer, "n_features": state["sae"].n_features,
+                 "positions": state["positions"]}
+
+     with load_lock:
+         # 1. SAE itself — try the LRU first
+         cached = _sae_cache_get(sae_repo, layer)
+         if cached is not None:
+             sae = cached
+             print(f"[layer-swap] SAE {sae_repo} layer {layer} from LRU cache")
+         else:
+             print(f"[layer-swap] loading SAE {sae_repo} layer {layer}")
+             k = state["catalog_entry"].get("k", 50)
+             sae = SAE.from_repo(sae_repo, layer=layer, k=k,
+                                 device=DEVICE, dtype=DTYPE)
+             _sae_cache_put(sae_repo, layer, sae)
+
+         # 2. Positions — per-layer cache file on disk
+         positions_key = f"{sae_repo}__L{layer}"
+         p = POSITIONS_DIR / f"{_safe_filename(positions_key)}.json"
+         if p.exists():
+             try:
+                 positions = json.loads(p.read_text())["positions"]
+             except Exception:
+                 positions = compute_positions(sae.W_enc)
+                 p.write_text(json.dumps({"positions": positions}))
+         else:
+             print(f"[layer-swap] computing positions for layer {layer}")
+             positions = compute_positions(sae.W_enc)
+             p.write_text(json.dumps({"positions": positions}))
+
+         state["sae"] = sae
+         state["current_layer"] = layer
+         state["positions"] = positions
+
+     return {
+         "ok": True,
+         "layer": layer,
+         "n_features": sae.n_features,
+         "positions": positions,
+         "from_cache": cached is not None,
+     }
+
+
+ @app.get("/positions")
+ def positions():
+     return {"positions": state["positions"]}
+
+
+ @app.post("/encode")
+ def encode(req: EncodeRequest):
+     model, tokenizer, sae = state["model"], state["tokenizer"], state["sae"]
+     layer_module = state["layers"][state["current_layer"]]
+     inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad(), _capture_at(layer_module) as bucket:
+         model(**inputs)
+     h_last = bucket["h"][0, -1].unsqueeze(0)
+     z = sae.encode(h_last)[0]
+     nz = z.nonzero(as_tuple=False).flatten()
+     vals = z[nz]
+     order = vals.argsort(descending=True)[:req.top_n]
+     top = [{"id": int(nz[i].item()), "act": float(vals[i].item())} for i in order]
+     return {"top": top, "n_features": sae.n_features}
+
+
+ class EncodeFullRequest(BaseModel):
+     prompt: str
+     top_n: int = 16  # number of feature ROWS to return in the heatmap
+
+
+ @app.post("/encode_full")
+ def encode_full(req: EncodeFullRequest):
+     """Return a per-token feature-activation grid for a single prompt.
+
+     Picks the top_n features ranked by *mean activation across all token
+     positions* (matches the official app.py heatmap definition), then returns
+     each feature's activation at every token position. Activations that
+     didn't make TopK at a given position are zero.
+     """
+     model, tokenizer, sae = state["model"], state["tokenizer"], state["sae"]
+     layer_module = state["layers"][state["current_layer"]]
+     inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad(), _capture_at(layer_module) as bucket:
+         model(**inputs)
+     h = bucket["h"][0]  # (seq_len, d_model)
+     z = sae.encode(h)   # (seq_len, n_features) sparse TopK
+     seq_len = z.shape[0]
+     # Token strings for column headers
+     ids = inputs["input_ids"][0].tolist()
+     tokens = [tokenizer.decode([t], skip_special_tokens=False) for t in ids]
+
+     # Rank features by mean activation across all positions
+     mean_per_feat = z.mean(dim=0)
+     top_vals, top_idx = mean_per_feat.topk(min(int(req.top_n), sae.n_features))
+     grid = z[:, top_idx]  # (seq_len, top_n)
+     return {
+         "tokens": tokens,
+         "feature_ids": [int(i.item()) for i in top_idx],
+         "mean_acts": [float(v.item()) for v in top_vals],
+         # grid: outer list = features, inner list = positions (transposed for
+         # natural row-per-feature rendering in the UI)
+         "grid": [[float(grid[p, f].item()) for p in range(seq_len)]
+                  for f in range(grid.shape[1])],
+         "seq_len": seq_len,
+         "n_features": sae.n_features,
+     }
+
+
+ class EncodeBatchRequest(BaseModel):
+     prompts: list[str]
+     top_n: int = 20  # top features per prompt to return individually
+
+
+ @app.post("/encode_batch")
+ def encode_batch(req: EncodeBatchRequest):
+     """Encode N prompts and return per-sample top features + corpus-level stats.
+
+     For each prompt: encode the last-token residual through the SAE and return
+     its top_n firing features. Corpus level: the union of features that fired
+     at all, with per-feature firing rate (fraction of prompts where the
+     feature appeared) and mean activation.
+     """
+     if not req.prompts:
+         return {"per_sample": [], "corpus_features": [], "n_features": state["sae"].n_features}
+     model, tokenizer, sae = state["model"], state["tokenizer"], state["sae"]
+     layer_module = state["layers"][state["current_layer"]]
+
+     per_sample = []
+     union_act_sum: dict[int, float] = {}
+     union_count: dict[int, int] = {}
+
+     for idx, p in enumerate(req.prompts):
+         inputs = tokenizer(p, return_tensors="pt").to(model.device)
+         with torch.no_grad(), _capture_at(layer_module) as bucket:
+             model(**inputs)
+         h_last = bucket["h"][0, -1].unsqueeze(0)
+         z = sae.encode(h_last)[0]
+         nz = z.nonzero(as_tuple=False).flatten()
+         vals = z[nz]
+         order = vals.argsort(descending=True)
+         top_idx = nz[order][:req.top_n]
+         top = [{"id": int(top_idx[i].item()), "act": float(z[top_idx[i]].item())}
+                for i in range(len(top_idx))]
+         per_sample.append({
+             "i": idx,
+             "preview": p[:80] + ("…" if len(p) > 80 else ""),
+             "len": len(p),
+             "top": top,
+             "n_active": int(len(nz)),
+         })
+         # Union stats over ALL nonzero features, not just the top_n
+         for fid, v in zip(nz.tolist(), vals.tolist()):
+             union_count[fid] = union_count.get(fid, 0) + 1
+             union_act_sum[fid] = union_act_sum.get(fid, 0.0) + float(v)
+
+     n = len(req.prompts)
+     corpus = []
+     for fid, cnt in union_count.items():
+         corpus.append({
+             "id": fid,
+             "fire_rate": cnt / n,
+             "mean_act": union_act_sum[fid] / cnt,
+             "n_samples": cnt,
+         })
+     # Sort by fire_rate desc, then mean_act desc
+     corpus.sort(key=lambda r: (-r["fire_rate"], -r["mean_act"]))
+     return {
+         "per_sample": per_sample,
+         "corpus_features": corpus[:200],  # cap to the 200 most frequent
+         "n_features": sae.n_features,
+         "n_samples": n,
+     }
+
+
647
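The corpus-level aggregation above reduces to two running dictionaries. A minimal sketch using toy sparse activations (feature id → value dicts standing in for the SAE's nonzero outputs):

```python
# Toy per-prompt sparse activations: {feature_id: activation}.
samples = [
    {7: 1.0, 12: 0.5},
    {7: 3.0},
    {12: 0.5, 99: 2.0},
]

union_count: dict = {}
union_act_sum: dict = {}
for z in samples:
    for fid, v in z.items():
        union_count[fid] = union_count.get(fid, 0) + 1
        union_act_sum[fid] = union_act_sum.get(fid, 0.0) + v

n = len(samples)
corpus = [
    {"id": fid, "fire_rate": cnt / n, "mean_act": union_act_sum[fid] / cnt}
    for fid, cnt in union_count.items()
]
# Same tie-break as the endpoint: fire_rate desc, then mean_act desc.
corpus.sort(key=lambda r: (-r["fire_rate"], -r["mean_act"]))

print([r["id"] for r in corpus])  # [7, 12, 99]
```

Feature 7 and 12 both fire in 2 of 3 prompts, so the mean-activation tie-break puts 7 (mean 2.0) ahead of 12 (mean 0.5).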
+class CompareBatchRequest(BaseModel):
+    prompts_a: list[str]
+    prompts_b: list[str]
+    top_n: int = 30
+
+
+@app.post("/compare_batch")
+def compare_batch(req: CompareBatchRequest):
+    """Differential feature mining between two prompt sets.
+
+    For each set: encode all prompts, compute per-feature firing rate
+    (fraction of prompts where the feature fires) and mean activation.
+    Rank features by |fire_rate_A − fire_rate_B|.
+    Returns the top features that distinguish A from B.
+    """
+    model, tokenizer, sae = state["model"], state["tokenizer"], state["sae"]
+    layer_module = state["layers"][state["current_layer"]]
+
+    def _encode_set(prompts):
+        n_feats = sae.n_features
+        rate = torch.zeros(n_feats, dtype=torch.float32)
+        acts = torch.zeros(n_feats, dtype=torch.float32)
+        for p in prompts:
+            inputs = tokenizer(p, return_tensors="pt").to(model.device)
+            with torch.no_grad(), _capture_at(layer_module) as bucket:
+                model(**inputs)
+            h_last = bucket["h"][0, -1].unsqueeze(0)
+            z = sae.encode(h_last)[0].detach().to("cpu", torch.float32)
+            rate += (z != 0).float()
+            acts += z
+        if prompts:
+            rate /= len(prompts)
+            acts /= len(prompts)
+        return rate, acts
+
+    rate_a, acts_a = _encode_set(req.prompts_a)
+    rate_b, acts_b = _encode_set(req.prompts_b)
+    diff = (rate_a - rate_b).abs()
+    top_vals, top_idx = diff.topk(min(int(req.top_n), sae.n_features))
+    rows = []
+    for v, fid in zip(top_vals.tolist(), top_idx.tolist()):
+        rows.append({
+            "id": int(fid),
+            "diff": float(v),
+            "rate_a": float(rate_a[fid]),
+            "rate_b": float(rate_b[fid]),
+            "act_a": float(acts_a[fid]),
+            "act_b": float(acts_b[fid]),
+            "winner": "a" if rate_a[fid] >= rate_b[fid] else "b",
+        })
+    return {"top_diff": rows, "n_a": len(req.prompts_a), "n_b": len(req.prompts_b),
+            "n_features": sae.n_features}
+
+
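The ranking step above is just an argsort over absolute rate differences. A list-based sketch with invented firing rates (four toy features, no tensors):

```python
# Toy per-feature firing rates for prompt sets A and B.
rate_a = [0.9, 0.1, 0.5, 0.0]
rate_b = [0.2, 0.1, 0.9, 0.6]

diff = [abs(a - b) for a, b in zip(rate_a, rate_b)]
# Feature ids ranked by |rate_a - rate_b|, descending.
ranked = sorted(range(len(diff)), key=lambda i: -diff[i])
rows = [
    {"id": i, "diff": diff[i],
     "winner": "a" if rate_a[i] >= rate_b[i] else "b"}
    for i in ranked
]

print([r["id"] for r in rows])  # [0, 3, 2, 1]
```

Feature 0 tops the list because it fires in 90% of set A but only 20% of set B; feature 1 fires equally in both, so it ranks last.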
+class SynthRequest(BaseModel):
+    seed_prompts: list[str]
+    steering: list[SteerSpec] = []
+    max_new_tokens: int = 40
+
+
+@app.post("/synth_batch")
+def synth_batch(req: SynthRequest):
+    """Bulk steered synthesis: run steered generation over N seed prompts.
+
+    Useful for the data-centric synthesis workflow: produce K examples
+    that fire feature F at strength α, for downstream training data.
+    """
+    if not req.seed_prompts:
+        return {"results": []}
+    model, tokenizer, sae = state["model"], state["tokenizer"], state["sae"]
+    layer_module = state["layers"][state["current_layer"]]
+    results = []
+    for seed in req.seed_prompts:
+        inputs = tokenizer(seed, return_tensors="pt").to(model.device)
+        with _hook_stack(layer_module, sae, req.steering):
+            with torch.no_grad():
+                out = model.generate(
+                    **inputs,
+                    max_new_tokens=req.max_new_tokens,
+                    do_sample=False,
+                    pad_token_id=tokenizer.eos_token_id,
+                )
+        text = tokenizer.decode(out[0], skip_special_tokens=True)
+        results.append({"seed": seed, "text": text})
+    return {"results": results}
+
+
+def _extract_per_token_probs(gen_out, prompt_len, tokenizer, top_k):
+    """Build per-step probabilities + top-K candidate strings."""
+    new_ids = gen_out.sequences[0][prompt_len:].tolist()
+    if not new_ids:
+        return []
+    rows = []
+    for step, score_t in enumerate(gen_out.scores):
+        probs = torch.softmax(score_t[0].float(), dim=-1)
+        chosen_id = new_ids[step]
+        chosen_prob = float(probs[chosen_id].item())
+        top_vals, top_ids = probs.topk(min(top_k, probs.shape[0]))
+        top_ids_list = top_ids.tolist()
+        # Decode the chosen token once and the top-K candidates in a single
+        # batch_decode call to limit tokenizer overhead
+        decoded_chosen = tokenizer.decode([chosen_id], skip_special_tokens=False)
+        decoded_top = tokenizer.batch_decode([[t] for t in top_ids_list], skip_special_tokens=False)
+        topk = []
+        for tid, tv, ts in zip(top_ids_list, top_vals.tolist(), decoded_top):
+            topk.append({"tok": ts, "prob": float(tv), "is_chosen": tid == chosen_id})
+        # If the chosen token wasn't in the top-K, append it explicitly
+        if chosen_id not in top_ids_list:
+            topk.append({"tok": decoded_chosen, "prob": chosen_prob, "is_chosen": True})
+        rows.append({"tok": decoded_chosen, "prob": chosen_prob, "topk": topk})
+    return rows
+
+
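The chosen-token fallback above (appending the sampled token when it falls outside the top-K) can be demonstrated without a model. The probabilities here are made up, and token ids stand in for decoded strings:

```python
def topk_with_chosen(probs, chosen_id, k):
    # Keep the top-k candidates, but always include the chosen token
    # so the UI can highlight it even when it ranked below k.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    rows = [{"id": i, "prob": probs[i], "is_chosen": i == chosen_id}
            for i in order]
    if chosen_id not in order:
        rows.append({"id": chosen_id, "prob": probs[chosen_id], "is_chosen": True})
    return rows

# Chosen token 3 is outside the top-2, so it gets appended explicitly.
rows = topk_with_chosen([0.5, 0.3, 0.15, 0.05], chosen_id=3, k=2)
print([r["id"] for r in rows])  # [0, 1, 3]
```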
+@app.post("/generate")
+def generate(req: GenerateRequest):
+    model, tokenizer, sae = state["model"], state["tokenizer"], state["sae"]
+    layer_module = state["layers"][state["current_layer"]]
+    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
+    prompt_len = int(inputs["input_ids"].shape[1])
+
+    base_acts = {}
+    if req.steering:
+        with torch.no_grad(), _capture_at(layer_module) as bucket:
+            model(**inputs)
+        z_base = sae.encode(bucket["h"][0, -1].unsqueeze(0))[0]
+        for s in req.steering:
+            base_acts[s.id] = float(z_base[s.id].item())
+
+    gen_kwargs = dict(
+        **inputs,
+        max_new_tokens=req.max_new_tokens,
+        do_sample=False,
+        pad_token_id=tokenizer.eos_token_id,
+    )
+    if req.return_probs:
+        gen_kwargs["return_dict_in_generate"] = True
+        gen_kwargs["output_scores"] = True
+
+    with _hook_stack(layer_module, sae, req.steering, prompt_len=prompt_len):
+        with torch.no_grad():
+            out = model.generate(**gen_kwargs)
+    seq = out.sequences[0] if req.return_probs else out[0]
+    text = tokenizer.decode(seq, skip_special_tokens=True)
+    per_token_probs = (_extract_per_token_probs(out, prompt_len, tokenizer, req.topk_display)
+                       if req.return_probs else None)
+
+ if req.steering:
794
+ with torch.no_grad(), _capture_at(layer_module) as bucket:
795
+ model(**inputs)
796
+ z_steered = sae.encode(bucket["h"][0, -1].unsqueeze(0))[0]
797
+ for s in req.steering:
798
+ steered_acts[s.id] = float(z_steered[s.id].item())
799
+
800
+ verifier = [
801
+ {"id": s.id, "alpha": s.alpha,
802
+ "positions": s.positions,
803
+ "output_only": s.output_only,
804
+ "base": base_acts.get(s.id, 0.0),
805
+ "steered": steered_acts.get(s.id, 0.0)}
806
+ for s in req.steering
807
+ ]
808
+ resp = {"text": text, "verifier": verifier, "prompt_len": prompt_len}
809
+ if per_token_probs is not None:
810
+ resp["tokens"] = per_token_probs
811
+ return resp
812
+
813
+
814
+if __name__ == "__main__":
+    import uvicorn
+
+    port = int(os.environ.get("PORT", 7860))
+    uvicorn.run(app, host="0.0.0.0", port=port)