OBLITERATUS Pipeline Efficiency Audit
Date: 2026-03-03
Scope: All obliteration methods in abliterate.py (5,076 lines), bayesian_optimizer.py, informed_pipeline.py, and 4 ablation strategies.
Executive Summary
The 6-stage pipeline (SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH) is architecturally sound with good separation of concerns. Memory hygiene between stages is correct. The rank-1 projection math is efficient. Quantization handling is robust.
8 concrete efficiency issues found. Estimated cumulative impact: ~40-60% wall-clock reduction on typical runs (8B model, advanced/surgical methods). Ordered by ROI (ease × impact).
HIGH PRIORITY (Fix This Week)
1. PROBE runs 1,536 prompts with zero batching
Location: abliterate.py:1074-1088
Impact: Largest single wall-clock bottleneck (~77s on 8B model, reducible to ~10s)
The activation collection loop processes each prompt individually with a full forward pass + GC cycle between each one. With 512 harmful + 512 harmless + 512 jailbreak prompts = 1,536 serial forward passes.
The _free_gpu_memory() call at line 1086 is inside the per-prompt loop, adding ~20ms × 1,536 = 30s of pure garbage collection overhead.
```python
# CURRENT (serial)
for i, prompt in enumerate(prompts):
    inputs = tokenizer(prompt, return_tensors="pt", ...)
    model(**inputs)
    del inputs
    self._free_gpu_memory()  # <-- ~20ms x 1,536 prompts = ~30s wasted
```
Fix: Batch prompts (batch_size=8-16). Hooks already handle batch dimension correctly via hidden[:, -1, :]. Move _free_gpu_memory() to run every N batches, not every prompt.
Speedup: ~7-8x on PROBE stage.
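A minimal sketch of the batched loop; `tokenizer`, `model`, and the GC hook are duck-typed stand-ins for the real objects in abliterate.py so the sketch is self-contained. One caveat the fix should account for: with real batches, prompts must be left-padded so that `hidden[:, -1, :]` still indexes each prompt's final real token rather than padding.

```python
def collect_activations_batched(prompts, tokenizer, model, free_gpu_memory,
                                batch_size=8, gc_every=4):
    """Forward-pass prompts in batches; GC every `gc_every` batches,
    not after every prompt, amortizing the ~20ms GC cost."""
    for b, start in enumerate(range(0, len(prompts), batch_size)):
        batch = prompts[start:start + batch_size]
        inputs = tokenizer(batch)      # real call: padding=True, return_tensors="pt"
        model(**inputs)                # hooks capture activations per batch row
        del inputs
        if (b + 1) % gc_every == 0:    # periodic, not per-prompt, cleanup
            free_gpu_memory()
```

With 1,536 prompts and batch_size=8, this drops 1,536 forward passes and GC cycles to 192 passes and 48 cleanups.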
2. VERIFY generates 30 completions sequentially — no batching
Location: abliterate.py:4622-4670
Impact: Second-largest wall-clock cost (~57s on 8B model, reducible to ~15s)
Each of the 30 refusal-test prompts gets an independent model.generate(max_new_tokens=128) call. At ~15ms/token on an 8B model, that's 30 × 128 × 15ms ≈ 57s.
Fix: Batch the generation calls (batch_size=4-8). model.generate() supports batched inputs natively. The tokenizer already handles padding.
Speedup: ~4x on VERIFY stage.
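A sketch of the batched generation path; `tokenizer` and `model` again stand in for the real Hugging Face objects (duck-typed here so the sketch runs standalone). With real transformers objects, set `tokenizer.padding_side = "left"` before batching, since decoder-only models must generate from the true end of each prompt.

```python
def generate_batched(prompts, tokenizer, model, batch_size=8, max_new_tokens=128):
    """Replace 30 serial generate() calls with ceil(30 / batch_size) batched ones.
    Real call sites would pass padding=True, return_tensors="pt" to the tokenizer."""
    completions = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        enc = tokenizer(batch)                          # padded batch encoding
        out = model.generate(**enc, max_new_tokens=max_new_tokens)
        completions.extend(tokenizer.batch_decode(out)) # one decode per batch
    return completions
```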
3. SAE training is forced to CPU with no early stopping
Location: abliterate.py:1579-1583
Impact: Moderate — adds ~20-40s per run when SAE features are enabled (surgical, nuclear methods)
SAE training runs 30 fixed epochs per strong layer on CPU. With 15-20 strong layers, that's 450-600 CPU training epochs. No convergence check, no early stopping.
The device="cpu" is overly conservative — the memory-aware cap at line 1570-1578 already validates GPU headroom, and a typical SAE encoder (expansion=2, hidden_dim=4096) is only ~128MB.
Fix:
- Add early stopping when reconstruction loss plateaus (< 0.1% improvement over 3 epochs)
- Use GPU when free_mb > sae_mem_mb + 1024 (1GB headroom)
- Reduce default epochs from 30 to 15 with convergence guard
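The two guards above can be sketched as small helpers; thresholds mirror the proposed fix (0.1% relative improvement over 3 epochs, 1GB GPU headroom) and are illustrative, not the codebase's existing API.

```python
def should_stop(loss_history, patience=3, min_rel_improve=1e-3):
    """True once the best loss of the last `patience` epochs improves on the
    prior best by less than `min_rel_improve` (relative)."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    best_recent = min(loss_history[-patience:])
    return (best_before - best_recent) / best_before < min_rel_improve

def pick_device(free_mb, sae_mem_mb, headroom_mb=1024):
    """GPU only when the SAE fits with 1GB of headroom to spare."""
    return "cuda" if free_mb > sae_mem_mb + headroom_mb else "cpu"
```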
MEDIUM PRIORITY (Fix This Sprint)
4. _distill_inner() is a degraded copy of _distill() — drops half the SOTA techniques
Location: abliterate.py:2958-3055 vs 1102-1750
Impact: Quality regression on refinement passes 2+, not pure compute waste
The iterative refinement path calls _distill_inner() which is a simplified ~100-line copy that skips: Wasserstein-optimal extraction, layer-adaptive strength, float layer interpolation, SAE features, EGA, CoT-aware orthogonalization, and RDO refinement.
This means "true iterative refinement" actually produces worse directions on later passes because it drops the analysis-guided enhancements.
Fix: Extract shared SVD/direction logic into _extract_directions(full_features=True/False) and call from both paths. At minimum, keep whitened SVD and jailbreak-contrastive blending in the inner path.
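The refactor could take roughly this shape (hypothetical sketch; step functions are injected here for illustration, while the real ones live in abliterate.py): one shared pipeline, with `_distill()` passing `full_features=True` and `_distill_inner()` passing `False` while still keeping the core steps (whitened SVD, jailbreak-contrastive blending) instead of dropping them.

```python
def extract_directions(acts, core_steps, full_steps, full_features=True):
    """Shared direction-extraction pipeline. core_steps always run;
    full_steps (SAE features, EGA, RDO, CoT-aware orthogonalization)
    run only on the full path."""
    for step in core_steps:
        acts = step(acts)
    if full_features:
        for step in full_steps:
            acts = step(acts)
    return acts
```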
5. Bayesian optimizer clones ALL weight tensors — ~7GB memory overhead
Location: bayesian_optimizer.py:300-341
Impact: Memory pressure on GPU-constrained setups; 50× full-restore cycles
The optimizer saves a complete clone of every weight tensor across all strong layers. For a 7B model with 32 layers, that's ~7GB of clones sitting in memory during all 50 trials.
After each trial, _restore_all() copies all clones back — 50 trials × full-model memcpy.
Fix (easy): Only clone weights in _strong_layers (already partially done, but named_parameters() crawl still catches everything). Drop the seen_data_ptrs set once the loop is tightened.
Fix (better): Store the projection delta Δ = scale * d @ (d^T @ W) per layer instead of cloning the full weight. Rollback = W += Δ. This reduces storage from O(hidden_dim²) to O(hidden_dim) per direction per layer.
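The delta-based rollback can be sketched as follows (numpy stands in for torch). Because the EXCISE update is rank-1, storing only the coeff vector per (layer, direction) is enough to invert it exactly, replacing the O(hidden_dim²) clone.

```python
import numpy as np

def excise(W, d, scale):
    """Apply W' = W - scale * outer(d, d^T @ W); return W' plus the coeff
    vector needed for cheap rollback. Assumes d is unit-norm."""
    coeff = d @ W                                  # O(hidden_dim) to store
    return W - scale * np.outer(d, coeff), coeff

def rollback(W_new, d, coeff, scale):
    """Exact inverse of excise(): W = W' + scale * outer(d, coeff)."""
    return W_new + scale * np.outer(d, coeff)
```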
6. Norm computation in _project_out_advanced() traverses the full matrix twice
Location: abliterate.py:3477-3486
Impact: ~4,800 unnecessary full-matrix norm computations per run (8-direction surgical)
When norm_preserve=True, the code computes W.norm() before projection and W.norm() after projection. Each norm traverses the full weight matrix (16M elements for 4096×4096).
With 8 directions × 30 layers × 10 weight matrices = 2,400 projections → 4,800 norm calls → 77 billion unnecessary FLOPs.
Fix: After rank-1 update W' = W - scale * d @ (d^T @ W), the new norm satisfies:
||W'||² = ||W||² - 2·scale·||d^T @ W||² + scale²·||d^T @ W||²·||d||²
Since ||d|| = 1: ||W'||² = ||W||² - scale·(2 - scale)·||coeff||²
This replaces a 16M-element norm with a single coeff.pow(2).sum() call (~4K FLOPs).
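The identity is easy to check numerically (numpy stands in for torch; a 64×64 matrix stands in for the 4096×4096 case):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
d = rng.standard_normal(64)
d /= np.linalg.norm(d)                    # the identity assumes ||d|| = 1
scale = 0.7

coeff = d @ W                             # d^T @ W
W_new = W - scale * np.outer(d, coeff)    # rank-1 EXCISE update

direct = np.linalg.norm(W_new) ** 2       # full-matrix traversal
analytic = np.linalg.norm(W) ** 2 - scale * (2 - scale) * (coeff ** 2).sum()
assert np.isclose(direct, analytic)       # agrees to float precision
```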
LOW PRIORITY (Backlog)
7. Gram-Schmidt appears 3 times as O(n²) nested loops
Location: abliterate.py:1168-1173, 1361-1367, 3038-3044
Impact: Minimal compute but code quality issue
Three separate implementations of the same Gram-Schmidt orthogonalization with nested Python loops. With n_directions=8, it's 28 dot products per call — trivial compute but (a) DRY violation, (b) numerically inferior to torch.linalg.qr().
Fix: Extract to _orthogonalize_subspace(sub: Tensor) -> Tensor using QR decomposition. Single call site, single test, better numerics.
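A minimal numpy sketch of the QR-based replacement (torch.linalg.qr follows the same shape convention, so the torch version is a near-verbatim port):

```python
import numpy as np

def orthogonalize_subspace(sub):
    """Orthonormalize the rows of `sub` (n_directions, hidden_dim) via
    reduced QR, replacing the hand-rolled Gram-Schmidt loops."""
    q, _ = np.linalg.qr(sub.T)   # reduced QR orthonormalizes the columns
    return q.T
```

Unlike classical Gram-Schmidt, QR (via Householder reflections) does not accumulate loss of orthogonality when directions are nearly parallel.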
8. Pre-EXCISE baseline KL capture re-forward-passes 100 prompts already seen in PROBE
Location: abliterate.py:2313-2366
Impact: ~700ms wasted (minor)
_capture_baseline_kl_logits() runs 100 harmless prompts through the model to capture pre-EXCISE logits. But PROBE already ran those same prompts and captured hidden states at every layer. The logits could be computed as lm_head(last_hidden_state) — a single matmul.
Fix: After PROBE, compute the baseline logits with a single lm_head matmul over the final-layer activations already cached for those harmless prompts, and skip the 100-prompt forward pass entirely. (This assumes PROBE retains the per-prompt last-layer states rather than only per-layer means, since KL needs per-prompt logits.)
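The single-matmul replacement looks like this (numpy stands in for torch; the sketch assumes PROBE caches per-prompt final-layer hidden states and that any final LayerNorm has already been applied before lm_head, as in most decoder stacks):

```python
import numpy as np

def baseline_logits_from_cache(last_hidden, lm_head_weight):
    """(n_prompts, hidden) @ (vocab, hidden).T -> (n_prompts, vocab).
    One matmul instead of 100 full forward passes."""
    return last_hidden @ lm_head_weight.T
```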
What's Done Well
| Area | Assessment |
|---|---|
| Stage-boundary memory cleanup | Correct — _free_gpu_memory() + explicit dict clearing between stages |
| Rank-1 projection math | Efficient — W @ d then d.T * coeff instead of materializing I - dd^T |
| Quantization dequant/requant | Robust — handles bitsandbytes NF4, GPTQ, AWQ; fails loudly on unsupported formats |
| Incremental expert mean | Smart — Welford running mean in _transplant_expert_weights() avoids stacking all expert weights |
| Router stabilization | Defensive — _stabilize_router_weights() after MoE projection prevents CUDA crashes |
| Large model mode | Pragmatic — caps directions, SAE features, refinement passes for 120B+ models |
| Event emission | Clean — _emit() / _on_stage() / _on_log() callbacks for UI integration without coupling |
Method Efficiency Comparison
| Method | PROBE Cost | DISTILL Cost | EXCISE Cost | VERIFY Cost | Primary Bottleneck |
|---|---|---|---|---|---|
| basic | 1x (1,024 prompts) | 1x (diff-in-means) | 1x (~10 projections) | 1x | PROBE |
| advanced | 2x (re-probe on pass 2) | 2x (re-distill) | 2x (2 passes) | 1x | PROBE × 2 |
| aggressive | 3x (re-probe on passes 2,3) | 3x (re-distill) | 3x (3 passes, 8 dirs) | 1x | PROBE × 3 |
| surgical | 1.5x (+jailbreak prompts) | 2x (SAE training) | 2x (head surgery + EGA) | 1x | SAE on CPU |
| optimized | 1.5x (+jailbreak) | 1x | 50x (Bayesian trials) | 1x | Bayesian optimizer |
| inverted | 1.5x (+jailbreak) | 1x | 2x (reflection math) | 1x | PROBE |
| nuclear | 1.5x (+jailbreak) | 2x (SAE) | 3x (all techniques) | 1x | SAE + PROBE |
| informed | 1x | 1.5x (analysis modules) | 1x-3x (dynamic) | 1.5x (Ouroboros check) | Analysis modules |
Prioritized Action Plan
- Batch PROBE forward passes — immediate 7-8x speedup on largest bottleneck
- Batch VERIFY generation — immediate 4x speedup on second bottleneck
- Add SAE early stopping + GPU path — 2-3x speedup on SAE-enabled methods
- Unify _distill/_distill_inner — quality fix, prevents direction degradation
- Optimize Bayesian rollback storage — memory fix for GPU-constrained users
- Analytical norm computation — eliminates 77B unnecessary FLOPs
- DRY Gram-Schmidt — code quality
- Cache KL baseline from PROBE — minor speedup