
OBLITERATUS Pipeline Efficiency Audit

Date: 2026-03-03
Scope: All obliteration methods in abliterate.py (5,076 lines), bayesian_optimizer.py, informed_pipeline.py, and 4 ablation strategies.


Executive Summary

The 6-stage pipeline (SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH) is architecturally sound with good separation of concerns. Memory hygiene between stages is correct. The rank-1 projection math is efficient. Quantization handling is robust.

8 concrete efficiency issues found. Estimated cumulative impact: ~40-60% wall-clock reduction on typical runs (8B model, advanced/surgical methods). Ordered by ROI (ease × impact).


HIGH PRIORITY (Fix This Week)

1. PROBE runs 1,536 prompts with zero batching

Location: abliterate.py:1074-1088
Impact: Largest single wall-clock bottleneck (~77s on 8B model, reducible to ~10s)

The activation collection loop processes each prompt individually with a full forward pass + GC cycle between each one. With 512 harmful + 512 harmless + 512 jailbreak prompts = 1,536 serial forward passes.

The _free_gpu_memory() call at line 1086 is inside the per-prompt loop, adding ~20ms × 1,536 = 30s of pure garbage collection overhead.

```python
# CURRENT (serial)
for i, prompt in enumerate(prompts):
    inputs = tokenizer(prompt, return_tensors="pt", ...)
    model(**inputs)
    del inputs
    self._free_gpu_memory()  # <-- 30s wasted
```

Fix: Batch prompts (batch_size=8-16). Hooks already handle batch dimension correctly via hidden[:, -1, :]. Move _free_gpu_memory() to run every N batches, not every prompt.

Speedup: ~7-8x on PROBE stage.
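A minimal sketch of the batched loop; tokenize and forward are injected placeholders for the real tokenizer and hooked model call, not the abliterate.py API:

```python
import gc

def collect_activations_batched(prompts, tokenize, forward,
                                batch_size=8, gc_every=16):
    """Batched replacement for the serial PROBE loop: one forward pass
    per batch of prompts, GC every gc_every batches instead of every prompt."""
    for n in range(0, len(prompts), batch_size):
        batch = prompts[n:n + batch_size]
        inputs = tokenize(batch)   # padded batch tensors in the real pipeline
        forward(inputs)            # hooks still read hidden[:, -1, :]
        del inputs
        if (n // batch_size + 1) % gc_every == 0:
            gc.collect()           # real pipeline: self._free_gpu_memory()
```

With batch_size=8, the 1,536 prompts become 192 forward passes and a dozen GC cycles rather than 1,536 of each.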


2. VERIFY generates 30 completions sequentially — no batching

Location: abliterate.py:4622-4670
Impact: Second-largest wall-clock cost (~57s on 8B model, reducible to ~15s)

Each of the 30 refusal-test prompts gets an independent model.generate(max_new_tokens=128) call. At ~15ms/token on an 8B model, that's 30 × 128 × 15ms ≈ 57s.

Fix: Batch the generation calls (batch_size=4-8). model.generate() supports batched inputs natively. The tokenizer already handles padding.

Speedup: ~4x on VERIFY stage.
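A sketch under the same placeholder convention (tokenize/generate stand in for the real tokenizer and model.generate call):

```python
def generate_refusal_tests(prompts, tokenize, generate, batch_size=4):
    """Batched replacement for the serial VERIFY loop.

    Note: with a decoder-only model the real tokenizer must use
    padding_side="left" so every sequence ends at the generation boundary;
    right padding silently corrupts batched generation.
    """
    completions = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        completions.extend(generate(tokenize(batch)))  # one call per batch
    return completions
```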


3. SAE training is forced to CPU with no early stopping

Location: abliterate.py:1579-1583
Impact: Moderate — adds ~20-40s per run when SAE features are enabled (surgical, nuclear methods)

SAE training runs 30 fixed epochs per strong layer on CPU. With 15-20 strong layers, that's 450-600 CPU training epochs. No convergence check, no early stopping.

The device="cpu" is overly conservative — the memory-aware cap at line 1570-1578 already validates GPU headroom, and a typical SAE encoder (expansion=2, hidden_dim=4096) is only ~128MB.

Fix:

  1. Add early stopping when reconstruction loss plateaus (< 0.1% improvement over 3 epochs)
  2. Use GPU when free_mb > sae_mem_mb + 1024 (1GB headroom)
  3. Reduce default epochs from 30 to 15 with convergence guard
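The convergence guard can be sketched as a wrapper around a hypothetical step function that runs one SAE training epoch and returns its reconstruction loss (thresholds taken from the fix above):

```python
def train_sae_early_stop(step, max_epochs=15, patience=3, min_rel_improve=1e-3):
    """Run up to max_epochs training epochs, stopping once the best loss
    has failed to improve by min_rel_improve (0.1%) for `patience`
    consecutive epochs. Returns (epochs_run, best_loss)."""
    best = float("inf")
    stale = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        loss = step(epoch)
        epochs_run = epoch + 1
        if loss < best * (1 - min_rel_improve):
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return epochs_run, best
```

On a loss curve that plateaus after a few epochs, this cuts 30 fixed epochs down to well under the new 15-epoch cap.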

MEDIUM PRIORITY (Fix This Sprint)

4. _distill_inner() is a degraded copy of _distill() — drops half the SOTA techniques

Location: abliterate.py:2958-3055 vs 1102-1750
Impact: Quality regression on refinement passes 2+, not pure compute waste

The iterative refinement path calls _distill_inner() which is a simplified ~100-line copy that skips: Wasserstein-optimal extraction, layer-adaptive strength, float layer interpolation, SAE features, EGA, CoT-aware orthogonalization, and RDO refinement.

This means "true iterative refinement" actually produces worse directions on later passes because it drops the analysis-guided enhancements.

Fix: Extract shared SVD/direction logic into _extract_directions(full_features=True/False) and call from both paths. At minimum, keep whitened SVD and jailbreak-contrastive blending in the inner path.
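A rough sketch of the shared core, with whitening simplified to mean-centering and the enhancement hooks stubbed out (all names here are illustrative assumptions, not the real implementation):

```python
import numpy as np

def extract_directions(harmful, harmless, n_directions=1, full_features=True):
    """Hypothetical shared entry point for _distill and _distill_inner.

    Both paths get identical SVD-based direction extraction; only the
    full path would layer the analysis-guided enhancements on top."""
    diff = harmful - harmless                        # per-prompt activation gaps
    diff = diff - diff.mean(axis=0, keepdims=True)   # center (stand-in for whitening)
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    dirs = vt[:n_directions]
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    if full_features:
        pass  # full path only: SAE features, EGA, CoT-aware ortho, RDO refinement
    return dirs
```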


5. Bayesian optimizer clones ALL weight tensors — ~7GB memory overhead

Location: bayesian_optimizer.py:300-341
Impact: Memory pressure on GPU-constrained setups; 50× full-restore cycles

The optimizer saves a complete clone of every weight tensor across all strong layers. For a 7B model with 32 layers, that's ~7GB of clones sitting in memory during all 50 trials.

After each trial, _restore_all() copies all clones back — 50 trials × full-model memcpy.

Fix (easy): Only clone weights in _strong_layers (already partially done, but named_parameters() crawl still catches everything). Drop the seen_data_ptrs set once the loop is tightened.

Fix (better): Store the rank-1 factors of the projection delta Δ = scale * d @ (d^T @ W) per layer (i.e., keep d and coeff = d^T @ W) instead of cloning the full weight. Rollback = W += Δ, rebuilt from the factors on demand. This reduces storage from O(hidden_dim²) to O(hidden_dim) per direction per layer.
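A sketch of the factor-based rollback (function names are illustrative):

```python
import torch

def excise_with_rollback(W, d, scale):
    """Apply the rank-1 projection in place, returning only the O(hidden_dim)
    coeff vector needed (with d and scale) to undo it later."""
    coeff = d @ W                          # d^T W, shape (out_features,)
    W.sub_(scale * torch.outer(d, coeff))  # W' = W - scale * d coeff^T
    return coeff                           # rollback state: (d, coeff, scale)

def rollback(W, d, coeff, scale):
    """Restore W from the stored factors: W += scale * d coeff^T."""
    W.add_(scale * torch.outer(d, coeff))
```

Storage per trial drops from a full weight clone per matrix to two hidden_dim-sized vectors per direction per layer.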


6. Norm computation in _project_out_advanced() traverses the full matrix twice

Location: abliterate.py:3477-3486
Impact: ~4,800 unnecessary full-matrix norm computations per run (8-direction surgical)

When norm_preserve=True, the code computes W.norm() before projection and W.norm() after projection. Each norm traverses the full weight matrix (16M elements for 4096×4096).

With 8 directions × 30 layers × 10 weight matrices = 2,400 projections → 4,800 norm calls → 77 billion unnecessary FLOPs.

Fix: After rank-1 update W' = W - scale * d @ (d^T @ W), the new norm satisfies: ||W'||² = ||W||² - 2·scale·||d^T @ W||² + scale²·||d^T @ W||²·||d||²

Since ||d|| = 1: ||W'||² = ||W||² - scale·(2 - scale)·||coeff||²

This replaces a 16M-element norm with a single coeff.pow(2).sum() call (~4K FLOPs).
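The closed form can be checked numerically; norm_after_excise is a hypothetical helper, not the existing code:

```python
import torch

def norm_after_excise(W, d, scale):
    """Analytical ||W'||_F for W' = W - scale * d (d^T W), assuming unit d,
    via ||W'||^2 = ||W||^2 - scale*(2 - scale)*||coeff||^2."""
    coeff = d @ W
    sq = W.pow(2).sum() - scale * (2 - scale) * coeff.pow(2).sum()
    return sq.clamp_min(0).sqrt()  # clamp guards against tiny fp negatives
```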


LOW PRIORITY (Backlog)

7. Gram-Schmidt appears 3 times as O(n²) nested loops

Location: abliterate.py:1168-1173, 1361-1367, 3038-3044
Impact: Minimal compute but code quality issue

Three separate implementations of the same Gram-Schmidt orthogonalization with nested Python loops. With n_directions=8, it's 28 dot products per call — trivial compute but (a) DRY violation, (b) numerically inferior to torch.linalg.qr().

Fix: Extract to _orthogonalize_subspace(sub: Tensor) -> Tensor using QR decomposition. Single call site, single test, better numerics.


8. Pre-EXCISE baseline KL capture re-forward-passes 100 prompts already seen in PROBE

Location: abliterate.py:2313-2366
Impact: ~700ms wasted (minor)

_capture_baseline_kl_logits() runs 100 harmless prompts through the model to capture pre-EXCISE logits. But PROBE already ran those same prompts and captured hidden states at every layer. The logits could be computed from the cached last-layer hidden states with a single matmul through lm_head (after applying the model's final norm, which most decoder architectures insert before the head).

Fix: After PROBE, compute the baseline logits from the harmless activations cached at the last layer (apply the final norm, then model.lm_head) instead of re-running the prompts. Skip the 100-prompt forward pass entirely.


What's Done Well

| Area | Assessment |
|------|------------|
| Stage-boundary memory cleanup | Correct — _free_gpu_memory() + explicit dict clearing between stages |
| Rank-1 projection math | Efficient — W @ d then d.T * coeff instead of materializing I - dd^T |
| Quantization dequant/requant | Robust — handles bitsandbytes NF4, GPTQ, AWQ; fails loudly on unsupported formats |
| Incremental expert mean | Smart — Welford running mean in _transplant_expert_weights() avoids stacking all expert weights |
| Router stabilization | Defensive — _stabilize_router_weights() after MoE projection prevents CUDA crashes |
| Large model mode | Pragmatic — caps directions, SAE features, refinement passes for 120B+ models |
| Event emission | Clean — _emit() / _on_stage() / _on_log() callbacks for UI integration without coupling |

Method Efficiency Comparison

| Method | PROBE Cost | DISTILL Cost | EXCISE Cost | VERIFY Cost | Primary Bottleneck |
|--------|------------|--------------|-------------|-------------|--------------------|
| basic | 1x (1,024 prompts) | 1x (diff-in-means) | 1x (~10 projections) | 1x | PROBE |
| advanced | 2x (re-probe on pass 2) | 2x (re-distill) | 2x (2 passes) | 1x | PROBE × 2 |
| aggressive | 3x (re-probe on passes 2,3) | 3x (re-distill) | 3x (3 passes, 8 dirs) | 1x | PROBE × 3 |
| surgical | 1.5x (+jailbreak prompts) | 2x (SAE training) | 2x (head surgery + EGA) | 1x | SAE on CPU |
| optimized | 1.5x (+jailbreak) | 1x | 50x (Bayesian trials) | 1x | Bayesian optimizer |
| inverted | 1.5x (+jailbreak) | 1x | 2x (reflection math) | 1x | PROBE |
| nuclear | 1.5x (+jailbreak) | 2x (SAE) | 3x (all techniques) | 1x | SAE + PROBE |
| informed | 1x | 1.5x (analysis modules) | 1x-3x (dynamic) | 1.5x (Ouroboros check) | Analysis modules |

Prioritized Action Plan

  1. Batch PROBE forward passes — immediate 7-8x speedup on largest bottleneck
  2. Batch VERIFY generation — immediate 4x speedup on second bottleneck
  3. Add SAE early stopping + GPU path — 2-3x speedup on SAE-enabled methods
  4. Unify _distill / _distill_inner — quality fix, prevents direction degradation
  5. Optimize Bayesian rollback storage — memory fix for GPU-constrained users
  6. Analytical norm computation — eliminates 77B unnecessary FLOPs
  7. DRY Gram-Schmidt — code quality
  8. Cache KL baseline from PROBE — minor speedup