arxiv:2605.15138

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

Published on May 14

Submitted by Pratinav Seth on May 18

Lexsi Labs


Authors:

Abstract

AI-generated summary: Quantization can reverse machine unlearning because per-parameter updates fall below the quantization bin width, which exposes a fundamental sparsity-permanence tradeoff. MANSU is proposed to achieve forgetting that survives compression while preserving retained knowledge.

Standard unlearning evaluations measure behavioral suppression in full precision, immediately after training, despite every deployed language model being quantized first. Recent work has shown that 4-bit post-training quantization can reverse machine unlearning; we show this is not a tuning artefact but a systematic dual failure: gradient-based methods that achieve meaningful forgetting lose it under compression, while methods that survive quantization barely change the model. Both failures trace to the same root cause: across all baselines, per-parameter updates lie 47-828x below the NF4 quantization bin width; updates diffused across billions of parameters cannot clear quantization bin boundaries, a consequence we formalize as a sparsity-permanence tradeoff. We present MANSU (Mechanistic-Aligned Null-Space Unlearning), which resolves both modes by combining causal circuit attribution to isolate the minimal forget-set subgraph, circuit-restricted null-space projection with a diagonal-Fisher retain bound, and a per-parameter magnitude floor guaranteeing quantization survival by construction. We additionally introduce Circuit Attribution Divergence (CAD), a mechanistic verification metric distinguishing structural erasure from behavioral suppression, a distinction existing metrics cannot make. Across multiple model families and hazard benchmarks, MANSU is the first method to jointly satisfy all four properties with margin on each (meaningful forgetting, retain preservation, non-positive PTQ gap, and structural erasure), while gradient-based baselines recover up to +0.05 accuracy under compression.
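The bin-width argument in the abstract can be illustrated with a toy experiment. The sketch below is not MANSU and does not use NF4: it assumes a uniform symmetric 4-bit grid with round-to-nearest, synthetic Gaussian weights, a diffuse update of 1/50 of a bin width on every weight, and a sparse update with a hypothetical magnitude floor of one full bin width on 1% of the weights. All names (`quantize_codes`, `survival_rate`, `step`) are illustrative.

```python
import random

LEVELS = 7  # symmetric 4-bit grid with integer codes -7..7 (illustrative, not NF4)

def quantize_codes(weights, step):
    """Round each weight to the nearest grid code, clipping to the 4-bit range."""
    return [max(-LEVELS, min(LEVELS, round(w / step))) for w in weights]

def survival_rate(base, delta, step):
    """Fraction of updated weights whose quantized code differs after the update."""
    before = quantize_codes(base, step)
    after = quantize_codes([w + d for w, d in zip(base, delta)], step)
    touched = [i for i, d in enumerate(delta) if d != 0.0]
    return sum(before[i] != after[i] for i in touched) / len(touched)

random.seed(0)
n = 10_000
base = [random.gauss(0.0, 0.02) for _ in range(n)]  # synthetic weights
step = max(abs(w) for w in base) / LEVELS  # bin width of the uniform grid

# Diffuse update: every weight moves by 1/50 of a bin width, random sign.
diffuse = [random.choice((-1.0, 1.0)) * step / 50 for _ in range(n)]

# Sparse floored update: 1% of weights move one full bin width toward zero.
chosen = set(random.sample(range(n), n // 100))
floored = [(-step if w >= 0 else step) if i in chosen else 0.0
           for i, w in enumerate(base)]

diffuse_rate = survival_rate(base, diffuse, step)
floored_rate = survival_rate(base, floored, step)
print(f"diffuse update survives quantization on {diffuse_rate:.1%} of weights")
print(f"floored update survives quantization on {floored_rate:.1%} of weights")
```

Under these assumptions, the diffuse update should change the quantized code of only a small share of the weights it touches, roughly in proportion to the ratio of update size to bin width, while the floored update should change essentially every weight it touches. Real NF4 uses non-uniform levels and blockwise scaling, so the numbers here are qualitative only.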


Community

pratinavsetharya

Paper submitter 1 day ago

The paper argues that current machine unlearning methods suffer from a systematic "quantization reversal," where the tiny weight updates used to suppress knowledge are rounded away during standard 4-bit deployment, effectively resurrecting hazardous information. To address this, the authors propose MANSU, a method that uses circuit attribution to isolate specific knowledge subgraphs and enforces a "magnitude floor" on weight updates to ensure they are large enough to clear quantization bin boundaries. While the authors demonstrate that this approach maintains forgetting across 4-bit formats and introduces a new metric for structural erasure, their methodology rests on the potentially reductive assumption that hazardous knowledge is localized into discrete, identifiable circuits rather than being distributed polysemantically across the model. By forcing large, discrete jumps in weight space to ensure "permanence," the method risks inducing structural damage that may not be captured by standard accuracy benchmarks, and it remains unclear if these findings generalize beyond the specific NormalFloat 4-bit scheme to other popular compression techniques like GPTQ or AWQ.



