arxiv:2510.21322

Leverage Unlearning to Sanitize LLMs

Published on Oct 24, 2025
Authors:

AI-generated summary

SANI is an unlearning method that sanitizes language models by erasing memorized sensitive information through neuron resets and then repairing the model through targeted fine-tuning, drastically reducing regurgitation of confidential data while preserving model utility.

Abstract

Pre-trained large language models (LLMs) are becoming useful for a wide variety of tasks. To improve their performance on certain tasks, it is necessary to fine-tune them on specific data corpora (e.g., medical reports, business data). These specialized corpora may contain sensitive data (e.g., personal or confidential information) that the model memorizes and is likely to regurgitate during subsequent use. This memorization of sensitive information poses a significant privacy or confidentiality issue. To remove this memorization and sanitize the model without requiring costly additional fine-tuning on a secured data corpus, we propose SANI, an unlearning approach for sanitizing language models. It relies on two phases, erasure and repair, which 1) reset certain neurons in the last layers of the model to disrupt the memorization of fine-grained information, and then 2) fine-tune the model while avoiding the re-memorization of sensitive information. We comprehensively evaluate SANI in two settings: sanitizing a model fine-tuned and specialized on medical data by removing direct and indirect identifiers from its memorization, and sanitizing a standard pre-trained model by removing specific terms defined as confidential. Results show that with only a few additional epochs of unlearning, the model is sanitized and the number of regurgitations is drastically reduced. This approach can be particularly useful for hospitals and other organizations that have already spent significant resources training models on large datasets and wish to sanitize them before sharing.
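
To make the erase-and-repair idea concrete, here is a minimal sketch of the erasure step, assuming a Hugging Face GPT-2 checkpoint and PyTorch. The neuron-selection rule (random here), the number of layers touched, the reset fraction, and the re-initialization scale are all placeholders, since this page does not describe SANI's actual criteria; the repair fine-tuning is only indicated as a comment.

import torch
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; the paper targets fine-tuned domain models (e.g., medical).
model = AutoModelForCausalLM.from_pretrained("gpt2")

num_last_layers = 2    # assumption: only the final layers are modified
reset_fraction = 0.05  # assumption: fraction of MLP neurons reset per layer

# Erase phase: re-initialize a small fraction of MLP neurons in the last layers.
with torch.no_grad():
    for block in model.transformer.h[-num_last_layers:]:
        n_neurons = block.mlp.c_fc.weight.shape[1]  # MLP hidden size (3072 for GPT-2)
        idx = torch.randperm(n_neurons)[: int(reset_fraction * n_neurons)]
        # Re-initialize the incoming weights/bias and zero the outgoing weights
        # of the selected neurons, disrupting whatever they had memorized.
        block.mlp.c_fc.weight[:, idx] = 0.02 * torch.randn_like(block.mlp.c_fc.weight[:, idx])
        block.mlp.c_fc.bias[idx] = 0.0
        block.mlp.c_proj.weight[idx, :] = 0.0

# Repair phase (not shown): fine-tune the modified model for a few epochs on a
# corpus with the sensitive information removed or masked, restoring utility
# without re-memorizing the erased data.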


Get this paper in your agent:

hf papers read 2510.21322
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
