Papers
arxiv:2601.04278

From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning

Published on Apr 20
Authors:
,
,
,
,
,
,
,
,

Abstract

Machine unlearning for LLMs requires precise forgetting scope representation, which is addressed through a framework that generates high-quality forget sets using internal model guidance rather than external generators.

AI-generated summary

Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully represent the true ``forgetting scope'' learned by the model. We formalize two distinct unlearning granularities, domain-level and instance-level, and propose \BiForget, an automated framework for synthesizing high-quality forget sets. Unlike prior work relying on external generators, \BiForget exploits the target model per se to elicit data that matches its internal knowledge distribution through seed-guided and adversarial prompting. Our experiments across diverse benchmarks show that it achieves a superior balance of relevance, diversity, and efficiency. Quantitatively, in the Harry Potter domain, it improves relevance by {sim}20 and diversity by {sim}0.05 while halving the total data size compared to SOTAs. Ultimately, it facilitates more robust forgetting and better utility preservation, providing a more rigorous foundation for evaluating LLM unlearning.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.04278 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.04278 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.04278 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.