arxiv:2603.14185

Relationship-Aware Safety Unlearning for Multimodal LLMs

Published on Mar 25

Abstract

AI-generated summary: A relationship-aware safety unlearning framework targets unsafe object-relation-object tuples with parameter-efficient edits, suppressing harmful combinations while preserving benign uses of the same objects and relations.

Generative multimodal models can exhibit safety failures that are inherently relational: two benign concepts can become unsafe when linked by a specific action or relation (e.g., child-drinking-wine). Existing unlearning and concept-erasure approaches often target isolated concepts or image-text pairs, which can cause collateral damage to benign uses of the same objects and relations. We propose relationship-aware safety unlearning: a framework that explicitly represents unsafe object-relation-object (O-R-O) tuples and applies targeted parameter-efficient edits (LoRA) to suppress unsafe tuples while preserving object marginals and safe neighboring relations. We include CLIP-based experiments and robustness evaluation under paraphrase, contextual, and out-of-distribution image attacks.
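To make the O-R-O framing concrete, the following is a minimal sketch of how unsafe tuples might be split into a forget set and a retain set. It is an illustration only, not the paper's implementation: the names `ORO` and `build_sets`, the example relations and objects, and the assumption that every generated neighbor is safe are all hypothetical. In practice, neighbor safety would need its own labeling step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ORO:
    """Hypothetical container for one object-relation-object tuple."""
    subject: str
    relation: str
    obj: str

    def prompt(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"


def build_sets(unsafe, relations, objects):
    """Return (forget, retain) prompt lists for a collection of unsafe tuples.

    Assumption: any neighbor not listed in `unsafe` is treated as safe.
    """
    unsafe = set(unsafe)
    forget = sorted(t.prompt() for t in unsafe)
    retain = set()
    for t in unsafe:
        # Object marginals: each object on its own should stay usable.
        retain.add(t.subject)
        retain.add(t.obj)
        # Neighboring relations: same subject and object, different relation.
        for r in relations:
            cand = ORO(t.subject, r, t.obj)
            if cand not in unsafe:
                retain.add(cand.prompt())
        # Neighboring objects: same subject and relation, different object.
        for o in objects:
            cand = ORO(t.subject, t.relation, o)
            if cand not in unsafe:
                retain.add(cand.prompt())
    return forget, sorted(retain)


unsafe = [ORO("child", "drinking", "wine")]
forget, retain = build_sets(
    unsafe,
    relations=["drinking", "looking at"],
    objects=["wine", "water", "juice"],
)
```

Under these assumptions, the forget set would drive the suppression objective for the LoRA parameters, while the retain set would anchor a preservation term so that prompts such as "child drinking water" or "wine" alone are left unchanged.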
