new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 20

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

Large language models (LLMs) inevitably memorize sensitive, copyrighted, and harmful knowledge from the training corpus; therefore, it is crucial to erase this knowledge from the models. Machine unlearning is a promising solution for efficiently removing specific knowledge by post hoc modifying models. In this paper, we propose a Real-World Knowledge Unlearning benchmark (RWKU) for LLM unlearning. RWKU is designed based on the following three key factors: (1) For the task setting, we consider a more practical and challenging unlearning setting, where neither the forget corpus nor the retain corpus is accessible. (2) For the knowledge source, we choose 200 real-world famous people as the unlearning targets and show that such popular knowledge is widely present in various LLMs. (3) For the evaluation framework, we design the forget set and the retain set to evaluate the model's capabilities across various real-world applications. Regarding the forget set, we provide four four membership inference attack (MIA) methods and nine kinds of adversarial attack probes to rigorously test unlearning efficacy. Regarding the retain set, we assess locality and utility in terms of neighbor perturbation, general ability, reasoning ability, truthfulness, factuality, and fluency. We conduct extensive experiments across two unlearning scenarios, two models and six baseline methods and obtain some meaningful findings. We release our benchmark and code publicly at http://rwku-bench.github.io for future work.

  • 9 authors
·
Jun 16, 2024

RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining

The remarkable development of text-to-image generation models has raised notable security concerns, such as the infringement of portrait rights and the generation of inappropriate content. Concept erasure has been proposed to remove the model's knowledge about protected and inappropriate concepts. Although many methods have tried to balance the efficacy (erasing target concepts) and specificity (retaining irrelevant concepts), they can still generate abundant erasure concepts under the steering of semantically related inputs. In this work, we propose RealEra to address this "concept residue" issue. Specifically, we first introduce the mechanism of neighbor-concept mining, digging out the associated concepts by adding random perturbation into the embedding of erasure concept, thus expanding the erasing range and eliminating the generations even through associated concept inputs. Furthermore, to mitigate the negative impact on the generation of irrelevant concepts caused by the expansion of erasure scope, RealEra preserves the specificity through the beyond-concept regularization. This makes irrelevant concepts maintain their corresponding spatial position, thereby preserving their normal generation performance. We also employ the closed-form solution to optimize weights of U-Net for the cross-attention alignment, as well as the prediction noise alignment with the LoRA module. Extensive experiments on multiple benchmarks demonstrate that RealEra outperforms previous concept erasing methods in terms of superior erasing efficacy, specificity, and generality. More details are available on our project page https://realerasing.github.io/RealEra/ .

  • 8 authors
·
Oct 11, 2024

PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling

Building Virtual Cells that can accurately simulate cellular responses to perturbations is a long-standing goal in systems biology. A fundamental challenge is that high-throughput single-cell sequencing is destructive: the same cell cannot be observed both before and after a perturbation. Thus, perturbation prediction requires mapping unpaired control and perturbed populations. Existing models address this by learning maps between distributions, but typically assume a single fixed response distribution when conditioned on observed cellular context (e.g., cell type) and the perturbation type. In reality, responses vary systematically due to unobservable latent factors such as microenvironmental fluctuations and complex batch effects, forming a manifold of possible distributions for the same observed conditions. To account for this variability, we introduce PerturbDiff, which shifts modeling from individual cells to entire distributions. By embedding distributions as points in a Hilbert space, we define a diffusion-based generative process operating directly over probability distributions. This allows PerturbDiff to capture population-level response shifts across hidden factors. Benchmarks on established datasets show that PerturbDiff achieves state-of-the-art performance in single-cell response prediction and generalizes substantially better to unseen perturbations. See our project page (https://katarinayuan.github.io/PerturbDiff-ProjectPage/), where code and data will be made publicly available (https://github.com/DeepGraphLearning/PerturbDiff).

  • 6 authors
·
Feb 22

Perturbation Analysis of Neural Collapse

Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point. In this phase of training, a "neural collapse" behavior has been observed: the variability of features (outputs of the penultimate layer) of within-class samples decreases and the mean features of different classes approach a certain tight frame structure. Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse. However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features). We explore the model in the small vicinity case via perturbation analysis and establish results that cannot be obtained by the previously studied models. For example, we prove reduction in the within-class variability of the optimized features compared to the predefined input features (via analyzing gradient flow on the "central-path" with minimal assumptions), analyze the minimizers in the near-collapse regime, and provide insights on the effect of regularization hyperparameters on the closeness to collapse. We support our theory with experiments in practical deep learning settings.

  • 3 authors
·
Oct 29, 2022

Measuring Primitive Accumulation: An Information-Theoretic Approach to Capitalist Enclosure in PIK2, Indonesia

Large-scale land enclosure for speculative mega-development constitutes a non-equilibrium spatial process whose velocity, topology, and irreversibility remain poorly quantified. We study the Pantai Indah Kapuk 2 (PIK2) coastal mega-development north of Jakarta, Indonesia, using eight years (2017--2024) of Sentinel-2 land-use/land-cover (LULC) data at 10-meter resolution. The landscape is projected onto a Marxian probability simplex partitioning terrestrial pixels into Commons, Agrarian, and Capital fractions. Fisher-Rao (FR) geodesic distances on this simplex identify a transformation pulse of 0.405~rad/yr during 2019--2020, coinciding with major construction activity. Absorbing Markov chain analysis yields expected absorption times into the built environment of 46.0~years for cropland and 38.1~years for tree cover, with a pooled built-area self-retention rate of 96.4%. Percolation analysis reveals that a giant connected component containing 89--95% of all built pixels persists at occupation probabilities p in [0.096, 0.162], far below the random percolation threshold p_c approx 0.593, indicating planned rather than stochastic spatial growth. The box-counting fractal dimension of the urban boundary increases from d_f = 1.316 to 1.397, consistent with increasingly irregular frontier expansion. These results suggest that information-geometric and statistical-mechanical tools can characterize the kinematic and topological signatures of capitalist spatial accumulation with quantitative precision.

Random Spatial Networks: Small Worlds without Clustering, Traveling Waves, and Hop-and-Spread Disease Dynamics

Random network models play a prominent role in modeling, analyzing and understanding complex phenomena on real-life networks. However, a key property of networks is often neglected: many real-world networks exhibit spatial structure, the tendency of a node to select neighbors with a probability depending on physical distance. Here, we introduce a class of random spatial networks (RSNs) which generalizes many existing random network models but adds spatial structure. In these networks, nodes are placed randomly in space and joined in edges with a probability depending on their distance and their individual expected degrees, in a manner that crucially remains analytically tractable. We use this network class to propose a new generalization of small-world networks, where the average shortest path lengths in the graph are small, as in classical Watts-Strogatz small-world networks, but with close spatial proximity of nodes that are neighbors in the network playing the role of large clustering. Small-world effects are demonstrated on these spatial small-world networks without clustering. We are able to derive partial integro-differential equations governing susceptible-infectious-recovered disease spreading through an RSN, and we demonstrate the existence of traveling wave solutions. If the distance kernel governing edge placement decays slower than exponential, the population-scale dynamics are dominated by long-range hops followed by local spread of traveling waves. This provides a theoretical modeling framework for recent observations of how epidemics like Ebola evolve in modern connected societies, with long-range connections seeding new focal points from which the epidemic locally spreads in a wavelike manner.

  • 4 authors
·
Feb 4, 2017

Differentially Private Multivariate Time Series Forecasting of Aggregated Human Mobility With Deep Learning: Input or Gradient Perturbation?

This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On one hand, we considered gradient perturbation, which uses the differentially private stochastic gradient descent algorithm to guarantee the privacy of each time series sample in the learning stage. On the other hand, we considered input perturbation, which adds differential privacy guarantees in each sample of the series before applying any learning. We compared four state-of-the-art recurrent neural networks: Long Short-Term Memory, Gated Recurrent Unit, and their Bidirectional architectures, i.e., Bidirectional-LSTM and Bidirectional-GRU. Extensive experiments were conducted with a real-world multivariate mobility dataset, which we published openly along with this paper. As shown in the results, differentially private deep learning models trained under gradient or input perturbation achieve nearly the same performance as non-private deep learning models, with loss in performance varying between 0.57% to 2.8%. The contribution of this paper is significant for those involved in urban planning and decision-making, providing a solution to the human mobility multivariate forecast problem through differentially private deep learning models.

  • 5 authors
·
May 1, 2022

FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy

Federated learning is an emerging distributed machine learning framework which jointly trains a global model via a large number of local devices with data privacy protections. Its performance suffers from the non-vanishing biases introduced by the local inconsistent optimal and the rugged client-drifts by the local over-fitting. In this paper, we propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems. Concretely, FedSpeed applies the prox-correction term on the current local updates to efficiently reduce the biases introduced by the prox-term, a necessary regularizer to maintain the strong local consistency. Furthermore, FedSpeed merges the vanilla stochastic gradient with a perturbation computed from an extra gradient ascent step in the neighborhood, thereby alleviating the issue of local over-fitting. Our theoretical analysis indicates that the convergence rate is related to both the communication rounds T and local intervals K with a upper bound small O(1/T) if setting a proper local interval. Moreover, we conduct extensive experiments on the real-world dataset to demonstrate the efficiency of our proposed FedSpeed, which performs significantly faster and achieves the state-of-the-art (SOTA) performance on the general FL experimental settings than several baselines. Our code is available at https://github.com/woodenchild95/FL-Simulator.git.

  • 5 authors
·
Feb 20, 2023

A Novel Bifurcation Method for Observation Perturbation Attacks on Reinforcement Learning Agents: Load Altering Attacks on a Cyber Physical Power System

Components of cyber physical systems, which affect real-world processes, are often exposed to the internet. Replacing conventional control methods with Deep Reinforcement Learning (DRL) in energy systems is an active area of research, as these systems become increasingly complex with the advent of renewable energy sources and the desire to improve their efficiency. Artificial Neural Networks (ANN) are vulnerable to specific perturbations of their inputs or features, called adversarial examples. These perturbations are difficult to detect when properly regularized, but have significant effects on the ANN's output. Because DRL uses ANN to map optimal actions to observations, they are similarly vulnerable to adversarial examples. This work proposes a novel attack technique for continuous control using Group Difference Logits loss with a bifurcation layer. By combining aspects of targeted and untargeted attacks, the attack significantly increases the impact compared to an untargeted attack, with drastically smaller distortions than an optimally targeted attack. We demonstrate the impacts of powerful gradient-based attacks in a realistic smart energy environment, show how the impacts change with different DRL agents and training procedures, and use statistical and time-series analysis to evaluate attacks' stealth. The results show that adversarial attacks can have significant impacts on DRL controllers, and constraining an attack's perturbations makes it difficult to detect. However, certain DRL architectures are far more robust, and robust training methods can further reduce the impact.

  • 3 authors
·
Jul 6, 2024

Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations

Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable and data-free approaches for crafting universal adversarial perturbations. Independent of the underlying task, our objective achieves fooling via corrupting the extracted features at multiple layers. Therefore, the proposed objective is generalizable to craft image-agnostic perturbations across multiple vision tasks such as object recognition, semantic segmentation, and depth estimation. In the practical setting of black-box attack scenario (when the attacker does not have access to the target model and it's training data), we show that our objective outperforms the data dependent objectives to fool the learned models. Further, via exploiting simple priors related to the data distribution, our objective remarkably boosts the fooling ability of the crafted perturbations. Significant fooling rates achieved by our objective emphasize that the current deep learning models are now at an increased risk, since our objective generalizes across multiple tasks without the requirement of training data for crafting the perturbations. To encourage reproducible research, we have released the codes for our proposed algorithm.

  • 3 authors
·
Jan 24, 2018

A Comprehensive Perturbative Formalism for Phase Mixing in Perturbed Disks. II. Phase Spirals in an Inhomogeneous Disk Galaxy with a Non-responsive Dark Matter Halo

We develop a linear perturbative formalism to compute the response of an inhomogeneous stellar disk embedded in a non-responsive dark matter halo to perturbations like bars, spiral arms and satellite galaxy encounters. Without self-gravity to reinforce it, the response of a Fourier mode phase mixes away due to an intrinsic spread in the vertical (Omega_z), radial (Omega_r) and azimuthal (Omega_phi) frequencies, giving rise to local phase-space spirals. Collisional diffusion due to scattering of stars by structures like giant molecular clouds causes super-exponential damping of the phase-spiral amplitude. The z-v_z phase-spiral is 1-armed (2-armed) for vertically anti-symmetric (symmetric) bending (breathing) modes. Only transient perturbations with timescales (tau_{P}) comparable to the vertical oscillation period (tau_z sim 1/Omega_z) trigger z-v_z phase-spirals. Each (n,l,m) mode of the response to impulsive (tau_{P}<tau=1/(nOmega_z+lOmega_r+mOmega_phi)) perturbations is power law (sim tau_{P}/tau) suppressed, but that to adiabatic (tau_{P}>tau) perturbations is exponentially weak (sim left[-left(tau_{mathrm{P}/tauright)^alpharight]}) except resonant (tauto infty) modes. Slower (tau_{P}>tau_z) perturbations, e.g., distant encounters with satellite galaxies, induce stronger bending modes. If the Gaia phase-spiral was triggered by a satellite, Sagittarius is the leading contender as it dominates the Solar neighborhood response of the Milky Way disk to satellite encounters. However, survival against collisional damping necessitates that the impact occurred within sim 0.6-0.7 Gyr ago. We discuss how the detailed galactic potential dictates the phase-spiral shape: phase mixing occurs slower and phase-spirals are less wound in the outer disk and in presence of an ambient halo.

  • 3 authors
·
Feb 28, 2023

Robust Latent Matters: Boosting Image Generation with Sampling Error

Recent image generation schemes typically capture image distribution in a pre-constructed latent space relying on a frozen image tokenizer. Though the performance of tokenizer plays an essential role to the successful generation, its current evaluation metrics (e.g. rFID) fail to precisely assess the tokenizer and correlate its performance to the generation quality (e.g. gFID). In this paper, we comprehensively analyze the reason for the discrepancy of reconstruction and generation qualities in a discrete latent space, and, from which, we propose a novel plug-and-play tokenizer training scheme to facilitate latent space construction. Specifically, a latent perturbation approach is proposed to simulate sampling noises, i.e., the unexpected tokens sampled, from the generative process. With the latent perturbation, we further propose (1) a novel tokenizer evaluation metric, i.e., pFID, which successfully correlates the tokenizer performance to generation quality and (2) a plug-and-play tokenizer training scheme, which significantly enhances the robustness of tokenizer thus boosting the generation quality and convergence speed. Extensive benchmarking are conducted with 11 advanced discrete image tokenizers with 2 autoregressive generation models to validate our approach. The tokenizer trained with our proposed latent perturbation achieve a notable 1.60 gFID with classifier-free guidance (CFG) and 3.45 gFID without CFG with a sim400M generator. Code: https://github.com/lxa9867/ImageFolder.

  • 10 authors
·
Mar 11, 2025