XAI GAN on CelebA
Overview
A DCGAN trained on CelebA faces at $64 \times 64$ resolution, with editable latent directions that give semantic control over attributes such as smiling, eyeglasses, age, gender, and blond hair. We also evaluate whether the model overfits the training data and is therefore susceptible to membership inference attacks (MIA).
Dataset
We use the CelebA dataset, which contains 162,770 training images and 19,867 validation images. Each face is center-cropped, resized to $64 \times 64$, and normalized to the range $[-1, 1]$, which matches the generator’s tanh output.
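The normalization step can be sketched as follows; `to_model_range` is a hypothetical helper name, and a real pipeline would typically apply it via framework transforms:

```python
import numpy as np

def to_model_range(img_uint8):
    """Map uint8 pixels in [0, 255] to float32 in [-1, 1],
    matching the generator's tanh output range."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

img = np.array([0, 128, 255], dtype=np.uint8)
scaled = to_model_range(img)  # endpoints map to -1.0 and 1.0
```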
CelebA also provides 40 binary facial attributes; of these, only the following are used:
- Smiling
- Eyeglasses
- Young
- Male
- Blond_Hair
Model
The generative model is a standard DCGAN trained from scratch. The generator maps a 128-dimensional latent vector into a $64 \times 64$ RGB face, and the discriminator learns to separate real CelebA images from synthetic ones.
The model was trained with Adam and BCE-with-logits loss. The final generator has about 3.8M parameters, and the discriminator has about 2.8M parameters.
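BCE-with-logits fuses the sigmoid and the cross-entropy into one numerically stable expression. A minimal NumPy sketch of that loss (the actual training presumably uses a framework implementation; the toy logits below are illustrative):

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits:
    max(x, 0) - x*y + log(1 + exp(-|x|)), avoiding exp overflow."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

# Discriminator targets: 1 for real images, 0 for generated ones.
real_loss = bce_with_logits(2.0, 1.0)  # confident and correct -> small loss
fake_loss = bce_with_logits(2.0, 0.0)  # confident and wrong   -> large loss
```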
Generator
The generator uses a stack of transposed convolutions with batch normalization and ReLU activations. The output layer uses tanh, which keeps the image range compatible with the dataset normalization.
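Assuming the standard DCGAN configuration (an initial stride-1 projection followed by kernel-4, stride-2, padding-1 upsampling blocks; these hyperparameters are an assumption, not taken from the source), the upsampling arithmetic from a $1 \times 1$ latent "image" to $64 \times 64$ works out as:

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    """Output spatial size of a transposed convolution
    (no output_padding, no dilation)."""
    return (size - 1) * stride - 2 * pad + kernel

size = 1
size = tconv_out(size, kernel=4, stride=1, pad=0)  # 1 -> 4
for _ in range(4):                                 # 4 -> 8 -> 16 -> 32 -> 64
    size = tconv_out(size)
print(size)  # 64
```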
This baseline is intentionally lightweight, for ease of training and evaluation; the trade-off is modest sample quality.
Discriminator
The discriminator is a convolutional classifier with downsampling blocks and LeakyReLU activations.
Its role is to distinguish real images from generated ones, and it is later reused as a score function for membership inference. Its real-vs-fake ROC-AUC is about 0.85.
Attribute classifier
The latent-editing stage depends on a separate classifier that learns to recognize facial attributes directly from images. The model is trained on the selected CelebA labels and outputs one logit per attribute. The classifier reaches strong validation accuracy on all selected attributes:
- Smiling: 0.9208
- Eyeglasses: 0.9836
- Young: 0.8534
- Male: 0.9739
- Blond_Hair: 0.9456
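Since the classifier emits one logit per attribute, predictions are independent sigmoid outputs thresholded at 0.5 rather than a softmax over classes. A minimal sketch with illustrative helper names and toy logits:

```python
import numpy as np

ATTRS = ["Smiling", "Eyeglasses", "Young", "Male", "Blond_Hair"]

def attribute_probs(logits):
    """One sigmoid per attribute: independent binary predictions."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))

def per_attribute_accuracy(logits, labels):
    preds = (attribute_probs(logits) >= 0.5).astype(int)
    return (preds == labels).mean(axis=0)  # one accuracy per attribute

# Toy batch: 2 images x 5 attributes (hypothetical logits and labels).
logits = np.array([[ 2.0, -1.0, 0.5, -3.0,  1.0],
                   [-0.5,  4.0, 1.5,  2.0, -2.0]])
labels = np.array([[1, 0, 1, 0, 1],
                   [0, 1, 1, 1, 0]])
acc = dict(zip(ATTRS, per_attribute_accuracy(logits, labels)))
```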
Latent directions
Once the generator and classifier are trained, random latent vectors are sampled and scored by the attribute classifier. This creates a paired set of latent codes and attribute scores.
For each selected attribute, a least-squares fit estimates a direction in latent space.
After normalization, each vector becomes an editable axis.
Latent editing
To visualize an edit, a single latent vector $z$ is sampled first and then shifted along one learned direction with different step sizes:

$$z' = z + \alpha \, d$$

where $d$ is the learned direction and $\alpha$ controls the edit strength.
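A sketch of the sweep itself; each edited latent would then be decoded by the generator into one face of the strip:

```python
import numpy as np

def edit_sweep(z, d, alphas):
    """Shift latent vector z along unit direction d by each step in alphas.
    Returns one edited latent per alpha: z' = z + alpha * d."""
    z = np.asarray(z, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    return np.stack([z + a * d for a in alphas])

z = np.zeros(128)
d = np.eye(128)[0]  # unit direction along axis 0 (toy example)
edits = edit_sweep(z, d, [-3, -1, 0, 1, 3])
```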
Membership inference attack
We performed a score-based membership inference attack based on discriminator logits. If the discriminator assigns systematically higher scores to training examples than to unseen examples, that can indicate memorization.
The attack is evaluated with a calibration split and a held-out test split. The final results are close to random guessing:
- Calibration AUC: 0.5218
- Test AUC: 0.5303
- Test accuracy: 0.5205
- Test balanced accuracy: 0.5205
The score distributions for members and non-members overlap almost completely.
This means the discriminator does not provide a strong membership signal in this setup, and no clear evidence of memorization is detected by this attack.
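For reference, the reported AUC reduces to the probability that a randomly chosen member outscores a randomly chosen non-member, so values near 0.5 mean the scores carry essentially no membership signal. A minimal rank-based sketch:

```python
import numpy as np

def roc_auc(member_scores, nonmember_scores):
    """Rank-based AUC: probability that a random member outscores
    a random non-member, counting ties as one half."""
    m = np.asarray(member_scores, dtype=np.float64)
    n = np.asarray(nonmember_scores, dtype=np.float64)
    wins = (m[:, None] > n[None, :]).sum() + 0.5 * (m[:, None] == n[None, :]).sum()
    return wins / (len(m) * len(n))

# Toy discriminator scores (illustrative values only).
auc = roc_auc([0.9, 0.4, 0.6], [0.5, 0.8, 0.3])
```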