XAI GAN on CelebA
Overview
A DCGAN trained on CelebA faces at $64 \times 64$ resolution, with editable latent directions that give semantic control over attributes such as smiling, eyeglasses, age, gender, and blond hair. We also evaluate whether the model overfits the training data and is therefore susceptible to membership inference attacks (MIA).
Dataset
We use the CelebA dataset, which contains 162,770 training images and 19,867 validation images. Each face is center-cropped, resized to $64 \times 64$, and normalized to the range $[-1, 1]$, which matches the generator’s tanh output.
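The normalization step can be sketched as follows; `to_model_range` is a hypothetical helper name, and a real pipeline would typically apply it via framework transforms:

```python
import numpy as np

def to_model_range(img_uint8):
    """Map uint8 pixels in [0, 255] to float32 in [-1, 1],
    matching the generator's tanh output range."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

img = np.array([0, 128, 255], dtype=np.uint8)
scaled = to_model_range(img)  # endpoints map to -1.0 and 1.0
```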
CelebA also provides 40 binary facial attributes; of these, only the following are used:
- Smiling
- Eyeglasses
- Young
- Male
- Blond_Hair
Model
The generative model is a standard DCGAN trained from scratch. The generator maps a 128-dimensional latent vector into a $64 \times 64$ RGB face, and the discriminator learns to separate real CelebA images from synthetic ones.
The model was trained with Adam and BCE-with-logits loss. The final generator has about 3.8M parameters, and the discriminator has about 2.8M parameters.
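BCE-with-logits fuses the sigmoid and the cross-entropy into one numerically stable expression. A minimal NumPy sketch of that loss (the actual training presumably uses a framework implementation; the toy logits below are illustrative):

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits:
    max(x, 0) - x*y + log(1 + exp(-|x|)), avoiding exp overflow."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

# Discriminator targets: 1 for real images, 0 for generated ones.
real_loss = bce_with_logits(2.0, 1.0)  # confident and correct -> small loss
fake_loss = bce_with_logits(2.0, 0.0)  # confident and wrong   -> large loss
```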
Generator
The generator uses a stack of transposed convolutions with batch normalization and ReLU activations. The output layer uses tanh, which keeps the image range compatible with the dataset normalization.
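Assuming the standard DCGAN configuration (an initial stride-1 projection followed by kernel-4, stride-2, padding-1 upsampling blocks; these hyperparameters are an assumption, not taken from the source), the upsampling arithmetic from a $1 \times 1$ latent "image" to $64 \times 64$ works out as:

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    """Output spatial size of a transposed convolution
    (no output_padding, no dilation)."""
    return (size - 1) * stride - 2 * pad + kernel

size = 1
size = tconv_out(size, kernel=4, stride=1, pad=0)  # 1 -> 4
for _ in range(4):                                 # 4 -> 8 -> 16 -> 32 -> 64
    size = tconv_out(size)
print(size)  # 64
```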
This baseline is intentionally lightweight, for ease of training and evaluation; the trade-off is modest sample quality.
Discriminator
The discriminator is a convolutional classifier with downsampling blocks and LeakyReLU activations.
Its role is to distinguish real images from generated ones, and it is later reused as a score function for membership inference. Its real-vs-fake ROC-AUC is about 0.85.
Attribute classifier
The latent-editing stage depends on a separate classifier that learns to recognize facial attributes directly from images. The model is trained on the selected CelebA labels and outputs one logit per attribute. The classifier reaches strong validation accuracy on all selected attributes:
- Smiling: 0.9208
- Eyeglasses: 0.9836
- Young: 0.8534
- Male: 0.9739
- Blond_Hair: 0.9456
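Since the classifier emits one logit per attribute, predictions are independent sigmoid outputs thresholded at 0.5 rather than a softmax over classes. A minimal sketch with illustrative helper names and toy logits:

```python
import numpy as np

ATTRS = ["Smiling", "Eyeglasses", "Young", "Male", "Blond_Hair"]

def attribute_probs(logits):
    """One sigmoid per attribute: independent binary predictions."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))

def per_attribute_accuracy(logits, labels):
    preds = (attribute_probs(logits) >= 0.5).astype(int)
    return (preds == labels).mean(axis=0)  # one accuracy per attribute

# Toy batch: 2 images x 5 attributes (hypothetical logits and labels).
logits = np.array([[ 2.0, -1.0, 0.5, -3.0,  1.0],
                   [-0.5,  4.0, 1.5,  2.0, -2.0]])
labels = np.array([[1, 0, 1, 0, 1],
                   [0, 1, 1, 1, 0]])
acc = dict(zip(ATTRS, per_attribute_accuracy(logits, labels)))
```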
Latent directions
Once the generator and classifier are trained, random latent vectors are sampled and scored by the attribute classifier. This creates a paired set of latent codes and attribute scores.
For each selected attribute, a least-squares fit estimates a direction in latent space.
After normalization, each vector becomes an editable axis.
Latent editing
To visualize an edit, a single latent vector $z$ is sampled first and then shifted along one learned direction with different step sizes:

$$z' = z + \alpha \, d$$

where $d$ is the learned direction and $\alpha$ controls the edit strength.
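A sketch of the sweep itself; each edited latent would then be decoded by the generator into one face of the strip:

```python
import numpy as np

def edit_sweep(z, d, alphas):
    """Shift latent vector z along unit direction d by each step in alphas.
    Returns one edited latent per alpha: z' = z + alpha * d."""
    z = np.asarray(z, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    return np.stack([z + a * d for a in alphas])

z = np.zeros(128)
d = np.eye(128)[0]  # unit direction along axis 0 (toy example)
edits = edit_sweep(z, d, [-3, -1, 0, 1, 3])
```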
Membership inference attack
We performed a score-based membership inference attack based on discriminator logits. If the discriminator assigns systematically higher scores to training examples than to unseen examples, that can indicate memorization.
The attack is evaluated with a calibration split and a held-out test split. The final results are close to random guessing:
- Calibration AUC: 0.5218
- Test AUC: 0.5303
- Test accuracy: 0.5205
- Test balanced accuracy: 0.5205
The score distributions for members and non-members overlap almost completely.
This means the discriminator does not provide a strong membership signal in this setup, and no clear evidence of memorization is detected by this attack.
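For reference, the reported AUC reduces to the probability that a randomly chosen member outscores a randomly chosen non-member, so values near 0.5 mean the scores carry essentially no membership signal. A minimal rank-based sketch:

```python
import numpy as np

def roc_auc(member_scores, nonmember_scores):
    """Rank-based AUC: probability that a random member outscores
    a random non-member, counting ties as one half."""
    m = np.asarray(member_scores, dtype=np.float64)
    n = np.asarray(nonmember_scores, dtype=np.float64)
    wins = (m[:, None] > n[None, :]).sum() + 0.5 * (m[:, None] == n[None, :]).sum()
    return wins / (len(m) * len(n))

# Toy discriminator scores (illustrative values only).
auc = roc_auc([0.9, 0.4, 0.6], [0.5, 0.8, 0.3])
```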