# Face Generation GPT-2 (VQ-VAE Latent Space)
This model is a GPT-2 fine-tuned to act as an autoregressive generator for human faces. Instead of predicting text tokens, it predicts visual tokens extracted from the CelebA dataset via a VQ-VAE.
## Project Overview
The core idea is to treat image generation as a language modeling problem.
- Compression: A VQ-VAE encodes $128 \times 128$ face images into $16 \times 16$ discrete tokens (total 256 tokens).
- Generative Modeling: This GPT-2 model is trained to predict the sequence of these visual tokens based on facial attributes.
- Reconstruction: The predicted tokens are passed back through the VQ-VAE Decoder to synthesize a new face.
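The compression step can be illustrated with a toy vector-quantization sketch. This is a hypothetical, pure-Python illustration of nearest-codebook lookup (the actual VQ-VAE encoder, codebook size, and latent dimensionality are defined in the project repository): each position in a $16 \times 16$ latent grid is replaced by the index of its nearest codebook vector, yielding 256 discrete tokens per image.

```python
# Toy illustration of vector quantization (not the project's actual VQ-VAE):
# each latent vector is replaced by the index of its nearest codebook entry.

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            for v in latents]

# A 16x16 grid of toy 2-D latents and a tiny 4-entry codebook.
grid = [[(r / 16.0, c / 16.0) for c in range(16)] for r in range(16)]
codebook = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

tokens = [quantize(row, codebook) for row in grid]
flat = [t for row in tokens for t in row]  # 256 discrete tokens, one per cell
```

GPT-2 then models the distribution over these 256-token sequences exactly as it would model text, and the decoder inverts the lookup to reconstruct pixels.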
## Intended Uses & Limitations
### How to Use
This model is designed to work in conjunction with the project's custom VQ-VAE decoder. It accepts a sequence of facial attribute tokens and generates the corresponding latent representation of a face.
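A minimal usage sketch is given below. The repository ID matches this model card, but the sampling parameters and the post-processing interface are assumptions; the real codebook and decoder live in the project's GitHub repository.

```python
def generate_face_tokens(attribute_prompt, max_new_tokens=258):
    """Sample a sequence of visual tokens from the fine-tuned GPT-2.

    Requires the `transformers` library and access to the Hugging Face Hub.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "yosef-samy019/gpt-face-celeb-generator"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)

    inputs = tokenizer(attribute_prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    return tokenizer.convert_ids_to_tokens(out[0])

def extract_visual_tokens(tokens):
    """Pull the visual tokens out of a generated sequence (marker names
    follow the input format documented in this card)."""
    start = tokens.index("<START_GENERATION>") + 1
    end = tokens.index("<END_GENERATION>")
    return tokens[start:end]
```

The extracted visual tokens are then mapped back to codebook indices and passed through the project's VQ-VAE decoder to synthesize the final image.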
### Limitations
- Resolution: Reconstruction is fixed at $128 \times 128$.
- Realism: Generated faces are structurally sound but may lack the high-frequency detail produced by modern diffusion models.
- Dependencies: Requires the specific VQ-VAE Codebook and Decoder available in the GitHub Repository.
## Training Data
- Dataset: CelebA (CelebFaces Attributes Dataset).
- Preprocessing: Images resized to $128 \times 128$, normalized, and tokenized using a Vector Quantized Variational Autoencoder.
- Input Format: `<START_FACE> <ATTRIBUTES> <START_GENERATION> <256_TOKENS> <END_GENERATION>`
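Assembling one training example in this layout can be sketched as follows. The marker names come straight from the format above, but the attribute vocabulary and per-token naming scheme shown here are hypothetical:

```python
# Hypothetical helper: marker names follow the documented input format;
# the attribute tokens and the "<TOK_i>" naming are illustrative only.

def build_sequence(attribute_tokens, face_tokens):
    """Assemble one example: markers + attributes + 256 visual tokens."""
    if len(face_tokens) != 256:
        raise ValueError("expected exactly 256 visual tokens (16x16 grid)")
    return (["<START_FACE>"]
            + list(attribute_tokens)
            + ["<START_GENERATION>"]
            + [f"<TOK_{t}>" for t in face_tokens]
            + ["<END_GENERATION>"])

seq = build_sequence(["<MALE>", "<SMILING>"], list(range(256)))
```

At inference time, the model is prompted with everything up to `<START_GENERATION>` and asked to complete the remaining 257 tokens.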
## Training Procedure
### Hyperparameters
- Learning Rate: 2e-4
- Batch Size: 40 (Effective Batch Size: 80 with 2 Gradient Accumulation steps)
- Precision: Mixed Precision (Native AMP)
- Epochs: 5
- Optimizer: AdamW (Fused)
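The effective batch size follows from the per-device batch size and the gradient accumulation steps listed above:

```python
# Arithmetic behind the reported effective batch size.
per_device_batch_size = 40
gradient_accumulation_steps = 2

# Gradients are accumulated over 2 steps before each optimizer update,
# so each update sees 40 * 2 = 80 examples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
```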
### Training Results
Cross-entropy loss decreased steadily over the 5 training epochs:
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1.0 | 3.7055 | 3.7670 |
| 5.0 | 3.6473 | 3.7547 |
## Resources
- Live Demo: Streamlit Web App
- Source Code: GitHub Repository
## Framework Versions
- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.8.3
- Tokenizers 0.22.2
## Base Model
This model (`yosef-samy019/gpt-face-celeb-generator`) is fine-tuned from `openai-community/gpt2`.