🧠 Face Generation GPT-2 (VQ-VAE Latent Space)

This model is a GPT-2 fine-tuned to act as an autoregressive generator for human faces. Instead of predicting text tokens, it predicts visual tokens extracted from the CelebA dataset via a VQ-VAE.

🚀 Project Overview

The core idea is to treat image generation as a language modeling problem.

  1. Compression: A VQ-VAE encodes $128 \times 128$ face images into $16 \times 16$ discrete tokens (total 256 tokens).
  2. Generative Modeling: This GPT-2 model is trained to predict the sequence of these visual tokens based on facial attributes.
  3. Reconstruction: The predicted tokens are passed back through the VQ-VAE Decoder to synthesize a new face.
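The sequence-length bookkeeping implied by these numbers is easy to sanity-check; the 16 × 16 grid and 256-token count below come directly from the pipeline description above:

```python
# Each 128x128 face is encoded by the VQ-VAE into a 16x16 grid of
# discrete codebook indices; GPT-2 then models that token grid as a sequence.
image_size = 128
latent_grid = 16                          # side length of the VQ-VAE latent grid
downsample = image_size // latent_grid    # spatial downsampling factor

num_visual_tokens = latent_grid * latent_grid
print(downsample)         # 8: each token summarizes an 8x8 pixel patch
print(num_visual_tokens)  # 256
```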

🛠️ Intended Uses & Limitations

How to Use

This model is designed to work in conjunction with the project's custom VQ-VAE decoder. It accepts a sequence of facial attribute tokens and generates the corresponding latent representation of a face.
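Generation itself is ordinary autoregressive sampling over the 256 visual-token positions, conditioned on an attribute prefix. The sketch below is a minimal stand-in, not the repository's actual API: `next_token_logits` substitutes for a real GPT-2 forward pass, and the codebook size and attribute ids are illustrative assumptions.

```python
import random

CODEBOOK_SIZE = 512       # assumed VQ-VAE codebook size (illustrative)
NUM_VISUAL_TOKENS = 256   # 16x16 latent grid, per the model card

def next_token_logits(sequence):
    # Stand-in for a GPT-2 forward pass; returns uniform logits here.
    return [0.0] * CODEBOOK_SIZE

def sample(logits):
    # Uniform random choice stands in for temperature / top-k sampling.
    return random.randrange(len(logits))

def generate_face_tokens(attribute_prefix):
    seq = list(attribute_prefix)          # conditioning (attribute) tokens
    visual = []
    for _ in range(NUM_VISUAL_TOKENS):
        tok = sample(next_token_logits(seq))
        seq.append(tok)                   # feed the prediction back in
        visual.append(tok)
    return visual                         # handed to the VQ-VAE decoder

tokens = generate_face_tokens([1, 2, 3])
print(len(tokens))  # 256
```

In the real pipeline, the returned 256 indices are reshaped to 16 × 16 and passed through the VQ-VAE decoder to produce the 128 × 128 image.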

Limitations

  • Resolution: Fixed at $128 \times 128$ reconstruction.
  • Realism: While structurally sound, outputs may lack the high-frequency detail of modern diffusion models.
  • Dependencies: Requires the specific VQ-VAE Codebook and Decoder available in the GitHub Repository.

📊 Training Data

  • Dataset: CelebA (CelebFaces Attributes Dataset).
  • Preprocessing: Images resized to $128 \times 128$, normalized, and tokenized using a Vector Quantized Variational Autoencoder.
  • Input Format: <START_FACE> <ATTRIBUTES> <START_GENERATION> <256_TOKENS> <END_GENERATION>.
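Assuming the special markers map to dedicated token ids (the ids below are placeholders, not the repository's actual vocabulary), a training example is assembled like this:

```python
# Placeholder ids for the special tokens named in the input format above.
START_FACE, START_GENERATION, END_GENERATION = 0, 1, 2

def build_sequence(attribute_tokens, visual_tokens):
    assert len(visual_tokens) == 256, "one token per 16x16 latent cell"
    return ([START_FACE] + list(attribute_tokens)
            + [START_GENERATION] + list(visual_tokens)
            + [END_GENERATION])

seq = build_sequence([10, 11], list(range(256)))
print(len(seq))  # 2 attributes + 256 visual tokens + 3 markers = 261
```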

⚙️ Training Procedure

Hyperparameters

  • Learning Rate: 2e-4
  • Batch Size: 40 (Effective Batch Size: 80 with 2 Gradient Accumulation steps)
  • Precision: Mixed Precision (Native AMP)
  • Epochs: 5
  • Optimizer: AdamW (Fused)
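Gradient accumulation sums micro-batch gradients before a single optimizer step, which is why two micro-batches of 40 behave like one batch of 80. A minimal numerical check (independent of the actual training code, using a toy mean-squared loss):

```python
# The loss is a mean over the batch, so the gradient of a batch of 80
# equals the average of the gradients of its two micro-batches of 40.
def mean_grad(xs):
    # d/dw of mean((w - x)^2) at w = 0 is mean(-2x); any mean loss works.
    return sum(-2.0 * x for x in xs) / len(xs)

batch = [float(i) for i in range(80)]
micro1, micro2 = batch[:40], batch[40:]

full = mean_grad(batch)
accumulated = (mean_grad(micro1) + mean_grad(micro2)) / 2  # 2 accum steps

print(abs(full - accumulated) < 1e-9)  # True
```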

Training Results

Cross-entropy loss decreases steadily on both splits over the 5 training epochs:

Epoch   Training Loss   Validation Loss
1.0     3.7055          3.7670
5.0     3.6473          3.7547

🔗 Resources

📜 Framework Versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.8.3
  • Tokenizers 0.22.2
Model size: 86.3M parameters (Safetensors, F32 weights)
Model: yosef-samy019/gpt-face-celeb-generator (fine-tuned from GPT-2)