🧠 Face Generation GPT-2 (VQ-VAE Latent Space)

This model is a GPT-2 fine-tuned to act as an autoregressive generator for human faces. Instead of predicting text tokens, it predicts visual tokens extracted from the CelebA dataset via a VQ-VAE.

🚀 Project Overview

The core idea is to treat image generation as a language modeling problem.

  1. Compression: A VQ-VAE encodes $128 \times 128$ face images into $16 \times 16$ discrete tokens (total 256 tokens).
  2. Generative Modeling: This GPT-2 model is trained to predict the sequence of these visual tokens based on facial attributes.
  3. Reconstruction: The predicted tokens are passed back through the VQ-VAE Decoder to synthesize a new face.
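The sequence-length bookkeeping implied by these numbers is easy to sanity-check; the 16 × 16 grid and 256-token count below come directly from the pipeline description above:

```python
# Each 128x128 face is encoded by the VQ-VAE into a 16x16 grid of
# discrete codebook indices; GPT-2 then models that token grid as a sequence.
image_size = 128
latent_grid = 16                          # side length of the VQ-VAE latent grid
downsample = image_size // latent_grid    # spatial downsampling factor

num_visual_tokens = latent_grid * latent_grid
print(downsample)         # 8: each token summarizes an 8x8 pixel patch
print(num_visual_tokens)  # 256
```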

🛠️ Intended Uses & Limitations

How to Use

This model is designed to work in conjunction with the project's custom VQ-VAE decoder. It accepts a sequence of facial attribute tokens and generates the corresponding latent representation of a face.
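Generation itself is ordinary autoregressive sampling over the 256 visual-token positions, conditioned on an attribute prefix. The sketch below is a minimal stand-in, not the repository's actual API: `next_token_logits` substitutes for a real GPT-2 forward pass, and the codebook size and attribute ids are illustrative assumptions.

```python
import random

CODEBOOK_SIZE = 512       # assumed VQ-VAE codebook size (illustrative)
NUM_VISUAL_TOKENS = 256   # 16x16 latent grid, per the model card

def next_token_logits(sequence):
    # Stand-in for a GPT-2 forward pass; returns uniform logits here.
    return [0.0] * CODEBOOK_SIZE

def sample(logits):
    # Uniform random choice stands in for temperature / top-k sampling.
    return random.randrange(len(logits))

def generate_face_tokens(attribute_prefix):
    seq = list(attribute_prefix)          # conditioning (attribute) tokens
    visual = []
    for _ in range(NUM_VISUAL_TOKENS):
        tok = sample(next_token_logits(seq))
        seq.append(tok)                   # feed the prediction back in
        visual.append(tok)
    return visual                         # handed to the VQ-VAE decoder

tokens = generate_face_tokens([1, 2, 3])
print(len(tokens))  # 256
```

In the real pipeline, the returned 256 indices are reshaped to 16 × 16 and passed through the VQ-VAE decoder to produce the 128 × 128 image.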

Limitations

  • Resolution: Fixed at $128 \times 128$ reconstruction.
  • Realism: While structurally sound, outputs may lack the high-frequency detail of modern diffusion models.
  • Dependencies: Requires the specific VQ-VAE Codebook and Decoder available in the GitHub Repository.

📊 Training Data

  • Dataset: CelebA (CelebFaces Attributes Dataset).
  • Preprocessing: Images resized to $128 \times 128$, normalized, and tokenized using a Vector Quantized Variational Autoencoder.
  • Input Format: <START_FACE> <ATTRIBUTES> <START_GENERATION> <256_TOKENS> <END_GENERATION>.
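Assuming the special markers map to dedicated token ids (the ids below are placeholders, not the repository's actual vocabulary), a training example is assembled like this:

```python
# Placeholder ids for the special tokens named in the input format above.
START_FACE, START_GENERATION, END_GENERATION = 0, 1, 2

def build_sequence(attribute_tokens, visual_tokens):
    assert len(visual_tokens) == 256, "one token per 16x16 latent cell"
    return ([START_FACE] + list(attribute_tokens)
            + [START_GENERATION] + list(visual_tokens)
            + [END_GENERATION])

seq = build_sequence([10, 11], list(range(256)))
print(len(seq))  # 2 attributes + 256 visual tokens + 3 markers = 261
```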

⚙️ Training Procedure

Hyperparameters

  • Learning Rate: 2e-4
  • Batch Size: 40 (Effective Batch Size: 80 with 2 Gradient Accumulation steps)
  • Precision: Mixed Precision (Native AMP)
  • Epochs: 5
  • Optimizer: AdamW (Fused)
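Gradient accumulation sums micro-batch gradients before a single optimizer step, which is why two micro-batches of 40 behave like one batch of 80. A minimal numerical check (independent of the actual training code, using a toy mean-squared loss):

```python
# The loss is a mean over the batch, so the gradient of a batch of 80
# equals the average of the gradients of its two micro-batches of 40.
def mean_grad(xs):
    # d/dw of mean((w - x)^2) at w = 0 is mean(-2x); any mean loss works.
    return sum(-2.0 * x for x in xs) / len(xs)

batch = [float(i) for i in range(80)]
micro1, micro2 = batch[:40], batch[40:]

full = mean_grad(batch)
accumulated = (mean_grad(micro1) + mean_grad(micro2)) / 2  # 2 accum steps

print(abs(full - accumulated) < 1e-9)  # True
```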

Training Results

Cross-entropy loss decreases steadily on both splits over the 5 training epochs:

Epoch   Training Loss   Validation Loss
1.0     3.7055          3.7670
5.0     3.6473          3.7547

🔗 Resources

📜 Framework Versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.8.3
  • Tokenizers 0.22.2
Model size: 86.3M parameters (Safetensors, F32 weights)
Model: yosef-samy019/gpt-face-celeb-generator (fine-tuned from GPT-2)