StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Paper • 1612.03242 • Published
A Deep Convolutional Generative Adversarial Network (DCGAN) implementation for text-to-image generation using GloVe embeddings and the Microsoft COCO dataset.
Microsoft COCO Dataset:
- train2014/ and val2014/ images
- captions_train2014.json, captions_val2014.json

GloVe Embeddings:
- glove.6B.300d.txt (300-dimensional word vectors)

Project Structure:
├── models/
│   ├── dcgan_model.py         # Generator and Discriminator architectures
│   ├── char_cnn_rnn_model.py  # Text processing models
│   └── net_modules/           # Network components
├── saved_models/              # Trained model checkpoints
│   ├── generator_final.pth
│   ├── discriminator_final.pth
│   └── checkpoint_epoch_*.pth
├── generated_images/          # Generated images and visualizations
│   ├── output_*.png
│   ├── output_gif_*.gif
│   └── evaluation_*.png
├── utils.py                   # Utility functions
├── data_util.py               # Data processing utilities
├── requirements.txt           # Python dependencies
├── glove.6B.300d.txt          # GloVe word embeddings
└── DCGAN_Text2Image.ipynb     # Main training notebook
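The generator in dcgan_model.py takes both a noise vector and a caption embedding, as the inference snippet below shows. A minimal sketch of what such a conditional DCGAN generator can look like — the layer sizes, `proj_dim`, and class layout here are illustrative assumptions, not the repo's exact architecture:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Conditional DCGAN generator: noise + projected caption embedding -> 64x64 RGB image."""
    def __init__(self, noise_dim=100, text_dim=300, proj_dim=128, feat=64):
        super().__init__()
        # Compress the 300-d GloVe caption embedding before conditioning (assumed design).
        self.project = nn.Sequential(nn.Linear(text_dim, proj_dim), nn.LeakyReLU(0.2))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + proj_dim, feat * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),     # 4x4
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),     # 8x8
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),     # 16x16
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat), nn.ReLU(True),         # 32x32
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),
            nn.Tanh(),                                   # 64x64, outputs in [-1, 1]
        )

    def forward(self, noise, text_embedding):
        # Project the caption, reshape to (B, proj_dim, 1, 1), concatenate with noise.
        cond = self.project(text_embedding).unsqueeze(-1).unsqueeze(-1)
        return self.net(torch.cat([noise, cond], dim=1))
```

Concatenating the projected text code with the noise at the input of the transposed-convolution stack is the standard conditioning scheme for DCGAN-style text-to-image models.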
Install Dependencies:
pip install -r requirements.txt
Download GloVe Embeddings:
Place glove.6B.300d.txt in the project root.

Prepare COCO Dataset:
Download the train2014/val2014 images and the caption annotation files (captions_train2014.json, captions_val2014.json).
Run Training:
# Execute cells in DCGAN_Text2Image.ipynb
# Training will run for 70 epochs with automatic checkpointing
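With the 300-dimensional GloVe vectors in place, a caption can be mapped to a single fixed-size embedding, for example by averaging the vectors of its in-vocabulary words. A minimal sketch, assuming the standard glove.6B.300d.txt text format; the helper names are illustrative, not necessarily the repo's:

```python
import numpy as np

def load_glove(path, dim=300):
    """Parse GloVe text format: one token per line, followed by `dim` floats."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def caption_to_embedding(caption, vectors, dim=300):
    """Average the GloVe vectors of all in-vocabulary tokens in the caption."""
    hits = [vectors[w] for w in caption.lower().split() if w in vectors]
    if not hits:  # no known words: fall back to a zero vector
        return np.zeros(dim, dtype=np.float32)
    return np.mean(hits, axis=0)
```

Out-of-vocabulary words are simply skipped here; other schemes (e.g. a learned char-CNN-RNN encoder, as char_cnn_rnn_model.py suggests) can replace the averaging.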
# Load trained model
generator.load_state_dict(torch.load('saved_models/generator_final.pth', map_location=device))
generator.eval()

# Generate from text
noise = torch.randn(1, 100, 1, 1, device=device)
text_embedding = caption_to_embedding("a red car")
with torch.no_grad():
    generated_image = generator(noise, text_embedding)
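The generator's Tanh output lies in [-1, 1], so it has to be rescaled before saving as an image file. A minimal sketch of that conversion, assuming a CHW float array (the `tensor_to_image` name is illustrative):

```python
import numpy as np

def tensor_to_image(array):
    """Map a CHW float array in [-1, 1] to an HWC uint8 array in [0, 255]."""
    array = np.clip((array + 1.0) / 2.0, 0.0, 1.0)    # rescale to [0, 1]
    array = (array * 255.0).round().astype(np.uint8)  # quantize to bytes
    return np.transpose(array, (1, 2, 0))             # CHW -> HWC

# Usage (Pillow assumed installed):
# from PIL import Image
# Image.fromarray(tensor_to_image(generated_image[0].cpu().numpy())).save("out.png")
```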
# Load checkpoint
start_epoch, G_losses, D_losses = load_checkpoint(
'saved_models/checkpoint_epoch_50.pth',
generator, discriminator, optimizer_G, optimizer_D
)
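The `load_checkpoint` call above suggests each checkpoint bundles the epoch, both models, both optimizers, and the loss histories. A minimal sketch of a matching save/load pair — the checkpoint keys here are assumptions, not necessarily the repo's exact format:

```python
import torch

def save_checkpoint(path, epoch, generator, discriminator,
                    optimizer_G, optimizer_D, G_losses, D_losses):
    """Bundle model/optimizer state and loss history into one file."""
    torch.save({
        "epoch": epoch,
        "generator": generator.state_dict(),
        "discriminator": discriminator.state_dict(),
        "optimizer_G": optimizer_G.state_dict(),
        "optimizer_D": optimizer_D.state_dict(),
        "G_losses": G_losses,
        "D_losses": D_losses,
    }, path)

def load_checkpoint(path, generator, discriminator, optimizer_G, optimizer_D):
    """Restore state in place; return (next_epoch, G_losses, D_losses)."""
    # weights_only=False is safe here: the checkpoint is our own trusted file.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    generator.load_state_dict(ckpt["generator"])
    discriminator.load_state_dict(ckpt["discriminator"])
    optimizer_G.load_state_dict(ckpt["optimizer_G"])
    optimizer_D.load_state_dict(ckpt["optimizer_D"])
    return ckpt["epoch"] + 1, ckpt["G_losses"], ckpt["D_losses"]
```

Returning `epoch + 1` lets training resume at the first epoch that was not yet completed.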