# mini-style-transfer

A small, fast artistic style transfer model built with PyTorch as a learning project.
Applies 4 artistic styles to any photo in under 1 second on CPU.

Based on Johnson et al. (2016), *Perceptual Losses for Real-Time Style Transfer*.


## What it does

| Input photo | Style painting | Output |
|-------------|----------------|--------|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |

## Styles available

| File | Style |
|------|-------|
| `starry_night.pth` | Van Gogh, *Starry Night* |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |

## Quick start

```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()  # inference mode: freeze norm statistics

# 2. Prepare your image
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)  # add batch dimension: (1, 3, H, W)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```

Or use the included `run.py` script:

```shell
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```

## Model details

| Property | Value |
|----------|-------|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |
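
The architecture row above can be sketched in a few lines. This is a hypothetical reconstruction matching the stated layout (Encoder → 5× ResBlock → Decoder, roughly 450K parameters), not the actual `model.py`; the channel counts and kernel sizes are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convs with instance norm and a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )
    def forward(self, x):
        return x + self.block(x)

class StyleNetSketch(nn.Module):
    """Hypothetical feed-forward style network: encoder, 5 res blocks, decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 9, padding=4), nn.InstanceNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.InstanceNorm2d(64), nn.ReLU(),
        )
        self.res = nn.Sequential(*[ResBlock(64) for _ in range(5)])
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 32, 3, padding=1), nn.InstanceNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 3, 9, padding=4), nn.Sigmoid(),  # output pixels in [0, 1]
        )
    def forward(self, x):
        return self.decoder(self.res(self.encoder(x)))

# Fully convolutional, so (even-sized) inputs of any resolution pass through,
# which is consistent with the "any resolution in, same size out" rows above.
out = StyleNetSketch()(torch.rand(1, 3, 64, 64))
```

A network like this has about 420K parameters, in the same ballpark as the ~450K stated above.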

## Training details

| Property | Value |
|----------|-------|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16): content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |
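
The loss row above combines a content term (feature-space distance) and a style term (Gram-matrix distance) over VGG16 feature maps. A minimal sketch of how those terms might be computed, using the weights from the table; the function names, layer choice, and exact normalisation are illustrative, and the real `train.py` may differ:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Channel-by-channel correlations of a feature map: (B, C, H, W) -> (B, C, C).
    The Gram matrix discards spatial layout and keeps texture statistics."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(out_feats, content_feats, style_grams,
                    content_weight=1.0, style_weight=1e5):
    """out_feats / content_feats: lists of VGG16 feature maps for the stylised
    output and the content image; style_grams: precomputed Gram matrices of the
    style image at the same layers."""
    # Content loss: feature-space MSE at one mid-level layer (index is illustrative).
    content = F.mse_loss(out_feats[1], content_feats[1])
    # Style loss: MSE between Gram matrices, summed over all chosen layers.
    style = sum(
        F.mse_loss(gram_matrix(f), g)
        for f, g in zip(out_feats, style_grams)
    )
    return content_weight * content + style_weight * style
```

Because the Gram matrix averages over all spatial positions, matching it reproduces the style image's textures without copying its layout, which is why the style weight can be so much larger than the content weight.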

## Repository structure

```
mini-style-transfer/
├── model.py            ← StyleNet architecture
├── train.py            ← Training script
├── run.py              ← Inference script
├── starry_night.pth    ← Trained weights (Starry Night style)
├── mosaic.pth          ← Trained weights (mosaic style)
├── candy.pth           ← Trained weights (candy style)
├── sketch.pth          ← Trained weights (sketch style)
└── README.md           ← This file
```

## Limitations

- Each style is a separate model file; there is no single multi-style model yet
- Works best on natural photos (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256×256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation
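
A common workaround for the high-resolution blurriness is to stylise at a resolution closer to the 256×256 training size and resize the result back up. This helper is not part of the repo, just a sketch; `model_fn` stands for any callable wrapping the quick-start steps (PIL image in, PIL image out):

```python
from PIL import Image

def stylise_capped(model_fn, img, max_side=1024):
    """Run style transfer at a capped resolution, then resize back to the
    original size. model_fn: hypothetical PIL.Image -> PIL.Image callable."""
    w, h = img.size
    scale = max_side / max(w, h)
    if scale < 1.0:
        small = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
        return model_fn(small).resize((w, h), Image.LANCZOS)
    return model_fn(img)  # already small enough: stylise directly
```

The upscale softens fine detail slightly, but the style patterns come out at the scale the network was trained on, which usually looks better than feeding a 4000-pixel image straight in.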

## What I learned building this

- How convolutional encoders and decoders work together
- What Instance Normalisation does compared with Batch Normalisation
- How Gram matrices capture texture and style
- What perceptual loss is, and why pixel-level loss looks bad for style transfer
- How to use a pretrained VGG network as a feature extractor without training it
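
On the second point, the difference is easy to see numerically: instance norm normalises each (image, channel) slice on its own, while batch norm only normalises each channel across the whole batch, so individual images keep their relative brightness differences. A quick check:

```python
import torch

torch.manual_seed(0)
x = torch.rand(4, 3, 8, 8)  # a batch of 4 small "images"

inorm = torch.nn.InstanceNorm2d(3)
bnorm = torch.nn.BatchNorm2d(3, affine=False)  # training mode: uses batch stats

yi = inorm(x)
yb = bnorm(x)

# Instance norm: every (image, channel) slice is individually zero-mean.
per_image_means = yi.mean(dim=(2, 3))        # shape (4, 3), all ~0
# Batch norm: only the per-channel mean over the whole batch is zero;
# per-image means stay nonzero.
per_channel_means = yb.mean(dim=(0, 2, 3))   # shape (3,), all ~0
```

This per-image independence is why instance norm is the standard choice in feed-forward style transfer: each photo's own contrast statistics are removed before the style is applied.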

## References

- Johnson, J., Alahi, A., Fei-Fei, L. (2016). *Perceptual Losses for Real-Time Style Transfer and Super-Resolution*. ECCV 2016.
Built as a learning project. Feedback and suggestions welcome!
