---
license: mit
tags:
- image-to-image
- style-transfer
- pytorch
- beginner
- fast-inference
pipeline_tag: image-to-image
datasets:
- coco
metrics:
- perceptual-loss
---

# mini-style-transfer

A small, fast artistic style transfer model built with PyTorch as a learning project. It applies any of 4 artistic styles to a photo in **under 1 second on CPU**. Based on [Johnson et al. (2016) — Perceptual Losses for Real-Time Style Transfer](https://arxiv.org/abs/1603.08155).

---

## What it does

| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |

---

## Styles available

| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh — Starry Night |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |

---

## Quick start

```python
import torch
from torchvision import transforms
from PIL import Image

from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()

# 2. Prepare your image
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```

Or use the included `run.py` script:

```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```

---

## Model details

| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |

---

## Training details

| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16) — content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |

---

## Repository structure

```
mini-style-transfer/
├── model.py           ← StyleNet architecture
├── train.py           ← Training script
├── run.py             ← Inference script
├── starry_night.pth   ← Trained weights (starry night style)
├── mosaic.pth         ← Trained weights (mosaic style)
├── candy.pth          ← Trained weights (candy style)
├── sketch.pth         ← Trained weights (sketch style)
└── README.md          ← This file
```

---

## Limitations

- Each style is a **separate model file** — there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256×256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation

---

## What I learned building this

- How **convolutional encoders and decoders** work together
- What **Instance Normalisation** does vs Batch Normalisation
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why pixel-level loss looks bad for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it

---

## References

- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)

---

*Built as a learning project. Feedback and suggestions welcome!*
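---

## Appendix: ResBlock with Instance Normalisation (illustrative sketch)

The actual `StyleNet` lives in `model.py`; the sketch below is not that code, just a minimal residual block in the style of Johnson et al. (2016) to illustrate two points from the notes above: the skip connection that carries content through the network, and why `InstanceNorm2d` (per-image statistics) is used instead of `BatchNorm2d` (cross-image statistics). The class name and channel count are assumptions for illustration.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Illustrative residual block (not the repo's exact StyleNet code).

    InstanceNorm2d normalises each image independently, so the stylisation
    does not depend on which other images happen to share the batch —
    unlike BatchNorm, which mixes statistics across the whole batch.
    """

    def __init__(self, channels: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection preserves the content signal; the conv path
        # learns the stylistic residual on top of it.
        return x + self.body(x)
```

Because padding keeps the spatial size fixed and the skip connection needs matching shapes, stacking five of these (as in the model-details table) leaves the feature map dimensions unchanged.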
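## Appendix: Gram matrix style loss (illustrative sketch)

The training loss used here is implemented in `train.py`; the sketch below is a generic, hypothetical version of the Gram-matrix style term mentioned in the notes above, not the repo's exact code. A Gram matrix records channel-by-channel correlations of a VGG feature map, which captures texture ("brush strokes") while discarding spatial layout — this is why style loss compares Gram matrices rather than raw pixels. Function names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (B, C, H, W) feature map, shape (B, C, C).

    Normalised by the number of elements so the scale is comparable
    across layers of different sizes.
    """
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def style_loss(output_feats, style_feats):
    """Sum of MSE between Gram matrices over several VGG layers.

    Both arguments are lists of feature maps from the same layers,
    one extracted from the stylised output, one from the style image.
    """
    return sum(
        F.mse_loss(gram_matrix(o), gram_matrix(s))
        for o, s in zip(output_feats, style_feats)
    )
```

Note that the Gram matrix is symmetric and independent of H×W layout: shuffling the spatial positions of a feature map leaves it unchanged, which is exactly the "texture, not layout" property style transfer relies on.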