---
license: mit
tags:
- image-to-image
- style-transfer
- pytorch
- beginner
- fast-inference
pipeline_tag: image-to-image
datasets:
- coco
metrics:
- perceptual-loss
---

# mini-style-transfer

A small, fast artistic style transfer model built with PyTorch as a learning project. It applies any of 4 artistic styles to a photo in **under 1 second on CPU**. Based on [Johnson et al. (2016) — Perceptual Losses for Real-Time Style Transfer](https://arxiv.org/abs/1603.08155).

---

## What it does

| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |

---

## Styles available

| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh — Starry Night |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |

---

## Quick start

```python
import torch
from torchvision import transforms
from PIL import Image

from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()

# 2. Prepare your image
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```

Or use the included `run.py` script:

```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```

---

## Model details

| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |

---

## Training details

| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16) — content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |

---

## Repository structure

```
mini-style-transfer/
├── model.py           ← StyleNet architecture
├── train.py           ← Training script
├── run.py             ← Inference script
├── starry_night.pth   ← Trained weights (starry night style)
├── mosaic.pth         ← Trained weights (mosaic style)
├── candy.pth          ← Trained weights (candy style)
├── sketch.pth         ← Trained weights (sketch style)
└── README.md          ← This file
```

---

## Limitations

- Each style is a **separate model file** — there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256×256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation

---

## What I learned building this

- How **convolutional encoders and decoders** work together
- What **Instance Normalisation** does vs Batch Normalisation
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why pixel-level loss looks bad for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it

---

## References

- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)

---

*Built as a learning project. Feedback and suggestions welcome!*
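---

## Appendix: ResBlock with Instance Normalisation (illustrative sketch)

The actual `StyleNet` lives in `model.py`; the sketch below is not that code, just a minimal residual block in the style of Johnson et al. (2016) to illustrate two points from the notes above: the skip connection that carries content through the network, and why `InstanceNorm2d` (per-image statistics) is used instead of `BatchNorm2d` (cross-image statistics). The class name and channel count are assumptions for illustration.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Illustrative residual block (not the repo's exact StyleNet code).

    InstanceNorm2d normalises each image independently, so the stylisation
    does not depend on which other images happen to share the batch —
    unlike BatchNorm, which mixes statistics across the whole batch.
    """

    def __init__(self, channels: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection preserves the content signal; the conv path
        # learns the stylistic residual on top of it.
        return x + self.body(x)
```

Because padding keeps the spatial size fixed and the skip connection needs matching shapes, stacking five of these (as in the model-details table) leaves the feature map dimensions unchanged.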
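## Appendix: Gram matrix style loss (illustrative sketch)

The training loss used here is implemented in `train.py`; the sketch below is a generic, hypothetical version of the Gram-matrix style term mentioned in the notes above, not the repo's exact code. A Gram matrix records channel-by-channel correlations of a VGG feature map, which captures texture ("brush strokes") while discarding spatial layout — this is why style loss compares Gram matrices rather than raw pixels. Function names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (B, C, H, W) feature map, shape (B, C, C).

    Normalised by the number of elements so the scale is comparable
    across layers of different sizes.
    """
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def style_loss(output_feats, style_feats):
    """Sum of MSE between Gram matrices over several VGG layers.

    Both arguments are lists of feature maps from the same layers,
    one extracted from the stylised output, one from the style image.
    """
    return sum(
        F.mse_loss(gram_matrix(o), gram_matrix(s))
        for o, s in zip(output_feats, style_feats)
    )
```

Note that the Gram matrix is symmetric and independent of H×W layout: shuffling the spatial positions of a feature map leaves it unchanged, which is exactly the "texture, not layout" property style transfer relies on.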