| --- |
| license: mit |
| datasets: |
| - ylecun/mnist |
| tags: |
| - harley-ml |
| - image |
| - digit-to-image |
| - mnist |
| - small |
| - text-to-image |
| - supralabs |
| --- |
| |
| # **SupraMNiST-IMG-200k** |
|
|
| ## Summary |
|
|
| ``` |
| Task: Number-To-Image |
| Dataset: ylecun/mnist |
| Total training time: ~8 minutes |
| Inputs: Number (0-9) |
| Outputs: 32x32 image |
| Params: ~201k |
| Framework: PyTorch, diffusers |
| Author: SupraLabs |
| ``` |
|
|
| ## **Description** |
| MNiST-IMG-200k is a ~**200k-parameter model** trained to **generate a 32x32 image** of a handwritten digit from an **input number (0-9)**. |
|
|
| ## Architecture |
|
|
| | Parameter | Value | |
| | -------------------- | ---------- | |
| | `image_size` | `32` | |
| | `in_channels` | `1` | |
| | `out_channels` | `1` | |
| | `num_classes` | `10` | |
| | `block_out_channels` | `[12, 16]` | |
| | `layers_per_block` | `8` | |
| | `norm_num_groups` | `4` | |
|
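The values in the table can be sanity-checked with a few lines of plain Python. This is only a sketch: it assumes a UNet-style network that halves the resolution between consecutive blocks and uses group normalization, which the parameter names suggest but the card does not state. The `check_config` helper is hypothetical and not part of the repo.

```python
# Values copied from the Architecture table.
config = {
    "image_size": 32,
    "in_channels": 1,
    "out_channels": 1,
    "num_classes": 10,
    "block_out_channels": [12, 16],
    "layers_per_block": 8,
    "norm_num_groups": 4,
}


def check_config(cfg):
    """Validate the config and return the feature-map side length per block.

    Assumption: one 2x downsample between consecutive blocks, so a model
    with N entries in block_out_channels downsamples N - 1 times.
    """
    groups = cfg["norm_num_groups"]
    channels = cfg["block_out_channels"]
    for ch in channels:
        # Group normalization requires the channel count to split evenly.
        if ch % groups != 0:
            raise ValueError(f"{ch} channels not divisible by {groups} groups")

    sizes = []
    size = cfg["image_size"]
    for level in range(len(channels)):
        sizes.append(size)
        if level < len(channels) - 1:
            if size % 2 != 0:
                raise ValueError(f"cannot halve odd resolution {size}")
            size //= 2
    return sizes


resolutions = check_config(config)
print(resolutions)
```

Under these assumptions both channel widths (12 and 16) divide evenly into 4 groups, and the two blocks operate at 32x32 and 16x16.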
|
| ## **Training** |
|
|
| ### **Hardware** |
|
|
| MNiST-IMG was trained on Google Colaboratory (NVIDIA Tesla T4) for ~8 minutes with a batch size of 64 for 10 epochs. |
|
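For a rough sense of scale, the figures above can be turned into a step count and throughput estimate. The sketch assumes that only the standard 60,000-image MNIST training split was used, that the last partial batch of each epoch was kept, and that there was no gradient accumulation. None of this is confirmed by the card.

```python
import math

# Figures stated in this card.
batch_size = 64
epochs = 10
training_minutes = 8  # approximate ("~8 minutes")

# Assumption: the standard MNIST training split (60,000 images) was used
# and the final partial batch of each epoch was not dropped.
train_images = 60_000

steps_per_epoch = math.ceil(train_images / batch_size)
total_steps = steps_per_epoch * epochs
steps_per_second = total_steps / (training_minutes * 60)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total training steps: {total_steps}")
print(f"approx. throughput: {steps_per_second:.1f} steps/s")
```

Under these assumptions the run amounts to 938 steps per epoch and 9,380 steps in total, or roughly 20 steps per second on the T4.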
|
| ### **Dataset** |
|
|
| [ylecun/mnist](https://huggingface.co/datasets/ylecun/mnist) |
|
|
| ### **Training Results** |
|
|
| The training loss ended at ~**0.40**. |
|
|
| Note: I can't provide the raw training logs because I lost them sometime after training. Sorry! |
|
|
| ## **Generation Examples** |
|
|
| At **1000** decoding steps: |
|
|
|  |
|
|
| At **200** decoding steps: |
|
|
|  |
|
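The difference between the two settings can be illustrated with a small timestep-spacing sketch. It assumes the model was trained with a 1000-step noise schedule and that fewer decoding steps are obtained by evenly striding through that schedule, as DDPM-style schedulers commonly do. The card does not say which scheduler was used, and `spaced_timesteps` is a hypothetical helper, not code from the repo.

```python
def spaced_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Return evenly strided timesteps in descending (denoising) order.

    Assumption: "leading" spacing, i.e. timesteps 0, stride, 2*stride, ...
    visited from the noisiest one down to 0.
    """
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


# Assumed training schedule length; not stated in the card.
NUM_TRAIN_TIMESTEPS = 1000

full_schedule = spaced_timesteps(NUM_TRAIN_TIMESTEPS, 1000)
short_schedule = spaced_timesteps(NUM_TRAIN_TIMESTEPS, 200)

print(len(full_schedule), full_schedule[:3])
print(len(short_schedule), short_schedule[:3])
```

With 200 steps the sampler would visit every fifth timestep, which cuts generation cost to about a fifth at the price of a coarser denoising trajectory.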
|
| ## Inference |
|
|
| Use the inference script provided in the repo: [inference.py](https://huggingface.co/Harley-ml/MNIST-IMG-390k/blob/main/inference.py) |
|
|
| ## Related Models |
|
|
| 1. [MNIST-IMG-390k](https://huggingface.co/Harley-ml/MNIST-IMG-390k) |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{mnist-img-390k, |
| title = {MNIST-IMG-390k: a Tiny Diffusion Model for Generating Handwritten Digits}, |
| author = {Paul Courneya; Harley-ml}, |
| year = {2026}, |
| url = {https://huggingface.co/Harley-ml/MNIST-IMG-390k} |
| } |
| ``` |