---
license: apache-2.0
---

# UNet

A lightweight UNet with single-block levels and sliding-window attention.
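
The windowing scheme is not spelled out here. The sketch below shows what a sliding-window attention mask could look like. It assumes a 2D square window of radius `radius` over a row-major token grid, with windows truncated at the borders. The function name and the border behavior are illustrative assumptions, not this repo's code.

```python
def sliding_window_mask(height: int, width: int, radius: int) -> list[list[bool]]:
    """Build a boolean attention mask for a height x width token grid.

    Hypothetical helper: tokens are flattened in row-major order, and
    mask[i][j] is True when token j lies inside the (2 * radius + 1) squared
    window centred on token i. Windows are truncated at the image borders
    (an assumption; other schemes shift the window to keep its size fixed).
    """
    n = height * width
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        row_i, col_i = divmod(i, width)
        for j in range(n):
            row_j, col_j = divmod(j, width)
            # Token j is visible if it is within `radius` rows and columns of token i.
            mask[i][j] = abs(row_i - row_j) <= radius and abs(col_i - col_j) <= radius
    return mask
```

On a 4 x 4 grid with `radius=1`, a corner token sees 4 tokens (itself plus 3 neighbours) and an interior token sees its full 3 x 3 window. In PyTorch, such a boolean mask could be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that take part in attention. The dense n x n mask is only for illustration. Efficient implementations avoid materializing it.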

- Pixel-space model operating in the CIELAB color space
- CIELAB input, RGB output
- Frequency-domain decomposition of the input images
- [Docling](https://huggingface.co/ibm-granite/granite-docling-258M) as the text encoder
- Token-efficient visual text inputs
- Variable number of attention heads across layers
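
The model consumes CIELAB but emits RGB, so inputs need an sRGB-to-CIELAB conversion. This card does not say which library or white point is used. The sketch below makes two assumptions: sRGB components lie in [0, 1], and the reference white is the standard D65 point. It follows the usual chain: sRGB to linear RGB, then to CIE XYZ, then to CIELAB. The function names are hypothetical.

```python
def srgb_to_linear(channel: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def srgb_to_cielab(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB (L, a, b) under D65."""
    rl, gl, bl = srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b)

    # Linear sRGB -> CIE XYZ (D65 reference white).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 white point used to normalise XYZ.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t: float) -> float:
        # Cube root above the threshold, linear segment below it.
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116 * fy - 16   # L in [0, 100]
    a_star = 500 * (fx - fy)    # green (-) to red (+)
    b_star = 200 * (fy - fz)    # blue (-) to yellow (+)
    return lightness, a_star, b_star
```

With these formulas, white `(1, 1, 1)` maps to approximately `(100, 0, 0)` and black to `(0, 0, 0)`. This card does not document how the network normalizes L, a and b before use.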
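
This card does not name the transform used for the frequency decomposition. One common choice for this kind of split is a single-level 2D Haar wavelet transform. It produces a half-resolution low-frequency band plus three detail bands, and it is exactly invertible. The sketch below applies it to one channel stored as nested lists. It illustrates that technique under this assumption and is not the repo's implementation. The function and band names are hypothetical.

```python
def haar_dwt2(img: list[list[float]]):
    """Single-level orthonormal 2D Haar transform of one channel (even height and width)."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    half_h, half_w = h // 2, w // 2
    low, col_detail, row_detail, diag_detail = (
        [[0.0] * half_w for _ in range(half_h)] for _ in range(4)
    )
    for i in range(half_h):
        for j in range(half_w):
            # The 2x2 block that maps to one coefficient in each band.
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            low[i][j] = (a + b + c + d) / 2          # local average (low frequency)
            col_detail[i][j] = (a - b + c - d) / 2   # left-right differences
            row_detail[i][j] = (a + b - c - d) / 2   # top-bottom differences
            diag_detail[i][j] = (a - b - c + d) / 2  # diagonal differences
    return low, col_detail, row_detail, diag_detail


def haar_idwt2(low, col_detail, row_detail, diag_detail) -> list[list[float]]:
    """Invert haar_dwt2, reconstructing the full-resolution channel."""
    half_h, half_w = len(low), len(low[0])
    img = [[0.0] * (2 * half_w) for _ in range(2 * half_h)]
    for i in range(half_h):
        for j in range(half_w):
            s, cd = low[i][j], col_detail[i][j]
            rd, dd = row_detail[i][j], diag_detail[i][j]
            img[2 * i][2 * j] = (s + cd + rd + dd) / 2
            img[2 * i][2 * j + 1] = (s - cd + rd - dd) / 2
            img[2 * i + 1][2 * j] = (s + cd - rd - dd) / 2
            img[2 * i + 1][2 * j + 1] = (s - cd - rd + dd) / 2
    return img
```

For a flat image, all three detail bands are zero, and `haar_idwt2(*haar_dwt2(img))` returns the original pixels.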
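
This card does not specify how the head count varies. One plausible scheme, shown here purely as an assumption, keeps the per-head dimension fixed, so each level's head count grows with its channel width. The helper name, the channel widths, and `head_dim=64` are hypothetical values, not this model's configuration.

```python
def heads_per_level(channels: list[int], head_dim: int = 64) -> list[int]:
    """Return the number of attention heads for each UNet level.

    Assumes a fixed per-head dimension, so wider levels get more heads.
    """
    heads = []
    for width in channels:
        if width % head_dim != 0:
            raise ValueError(f"channel width {width} is not divisible by head_dim {head_dim}")
        heads.append(width // head_dim)
    return heads
```

For example, `heads_per_level([128, 256, 512])` returns `[2, 4, 8]`.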

## Retrospection

Reconstruction quality, from best to worst:
- U-Docling (this repo)
- U-DAE
- U-DAE-NLL
- EQ-SAE-CIELAB
- EQ-SAE-CIELAB-c8
- VAE-f16-c4-kv
- VAE-f16-c4
- VAE-f16-c8
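
The ranking does not name the metric used to judge reconstruction quality. One illustrative option is peak signal-to-noise ratio (PSNR) between an input and its reconstruction. This is an assumption, not necessarily how the ranking was produced. The helper below is hypothetical and works on flat lists of pixel values.

```python
import math


def psnr(reference: list[float], reconstruction: list[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists.

    Assumes pixel values lie in [0, max_value]. Higher is better, and
    identical inputs give infinity.
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```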

## References

- arXiv:2411.17459
- arXiv:2503.11576
- arXiv:2510.17800
- arXiv:2510.18279