---
license: apache-2.0
---

# UNet

A lightweight UNet with single-block levels and sliding window attention.

- Pixel-space model in the CIELAB color space
- CIELAB input, RGB output
- Input images are decomposed into their frequency-domain components
- [Docling](https://huggingface.co/ibm-granite/granite-docling-258M) as the text encoder
- Token-efficient visual text inputs
- Variable number of attention heads across layers

## Retrospection

Reconstruction quality, ranked from best to worst:

- U-Docling (this repo)
- U-DAE
- U-DAE-NLL
- EQ-SAE-CIELAB
- EQ-SAE-CIELAB-c8
- VAE-f16-c4-kv
- VAE-f16-c4
- VAE-f16-c8

## References

- arXiv:2411.17459
- arXiv:2503.11576
- arXiv:2510.17800
- arXiv:2510.18279
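## CIELAB conversion sketch

The model takes CIELAB rather than RGB input. The card does not state which conversion or white point is used, so the following is only a minimal sketch. It assumes 8-bit sRGB pixels and the standard D65 reference white. The function name `srgb_to_lab` is hypothetical and is not part of this repo.

```python
# Hypothetical helper (not from this repo): convert one 8-bit sRGB pixel
# to CIELAB, assuming the standard sRGB primaries and a D65 white point.

def srgb_to_lab(r, g, b):
    def to_linear(c):
        # Undo the sRGB transfer curve (gamma) to get linear light in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white used to normalize XYZ.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Cube root above the threshold, linear segment near black.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)

    lightness = 116 * fy - 16   # L*: 0 (black) to 100 (white)
    a_star = 500 * (fx - fy)    # a*: green (-) to red (+)
    b_star = 200 * (fy - fz)    # b*: blue (-) to yellow (+)
    return lightness, a_star, b_star


if __name__ == "__main__":
    print(srgb_to_lab(255, 255, 255))  # close to (100, 0, 0)
    print(srgb_to_lab(255, 0, 0))      # pure red
```

For pure red, `srgb_to_lab(255, 0, 0)` returns roughly (53.2, 80.1, 67.2). For whole images, a vectorized routine such as scikit-image's `skimage.color.rgb2lab` is the more practical choice. Any further scaling of L*, a*, b* before they reach the network is not described in this card.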