---
license: apache-2.0
---
# UNet
A lightweight UNet with single-block levels and sliding window attention.
- Pixel-space model in CIELAB color space
- LAB input, RGB output
- Frequency-domain decomposition of the input images
- [Docling](https://huggingface.co/ibm-granite/granite-docling-258M) as text encoder
- Token-efficient visual text inputs
- Variable attention head configuration across layers
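
To make the "LAB input" point concrete, here is a minimal sketch of a per-pixel sRGB to CIELAB conversion. This README does not say which conversion the model uses, so the D65 white point, the standard sRGB matrix, and the function names (`srgb_to_linear`, `rgb_to_lab`) are assumptions for illustration, not this repo's code.

```python
# Assumption: sRGB primaries with a D65 reference white (2 degree observer).
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIELAB companding function: cube root with a linear toe near zero."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(r, g, b):
    """Convert one sRGB pixel (channels in [0, 1]) to (L*, a*, b*)."""
    rl, gl, bl = (srgb_to_linear(v) for v in (r, g, b))
    # Linear RGB -> CIE XYZ using the standard sRGB/D65 matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    # Normalize by the white point, then apply the CIELAB formulas.
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


if __name__ == "__main__":
    print(rgb_to_lab(1.0, 0.0, 0.0))  # pure red
```

In this layout L* carries lightness while a* and b* carry the color opponents, so neutral grays map to a* and b* near zero.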
## Retrospection
Reconstruction quality, from best to worst:
- U-Docling (this repo)
- U-DAE
- U-DAE-NLL
- EQ-SAE-CIELAB
- EQ-SAE-CIELAB-c8
- VAE-f16-c4-kv
- VAE-f16-c4
- VAE-f16-c8
## References
- 2411.17459
- 2503.11576
- 2510.17800
- 2510.18279