license: apache-2.0
pipeline_tag: image-to-image
DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders
This repository contains the weights for DecQ, a framework that introduces lightweight detail-condensing queries into Representation Autoencoders (RAEs). DecQ improves spatial reconstruction capacity while preserving the pretrained semantic space of vision foundation models (VFMs), facilitating high-quality image reconstruction and faster convergence in latent diffusion models.
- Paper: DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders
- Repository: GitHub - Tianhang-Wang/DecQ
Overview
DecQ addresses the reconstruction–generation trade-off in RAEs by using detail-condensing queries to extract fine-grained information from intermediate VFM features through condenser modules. These queries are incorporated into the decoder to support reconstruction and are jointly generated with patch tokens during generative modeling.
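As a rough illustration of the idea above, the sketch below lets a small set of queries attend to stand-in intermediate features and then appends them to the patch tokens. It is a minimal sketch under stated assumptions, not the DecQ implementation: the single-head cross-attention form, the residual update, the concatenation at the decoder input, and all names (`condense`, `w_q`, `w_k`, `w_v`) and sizes other than the 8 queries are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def condense(queries, features, w_q, w_k, w_v):
    """Hypothetical condenser: single-head cross-attention.

    queries:  (Q, D) detail-condensing queries
    features: (N, D) intermediate VFM patch features
    """
    q = queries @ w_q                                # (Q, D)
    k = features @ w_k                               # (N, D)
    v = features @ w_v                               # (N, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (Q, N)
    return queries + attn @ v                        # residual update, (Q, D)


rng = np.random.default_rng(0)
num_queries, num_patches, dim = 8, 256, 64  # 8 is from the text; the rest is illustrative

queries = rng.normal(size=(num_queries, dim))

# Random stand-ins for features taken from two intermediate VFM layers.
intermediate_features = [rng.normal(size=(num_patches, dim)) for _ in range(2)]

for feats in intermediate_features:
    # Each assumed condenser module has its own projection weights.
    w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    queries = condense(queries, feats, w_q, w_k, w_v)

# Assumption: the decoder sees patch tokens followed by the condensed queries.
patch_tokens = rng.normal(size=(num_patches, dim))
decoder_input = np.concatenate([patch_tokens, queries], axis=0)
print(decoder_input.shape)  # (264, 64)
```

In this sketch the patch tokens are left untouched, which mirrors the stated goal of preserving the pretrained semantic space while the extra queries carry the fine-grained detail.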
Key features:
- Lightweight: Only 8 additional queries and 3.9% extra computation.
- Improved Reconstruction: Significant PSNR improvement over frozen DINOv2-based RAEs.
- Faster Convergence: Converges 3.3× faster in generative modeling.
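For readers who want to check reconstruction quality themselves, PSNR can be computed from the mean squared error between an input image and its reconstruction. The helper below is a generic sketch using the standard definition; the function name `psnr` and the 8-bit peak value of 255 are assumptions, and the evaluation code in the official repository may differ.

```python
import numpy as np


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by exactly one intensity level (MSE = 1).
original = np.full((4, 4), 100, dtype=np.uint8)
reconstructed = np.full((4, 4), 101, dtype=np.uint8)
print(round(psnr(original, reconstructed), 2))  # 48.13
```

Higher values mean the reconstruction is closer to the input.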
This repository currently contains the Stage 1 tokenizer weights.
Sample Usage
To reconstruct an image with the Stage 1 autoencoder, run the sampling script provided in the official repository:
python src/stage1_sample.py \
--config <config_path> \
--image <input_image_path>
Refer to the GitHub repository for environment setup and configuration files.
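To reconstruct several images, one option is to build one command per image and run them in sequence. The helper below is a hypothetical convenience wrapper, not part of the repository; it uses only the `--config` and `--image` flags shown above and assumes it is run from the repository root.

```python
import sys


def build_commands(config_path, image_paths):
    """Return one argv list per image for src/stage1_sample.py."""
    return [
        [sys.executable, "src/stage1_sample.py",
         "--config", str(config_path),
         "--image", str(image_path)]
        for image_path in image_paths
    ]


# Placeholder paths for illustration only.
commands = build_commands("<config_path>", ["a.png", "b.png"])
for cmd in commands:
    print(" ".join(cmd[1:]))

# To actually run them from the repository root:
#   import subprocess
#   for cmd in commands:
#       subprocess.run(cmd, check=True)
```

Passing each command as a list avoids shell quoting problems with paths that contain spaces.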
Citation
@article{wang2026decq,
title={DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders},
author={Wang, Tianhang and Chen, Yitong and Song, Wei and Wu, Zuxuan and Li, Min and Wang, Jiaqi},
journal={arXiv preprint arXiv:2605.22777},
year={2026}
}