tliby committed on
Commit 7e4a6ae · verified · 1 Parent(s): 1d96ce8

update readme

Files changed (1): README.md (+16 −100)
README.md CHANGED
@@ -1,8 +1,16 @@
 # TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
 
 <p align="center">
-<a href="https://arxiv.org/abs/2604.07340"><img src="https://img.shields.io/badge/Paper-Arxiv-b31b1b.svg" alt="arXiv"></a>&nbsp;
-<a href="https://huggingface.co/inclusionAI/TC-AE/tree/main"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow" alt="Models"></a>
 </p>
 <div align="center">
 <a href="https://tliby.github.io/" target="_blank">Teng&nbsp;Li</a><sup>1,2*</sup>,
@@ -19,13 +27,6 @@
 
 
 
-
-
-## News
-
-- [2026/04/09] Research paper, code, and models are released for TC-AE!
-
-
 ## Introduction
 
 <p align="center">
@@ -40,12 +41,9 @@
 - Staged Token Compression: Decomposes token-to-latent mapping into two stages, reducing structural information loss in the bottleneck
 - Semantic Enhancement: Incorporates self-supervised learning to produce more generative-friendly latents
 
-🚀 In this codebase, we release:
 
-- Pre-trained TC-AE tokenizer weights and evaluation code
-- Diffusion model training and evaluation code
-
-## Environment Setup
 
 To set up the environment for TC-AE, follow these steps:
 
@@ -55,19 +53,7 @@ conda activate tcae
 pip install -r requirements.txt
 ```
 
-## Download Checkpoints
-
-Download the pre-trained TC-AE weights and place them in the `results/` directory:
-
-
-| Tokenizer | Compression Ratio | rFID | LPIPS | Pretrained Weights |
-| --------- | ----------------- | ---- | ----- | ------------------------------------------------------------ |
-| TC-AE-SL | f32d128 | 0.35 | 0.060 | [![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow)](https://huggingface.co/inclusionAI/TC-AE/tree/main) |
-
-
-## Reconstruction Evaluation
-
-##### Image Reconstruction Demo
 
 ```shell
 python tcae/script/demo_recon.py \
@@ -78,7 +64,7 @@ python tcae/script/demo_recon.py \
 --rank 0
 ```
 
-##### ImageNet Evaluation
 
 Evaluate reconstruction quality on ImageNet validation set:
 
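The evaluation step reports rFID and LPIPS, both of which depend on pretrained networks and cannot be sketched briefly. As a self-contained stand-in that shows the same original-vs-reconstruction pairing the script performs, here is a PSNR computation; PSNR is not one of the repo's reported metrics, and `psnr` below is illustrative only:

```python
import numpy as np

def psnr(original: np.ndarray, reconstruction: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between an image and its reconstruction."""
    mse = float(np.mean((original - reconstruction) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
image = rng.uniform(size=(3, 256, 256)).astype(np.float32)               # fake [0, 1] RGB image
recon = np.clip(image + rng.normal(scale=0.01, size=image.shape), 0.0, 1.0)
print(f"{psnr(image, recon):.1f} dB")
```

A real run would iterate this pairing over the ImageNet validation images and their tokenizer reconstructions, then average.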
@@ -90,80 +76,10 @@ python tcae/script/eval_recon.py \
 --rank 0
 ```
 
-## Generation Evaluation
-
-Our DiT architecture and training pipeline are based on [RAE](https://github.com/bytetriper/RAE) and [VA-VAE](https://github.com/hustvl/LightningDiT).
-
-##### Prepare ImageNet Latents for Training
-
-Extract and cache latent representations from ImageNet training set:
-
-```shell
-accelerate launch \
---mixed_precision bf16 \
-diffusion/script/extract_features.py \
---data_path /path/to/imagenet_train \
---batch_size 50 \
---tokenizer_cfg_path configs/TC-AE-SL.yaml \
---tokenizer_ckpt_path results/tcae.pt
-```
-
-This will cache latents to `results/cached_latents/imagenet_train_256/`.
-
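The cached-latents layout is defined by `extract_features.py`; the sketch below is a guess at the general pattern (shard files of latent/label arrays written once, then reloaded for training), not the repo's actual format. Shapes follow the f32d128 spec: at f32 downsampling, a 256×256 image becomes an 8×8 grid of 128-dim latents.

```python
import tempfile
from pathlib import Path

import numpy as np

def cache_shard(latents: np.ndarray, labels: np.ndarray, cache_dir: Path, shard: int) -> Path:
    """Write one shard of (latent, label) pairs into the cache directory."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"shard_{shard:05d}.npz"
    np.savez(path, latents=latents, labels=labels)
    return path

def load_cache(cache_dir: Path):
    """Reload every shard in order and concatenate for training."""
    shards = [np.load(p) for p in sorted(cache_dir.glob("shard_*.npz"))]
    return (np.concatenate([s["latents"] for s in shards]),
            np.concatenate([s["labels"] for s in shards]))

cache_dir = Path(tempfile.mkdtemp()) / "imagenet_train_256"
rng = np.random.default_rng(0)
latents = rng.standard_normal((50, 8, 8, 128)).astype(np.float32)   # one batch of 50 images
labels = rng.integers(0, 1000, size=50)                             # ImageNet class ids
cache_shard(latents, labels, cache_dir, shard=0)
out_latents, out_labels = load_cache(cache_dir)
print(out_latents.shape)   # (50, 8, 8, 128)
```

Caching once up front means the (frozen) tokenizer never has to run during the diffusion training loop.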
-##### Training
-
-Train a DiT-XL model on the extracted latents:
-
-```shell
-mkdir -p results/dit
-torchrun --standalone --nproc_per_node=8 \
-diffusion/script/train_dit.py \
---config configs/DiT-XL.yaml \
---data-path results/cached_latents/imagenet_train_256 \
---results-dir results/dit \
---image-size 256 \
---precision bf16
-```
-
-##### Sampling
-
-Generate images using the trained diffusion model:
-
-```shell
-mkdir -p results/dit/samples
-torchrun --standalone --nnodes=1 --nproc_per_node=8 \
-diffusion/script/sample_ddp_dit.py \
---config configs/DiT-XL.yaml \
---sample-dir results/dit/samples \
---precision bf16 \
---label-sampling equal \
---tokenizer_cfg_path configs/TC-AE-SL.yaml \
---tokenizer_ckpt_path results/tcae.pt
-```
-
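`--label-sampling equal` presumably draws the class labels for the sampling run so that every ImageNet class is generated equally often, as in the standard 50k-sample FID protocol; `equal_labels` below is a hypothetical sketch of that behaviour, not the script's actual code.

```python
import numpy as np

def equal_labels(num_samples: int, num_classes: int, seed: int = 0) -> np.ndarray:
    """Return a shuffled label array in which every class appears the same number of times."""
    assert num_samples % num_classes == 0, "sample count must divide evenly across classes"
    labels = np.repeat(np.arange(num_classes), num_samples // num_classes)
    np.random.default_rng(seed).shuffle(labels)
    return labels

labels = equal_labels(num_samples=50_000, num_classes=1000)
counts = np.bincount(labels, minlength=1000)
print(counts.min(), counts.max())   # 50 50 -- every class sampled exactly 50 times
```

Shuffling matters because the DDP script shards the label list across ranks; without it, each rank would generate a contiguous run of classes.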
-##### Evaluation
-
-Download the ImageNet reference statistics: [adm_in256_stats.npz](https://huggingface.co/jjiaweiyang/l-DeTok/commit/28ef58d254bb1bde10e331372fe542e5458f3b5f#d2h-232267) and place it in `results/`.
-
-```shell
-python diffusion/script/eval_dit.py \
---generated_dir results/dit/samples/DiT-0100000-cfg-1.00-bs100-ODE-50-euler-bf16 \
---reference_npz results/adm_in256_stats.npz \
---batch-size 512 \
---num-workers 8
-```
-
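The evaluation compares Inception statistics (a feature mean and covariance) of the generated samples against the reference `adm_in256_stats.npz` via the Fréchet distance. The sketch below uses the simplified diagonal-covariance case, which avoids the matrix square root of the full formula; the actual script works with full covariance matrices.

```python
import numpy as np

def fid_diagonal(mu1: np.ndarray, var1: np.ndarray, mu2: np.ndarray, var2: np.ndarray) -> float:
    """Frechet distance between two Gaussians with diagonal covariances:
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1 * var2))."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)

dim = 2048                                                # Inception pool feature dimension
mu_ref, var_ref = np.zeros(dim), np.ones(dim)             # "reference" statistics
mu_gen, var_gen = np.full(dim, 0.01), np.ones(dim)        # slightly shifted "generated" stats
print(round(fid_diagonal(mu_ref, var_ref, mu_gen, var_gen), 4))   # 0.2048
```

Identical statistics give a distance of exactly zero; any mean shift or variance mismatch increases it.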
-## Acknowledgements
-
-The codebase is built on [HieraTok](https://arxiv.org/abs/2509.23736), [RAE](https://github.com/bytetriper/RAE), [VA-VAE](https://github.com/hustvl/LightningDiT), [iBOT](https://github.com/bytedance/ibot). Thanks for their efforts!
-
-## License
-
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
-
 ## Citation
 
-```
 @article{li2026tcae,
 title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
 author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},
 
+---
+license: mit
+pipeline_tag: feature-extraction
+tags:
+- visual-tokenizer
+- image-reconstruction
+---
+
 # TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
 
 <p align="center">
+<a href="https://huggingface.co/papers/2604.07340"><img src="https://img.shields.io/badge/Paper-Arxiv-b31b1b.svg" alt="arXiv"></a>&nbsp;
+<a href="https://github.com/inclusionAI/TC-AE"><img src="https://img.shields.io/badge/Code-GitHub-blue?logo=github" alt="GitHub"></a>
 </p>
 <div align="center">
 <a href="https://tliby.github.io/" target="_blank">Teng&nbsp;Li</a><sup>1,2*</sup>,
 
 
 
 
 ## Introduction
 
 <p align="center">
 
 - Staged Token Compression: Decomposes token-to-latent mapping into two stages, reducing structural information loss in the bottleneck
 - Semantic Enhancement: Incorporates self-supervised learning to produce more generative-friendly latents
 
+## Usage
 
+### Environment Setup
 
 To set up the environment for TC-AE, follow these steps:
 
 
 pip install -r requirements.txt
 ```
 
+### Image Reconstruction Demo
 
 ```shell
 python tcae/script/demo_recon.py \
 
 --rank 0
 ```
 
+### ImageNet Reconstruction Evaluation
 
 Evaluate reconstruction quality on ImageNet validation set:
 
 
 --rank 0
 ```
 
 ## Citation
 
+
+```bibtex
 @article{li2026tcae,
 title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
 author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},