markyw committed on
Commit e29c335 · verified · 1 Parent(s): 4502a27

Update README.md

Files changed (1)
  1. README.md +76 -3
README.md CHANGED
@@ -1,3 +1,76 @@
- ---
- license: apache-2.0
- ---
+ <div align="center">
+ <h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2605.18115-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
+ [![GitHub](https://img.shields.io/badge/GitHub-WinTok-blue)](https://github.com/markywg/WinTok)
+ [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/markyw/WinTok/tree/main)
+ </div>
+
+ This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
+
+ > <a href="https://arxiv.org/abs/2605.18115">WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</a><br>
+ > [Yiwei Guo](https://scholar.google.com/citations?user=HCAyeJIAAAAJ&hl=zh-CN&oi=ao), [Shaobin Zhuang](https://scholar.google.com/citations?user=PGaDirMAAAAJ&hl=zh-CN&oi=ao), Canmiao Fu, [Zhipeng Huang](https://scholar.google.com/citations?user=_fnuIHUAAAAJ&hl=zh-CN&oi=ao), [Chen Li](https://scholar.google.com/citations?hl=zh-CN&user=WDJL3gYAAAAJ), Jing LYU, [Yali Wang](https://scholar.google.com/citations?hl=zh-CN&user=hD948dkAAAAJ)<br>
+ > Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
+ > ```
+ > @article{guo2026wintok,
+ >   title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
+ >   author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
+ >   journal={arXiv preprint arXiv:2605.18115},
+ >   year={2026}
+ > }
+ > ```
+
+ <p align="center">
+ <img src="./assets/visualization.jpg" width="90%">
+ <br>
+ <em>WinTok achieves superior performance on downstream applications, surpassing previous unified tokenizers, with a more flexible hybrid encoding mechanism.</em>
+ </p>
+
+ ## 📰 News
+ * **[2026.05.19]** 🚀 🚀 🚀 We are excited to release **WinTok**, a unified visual tokenizer featuring our novel **hybrid encoding** and **asymmetric distillation**. Code and model are now available!
+
+ ## 📖 Implementations
+
+ ### 🛠️ Installation
+ - **Dependencies**:
+ ```
+ bash env.sh
+ ```
+
+ ### Evaluation
+
+ - **Evaluation on ImageNet 50K Validation Set**
+
+ The dataset should be organized as follows:
+ ```
+ imagenet
+ └── val/
+     ├── ...
+ ```
+
+ Set the corresponding paths, then run the 256×256 resolution evaluation script:
+ ```
+ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
+ ```
+
+ - **Evaluation on MS-COCO Val2017**
+
+ The dataset should be organized as follows:
+ ```
+ MSCOCO2017
+ └── val2017/
+     ├── ...
+ ```
+
+ Set the corresponding paths, then run the 256×256 resolution evaluation script:
+ ```
+ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
+ ```
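
Before launching either evaluation, it can help to confirm that the dataset folders match the layouts shown above. The following is a minimal sketch and not part of the WinTok repository: the helper `count_images`, the extension list, and the example root paths are assumptions made for illustration. The expected counts are the standard sizes of the two validation sets (50,000 images for ImageNet, 5,000 for MS-COCO val2017).

```python
from pathlib import Path

# Extensions treated as images (an assumption for this sketch).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images(root, split):
    """Count image files under root/split, searching subfolders recursively."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"expected directory not found: {split_dir}")
    return sum(
        1
        for path in split_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS
    )


if __name__ == "__main__":
    # Placeholder roots; replace them with your local dataset locations.
    checks = [
        ("imagenet", "val", 50_000),
        ("MSCOCO2017", "val2017", 5_000),
    ]
    for root, split, expected in checks:
        try:
            found = count_images(root, split)
        except FileNotFoundError as err:
            print(err)
            continue
        status = "ok" if found == expected else f"expected {expected}"
        print(f"{root}/{split}: {found} images ({status})")
```

The recursive search is used because an ImageNet `val/` folder is sometimes arranged into per-class subfolders and sometimes kept flat; the count is the same either way.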
+
+
+ ### Inference
+
+ To quickly check reconstruction quality, run:
+ ```
+ python recon.py --ckpt_path path_to_ckpt
+ ```