Add metadata and improve model card
#1
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,24 +1,20 @@
| 1 |
<div align="center">
|
| 2 |
<h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
|
| 3 |
|
| 4 |
-
[Hugging Face](https://huggingface.co/markyw/WinTok/tree/main)
|
| 7 |
</div>
|
| 8 |
|
| 9 |
This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
|
| 10 |
|
| 11 |
-
>
|
| 12 |
-
>
|
| 13 |
> Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
|
| 14 |
-
> ```
|
| 15 |
-
> @article{guo2026wintok,
|
| 16 |
-
> title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
|
| 17 |
-
> author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
|
| 18 |
-
> journal={arXiv preprint arXiv:2605.18115},
|
| 19 |
-
> year={2026}
|
| 20 |
-
> }
|
| 21 |
-
> ```
|
| 22 |
|
| 23 |
<p align="center">
|
| 24 |
<img src="./assets/visualization.jpg" width="90%">
|
|
@@ -33,7 +29,7 @@ This project introduces **WinTok**, a concise hybrid visual tokenizer designed t
|
|
| 33 |
|
| 34 |
### 🛠️ Installation
|
| 35 |
- **Dependencies**:
|
| 36 |
-
```
|
| 37 |
bash env.sh
|
| 38 |
```
|
| 39 |
|
|
@@ -49,7 +45,7 @@ imagenet
|
|
| 49 |
```
|
| 50 |
|
| 51 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 52 |
-
```
|
| 53 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 54 |
```
|
| 55 |
|
|
@@ -63,7 +59,7 @@ MSCOCO2017
|
|
| 63 |
```
|
| 64 |
|
| 65 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 66 |
-
```
|
| 67 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 68 |
```
|
| 69 |
|
|
@@ -71,6 +67,17 @@ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
|
| 71 |
### Inference
|
| 72 |
|
| 73 |
Simply test the effect of model reconstruction:
|
| 74 |
-
```
|
| 75 |
python recon.py --ckpt_path path_to_ckpt
|
| 76 |
```
|
|
|
|
| 1 |
+
---
|
| 2 |
+
pipeline_tag: image-feature-extraction
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
<div align="center">
|
| 6 |
<h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
|
| 7 |
|
| 8 |
+
[arXiv](https://arxiv.org/abs/2605.18115)
|
| 9 |
+
[GitHub](https://github.com/markywg/WinTok)
|
| 10 |
[Hugging Face](https://huggingface.co/markyw/WinTok/tree/main)
|
| 11 |
</div>
|
| 12 |
|
| 13 |
This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
|
| 14 |
|
| 15 |
+
> [WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens](https://huggingface.co/papers/2605.18115)<br>
|
| 16 |
+
> Yiwei Guo, Shaobin Zhuang, Canmiao Fu, Zhipeng Huang, Chen Li, Jing LYU, Yali Wang<br>
|
| 17 |
> Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
|
| 18 |
|
| 19 |
<p align="center">
|
| 20 |
<img src="./assets/visualization.jpg" width="90%">
|
|
|
|
| 29 |
|
| 30 |
### 🛠️ Installation
|
| 31 |
- **Dependencies**:
|
| 32 |
+
```bash
|
| 33 |
bash env.sh
|
| 34 |
```
|
| 35 |
|
|
|
|
| 45 |
```
|
| 46 |
|
| 47 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 48 |
+
```bash
|
| 49 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 50 |
```
|
| 51 |
|
|
|
|
| 59 |
```
|
| 60 |
|
| 61 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 62 |
+
```bash
|
| 63 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 64 |
```
|
| 65 |
|
|
|
|
| 67 |
### Inference
|
| 68 |
|
| 69 |
Simply test the effect of model reconstruction:
|
| 70 |
+
```bash
|
| 71 |
python recon.py --ckpt_path path_to_ckpt
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
## Citation
|
| 75 |
+
|
| 76 |
+
```bibtex
|
| 77 |
+
@article{guo2026wintok,
|
| 78 |
+
title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
|
| 79 |
+
author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
|
| 80 |
+
journal={arXiv preprint arXiv:2605.18115},
|
| 81 |
+
year={2026}
|
| 82 |
+
}
|
| 83 |
```
|
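
The metadata block added at the top of README.md is a front matter section delimited by `---` lines. A minimal sketch for checking that such a block is present and carries the expected `pipeline_tag`, assuming only flat `key: value` lines (this is not a full YAML parser, and `read_front_matter` is a hypothetical helper name, not part of the WinTok repository or of any Hugging Face library):

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Return key/value pairs from a leading '---' block, or {} if there is none.

    Assumption: the block holds only flat `key: value` lines, as in this diff.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter found: treat the file as having no front matter.
    return {}


# The first lines of README.md as they appear on the new side of the diff.
readme = """---
pipeline_tag: image-feature-extraction
---

<div align="center">
"""

print(read_front_matter(readme))
```

A README without the leading `---` line, such as the old side of this diff, yields an empty dictionary.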