Add metadata and improve model card
This PR adds the `image-feature-extraction` pipeline tag to the model card metadata and ensures the README is properly formatted. This will improve the visibility and discoverability of the model on the Hugging Face Hub.
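
For reviewers who want to sanity-check the metadata change locally, here is a minimal sketch that reads flat `key: value` pairs from the leading `---` block of a README. The helper name `read_front_matter` and the sample string are assumptions made for illustration and are not part of this repository. Model card metadata is YAML, so a real consumer would use a YAML parser; this sketch only handles the flat case added here.

```python
import re

# Matches a leading block delimited by lines of three dashes.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)


def read_front_matter(readme_text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from the leading metadata block, or {}."""
    match = FRONT_MATTER.match(readme_text)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        # Skip comments and anything that is not a simple key/value line.
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Hypothetical sample mirroring the first lines of the updated README.
readme = """---
pipeline_tag: image-feature-extraction
---

<div align="center">
"""

meta = read_front_matter(readme)
print(meta.get("pipeline_tag"))
```

A README without the leading block yields an empty dictionary, which makes it easy to tell whether the tag is present before merging.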
README.md
CHANGED
|
@@ -1,24 +1,20 @@
|
|
| 1 |
<div align="center">
|
| 2 |
<h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
|
| 3 |
|
| 4 |
-
[](https://huggingface.co/markyw/WinTok/tree/main)
|
| 7 |
</div>
|
| 8 |
|
| 9 |
This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
|
| 10 |
|
| 11 |
-
>
|
| 12 |
-
>
|
| 13 |
> Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
|
| 14 |
-
> ```
|
| 15 |
-
> @article{guo2026wintok,
|
| 16 |
-
> title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
|
| 17 |
-
> author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
|
| 18 |
-
> journal={arXiv preprint arXiv:2605.18115},
|
| 19 |
-
> year={2026}
|
| 20 |
-
> }
|
| 21 |
-
> ```
|
| 22 |
|
| 23 |
<p align="center">
|
| 24 |
<img src="./assets/visualization.jpg" width="90%">
|
|
@@ -33,7 +29,7 @@ This project introduces **WinTok**, a concise hybrid visual tokenizer designed t
|
|
| 33 |
|
| 34 |
### 🛠️ Installation
|
| 35 |
- **Dependencies**:
|
| 36 |
-
```
|
| 37 |
bash env.sh
|
| 38 |
```
|
| 39 |
|
|
@@ -49,7 +45,7 @@ imagenet
|
|
| 49 |
```
|
| 50 |
|
| 51 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 52 |
-
```
|
| 53 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 54 |
```
|
| 55 |
|
|
@@ -63,7 +59,7 @@ MSCOCO2017
|
|
| 63 |
```
|
| 64 |
|
| 65 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 66 |
-
```
|
| 67 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 68 |
```
|
| 69 |
|
|
@@ -71,6 +67,17 @@ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
|
| 71 |
### Inference
|
| 72 |
|
| 73 |
Simply test the effect of model reconstruction:
|
| 74 |
-
```
|
| 75 |
python recon.py --ckpt_path path_to_ckpt
|
| 76 |
```
|
|
|
|
| 1 |
+
---
|
| 2 |
+
pipeline_tag: image-feature-extraction
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
<div align="center">
|
| 6 |
<h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
|
| 7 |
|
| 8 |
+
[](https://arxiv.org/abs/2605.18115)
|
| 9 |
+
[](https://github.com/markywg/WinTok)
|
| 10 |
[](https://huggingface.co/markyw/WinTok/tree/main)
|
| 11 |
</div>
|
| 12 |
|
| 13 |
This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
|
| 14 |
|
| 15 |
+
> [WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens](https://huggingface.co/papers/2605.18115)<br>
|
| 16 |
+
> Yiwei Guo, Shaobin Zhuang, Canmiao Fu, Zhipeng Huang, Chen Li, Jing LYU, Yali Wang<br>
|
| 17 |
> Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
|
| 18 |
|
| 19 |
<p align="center">
|
| 20 |
<img src="./assets/visualization.jpg" width="90%">
|
|
|
|
| 29 |
|
| 30 |
### 🛠️ Installation
|
| 31 |
- **Dependencies**:
|
| 32 |
+
```bash
|
| 33 |
bash env.sh
|
| 34 |
```
|
| 35 |
|
|
|
|
| 45 |
```
|
| 46 |
|
| 47 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 48 |
+
```bash
|
| 49 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 50 |
```
|
| 51 |
|
|
|
|
| 59 |
```
|
| 60 |
|
| 61 |
Run the 256×256 resolution evaluation script, change the corresponding path:
|
| 62 |
+
```bash
|
| 63 |
bash scripts/eval_tokenizer/eval_metrics_ddp.sh
|
| 64 |
```
|
| 65 |
|
|
|
|
| 67 |
### Inference
|
| 68 |
|
| 69 |
Simply test the effect of model reconstruction:
|
| 70 |
+
```bash
|
| 71 |
python recon.py --ckpt_path path_to_ckpt
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
## Citation
|
| 75 |
+
|
| 76 |
+
```bibtex
|
| 77 |
+
@article{guo2026wintok,
|
| 78 |
+
title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
|
| 79 |
+
author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
|
| 80 |
+
journal={arXiv preprint arXiv:2605.18115},
|
| 81 |
+
year={2026}
|
| 82 |
+
}
|
| 83 |
```
|