Commit 07aa866 · verified · committed by nielsr (HF Staff) · 1 parent: e29c335

Add metadata and improve model card

This PR adds the `image-feature-extraction` pipeline tag to the model card metadata and ensures the README is properly formatted. This will improve the visibility and discoverability of the model on the Hugging Face Hub.
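
To make the metadata change concrete, here is a minimal sketch of how the new front matter block could be read back from the README text. It assumes the block is a leading `---` ... `---` section containing only flat `key: value` pairs, as in this PR. The helper name `read_front_matter` is hypothetical, and this hand-rolled parser is not a full YAML parser or the Hub's own implementation.

```python
def read_front_matter(readme_text: str) -> dict:
    """Return flat `key: value` pairs from a leading `---` block, else {}.

    Hypothetical helper for illustration only: it handles the simple
    one-line-per-key case and ignores nested or multi-line YAML.
    """
    lines = readme_text.splitlines()
    # The metadata block must start on the very first line of the file.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the file as having no metadata.
    return {}


# The first lines of the README after this PR, reproduced as a string.
readme = (
    "---\n"
    "pipeline_tag: image-feature-extraction\n"
    "---\n"
    "\n"
    '<div align="center">\n'
)
metadata = read_front_matter(readme)
print(metadata.get("pipeline_tag"))
```

Before this PR the README began directly with `<div align="center">`, so the same helper would return an empty dictionary for the old version.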

Files changed (1)
  1. README.md +23 -16
README.md CHANGED
@@ -1,24 +1,20 @@
+ ---
+ pipeline_tag: image-feature-extraction
+ ---
+
  <div align="center">
  <h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
 
- [![arXiv](https://img.shields.io/badge/arXiv-2508.05599-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
- [![Github](https://img.shields.io/badge/Github-WeTok-blue)](https://github.com/markywg/WinTok)
+ [![arXiv](https://img.shields.io/badge/arXiv-2605.18115-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
+ [![Github](https://img.shields.io/badge/Github-WinTok-blue)](https://github.com/markywg/WinTok)
  [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/markyw/WinTok/tree/main)
  </div>
 
  This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
 
- > <a href="https://arxiv.org/abs/2605.18115">WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</a><br>
- > [Yiwei Guo](https://scholar.google.com/citations?user=HCAyeJIAAAAJ&hl=zh-CN&oi=ao), [Shaobin Zhuang](https://scholar.google.com/citations?user=PGaDirMAAAAJ&hl=zh-CN&oi=ao), Canmiao Fu, [Zhipeng Huang](https://scholar.google.com/citations?user=_fnuIHUAAAAJ&hl=zh-CN&oi=ao), [Chen Li](https://scholar.google.com/citations?hl=zh-CN&user=WDJL3gYAAAAJ), Jing LYU, [Yali Wang](https://scholar.google.com/citations?hl=zh-CN&user=hD948dkAAAAJ)<br>
+ > [WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens](https://huggingface.co/papers/2605.18115)<br>
+ > Yiwei Guo, Shaobin Zhuang, Canmiao Fu, Zhipeng Huang, Chen Li, Jing LYU, Yali Wang<br>
  > Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
- > ```
- > @article{guo2026wintok,
- > title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
- > author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
- > journal={arXiv preprint arXiv:2605.18115},
- > year={2026}
- > }
- > ```
 
  <p align="center">
  <img src="./assets/visualization.jpg" width="90%">
@@ -33,7 +29,7 @@ This project introduces **WinTok**, a concise hybrid visual tokenizer designed t
 
  ### 🛠️ Installation
  - **Dependencies**:
- ```
+ ```bash
  bash env.sh
  ```
 
@@ -49,7 +45,7 @@ imagenet
  ```
 
  Run the 256×256 resolution evaluation script, change the corresponding path:
- ```
+ ```bash
  bash scripts/eval_tokenizer/eval_metrics_ddp.sh
  ```
 
@@ -63,7 +59,7 @@ MSCOCO2017
  ```
 
  Run the 256×256 resolution evaluation script, change the corresponding path:
- ```
+ ```bash
  bash scripts/eval_tokenizer/eval_metrics_ddp.sh
  ```
 
@@ -71,6 +67,17 @@ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
  ### Inference
 
  Simply test the effect of model reconstruction:
- ```
+ ```bash
  python recon.py --ckpt_path path_to_ckpt
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{guo2026wintok,
+ title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
+ author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
+ journal={arXiv preprint arXiv:2605.18115},
+ year={2026}
+ }
  ```