Add metadata and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +23 -16
README.md CHANGED
@@ -1,24 +1,20 @@
+ ---
+ pipeline_tag: image-feature-extraction
+ ---
+
  <div align="center">
  <h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
 
- [![arXiv](https://img.shields.io/badge/arXiv-2508.05599-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
- [![Github](https://img.shields.io/badge/Github-WeTok-blue)](https://github.com/markywg/WinTok)
+ [![arXiv](https://img.shields.io/badge/arXiv-2605.18115-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
+ [![Github](https://img.shields.io/badge/Github-WinTok-blue)](https://github.com/markywg/WinTok)
  [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/markyw/WinTok/tree/main)
  </div>
 
  This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
 
- > <a href="https://arxiv.org/abs/2605.18115">WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</a><br>
- > [Yiwei Guo](https://scholar.google.com/citations?user=HCAyeJIAAAAJ&hl=zh-CN&oi=ao), [Shaobin Zhuang](https://scholar.google.com/citations?user=PGaDirMAAAAJ&hl=zh-CN&oi=ao), Canmiao Fu, [Zhipeng Huang](https://scholar.google.com/citations?user=_fnuIHUAAAAJ&hl=zh-CN&oi=ao), [Chen Li](https://scholar.google.com/citations?hl=zh-CN&user=WDJL3gYAAAAJ), Jing LYU, [Yali Wang](https://scholar.google.com/citations?hl=zh-CN&user=hD948dkAAAAJ)<br>
+ > [WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens](https://huggingface.co/papers/2605.18115)<br>
+ > Yiwei Guo, Shaobin Zhuang, Canmiao Fu, Zhipeng Huang, Chen Li, Jing LYU, Yali Wang<br>
  > Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
- > ```
- > @article{guo2026wintok,
- > title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
- > author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
- > journal={arXiv preprint arXiv:2605.18115},
- > year={2026}
- > }
- > ```
 
  <p align="center">
  <img src="./assets/visualization.jpg" width="90%">
@@ -33,7 +29,7 @@ This project introduces **WinTok**, a concise hybrid visual tokenizer designed t
 
  ### 🛠️ Installation
  - **Dependencies**:
- ```
+ ```bash
  bash env.sh
  ```
 
@@ -49,7 +45,7 @@ imagenet
  ```
 
  Run the 256×256 resolution evaluation script, change the corresponding path:
- ```
+ ```bash
  bash scripts/eval_tokenizer/eval_metrics_ddp.sh
  ```
 
@@ -63,7 +59,7 @@ MSCOCO2017
  ```
 
  Run the 256×256 resolution evaluation script, change the corresponding path:
- ```
+ ```bash
  bash scripts/eval_tokenizer/eval_metrics_ddp.sh
  ```
 
@@ -71,6 +67,17 @@ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
  ### Inference
 
  Simply test the effect of model reconstruction:
- ```
+ ```bash
  python recon.py --ckpt_path path_to_ckpt
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{guo2026wintok,
+ title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
+ author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
+ journal={arXiv preprint arXiv:2605.18115},
+ year={2026}
+ }
  ```
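
The metadata block this change adds at the top of README.md can be sanity-checked locally before merging. The following is a minimal sketch, not part of the PR or of any Hub tooling: `read_front_matter` is a hypothetical helper that only handles flat `key: value` lines between the two `---` delimiters (it is not a YAML parser), and `readme_head` is a hand-copied excerpt of the new README header.

```python
def read_front_matter(text):
    """Return flat key/value pairs from a leading '---' delimited block.

    Hypothetical helper for illustration only: it ignores nested YAML,
    lists, and quoting, which a real YAML parser would handle.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter found: the block is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the file as having no front matter.
    return {}


# Excerpt of the README header as it appears after this change.
readme_head = """---
pipeline_tag: image-feature-extraction
---

<div align="center">
"""

meta = read_front_matter(readme_head)
print(meta)  # {'pipeline_tag': 'image-feature-extraction'}
```

For anything beyond a single flat key, a proper YAML parser would be the safer choice; the sketch only confirms that the delimiters are balanced and the `pipeline_tag` value is what the diff shows.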