Add metadata and improve model card

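This PR adds the `image-feature-extraction` pipeline tag to the model card metadata as YAML front matter at the top of `README.md`. It also tidies the README: the arXiv badge now shows the correct ID (2605.18115), the GitHub badge label reads `WinTok` instead of `WeTok`, the paper title links to its Hugging Face papers page, the author list drops the Google Scholar links, the shell code fences carry a `bash` language hint, and the BibTeX entry moves into a dedicated `## Citation` section. The pipeline tag should improve the visibility and discoverability of the model on the Hugging Face Hub.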
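
For readers who want to see what the metadata change amounts to, here is a minimal sketch of prepending a `pipeline_tag` entry to a README as YAML front matter. It uses only the Python standard library; the helper name `add_pipeline_tag` is hypothetical and is not part of this PR or of any Hub tooling, and it assumes the front matter, if present, is delimited by bare `---` lines.

```python
def add_pipeline_tag(readme: str, tag: str) -> str:
    """Return README text whose YAML front matter declares `pipeline_tag: <tag>`.

    Hypothetical helper for illustration only; assumes front matter is
    delimited by lines consisting solely of `---`.
    """
    lines = readme.splitlines()
    if lines and lines[0].strip() == "---":
        # Front matter already exists: locate its closing delimiter.
        try:
            end = lines.index("---", 1)
        except ValueError:
            raise ValueError("front matter is opened but never closed")
        # Drop any previous pipeline_tag entry, then append the new one.
        meta = [l for l in lines[1:end] if not l.startswith("pipeline_tag:")]
        meta.append(f"pipeline_tag: {tag}")
        new_lines = ["---", *meta, "---", *lines[end + 1:]]
    else:
        # No front matter yet: create it above the existing content.
        new_lines = ["---", f"pipeline_tag: {tag}", "---", *lines]
    return "\n".join(new_lines) + "\n"


if __name__ == "__main__":
    old = '<div align="center">\n<h1>WinTok</h1>\n'
    print(add_pipeline_tag(old, "image-feature-extraction"))
```

Applying the helper twice with the same tag leaves the text unchanged, so it is safe to re-run on a README that already carries the tag.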

Files changed (1)

README.md +23 -16

@@ -1,24 +1,20 @@
 <div align="center">
 <h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
-[![arXiv](https://img.shields.io/badge/arXiv-2508.05599-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
-[![Github](https://img.shields.io/badge/Github-WeTok-blue)](https://github.com/markywg/WinTok)
 [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/markyw/WinTok/tree/main)
 </div>
 This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
-> <a href="https://arxiv.org/abs/2605.18115">WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</a><br>
-> [Yiwei Guo](https://scholar.google.com/citations?user=HCAyeJIAAAAJ&hl=zh-CN&oi=ao), [Shaobin Zhuang](https://scholar.google.com/citations?user=PGaDirMAAAAJ&hl=zh-CN&oi=ao), Canmiao Fu, [Zhipeng Huang](https://scholar.google.com/citations?user=_fnuIHUAAAAJ&hl=zh-CN&oi=ao), [Chen Li](https://scholar.google.com/citations?hl=zh-CN&user=WDJL3gYAAAAJ), Jing LYU, [Yali Wang](https://scholar.google.com/citations?hl=zh-CN&user=hD948dkAAAAJ)<br>
 > Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
-> ```
-> @article{guo2026wintok,
->   title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
->   author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
->   journal={arXiv preprint arXiv:2605.18115},
->   year={2026}
-> }
-> ```
 <p align="center">
   <img src="./assets/visualization.jpg" width="90%">
@@ -33,7 +29,7 @@ This project introduces **WinTok**, a concise hybrid visual tokenizer designed t
 ### 🛠️ Installation
 - **Dependencies**:
-```
 bash env.sh
 ```
@@ -49,7 +45,7 @@ imagenet
 ```
 Run the 256×256 resolution evaluation script, change the corresponding path:
-```
 bash scripts/eval_tokenizer/eval_metrics_ddp.sh
 ```
@@ -63,7 +59,7 @@ MSCOCO2017
 ```
 Run the 256×256 resolution evaluation script, change the corresponding path:
-```
 bash scripts/eval_tokenizer/eval_metrics_ddp.sh
 ```
@@ -71,6 +67,17 @@ bash scripts/eval_tokenizer/eval_metrics_ddp.sh
 ### Inference
 Simply test the effect of model reconstruction:
-```
 python recon.py --ckpt_path path_to_ckpt
 ```

+---
+pipeline_tag: image-feature-extraction
+---
 <div align="center">
 <h1>WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens</h1>
+[![arXiv](https://img.shields.io/badge/arXiv-2605.18115-b31b1b.svg)](https://arxiv.org/abs/2605.18115)
+[![Github](https://img.shields.io/badge/Github-WinTok-blue)](https://github.com/markywg/WinTok)
 [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/markyw/WinTok/tree/main)
 </div>
 This project introduces **WinTok**, a concise hybrid visual tokenizer designed to resolve the long-standing conflict between visual understanding and generation. By decoupling semantic and pixel tokens with an asymmetric distillation mechanism, WinTok achieves a win-win across reconstruction, understanding, and generation, surpassing strong baselines with substantially less training data. <br><br>
+> [WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens](https://huggingface.co/papers/2605.18115)<br>
+> Yiwei Guo, Shaobin Zhuang, Canmiao Fu, Zhipeng Huang, Chen Li, Jing LYU, Yali Wang<br>
 > Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), WeChat Vision (Tencent Inc.), Shanghai Jiao Tong University<br>
 <p align="center">
   <img src="./assets/visualization.jpg" width="90%">
 ### 🛠️ Installation
 - **Dependencies**:
+```bash
 bash env.sh
 ```
 ```
 Run the 256×256 resolution evaluation script, change the corresponding path:
+```bash
 bash scripts/eval_tokenizer/eval_metrics_ddp.sh
 ```
 ```
 Run the 256×256 resolution evaluation script, change the corresponding path:
+```bash
 bash scripts/eval_tokenizer/eval_metrics_ddp.sh
 ```
 ### Inference
 Simply test the effect of model reconstruction:
+```bash
 python recon.py --ckpt_path path_to_ckpt
+```
+## Citation
+```bibtex
+@article{guo2026wintok,
+  title={WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens},
+  author={Guo, Yiwei and Zhuang, Shaobin and Huang, Zhipeng and Fu, Canmiao and Li, Chen and LYU, Jing and Wang, Yali},
+  journal={arXiv preprint arXiv:2605.18115},
+  year={2026}
+}
 ```