Add model card and image-classification metadata (#1)

- Add model card and image-classification metadata (055b795647bd7e7e2c19c162b2a9b3e63bcb5387)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)

README.md +66 -3

README.md CHANGED

@@ -1,3 +1,66 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: image-classification
+tags:
+- vision
+- vit
+- image-classification
+---
+# Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers (ICLR 2026)
+This repository contains the weights for **Jumbo**, a simple and scalable architecture that makes Vision Transformers (ViTs) faster. Jumbo reduces patch token width while increasing global token width through a new "Jumbo" token processed by a shared, wider FFN.
+- **Paper:** [Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers](https://arxiv.org/abs/2502.15021)
+- **GitHub Repository:** [https://github.com/antofuller/jumbo](https://github.com/antofuller/jumbo)
+## Model Description
+ViTs are general and accurate, but often slow. Jumbo addresses this by narrowing the patch tokens and adding a single, wider Jumbo token that is processed by its own wider FFN. This adds model capacity at low cost: the Jumbo FFN processes only one token, which keeps it fast, and its parameters are shared across all layers, which keeps memory use down. Jumbo remains attention-only and non-hierarchical, so it stays compatible with methods designed for plain ViTs.
+## ImageNet-1K Performance
+The following accuracies were achieved on ImageNet-1K:
+| Model | Top-1 Accuracy |
+| :--- | :--- |
+| Jumbo-pico | 69.156% |
+| Jumbo-nano | 74.528% |
+| Jumbo-tiny | 78.366% |
+| Jumbo-small | 82.558% |
+| Jumbo-base | 84.954% |
+## Usage
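+To install dependencies, run ImageNet-1K evaluation, measure speed, or visualize attention maps, follow the instructions in the official repository. The commands below assume you are in a local clone of that repository; replace `YOUR_PATH` with your own directories.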
+### Installation
+```bash
+pip install -r requirements.txt
+```
+### Evaluation
+```bash
+python eval_i1k.py --model_path YOUR_PATH/jumbo_small.pth --model_size small
+```
+### Measuring Speed
+```bash
+python measure_speed.py --model_size small
+```
+### Visualizing Attention Maps
+```bash
+python visualize_attn.py --model_path YOUR_PATH/jumbo_small.pth --model_size small --out_dir YOUR_PATH/attn_maps --num_images 50
+```
+## Citation
+```bibtex
+@article{fuller2025thicker,
+  title={Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers},
+  author={Fuller, Anthony and Yassin, Yousef and Kyrollos, Daniel G. and Shelhamer, Evan and Green, James R.},
+  journal={arXiv preprint arXiv:2502.15021},
+  year={2025}
+}
+```
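
Note on the Model Description in the card above: it states that the Jumbo FFN is wider than the patch FFN but that its parameters are shared across all layers. A rough way to see why the sharing matters for memory is to count FFN parameters with and without it. This is a minimal sketch, not the authors' code. The widths, depth, and expansion ratio are hypothetical values chosen for illustration, and it assumes a standard two-layer FFN with biases and one unshared patch FFN per layer.

```python
def ffn_params(width: int, expansion: int = 4) -> int:
    """Parameter count of a two-layer FFN: width -> expansion*width -> width, with biases."""
    hidden = expansion * width
    first = width * hidden + hidden   # weights + biases of the first linear layer
    second = hidden * width + width   # weights + biases of the second linear layer
    return first + second


def total_ffn_params(patch_width: int, jumbo_width: int, depth: int,
                     share_jumbo: bool = True) -> int:
    """Total FFN parameters for `depth` layers.

    Assumption: each layer has its own patch FFN, while the Jumbo FFN is
    stored once if shared, or once per layer otherwise.
    """
    patch = depth * ffn_params(patch_width)
    jumbo_copies = 1 if share_jumbo else depth
    return patch + jumbo_copies * ffn_params(jumbo_width)


if __name__ == "__main__":
    # Hypothetical configuration, not taken from the paper or repository.
    patch_width, jumbo_width, depth = 192, 1152, 12
    shared = total_ffn_params(patch_width, jumbo_width, depth, share_jumbo=True)
    unshared = total_ffn_params(patch_width, jumbo_width, depth, share_jumbo=False)
    print(f"FFN parameters, Jumbo FFN shared:     {shared:,}")
    print(f"FFN parameters, Jumbo FFN per layer:  {unshared:,}")
```

Under these assumed sizes the shared variant stores the wide FFN once instead of `depth` times, which is where the memory saving described in the card comes from. The real layer sizes should be read from the official repository.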