antofuller nielsr HF Staff committed on
Commit 167b861 · 1 Parent(s): 8510756

Add model card and image-classification metadata (#1)

- Add model card and image-classification metadata (055b795647bd7e7e2c19c162b2a9b3e63bcb5387)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1):
  1. README.md (+66, −3)
README.md CHANGED
@@ -1,3 +1,66 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-classification
+ tags:
+ - vision
+ - vit
+ - image-classification
+ ---
+
+ # Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers (ICLR 2026)
+
+ This repository contains the weights for **Jumbo**, a simple and scalable architecture that makes plain Vision Transformers (ViTs) faster. Jumbo reduces the width of the patch tokens while increasing global capacity through a new "Jumbo" token processed by a shared, wider FFN.
+
+ - **Paper:** [Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers](https://arxiv.org/abs/2502.15021)
+ - **GitHub Repository:** [https://github.com/antofuller/jumbo](https://github.com/antofuller/jumbo)
+
+ ## Model Description
+
+ ViTs are general and accurate, but often slow. Jumbo addresses this by reducing patch token width while adding a wider Jumbo token processed by its own wider FFN. This increases model capacity efficiently: the Jumbo FFN processes only a single token, so it adds little compute, and its parameters are shared across all layers, so it adds little memory. Crucially, Jumbo is attention-only and non-hierarchical, so it remains compatible with plain ViT methods.
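The single-wide-token idea described above can be sketched in a few lines. The following is an illustrative NumPy sketch, not the authors' implementation: the dimensions, the splitting of the wide Jumbo token into patch-width pieces for joint attention, the toy single-head attention, and all names (`jumbo_ffn`, `toy_attention`, `layer`) are assumptions made for illustration; the patch tokens' own narrow FFN is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

d_patch = 64           # reduced patch-token width (assumed for illustration)
k = 4                  # Jumbo token is k times wider than a patch token (assumed)
d_jumbo = k * d_patch
n_patches = 196        # e.g. a 14x14 grid of patches
d_hidden = 4 * d_jumbo

# ONE wide FFN whose weights are shared across all layers: it processes
# only the single Jumbo token, so its extra width costs little compute.
W1 = rng.normal(0.0, 0.02, (d_jumbo, d_hidden))
W2 = rng.normal(0.0, 0.02, (d_hidden, d_jumbo))

def jumbo_ffn(j):
    """Shared wide FFN applied only to the Jumbo token (ReLU stands in for GELU)."""
    return np.maximum(j @ W1, 0.0) @ W2

def toy_attention(x):
    """Minimal single-head self-attention (no learned projections)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def layer(patches, jumbo):
    # For attention, the wide Jumbo token is split into k patch-width tokens
    # so it can attend jointly with the patch tokens (plain, non-hierarchical).
    jumbo_parts = jumbo.reshape(k, d_patch)
    tokens = np.concatenate([jumbo_parts, patches], axis=0)  # (k + n_patches, d_patch)
    tokens = tokens + toy_attention(tokens)                  # attention + residual
    # Recombine the k pieces into one wide token and run the shared wide FFN.
    jumbo_out = tokens[:k].reshape(d_jumbo)
    jumbo_out = jumbo_out + jumbo_ffn(jumbo_out)
    return tokens[k:], jumbo_out

patches = rng.normal(size=(n_patches, d_patch))
jumbo = rng.normal(size=(d_jumbo,))
patches, jumbo = layer(patches, jumbo)
print(patches.shape, jumbo.shape)  # (196, 64) (256,)
```

The point of the sketch is the asymmetry: attention sees only narrow, patch-width tokens, while the one recombined Jumbo token gets a much wider FFN whose weights are reused at every layer.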
+
+ ## ImageNet-1K Performance
+
+ The following top-1 accuracies were achieved on ImageNet-1K:
+
+ | Model | Top-1 Accuracy |
+ | :--- | :--- |
+ | Jumbo-pico | 69.156% |
+ | Jumbo-nano | 74.528% |
+ | Jumbo-tiny | 78.366% |
+ | Jumbo-small | 82.558% |
+ | Jumbo-base | 84.954% |
+
+ ## Usage
+
+ For installation and for running ImageNet-1K evaluation, attention visualization, and speed measurement, follow the instructions in the official repository.
+
+ ### Installation
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Evaluation
+ ```bash
+ python eval_i1k.py --model_path YOUR_PATH/jumbo_small.pth --model_size small
+ ```
+
+ ### Measuring Speed
+ ```bash
+ python measure_speed.py --model_size small
+ ```
+
+ ### Visualizing Attention Maps
+ ```bash
+ python visualize_attn.py --model_path YOUR_PATH/jumbo_small.pth --model_size small --out_dir YOUR_PATH/attn_maps --num_images 50
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{fuller2025thicker,
+   title={Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers},
+   author={Fuller, Anthony and Yassin, Yousef and Kyrollos, Daniel G. and Shelhamer, Evan and Green, James R.},
+   journal={arXiv preprint arXiv:2502.15021},
+   year={2025}
+ }
+ ```