How to use NCPS/thinkingvit_deit-3h-6h-imagenet1k with timm:

```python
import timm
model = timm.create_model("hf_hub:NCPS/thinkingvit_deit-3h-6h-imagenet1k", pretrained=True)
```

This repository contains the ImageNet-1K EMA weights for ThinkingViT DeiT 3H -> 6H, from the paper "ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference".
`state_dict_ema` · `model.safetensors`

```python
import torch
from timm.models import create_model

# Run from the ThinkingViT repository root, or put this repository on PYTHONPATH.
model = create_model("hf-hub:alihjt/thinkingvit_deit-3h-6h-imagenet1k", pretrained=True)
model.eval()

# One random 224x224 RGB image as a smoke-test input.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    # threshold is the entropy threshold that controls early exit.
    logits, stage = model(x, threshold=1.0)
print(logits.shape, stage)
```
This is a custom timm-based architecture. Use the code from the ThinkingViT repository when loading this model.
The entropy threshold controls early exit. Lower thresholds send more samples to the 6-head stage; higher thresholds exit earlier at the 3-head stage.
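The exit rule described above can be illustrated with a small, framework-free sketch. It is not the ThinkingViT implementation: the helper names (`softmax`, `entropy`, `should_exit_early`) are hypothetical, and it assumes the decision compares the entropy (in nats) of the 3-head stage's softmax output against the threshold with a strict `<`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_exit_early(first_stage_logits, threshold):
    """Assumed rule: exit at the first stage when its prediction entropy
    is strictly below the threshold, otherwise continue to the next stage."""
    return entropy(softmax(first_stage_logits)) < threshold

# Toy 4-class logits: one confident prediction, one nearly uniform.
confident = [8.0, 0.0, 0.0, 0.0]
uncertain = [1.0, 0.9, 1.1, 1.0]

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    h = entropy(softmax(logits))
    action = "exit early" if should_exit_early(logits, 1.0) else "continue"
    print(f"{name}: entropy={h:.3f} nats -> {action}")
```

Under this assumed rule, a threshold of 0.0 never exits early because entropy is non-negative, and a sufficiently large threshold always does.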
| Threshold | Acc@1 (%) | GMACs |
|---|---|---|
| 0.0 | 81.440 | 5.850 |
| 0.5 | 81.368 | 3.977 |
| 1.0 | 80.310 | 2.907 |
| 1.6 | 77.292 | 1.944 |
| 10.0 | 73.536 | 1.250 |
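The table can be used to estimate how many samples continue to the 6-head stage at each threshold. The sketch below rests on assumptions not stated in this card: that every sample costs either the threshold-10.0 figure (early exit) or the threshold-0.0 figure (both stages), so the reported GMACs are a weighted average of the two. The endpoints are at least plausible, since the entropy of a 1000-class distribution cannot exceed ln 1000 ≈ 6.91 nats (≈ 9.97 bits), which is below 10.0. The function name `continue_fraction` is hypothetical.

```python
# (threshold, top-1 accuracy in %, average GMACs) copied from the table above.
results = [
    (0.0, 81.440, 5.850),
    (0.5, 81.368, 3.977),
    (1.0, 80.310, 2.907),
    (1.6, 77.292, 1.944),
    (10.0, 73.536, 1.250),
]

# Assumed per-sample costs: early exit vs. running both stages.
early_cost = results[-1][2]
full_cost = results[0][2]

def continue_fraction(avg_gmacs):
    """Estimated share of samples that run the 6-head stage, assuming the
    average cost is a linear mix of early_cost and full_cost."""
    return (avg_gmacs - early_cost) / (full_cost - early_cost)

for threshold, acc, gmacs in results:
    frac = continue_fraction(gmacs)
    print(f"threshold={threshold:>4}: acc@1={acc:.3f}%, "
          f"GMACs={gmacs:.3f}, est. continuing={frac:.1%}")
```

Read this way, the threshold of 0.5 keeps accuracy within about 0.07 points of the full model while, by this estimate, routing roughly 59% of samples through the second stage.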
Please cite the ThinkingViT paper if you use this model: https://arxiv.org/abs/2507.10800