ThinkingViT-Swin / Swin-S ImageNet-1K

This repository contains the ImageNet-1K EMA (exponential moving average) weights for the ThinkingViT-Swin / Swin-S model from the paper ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference.

Paper: https://arxiv.org/abs/2507.10800
Code: https://github.com/ds-kiel/ThinkingViT
Project page: https://ds-kiel.github.io/ThinkingViT-project-page/
Exported checkpoint key: state_dict_ema
Weight format: model.safetensors

Usage

import torch
from timm.models import create_model

# Run from the ThinkingViT repository root, or put the ThinkingViT code on PYTHONPATH.
model = create_model("hf-hub:alihjt/thinkingvit-swin-s-imagenet1k", pretrained=True)
model.eval()

# Dummy batch: one 224x224 RGB image.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    # threshold is the entropy threshold for early exit (see Threshold Behavior).
    logits, stage = model(x, threshold=1.0)
print(logits.shape, stage)

This is a custom timm-based architecture. Use the code from the ThinkingViT repository when loading this model.

Threshold Behavior

The entropy threshold controls early exit. Lower thresholds send more samples on to the full Swin-S round, which raises accuracy and compute; higher thresholds let more samples exit after the reduced-head round, which lowers both.
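
The gating rule described above can be sketched in plain Python. This is an illustration only, not the repository's implementation: the function names `softmax`, `entropy`, and `should_exit_early` are hypothetical, and the sketch assumes that entropy is measured in nats on the first-round softmax output and that a sample exits when its entropy is strictly below the threshold.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_exit_early(first_round_logits, threshold):
    """Hypothetical gate: exit after the first round if the prediction is confident.

    Assumption: "confident" means entropy strictly below the threshold,
    so a threshold of 0.0 never exits early.
    """
    return entropy(softmax(first_round_logits)) < threshold

confident = [10.0, 0.0, 0.0]   # sharply peaked prediction, low entropy
uncertain = [0.0, 0.0, 0.0]    # uniform prediction, entropy = ln(3)

print(should_exit_early(confident, 1.0))
print(should_exit_early(uncertain, 1.0))
```

Under these assumptions, the peaked prediction exits after the first round and the uniform one continues to the full round. For 1000 classes the entropy in nats is at most ln(1000), about 6.91, so a threshold above that value would let every sample exit early.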

ImageNet-1K Results

Threshold	Acc@1 (%)	GMACs
0.0	83.516	11.68
0.5	83.386	6.76
1.0	82.124	4.88
1.6	79.746	3.53
5.0	77.990	2.82
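
One way to use the table is to pick the threshold that gives the best accuracy within a compute budget. The sketch below copies the rows above into a list; the helper `pick_threshold` is a hypothetical name introduced here for illustration and is not part of the ThinkingViT code.

```python
# (threshold, Acc@1 in %, GMACs), copied from the results table
RESULTS = [
    (0.0, 83.516, 11.68),
    (0.5, 83.386, 6.76),
    (1.0, 82.124, 4.88),
    (1.6, 79.746, 3.53),
    (5.0, 77.990, 2.82),
]

def pick_threshold(budget_gmacs):
    """Return the most accurate row whose GMACs fit the budget, or None."""
    feasible = [row for row in RESULTS if row[2] <= budget_gmacs]
    if not feasible:
        return None
    return max(feasible, key=lambda row: row[1])

print(pick_threshold(5.0))  # (1.0, 82.124, 4.88)
print(pick_threshold(2.0))  # None: no listed threshold fits this budget
```

The selection only considers the five thresholds reported here; intermediate thresholds would need to be evaluated separately.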

Citation

Please cite the ThinkingViT paper if you use this model: https://arxiv.org/abs/2507.10800


Model size: 49.7M params
Tensor type: F32

Paper: ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference (arXiv 2507.10800, published Jul 14, 2025)