ThinkingViT-Swin / Swin-S ImageNet-1K

This repository contains the ImageNet-1K EMA weights for ThinkingViT-Swin / Swin-S, from the paper "ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference".

Usage

import torch
from timm.models import create_model

# Run from the ThinkingViT repository root, or put this repository on PYTHONPATH.
model = create_model("hf-hub:alihjt/thinkingvit-swin-s-imagenet1k", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits, stage = model(x, threshold=1.0)
print(logits.shape, stage)

This is a custom timm-based architecture. Use the code from the ThinkingViT repository when loading this model.

Threshold Behavior

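The entropy threshold controls early exit. Lower thresholds send more samples to the full Swin-S round; higher thresholds let more samples exit after the reduced-head round. In the results for this model, threshold 0.0 has the highest average cost (11.68 GMACs) and threshold 5.0 the lowest (2.82 GMACs).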
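The gating rule described above can be sketched in a few lines. This is a minimal illustration and not the repository's implementation: the helper names `softmax`, `entropy`, and `should_exit_early` are hypothetical, and both the natural-log entropy and the strict `<` comparison are assumptions, chosen so that a threshold of 0.0 never exits early.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_exit_early(first_round_logits, threshold):
    """Hypothetical gate: stop after the reduced-head round when the
    prediction entropy is below the threshold (assumed strict comparison)."""
    return entropy(softmax(first_round_logits)) < threshold

confident = [10.0, 0.0, 0.0]      # one dominant class, low entropy
uncertain = [0.0, 0.0, 0.0, 0.0]  # uniform prediction, entropy = ln(4)

print(should_exit_early(confident, threshold=1.0))
print(should_exit_early(uncertain, threshold=1.0))
```

In the real model the decision is presumably made per sample on batched tensors; plain Python is used here only to keep the sketch dependency-free.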

ImageNet-1K Results

| Threshold | Acc@1 (%) | GMACs |
|-----------|-----------|-------|
| 0.0       | 83.516    | 11.68 |
| 0.5       | 83.386    | 6.76  |
| 1.0       | 82.124    | 4.88  |
| 1.6       | 79.746    | 3.53  |
| 5.0       | 77.990    | 2.82  |
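One way to read the GMACs column is as a mix of two per-sample costs. The sketch below is a rough estimate rather than a reported result. It assumes the 5.0 row (2.82 GMACs) is the cost of a sample that exits after the first round and the 0.0 row (11.68 GMACs) is the cost of a sample that runs both rounds. The first assumption is only approximate: the maximum entropy over 1000 classes is ln(1000) ≈ 6.91, so a threshold of 5.0 may still send a few samples onward. The function name `continue_fraction` is hypothetical.

```python
def continue_fraction(avg_gmacs, exit_gmacs=2.82, full_gmacs=11.68):
    """Estimated share of samples that run the full Swin-S round, assuming
    avg = exit_gmacs + fraction * (full_gmacs - exit_gmacs)."""
    return (avg_gmacs - exit_gmacs) / (full_gmacs - exit_gmacs)

# (threshold, average GMACs) pairs copied from the results table.
results = [(0.0, 11.68), (0.5, 6.76), (1.0, 4.88), (1.6, 3.53), (5.0, 2.82)]

for threshold, gmacs in results:
    share = continue_fraction(gmacs)
    print(f"threshold={threshold}: ~{share:.1%} of samples continue")
```

Under these assumptions, roughly a quarter of the validation samples would continue to the second round at threshold 1.0.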

Citation

Please cite the ThinkingViT paper if you use this model: https://arxiv.org/abs/2507.10800

Model size: 49.7M params (Safetensors, tensor type F32)