ThinkingViT DeiT 3H -> 6H ImageNet-1K

This repository contains the ImageNet-1K EMA weights for the ThinkingViT DeiT 3H -> 6H model from "ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference".

Usage

import torch
from timm.models import create_model

# Run from the ThinkingViT repository root, or put this repository on PYTHONPATH.
model = create_model("hf-hub:alihjt/thinkingvit_deit-3h-6h-imagenet1k", pretrained=True)
model.eval()

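# Dummy batch: one 224x224 RGB image.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    # threshold is the entropy threshold for early exit (see Threshold Behavior).
    logits, stage = model(x, threshold=1.0)
print(logits.shape, stage)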

This is a custom timm-based architecture. Use the code from the ThinkingViT repository when loading this model.

Threshold Behavior

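The entropy threshold controls early exit: a sample stops at the 3-head stage when the entropy of its 3-head prediction is below the threshold, and otherwise continues to the 6-head stage. Lower thresholds therefore send more samples to the 6-head stage (higher accuracy, more compute); higher thresholds let more samples exit early at the 3-head stage (lower accuracy, less compute).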
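To make the exit rule concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: it assumes the entropy is the Shannon entropy (in nats) of the softmax over the 3-head logits, and that a sample exits when that entropy is strictly below the threshold. The helper names `softmax_entropy` and `exits_early` are hypothetical.

```python
import math

def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def exits_early(logits, threshold):
    """Assumed rule: stop at the 3-head stage when entropy is below the threshold."""
    return softmax_entropy(logits) < threshold

# Toy 4-class logits: one peaked prediction, one completely flat prediction.
confident = [10.0, 0.0, 0.0, 0.0]
uncertain = [0.0, 0.0, 0.0, 0.0]
for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    h = softmax_entropy(logits)
    print(name, round(h, 4), exits_early(logits, threshold=1.0))
```

Under these assumptions, the entropy over 1000 classes can never exceed ln(1000), roughly 6.91 nats, so a threshold of 10.0 lets every sample exit at the 3-head stage, while a threshold of 0.0 sends every sample on to the 6-head stage.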

ImageNet-1K Results

| Threshold | Acc@1 (%) | GMACs |
|-----------|-----------|-------|
| 0.0       | 81.440    | 5.850 |
| 0.5       | 81.368    | 3.977 |
| 1.0       | 80.310    | 2.907 |
| 1.6       | 77.292    | 1.944 |
| 10.0      | 73.536    | 1.250 |
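The GMACs column can be read as a mix of two costs. The sketch below assumes that the reported GMACs are per-sample averages, that 1.250 GMACs is the cost when every sample exits at the 3-head stage, and that 5.850 GMACs is the cost when every sample runs both stages. From that it estimates the share of samples reaching the 6-head stage at each threshold. The helpers `second_stage_fraction` and `best_threshold_under` are hypothetical and only operate on the table rows above.

```python
# (threshold, top-1 accuracy in %, average GMACs) copied from the results table.
RESULTS = [
    (0.0, 81.440, 5.850),
    (0.5, 81.368, 3.977),
    (1.0, 80.310, 2.907),
    (1.6, 77.292, 1.944),
    (10.0, 73.536, 1.250),
]

def second_stage_fraction(gmacs, first_only=1.250, both=5.850):
    """Estimated share of samples that continue to the 6-head stage.

    Assumes average cost = first_only + fraction * (both - first_only).
    """
    return (gmacs - first_only) / (both - first_only)

def best_threshold_under(budget_gmacs):
    """Most accurate table row whose average GMACs fit the budget, or None."""
    feasible = [row for row in RESULTS if row[2] <= budget_gmacs]
    return max(feasible, key=lambda row: row[1])[0] if feasible else None

for threshold, acc, gmacs in RESULTS:
    print(f"threshold={threshold:<4} acc={acc:.3f} "
          f"second-stage share ~ {second_stage_fraction(gmacs):.1%}")

# Pick a threshold for an average budget of 3.0 GMACs per image.
print(best_threshold_under(3.0))
```

Under this linear-mix assumption, a threshold of 0.5 sends roughly 59% of samples to the 6-head stage while losing under 0.1 points of top-1 accuracy relative to a threshold of 0.0.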

Citation

Please cite the ThinkingViT paper if you use this model: https://arxiv.org/abs/2507.10800
