---
license: apache-2.0
---

## FDViT: Improve the Hierarchical Architecture of Vision Transformer (ICCV 2023)

**Yixing Xu, Chao Li, Dong Li, Xiao Sheng, Fan Jiang, Lu Tian, Ashish Sirasao** | [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_FDViT_Improve_the_Hierarchical_Architecture_of_Vision_Transformer_ICCV_2023_paper.pdf)

Advanced Micro Devices, Inc.

---

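## Dependencies

FDViT requires the following package versions:

```bash
pip install torch==1.13.1 torchvision==0.14.1 timm==0.6.12 einops==0.6.1
```

The Model Usage example below also requires the Hugging Face `transformers` library.
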
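Because the packages are pinned to exact versions, it can help to confirm that your environment matches before loading the model. The following is a minimal sketch that uses only the Python standard library (`importlib.metadata`). The `PINS` dictionary and the `find_mismatches` helper are illustrative names and are not part of the FDViT code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the Dependencies list.
PINS = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "timm": "0.6.12",
    "einops": "0.6.1",
}


def find_mismatches(pins, get_version=version):
    """Return {package: (expected, installed)} for every package whose
    installed version differs from its pin (installed is None if missing)."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            # Drop any local version label, e.g. "1.13.1+cu117" -> "1.13.1"
            installed = get_version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    if not problems:
        print("All pinned packages match.")
    for package, (expected, installed) in problems.items():
        print(f"{package}: expected {expected}, found {installed}")
```

The `get_version` parameter defaults to `importlib.metadata.version` but can be swapped out, which keeps the comparison logic easy to test in isolation.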
## Model Performance

The table below reports the image classification results of the FDViT models on the ImageNet dataset.

|Model|Parameters (M)|FLOPs (G)|Top-1 Accuracy (%)|
|-|-|-|-|
|FDViT-Ti|4.6|0.6|73.74|
|FDViT-S|21.6|2.8|81.45|
|FDViT-B|68.1|11.9|82.39|

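## Model Usage

FDViT checkpoints can be loaded with the Hugging Face `transformers` library:

```python
import torch
from transformers import AutoModelForImageClassification

# Load FDViT-B. trust_remote_code=True is required because the FDViT
# architecture is defined in custom code hosted with the weights.
model = AutoModelForImageClassification.from_pretrained("FDViT_b", trust_remote_code=True)
model.eval()

# Dummy input: a batch of one 3-channel 224x224 image
inp = torch.ones(1, 3, 224, 224)

# Disable gradient tracking for inference
with torch.no_grad():
    out = model(inp)
```
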
## Citation

```bibtex
@inproceedings{xu2023fdvit,
  title={FDViT: Improve the Hierarchical Architecture of Vision Transformer},
  author={Xu, Yixing and Li, Chao and Li, Dong and Sheng, Xiao and Jiang, Fan and Tian, Lu and Sirasao, Ashish},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={5950--5960},
  year={2023}
}
```