raykuo188/vlm-ssm-vision-encoders-checkpoints

Released inference checkpoints for the paper "Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders."

Paper: Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

These artifacts are intended to be used with the public inference code released in the vlm-ssm-vision-encoders repository.

Each artifact contains exactly:

  • config.json
  • checkpoints/latest-checkpoint.pt
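Before extracting a roughly 25 GiB archive, it can help to confirm that it holds exactly the two files listed above. The sketch below does this with Python's standard `tarfile` module. It does not come from the authors. The helper name `artifact_layout_ok` is hypothetical. It also assumes the files sit either at the archive root or under one top-level directory; the card does not state which layout the archives use. It needs Python 3.9+ for `str.removeprefix`.

```python
import tarfile

# Files each released artifact is documented to contain.
EXPECTED_FILES = {"config.json", "checkpoints/latest-checkpoint.pt"}


def artifact_layout_ok(tar_path: str) -> bool:
    """Return True if the archive holds exactly the documented files.

    Hypothetical helper. Assumption: files are stored either at the archive
    root or under a single shared top-level directory (layout not confirmed
    by the model card).
    """
    with tarfile.open(tar_path, "r") as tar:  # "r" auto-detects compression
        names = {
            m.name.removeprefix("./")
            for m in tar.getmembers()
            if m.isfile()  # ignore directory entries
        }

    # Case 1: files stored directly at the archive root.
    if names == EXPECTED_FILES:
        return True

    # Case 2: every file shares one top-level directory; strip it and compare.
    tops = {n.split("/", 1)[0] for n in names}
    if len(tops) != 1:
        return False
    prefix = tops.pop() + "/"
    stripped = {n[len(prefix):] for n in names if n.startswith(prefix)}
    return len(stripped) == len(names) and stripped == EXPECTED_FILES
```

A call such as `artifact_layout_ok("vit-s-in1k-224.tar")` reads only member headers. It returns False if the archive has extra or missing files.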

Usage remains subject to the licenses and terms of the underlying pretrained components, including the Vicuna base model and the released vision backbones.

Released Artifacts

| Public ID | Family | Task | Artifact | Size (GiB) | SHA256 |
|---|---|---|---|---|---|
| vit-s-in1k-224 | vit | classification | vit-s-in1k-224.tar | 25.17 | 9359fc2e2bcd3a5afe5b801fcf99b964dad8becc398eac79ac118c9334340505 |
| maxvit-t-in1k-224-s3 | maxvit | classification | maxvit-t-in1k-224-s3.tar | 25.17 | 2b844750143b028f90bce4d96f00696713c15c0e332ccc7dac770fa485869f4e |
| mambavision-b-in1k-224-s3 | mambavision | classification | mambavision-b-in1k-224-s3.tar | 25.17 | c8cf2e870dbda8bd6d45b6388c4bc71e9ee8296b28cdf44c843d9e5916eeeeeb |
| vmamba-s-in1k-224-s3 | vmamba | classification | vmamba-s-in1k-224-s3.tar | 25.17 | 0b3febfb685975ea8b9e81ab3e9f2f9637b67b8cca92f52a7899ba5a1130108f |
| vitdet-b-coco-1024 | vitdet | detection | vitdet-b-coco-1024.tar | 25.18 | 15d4c2bc08c44c9fe731ca0fbdf7449833782fcc1daace1866765dd753fec1de |
| vit-adapter-deit-b-ade20k-512 | vit_adapter | segmentation | vit-adapter-deit-b-ade20k-512.tar | 25.18 | d0cbd0a1d698496bbed4f38ca05e50e8a0f2a07e265157b7053beb4c9f64a73f |
| vmamba-s-coco-1333x800 | vmamba | detection | vmamba-s-coco-1333x800.tar | 25.17 | e9629e577275c042b0a9c48e666c9af81fef7291b7d23ed2b47ca03b4b0fa118 |
| vmamba-s-ade20k-512 | vmamba | segmentation | vmamba-s-ade20k-512.tar | 25.17 | bb3eef2ba5f7b9abd2cdd0d459e88c06d05ab75d8250843a7c51c819a509b188 |
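The SHA256 column above can be used to check that a download is complete and unmodified. Below is a minimal sketch using only Python's `hashlib`. The function names `sha256_of` and `verify_artifact` are hypothetical, and so is the local file path in the usage line. The expected digests are copied from the table. The file is hashed in fixed-size chunks, so a ~25 GiB archive is never loaded into memory at once.

```python
import hashlib

# Expected digests copied from the Released Artifacts table.
EXPECTED_SHA256 = {
    "vit-s-in1k-224": "9359fc2e2bcd3a5afe5b801fcf99b964dad8becc398eac79ac118c9334340505",
    "maxvit-t-in1k-224-s3": "2b844750143b028f90bce4d96f00696713c15c0e332ccc7dac770fa485869f4e",
    "mambavision-b-in1k-224-s3": "c8cf2e870dbda8bd6d45b6388c4bc71e9ee8296b28cdf44c843d9e5916eeeeeb",
    "vmamba-s-in1k-224-s3": "0b3febfb685975ea8b9e81ab3e9f2f9637b67b8cca92f52a7899ba5a1130108f",
    "vitdet-b-coco-1024": "15d4c2bc08c44c9fe731ca0fbdf7449833782fcc1daace1866765dd753fec1de",
    "vit-adapter-deit-b-ade20k-512": "d0cbd0a1d698496bbed4f38ca05e50e8a0f2a07e265157b7053beb4c9f64a73f",
    "vmamba-s-coco-1333x800": "e9629e577275c042b0a9c48e666c9af81fef7291b7d23ed2b47ca03b4b0fa118",
    "vmamba-s-ade20k-512": "bb3eef2ba5f7b9abd2cdd0d459e88c06d05ab75d8250843a7c51c819a509b188",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_hex: str) -> bool:
    """Compare a file's digest against an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify_artifact("vit-s-in1k-224.tar", EXPECTED_SHA256["vit-s-in1k-224"])` should return True for an intact download, assuming the archive was saved under that name in the working directory. On Python 3.11+, `hashlib.file_digest` is an alternative to the manual chunk loop.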

Training and evaluation code will be released separately.

