# Ayase Models
Pre-downloaded model weights for Ayase modules that require non-HuggingFace downloads.
These models are hosted here because their original CDNs (dl.fbaipublicfiles.com, openaipublic.azureedge.net, GitHub releases) can be unreliable or blocked on some servers.
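Several of these checkpoints (for example the RAFT weights) are normally fetched at runtime by `torch.hub`, which caches downloads under `$TORCH_HOME/hub/checkpoints` (default `~/.cache/torch`). One way to use the pre-downloaded copies is to seed that cache so nothing is fetched from the original CDNs. A minimal sketch, assuming a local checkout of this repository; the helper name and the `repo_root` argument are illustrative, not part of Ayase:

```python
import os
import shutil
from pathlib import Path

def seed_torch_hub_cache(repo_root: str) -> list[str]:
    """Copy pre-downloaded .pth checkpoints into torch.hub's checkpoint
    cache ($TORCH_HOME/hub/checkpoints) so that
    torch.hub.load_state_dict_from_url() resolves them locally instead
    of contacting the original CDNs. Returns the filenames copied."""
    torch_home = Path(os.environ.get("TORCH_HOME",
                                     Path.home() / ".cache" / "torch"))
    cache_dir = torch_home / "hub" / "checkpoints"
    cache_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ckpt in Path(repo_root).rglob("*.pth"):
        target = cache_dir / ckpt.name
        if not target.exists():
            shutil.copy2(ckpt, target)
            copied.append(ckpt.name)
    return sorted(copied)
```

Checkpoints whose filenames embed a content hash (e.g. `raft_large_C_T_SKHT_V2-ff5fadd5.pth`) are matched by name, which is how `torch.hub` verifies its cache, so torchvision model builders should pick them up without a download.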
## Structure
```
ayase-models/
├── dover/                                      # DOVER video quality assessment (ICCV 2023)
│   ├── DOVER.pth                               # 229 MB · S-Lab License 1.0
│   └── convnext_tiny_1k_224_ema.pth            # 110 MB · MIT
├── i2v_similarity/                             # Image-to-Video similarity metrics
│   ├── ViT-B-32.safetensors                    # 338 MB · MIT (OpenAI CLIP)
│   ├── dinov2_vitb14_pretrain.pth              # 331 MB · Apache 2.0 (Meta)
│   └── alex.pth                                # 6 KB · BSD-2 (LPIPS)
├── advanced_flow/                              # RAFT optical flow (ECCV 2020)
│   ├── raft_large_C_T_SKHT_V2-ff5fadd5.pth     # 21 MB · BSD-3
│   └── raft_small_C_T_V2-01064c6d.pth          # 3.9 MB · BSD-3
├── fast_vqa/                                   # FAST-VQA quality assessment (ECCV 2022)
│   ├── FAST_VQA_3D_1_1.pth                     # 121 MB · MIT
│   ├── FAST_VQA_B_1_4.pth                      # 121 MB · MIT
│   └── FAST_VQA_M_1_4.pth                      # 105 MB · MIT
├── aesthetic_scoring/                          # LAION aesthetic predictor
│   └── sac+logos+ava1-l14-linearMSE.pth        # 3.5 MB · MIT
├── video_memorability/                         # Video memorability estimation
│   └── dinov2_vits14_pretrain.pth              # 84 MB · Apache 2.0 (Meta)
├── spectral/                                   # Spectral analysis
│   └── dinov2_vits14_pretrain.pth              # 84 MB · Apache 2.0 (Meta)
├── trajan/                                     # Point tracking (CoTracker2)
│   └── cotracker2.pth                          # 194 MB · Apache 2.0 (Meta)
├── depth_map_quality/                          # Monocular depth estimation
│   ├── dpt_swin2_tiny_256.pt                   # 164 MB · MIT (Intel ISL MiDaS)
│   └── midas_v21_small_256.pt                  # 82 MB · MIT (Intel ISL MiDaS)
├── depth_consistency/                          # Temporal depth consistency
│   ├── dpt_swin2_tiny_256.pt                   # 164 MB · MIT
│   └── midas_v21_small_256.pt                  # 82 MB · MIT
├── motion_smoothness/                          # RIFE motion smoothness
│   └── flownet.pkl                             # Motion interpolation network
├── brightvq/                                   # BrightRate / BrightVQ HDR no-reference quality
│   ├── brightrate_brightvq.pt                  # 161 MB · BrightRate regressor
│   ├── CONTRIQUE_checkpoint25.tar              # 107 MB · CONTRIQUE feature extractor
│   ├── frames_modelparameters.mat              # 8 KB · NIQE/HDR stats params
│   ├── ViT-B-32.safetensors                    # 338 MB · MIT (OpenAI CLIP)
│   ├── ViT-L-14.safetensors                    # 890 MB · MIT (OpenAI CLIP)
│   ├── CLIP-IQA+_learned_prompts-603f3273.pth  # 17 KB · CLIP-IQA+ prompts
│   ├── CLIPIQA+_RN50_512-89f5d940.pth          # 309 KB · CLIP-IQA+ RN50 weights
│   └── CLIPIQA+_ViTL14_512-e66488f2.pth        # 463 KB · CLIP-IQA+ ViT-L/14 weights
├── rqvqa/                                      # RQ-VQA rich quality-aware VQA (CVPR 2024 NTIRE)
│   ├── LIQE.pt                                 # 337 MB · LIQE feature extractor
│   └── Swin_b_384_in22k_SlowFast_Fast_LSVQ.pth # 345 MB · Swin-B + SlowFast backbone
└── song_eval/                                  # SongEval song aesthetic evaluation
    └── model.safetensors                       # 96 MB · Apache 2.0 (ASLP-lab)
```

Total: ~4.6 GB
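If the repository is synced selectively, a quick check can confirm that each module's weights are actually in place before running Ayase. A hypothetical sketch (module directories and filenames are taken from the tree above; the registry is abbreviated and the helper is not part of Ayase itself):

```python
from pathlib import Path

# Expected weight files per module, mirroring the tree above (abbreviated).
REQUIRED_WEIGHTS = {
    "dover": ["DOVER.pth", "convnext_tiny_1k_224_ema.pth"],
    "advanced_flow": [
        "raft_large_C_T_SKHT_V2-ff5fadd5.pth",
        "raft_small_C_T_V2-01064c6d.pth",
    ],
    "aesthetic_scoring": ["sac+logos+ava1-l14-linearMSE.pth"],
    # ... remaining modules follow the same pattern
}

def missing_weights(root: str) -> dict[str, list[str]]:
    """Return, per module, the weight files not yet present under root.
    An empty result means every registered checkpoint was found."""
    root_path = Path(root)
    missing: dict[str, list[str]] = {}
    for module, files in REQUIRED_WEIGHTS.items():
        absent = [f for f in files if not (root_path / module / f).is_file()]
        if absent:
            missing[module] = absent
    return missing
```

Checking by exact filename is deliberate: several loaders (torchvision, CLIP-IQA) validate checkpoints by the hash embedded in the name, so a renamed file would fail at load time anyway.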
## Models hosted on HuggingFace Hub (not included here)

These models are downloaded directly via `transformers` / `open_clip` and work without re-hosting:
| Module | HF Model | License |
|---|---|---|
| semantic_alignment | openai/clip-vit-base-patch32 | MIT |
| clip_temporal | openai/clip-vit-base-patch32 | MIT |
| captioning | Salesforce/blip-image-captioning-base | BSD-3 |
| sd_reference | stabilityai/stable-diffusion-xl-base-1.0 | CreativeML Open RAIL++-M |
| action_recognition | MCG-NJU/videomae-large-finetuned-kinetics | CC-BY-NC-4.0 |
| ocr_fidelity | PaddleOCR (self-managed) | Apache 2.0 |
## License
Each subdirectory contains a LICENSE.md with attribution for the respective model weights. All models are redistributed under their original open-source licenses.