# Ayase Models
Pre-downloaded model weights for Ayase modules that require non-HuggingFace downloads.
These models are hosted here because their original CDNs (dl.fbaipublicfiles.com, openaipublic.azureedge.net, GitHub releases) can be unreliable or blocked on some servers.
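Several of these checkpoints (for example the RAFT weights) are normally fetched at runtime by `torch.hub`, which caches downloads under `$TORCH_HOME/hub/checkpoints` (default `~/.cache/torch`). One way to use the pre-downloaded copies is to seed that cache so nothing is fetched from the original CDNs. A minimal sketch, assuming a local checkout of this repository; the helper name and the `repo_root` argument are illustrative, not part of Ayase:

```python
import os
import shutil
from pathlib import Path

def seed_torch_hub_cache(repo_root: str) -> list[str]:
    """Copy pre-downloaded .pth checkpoints into torch.hub's checkpoint
    cache ($TORCH_HOME/hub/checkpoints) so that
    torch.hub.load_state_dict_from_url() resolves them locally instead
    of contacting the original CDNs. Returns the filenames copied."""
    torch_home = Path(os.environ.get("TORCH_HOME",
                                     Path.home() / ".cache" / "torch"))
    cache_dir = torch_home / "hub" / "checkpoints"
    cache_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ckpt in Path(repo_root).rglob("*.pth"):
        target = cache_dir / ckpt.name
        if not target.exists():
            shutil.copy2(ckpt, target)
            copied.append(ckpt.name)
    return sorted(copied)
```

Checkpoints whose filenames embed a content hash (e.g. `raft_large_C_T_SKHT_V2-ff5fadd5.pth`) are matched by name, which is how `torch.hub` verifies its cache, so torchvision model builders should pick them up without a download.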
## Structure
```
ayase-models/
├── dover/                                      # DOVER video quality assessment (ICCV 2023)
│   ├── DOVER.pth                               # 229 MB · S-Lab License 1.0
│   └── convnext_tiny_1k_224_ema.pth            # 110 MB · MIT
├── i2v_similarity/                             # Image-to-Video similarity metrics
│   ├── ViT-B-32.safetensors                    # 338 MB · MIT (OpenAI CLIP)
│   ├── dinov2_vitb14_pretrain.pth              # 331 MB · Apache 2.0 (Meta)
│   └── alex.pth                                # 6 KB · BSD-2 (LPIPS)
├── advanced_flow/                              # RAFT optical flow (ECCV 2020)
│   ├── raft_large_C_T_SKHT_V2-ff5fadd5.pth     # 21 MB · BSD-3
│   └── raft_small_C_T_V2-01064c6d.pth          # 3.9 MB · BSD-3
├── fast_vqa/                                   # FAST-VQA quality assessment (ECCV 2022)
│   ├── FAST_VQA_3D_1_1.pth                     # 121 MB · MIT
│   ├── FAST_VQA_B_1_4.pth                      # 121 MB · MIT
│   └── FAST_VQA_M_1_4.pth                      # 105 MB · MIT
├── aesthetic_scoring/                          # LAION aesthetic predictor
│   └── sac+logos+ava1-l14-linearMSE.pth        # 3.5 MB · MIT
├── video_memorability/                         # Video memorability estimation
│   └── dinov2_vits14_pretrain.pth              # 84 MB · Apache 2.0 (Meta)
├── spectral/                                   # Spectral analysis
│   └── dinov2_vits14_pretrain.pth              # 84 MB · Apache 2.0 (Meta)
├── trajan/                                     # Point tracking (CoTracker2)
│   └── cotracker2.pth                          # 194 MB · Apache 2.0 (Meta)
├── depth_map_quality/                          # Monocular depth estimation
│   ├── dpt_swin2_tiny_256.pt                   # 164 MB · MIT (Intel ISL MiDaS)
│   └── midas_v21_small_256.pt                  # 82 MB · MIT (Intel ISL MiDaS)
├── depth_consistency/                          # Temporal depth consistency
│   ├── dpt_swin2_tiny_256.pt                   # 164 MB · MIT
│   └── midas_v21_small_256.pt                  # 82 MB · MIT
├── motion_smoothness/                          # RIFE motion smoothness
│   └── flownet.pkl                             # Motion interpolation network
├── brightvq/                                   # BrightRate / BrightVQ HDR no-reference quality
│   ├── brightrate_brightvq.pt                  # 161 MB · BrightRate regressor
│   ├── CONTRIQUE_checkpoint25.tar              # 107 MB · CONTRIQUE feature extractor
│   ├── frames_modelparameters.mat              # 8 KB · NIQE/HDR stats params
│   ├── ViT-B-32.safetensors                    # 338 MB · MIT (OpenAI CLIP)
│   ├── ViT-L-14.safetensors                    # 890 MB · MIT (OpenAI CLIP)
│   ├── CLIP-IQA+_learned_prompts-603f3273.pth  # 17 KB · CLIP-IQA+ prompts
│   ├── CLIPIQA+_RN50_512-89f5d940.pth          # 309 KB · CLIP-IQA+ RN50 weights
│   └── CLIPIQA+_ViTL14_512-e66488f2.pth        # 463 KB · CLIP-IQA+ ViT-L/14 weights
├── rqvqa/                                      # RQ-VQA rich quality-aware VQA (CVPR 2024 NTIRE)
│   ├── LIQE.pt                                 # 337 MB · LIQE feature extractor
│   └── Swin_b_384_in22k_SlowFast_Fast_LSVQ.pth # 345 MB · Swin-B + SlowFast backbone
└── song_eval/                                  # SongEval song aesthetic evaluation
    └── model.safetensors                       # 96 MB · Apache 2.0 (ASLP-lab)
```

Total: ~4.6 GB
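If the repository is synced selectively, a quick check can confirm that each module's weights are actually in place before running Ayase. A hypothetical sketch (module directories and filenames are taken from the tree above; the registry is abbreviated and the helper is not part of Ayase itself):

```python
from pathlib import Path

# Expected weight files per module, mirroring the tree above (abbreviated).
REQUIRED_WEIGHTS = {
    "dover": ["DOVER.pth", "convnext_tiny_1k_224_ema.pth"],
    "advanced_flow": [
        "raft_large_C_T_SKHT_V2-ff5fadd5.pth",
        "raft_small_C_T_V2-01064c6d.pth",
    ],
    "aesthetic_scoring": ["sac+logos+ava1-l14-linearMSE.pth"],
    # ... remaining modules follow the same pattern
}

def missing_weights(root: str) -> dict[str, list[str]]:
    """Return, per module, the weight files not yet present under root.
    An empty result means every registered checkpoint was found."""
    root_path = Path(root)
    missing: dict[str, list[str]] = {}
    for module, files in REQUIRED_WEIGHTS.items():
        absent = [f for f in files if not (root_path / module / f).is_file()]
        if absent:
            missing[module] = absent
    return missing
```

Checking by exact filename is deliberate: several loaders (torchvision, CLIP-IQA) validate checkpoints by the hash embedded in the name, so a renamed file would fail at load time anyway.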
## Models hosted on HuggingFace Hub (not included here)

These models are downloaded directly via `transformers` / `open_clip` and work without re-hosting:
| Module | HF Model | License |
|---|---|---|
| semantic_alignment | openai/clip-vit-base-patch32 | MIT |
| clip_temporal | openai/clip-vit-base-patch32 | MIT |
| captioning | Salesforce/blip-image-captioning-base | BSD-3 |
| sd_reference | stabilityai/stable-diffusion-xl-base-1.0 | CreativeML Open RAIL++-M |
| action_recognition | MCG-NJU/videomae-large-finetuned-kinetics | CC-BY-NC-4.0 |
| ocr_fidelity | PaddleOCR (self-managed) | Apache 2.0 |
## License
Each subdirectory contains a LICENSE.md with attribution for the respective model weights. All models are redistributed under their original open-source licenses.