Chinese-CLIP ViT-B/16 (ONNX)

Pre-converted ONNX exports of OFA-Sys/chinese-clip-vit-base-patch16, split into separate vision and text encoders.

Files

File                 Description
cn_clip_vision.onnx  Vision encoder (ViT-B/16 + visual projection), ~330 MB
cn_clip_text.onnx    Text encoder (BERT + text projection), ~391 MB

Usage

import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Fetch both ONNX files from the Hub into a local directory.
for f in ("cn_clip_vision.onnx", "cn_clip_text.onnx"):
    hf_hub_download("felixdu/chinese-clip-vit-base-patch16-onnx", f, local_dir="cn-clip-b16-onnx")

# One inference session per encoder.
vis_session = ort.InferenceSession("cn-clip-b16-onnx/cn_clip_vision.onnx")
txt_session = ort.InferenceSession("cn-clip-b16-onnx/cn_clip_text.onnx")
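
Before wiring up preprocessing, it can help to confirm each model's expected tensor names, shapes, and dtypes. The quick check below uses only the two sessions created above:

# Inspect declared inputs/outputs of each session.
for label, sess in (("vision", vis_session), ("text", txt_session)):
    for i in sess.get_inputs():
        print(label, "input: ", i.name, i.shape, i.type)
    for o in sess.get_outputs():
        print(label, "output:", o.name, o.shape, o.type)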

Details

  • Base model: OFA-Sys/chinese-clip-vit-base-patch16
  • ONNX opset: 18
  • Image input: 224x224 RGB, normalized with CLIP mean/std (see the preprocessing sketch below)
  • Text tokenizer: BertTokenizerFast from OFA-Sys/chinese-clip-vit-base-patch16 (max_length=52)
  • Output dim: 512 (L2-normalized)
  • Verified: PyTorch vs ONNX cosine similarity = 1.000000 for both vision and text
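
The bullets above translate directly into a preprocessing and inference pipeline. The sketch below is a minimal end-to-end example, with some assumptions: the mean/std values are the standard OpenAI CLIP constants (my reading of the "CLIP mean/std" bullet), the image path and captions are placeholders, and the text feed dict is built from the session's declared input names on the assumption that the export takes BERT-style inputs (input_ids, attention_mask, token_type_ids); check txt_session.get_inputs() if yours differ.

import numpy as np
from PIL import Image
from transformers import BertTokenizerFast

# Tokenizer named in the Details section above.
tokenizer = BertTokenizerFast.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")

# Standard OpenAI CLIP normalization constants (assumed from the "CLIP mean/std" bullet).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def embed_image(path):
    # Resize to 224x224 RGB, scale to [0, 1], normalize, move channels first.
    img = Image.open(path).convert("RGB").resize((224, 224), Image.BICUBIC)
    x = (np.asarray(img, dtype=np.float32) / 255.0 - CLIP_MEAN) / CLIP_STD
    x = x.transpose(2, 0, 1)[None]  # (1, 3, 224, 224)
    name = vis_session.get_inputs()[0].name  # read the input name from the model
    return vis_session.run(None, {name: x})[0]  # (1, 512), L2-normalized

def embed_text(texts):
    # Pad/truncate to max_length=52, as noted in Details.
    enc = tokenizer(texts, padding="max_length", max_length=52,
                    truncation=True, return_tensors="np")
    # Feed only the tensors the exported graph actually declares.
    feeds = {i.name: enc[i.name].astype(np.int64)
             for i in txt_session.get_inputs() if i.name in enc}
    return txt_session.run(None, feeds)[0]  # (len(texts), 512), L2-normalized

img_emb = embed_image("cat.jpg")            # placeholder image path
txt_emb = embed_text(["一只猫", "一只狗"])  # "a cat", "a dog"
print(img_emb @ txt_emb.T)

Because both encoders emit L2-normalized 512-d vectors, a plain dot product between the two embeddings is already the cosine similarity; no further normalization is needed.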