# Chinese-CLIP ViT-B/16 (ONNX)
Pre-converted ONNX models for [OFA-Sys/chinese-clip-vit-base-patch16](https://huggingface.co/OFA-Sys/chinese-clip-vit-base-patch16).
## Files
| File | Description | Size |
|---|---|---|
| `cn_clip_vision.onnx` | Vision encoder (ViT-B/16 + visual projection) | ~330 MB |
| `cn_clip_text.onnx` | Text encoder (BERT + text projection) | ~391 MB |
## Usage
```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download both encoders from the Hub
for f in ("cn_clip_vision.onnx", "cn_clip_text.onnx"):
    hf_hub_download("felixdu/chinese-clip-vit-base-patch16-onnx", f, local_dir="cn-clip-b16-onnx")

vis_session = ort.InferenceSession("cn-clip-b16-onnx/cn_clip_vision.onnx")
txt_session = ort.InferenceSession("cn-clip-b16-onnx/cn_clip_text.onnx")
```
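A minimal sketch of running the vision encoder. It assumes the "CLIP mean/std" mentioned under Details refers to the standard OpenAI CLIP normalization constants, and `demo.jpg` is a placeholder path; the exported input name is looked up from the session rather than hard-coded:

```python
import numpy as np
from PIL import Image

# Standard OpenAI CLIP normalization constants (assumed to be the
# "CLIP mean/std" referenced under Details)
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(path):
    """Load an image and produce a (1, 3, 224, 224) float32 NCHW tensor."""
    img = Image.open(path).convert("RGB").resize((224, 224), Image.BICUBIC)
    x = np.asarray(img, dtype=np.float32) / 255.0  # HWC in [0, 1]
    x = (x - MEAN) / STD                           # per-channel normalization
    return x.transpose(2, 0, 1)[None]              # HWC -> NCHW, add batch dim

pixel_values = preprocess("demo.jpg")  # placeholder example image

# Look up the exported input name instead of hard-coding it
vis_input = vis_session.get_inputs()[0].name
image_emb = vis_session.run(None, {vis_input: pixel_values})[0]  # (1, 512)
```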
## Details
- Base model: OFA-Sys/chinese-clip-vit-base-patch16
- ONNX opset: 18
- Image input: 224×224 RGB, normalized with CLIP mean/std
- Text tokenizer: `BertTokenizerFast` from OFA-Sys/chinese-clip-vit-base-patch16 with `max_length=52` (see the sketch below)
- Output dim: 512 (L2-normalized)
- Verified: PyTorch vs. ONNX cosine similarity = 1.000000 for both the vision and text encoders
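To encode text and score it against the image embedding from the sketch above, a minimal example assuming the exported graph's input names match the tokenizer's output keys (`input_ids`, `attention_mask`, possibly `token_type_ids`):

```python
import numpy as np
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")
texts = ["一只猫", "一只狗"]  # "a cat", "a dog"
batch = tokenizer(texts, padding="max_length", max_length=52,
                  truncation=True, return_tensors="np")

# Feed only the tensors the exported graph actually declares
txt_feed = {i.name: batch[i.name] for i in txt_session.get_inputs()}
text_emb = txt_session.run(None, txt_feed)[0]  # (2, 512)

# Both encoders emit L2-normalized vectors, so a plain dot product
# is already the cosine similarity
sims = image_emb @ text_emb.T  # (1, 2)
print(sims)
```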