---
title: Jina v5 Omni WebGPU
emoji: 🎵
colorFrom: indigo
colorTo: blue
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: Cross-modal search on WebGPU with jina-embeddings-v5-omni
models:
  - jinaai/jina-embeddings-v5-omni-nano
  - onnx-community/jina-embeddings-v5-omni-nano-ONNX
tags:
  - multimodal
  - cross-modal-retrieval
  - webgpu
  - transformers.js
  - onnx
  - jina-embeddings
---

# jina · omni · webgpu

In-browser cross-modal search powered by jinaai/jina-embeddings-v5-omni-nano running entirely on WebGPU via transformers.js v4 + ONNX Runtime Web.

One vector space for text, images, audio, and (eventually) video: a text query ranks image and audio corpus items, and vice versa, without re-indexing.
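A minimal sketch of the shared-space idea: once every modality is embedded into the same vector space, retrieval reduces to cosine similarity, so a text query vector ranks image and audio items directly. The toy 3-d vectors and item names below are illustrative stand-ins for real embeddings, not the Space's actual data.

```javascript
// Cosine similarity between two same-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank corpus items against a query vector, regardless of modality.
function rank(queryVec, corpus) {
  return corpus
    .map((item) => ({ ...item, score: cosine(queryVec, item.vec) }))
    .sort((a, b) => b.score - a.score);
}

// Toy vectors stand in for real v5-omni embeddings.
const corpus = [
  { id: 'img:violin', modality: 'image', vec: [0.9, 0.1, 0.0] },
  { id: 'aud:violin', modality: 'audio', vec: [0.8, 0.2, 0.1] },
  { id: 'txt:drums',  modality: 'text',  vec: [0.0, 0.1, 0.9] },
];

const top = rank([1, 0, 0], corpus);
// top[0] is the image item; the text item ranks last.
```

Because every modality lives in one space, adding a new audio clip to the corpus needs no separate index: embed it once and it is immediately rankable against text and image queries.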

## What's inside

- Model load gate with a precision selector (default q4f16 at 2.14 GB, or fp16 at 2.68 GB for cleaner numerics). All three task-specific ONNX graphs (text / vision / audio) download in parallel with per-bundle progress bars and are cached in your browser for subsequent loads.
- Curated query chips to demo cross-modal retrieval against the seeded corpus.
- Result cards render the actual asset: image thumbnails inline, audio with an inline ▶/❚❚ play toggle. When no audio item cracks the top-K (v5-omni's text→audio alignment is weaker than its text→image alignment), the closest audio match is appended after the top-K with an explainer.
- Corpus editor: clear the seeded 25-item corpus and add your own text, images, and audio. Embedding runs in-browser through the same ONNX session the query uses.
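The load flow above can be sketched with transformers.js. The `PRECISIONS` table, `pickPrecision`, and `loadEmbedder` are hypothetical names, not the Space's actual source, and treating the model as a `feature-extraction` pipeline is an assumption; `device: 'webgpu'`, `dtype`, and `progress_callback` are real transformers.js pipeline options.

```javascript
// Illustrative sketch only: helper names and the pipeline task are assumptions.
const PRECISIONS = {
  q4f16: { dtype: 'q4f16', sizeGB: 2.14 }, // default
  fp16:  { dtype: 'fp16',  sizeGB: 2.68 }, // cleaner numerics
};

function pickPrecision(choice) {
  // Fall back to the q4f16 default for unknown selections.
  return PRECISIONS[choice] ?? PRECISIONS.q4f16;
}

async function loadEmbedder(choice) {
  const { dtype } = pickPrecision(choice);
  // Dynamic import keeps this sketch inert outside the browser.
  const { pipeline } = await import('@huggingface/transformers');
  return pipeline(
    'feature-extraction',
    'onnx-community/jina-embeddings-v5-omni-nano-ONNX',
    {
      device: 'webgpu', // ONNX Runtime Web's WebGPU backend
      dtype,            // 'q4f16' or 'fp16'
      progress_callback: (p) => {
        // Drive per-bundle progress bars; files are browser-cached after
        // the first load, so subsequent loads skip the download.
        if (p.status === 'progress') console.log(p.file, p.progress);
      },
    },
  );
}
```

The same returned pipeline object can embed both the query and any corpus item the user adds, which is how the corpus editor reuses one ONNX session for everything.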

## Assets + attribution

The seeded corpus mixes 12 text snippets, 10 instrument photos, and 3 audio clips; all images and audio are sourced from Wikimedia Commons with full inline artist, license, and source attribution. The model license is CC BY-NC 4.0, inherited from the base model: non-commercial use only; contact sales@jina.ai for commercial licensing.

## Links