shreyask committed on
Commit 6cdd82e · verified · 1 Parent(s): 1bc1978

Space README + frontmatter

Files changed (1):
  1. README.md +38 -5

README.md CHANGED
@@ -1,10 +1,43 @@
  ---
- title: Jina Omni Webgpu
- emoji: 👀
- colorFrom: yellow
- colorTo: red
  sdk: static
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: Jina v5 Omni WebGPU
+ emoji: "\U0001F3B5"
+ colorFrom: indigo
+ colorTo: blue
  sdk: static
  pinned: false
+ license: cc-by-nc-4.0
+ short_description: Cross-modal search on WebGPU with jina-embeddings-v5-omni
+ models:
+ - jinaai/jina-embeddings-v5-omni-nano
+ - shreyask/jina-embeddings-v5-omni-nano-ONNX
+ tags:
+ - multimodal
+ - cross-modal-retrieval
+ - webgpu
+ - transformers.js
+ - onnx
+ - jina-embeddings
  ---

+ # jina · omni · webgpu
+
+ In-browser cross-modal search powered by [`jinaai/jina-embeddings-v5-omni-nano`](https://huggingface.co/jinaai/jina-embeddings-v5-omni-nano), running entirely on WebGPU via [transformers.js](https://huggingface.co/docs/transformers.js) v4 + ONNX Runtime Web.
+
+ One vector space for **text, images, audio, and (eventually) video** — a text query ranks image/audio corpus items, and vice versa, without re-indexing.
+
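Because every modality lands in the same vector space, retrieval reduces to scoring the query embedding against all corpus embeddings regardless of modality. A minimal sketch of that ranking step (plain JavaScript; assumes unit-normalized embeddings from the same model, so cosine similarity is just a dot product — the `rank` helper is illustrative, not this Space's actual code):

```javascript
// Dot product of two equal-length vectors; with unit-normalized
// embeddings this equals cosine similarity.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// corpus: [{ id, modality, embedding }]; query: an embedding vector.
// Returns the topK items, highest similarity first, with scores attached.
function rank(corpus, query, topK = 5) {
  return corpus
    .map((item) => ({ ...item, score: dot(item.embedding, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Because text, image, and audio items all live in the same list, a text query can surface an image above a text snippet with no per-modality index.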
+ ## What's inside
+
+ - **Model load gate** with a precision selector (`q4f16` default — 2.14 GB, or `fp16` for cleaner numerics at 2.68 GB). All three task-specific ONNX graphs (text / vision / audio) download in parallel with per-bundle progress bars and are cached in your browser for subsequent loads.
+ - **Curated query chips** to demo cross-modal retrieval against the seeded corpus.
+ - **Result cards** that render the actual asset — image thumbnails inline, audio with a ▶/❚❚ inline play toggle. Because v5-omni's text→audio alignment is weaker than its text→image alignment, audio rarely cracks the top-K; when it doesn't, the closest audio match is appended after the top-K with an explainer.
+ - **Corpus editor** — clear the seeded 25-item corpus and add your own text / image / audio. Embedding runs in-browser through the same ONNX session the queries use.
+
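The per-bundle progress bars above come down to folding per-file download events into one overall fraction. A sketch under the assumption that each progress event carries `{ file, loaded, total }` (roughly the shape transformers.js-style progress callbacks emit; treat the field names as illustrative, not this Space's exact code):

```javascript
// Tracks the latest { loaded, total } per file and exposes an
// aggregate download fraction across all files seen so far.
function makeProgressTracker() {
  const files = new Map(); // file name -> { loaded, total } bytes

  return {
    // Record (or overwrite) the latest progress event for one file.
    update(ev) {
      files.set(ev.file, { loaded: ev.loaded, total: ev.total });
    },
    // Overall fraction in [0, 1]: total bytes loaded / total bytes expected.
    fraction() {
      let loaded = 0, total = 0;
      for (const f of files.values()) {
        loaded += f.loaded;
        total += f.total;
      }
      return total === 0 ? 0 : loaded / total;
    },
  };
}
```

One tracker per bundle (text / vision / audio) yields the three independent bars; new files joining mid-download simply grow the denominator.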
+ ## Assets + attribution
+
+ The seeded corpus mixes 12 text snippets, 10 instrument photos, and 3 audio clips — all images and audio are sourced from [Wikimedia Commons](https://commons.wikimedia.org/) with full inline artist + license + source attribution. Model license is **CC BY-NC 4.0** (inherited from the base model — non-commercial use only; contact `sales@jina.ai` for commercial).
+
+ ## Links
+
+ - Base model: [jinaai/jina-embeddings-v5-omni-nano](https://huggingface.co/jinaai/jina-embeddings-v5-omni-nano)
+ - Blog: [Jina embeddings v5 omni](https://jina.ai/news/jina-embeddings-v5-omni-multimodal-embeddings-for-text-image-audio-and-video/)
+ - ONNX weights: [shreyask/jina-embeddings-v5-omni-nano-ONNX](https://huggingface.co/shreyask/jina-embeddings-v5-omni-nano-ONNX)