# jina_v5_h256_distilled (Distilled)

Compact multilingual sentence encoder compressed from jinaai/jina-embeddings-v5-text-nano (12x compression).
## Model Details

| Property | Value |
|---|---|
| Base model | jinaai/jina-embeddings-v5-text-nano |
| Architecture | eurobert (decoder) |
| Hidden dim | 256 (from 768) |
| Layers | 6 (from 12) |
| Intermediate size | 1024 |
| Attention heads | 4 |
| KV heads | 4 |
| Vocab size | 41,778 (from 128,256) |
| Parameters | ~17.0M |
| Model size (FP32) | 64.8 MB |
| Compression | 12x |
| Distilled | Yes |
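The parameter count and FP32 size in the table are mutually consistent. A quick sanity check, assuming 4 bytes per FP32 parameter and interpreting the reported size as MiB:

```python
# ~17.0M FP32 parameters at 4 bytes each should match the
# reported 64.8 MB model size (interpreted as MiB).
params = 17.0e6
size_mib = params * 4 / 1024**2  # bytes -> MiB
print(f"{size_mib:.1f} MiB")  # 64.8 MiB
```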
## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jina_v5_h256_distilled", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요, 잘 지내세요?",
    "こんにちは、元気ですか?",
    "你好,你好吗?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
```
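Sentence embeddings like those returned above are typically compared with cosine similarity. A minimal sketch using only NumPy; the vectors below are illustrative stand-ins, not actual model output (in practice you would pass rows of `embeddings` from `model.encode(...)`):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; real embeddings would come from model.encode(...).
emb = np.array([[1.0, 0.0, 0.5],
                [0.9, 0.1, 0.6],
                [-0.5, 1.0, 0.0]])

print(cosine_similarity(emb[0], emb[1]))  # near 1.0: similar directions
print(cosine_similarity(emb[0], emb[2]))  # negative: dissimilar
```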
## MTEB Evaluation Results

Overall average: 50.77%

| Task Group | Average |
|---|---|
| Classification | 56.35% |
| Clustering | 32.67% |
| STS | 63.3% |
### Classification

| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 65.07% | de: 68.66%, en: 66.66%, en-ext: 66.57%, ja: 58.39% |
| Banking77Classification | 76.16% | default: 76.16% |
| ImdbClassification | 67.53% | default: 67.53% |
| MTOPDomainClassification | 65.06% | en: 80.9%, es: 75.71%, fr: 70.84%, de: 70.05%, th: 46.93% |
| MassiveIntentClassification | 27.25% | zh-CN: 67.17%, en: 66.34%, ja: 63.51%, fr: 61.69%, ko: 60.2% |
| MassiveScenarioClassification | 34.0% | zh-CN: 73.44%, en: 73.16%, de: 70.43%, fr: 68.91%, ja: 68.77% |
| ToxicConversationsClassification | 58.82% | default: 58.82% |
| TweetSentimentExtractionClassification | 56.93% | default: 56.93% |
### Clustering

| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 51.55% | default: 51.55% |
| ArXivHierarchicalClusteringS2S | 48.1% | default: 48.1% |
| BiorxivClusteringP2P.v2 | 17.44% | default: 17.44% |
| MedrxivClusteringP2P.v2 | 25.15% | default: 25.15% |
| MedrxivClusteringS2S.v2 | 21.6% | default: 21.6% |
| StackExchangeClustering.v2 | 44.4% | default: 44.4% |
| StackExchangeClusteringP2P.v2 | 34.83% | default: 34.83% |
| TwentyNewsgroupsClustering.v2 | 18.27% | default: 18.27% |
### STS

| Task | Average | Details |
|---|---|---|
| BIOSSES | 59.18% | default: 59.18% |
| SICK-R | 71.03% | default: 71.03% |
| STS12 | 64.24% | default: 64.24% |
| STS13 | 70.9% | default: 70.9% |
| STS14 | 67.0% | default: 67.0% |
| STS15 | 75.87% | default: 75.87% |
| STS17 | 25.43% | en-en: 71.54%, es-es: 68.16%, ko-ko: 58.07%, ar-ar: 52.76%, fr-en: 19.09% |
| STSBenchmark | 72.71% | default: 72.71% |
## Distillation Impact

| Task | Before | After | Delta |
|---|---|---|---|
| AmazonCounterfactualClassification | 61.0% | 65.07% | +4.07%p |
| ArXivHierarchicalClusteringP2P | 47.44% | 51.55% | +4.11%p |
| ArXivHierarchicalClusteringS2S | 47.23% | 48.1% | +0.87%p |
| BIOSSES | 46.49% | 59.18% | +12.69%p |
| Banking77Classification | 40.63% | 76.16% | +35.53%p |
| BiorxivClusteringP2P.v2 | 12.75% | 17.44% | +4.69%p |
| ImdbClassification | 53.68% | 67.53% | +13.85%p |
| MTOPDomainClassification | 42.82% | 65.06% | +22.24%p |
| MassiveIntentClassification | 25.56% | 27.25% | +1.69%p |
| MassiveScenarioClassification | 26.49% | 34.0% | +7.51%p |
| MedrxivClusteringP2P.v2 | 22.35% | 25.15% | +2.8%p |
| MedrxivClusteringS2S.v2 | 19.66% | 21.6% | +1.94%p |
| SICK-R | 51.25% | 71.03% | +19.78%p |
| STS12 | 32.58% | 64.24% | +31.66%p |
| STS13 | 40.72% | 70.9% | +30.18%p |
| STS14 | 40.39% | 67.0% | +26.61%p |
| STS15 | 54.56% | 75.87% | +21.31%p |
| STS17 | 21.6% | 25.43% | +3.83%p |
| STSBenchmark | 33.56% | 72.71% | +39.15%p |
| StackExchangeClustering.v2 | 38.95% | 44.4% | +5.45%p |
| StackExchangeClusteringP2P.v2 | 32.89% | 34.83% | +1.94%p |
| ToxicConversationsClassification | 53.11% | 58.82% | +5.71%p |
| TweetSentimentExtractionClassification | 37.0% | 56.93% | +19.93%p |
| TwentyNewsgroupsClustering.v2 | 9.32% | 18.27% | +8.95%p |
## Training

### Stage 1: Model Compression

- Teacher: jinaai/jina-embeddings-v5-text-nano (12L, 768d)
- Compression: layer pruning + vocabulary pruning
- Result: 6L / 256d / 41,778 vocab
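The card does not spell out the pruning recipe, so the sketch below only illustrates the two operations on small stand-in NumPy weight matrices. The alternating layer selection and the kept-token-id set are illustrative assumptions, and the width reduction (768 to 256) would additionally require a learned projection rather than pruning alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in shapes (the real model prunes 12 layers to 6 and a
# 128,256 x 768 embedding matrix to 41,778 rows).
teacher_layers = [rng.standard_normal((8, 8)) for _ in range(12)]
embedding_matrix = rng.standard_normal((1000, 16))

# Layer pruning: keep a subset of the teacher's layers. Keeping every
# other layer is one common heuristic; the actual selection used for
# this model is not documented in the card.
student_layers = teacher_layers[::2]

# Vocab pruning: keep only token ids needed for the target languages.
# The contiguous id range here is an illustrative placeholder; in
# practice the kept ids come from tokenizing the target corpus.
kept_token_ids = np.arange(300)
pruned_matrix = embedding_matrix[kept_token_ids]

print(len(student_layers), pruned_matrix.shape)  # 6 (300, 16)
```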
### Stage 2: Knowledge Distillation

- Method: MSE + cosine similarity loss
- Data: MTEB Classification/Clustering/STS task datasets
- Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
- Schedule: cosine annealing over 3 epochs
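The stated objective combines an MSE term and a cosine-similarity term between student and teacher embeddings. A NumPy sketch of such a loss, assuming teacher embeddings have already been projected into the student's 256-dim space; the equal weighting `alpha=0.5` is an assumption, since the card only states that both terms are used:

```python
import numpy as np

def distill_loss(student: np.ndarray, teacher: np.ndarray,
                 alpha: float = 0.5) -> float:
    """MSE + cosine-similarity distillation loss over a batch.

    `alpha` balances the two terms; its value here is illustrative.
    """
    mse = np.mean((student - teacher) ** 2)
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    cos_loss = np.mean(1.0 - np.sum(s * t, axis=1))  # 0 when directions match
    return float(alpha * mse + (1 - alpha) * cos_loss)

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 256))
student = rng.standard_normal((4, 256))

print(distill_loss(teacher, teacher))  # ~0: identical embeddings
print(distill_loss(student, teacher))  # > 0: mismatched embeddings
```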
## License

This model is a derivative of Jina AI's jina-embeddings-v5-text-nano. The original model is provided under the CC BY-NC 4.0 license; see jina-embeddings-v5-text-nano for details.
## Supported Languages (16)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, pl