# jina_v5_h256_distilled (Distilled)

A compact multilingual sentence encoder compressed from jinaai/jina-embeddings-v5-text-nano (12x compression).

## Model Details

| Property | Value |
|---|---|
| Base model | jinaai/jina-embeddings-v5-text-nano |
| Architecture | eurobert (decoder) |
| Hidden dim | 256 (from 768) |
| Layers | 6 (from 12) |
| Intermediate size | 1024 |
| Attention heads | 4 |
| KV heads | 4 |
| Vocab size | 41,778 (from 128,256) |
| Parameters | ~17.0M |
| Model size (FP32) | 64.8 MB |
| Compression | 12x |
| Distilled | Yes |

## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jina_v5_h256_distilled", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요, 잘 지내세요?",
    "こんにちは、元気ですか?",
    "你好,你好吗?",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 256)
```
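Embeddings from different languages live in the same 256-dimensional space, so cross-lingual similarity reduces to cosine similarity between rows. A minimal sketch with NumPy follows; the random array is a stand-in for the `(4, 256)` output of `model.encode` above, so the example runs without downloading the model:

```python
import numpy as np

# Stand-in for model.encode(sentences): random vectors with the same
# (4, 256) shape as the embeddings produced in the Quick Start.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 256))

# L2-normalize each row, then a matrix product yields every pairwise
# cosine similarity at once.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T
print(similarity.shape)  # (4, 4)
```

With real embeddings, `similarity[i, j]` close to 1.0 indicates that sentences `i` and `j` are near-paraphrases, regardless of language.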

## MTEB Evaluation Results

Overall average: 50.77%

| Task Group | Average |
|---|---|
| Classification | 56.35% |
| Clustering | 32.67% |
| STS | 63.30% |

### Classification

| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 65.07% | de: 68.66%, en: 66.66%, en-ext: 66.57%, ja: 58.39% |
| Banking77Classification | 76.16% | default: 76.16% |
| ImdbClassification | 67.53% | default: 67.53% |
| MTOPDomainClassification | 65.06% | en: 80.9%, es: 75.71%, fr: 70.84%, de: 70.05%, th: 46.93% |
| MassiveIntentClassification | 27.25% | zh-CN: 67.17%, en: 66.34%, ja: 63.51%, fr: 61.69%, ko: 60.2% |
| MassiveScenarioClassification | 34.0% | zh-CN: 73.44%, en: 73.16%, de: 70.43%, fr: 68.91%, ja: 68.77% |
| ToxicConversationsClassification | 58.82% | default: 58.82% |
| TweetSentimentExtractionClassification | 56.93% | default: 56.93% |

### Clustering

| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 51.55% | default: 51.55% |
| ArXivHierarchicalClusteringS2S | 48.1% | default: 48.1% |
| BiorxivClusteringP2P.v2 | 17.44% | default: 17.44% |
| MedrxivClusteringP2P.v2 | 25.15% | default: 25.15% |
| MedrxivClusteringS2S.v2 | 21.6% | default: 21.6% |
| StackExchangeClustering.v2 | 44.4% | default: 44.4% |
| StackExchangeClusteringP2P.v2 | 34.83% | default: 34.83% |
| TwentyNewsgroupsClustering.v2 | 18.27% | default: 18.27% |

### STS

| Task | Average | Details |
|---|---|---|
| BIOSSES | 59.18% | default: 59.18% |
| SICK-R | 71.03% | default: 71.03% |
| STS12 | 64.24% | default: 64.24% |
| STS13 | 70.9% | default: 70.9% |
| STS14 | 67.0% | default: 67.0% |
| STS15 | 75.87% | default: 75.87% |
| STS17 | 25.43% | en-en: 71.54%, es-es: 68.16%, ko-ko: 58.07%, ar-ar: 52.76%, fr-en: 19.09% |
| STSBenchmark | 72.71% | default: 72.71% |

## Distillation Impact

| Task | Before | After | Delta |
|---|---|---|---|
| AmazonCounterfactualClassification | 61.0% | 65.07% | +4.07%p |
| ArXivHierarchicalClusteringP2P | 47.44% | 51.55% | +4.11%p |
| ArXivHierarchicalClusteringS2S | 47.23% | 48.1% | +0.87%p |
| BIOSSES | 46.49% | 59.18% | +12.69%p |
| Banking77Classification | 40.63% | 76.16% | +35.53%p |
| BiorxivClusteringP2P.v2 | 12.75% | 17.44% | +4.69%p |
| ImdbClassification | 53.68% | 67.53% | +13.85%p |
| MTOPDomainClassification | 42.82% | 65.06% | +22.24%p |
| MassiveIntentClassification | 25.56% | 27.25% | +1.69%p |
| MassiveScenarioClassification | 26.49% | 34.0% | +7.51%p |
| MedrxivClusteringP2P.v2 | 22.35% | 25.15% | +2.8%p |
| MedrxivClusteringS2S.v2 | 19.66% | 21.6% | +1.94%p |
| SICK-R | 51.25% | 71.03% | +19.78%p |
| STS12 | 32.58% | 64.24% | +31.66%p |
| STS13 | 40.72% | 70.9% | +30.18%p |
| STS14 | 40.39% | 67.0% | +26.61%p |
| STS15 | 54.56% | 75.87% | +21.31%p |
| STS17 | 21.6% | 25.43% | +3.83%p |
| STSBenchmark | 33.56% | 72.71% | +39.15%p |
| StackExchangeClustering.v2 | 38.95% | 44.4% | +5.45%p |
| StackExchangeClusteringP2P.v2 | 32.89% | 34.83% | +1.94%p |
| ToxicConversationsClassification | 53.11% | 58.82% | +5.71%p |
| TweetSentimentExtractionClassification | 37.0% | 56.93% | +19.93%p |
| TwentyNewsgroupsClustering.v2 | 9.32% | 18.27% | +8.95%p |

## Training

### Stage 1: Model Compression

- Teacher: jinaai/jina-embeddings-v5-text-nano (12 layers, 768-dim)
- Compression: layer pruning + vocabulary pruning
- Result: 6 layers / 256-dim / 41,778-token vocabulary
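The compression step can be illustrated schematically. The helper functions below are hypothetical, not the actual pipeline: they only sketch the idea of keeping a subset of transformer layers and a subset of embedding-table rows (the real selection criteria for this model are not documented here):

```python
import numpy as np

def prune_layers(layer_ids, keep=6):
    """Keep `keep` evenly spaced layers from the original stack.

    Hypothetical strategy for illustration; the actual layer-selection
    criterion used for this model is not documented in the card.
    """
    idx = np.linspace(0, len(layer_ids) - 1, keep).round().astype(int)
    return [layer_ids[i] for i in idx]

def prune_vocab(embedding_matrix, keep_ids):
    """Keep only embedding rows for token ids that survive vocabulary pruning."""
    return embedding_matrix[keep_ids]

layers = list(range(12))               # teacher: 12 layers
surviving = prune_layers(layers)       # 6 surviving layer indices
print(surviving)

emb = np.zeros((128256, 768))          # teacher embedding table
kept = np.arange(41778)                # e.g. ids retained after corpus frequency analysis
print(prune_vocab(emb, kept).shape)    # (41778, 768)
```

Vocabulary pruning alone removes (128,256 - 41,778) x 768 FP32 embedding entries, which accounts for a large share of the 12x size reduction.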

### Stage 2: Knowledge Distillation

- Method: MSE + cosine-similarity loss
- Data: MTEB Classification/Clustering/STS task datasets
- Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
- Schedule: cosine annealing over 3 epochs
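The combined objective can be sketched in PyTorch. The linear projection (256-d student space up to the teacher's 768-d space) and the `alpha` weighting are assumptions for illustration; the card does not specify how the two loss terms are balanced:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb, teacher_emb, proj, alpha=0.5):
    """MSE + cosine-similarity distillation loss (sketch).

    `proj` maps student embeddings (256-d) into the teacher's space (768-d)
    so the two can be compared; `alpha` balances the terms. Both are
    assumptions here, not documented hyperparameters.
    """
    projected = proj(student_emb)                                  # (batch, 768)
    mse = F.mse_loss(projected, teacher_emb)                       # match magnitudes
    cos = 1.0 - F.cosine_similarity(projected, teacher_emb, dim=-1).mean()  # match directions
    return alpha * mse + (1.0 - alpha) * cos

proj = torch.nn.Linear(256, 768)
student = torch.randn(8, 256)   # student batch of 256-d embeddings
teacher = torch.randn(8, 768)   # teacher batch of 768-d embeddings
loss = distillation_loss(student, teacher, proj)
print(loss.item())
```

The cosine term is what matters most for retrieval-style use, since downstream similarity scores depend only on embedding direction; the MSE term keeps the student's magnitudes from drifting.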

## License

This model is a derivative of Jina AI's jina-embeddings-v5-text-nano. The original model is provided under the CC BY-NC 4.0 license; see jina-embeddings-v5-text-nano for details.

## Supported Languages (16)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, pl
