Surpem
/

Supertron-embedding-300M

+---
+model-index:
+- name: Gemma-Embedding-300m-Finetuned
+  results:
+  - task:
+      type: STS
+      name: STSBenchmark
+    dataset:
+      name: MTEB STSBenchmark
+      type: mteb/STSBenchmark
+      config: default
+      split: test
+    metrics:
+    - type: cos_sim_spearman
+      value: 87.1012
+  - task:
+      type: STS
+      name: STS12
+    dataset:
+      name: MTEB STS12
+      type: mteb/STS12
+      config: default
+      split: test
+    metrics:
+    - type: cos_sim_spearman
+      value: 80.1767
+  - task:
+      type: STS
+      name: BIOSSES
+    dataset:
+      name: MTEB BIOSSES
+      type: mteb/BIOSSES
+      config: default
+      split: test
+    metrics:
+    - type: cos_sim_spearman
+      value: 82.9778
+  - task:
+      type: Retrieval
+      name: NFCorpus
+    dataset:
+      name: MTEB NFCorpus
+      type: mteb/NFCorpus
+      config: default
+      split: test
+    metrics:
+    - type: ndcg_at_10
+      value: 37.074
+  - task:
+      type: Classification
+      name: AmazonCounterfactualClassification
+    dataset:
+      name: MTEB AmazonCounterfactualClassification
+      type: mteb/AmazonCounterfactualClassification
+      config: default
+      split: test
+    metrics:
+    - type: accuracy
+      value: 83.3415625
+  - task:
+      type: Clustering
+      name: TwentyNewsgroupsClustering.v2
+    dataset:
+      name: MTEB TwentyNewsgroupsClustering.v2
+      type: mteb/TwentyNewsgroupsClustering.v2
+      config: default
+      split: test
+    metrics:
+    - type: v_measure
+      value: 50.01057211780597
+---
+# Gemma-Embedding-300m-Finetuned
+## Model Description
+This model is a fine-tuned version of google/embeddinggemma-300m, optimized for semantic textual similarity (STS), retrieval, and classification tasks. At roughly 300M parameters, it is intended to balance embedding quality against computational cost.
+- **Base Model:** google/embeddinggemma-300m
+- **Maximum Sequence Length:** 256 tokens
+- **Output Dimensionality:** 1024
+- **Language:** English
+## Evaluation Results
+The model was evaluated on six tasks from the Massive Text Embedding Benchmark (MTEB), each on its test split. The table below summarizes the scores by task category:
+| Task Category | Task Name | Metric | Score |
+| :--- | :--- | :--- | :--- |
+| Semantic Similarity | STSBenchmark | cos_sim_spearman | 87.10 |
+| Semantic Similarity | STS12 | cos_sim_spearman | 80.18 |
+| Semantic Similarity | BIOSSES | cos_sim_spearman | 82.98 |
+| Retrieval | NFCorpus | NDCG@10 | 37.07 |
+| Classification | AmazonCounterfactual | Accuracy | 83.34 |
+| Clustering | TwentyNewsgroups | V-Measure | 50.01 |
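
For readers unfamiliar with the `cos_sim_spearman` metric used in the STS rows, the sketch below shows one way such a score can be computed: take the cosine similarity of each sentence pair's embeddings, then measure the Spearman rank correlation between those similarities and the human-annotated gold scores. This is a minimal standard-library illustration, not the MTEB implementation; the function names and the toy vectors are assumptions made for the example. The result lies in [-1, 1], so the table's values appear to be this quantity multiplied by 100.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cos_sim_spearman(embeddings_a, embeddings_b, gold_scores):
    """Spearman correlation between pairwise cosine similarities and gold scores."""
    predicted = [cosine_similarity(u, v) for u, v in zip(embeddings_a, embeddings_b)]
    # Spearman correlation is the Pearson correlation of the rank vectors.
    return pearson(average_ranks(predicted), average_ranks(gold_scores))


# Toy 2-D "embeddings" for three sentence pairs (hypothetical data).
toy_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
toy_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
toy_gold = [5.0, 3.0, 0.5]  # human similarity ratings, higher means more similar

print(cos_sim_spearman(toy_a, toy_b, toy_gold))
```

Because the metric depends only on rank order, a model scores well whenever it orders the pairs the same way the annotators did, regardless of the absolute similarity values it produces.
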
+## Usage
+### Sentence-Transformers
+The model can be loaded directly with the `sentence-transformers` library:
+```python
+from sentence_transformers import SentenceTransformer
+# Load the model from the Hugging Face Hub
+# ("your-username/..." is a placeholder; replace it with the actual repository ID)
+model = SentenceTransformer("your-username/Gemma-Embedding-300m-Finetuned")
+# Define input text
+sentences = [
+    "The atmospheric conditions are favorable for flight.",
+    "The weather is good for flying today."
+]
+# Generate embeddings
+embeddings = model.encode(sentences)
+# Calculate semantic similarity
+similarity = model.similarity(embeddings[0], embeddings[1])
+print(similarity)