---
license: apache-2.0
language:
- en
base_model:
- google/embeddinggemma-300m
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- mteb
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- pytorch
model-index:
- name: Supertron-embedding-300M
  results:
  - task:
      type: STS
      name: STSBenchmark
    dataset:
      name: MTEB STSBenchmark
      type: mteb/STSBenchmark
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 87.1012
  - task:
      type: STS
      name: STS12
    dataset:
      name: MTEB STS12
      type: mteb/STS12
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 80.1767
  - task:
      type: STS
      name: BIOSSES
    dataset:
      name: MTEB BIOSSES
      type: mteb/BIOSSES
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 82.9778
  - task:
      type: Retrieval
      name: NFCorpus
    dataset:
      name: MTEB NFCorpus
      type: mteb/NFCorpus
      config: default
      split: test
    metrics:
    - type: ndcg_at_10
      value: 37.074
  - task:
      type: Classification
      name: AmazonCounterfactualClassification
    dataset:
      name: MTEB AmazonCounterfactualClassification
      type: mteb/AmazonCounterfactualClassification
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 83.3415625
  - task:
      type: Clustering
      name: TwentyNewsgroupsClustering.v2
    dataset:
      name: MTEB TwentyNewsgroupsClustering.v2
      type: mteb/TwentyNewsgroupsClustering.v2
      config: default
      split: test
    metrics:
    - type: v_measure
      value: 50.01057211780597
---

# Supertron-embedding-300M: High-Efficiency Semantic Representation Model

## Model Description

Supertron-embedding-300M is a compact, high-performance embedding model fine-tuned from google/embeddinggemma-300m. It is designed to produce strong semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering while keeping the low computational footprint that production deployments require.

* **Developed by:** Surpem
* **Model Type:** Sentence Transformer
* **Architecture:** Gemma-based Dense Transformer
* **Base Model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
* **License:** Apache 2.0
* **Language:** English (en)
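Semantic search, one of the use cases above, comes down to ranking documents by the cosine similarity between a query embedding and each document embedding. The sketch below assumes the embeddings have already been computed (for example with `model.encode`). So that the ranking logic runs on its own, it substitutes small hypothetical vectors for real embeddings. `cosine_similarity` and `top_k` are illustrative helpers and are not part of this model's or the library's API. In practice, `sentence_transformers.util.semantic_search` performs the same kind of ranking over tensors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], corpus: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(corpus)]
    # Highest similarity first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical 3-dimensional vectors standing in for model.encode(...) output
corpus_embeddings = [
    [0.9, 0.1, 0.0],  # doc 0: points almost the same way as the query
    [0.0, 1.0, 0.0],  # doc 1: orthogonal to the query
    [0.7, 0.3, 0.1],  # doc 2: fairly close to the query
]
query_embedding = [1.0, 0.0, 0.0]

for index, score in top_k(query_embedding, corpus_embeddings, k=2):
    print(f"doc {index}: {score:.3f}")
```

Real embeddings from this model have far more dimensions, but the ranking step does not change.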

## Results

Supertron-embedding-300M performs competitively on the Massive Text Embedding Benchmark (MTEB). It is particularly effective on Semantic Textual Similarity (STS) tasks, where it compares favorably with many models in its size class. All scores below are reported on a 0 to 100 scale.

| Task Category | Task Name | Metric | Score |
| :--- | :--- | :--- | :--- |
| Semantic Similarity | STSBenchmark | cos_sim_spearman | 87.10 |
| Semantic Similarity | STS12 | cos_sim_spearman | 80.18 |
| Semantic Similarity | BIOSSES | cos_sim_spearman | 82.98 |
| Retrieval | NFCorpus | NDCG@10 | 37.07 |
| Classification | AmazonCounterfactualClassification | Accuracy | 83.34 |
| Clustering | TwentyNewsgroupsClustering.v2 | V-Measure | 50.01 |
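Each STS score in the table is a Spearman rank correlation, multiplied by 100, between the model's cosine similarities and human similarity ratings. The retrieval score is NDCG@10. The pure-Python sketch below shows both metrics. It assumes linear relevance gains and a log2 discount for NDCG, and it uses average ranks for ties. MTEB's own evaluators may differ in these details. All inputs are hypothetical, and the helper names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def ndcg_at_k(ranked_doc_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k with linear gains and a log2(rank + 1) discount (rank is 1-based)."""
    dcg = sum(
        relevance.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_doc_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical cosine similarities vs. gold ratings for four sentence pairs
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [4.8, 1.5, 3.0, 0.2]
print(f"cos_sim_spearman: {100 * spearman(predicted, gold):.2f}")

# Hypothetical ranking for one query; only d1 and d3 are relevant
print(f"ndcg_at_10: {100 * ndcg_at_k(['d3', 'd2', 'd1'], {'d1': 1.0, 'd3': 1.0}):.2f}")
```

Spearman correlation depends only on the ordering of the similarity scores, not on their absolute values. For that reason, the STS numbers are insensitive to any monotonic rescaling of cosine similarity.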

## Get Started

This model can be easily integrated using the `sentence-transformers` library.

```python
from sentence_transformers import SentenceTransformer

model_id = "surpem/Supertron-embedding-300M"

# Load the model
model = SentenceTransformer(model_id)

# Define target text
sentences = [
    "The financial results exceeded market expectations.",
    "The company reported better than expected quarterly earnings."
]

# Compute embeddings
embeddings = model.encode(sentences)

# Calculate cosine similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Semantic Similarity: {similarity.item():.4f}")
```

## Training Procedure

### Hyperparameters

* **Precision:** bfloat16
* **Max Sequence Length:** 256 tokens
* **Optimizer:** AdamW
* **Batch Size:** 256
* **Learning Rate:** 2e-5
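The hyperparameters list AdamW with a learning rate of 2e-5. To show what that optimizer does on each step, here is a minimal scalar sketch of the AdamW update, which is Adam with decoupled weight decay. The card does not state the betas, epsilon, or weight decay. The values below are assumptions taken from PyTorch's `torch.optim.AdamW` defaults. This is an illustration of the algorithm, not the author's training code.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamWState:
    m: float = 0.0  # first-moment (mean) estimate of the gradient
    v: float = 0.0  # second-moment (uncentered variance) estimate
    t: int = 0      # number of steps taken


def adamw_step(param: float, grad: float, state: AdamWState,
               lr: float = 2e-5, beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8, weight_decay: float = 0.01) -> float:
    """One AdamW update for a single scalar parameter (betas, eps, decay assumed)."""
    state.t += 1
    # Decoupled weight decay: shrink the parameter directly, not via the gradient
    param -= lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moments
    m_hat = state.m / (1 - beta1 ** state.t)
    v_hat = state.v / (1 - beta2 ** state.t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = AdamWState()
w = 1.0
w = adamw_step(w, grad=0.5, state=state)
print(f"after one step: {w:.8f}")
```

After bias correction on the first step, `m_hat / sqrt(v_hat)` equals the sign of the gradient. The first update therefore has a magnitude of about `lr`, however large or small the gradient is.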

## Citation

```bibtex
@misc{surpem2026supertron,
      title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/surpem/Supertron-embedding-300M},
}
```