---
license: apache-2.0
language:
- en
base_model:
- google/embeddinggemma-300m
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- mteb
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- pytorch
model-index:
- name: Supertron-embedding-300M
  results:
  - task:
      type: STS
      name: STSBenchmark
    dataset:
      name: MTEB STSBenchmark
      type: mteb/STSBenchmark
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 87.1012
  - task:
      type: STS
      name: STS12
    dataset:
      name: MTEB STS12
      type: mteb/STS12
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 80.1767
  - task:
      type: STS
      name: BIOSSES
    dataset:
      name: MTEB BIOSSES
      type: mteb/BIOSSES
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 82.9778
  - task:
      type: Retrieval
      name: NFCorpus
    dataset:
      name: MTEB NFCorpus
      type: mteb/NFCorpus
      config: default
      split: test
    metrics:
    - type: ndcg_at_10
      value: 37.074
  - task:
      type: Classification
      name: AmazonCounterfactualClassification
    dataset:
      name: MTEB AmazonCounterfactualClassification
      type: mteb/AmazonCounterfactualClassification
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 83.3415625
  - task:
      type: Clustering
      name: TwentyNewsgroupsClustering.v2
    dataset:
      name: MTEB TwentyNewsgroupsClustering.v2
      type: mteb/TwentyNewsgroupsClustering.v2
      config: default
      split: test
    metrics:
    - type: v_measure
      value: 50.01057211780597
---
# Supertron-embedding-300M: High-Efficiency Semantic Representation Model
## Model Description
Supertron-embedding-300M is a compact embedding model fine-tuned from google/embeddinggemma-300m. It is designed to produce semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering while keeping a low computational footprint (roughly 300M parameters) suitable for production environments.
* **Developed by:** Surpem
* **Model Type:** Sentence Transformer
* **Architecture:** Gemma-based Dense Transformer
* **Base Model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
* **License:** Apache 2.0
* **Language:** English (en)
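For RAG and semantic search, the core retrieval step is ranking stored document embeddings by cosine similarity to a query embedding. The minimal NumPy sketch below shows that step, assuming the embeddings were already produced with `model.encode(...)`; the `top_k_cosine` helper and the toy 3-dimensional vectors are hypothetical stand-ins, not part of this model's API. In practice, `sentence_transformers.util.semantic_search` performs the same kind of ranking.

```python
import numpy as np


def top_k_cosine(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (doc_index, cosine_score) pairs for the k documents most similar to the query."""
    # L2-normalize so that a plain dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q  # one score per document
    # argsort is ascending: reverse it and keep the first k indices
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real model.encode(...) output (hypothetical data)
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([1.0, 0.1, 0.0])
print(top_k_cosine(query, docs, k=2))
```

The retrieved indices would then be used to look up the corresponding passages before passing them to a generator.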
## Results
Supertron-embedding-300M shows competitive performance on a subset of Massive Text Embedding Benchmark (MTEB) tasks. It is particularly effective on Semantic Textual Similarity (STS) tasks, reaching a Spearman correlation of 87.10 on STSBenchmark.
| Task Category | Task Name | Metric | Score |
| :--- | :--- | :--- | ---: |
| Semantic Similarity | STSBenchmark | cos_sim_spearman | 87.10 |
| Semantic Similarity | STS12 | cos_sim_spearman | 80.18 |
| Semantic Similarity | BIOSSES | cos_sim_spearman | 82.98 |
| Retrieval | NFCorpus | NDCG@10 | 37.07 |
| Classification | AmazonCounterfactualClassification | Accuracy | 83.34 |
| Clustering | TwentyNewsgroupsClustering.v2 | V-Measure | 50.01 |
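To make two of the metric columns concrete, the sketch below computes them in plain NumPy on toy inputs: `cos_sim_spearman` (the Spearman rank correlation between the cosine similarity of each sentence pair and its gold similarity score, as in the STS rows) and NDCG@10 for a single query (the Retrieval row). The helper names and data are hypothetical, the NDCG uses a linear-gain formulation, and this is an illustration rather than the MTEB evaluation code. The table scores appear to be these values multiplied by 100.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def cos_sim_spearman(emb1, emb2, gold_scores) -> float:
    """Spearman correlation between pairwise cosine similarities and gold STS scores."""
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    cosine = np.sum(a * b, axis=1)  # one similarity per sentence pair
    # Spearman = Pearson correlation computed on the rank vectors
    return float(np.corrcoef(average_ranks(cosine), average_ranks(gold_scores))[0, 1])


def ndcg_at_k(ranked_rels, judged_rels, k: int = 10) -> float:
    """NDCG@k for one query, using linear gain.

    ranked_rels: relevance grades of the retrieved documents, in ranked order.
    judged_rels: relevance grades of all judged documents, used for the ideal ranking.
    """
    def dcg(rels) -> float:
        rels = np.asarray(rels, dtype=float)[:k]
        discounts = np.log2(np.arange(2, len(rels) + 2))  # log2(rank + 1) for rank = 1..k
        return float(np.sum(rels / discounts))

    ideal = dcg(sorted(judged_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0


# Toy data: three sentence pairs with 2-d "embeddings" and gold similarity scores
emb1 = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb2 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(cos_sim_spearman(emb1, emb2, gold_scores=[5.0, 3.0, 1.0]))

# The only relevant document was retrieved at rank 2 instead of rank 1
print(ndcg_at_k(ranked_rels=[0, 1], judged_rels=[1, 0]))
```

Because Spearman only compares rankings, a model can score well on STS even if its raw cosine values are not calibrated to the gold score scale.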
## Get Started
The model can be used directly with the `sentence-transformers` library:
```python
from sentence_transformers import SentenceTransformer
model_id = "Surpem/Supertron-embedding-300M"
# Load the model
model = SentenceTransformer(model_id)
# Sentences to compare
sentences = [
"The financial results exceeded market expectations.",
"The company reported better than expected quarterly earnings."
]
# Compute embeddings
embeddings = model.encode(sentences)
# Calculate cosine similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Semantic Similarity: {similarity.item():.4f}")
```
## Training Procedure
### Hyperparameters
* **Precision:** bfloat16
* **Max Sequence Length:** 256 tokens
* **Optimizer:** AdamW
* **Batch Size:** 256
* **Learning Rate:** 2e-5
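The hyperparameter list does not state the training data or the loss. As a hedged sketch only, the snippet below shows how these values could map onto the `sentence-transformers` (v3 or later) trainer API when fine-tuning `google/embeddinggemma-300m`. The `finetune` helper, the `HYPERPARAMS` dictionary, the pair dataset, the choice of `MultipleNegativesRankingLoss`, and treating 256 as a single-device batch size are all assumptions, not the authors' published recipe.

```python
# Values from the Hyperparameters list; the dictionary itself is a hypothetical helper
HYPERPARAMS = {
    "base_model": "google/embeddinggemma-300m",
    "max_seq_length": 256,
    "per_device_train_batch_size": 256,  # assumption: one device, so this is the full batch
    "learning_rate": 2e-5,
    "bf16": True,  # bfloat16 mixed precision
    "optim": "adamw_torch",  # AdamW as implemented in PyTorch
}


def finetune(train_dataset, output_dir: str = "supertron-finetune-sketch"):
    """Hypothetical fine-tuning sketch.

    train_dataset is assumed to be a datasets.Dataset of (anchor, positive) text pairs.
    """
    # Imported inside the function so HYPERPARAMS can be used without the training stack installed
    from sentence_transformers import (
        SentenceTransformer,
        SentenceTransformerTrainer,
        SentenceTransformerTrainingArguments,
    )
    from sentence_transformers.losses import MultipleNegativesRankingLoss

    model = SentenceTransformer(HYPERPARAMS["base_model"])
    model.max_seq_length = HYPERPARAMS["max_seq_length"]  # truncate inputs to 256 tokens

    args = SentenceTransformerTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=HYPERPARAMS["per_device_train_batch_size"],
        learning_rate=HYPERPARAMS["learning_rate"],
        bf16=HYPERPARAMS["bf16"],
        optim=HYPERPARAMS["optim"],
    )
    # Assumed loss: in-batch negatives, where every other positive in the batch acts as a negative
    loss = MultipleNegativesRankingLoss(model)
    trainer = SentenceTransformerTrainer(
        model=model, args=args, train_dataset=train_dataset, loss=loss
    )
    trainer.train()
    return model
```

With an in-batch-negatives loss of this kind, each anchor is contrasted against the other 255 positives in a batch of 256, which is one reason large batch sizes are common when fine-tuning embedding models.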
## Citation
```bibtex
@misc{surpem2026supertron,
  title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
  author={Surpem},
  year={2026},
  url={https://huggingface.co/Surpem/Supertron-embedding-300M},
}
```