---
license: apache-2.0
language:
- en
base_model:
- google/embeddinggemma-300m
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- mteb
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- pytorch
model-index:
- name: Supertron-embedding-300M
  results:
  - task:
      type: STS
      name: STSBenchmark
    dataset:
      name: MTEB STSBenchmark
      type: mteb/STSBenchmark
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 87.1012
  - task:
      type: STS
      name: STS12
    dataset:
      name: MTEB STS12
      type: mteb/STS12
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 80.1767
  - task:
      type: STS
      name: BIOSSES
    dataset:
      name: MTEB BIOSSES
      type: mteb/BIOSSES
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 82.9778
  - task:
      type: Retrieval
      name: NFCorpus
    dataset:
      name: MTEB NFCorpus
      type: mteb/NFCorpus
      config: default
      split: test
    metrics:
    - type: ndcg_at_10
      value: 37.074
  - task:
      type: Classification
      name: AmazonCounterfactualClassification
    dataset:
      name: MTEB AmazonCounterfactualClassification
      type: mteb/AmazonCounterfactualClassification
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 83.3415625
  - task:
      type: Clustering
      name: TwentyNewsgroupsClustering.v2
    dataset:
      name: MTEB TwentyNewsgroupsClustering.v2
      type: mteb/TwentyNewsgroupsClustering.v2
      config: default
      split: test
    metrics:
    - type: v_measure
      value: 50.01057211780597
---

# Supertron-embedding-300M: High-Efficiency Semantic Representation Model

## Model Description

Supertron-embedding-300M is a compact embedding model fine-tuned from google/embeddinggemma-300m. It is designed to produce semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering while keeping a computational footprint low enough for production environments.

* **Developed by:** Surpem
* **Model Type:** Sentence Transformer
* **Architecture:** Gemma-based Dense Transformer
* **Base Model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
* **License:** Apache 2.0
* **Language:** English (en)

## Results

Supertron-embedding-300M was evaluated on a selection of tasks from the Massive Text Embedding Benchmark (MTEB). Its highest scores are on Semantic Textual Similarity (STS) tasks, as the table below shows.

| Task Category | Task Name | Metric | Score |
| :--- | :--- | :--- | :--- |
| Semantic Similarity | STSBenchmark | cos_sim_spearman | 87.10 |
| Semantic Similarity | STS12 | cos_sim_spearman | 80.18 |
| Semantic Similarity | BIOSSES | cos_sim_spearman | 82.98 |
| Retrieval | NFCorpus | NDCG@10 | 37.07 |
| Classification | AmazonCounterfactual | Accuracy | 83.34 |
| Clustering | TwentyNewsgroups | V-Measure | 50.01 |
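
To make the `cos_sim_spearman` metric concrete, the sketch below computes it on toy data: the cosine similarity of each sentence pair is ranked and compared with the ranking of the human-annotated scores. The function names and the toy vectors are illustrative assumptions, not the MTEB implementation, and the result is on a -1 to 1 scale, whereas the table appears to report values multiplied by 100.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def cos_sim_spearman(emb_a, emb_b, gold_scores):
    """Spearman correlation between pairwise cosine similarities and gold scores."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    # Row-wise cosine similarity between the i-th vectors of each side.
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    # Spearman correlation is the Pearson correlation of the ranks.
    return float(np.corrcoef(average_ranks(cos), average_ranks(gold_scores))[0, 1])


# Toy 2-dimensional "embeddings" for three sentence pairs (illustrative only).
emb_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold = [5.0, 3.0, 0.5]  # hypothetical human similarity ratings
score = cos_sim_spearman(emb_a, emb_b, gold)
print(f"cos_sim_spearman: {score:.4f}")
```

Because the metric depends only on rank order, it rewards a model for ordering pairs the way annotators do, regardless of the absolute similarity values it produces.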

## Get Started

The model can be loaded with the `sentence-transformers` library:

```python
from sentence_transformers import SentenceTransformer

model_id = "surpem/Supertron-embedding-300M"

# Load the model
model = SentenceTransformer(model_id)

# Define target text
sentences = [
    "The financial results exceeded market expectations.",
    "The company reported better than expected quarterly earnings."
]

# Compute embeddings
embeddings = model.encode(sentences)

# Calculate cosine similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Semantic Similarity: {similarity.item():.4f}")
```

## Training Procedure

### Hyperparameters

* **Precision:** bfloat16
* **Max Sequence Length:** 256 tokens
* **Optimizer:** AdamW
* **Batch Size:** 256
* **Learning Rate:** 2e-5

## Citation

```bibtex
@misc{surpem2026supertron,
      title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/surpem/Supertron-embedding-300M}
}
```