---
license: apache-2.0
language:
- en
base_model:
- google/embeddinggemma-300m
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- mteb
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- pytorch
model-index:
- name: Supertron-embedding-300M
  results:
  - task:
      type: STS
      name: STSBenchmark
    dataset:
      name: MTEB STSBenchmark
      type: mteb/STSBenchmark
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 87.1012
  - task:
      type: STS
      name: STS12
    dataset:
      name: MTEB STS12
      type: mteb/STS12
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 80.1767
  - task:
      type: STS
      name: BIOSSES
    dataset:
      name: MTEB BIOSSES
      type: mteb/BIOSSES
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 82.9778
  - task:
      type: Retrieval
      name: NFCorpus
    dataset:
      name: MTEB NFCorpus
      type: mteb/NFCorpus
      config: default
      split: test
    metrics:
    - type: ndcg_at_10
      value: 37.074
  - task:
      type: Classification
      name: AmazonCounterfactualClassification
    dataset:
      name: MTEB AmazonCounterfactualClassification
      type: mteb/AmazonCounterfactualClassification
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 83.3415625
  - task:
      type: Clustering
      name: TwentyNewsgroupsClustering.v2
    dataset:
      name: MTEB TwentyNewsgroupsClustering.v2
      type: mteb/TwentyNewsgroupsClustering.v2
      config: default
      split: test
    metrics:
    - type: v_measure
      value: 50.01057211780597
---

# Supertron-embedding-300M: High-Efficiency Semantic Representation Model

## Model Description

Supertron-embedding-300M is a compact embedding model fine-tuned from google/embeddinggemma-300m. It produces semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering, while keeping a computational footprint small enough for production environments.

* **Developed by:** Surpem
* **Model Type:** Sentence Transformer
* **Architecture:** Gemma-based Dense Transformer
* **Base Model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
* **License:** Apache 2.0
* **Language:** English (en)

## Results

Supertron-embedding-300M was evaluated on a selection of tasks from the Massive Text Embedding Benchmark (MTEB). Its strongest results are on Semantic Textual Similarity (STS) tasks. All scores below are reported on each task's test split.

| Task Category | Task Name | Metric | Score |
| :--- | :--- | :--- | :--- |
| Semantic Similarity | STSBenchmark | cos_sim_spearman | 87.10 |
| Semantic Similarity | STS12 | cos_sim_spearman | 80.18 |
| Semantic Similarity | BIOSSES | cos_sim_spearman | 82.98 |
| Retrieval | NFCorpus | NDCG@10 | 37.07 |
| Classification | AmazonCounterfactual | Accuracy | 83.34 |
| Clustering | TwentyNewsgroups | V-Measure | 50.01 |
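
For readers unfamiliar with the `cos_sim_spearman` metric used for the STS rows, the following is a minimal sketch of how such a score can be computed: take the cosine similarity of each sentence pair's embeddings, then measure the Spearman rank correlation between those similarities and the human-annotated gold scores. The helper names, the toy 2-D vectors, and the gold scores are illustrative assumptions, not MTEB's implementation or data from this model. The factor of 100 is assumed only to match the scale of the table above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cos_sim_spearman(emb_a, emb_b, gold):
    """Spearman correlation (x100) between pairwise cosine similarity and gold scores."""
    sims = [cosine(u, v) for u, v in zip(emb_a, emb_b)]
    # Spearman correlation is the Pearson correlation of the ranks.
    return 100 * pearson(average_ranks(sims), average_ranks(gold))

# Hypothetical toy data: three sentence pairs with made-up 2-D "embeddings".
emb_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold = [5.0, 3.0, 0.5]  # made-up human similarity ratings

score = cos_sim_spearman(emb_a, emb_b, gold)
print(f"cos_sim_spearman: {score:.2f}")
```

Because the metric depends only on rank order, it rewards a model for ordering pairs the way annotators do, regardless of the absolute similarity values it produces.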

## Get Started

The model loads directly through the `sentence-transformers` library. The example below encodes two sentences and compares them by cosine similarity.

```python
from sentence_transformers import SentenceTransformer

model_id = "surpem/Supertron-embedding-300M"

# Load the model
model = SentenceTransformer(model_id)

# Define target text
sentences = [
    "The financial results exceeded market expectations.",
    "The company reported better than expected quarterly earnings."
]

# Compute embeddings
embeddings = model.encode(sentences)

# Calculate cosine similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Semantic Similarity: {similarity.item():.4f}")
```

## Training Procedure

### Hyperparameters

* **Precision:** bfloat16
* **Max Sequence Length:** 256 tokens
* **Optimizer:** AdamW
* **Batch Size:** 256
* **Learning Rate:** 2e-5

## Citation

```bibtex
@misc{surpem2026supertron,
      title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/surpem/Supertron-embedding-300M},
}
```