---
license: apache-2.0
language:
- en
base_model:
- google/embeddinggemma-300m
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- mteb
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- pytorch
model-index:
- name: Supertron-embedding-300M
  results:
  - task:
      type: STS
      name: STSBenchmark
    dataset:
      name: MTEB STSBenchmark
      type: mteb/STSBenchmark
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 87.1012
  - task:
      type: STS
      name: STS12
    dataset:
      name: MTEB STS12
      type: mteb/STS12
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 80.1767
  - task:
      type: STS
      name: BIOSSES
    dataset:
      name: MTEB BIOSSES
      type: mteb/BIOSSES
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 82.9778
  - task:
      type: Retrieval
      name: NFCorpus
    dataset:
      name: MTEB NFCorpus
      type: mteb/NFCorpus
      config: default
      split: test
    metrics:
    - type: ndcg_at_10
      value: 37.074
  - task:
      type: Classification
      name: AmazonCounterfactualClassification
    dataset:
      name: MTEB AmazonCounterfactualClassification
      type: mteb/AmazonCounterfactualClassification
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 83.3415625
  - task:
      type: Clustering
      name: TwentyNewsgroupsClustering.v2
    dataset:
      name: MTEB TwentyNewsgroupsClustering.v2
      type: mteb/TwentyNewsgroupsClustering.v2
      config: default
      split: test
    metrics:
    - type: v_measure
      value: 50.01057211780597
---
Supertron-embedding-300M: High-Efficiency Semantic Representation Model
Model Description
Supertron-embedding-300M is a compact embedding model fine-tuned from google/embeddinggemma-300m. It is designed to produce strong semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering while keeping a computational footprint low enough for production environments.
- Developed by: Surpem
- Model Type: Sentence Transformer
- Architecture: Gemma-based Dense Transformer
- Base Model: google/embeddinggemma-300m
- License: Apache 2.0
- Language: English (en)
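For RAG and semantic search, the model's part of the pipeline is to embed a query and a set of passages so that the passages can be ranked by similarity. A minimal sketch of that ranking step follows, assuming cosine similarity as the score. The short vectors are made-up placeholders for real `model.encode` outputs, and `top_k_by_cosine` is a hypothetical helper written for illustration, not a library function.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query: Sequence[float],
    passages: List[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query, p)) for i, p in enumerate(passages)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder 3-d vectors standing in for model.encode(...) outputs;
# real embeddings have many more dimensions.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = [
    [0.0, 1.0, 0.0],  # off-topic passage
    [0.8, 0.2, 0.1],  # closely related passage
    [0.5, 0.5, 0.0],  # partially related passage
]

hits = top_k_by_cosine(query_vec, passage_vecs, k=2)
print(hits)  # indices 1 and 2, best match first
```

In a real pipeline the top-ranked passages would be handed to the generator as context; for large collections a vector index would normally replace the linear scan shown here.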
Results
Supertron-embedding-300M shows competitive performance on a selection of tasks from the Massive Text Embedding Benchmark (MTEB). Its strongest results are on Semantic Textual Similarity (STS) tasks, including a cosine-similarity Spearman correlation of 87.10 on STSBenchmark.
| Task Category | Task Name | Metric | Score |
|---|---|---|---|
| Semantic Similarity | STSBenchmark | Spearman (cosine) | 87.10 |
| Semantic Similarity | STS12 | Spearman (cosine) | 80.18 |
| Semantic Similarity | BIOSSES | Spearman (cosine) | 82.98 |
| Retrieval | NFCorpus | nDCG@10 | 37.07 |
| Classification | AmazonCounterfactualClassification | Accuracy | 83.34 |
| Clustering | TwentyNewsgroupsClustering.v2 | V-measure | 50.01 |
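Two of the reported metrics can be reproduced in a few lines. For the STS tasks, `cos_sim_spearman` is the Spearman rank correlation between the model's cosine similarity for each sentence pair and the human-annotated gold score; for NFCorpus, `ndcg_at_10` rewards placing relevant documents near the top of the first ten results. The sketch below is an illustrative pure-Python re-implementation on toy numbers, not the MTEB evaluation code itself. As a simplification, its nDCG uses only the labels passed in to build the ideal ranking, whereas evaluation tools build it from all judged documents. The table above reports both metrics on a 0 to 100 scale.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for idx in order[start : end + 1]:
            ranks[idx] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k for one query.

    `relevances` holds graded relevance labels in the order the model ranked
    the documents. Simplification: the ideal ranking is built from these same
    labels, so every relevant document is assumed to be in the list.
    """
    def dcg(rels: Sequence[float]) -> float:
        # Linear gain with a log2 position discount (position 1 -> log2(2) = 1).
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Toy STS data: model cosine scores vs. human gold scores (0-5 scale).
cosine_scores = [0.91, 0.35, 0.62, 0.10]
gold_scores = [4.8, 2.9, 2.5, 0.2]
print("cos_sim_spearman x100:", round(100 * spearman(cosine_scores, gold_scores), 2))

# Toy retrieval run: relevance labels of the retrieved documents in ranked order.
print("ndcg_at_10 x100:", round(100 * ndcg_at_k([3, 2, 0, 1]), 2))
```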
Get Started
This model can be easily integrated using the sentence-transformers library.
```python
from sentence_transformers import SentenceTransformer

model_id = "Surpem/Supertron-embedding-300M"

# Load the model (downloaded from the Hugging Face Hub on first use)
model = SentenceTransformer(model_id)

# Define the texts to compare
sentences = [
    "The financial results exceeded market expectations.",
    "The company reported better than expected quarterly earnings.",
]

# Compute embeddings: one vector per sentence
embeddings = model.encode(sentences)

# Score the pair with the model's similarity function (cosine unless configured
# otherwise); the result is a 1x1 tensor, so .item() extracts the float
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Semantic Similarity: {similarity.item():.4f}")
```
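`model.similarity` applies the model's configured similarity function, which in Sentence Transformers defaults to cosine similarity. The plain-Python sketch below, using made-up vectors in place of `embeddings[0]` and `embeddings[1]`, checks a property that is convenient in production: cosine similarity equals the dot product of L2-normalized vectors. This is why embeddings are often normalized once at indexing time (for example via `model.encode(sentences, normalize_embeddings=True)`) so that a vector store can rank by plain inner product.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Made-up vectors standing in for embeddings[0] and embeddings[1].
emb_a = [0.3, -1.2, 0.8, 0.5]
emb_b = [0.1, -0.9, 1.1, 0.4]

cos_raw = cosine(emb_a, emb_b)
dot_normalized = dot(l2_normalize(emb_a), l2_normalize(emb_b))
print(f"{cos_raw:.6f} {dot_normalized:.6f}")  # equal up to floating-point rounding
```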
Training Procedure
Hyperparameters
- Precision: bfloat16
- Max Sequence Length: 256 tokens
- Optimizer: AdamW
- Batch Size: 256
- Learning Rate: 2e-5
Citation
```bibtex
@misc{surpem2026supertron,
  title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
  author={Surpem},
  year={2026},
  url={https://huggingface.co/Surpem/Supertron-embedding-300M},
}
```