vincent-benton committed on
Commit 8f6eaf6 · verified · 1 Parent(s): 27ab649

Update README.md

Files changed (1)
  1. README.md +57 -21
README.md CHANGED
@@ -1,6 +1,20 @@
  ---
  model-index:
- - name: Gemma-Embedding-300m-Finetuned
  results:
  - task:
  type: STS
@@ -70,20 +84,22 @@ model-index:
  value: 50.01057211780597
  ---

- # Gemma-Embedding-300m-Finetuned

  ## Model Description

- This model is a fine-tuned version of the google/embeddinggemma-300m architecture. It has been optimized for semantic textual similarity (STS), retrieval, and classification tasks. The model represents a high-efficiency solution for embedding generation, providing a favorable balance between computational overhead and semantic accuracy.

- - **Base Model:** google/embeddinggemma-300m
- - **Maximum Sequence Length:** 256 tokens
- - **Output Dimensionality:** 1024
- - **Language:** English

- ## Evaluation Results

- The model has been benchmarked using the Massive Text Embedding Benchmark (MTEB). The following table summarizes its performance across various task categories:

  | Task Category | Task Name | Metric | Score |
  | :--- | :--- | :--- | :--- |
@@ -94,27 +110,47 @@ The model has been benchmarked using the Massive Text Embedding Benchmark (MTEB)
  | Classification | AmazonCounterfactual | Accuracy | 83.34 |
  | Clustering | TwentyNewsgroups | V-Measure | 50.01 |

- ## Usage

- ### Sentence-Transformers
-
- The model can be implemented directly using the `sentence-transformers` library:

  ```python
  from sentence_transformers import SentenceTransformer

- # Load the model from the Hugging Face Hub
- model = SentenceTransformer("your-username/Gemma-Embedding-300m-Finetuned")

- # Define input text
  sentences = [
- "The atmospheric conditions are favorable for flight.",
- "The weather is good for flying today."
  ]

- # Generate embeddings
  embeddings = model.encode(sentences)

- # Calculate semantic similarity
  similarity = model.similarity(embeddings[0], embeddings[1])
- print(similarity)

@@ -1,6 +1,20 @@
  ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - google/embeddinggemma-300m
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ tags:
+ - mteb
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+ - pytorch
  model-index:
+ - name: Supertron-embedding-300M
  results:
  - task:
  type: STS
@@ -70,20 +84,22 @@ model-index:
  value: 50.01057211780597
  ---

+ # Supertron-embedding-300M: High-Efficiency Semantic Representation Model

  ## Model Description

+ Supertron-embedding-300M is a high-performance, compact embedding model fine-tuned from the google/embeddinggemma-300m architecture. It is specifically designed to provide state-of-the-art semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering applications while maintaining a low computational footprint suitable for production environments.

+ * **Developed by:** Surpem
+ * **Model Type:** Sentence Transformer
+ * **Architecture:** Gemma-based Dense Transformer
+ * **Base Model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
+ * **License:** Apache 2.0
+ * **Language:** English (en)

+ ## Results

+ Supertron-embedding-300M shows competitive performance on the Massive Text Embedding Benchmark (MTEB) tasks listed below. It is particularly effective on Semantic Textual Similarity (STS) tasks, where it outperforms many larger models.

  | Task Category | Task Name | Metric | Score |
  | :--- | :--- | :--- | :--- |
@@ -94,27 +110,47 @@ The model has been benchmarked using the Massive Text Embedding Benchmark (MTEB)
  | Classification | AmazonCounterfactual | Accuracy | 83.34 |
  | Clustering | TwentyNewsgroups | V-Measure | 50.01 |

+ ## Get Started

+ This model can be easily integrated using the `sentence-transformers` library.

  ```python
  from sentence_transformers import SentenceTransformer

+ model_id = "surpem/Supertron-embedding-300M"
+
+ # Load the model
+ model = SentenceTransformer(model_id)

+ # Define target text
  sentences = [
+ "The financial results exceeded market expectations.",
+ "The company reported better than expected quarterly earnings."
  ]

+ # Compute embeddings
  embeddings = model.encode(sentences)

+ # Calculate cosine similarity
  similarity = model.similarity(embeddings[0], embeddings[1])
+ print(f"Semantic Similarity: {similarity.item():.4f}")
+ ```
+
+ ## Training Procedure
+
+ ### Hyperparameters
+
+ * **Precision:** bfloat16
+ * **Max Sequence Length:** 256 tokens
+ * **Optimizer:** AdamW
+ * **Batch Size:** 256
+ * **Learning Rate:** 2e-5
+
+ ## Citation
+
+ ```bibtex
+ @misc{surpem2026supertron,
+   title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
+   author={Surpem},
+   year={2026},
+   url={https://huggingface.co/surpem/Supertron-embedding-300M},
+ }
+ ```
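
The updated Get Started snippet scores two sentences with `model.similarity`, and the README's comment describes that score as cosine similarity. The model description also names semantic search and RAG as target uses. The sketch below shows, under those assumptions, what a cosine score is and how it can rank a small corpus for retrieval. It is plain Python, and the toy 3-dimensional vectors stand in for real embeddings. The helper names `cosine_similarity` and `top_k` are hypothetical and are not part of the sentence-transformers API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||), in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], corpus: List[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (corpus_index, score) pairs for the k entries most similar to the query."""
    scored = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(corpus)]
    # Highest score first; Python's sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for model.encode(...) outputs.
    query = [1.0, 0.0, 1.0]
    corpus = [
        [1.0, 0.0, 0.9],  # points almost the same way as the query
        [0.0, 1.0, 0.0],  # orthogonal to the query (score 0.0)
        [0.5, 0.5, 0.5],  # partially aligned
    ]
    for index, score in top_k(query, corpus, k=2):
        print(f"corpus[{index}]: {score:.4f}")
```

With real embeddings, the outputs of `model.encode(...)` would replace the toy vectors. sentence-transformers also ships `sentence_transformers.util.semantic_search`, which performs this kind of top-k ranking on tensors, so the hand-rolled loop above only illustrates the computation.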