Committed by vincent-benton · Commit dbc4ec8 · verified · 1 parent: 10601fb

Update README.md

README.md changed (+120, -3). The previous three-line README contained only YAML front matter declaring `license: apache-2.0`. This commit replaces it with the 120-line model card below.
---
model-index:
- name: Gemma-Embedding-300m-Finetuned
  results:
  - task:
      type: STS
      name: STSBenchmark
    dataset:
      name: MTEB STSBenchmark
      type: mteb/STSBenchmark
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 87.1012
  - task:
      type: STS
      name: STS12
    dataset:
      name: MTEB STS12
      type: mteb/STS12
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 80.1767
  - task:
      type: STS
      name: BIOSSES
    dataset:
      name: MTEB BIOSSES
      type: mteb/BIOSSES
      config: default
      split: test
    metrics:
    - type: cos_sim_spearman
      value: 82.9778
  - task:
      type: Retrieval
      name: NFCorpus
    dataset:
      name: MTEB NFCorpus
      type: mteb/NFCorpus
      config: default
      split: test
    metrics:
    - type: ndcg_at_10
      value: 37.074
  - task:
      type: Classification
      name: AmazonCounterfactualClassification
    dataset:
      name: MTEB AmazonCounterfactualClassification
      type: mteb/AmazonCounterfactualClassification
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 83.3415625
  - task:
      type: Clustering
      name: TwentyNewsgroupsClustering.v2
    dataset:
      name: MTEB TwentyNewsgroupsClustering.v2
      type: mteb/TwentyNewsgroupsClustering.v2
      config: default
      split: test
    metrics:
    - type: v_measure
      value: 50.01057211780597
---

# Gemma-Embedding-300m-Finetuned

## Model Description

This model is a fine-tuned version of google/embeddinggemma-300m, optimized for semantic textual similarity (STS), retrieval, and classification tasks. The base model has roughly 300 million parameters, so this fine-tune is intended for embedding workloads where compute cost and latency matter alongside semantic accuracy.

- **Base Model:** google/embeddinggemma-300m
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 1024
- **Language:** English
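
For STS and retrieval, embedding models like this one are typically used by comparing vectors with cosine similarity, which is also the default similarity function of `model.similarity` in `sentence-transformers`. The following is a minimal plain-Python sketch of that comparison. It assumes the embeddings have already been computed, for example with `model.encode`. The `rank_by_cosine` helper and the toy 3-dimensional vectors are hypothetical stand-ins for real model outputs, which have as many components as the model's output dimensionality.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query: Sequence[float], documents: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Hypothetical helper: (document index, score) pairs, best match first."""
    scored = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (illustrative only).
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # unrelated document
    [1.0, 0.0, 0.0],  # closely related document
    [0.5, 0.5, 0.0],  # partially related document
]

for index, score in rank_by_cosine(query_vec, doc_vecs):
    print(f"doc {index}: {score:.3f}")
```

Because cosine similarity ignores vector length, only the direction of each embedding affects the ranking.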

## Evaluation Results

The model was evaluated on six tasks from the Massive Text Embedding Benchmark (MTEB), using the test split of each dataset. The following table summarizes the scores by task category, reported on a 0 to 100 scale:

| Task Category | Task Name | Metric | Score |
| :--- | :--- | :--- | ---: |
| Semantic Textual Similarity | STSBenchmark | Spearman (cosine) | 87.10 |
| Semantic Textual Similarity | STS12 | Spearman (cosine) | 80.18 |
| Semantic Textual Similarity | BIOSSES | Spearman (cosine) | 82.98 |
| Retrieval | NFCorpus | NDCG@10 | 37.07 |
| Classification | AmazonCounterfactualClassification | Accuracy | 83.34 |
| Clustering | TwentyNewsgroupsClustering.v2 | V-measure | 50.01 |
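
To make two of these metrics concrete, here is a minimal plain-Python sketch of how they are defined. `cos_sim_spearman` is the Spearman rank correlation between the model's cosine similarities and human similarity ratings. NDCG@10 scores a ranked list of retrieved documents against graded relevance labels. The table reports both multiplied by 100. This is an illustrative reimplementation under simplifying assumptions (average ranks for ties, linear gains discounted by `log2(rank + 1)`), not MTEB's own evaluation code. The input numbers are made up.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based average position of the tie block
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ndcg_at_k(ranked_relevance: Sequence[float], all_relevance: Sequence[float], k: int = 10) -> float:
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    def dcg(relevances: Sequence[float]) -> float:
        # 0-based position i is discounted by log2(i + 2), i.e. log2(rank + 1).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

    ideal = dcg(sorted(all_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0


# Made-up STS example: model cosine similarities vs. human ratings (0-5 scale).
cosine_scores = [0.91, 0.35, 0.62, 0.10]
gold_ratings = [4.8, 1.5, 3.9, 0.2]
print("cos_sim_spearman x100:", round(100 * spearman(cosine_scores, gold_ratings), 2))

# Made-up retrieval example: relevance grades of the top-ranked documents,
# plus the grades of every judged document for this query.
retrieved = [0, 2, 1, 0, 0]
judged = [2, 1, 1]
print("NDCG@10 x100:", round(100 * ndcg_at_k(retrieved, judged), 2))
```

Spearman depends only on the ordering of the scores, so an STS model is rewarded for ranking sentence pairs correctly even if its raw cosine values are not calibrated to the human rating scale.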

## Usage

### Sentence-Transformers

The model can be loaded directly with the `sentence-transformers` library:

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
# ("your-username" is a placeholder for the repository owner)
model = SentenceTransformer("your-username/Gemma-Embedding-300m-Finetuned")

# Define input sentences
sentences = [
    "The atmospheric conditions are favorable for flight.",
    "The weather is good for flying today.",
]

# Generate embeddings: one vector per input sentence
embeddings = model.encode(sentences)

# Score the pair with the model's similarity function (cosine by default);
# the result is a 1x1 tensor
similarity = model.similarity(embeddings[0], embeddings[1])
print(similarity)
```