ColeH0415 committed on
Commit 0c8fab7 · verified · 1 Parent(s): b978496

CE fine-tuned epoch 1/3 best_val=0.8170

Files changed (2)
  1. README.md +39 -39
  2. model.safetensors +1 -1
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
4
  - cross-encoder
5
  - reranker
6
  - generated_from_trainer
7
- - dataset_size:4815
8
  - loss:BinaryCrossEntropyLoss
9
  base_model: cross-encoder/ms-marco-MiniLM-L6-v2
10
  pipeline_tag: text-ranking
@@ -28,25 +28,25 @@ model-index:
28
  type: ce-val
29
  metrics:
30
  - type: accuracy
31
- value: 0.8411214953271028
32
  name: Accuracy
33
  - type: accuracy_threshold
34
- value: -0.21951499581336975
35
  name: Accuracy Threshold
36
  - type: f1
37
- value: 0.8989547038327527
38
  name: F1
39
  - type: f1_threshold
40
- value: -1.483269453048706
41
  name: F1 Threshold
42
  - type: precision
43
- value: 0.8543046357615894
44
  name: Precision
45
  - type: recall
46
- value: 0.9485294117647058
47
  name: Recall
48
  - type: average_precision
49
- value: 0.9597205150279606
50
  name: Average Precision
51
  ---
52
 
@@ -99,25 +99,25 @@ from sentence_transformers import CrossEncoder
99
  model = CrossEncoder("cross_encoder_model_id")
100
  # Get scores for pairs of inputs
101
  pairs = [
102
- ['Climate change is a hoax invented by the Chinese.', '"The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive" (Tweet).'],
103
- ['“The oceans, which absorb more than 90% of the extra CO2 pumped into the atmosphere“', 'Most of the CO 2 taken up by the ocean, which is about 30% of the total released into the atmosphere, forms carbonic acid in equilibrium with bicarbonate.'],
104
- ['“The jet stream forms a boundary between the cold north and the warmer south, but the lower temperature difference means the winds are now weaker.', 'Therefore, the strong eastward moving jet streams are in part a simple consequence of the fact that the Equator is warmer than the North and South poles.'],
105
- ['climate models predict too much warming in the troposphere', 'While the satellite data now show global warming, there is still some difference between what climate models predict and what the satellite data show for warming of the lower troposphere, with the climate models predicting slightly more warming than what the satellites measure.'],
106
- ['Nine years into that 11-year hurricane drought, a NASA scientist computed it as a 1-in-177-year event.', 'It is approximately 177 light years from the Earth.'],
107
  ]
108
  scores = model.predict(pairs)
109
  print(scores)
110
- # [ 5.2169 2.2205 2.9421 -1.5924 -1.1596]
111
 
112
  # Or rank different texts based on similarity to a single text
113
  ranks = model.rank(
114
- 'Climate change is a hoax invented by the Chinese.',
115
  [
116
- '"The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive" (Tweet).',
117
- 'Most of the CO 2 taken up by the ocean, which is about 30% of the total released into the atmosphere, forms carbonic acid in equilibrium with bicarbonate.',
118
- 'Therefore, the strong eastward moving jet streams are in part a simple consequence of the fact that the Equator is warmer than the North and South poles.',
119
- 'While the satellite data now show global warming, there is still some difference between what climate models predict and what the satellite data show for warming of the lower troposphere, with the climate models predicting slightly more warming than what the satellites measure.',
120
- 'It is approximately 177 light years from the Earth.',
121
  ]
122
  )
123
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -158,13 +158,13 @@ You can finetune this model on your own dataset.
158
 
159
  | Metric | Value |
160
  |:----------------------|:-----------|
161
- | accuracy | 0.8411 |
162
- | accuracy_threshold | -0.2195 |
163
- | f1 | 0.899 |
164
- | f1_threshold | -1.4833 |
165
- | precision | 0.8543 |
166
- | recall | 0.9485 |
167
- | **average_precision** | **0.9597** |
168
 
169
  <!--
170
  ## Bias, Risks and Limitations
@@ -184,19 +184,19 @@ You can finetune this model on your own dataset.
184
 
185
  #### Unnamed Dataset
186
 
187
- * Size: 4,815 training samples
188
  * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
189
  * Approximate statistics based on the first 1000 samples:
190
- | | sentence_0 | sentence_1 | label |
191
- |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
192
- | type | string | string | float |
193
- | details | <ul><li>min: 7 tokens</li><li>mean: 27.21 tokens</li><li>max: 73 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 35.1 tokens</li><li>max: 333 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.78</li><li>max: 1.0</li></ul> |
194
  * Samples:
195
- | sentence_0 | sentence_1 | label |
196
- |:----------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
197
- | <code>Climate change is a hoax invented by the Chinese.</code> | <code>"The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive" (Tweet).</code> | <code>1.0</code> |
198
- | <code>“The oceans, which absorb more than 90% of the extra CO2 pumped into the atmosphere“</code> | <code>Most of the CO 2 taken up by the ocean, which is about 30% of the total released into the atmosphere, forms carbonic acid in equilibrium with bicarbonate.</code> | <code>1.0</code> |
199
- | <code>“The jet stream forms a boundary between the cold north and the warmer south, but the lower temperature difference means the winds are now weaker.</code> | <code>Therefore, the strong eastward moving jet streams are in part a simple consequence of the fact that the Equator is warmer than the North and South poles.</code> | <code>1.0</code> |
200
  * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
201
  ```json
202
  {
@@ -317,11 +317,11 @@ You can finetune this model on your own dataset.
317
  ### Training Logs
318
  | Epoch | Step | ce-val_average_precision |
319
  |:-----:|:----:|:------------------------:|
320
- | 1.0 | 301 | 0.9597 |
321
 
322
 
323
  ### Training Time
324
- - **Training**: 23.5 seconds
325
 
326
  ### Framework Versions
327
  - Python: 3.12.13
 
4
  - cross-encoder
5
  - reranker
6
  - generated_from_trainer
7
+ - dataset_size:7419
8
  - loss:BinaryCrossEntropyLoss
9
  base_model: cross-encoder/ms-marco-MiniLM-L6-v2
10
  pipeline_tag: text-ranking
 
28
  type: ce-val
29
  metrics:
30
  - type: accuracy
31
+ value: 0.816969696969697
32
  name: Accuracy
33
  - type: accuracy_threshold
34
+ value: -0.6473360061645508
35
  name: Accuracy Threshold
36
  - type: f1
37
+ value: 0.8307349665924276
38
  name: F1
39
  - type: f1_threshold
40
+ value: -1.014681339263916
41
  name: F1 Threshold
42
  - type: precision
43
+ value: 0.7690721649484537
44
  name: Precision
45
  - type: recall
46
+ value: 0.9031476997578692
47
  name: Recall
48
  - type: average_precision
49
+ value: 0.8791017476679602
50
  name: Average Precision
51
  ---
52
 
 
99
  model = CrossEncoder("cross_encoder_model_id")
100
  # Get scores for pairs of inputs
101
  pairs = [
102
+ ['The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.', 'It was designed by Jung Brannen Associates.'],
103
+ ['[S]unspot activity on the surface of our star has dropped to a new low.', 'This surface activity produces starspots, which are regions of strong magnetic fields and lower than normal surface temperatures.'],
104
+ ['More money is dedicated within the Department of Homeland Security to climate change than what\'s spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."', 'According to The Washington Post, "Online recruiting has exponentially increased, with Facebook, YouTube and the increasing sophistication of people online".'],
105
+ ['Worst-case global heating scenarios may need to be revised upwards in light of a better understanding of the role of clouds, scientists have said.', 'Climate change is more accurate scientifically to describe the various effects of greenhouse gases on the world because it includes extreme weather, storms and changes in rainfall patterns, ocean acidification and sea level.".'],
106
+ ['Prof Adam Scaife, a climate modelling expert at the UK’s Met Office, said the evidence for a link to shrinking Arctic ice was now good: ‘The consensus points towards that being a real effect.’”', 'Category : Ceremonial officers in the United Kingdom'],
107
  ]
108
  scores = model.predict(pairs)
109
  print(scores)
110
+ # [-4.6643 0.3606 0.9265 3.4254 -5.0308]
111
 
112
  # Or rank different texts based on similarity to a single text
113
  ranks = model.rank(
114
+ 'The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.',
115
  [
116
+ 'It was designed by Jung Brannen Associates.',
117
+ 'This surface activity produces starspots, which are regions of strong magnetic fields and lower than normal surface temperatures.',
118
+ 'According to The Washington Post, "Online recruiting has exponentially increased, with Facebook, YouTube and the increasing sophistication of people online".',
119
+ 'Climate change is more accurate scientifically to describe the various effects of greenhouse gases on the world because it includes extreme weather, storms and changes in rainfall patterns, ocean acidification and sea level.".',
120
+ 'Category : Ceremonial officers in the United Kingdom',
121
  ]
122
  )
123
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 
158
 
159
  | Metric | Value |
160
  |:----------------------|:-----------|
161
+ | accuracy | 0.817 |
162
+ | accuracy_threshold | -0.6473 |
163
+ | f1 | 0.8307 |
164
+ | f1_threshold | -1.0147 |
165
+ | precision | 0.7691 |
166
+ | recall | 0.9031 |
167
+ | **average_precision** | **0.8791** |
168
 
169
  <!--
170
  ## Bias, Risks and Limitations
 
184
 
185
  #### Unnamed Dataset
186
 
187
+ * Size: 7,419 training samples
188
  * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
189
  * Approximate statistics based on the first 1000 samples:
190
+ | | sentence_0 | sentence_1 | label |
191
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------|
192
+ | type | string | string | float |
193
+ | details | <ul><li>min: 7 tokens</li><li>mean: 27.57 tokens</li><li>max: 82 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 31.56 tokens</li><li>max: 321 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
194
  * Samples:
195
+ | sentence_0 | sentence_1 | label |
196
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
197
+ | <code>The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.</code> | <code>It was designed by Jung Brannen Associates.</code> | <code>0.0</code> |
198
+ | <code>[S]unspot activity on the surface of our star has dropped to a new low.</code> | <code>This surface activity produces starspots, which are regions of strong magnetic fields and lower than normal surface temperatures.</code> | <code>1.0</code> |
199
+ | <code>More money is dedicated within the Department of Homeland Security to climate change than what's spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."</code> | <code>According to The Washington Post, "Online recruiting has exponentially increased, with Facebook, YouTube and the increasing sophistication of people online".</code> | <code>1.0</code> |
200
  * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
201
  ```json
202
  {
 
317
  ### Training Logs
318
  | Epoch | Step | ce-val_average_precision |
319
  |:-----:|:----:|:------------------------:|
320
+ | -1 | -1 | 0.8791 |
321
 
322
 
323
  ### Training Time
324
+ - **Training**: 28.9 seconds
325
 
326
  ### Framework Versions
327
  - Python: 3.12.13
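
As a rough illustration of how the updated metrics in this README fit together, the sketch below applies the new `accuracy_threshold` to the five raw scores printed in the updated usage snippet and recomputes F1 from the reported precision and recall. It assumes a pair counts as positive when its score is at or above the threshold, which the diff does not state; the variable names are illustrative only.

```python
# Raw scores printed in the updated usage example of README.md.
scores = [-4.6643, 0.3606, 0.9265, 3.4254, -5.0308]

# Updated accuracy threshold from the model-index metrics.
ACCURACY_THRESHOLD = -0.6473360061645508

# Assumption: a score at or above the threshold is treated as a positive pair.
predictions = [1 if s >= ACCURACY_THRESHOLD else 0 for s in scores]
print(predictions)

# F1 is the harmonic mean of precision and recall; recompute it from the
# reported values to check that the three metrics are mutually consistent.
precision = 0.7690721649484537
recall = 0.9031476997578692
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

For the first three pairs, whose labels appear in the updated samples table (0.0, 1.0, 1.0), the thresholded predictions agree with the labels, and the recomputed F1 matches the reported value of 0.8307.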
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:be5e3a21981a266058942d48b96e0472f718fd3dcdfa0269c4c4bee7cde16de9
3
  size 90866404
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:63f48e7c8fd1c45a0ca9652968db194b8bfeba59c550dd8e80fd22dd4efd5b04
3
  size 90866404
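
A Git LFS pointer like the one above records the SHA-256 digest (`oid`) and byte size of the real file. The following is a minimal sketch of checking a local download against those two fields. `verify_lfs_object` and the local path are hypothetical names introduced for illustration; the digest and size are copied from the new pointer.

```python
import hashlib
import os


def verify_lfs_object(path: str, expected_oid: str, expected_size: int) -> bool:
    """Return True if the file matches the pointer's size and sha256 oid."""
    # Cheap check first: a size mismatch rules the file out without hashing.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values taken from the new LFS pointer in this commit.
NEW_OID = "63f48e7c8fd1c45a0ca9652968db194b8bfeba59c550dd8e80fd22dd4efd5b04"
NEW_SIZE = 90866404

if __name__ == "__main__":
    local_path = "model.safetensors"  # hypothetical local download location
    if os.path.exists(local_path):
        print(verify_lfs_object(local_path, NEW_OID, NEW_SIZE))
```

Because the size is unchanged between the two pointers (90866404 bytes), only the digest distinguishes the old weights from the new ones.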