ColeH0415/comp90042-crossencoder-factcheck

@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:4815
 - loss:BinaryCrossEntropyLoss
 base_model: cross-encoder/ms-marco-MiniLM-L6-v2
 pipeline_tag: text-ranking
@@ -28,25 +28,25 @@ model-index:
       type: ce-val
     metrics:
     - type: accuracy
-      value: 0.8411214953271028
       name: Accuracy
     - type: accuracy_threshold
-      value: -0.21951499581336975
       name: Accuracy Threshold
     - type: f1
-      value: 0.8989547038327527
       name: F1
     - type: f1_threshold
-      value: -1.483269453048706
       name: F1 Threshold
     - type: precision
-      value: 0.8543046357615894
       name: Precision
     - type: recall
-      value: 0.9485294117647058
       name: Recall
     - type: average_precision
-      value: 0.9597205150279606
       name: Average Precision
 ---
@@ -99,25 +99,25 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of inputs
 pairs = [
-    ['Climate change is a hoax invented by the Chinese.', '"The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive" (Tweet).'],
-    ['“The oceans, which absorb more than 90% of the extra CO2 pumped into the atmosphere“', 'Most of the CO 2 taken up by the ocean, which is about 30% of the total released into the atmosphere, forms carbonic acid in equilibrium with bicarbonate.'],
-    ['“The jet stream forms a boundary between the cold north and the warmer south, but the lower temperature difference means the winds are now weaker.', 'Therefore, the strong eastward moving jet streams are in part a simple consequence of the fact that the Equator is warmer than the North and South poles.'],
-    ['climate models predict too much warming in the troposphere', 'While the satellite data now show global warming, there is still some difference between what climate models predict and what the satellite data show for warming of the lower troposphere, with the climate models predicting slightly more warming than what the satellites measure.'],
-    ['Nine years into that 11-year hurricane drought, a NASA scientist computed it as a 1-in-177-year event.', 'It is approximately 177 light years from the Earth.'],
 ]
 scores = model.predict(pairs)
 print(scores)
-# [ 5.2169  2.2205  2.9421 -1.5924 -1.1596]
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-    'Climate change is a hoax invented by the Chinese.',
     [
-        '"The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive" (Tweet).',
-        'Most of the CO 2 taken up by the ocean, which is about 30% of the total released into the atmosphere, forms carbonic acid in equilibrium with bicarbonate.',
-        'Therefore, the strong eastward moving jet streams are in part a simple consequence of the fact that the Equator is warmer than the North and South poles.',
-        'While the satellite data now show global warming, there is still some difference between what climate models predict and what the satellite data show for warming of the lower troposphere, with the climate models predicting slightly more warming than what the satellites measure.',
-        'It is approximately 177 light years from the Earth.',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -158,13 +158,13 @@ You can finetune this model on your own dataset.
 | Metric                | Value      |
 |:----------------------|:-----------|
-| accuracy              | 0.8411     |
-| accuracy_threshold    | -0.2195    |
-| f1                    | 0.899      |
-| f1_threshold          | -1.4833    |
-| precision             | 0.8543     |
-| recall                | 0.9485     |
-| **average_precision** | **0.9597** |
 <!--
 ## Bias, Risks and Limitations
@@ -184,19 +184,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 4,815 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | sentence_0                                                                        | sentence_1                                                                        | label                                                          |
-  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
-  | type    | string                                                                            | string                                                                            | float                                                          |
-  | details | <ul><li>min: 7 tokens</li><li>mean: 27.21 tokens</li><li>max: 73 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 35.1 tokens</li><li>max: 333 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.78</li><li>max: 1.0</li></ul> |
 * Samples:
-  | sentence_0                                                                                                                                                      | sentence_1                                                                                                                                                              | label            |
-  |:----------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
-  | <code>Climate change is a hoax invented by the Chinese.</code>                                                                                                  | <code>"The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive" (Tweet).</code>                            | <code>1.0</code> |
-  | <code>“The oceans, which absorb more than 90% of the extra CO2 pumped into the atmosphere“</code>                                                               | <code>Most of the CO 2 taken up by the ocean, which is about 30% of the total released into the atmosphere, forms carbonic acid in equilibrium with bicarbonate.</code> | <code>1.0</code> |
-  | <code>“The jet stream forms a boundary between the cold north and the warmer south, but the lower temperature difference means the winds are now weaker.</code> | <code>Therefore, the strong eastward moving jet streams are in part a simple consequence of the fact that the Equator is warmer than the North and South poles.</code>  | <code>1.0</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
@@ -317,11 +317,11 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | ce-val_average_precision |
 |:-----:|:----:|:------------------------:|
-| 1.0   | 301  | 0.9597                   |
 ### Training Time
-- **Training**: 23.5 seconds
 ### Framework Versions
 - Python: 3.12.13

 - cross-encoder
 - reranker
 - generated_from_trainer
+- dataset_size:7419
 - loss:BinaryCrossEntropyLoss
 base_model: cross-encoder/ms-marco-MiniLM-L6-v2
 pipeline_tag: text-ranking
       type: ce-val
     metrics:
     - type: accuracy
+      value: 0.816969696969697
       name: Accuracy
     - type: accuracy_threshold
+      value: -0.6473360061645508
       name: Accuracy Threshold
     - type: f1
+      value: 0.8307349665924276
       name: F1
     - type: f1_threshold
+      value: -1.014681339263916
       name: F1 Threshold
     - type: precision
+      value: 0.7690721649484537
       name: Precision
     - type: recall
+      value: 0.9031476997578692
       name: Recall
     - type: average_precision
+      value: 0.8791017476679602
       name: Average Precision
 ---
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of inputs
 pairs = [
+    ['The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.', 'It was designed by Jung Brannen Associates.'],
+    ['[S]unspot activity on the surface of our star has dropped to a new low.', 'This surface activity produces starspots, which are regions of strong magnetic fields and lower than normal surface temperatures.'],
+    ['More money is dedicated within the Department of Homeland Security to climate change than what\'s spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."', 'According to The Washington Post, "Online recruiting has exponentially increased, with Facebook, YouTube and the increasing sophistication of people online".'],
+    ['Worst-case global heating scenarios may need to be revised upwards in light of a better understanding of the role of clouds, scientists have said.', 'Climate change is more accurate scientifically to describe the various effects of greenhouse gases on the world because it includes extreme weather, storms and changes in rainfall patterns, ocean acidification and sea level.".'],
+    ['Prof Adam Scaife, a climate modelling expert at the UK’s Met Office, said the evidence for a link to shrinking Arctic ice was now good: ‘The consensus points towards that being a real effect.’”', 'Category : Ceremonial officers in the United Kingdom'],
 ]
 scores = model.predict(pairs)
 print(scores)
+# [-4.6643  0.3606  0.9265  3.4254 -5.0308]
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
+    'The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.',
     [
+        'It was designed by Jung Brannen Associates.',
+        'This surface activity produces starspots, which are regions of strong magnetic fields and lower than normal surface temperatures.',
+        'According to The Washington Post, "Online recruiting has exponentially increased, with Facebook, YouTube and the increasing sophistication of people online".',
+        'Climate change is more accurate scientifically to describe the various effects of greenhouse gases on the world because it includes extreme weather, storms and changes in rainfall patterns, ocean acidification and sea level.".',
+        'Category : Ceremonial officers in the United Kingdom',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 | Metric                | Value      |
 |:----------------------|:-----------|
+| accuracy              | 0.817      |
+| accuracy_threshold    | -0.6473    |
+| f1                    | 0.8307     |
+| f1_threshold          | -1.0147    |
+| precision             | 0.7691     |
+| recall                | 0.9031     |
+| **average_precision** | **0.8791** |
 <!--
 ## Bias, Risks and Limitations
 #### Unnamed Dataset
+* Size: 7,419 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                        | sentence_1                                                                         | label                                                          |
+  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------|
+  | type    | string                                                                            | string                                                                             | float                                                          |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 27.57 tokens</li><li>max: 82 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 31.56 tokens</li><li>max: 321 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
 * Samples:
+  | sentence_0                                                                                                                                                                                                               | sentence_1                                                                                                                                                                 | label            |
+  |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.</code> | <code>It was designed by Jung Brannen Associates.</code>                                                                                                                   | <code>0.0</code> |
+  | <code>[S]unspot activity on the surface of our star has dropped to a new low.</code>                                                                                                                                     | <code>This surface activity produces starspots, which are regions of strong magnetic fields and lower than normal surface temperatures.</code>                             | <code>1.0</code> |
+  | <code>More money is dedicated within the Department of Homeland Security to climate change than what's spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."</code>      | <code>According to The Washington Post, "Online recruiting has exponentially increased, with Facebook, YouTube and the increasing sophistication of people online".</code> | <code>1.0</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
 ### Training Logs
 | Epoch | Step | ce-val_average_precision |
 |:-----:|:----:|:------------------------:|
+| -1    | -1   | 0.8791                   |
 ### Training Time
+- **Training**: 28.9 seconds
 ### Framework Versions
 - Python: 3.12.13

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:be5e3a21981a266058942d48b96e0472f718fd3dcdfa0269c4c4bee7cde16de9
 size 90866404

 version https://git-lfs.github.com/spec/v1
+oid sha256:63f48e7c8fd1c45a0ca9652968db194b8bfeba59c550dd8e80fd22dd4efd5b04
 size 90866404
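
Note on reading the updated README numbers: the scores returned by `model.predict` are raw logits, and the `accuracy_threshold` and `f1_threshold` values in the metrics table are the cut-offs on those logits. The sketch below applies the updated thresholds to the five example scores from the new README and checks that the reported F1 agrees with the reported precision and recall. It uses plain Python rather than the library, and the `to_labels` and `f1_from` helpers are hypothetical names introduced only for illustration. Treating a score strictly above the threshold as positive is an assumption; the evaluator's exact comparison may differ at the boundary.

```python
# Raw logits printed by model.predict(pairs) in the updated README example.
scores = [-4.6643, 0.3606, 0.9265, 3.4254, -5.0308]

# Thresholds reported for the ce-val evaluator in the updated model card.
ACCURACY_THRESHOLD = -0.6473
F1_THRESHOLD = -1.0147


def to_labels(scores, threshold):
    """Hypothetical helper: label a pair 1 when its score exceeds the threshold.

    Assumption: strict '>' comparison; the library may use '>=' instead.
    """
    return [1 if s > threshold else 0 for s in scores]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


labels_acc = to_labels(scores, ACCURACY_THRESHOLD)
labels_f1 = to_labels(scores, F1_THRESHOLD)

# Precision and recall as reported in the updated metrics table.
f1_check = f1_from(0.7691, 0.9031)

print(labels_acc)          # [0, 1, 1, 1, 0]
print(labels_f1)           # [0, 1, 1, 1, 0]
print(round(f1_check, 4))  # 0.8307
```

For the first three pairs, the thresholded labels match the `label` column of the updated training samples table (0.0, 1.0, 1.0), and the recomputed F1 matches the reported 0.8307.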