CE fine-tuned epoch 1/3

Files changed (9)

README.md +368 -0
config.json +36 -0
config_sentence_transformers.json +11 -0
eval/CrossEncoderClassificationEvaluator_ce-val_results.csv +2 -0
model.safetensors +3 -0
modules.json +8 -0
sentence_bert_config.json +10 -0
tokenizer.json +0 -0
tokenizer_config.json +18 -0

README.md ADDED

	@@ -0,0 +1,368 @@

+---
+tags:
+- sentence-transformers
+- cross-encoder
+- reranker
+- generated_from_trainer
+- dataset_size:4815
+- loss:BinaryCrossEntropyLoss
+base_model: cross-encoder/ms-marco-MiniLM-L6-v2
+pipeline_tag: text-ranking
+library_name: sentence-transformers
+metrics:
+- accuracy
+- accuracy_threshold
+- f1
+- f1_threshold
+- precision
+- recall
+- average_precision
+model-index:
+- name: CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2
+  results:
+  - task:
+      type: cross-encoder-binary-classification
+      name: Cross Encoder Binary Classification
+    dataset:
+      name: ce val
+      type: ce-val
+    metrics:
+    - type: accuracy
+      value: 0.8130841121495327
+      name: Accuracy
+    - type: accuracy_threshold
+      value: -0.2579485774040222
+      name: Accuracy Threshold
+    - type: f1
+      value: 0.8845265588914549
+      name: F1
+    - type: f1_threshold
+      value: -0.43757033348083496
+      name: F1 Threshold
+    - type: precision
+      value: 0.8362445414847162
+      name: Precision
+    - type: recall
+      value: 0.9387254901960784
+      name: Recall
+    - type: average_precision
+      value: 0.9335939639352934
+      name: Average Precision
+---
+# CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2
+This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+## Model Details
+### Model Description
+- **Model Type:** Cross Encoder
+- **Base model:** [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) <!-- at revision c5ee24cb16019beea0893ab7796b1df96625c6b8 -->
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Output Labels:** 1 label
+- **Supported Modality:** Text
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+### Full Model Architecture
+```
+CrossEncoder(
+  (0): Transformer({'transformer_task': 'sequence-classification', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'logits'}}, 'module_output_name': 'scores', 'architecture': 'BertForSequenceClassification'})
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import CrossEncoder
+# Download from the 🤗 Hub
+model = CrossEncoder("cross_encoder_model_id")
+# Get scores for pairs of inputs
+pairs = [
+    ['Yet a study published just this week, by the Bjerknes Centre for Climate Research in Bergen, Norway, found that the natural climate system can change abruptly, without the need for any external forces.', "Analysis of the layering and chemical composition of the cores has provided a revolutionary new record of climate change in the Northern Hemisphere going back about 100,000\xa0years and illustrated that the world's weather and temperature have often shifted rapidly from one seemingly stable state to another, with worldwide consequences."],
+    ['Schmittner finds low climate sensitivity.', 'Herbaceous perennials are also able to tolerate the extremes of cold in temperate and Arctic winters, with less sensitivity than trees or shrubs.'],
+    ['Meeting the 2025 emissions reduction target alone could subtract $250 billion from our GDP and eliminate 2.7 million jobs.', 'Judging by the continued growth in the Renewable Fuel Standard and the extension of the biodiesel tax incentive, the number of jobs can increase to 50,725, $2.7 billion in income, and reaching $5 billion in GDP by 2012 and 2013.'],
+    ['Preventing global warming is relatively cheap; business-as-usual will cause accelerating climate damage costs that economists struggle to even estimate.', 'Researchers have warned that current economic modeling may seriously underestimate the impact of potentially catastrophic climate change and point to the need for new models that give a more accurate picture of potential damages.'],
+    ['Domino-effect of climate events could move Earth into a ‘hothouse’ state', 'This one change creates a domino effect which alters many events and characters in the DC Universe.'],
+]
+scores = model.predict(pairs)
+print(scores)
+# [ 3.0222  1.1202  2.4127  1.3557 -0.496 ]
+# Or rank different texts based on similarity to a single text
+ranks = model.rank(
+    'Yet a study published just this week, by the Bjerknes Centre for Climate Research in Bergen, Norway, found that the natural climate system can change abruptly, without the need for any external forces.',
+    [
+        "Analysis of the layering and chemical composition of the cores has provided a revolutionary new record of climate change in the Northern Hemisphere going back about 100,000\xa0years and illustrated that the world's weather and temperature have often shifted rapidly from one seemingly stable state to another, with worldwide consequences.",
+        'Herbaceous perennials are also able to tolerate the extremes of cold in temperate and Arctic winters, with less sensitivity than trees or shrubs.',
+        'Judging by the continued growth in the Renewable Fuel Standard and the extension of the biodiesel tax incentive, the number of jobs can increase to 50,725, $2.7 billion in income, and reaching $5 billion in GDP by 2012 and 2013.',
+        'Researchers have warned that current economic modeling may seriously underestimate the impact of potentially catastrophic climate change and point to the need for new models that give a more accurate picture of potential damages.',
+        'This one change creates a domino effect which alters many events and characters in the DC Universe.',
+    ]
+)
+# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+## Evaluation
+### Metrics
+#### Cross Encoder Binary Classification
+* Dataset: `ce-val`
+* Evaluated with [<code>CrossEncoderClassificationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator)
+| Metric                | Value      |
+|:----------------------|:-----------|
+| accuracy              | 0.8131     |
+| accuracy_threshold    | -0.2579    |
+| f1                    | 0.8845     |
+| f1_threshold          | -0.4376    |
+| precision             | 0.8362     |
+| recall                | 0.9387     |
+| **average_precision** | **0.9336** |
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 4,815 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                       | sentence_1                                                                         | label                                                          |
+  |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------|
+  | type    | string                                                                           | string                                                                             | float                                                          |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 26.5 tokens</li><li>max: 66 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 34.21 tokens</li><li>max: 333 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.77</li><li>max: 1.0</li></ul> |
+* Samples:
+  | sentence_0                                                                                                                                                                                                             | sentence_1                                                                                                                                                                                                                                                                                                                                                   | label            |
+  |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>Yet a study published just this week, by the Bjerknes Centre for Climate Research in Bergen, Norway, found that the natural climate system can change abruptly, without the need for any external forces.</code> | <code>Analysis of the layering and chemical composition of the cores has provided a revolutionary new record of climate change in the Northern Hemisphere going back about 100,000 years and illustrated that the world's weather and temperature have often shifted rapidly from one seemingly stable state to another, with worldwide consequences.</code> | <code>1.0</code> |
+  | <code>Schmittner finds low climate sensitivity.</code>                                                                                                                                                                 | <code>Herbaceous perennials are also able to tolerate the extremes of cold in temperate and Arctic winters, with less sensitivity than trees or shrubs.</code>                                                                                                                                                                                               | <code>1.0</code> |
+  | <code>Meeting the 2025 emissions reduction target alone could subtract $250 billion from our GDP and eliminate 2.7 million jobs.</code>                                                                                | <code>Judging by the continued growth in the Renewable Fuel Standard and the extension of the biodiesel tax incentive, the number of jobs can increase to 50,725, $2.7 billion in income, and reaching $5 billion in GDP by 2012 and 2013.</code>                                                                                                            | <code>1.0</code> |
+* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+  ```json
+  {
+      "activation_fn": "torch.nn.modules.linear.Identity",
+      "pos_weight": null
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
+- `num_train_epochs`: 1
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `do_predict`: False
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: None
+- `warmup_ratio`: None
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `enable_jit_checkpoint`: False
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `use_cpu`: False
+- `seed`: 42
+- `data_seed`: None
+- `bf16`: False
+- `fp16`: False
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: -1
+- `ddp_backend`: None
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `parallelism_config`: None
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch_fused
+- `optim_args`: None
+- `group_by_length`: False
+- `length_column_name`: length
+- `project`: huggingface
+- `trackio_space_id`: trackio
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `hub_revision`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_num_input_tokens_seen`: no
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: True
+- `use_cache`: False
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+</details>
+### Training Logs
+| Epoch | Step | ce-val_average_precision |
+|:-----:|:----:|:------------------------:|
+| 1.0   | 301  | 0.9336                   |
+### Training Time
+- **Training**: 24.8 seconds
+### Framework Versions
+- Python: 3.12.13
+- Sentence Transformers: 5.4.1
+- Transformers: 5.0.0
+- PyTorch: 2.10.0+cu128
+- Accelerate: 1.13.0
+- Datasets: 4.0.0
+- Tokenizers: 0.22.2
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
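
Note on reading the scores in the README above: the model card lists `activation_fn` as `Identity` and the loss as `BinaryCrossEntropyLoss`, so the values returned by `model.predict` are raw logits, and the reported `accuracy_threshold` and `f1_threshold` live in that same logit space. The sketch below is one way to turn those logits into probabilities and binary decisions. It reuses the example scores and thresholds printed in the README; the helper names `sigmoid` and `classify` are hypothetical, and the strict `>` comparison is an assumption about how a threshold would be applied.

```python
import math

# Raw scores copied from the README usage example (logits, not probabilities).
scores = [3.0222, 1.1202, 2.4127, 1.3557, -0.496]

# Thresholds reported for the ce-val split in the README metrics table.
ACCURACY_THRESHOLD = -0.2579
F1_THRESHOLD = -0.4376


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(logits, threshold):
    """Hypothetical helper: label a pair positive when its logit exceeds the threshold."""
    return [logit > threshold for logit in logits]


probabilities = [sigmoid(s) for s in scores]
labels_acc = classify(scores, ACCURACY_THRESHOLD)
labels_f1 = classify(scores, F1_THRESHOLD)

print(labels_acc)  # [True, True, True, True, False]
print(labels_f1)   # [True, True, True, True, False]
```

Both thresholds were selected by the evaluator on `ce-val` after a single epoch, so they may not transfer to other data and would need re-tuning on a held-out set.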

config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "add_cross_attention": false,
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": null,
+  "classifier_dropout": null,
+  "dtype": "float32",
+  "eos_token_id": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "is_decoder": false,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "tie_word_embeddings": true,
+  "transformers_version": "5.0.0",
+  "type_vocab_size": 2,
+  "use_cache": false,
+  "vocab_size": 30522
+}

config_sentence_transformers.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "__version__": {
+    "pytorch": "2.10.0+cu128",
+    "sentence_transformers": "5.4.1",
+    "transformers": "5.0.0"
+  },
+  "activation_fn": "torch.nn.modules.linear.Identity",
+  "default_prompt_name": null,
+  "model_type": "CrossEncoder",
+  "prompts": {}
+}

eval/CrossEncoderClassificationEvaluator_ce-val_results.csv ADDED

	@@ -0,0 +1,2 @@


+epoch,steps,Accuracy,Accuracy_Threshold,F1,F1_Threshold,Precision,Recall,Average_Precision
+1.0,301,0.8130841121495327,-0.25794858,0.8845265588914549,-0.43757033,0.8362445414847162,0.9387254901960784,0.9335939639352934
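
The evaluator row above can be sanity-checked with a few lines of standard-library Python. This sketch parses the CSV text as it appears in this commit, recomputes F1 from the logged precision and recall, and checks that the step count matches the 4,815 training samples and batch size 16 from the README. The single-device assumption behind the step count is mine; the commit does not state the number of devices.

```python
import csv
import io
import math

# CSV content copied from eval/CrossEncoderClassificationEvaluator_ce-val_results.csv.
CSV_TEXT = """epoch,steps,Accuracy,Accuracy_Threshold,F1,F1_Threshold,Precision,Recall,Average_Precision
1.0,301,0.8130841121495327,-0.25794858,0.8845265588914549,-0.43757033,0.8362445414847162,0.9387254901960784,0.9335939639352934
"""

row = next(csv.DictReader(io.StringIO(CSV_TEXT)))

# F1 is the harmonic mean of precision and recall.
precision = float(row["Precision"])
recall = float(row["Recall"])
f1_from_pr = 2 * precision * recall / (precision + recall)
f1_consistent = math.isclose(f1_from_pr, float(row["F1"]), rel_tol=1e-9)

# One epoch over 4,815 samples at batch size 16, keeping the last partial batch
# (dataloader_drop_last is False) and assuming a single device.
expected_steps = math.ceil(4815 / 16)
steps_consistent = expected_steps == int(row["steps"])

print(f1_consistent, steps_consistent)  # True True
```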

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:160442534ea4f7d804bce531ea9d64d7765f92e6df6c366ebfb0002c2043e133
+size 90866404
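
The LFS pointer above reports 90,866,404 bytes for `model.safetensors`. As a rough cross-check against `config.json`, the sketch below counts parameters for a standard BERT sequence-classification layout using the sizes from that config (hidden size 384, 6 layers, intermediate size 1536, vocabulary 30522, 512 positions, 2 token types, 1 output label) and multiplies by 4 bytes for `float32`. The function name `bert_classifier_param_count` is hypothetical, and the assumed layout (embeddings, encoder layers, pooler, and a single-logit classifier, with no extra stored buffers) is an assumption rather than something the commit states.

```python
def bert_classifier_param_count(
    vocab_size: int = 30522,
    hidden: int = 384,
    layers: int = 6,
    intermediate: int = 1536,
    max_positions: int = 512,
    type_vocab: int = 2,
    num_labels: int = 1,
) -> int:
    """Estimate parameters of a BERT sequence classifier (assumed standard layout)."""
    layer_norm = 2 * hidden  # weight + bias

    # Word, position, and token-type embeddings plus their LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections, each with bias.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Feed-forward: up-projection, down-projection, and LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )
    encoder = layers * (attention + feed_forward)

    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + encoder + pooler + classifier


param_count = bert_classifier_param_count()
weight_bytes = param_count * 4  # float32, per the "dtype" field in config.json
remaining_bytes = 90866404 - weight_bytes

print(param_count, weight_bytes, remaining_bytes)  # 22713601 90854404 12000
```

Under these assumptions the weights account for all but 12,000 bytes of the file, which would be consistent with a safetensors header and metadata, though that is not verified here.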

modules.json ADDED

	@@ -0,0 +1,8 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.base.modules.transformer.Transformer"
+  }
+]

sentence_bert_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+    "transformer_task": "sequence-classification",
+    "modality_config": {
+        "text": {
+            "method": "forward",
+            "method_output_name": "logits"
+        }
+    },
+    "module_output_name": "scores"
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,18 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "is_local": false,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "model_specific_special_tokens": {},
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
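
With `model_max_length` set to 512 and a BERT tokenizer, each scored pair is packed as `[CLS] A [SEP] B [SEP]`, so three positions go to special tokens. The sketch below is a small length-budget check built on that layout. The helper names `pair_fits` and `tokens_to_trim` are hypothetical, and it assumes the per-text token counts passed in exclude special tokens. If the counts in the README statistics already include them, the check is merely conservative.

```python
MAX_LEN = 512  # model_max_length from tokenizer_config.json
SPECIAL_TOKENS_PER_PAIR = 3  # [CLS] A [SEP] B [SEP]


def pair_fits(len_a: int, len_b: int, max_len: int = MAX_LEN) -> bool:
    """Return True when both texts plus special tokens fit in one input."""
    return len_a + len_b + SPECIAL_TOKENS_PER_PAIR <= max_len


def tokens_to_trim(len_a: int, len_b: int, max_len: int = MAX_LEN) -> int:
    """Number of tokens that would have to be truncated for the pair to fit."""
    return max(0, len_a + len_b + SPECIAL_TOKENS_PER_PAIR - max_len)


# Longest texts reported in the README statistics: 66 and 333 tokens.
print(pair_fits(66, 333))        # True
print(tokens_to_trim(300, 300))  # 91
```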