foochun/bge-reranker-ft

@@ -2,9 +2,10 @@
 tags:
 - sentence-transformers
 - cross-encoder
 - generated_from_trainer
-- dataset_size:82744
-- loss:MultipleNegativesRankingLoss
 base_model: BAAI/bge-reranker-base
 pipeline_tag: text-ranking
 library_name: sentence-transformers
@@ -50,11 +51,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("foochun/bge-reranker-ft")
 # Get scores for pairs of texts
 pairs = [
-    ['quinn toh heng yi', 'heng yi toh quinn'],
-    ['mohd iskandi bin hassan', 'muhd iskandi hassan'],
-    ['quinn ng ee siu', 'quinn ee siu ng'],
-    ['malini doraisamy', 'malini doraisamy'],
-    ['see shan fui', 'shanfui see'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -62,13 +63,13 @@ print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-    'quinn toh heng yi',
     [
-        'heng yi toh quinn',
-        'muhd iskandi hassan',
-        'quinn ee siu ng',
-        'malini doraisamy',
-        'shanfui see',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -116,74 +117,41 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 82,744 training samples
-* Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | query                                                                                         | pos                                                                                           | neg                                                                                          |
-  |:--------|:----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------|
-  | type    | string                                                                                        | string                                                                                        | string                                                                                       |
-  | details | <ul><li>min: 9 characters</li><li>mean: 19.16 characters</li><li>max: 42 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.11 characters</li><li>max: 37 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.7 characters</li><li>max: 38 characters</li></ul> |
 * Samples:
-  | query                            | pos                            | neg                              |
-  |:---------------------------------|:-------------------------------|:---------------------------------|
-  | <code>brandon teh min jun</code> | <code>jun teh min</code>       | <code>brandon min teh jun</code> |
-  | <code>suling anak peroi</code>   | <code>suling anak peroi</code> | <code>suling anak rahim</code>   |
-  | <code>chin sze tian</code>       | <code>szetian chin</code>      | <code>chin sze tian wong</code>  |
-* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
-      "scale": 10.0,
-      "num_negatives": 4,
-      "activation_fn": "torch.nn.modules.activation.Sigmoid"
-  }
-  ```
-### Evaluation Dataset
-#### Unnamed Dataset
-* Size: 11,820 evaluation samples
-* Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | query                                                                                          | pos                                                                                           | neg                                                                                           |
-  |:--------|:-----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|
-  | type    | string                                                                                         | string                                                                                        | string                                                                                        |
-  | details | <ul><li>min: 10 characters</li><li>mean: 19.08 characters</li><li>max: 45 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.02 characters</li><li>max: 40 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.58 characters</li><li>max: 44 characters</li></ul> |
-* Samples:
-  | query                                | pos                              | neg                                             |
-  |:-------------------------------------|:---------------------------------|:------------------------------------------------|
-  | <code>quinn toh heng yi</code>       | <code>heng yi toh quinn</code>   | <code>toh yi heng</code>                        |
-  | <code>mohd iskandi bin hassan</code> | <code>muhd iskandi hassan</code> | <code>puteri balqis binti megat sulaiman</code> |
-  | <code>quinn ng ee siu</code>         | <code>quinn ee siu ng</code>     | <code>quinn ee ng siu</code>                    |
-* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#multiplenegativesrankingloss) with these parameters:
-  ```json
-  {
-      "scale": 10.0,
-      "num_negatives": 4,
-      "activation_fn": "torch.nn.modules.activation.Sigmoid"
   }
   ```
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
-- `eval_strategy`: steps
 - `per_device_train_batch_size`: 64
 - `per_device_eval_batch_size`: 64
-- `learning_rate`: 1e-05
-- `warmup_ratio`: 0.1
-- `seed`: 12
 - `fp16`: True
-- `dataloader_num_workers`: 4
-- `load_best_model_at_end`: True
-- `batch_sampler`: no_duplicates
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
 - `overwrite_output_dir`: False
 - `do_predict`: False
-- `eval_strategy`: steps
 - `prediction_loss_only`: True
 - `per_device_train_batch_size`: 64
 - `per_device_eval_batch_size`: 64
@@ -192,17 +160,17 @@ You can finetune this model on your own dataset.
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`: 1e-05
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1.0
-- `num_train_epochs`: 3
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.1
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -215,7 +183,7 @@ You can finetune this model on your own dataset.
 - `no_cuda`: False
 - `use_cpu`: False
 - `use_mps_device`: False
-- `seed`: 12
 - `data_seed`: None
 - `jit_mode_eval`: False
 - `use_ipex`: False
@@ -232,13 +200,13 @@ You can finetune this model on your own dataset.
 - `tpu_metrics_debug`: False
 - `debug`: []
 - `dataloader_drop_last`: False
-- `dataloader_num_workers`: 4
 - `dataloader_prefetch_factor`: None
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
 - `label_names`: None
-- `load_best_model_at_end`: True
 - `ignore_data_skip`: False
 - `fsdp`: []
 - `fsdp_min_num_params`: 0
@@ -265,6 +233,7 @@ You can finetune this model on your own dataset.
 - `hub_strategy`: every_save
 - `hub_private_repo`: None
 - `hub_always_push`: False
 - `gradient_checkpointing`: False
 - `gradient_checkpointing_kwargs`: None
 - `include_inputs_for_metrics`: False
@@ -289,31 +258,34 @@ You can finetune this model on your own dataset.
 - `batch_eval_metrics`: False
 - `eval_on_start`: False
 - `use_liger_kernel`: False
 - `eval_use_gather_object`: False
 - `average_tokens_across_devices`: False
 - `prompts`: None
-- `batch_sampler`: no_duplicates
 - `multi_dataset_batch_sampler`: proportional
 </details>
 ### Training Logs
 | Epoch  | Step | Training Loss |
 |:------:|:----:|:-------------:|
-| 0.0008 | 1    | 0.4707        |
-| 0.7734 | 1000 | 0.1114        |
-| 1.5468 | 2000 | 0.0051        |
-| 2.3202 | 3000 | 0.0046        |
 ### Framework Versions
 - Python: 3.11.9
-- Sentence Transformers: 4.1.0
-- Transformers: 4.52.4
 - PyTorch: 2.6.0+cu124
-- Accelerate: 1.7.0
 - Datasets: 3.6.0
-- Tokenizers: 0.21.1
 ## Citation

 tags:
 - sentence-transformers
 - cross-encoder
+- reranker
 - generated_from_trainer
+- dataset_size:27035
+- loss:BinaryCrossEntropyLoss
 base_model: BAAI/bge-reranker-base
 pipeline_tag: text-ranking
 library_name: sentence-transformers
 model = CrossEncoder("foochun/bge-reranker-ft")
 # Get scores for pairs of texts
 pairs = [
+    ['wendy chia pei ling', 'chia ling pei wendy'],
+    ['tara d/o sundaram', 'tara a/l sundaram'],
+    ['sim sin xuan', 'sin sim xuan'],
+    ['samantha claire de silva', 'raja iskandar bin raja ahmad'],
+    ['tai yong shen', 'shen tai yong'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
+    'wendy chia pei ling',
     [
+        'chia ling pei wendy',
+        'tara a/l sundaram',
+        'sin sim xuan',
+        'raja iskandar bin raja ahmad',
+        'shen tai yong',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 #### Unnamed Dataset
+* Size: 27,035 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                                     | sentence_1                                                                                   | label                                                           |
+  |:--------|:-----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------|
+  | type    | string                                                                                         | string                                                                                       | float                                                           |
+  | details | <ul><li>min: 10 characters</li><li>mean: 21.47 characters</li><li>max: 45 characters</li></ul> | <ul><li>min: 7 characters</li><li>mean: 19.7 characters</li><li>max: 40 characters</li></ul> | <ul><li>min: 0.55</li><li>mean: 0.77</li><li>max: 1.0</li></ul> |
 * Samples:
+  | sentence_0                       | sentence_1                       | label               |
+  |:---------------------------------|:---------------------------------|:--------------------|
+  | <code>wendy chia pei ling</code> | <code>chia ling pei wendy</code> | <code>0.55</code>   |
+  | <code>tara d/o sundaram</code>   | <code>tara a/l sundaram</code>   | <code>0.836</code>  |
+  | <code>sim sin xuan</code>        | <code>sin sim xuan</code>        | <code>0.7885</code> |
+* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
+      "activation_fn": "torch.nn.modules.linear.Identity",
+      "pos_weight": null
   }
   ```
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
 - `per_device_train_batch_size`: 64
 - `per_device_eval_batch_size`: 64
+- `num_train_epochs`: 5
 - `fp16`: True
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
 - `overwrite_output_dir`: False
 - `do_predict`: False
+- `eval_strategy`: no
 - `prediction_loss_only`: True
 - `per_device_train_batch_size`: 64
 - `per_device_eval_batch_size`: 64
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 5
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.0
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
 - `no_cuda`: False
 - `use_cpu`: False
 - `use_mps_device`: False
+- `seed`: 42
 - `data_seed`: None
 - `jit_mode_eval`: False
 - `use_ipex`: False
 - `tpu_metrics_debug`: False
 - `debug`: []
 - `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
 - `dataloader_prefetch_factor`: None
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
 - `label_names`: None
+- `load_best_model_at_end`: False
 - `ignore_data_skip`: False
 - `fsdp`: []
 - `fsdp_min_num_params`: 0
 - `hub_strategy`: every_save
 - `hub_private_repo`: None
 - `hub_always_push`: False
+- `hub_revision`: None
 - `gradient_checkpointing`: False
 - `gradient_checkpointing_kwargs`: None
 - `include_inputs_for_metrics`: False
 - `batch_eval_metrics`: False
 - `eval_on_start`: False
 - `use_liger_kernel`: False
+- `liger_kernel_config`: None
 - `eval_use_gather_object`: False
 - `average_tokens_across_devices`: False
 - `prompts`: None
+- `batch_sampler`: batch_sampler
 - `multi_dataset_batch_sampler`: proportional
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
 </details>
 ### Training Logs
 | Epoch  | Step | Training Loss |
 |:------:|:----:|:-------------:|
+| 1.1820 | 500  | 0.4725        |
+| 2.3641 | 1000 | 0.4476        |
+| 3.5461 | 1500 | 0.4438        |
+| 4.7281 | 2000 | 0.443         |
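The new run trains with `BinaryCrossEntropyLoss` on soft labels (0.55 to 1.0 in the sampled statistics), so its training loss cannot approach zero the way the earlier ranking loss did. For a soft target y, binary cross-entropy is smallest when the predicted probability equals y, and that minimum is the binary entropy of y. The sketch below illustrates this in plain Python. It is not the library's implementation: the helper names are hypothetical, the three targets are the sample labels from the dataset table, and the logit of 6.0 is an arbitrary over-confident prediction. Whether this floor fully accounts for the plateau near 0.44 is an assumption, since the full label distribution is not shown.

```python
import math

def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy computed from a raw logit (numerically stable form)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def binary_entropy(target: float) -> float:
    """Lowest BCE reachable for a soft target strictly between 0 and 1."""
    return -(target * math.log(target) + (1.0 - target) * math.log(1.0 - target))

def to_logit(prob: float) -> float:
    """Inverse of the sigmoid."""
    return math.log(prob / (1.0 - prob))

# Sample labels copied from the training dataset table in the model card.
sample_labels = [0.55, 0.836, 0.7885]

for y in sample_labels:
    floor = bce_with_logits(to_logit(y), y)      # prediction exactly equals the label
    overconfident = bce_with_logits(6.0, y)      # arbitrary, very confident logit
    print(f"label={y}: floor={floor:.4f} "
          f"entropy={binary_entropy(y):.4f} overconfident={overconfident:.4f}")
```

Because the floor depends only on the labels, a flat loss curve under soft targets does not by itself indicate that training has stalled.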
 ### Framework Versions
 - Python: 3.11.9
+- Sentence Transformers: 5.0.0
+- Transformers: 4.53.0
 - PyTorch: 2.6.0+cu124
+- Accelerate: 1.8.1
 - Datasets: 3.6.0
+- Tokenizers: 0.21.2
 ## Citation

config.json

@@ -27,10 +27,10 @@
   "position_embedding_type": "absolute",
   "sentence_transformers": {
     "activation_fn": "torch.nn.modules.activation.Sigmoid",
-    "version": "4.1.0"
   },
   "torch_dtype": "float32",
-  "transformers_version": "4.52.4",
   "type_vocab_size": 1,
   "use_cache": true,
   "vocab_size": 250002

   "position_embedding_type": "absolute",
   "sentence_transformers": {
     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+    "version": "5.0.0"
   },
   "torch_dtype": "float32",
+  "transformers_version": "4.53.0",
   "type_vocab_size": 1,
   "use_cache": true,
   "vocab_size": 250002
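
`config.json` keeps `activation_fn` set to Sigmoid, while the loss parameters in the model card list Identity. One consistent reading, stated here as an assumption about how the library wires these together, is that the loss works on raw logits during training and the configured Sigmoid is applied only when scoring. The sketch below shows that mapping in plain Python; the logits are hypothetical values, not real model output.

```python
import math

def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)

# Hypothetical raw logits for three candidate pairs (not real model output).
raw_logits = [2.1, -0.4, 0.9]
scores = [sigmoid(x) for x in raw_logits]

# Sigmoid is monotonic, so ranking by score matches ranking by logit.
order_by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
order_by_logit = sorted(range(len(raw_logits)), key=lambda i: raw_logits[i], reverse=True)
print(scores, order_by_score == order_by_logit)
```

The activation therefore changes the scale of the scores but not the order of the ranked candidates.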

model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:590bafb40b20dad3f7206e0dd682b70c7d962305730ffde246762e9b04328fba
 size 1112201932

 version https://git-lfs.github.com/spec/v1
+oid sha256:c4d122284e1a31599b81749bfa07801bed98b79c73b8b146ce4ade3793501d47
 size 1112201932
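
Only the `oid` changes between the two pointers; the size stays at 1112201932 bytes, so a size check alone cannot tell the old weights from the new ones. The following is a minimal sketch of checking a downloaded file against the new pointer. The helper names and the local path `model.safetensors` are assumptions for illustration, not part of the repository.

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_pointer(path: Path, pointer_text: str) -> bool:
    """Return True when both the byte size and the SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    expected_hash = fields["oid"].removeprefix("sha256:")
    expected_size = int(fields["size"])
    return path.stat().st_size == expected_size and sha256_of_file(path) == expected_hash

# New pointer, copied from the added side of the diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c4d122284e1a31599b81749bfa07801bed98b79c73b8b146ce4ade3793501d47
size 1112201932
"""

local = Path("model.safetensors")  # assumed local download location
if local.exists():
    print(matches_pointer(local, NEW_POINTER))
```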