Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from BAAI/bge-large-en-v1.5. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
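The Pooling module above takes the CLS token (`pooling_mode_cls_token: True`), and the Normalize module rescales the result to unit length. A minimal numpy sketch of what modules (1) and (2) compute, using made-up token embeddings in place of real transformer output:

```python
import numpy as np

# Fake transformer output: batch of 2 sequences, 4 tokens each,
# hidden size 8 (the real model uses 1024).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 8))

# (1) Pooling with pooling_mode_cls_token=True: keep only the first token.
sentence_embeddings = token_embeddings[:, 0, :]

# (2) Normalize: scale each sentence vector to unit L2 norm.
norms = np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)
sentence_embeddings = sentence_embeddings / norms

print(np.linalg.norm(sentence_embeddings, axis=1))  # each norm is 1.0
```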
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rshn-krn/bge-large-legal-billing")
# Run inference
sentences = [
    'Carrier: Coaction Global, Inc.\nAttorney: Senior Counsel | Rate: $245/hr | Units: 0.1\nTask: L120 - Analysis/Strategy | Activity: A103 - Draft/revise\nNarrative: Draft Litigation Status Report regarding the deposition testimony of California Highway Patrol Officer A.S. Johnson with attention to Section I. Testimony, subsection G. Witness Statements, subsection 3. Tuna Taleni ',
    'Carrier: Coaction Global, Inc.\nAttorney: Senior Counsel | Rate: $245/hr | Units: 0.3\nTask: L120 - Analysis/Strategy | Activity: A103 - Draft/revise\nNarrative: Draft Litigation Status Report regarding the deposition testimony of California Highway Patrol Officer A.S. Johnson with attention to Section I. Testimony, subsection G. Witness Statements, subsection 4. Lumafale Oti ',
    'Carrier: Mitsui Sumitomo Marine Management, Inc.\nAttorney: Senior Counsel | Rate: $325/hr | Units: 0.1\nTask: L110 - Fact Investigation/Development | Activity: A107 - Communicat/OUT\nNarrative: Email to current counsel for Google defendants regarding status of dismissal of non-related entities in our to evaluate the case for insured.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9998, 0.4579],
#         [0.9998, 1.0000, 0.4567],
#         [0.4579, 0.4567, 1.0000]])
```
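Because the model ends in a Normalize module, the cosine similarity that `model.similarity` computes by default reduces to a plain matrix product of the unit-length embeddings. A small numpy sketch with toy 2-dimensional vectors standing in for real 1024-dimensional embeddings:

```python
import numpy as np

# Toy "embeddings": 3 vectors, normalized to unit length as the model's
# Normalize module would do.
emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

# For unit vectors, cosine similarity is just the dot product.
similarities = emb @ emb.T
print(similarities)
# diagonal is 1.0; sim(0,1) = 0.8, sim(1,2) = 0.6, sim(0,2) = 0.0
```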
The training dataset consists of anchor and positive columns:

| | anchor | positive |
|---|---|---|
| type | string | string |

Sample pairs:

| anchor | positive |
|---|---|
| Carrier: Golden Bear Insurance Company | Carrier: Golden Bear Insurance Company |
| Carrier: Aspen Specialty Insurance Company | Carrier: Aspen Specialty Insurance Company |
| Carrier: Argo Group U.S. | Carrier: Argo Group U.S. |
The model was trained with MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
```
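MultipleNegativesRankingLoss treats each anchor's paired positive as the correct target and every other in-batch positive as a negative: scaled cosine similarities go through a softmax cross-entropy with the diagonal as labels. A hedged numpy sketch of that computation (an illustration of the idea, not the library's implementation):

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # "scale": 20.0, "similarity_fct": cos_sim from the config above.
    logits = scale * (a @ p.T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct positive for anchor i sits on the diagonal
    # (the single "query_to_doc" direction).
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.01 * rng.normal(size=(4, 8))
loss = mnrl(anchors, positives)
print(loss)  # small, since each positive is nearly identical to its anchor
```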
Non-default hyperparameters:

- per_device_train_batch_size: 32
- warmup_steps: 0.1
- gradient_accumulation_steps: 4
- fp16: True
- gradient_checkpointing: True

All hyperparameters:

- per_device_train_batch_size: 32
- num_train_epochs: 3
- max_steps: -1
- learning_rate: 5e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 4
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: False
- fp16: True
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: no
- per_device_eval_batch_size: 8
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training loss by step:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0447 | 10 | 1.5338 |
| 0.0894 | 20 | 0.4740 |
| 0.1341 | 30 | 0.4362 |
| 0.1788 | 40 | 0.3799 |
| 0.2235 | 50 | 0.4078 |
| 0.2682 | 60 | 0.3891 |
| 0.3128 | 70 | 0.4182 |
| 0.3575 | 80 | 0.3938 |
| 0.4022 | 90 | 0.4459 |
| 0.4469 | 100 | 0.4123 |
| 0.4916 | 110 | 0.3640 |
| 0.5363 | 120 | 0.4194 |
| 0.5810 | 130 | 0.3928 |
| 0.6257 | 140 | 0.4267 |
| 0.6704 | 150 | 0.4228 |
| 0.7151 | 160 | 0.4358 |
| 0.7598 | 170 | 0.4309 |
| 0.8045 | 180 | 0.4161 |
| 0.8492 | 190 | 0.4289 |
| 0.8939 | 200 | 0.4091 |
| 0.9385 | 210 | 0.3994 |
| 0.9832 | 220 | 0.4184 |
| 1.0268 | 230 | 0.4119 |
| 1.0715 | 240 | 0.4279 |
| 1.1162 | 250 | 0.3907 |
| 1.1609 | 260 | 0.4242 |
| 1.2056 | 270 | 0.4049 |
| 1.2503 | 280 | 0.3787 |
| 1.2950 | 290 | 0.4150 |
| 1.3397 | 300 | 0.4472 |
| 1.3844 | 310 | 0.3944 |
| 1.4291 | 320 | 0.4288 |
| 1.4737 | 330 | 0.3718 |
| 1.5184 | 340 | 0.4148 |
| 1.5631 | 350 | 0.4160 |
| 1.6078 | 360 | 0.3907 |
| 1.6525 | 370 | 0.3918 |
| 1.6972 | 380 | 0.3777 |
| 1.7419 | 390 | 0.4300 |
| 1.7866 | 400 | 0.3913 |
| 1.8313 | 410 | 0.4205 |
| 1.8760 | 420 | 0.3863 |
| 1.9207 | 430 | 0.4370 |
| 1.9654 | 440 | 0.4225 |
| 2.0089 | 450 | 0.4057 |
| 2.0536 | 460 | 0.3843 |
| 2.0983 | 470 | 0.4034 |
| 2.1430 | 480 | 0.4115 |
| 2.1877 | 490 | 0.4128 |
| 2.2324 | 500 | 0.4028 |
| 2.2771 | 510 | 0.4198 |
| 2.3218 | 520 | 0.3613 |
| 2.3665 | 530 | 0.4017 |
| 2.4112 | 540 | 0.3639 |
| 2.4559 | 550 | 0.3978 |
| 2.5006 | 560 | 0.3982 |
| 2.5453 | 570 | 0.4059 |
| 2.5899 | 580 | 0.4175 |
| 2.6346 | 590 | 0.4510 |
| 2.6793 | 600 | 0.4210 |
| 2.7240 | 610 | 0.4098 |
| 2.7687 | 620 | 0.4082 |
| 2.8134 | 630 | 0.3970 |
| 2.8581 | 640 | 0.3846 |
| 2.9028 | 650 | 0.4155 |
| 2.9475 | 660 | 0.4071 |
| 2.9922 | 670 | 0.4293 |
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@misc{oord2019representationlearningcontrastivepredictive,
    title = {Representation Learning with Contrastive Predictive Coding},
    author = {Aaron van den Oord and Yazhe Li and Oriol Vinyals},
    year = {2019},
    eprint = {1807.03748},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG},
    url = {https://arxiv.org/abs/1807.03748},
}
```