This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
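The Pooling and Normalize stages above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling over non-padded tokens followed by L2 normalization, as the printed configuration suggests (`pooling_mode_mean_tokens: True`, then `Normalize()`). The function names and the 2-dimensional toy vectors are hypothetical stand-ins for the model's 384-dimensional token embeddings, not the library's own implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)       # the padded vector is excluded
sentence_vec = l2_normalize(pooled)    # unit-length sentence embedding
print(pooled, sentence_vec)
```

Because the final vectors have unit length, their dot product equals their cosine similarity.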
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("hbpkillerX/legal-clause-minilm-l6-v2")
# Run inference
sentences = [
    'Since the Balance Sheet Date, the Company Group has conducted the Business in the ordinary course consistent with past practices. Without limiting the generality of the foregoing, except as set forth on Schedule 5.13, since the Balance Sheet Date, there has not been:',
    '(a) From December 31, 2011 to the Signing Date, (i) each Parent Entity conducted such Parent Entity’s operations only in the ordinary course of business consistent with past practices and (ii) there has not occurred and continued to exist any event, change, effect, fact, circumstance or condition which, individually or in the aggregate, has had, or would reasonably be expected to have or result in, a Parent Material Adverse Change.',
    'Notwithstanding the provisions of paragraph 2 above, all Restricted Equivalents then outstanding will immediately vest, in the event of:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9113, 0.1831],
#         [0.9113, 1.0000, 0.2245],
#         [0.1831, 0.2245, 1.0000]])
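The similarity scores above can drive a simple retrieval step: score every candidate clause against a query and sort. The sketch below shows that ranking logic in pure Python. The helper names and the 3-dimensional vectors are hypothetical placeholders for real 384-dimensional embeddings, so it runs without downloading the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, corpus):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical low-dimensional stand-ins for clause embeddings.
query = [1.0, 0.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [0.5, 0.5, 0.5],   # partially aligned
]

ranking = rank_by_similarity(query, corpus)
for index, score in ranking:
    print(index, round(score, 4))
```

In practice the vectors would come from `model.encode(...)`, and for large corpora a vectorized or approximate nearest-neighbour search would likely replace the Python loop.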
clause-eval-triplets (TripletEvaluator)

| Metric | Value |
|---|---|
| cosine_accuracy | 0.9761 |
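As a rough illustration of what a triplet cosine accuracy measures, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. This is the usual definition and is assumed here; the evaluator's exact options (for example any margin) are not stated in this card. The function names and the toy 2-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Toy triplets: (anchor, positive, negative).
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer
    ([1.0, 0.0], [0.2, 0.8], [0.8, 0.2]),   # negative is closer (a miss)
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer
    ([1.0, 1.0], [1.0, 0.9], [1.0, -1.0]),  # positive is closer
]

accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)
```

Under this definition, the reported 0.9761 would mean that roughly 97.6% of evaluation triplets are ordered correctly.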
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |
| anchor | positive |
|---|---|
| Except as disclosed in the SEC Documents, since December 31, 2005, there has been no material adverse change and no material adverse development in the business, properties, operations, condition (financial or otherwise), results of operations or prospects of the Company or its Subsidiaries. Since December 31, 2005, the Company has not (i) declared or paid any dividends, (ii) sold any assets, individually or in the aggregate, in excess of $50,000 outside of the ordinary course of business or (iii) had capital expenditures, individually or in the aggregate, in excess of $100,000. The Company has not taken any steps to seek protection pursuant to any bankruptcy law nor does the Company have any knowledge or reason to believe that its creditors intend to initiate involuntary bankruptcy proceedings or any actual knowledge of any fact which would reasonably lead a creditor to do so. After giving effect to the transactions contemplated hereby to occur at the Closing, the Company will not be ... | Since December 31, 1995, there has not been (a) any Material Adverse Change (as defined in Section 12.3) in the business, prospects, financial condition, revenues, expenses or operations of DAP; (b) any decrease in the cash and cash equivalents of DAP from the amounts shown on the balance sheet included in the 1995 Financial Statements (except for any such decrease attributable to any Permitted Cash Payments (as defined in Section 12.3) hereof) made by DAP prior to Closing), (c) any damage, destruction or loss, whether covered by insurance or not, having a Material Adverse Effect, with regard to DAP's properties and business; (d) any payment by DAP to, or any notice to or acknowledgment by DAP of any amount due or owing to, DAP's self-insured carrier in connection with any self-insured amounts or liabilities under health insurance covering employees of DAP, in each case, in excess of a reserve therefor on the balance sheet included in the 1995 Financial Statements; (e) any declaration,... |
| Since December 31, 1995, there has not been (a) any Material Adverse Change (as defined in Section 12.3) in the business, prospects, financial condition, revenues, expenses or operations of DAP; (b) any decrease in the cash and cash equivalents of DAP from the amounts shown on the balance sheet included in the 1995 Financial Statements (except for any such decrease attributable to any Permitted Cash Payments (as defined in Section 12.3) hereof) made by DAP prior to Closing), (c) any damage, destruction or loss, whether covered by insurance or not, having a Material Adverse Effect, with regard to DAP's properties and business; (d) any payment by DAP to, or any notice to or acknowledgment by DAP of any amount due or owing to, DAP's self-insured carrier in connection with any self-insured amounts or liabilities under health insurance covering employees of DAP, in each case, in excess of a reserve therefor on the balance sheet included in the 1995 Financial Statements; (e) any declaration,... | Except as disclosed in Schedule 3(g) or -------------------------- the SEC Documents filed at least five (5) days prior to the date hereof, there has been no change or development in the business, properties, assets, operations, financial condition, results of operations or prospects of the Company or its Subsidiaries which has had or reasonably could have a Material Adverse Effect. The Company has not taken any steps, and does not currently expect to take any steps, to seek protection pursuant to any bankruptcy law nor does the Company or its Subsidiaries have any knowledge or reason to believe that its creditors intend to initiate involuntary bankruptcy proceedings. |
| Except as disclosed in Schedule 3(g) or -------------------------- the SEC Documents filed at least five (5) days prior to the date hereof, there has been no change or development in the business, properties, assets, operations, financial condition, results of operations or prospects of the Company or its Subsidiaries which has had or reasonably could have a Material Adverse Effect. The Company has not taken any steps, and does not currently expect to take any steps, to seek protection pursuant to any bankruptcy law nor does the Company or its Subsidiaries have any knowledge or reason to believe that its creditors intend to initiate involuntary bankruptcy proceedings. | Except as disclosed in the SEC Documents, since December 31, 2017, there has been no material adverse change in the business, properties, operations, financial condition or results of operations of the Company or its Subsidiaries. The Company is not in violation or default of (i) any provision of the Certificate of Incorporation or Bylaws, (ii) the terms of any indenture, contract, lease, mortgage, deed of trust, note agreement, loan agreement or other agreement, obligation, condition, covenant or instrument to which it is a party or bound or to which its property is subject, or (iii) any statute, law, rule, regulation, judgment, order or decree of any court, regulatory body, administrative agency, governmental body, arbitrator or other authority having jurisdiction over the Company or any of its properties, which, in the case of clauses (ii) or (iii), could be reasonably expected to have a Material Adverse Effect. Except as described in the SEC Documents, no dispute between the Compan... |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
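To make the loss parameters concrete, here is a minimal pure-Python sketch of an in-batch multiple-negatives ranking objective. It assumes the common formulation: each anchor is scored against every positive in the batch with cosine similarity, scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. The function names and toy vectors are hypothetical, and this is an illustration rather than the library's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, pos) for pos in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

Under this formulation, a model that scores all candidates equally would incur a loss of ln(N) for a batch of N pairs, about 5.55 for N = 256, which gives some context for the training loss values logged in this card.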
Non-default hyperparameters:
eval_strategy: steps
per_device_train_batch_size: 256
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 10
warmup_ratio: 0.1
fp16: True
load_best_model_at_end: True
batch_sampler: no_duplicates

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 256
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

| Epoch | Step | Training Loss | clause-eval-triplets_cosine_accuracy |
|---|---|---|---|
| 0.0954 | 50 | 4.0121 | - |
| 0.1908 | 100 | 3.6193 | - |
| 0.2863 | 150 | 3.2993 | - |
| 0.3817 | 200 | 3.1397 | 0.9145 |
| 0.4771 | 250 | 3.0152 | - |
| 0.5725 | 300 | 2.8616 | - |
| 0.6679 | 350 | 2.7934 | - |
| 0.7634 | 400 | 2.6894 | 0.9415 |
| 0.8588 | 450 | 2.6142 | - |
| 0.9542 | 500 | 2.5517 | - |
| 1.0496 | 550 | 2.4866 | - |
| 1.1450 | 600 | 2.4756 | 0.9552 |
| 1.2405 | 650 | 2.4114 | - |
| 1.3359 | 700 | 2.3485 | - |
| 1.4313 | 750 | 2.317 | - |
| 1.5267 | 800 | 2.2811 | 0.9583 |
| 1.6221 | 850 | 2.2907 | - |
| 1.7176 | 900 | 2.2771 | - |
| 1.8130 | 950 | 2.2511 | - |
| 1.9084 | 1000 | 2.2015 | 0.9639 |
| 2.0038 | 1050 | 2.1946 | - |
| 2.0992 | 1100 | 2.1394 | - |
| 2.1947 | 1150 | 2.1468 | - |
| 2.2901 | 1200 | 2.1317 | 0.9644 |
| 2.3855 | 1250 | 2.1101 | - |
| 2.4809 | 1300 | 2.1233 | - |
| 2.5763 | 1350 | 2.1143 | - |
| 2.6718 | 1400 | 2.0817 | 0.9644 |
| 2.7672 | 1450 | 2.0413 | - |
| 2.8626 | 1500 | 2.0659 | - |
| 2.9580 | 1550 | 2.0521 | - |
| 3.0534 | 1600 | 1.9973 | 0.9695 |
| 3.1489 | 1650 | 2.0266 | - |
| 3.2443 | 1700 | 2.0031 | - |
| 3.3397 | 1750 | 2.0033 | - |
| 3.4351 | 1800 | 1.9841 | 0.9710 |
| 3.5305 | 1850 | 1.9978 | - |
| 3.6260 | 1900 | 1.9635 | - |
| 3.7214 | 1950 | 1.954 | - |
| 3.8168 | 2000 | 1.9485 | 0.9705 |
| 3.9122 | 2050 | 1.966 | - |
| 4.0076 | 2100 | 1.95 | - |
| 4.1031 | 2150 | 1.8986 | - |
| 4.1985 | 2200 | 1.9088 | 0.9710 |
| 4.2939 | 2250 | 1.932 | - |
| 4.3893 | 2300 | 1.9031 | - |
| 4.4847 | 2350 | 1.9196 | - |
| 4.5802 | 2400 | 1.8926 | 0.9736 |
| 4.6756 | 2450 | 1.8905 | - |
| 4.7710 | 2500 | 1.8862 | - |
| 4.8664 | 2550 | 1.8986 | - |
| 4.9618 | 2600 | 1.9048 | 0.9730 |
| 5.0573 | 2650 | 1.8612 | - |
| 5.1527 | 2700 | 1.8464 | - |
| 5.2481 | 2750 | 1.8505 | - |
| 5.3435 | 2800 | 1.8815 | 0.9741 |
| 5.4389 | 2850 | 1.8752 | - |
| 5.5344 | 2900 | 1.8455 | - |
| 5.6298 | 2950 | 1.8356 | - |
| 5.7252 | 3000 | 1.842 | 0.9736 |
| 5.8206 | 3050 | 1.8675 | - |
| 5.9160 | 3100 | 1.8399 | - |
| 6.0115 | 3150 | 1.8019 | - |
| 6.1069 | 3200 | 1.8082 | 0.9771 |
| 6.2023 | 3250 | 1.7821 | - |
| 6.2977 | 3300 | 1.8269 | - |
| 6.3931 | 3350 | 1.8183 | - |
| 6.4885 | 3400 | 1.8138 | 0.9756 |
| 6.5840 | 3450 | 1.8044 | - |
| 6.6794 | 3500 | 1.8233 | - |
| 6.7748 | 3550 | 1.8037 | - |
| 6.8702 | 3600 | 1.7921 | 0.9756 |
| 6.9656 | 3650 | 1.8211 | - |
| 7.0611 | 3700 | 1.7421 | - |
| 7.1565 | 3750 | 1.7685 | - |
| 7.2519 | 3800 | 1.7615 | 0.9756 |
| 7.3473 | 3850 | 1.7571 | - |
| 7.4427 | 3900 | 1.7901 | - |
| 7.5382 | 3950 | 1.7746 | - |
| 7.6336 | 4000 | 1.7775 | 0.9756 |
| 7.7290 | 4050 | 1.7752 | - |
| 7.8244 | 4100 | 1.7688 | - |
| 7.9198 | 4150 | 1.7948 | - |
| 8.0153 | 4200 | 1.7649 | 0.9761 |
| 8.1107 | 4250 | 1.7176 | - |
| 8.2061 | 4300 | 1.7416 | - |
| 8.3015 | 4350 | 1.7579 | - |
| 8.3969 | 4400 | 1.745 | 0.9761 |
| 8.4924 | 4450 | 1.7731 | - |
| 8.5878 | 4500 | 1.7713 | - |
| 8.6832 | 4550 | 1.7188 | - |
| 8.7786 | 4600 | 1.7463 | 0.9761 |
| 8.8740 | 4650 | 1.7669 | - |
| 8.9695 | 4700 | 1.7648 | - |
| 9.0649 | 4750 | 1.7254 | - |
| 9.1603 | 4800 | 1.7173 | 0.9766 |
| 9.2557 | 4850 | 1.7451 | - |
| 9.3511 | 4900 | 1.7605 | - |
| 9.4466 | 4950 | 1.7449 | - |
| 9.5420 | 5000 | 1.7432 | 0.9761 |
| 9.6374 | 5050 | 1.7518 | - |
| 9.7328 | 5100 | 1.717 | - |
| 9.8282 | 5150 | 1.7456 | - |
| 9.9237 | 5200 | 1.7185 | 0.9761 |
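The listed hyperparameters (`learning_rate: 2e-05`, `lr_scheduler_type: linear`, `warmup_ratio: 0.1`) imply a learning rate that ramps up linearly and then decays linearly to zero. The sketch below reproduces that shape in pure Python. The total of 5240 steps is an assumption inferred from the log (step 50 at epoch 0.0954 suggests about 524 steps per epoch, over 10 epochs), and the function name is hypothetical.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumed total: about 524 steps per epoch for 10 epochs (inferred from the log).
TOTAL_STEPS = 5240

for step in (0, 262, 524, 2882, 5240):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak learning rate is reached around step 524, roughly the end of the first epoch, which coincides with the steepest drop in training loss in the table above.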
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}