Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
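The pooling and normalization stages above can be sketched in plain NumPy: mean pooling averages the token embeddings while ignoring padded positions, and `Normalize()` scales each sentence vector to unit L2 length. The shapes and values below are illustrative toy data, not outputs of this model.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over the sequence, skipping padding tokens."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def l2_normalize(embeddings):
    """Scale each embedding to unit length (what the Normalize() module does)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

# Toy batch: 2 sentences, 4 token positions, 768 dimensions
tokens = np.random.rand(2, 4, 768)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)  # (2, 768)
```

Because the output vectors are unit-normalized, dot products between them directly equal cosine similarities.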
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference
sentences = [
    "Who is responsible for the crane in sharjah?",
    "[FIELD] custodian | [TABLE] tabAsset",
    "[TABLE] tabQuality Inspection | desc: Quality Inspection Record",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6763, -0.0788],
#         [ 0.6763,  1.0000, -0.0196],
#         [-0.0788, -0.0196,  1.0000]])
```
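Since the model emits unit-normalized embeddings, semantic search reduces to ranking corpus vectors by their dot product with the query vector. A minimal sketch on hand-made toy vectors standing in for real `model.encode(...)` output:

```python
import numpy as np

def search(query_emb, corpus_embs, top_k=2):
    """Rank corpus embeddings by cosine similarity to the query.
    Assumes all vectors are already L2-normalized, so the dot
    product equals cosine similarity."""
    scores = corpus_embs @ query_emb
    top = np.argsort(-scores)[:top_k]  # indices of the highest scores
    return [(int(i), float(scores[i])) for i in top]

# Toy unit vectors standing in for encoded corpus entries and a query
corpus = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
query = np.array([1.0, 0.0])
print(search(query, corpus))  # [(0, 1.0), (1, 0.8)]
```

In practice you would encode the query with the same model as the corpus and keep the corpus embeddings precomputed.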
The model was evaluated with `EmbeddingSimilarityEvaluator`:

| Metric | Value |
|---|---|
| pearson_cosine | nan |
| spearman_cosine | nan |
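`EmbeddingSimilarityEvaluator` reports the Pearson and Spearman correlation between the cosine similarities of embedding pairs and gold similarity labels; `nan` values typically indicate constant or missing labels. A sketch of the Spearman computation on toy scores, assuming no tied values (library implementations such as SciPy's handle ties properly):

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.
    Assumes no ties, which keeps the rank transform a simple double argsort."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx**2).sum() * (ry**2).sum()))

pred = [0.9, 0.1, 0.5, 0.7]   # cosine similarities of pairs
gold = [1.0, 0.0, 0.4, 0.8]   # human similarity labels
print(spearman(pred, gold))   # 1.0 -- the two rankings agree perfectly
```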
The training data has two columns, sentence1 and sentence2:

| | sentence1 | sentence2 |
|---|---|---|
| type | string | string |
| sentence1 | sentence2 |
|---|---|
| which customers have pending follow-ups scheduled for today? | [FIELD] status \| [TABLE] tabLead \| desc: Filters for leads requiring follow-up |
| Give me all milestones due this week across active projects. | [TABLE] tabMilestone Tracker \| desc: Milestone tracking |
| show me cement bags sent to site last week | [FIELD] posting_date \| [TABLE] tabDelivery Note \| desc: Filters for last week. |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
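`MultipleNegativesRankingLoss` treats, for each anchor in a batch, its paired positive as the correct class and every other positive in the same batch as a negative: the scaled cosine-similarity matrix is fed to a softmax cross-entropy whose labels are the diagonal. A NumPy sketch of the idea (the library's PyTorch implementation differs in details):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: row i of the score matrix should peak at column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                    # (batch, batch) cos_sim * scale
    # Softmax cross-entropy with labels 0..batch-1 (the diagonal)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.arange(len(scores))
    return float(-log_probs[idx, idx].mean())

# Correctly matched pairs yield a lower loss than shuffled pairs
a = np.eye(4)
print(mnr_loss(a, a) < mnr_loss(a, a[::-1]))  # True
```

Larger batches supply more in-batch negatives, which is why this loss generally benefits from larger `per_device_train_batch_size`.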
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- num_train_epochs: 5
- warmup_ratio: 0.1
- warmup_steps: 0.1
- fp16: True
- dataloader_drop_last: True
- dataloader_num_workers: 2
- load_best_model_at_end: True

All hyperparameters:

- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.1
- warmup_steps: 0.1
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: True
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: True
- dataloader_num_workers: 2
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | eval_spearman_cosine |
|---|---|---|---|
| 0.0653 | 50 | 1.4199 | - |
| 0.1305 | 100 | 1.1590 | - |
| 0.1958 | 150 | 1.1426 | - |
| 0.2611 | 200 | 0.9971 | - |
| 0.3264 | 250 | 0.9385 | - |
| 0.3916 | 300 | 0.8183 | - |
| 0.4569 | 350 | 0.8028 | - |
| 0.5222 | 400 | 0.7773 | - |
| 0.5875 | 450 | 0.7353 | - |
| 0.6527 | 500 | 0.7948 | - |
| 0.7180 | 550 | 0.7055 | - |
| 0.7833 | 600 | 0.6792 | - |
| 0.8486 | 650 | 0.7091 | - |
| 0.9138 | 700 | 0.6767 | - |
| 0.9791 | 750 | 0.6608 | - |
| 1.0 | 766 | - | nan |
| 1.0444 | 800 | 0.5842 | - |
| 1.1097 | 850 | 0.5622 | - |
| 1.1749 | 900 | 0.5592 | - |
| 1.2402 | 950 | 0.5860 | - |
| 1.3055 | 1000 | 0.5701 | - |
| 1.3708 | 1050 | 0.5920 | - |
| 1.4360 | 1100 | 0.5430 | - |
| 1.5013 | 1150 | 0.5260 | - |
| 1.5666 | 1200 | 0.5249 | - |
| 1.6319 | 1250 | 0.5350 | - |
| 1.6971 | 1300 | 0.5543 | - |
| 1.7624 | 1350 | 0.5162 | - |
| 1.8277 | 1400 | 0.5190 | - |
| 1.8930 | 1450 | 0.5258 | - |
| 1.9582 | 1500 | 0.5090 | - |
| 2.0 | 1532 | - | nan |
| 2.0235 | 1550 | 0.5103 | - |
| 2.0888 | 1600 | 0.4416 | - |
| 2.1540 | 1650 | 0.4374 | - |
| 2.2193 | 1700 | 0.4092 | - |
| 2.2846 | 1750 | 0.4072 | - |
| 2.3499 | 1800 | 0.4321 | - |
| 2.4151 | 1850 | 0.4519 | - |
| 2.4804 | 1900 | 0.3977 | - |
| 2.5457 | 1950 | 0.4490 | - |
| 2.6110 | 2000 | 0.4542 | - |
| 2.6762 | 2050 | 0.3865 | - |
| 2.7415 | 2100 | 0.4113 | - |
| 2.8068 | 2150 | 0.4366 | - |
| 2.8721 | 2200 | 0.3925 | - |
| 2.9373 | 2250 | 0.4150 | - |
| 3.0 | 2298 | - | nan |
| 3.0026 | 2300 | 0.4228 | - |
| 3.0679 | 2350 | 0.3294 | - |
| 3.1332 | 2400 | 0.3256 | - |
| 3.1984 | 2450 | 0.3555 | - |
| 3.2637 | 2500 | 0.3770 | - |
| 3.3290 | 2550 | 0.3339 | - |
| 3.3943 | 2600 | 0.3764 | - |
| 3.4595 | 2650 | 0.3544 | - |
| 3.5248 | 2700 | 0.3567 | - |
| 3.5901 | 2750 | 0.3539 | - |
| 3.6554 | 2800 | 0.3585 | - |
| 3.7206 | 2850 | 0.3224 | - |
| 3.7859 | 2900 | 0.3383 | - |
| 3.8512 | 2950 | 0.3584 | - |
| 3.9164 | 3000 | 0.3584 | - |
| 3.9817 | 3050 | 0.3108 | - |
| 4.0 | 3064 | - | nan |
| 4.0470 | 3100 | 0.3361 | - |
| 4.1123 | 3150 | 0.3048 | - |
| 4.1775 | 3200 | 0.3154 | - |
| 4.2428 | 3250 | 0.2866 | - |
| 4.3081 | 3300 | 0.2981 | - |
| 4.3734 | 3350 | 0.3149 | - |
| 4.4386 | 3400 | 0.3128 | - |
| 4.5039 | 3450 | 0.3237 | - |
| 4.5692 | 3500 | 0.3054 | - |
| 4.6345 | 3550 | 0.3015 | - |
| 4.6997 | 3600 | 0.2885 | - |
| 4.7650 | 3650 | 0.2745 | - |
| 4.8303 | 3700 | 0.3062 | - |
| 4.8956 | 3750 | 0.2996 | - |
| 4.9608 | 3800 | 0.3058 | - |
| 5.0 | 3830 | - | nan |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}