Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper • 1908.10084 • Published • 12
This is a Cross Encoder model trained using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")
# Get scores for pairs of texts
pairs = [
['Khosrowabad, in the city where Pouya Tajik was born, is found in what county?', 'Khosrowabad, Tehran. Khosrowabad (, also Romanized as Khosrowābād) is a village in Jajrud Rural District, in the Jajrud District of Pardis County, Tehran Province, Iran. At the 2006 census, its population was 1,180, in 386 families. The village was chosen as the capital of Jajrud Rural District when it was created on December 29, 2012.'],
['Khosrowabad, in the city where Pouya Tajik was born, is found in what county?', 'The Prince and the Showgirl. The Prince and the Showgirl (originally called The Sleeping Prince) is a 1957 British - American romantic comedy film starring Marilyn Monroe and Laurence Olivier. Olivier also served as director and producer. The screenplay by Terence Rattigan was based on his 1953 stage play The Sleeping Prince. It was filmed in London.'],
['How many symbols are the same in the beginning of ASCII and the dominant scheme for internal processing?', "Star of David. The flag of Israel, depicting a blue Star of David on a white background, between two horizontal blue stripes was adopted on October 28, 1948, five months after the country's establishment. The origins of the flag's design date from the First Zionist Congress in 1897; the flag has subsequently been known as the ``flag of Zion ''."],
['Margraviate of the country of the Botanical Garden of the place Josef Victor Rohon was educated is an instance of?', 'Botanical Garden of the University of Vienna. The Botanical Garden of the University of Vienna is a botanical garden in Vienna, Austria. It covers 8 hectares and is immediately adjacent to the Belvedere gardens. It is a part of the University of Vienna.'],
["Who was the singer of I Can't Sleep at Night in Home and Away?", 'I Can\'t Sleep at Night. "I Can\'t Sleep at Night" was written by Dannii Minogue, Rob Davis and Jewels & Stone for Minogue\'s fifth studio album "Club Disco" and included on the greatest hits compilation, "The Hits & Beyond" (2006). On 8 January 2007, the song and its remixes were released as a digital download in Australia, the United Kingdom and North America. The Radio Edit of the song features minor mixing and production differences and is the version featured in the music video.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
'Khosrowabad, in the city where Pouya Tajik was born, is found in what county?',
[
'Khosrowabad, Tehran. Khosrowabad (, also Romanized as Khosrowābād) is a village in Jajrud Rural District, in the Jajrud District of Pardis County, Tehran Province, Iran. At the 2006 census, its population was 1,180, in 386 families. The village was chosen as the capital of Jajrud Rural District when it was created on December 29, 2012.',
'The Prince and the Showgirl. The Prince and the Showgirl (originally called The Sleeping Prince) is a 1957 British - American romantic comedy film starring Marilyn Monroe and Laurence Olivier. Olivier also served as director and producer. The screenplay by Terence Rattigan was based on his 1953 stage play The Sleeping Prince. It was filmed in London.',
"Star of David. The flag of Israel, depicting a blue Star of David on a white background, between two horizontal blue stripes was adopted on October 28, 1948, five months after the country's establishment. The origins of the flag's design date from the First Zionist Congress in 1897; the flag has subsequently been known as the ``flag of Zion ''.",
'Botanical Garden of the University of Vienna. The Botanical Garden of the University of Vienna is a botanical garden in Vienna, Austria. It covers 8 hectares and is immediately adjacent to the Belvedere gardens. It is a part of the University of Vienna.',
'I Can\'t Sleep at Night. "I Can\'t Sleep at Night" was written by Dannii Minogue, Rob Davis and Jewels & Stone for Minogue\'s fifth studio album "Club Disco" and included on the greatest hits compilation, "The Hits & Beyond" (2006). On 8 January 2007, the song and its remixes were released as a digital download in Australia, the United Kingdom and North America. The Radio Edit of the song features minor mixing and production differences and is the version featured in the music video.',
]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
validationCEBinaryClassificationEvaluator| Metric | Value |
|---|---|
| accuracy | 0.9648 |
| accuracy_threshold | 0.0556 |
| f1 | 0.9447 |
| f1_threshold | 0.0177 |
| precision | 0.9569 |
| recall | 0.9328 |
| average_precision | 0.9824 |
sentence_0, sentence_1, and label| sentence_0 | sentence_1 | label | |
|---|---|---|---|
| type | string | string | float |
| details |
|
|
|
| sentence_0 | sentence_1 | label |
|---|---|---|
Khosrowabad, in the city where Pouya Tajik was born, is found in what county? |
Khosrowabad, Tehran. Khosrowabad (, also Romanized as Khosrowābād) is a village in Jajrud Rural District, in the Jajrud District of Pardis County, Tehran Province, Iran. At the 2006 census, its population was 1,180, in 386 families. The village was chosen as the capital of Jajrud Rural District when it was created on December 29, 2012. |
1.0 |
Khosrowabad, in the city where Pouya Tajik was born, is found in what county? |
The Prince and the Showgirl. The Prince and the Showgirl (originally called The Sleeping Prince) is a 1957 British - American romantic comedy film starring Marilyn Monroe and Laurence Olivier. Olivier also served as director and producer. The screenplay by Terence Rattigan was based on his 1953 stage play The Sleeping Prince. It was filmed in London. |
0.0 |
How many symbols are the same in the beginning of ASCII and the dominant scheme for internal processing? |
Star of David. The flag of Israel, depicting a blue Star of David on a white background, between two horizontal blue stripes was adopted on October 28, 1948, five months after the country's establishment. The origins of the flag's design date from the First Zionist Congress in 1897; the flag has subsequently been known as the ``flag of Zion ''. |
0.0 |
BinaryCrossEntropyLoss with these parameters:{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
per_device_train_batch_size: 4per_device_eval_batch_size: 4overwrite_output_dir: Falsedo_predict: Falseeval_strategy: noprediction_loss_only: Trueper_device_train_batch_size: 4per_device_eval_batch_size: 4per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1num_train_epochs: 3max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Falsehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseeval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseeval_use_gather_object: Falseprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: proportionalrouter_mapping: {}learning_rate_mapping: {}| Epoch | Step | validation_average_precision |
|---|---|---|
| 1.0 | 13 | 0.9827 |
| 2.0 | 26 | 0.9826 |
| 3.0 | 39 | 0.9824 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}