This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
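The key settings in this architecture (8192-token context, mean pooling, 768-dimensional output) can be checked programmatically once the library is installed and the model is loaded as shown in the next section; a minimal sketch:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LequeuISIR/ModernBERT-base-DPR-8e-05")

print(model.max_seq_length)                      # 8192 tokens per input
print(model.get_sentence_embedding_dimension())  # 768-dimensional embeddings
print(model[1].pooling_mode_mean_tokens)         # True: mean pooling over token embeddings
```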
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LequeuISIR/ModernBERT-base-DPR-8e-05")
# Run inference
sentences = [
    'This incites social hatred, threatens economic and social stability, and undermines trust in the authorities.',
    '\xa0The conditions for a healthy entrepreneurship, where the most innovative and creative win and where the source of enrichment cannot be property speculation or guilds and networks. ',
    'As a result, the profits of the oligarchs are more than 400 times what our entire country gets from the exploitation of natural resources.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
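Continuing from the snippet above, the most similar pair among the encoded sentences can be read off the similarity matrix; a minimal sketch (the diagonal is masked out because every sentence is trivially most similar to itself):

```python
# Mask the diagonal (self-similarity), then locate the highest-scoring pair
sim = similarities.clone()
sim.fill_diagonal_(-1)
i, j = divmod(int(sim.argmax()), sim.shape[1])
print(f"Most similar pair (score {sim[i, j].item():.3f}):")
print(sentences[i])
print(sentences[j])
```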
Training dataset: columns sentence1, sentence2, and label.

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |

Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| There have also been other important structural changes in the countryside, which have come together to form this new, as yet unknown, country. | Meanwhile, investment, which is the way to increase production, employment capacity and competitiveness of the economy, fell from 20% of output in 1974 to only 11.8% on average between 1984 and 1988. | 0 |
| Introduce new visa categories so we can be responsive to humanitarian needs and incentivise greater investment in our domestic infrastructure and regional economies | The purpose of the project is to design and implement public policies aimed at achieving greater and faster inclusion of immigrants. | 2 |
| and economic crimes that seriously and generally affect the fundamental rights of individuals and the international community as a whole. | For the first time in the history, not only of Ecuador, but of the entire world, a government promoted a public audit process of the foreign debt and declared some of its tranches illegitimate and immoral. | 0 |

CoSENTLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
```
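The training script itself is not included in this card. The following is a minimal, illustrative sketch of fine-tuning with CoSENTLoss in Sentence Transformers; the example pairs below are stand-ins, not the actual json dataset, and the setup assumes sentence-transformers v3 or later:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CoSENTLoss

# Start from the base checkpoint named in this card
model = SentenceTransformer("answerdotai/ModernBERT-base")

# Illustrative pairs with ordinal similarity labels; the real data is the json dataset
train_dataset = Dataset.from_dict({
    "sentence1": [
        "Investment fell from 20% of output to 11.8%.",
        "Introduce new visa categories for humanitarian needs.",
    ],
    "sentence2": [
        "The profits of the oligarchs dwarf the state's resource income.",
        "Design public policies for faster inclusion of immigrants.",
    ],
    "label": [0.0, 2.0],
})

# CoSENTLoss with the parameters listed above (scale=20.0, pairwise cosine similarity)
loss = CoSENTLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```

CoSENTLoss only needs an ordinal similarity label per pair: pairs with higher labels are pushed toward higher cosine similarity than pairs with lower labels.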
Evaluation dataset: columns sentence1, sentence2, and label.

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |

Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| The anchoring of the Slovak Republic in the European Union allows citizens to feel: secure politically, secure economically, secure socially. | Radikale Venstre wants Denmark to participate fully and firmly in EU cooperation on immigration, asylum and cross-border crime. | 2 |
| Portugal's participation in the Community's negotiation of the next financial perspective should also be geared in the same direction. | Given the dynamic international framework, safeguarding the national interest requires adjustments to each of these vectors. | 2 |
| On asylum, the Green Party will: Dismantle the direct provision system and replace it with an efficient and humane system for determining the status of asylum seekers | The crisis in the coal sector subsequently forced these immigrant workers to move into other economic sectors such as metallurgy, chemicals, construction and transport. | 2 |

CoSENTLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
```
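Since the loss uses `pairwise_cos_sim` as its similarity function, pairs like the ones above can be scored directly with the same function; a minimal sketch (the sentence pairs are illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import pairwise_cos_sim

model = SentenceTransformer("LequeuISIR/ModernBERT-base-DPR-8e-05")

# Illustrative sentence pairs (sentence1[i] is compared with sentence2[i])
sentence1 = [
    "We will expand humane reception conditions for asylum seekers.",
    "Investment as a share of output has fallen sharply.",
]
sentence2 = [
    "The asylum system should be replaced with an efficient and humane process.",
    "Radikale Venstre wants Denmark to participate fully in EU cooperation.",
]

emb1 = model.encode(sentence1, convert_to_tensor=True)
emb2 = model.encode(sentence2, convert_to_tensor=True)

# One cosine similarity score per aligned pair
print(pairwise_cos_sim(emb1, emb2))  # tensor of shape [2]
```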
Non-default hyperparameters:

- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- learning_rate: 8e-05
- num_train_epochs: 5
- warmup_ratio: 0.05
- bf16: True
- batch_sampler: no_duplicates

All hyperparameters (including defaults):

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 8e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.05
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
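The non-default hyperparameters above map directly onto `SentenceTransformerTrainingArguments`; a minimal sketch of how they might be set (the output directory is illustrative, and this assumes sentence-transformers v3 or later):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

# Mirrors the non-default hyperparameters listed above; output_dir is illustrative
args = SentenceTransformerTrainingArguments(
    output_dir="outputs/ModernBERT-base-DPR-8e-05",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=8e-5,
    num_train_epochs=5,
    warmup_ratio=0.05,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
# These args could then be passed to the trainer sketched earlier via args=args
```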

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0837 | 500 | 6.425 |
| 0.1673 | 1000 | 6.0308 |
| 0.2510 | 1500 | 5.9522 |
| 0.3346 | 2000 | 5.7818 |
| 0.4183 | 2500 | 5.7122 |
| 0.5019 | 3000 | 5.6378 |
| 0.5856 | 3500 | 5.5503 |
| 0.6692 | 4000 | 5.4429 |
| 0.7529 | 4500 | 5.4246 |
| 0.8365 | 5000 | 5.3536 |
| 0.9202 | 5500 | 5.4072 |
| 1.0038 | 6000 | 5.3033 |
| 1.0875 | 6500 | 4.7611 |
| 1.1712 | 7000 | 4.7535 |
| 1.2548 | 7500 | 4.7503 |
| 1.3385 | 8000 | 4.7453 |
| 1.4221 | 8500 | 4.7413 |
| 1.5058 | 9000 | 4.6753 |
| 1.5894 | 9500 | 4.67 |
| 1.6731 | 10000 | 4.7352 |
| 1.7567 | 10500 | 4.7164 |
| 1.8404 | 11000 | 4.6784 |
| 1.9240 | 11500 | 4.651 |
| 2.0077 | 12000 | 4.5708 |
| 2.0914 | 12500 | 3.6274 |
| 2.1750 | 13000 | 3.5683 |
| 2.2587 | 13500 | 3.7028 |
| 2.3423 | 14000 | 3.5859 |
| 2.4260 | 14500 | 3.6872 |
| 2.5096 | 15000 | 3.5148 |
| 2.5933 | 15500 | 3.7241 |
| 2.6769 | 16000 | 3.5983 |
| 2.7606 | 16500 | 3.6269 |
| 2.8442 | 17000 | 3.6078 |
| 2.9279 | 17500 | 3.6292 |
| 3.0115 | 18000 | 3.5151 |
| 3.0952 | 18500 | 2.5933 |
| 3.1789 | 19000 | 2.599 |
| 3.2625 | 19500 | 2.5598 |
| 3.3462 | 20000 | 2.5577 |
| 3.4298 | 20500 | 2.5827 |
| 3.5135 | 21000 | 2.5598 |
| 3.5971 | 21500 | 2.4173 |
| 3.6808 | 22000 | 2.5884 |
| 3.7644 | 22500 | 2.4313 |
| 3.8481 | 23000 | 2.5669 |
| 3.9317 | 23500 | 2.5162 |
| 4.0154 | 24000 | 2.2531 |
| 4.0990 | 24500 | 1.3758 |
| 4.1827 | 25000 | 1.5491 |
| 4.2664 | 25500 | 1.4933 |
| 4.3500 | 26000 | 1.5139 |
| 4.4337 | 26500 | 1.4607 |
| 4.5173 | 27000 | 1.6117 |
| 4.6010 | 27500 | 1.5395 |
| 4.6846 | 28000 | 1.493 |
| 4.7683 | 28500 | 1.3984 |
| 4.8519 | 29000 | 1.4183 |
| 4.9356 | 29500 | 1.3517 |

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@online{kexuefm-8847,
    title = "CoSENT: A more efficient sentence vector scheme than Sentence-BERT",
    author = "Su Jianlin",
    year = "2022",
    month = "Jan",
    url = "https://kexue.fm/archives/8847",
}
```