This is a sentence-transformers model finetuned from Rajan/NepaliBERT. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
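The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence vector is the average of its token vectors. The following is a minimal sketch of that idea in plain Python, assuming padding positions are excluded through an attention mask; the function name `mean_pool` and the 2-dimensional toy vectors are illustrative only (the real model works on 768-dimensional tensors inside the library).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding rows are skipped."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Toy input: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

Because the padding row is masked out, its large values do not affect the result.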
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("syubraj/sentence_similarity_nepali_v2")
# Run inference
sentences = [
'रातो, डबल डेकर बस।',
'रातो डबल डेकर बस।',
'दुई कालो कुकुर हिउँमा हिंड्दै।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
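To make the `[3, 3]` similarity matrix concrete, here is a small sketch that computes pairwise cosine similarity by hand. It assumes the model's similarity function is cosine, which is the library's usual default unless the model is configured otherwise; the toy 2-dimensional vectors and helper names are hypothetical stand-ins for real embeddings.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity of embedding i and embedding j."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]


# Toy vectors: the first two point the same way, the third is orthogonal.
toy_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sims = similarity_matrix(toy_embeddings)
print(len(sims), len(sims[0]))  # 3 3
```

With real embeddings, the two near-identical bus sentences would be expected to score much closer to each other than to the sentence about the dogs.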
Dataset: stsb-dev-nepali
Evaluated with EmbeddingSimilarityEvaluator

| Metric | Value |
|---|---|
| pearson_cosine | 0.6971 |
| spearman_cosine | 0.6623 |
| pearson_manhattan | 0.6332 |
| spearman_manhattan | 0.6079 |
| pearson_euclidean | 0.634 |
| spearman_euclidean | 0.609 |
| pearson_dot | 0.4848 |
| spearman_dot | 0.5306 |
| pearson_max | 0.6971 |
| spearman_max | 0.6623 |
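Each row above is a correlation between the gold similarity labels and a score computed from the two sentence embeddings (cosine, Manhattan, Euclidean, or dot product); the `_max` rows match the largest of the four variants in the table. As a rough sketch of how Pearson and Spearman differ, the plain-Python code below computes both on a tiny made-up example. The `gold` and `pred` values are hypothetical and are not taken from the evaluation set.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Linear correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


gold = [0.0, 0.56, 1.0, 0.8]    # hypothetical gold labels
pred = [0.10, 0.40, 0.95, 0.97]  # hypothetical cosine scores
rho = spearman(gold, pred)
r = pearson(gold, pred)
print(round(rho, 4))  # 0.8
```

Spearman only looks at the ordering of the scores, so it penalizes the swapped last two items here regardless of how close their raw values are.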
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| एक व्यक्ति प्याज काट्दै छ। | एउटा बिरालो शौचालयमा पपिङ गर्दैछ। | 0.0 |
| क्यानडाको तेल रेल विस्फोटमा थप मृत्यु हुने अपेक्षा गरिएको छ | क्यानडामा रेल दुर्घटनामा पाँच जनाको मृत्यु भएको छ | 0.5599999904632569 |
| एउटी महिला झिंगा माझ्दै छिन्। | एउटी महिला केही झिंगा माझ्दै। | 1.0 |
Loss: CosineSimilarityLoss with these parameters:
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
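With `MSELoss` as `loss_fct`, the training objective can be read as the mean squared error between the cosine similarity of each sentence pair and its gold label. The sketch below illustrates that reading in plain Python on toy vectors; the helper names and values are hypothetical, and the library's own implementation operates on batched tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_similarity_mse(pairs: List[Tuple[Vector, Vector]], labels: List[float]) -> float:
    """Mean of (cos(u, v) - label)^2 over all pairs."""
    squared_errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(squared_errors) / len(squared_errors)


# Toy batch: an identical pair labelled 1.0 and an orthogonal pair labelled 0.5.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.5]
loss = cosine_similarity_mse(pairs, labels)
print(loss)  # 0.125
```

The first pair contributes no error; the second contributes (0 - 0.5)^2 = 0.25, giving a batch mean of 0.125.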
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 100
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 100
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss | stsb-dev-nepali_spearman_max |
|---|---|---|---|
| 1.0 | 288 | - | 0.5355 |
| 1.7361 | 500 | 0.0723 | - |
| 2.0 | 576 | - | 0.5794 |
| 3.0 | 864 | - | 0.6108 |
| 3.4722 | 1000 | 0.047 | 0.6147 |
| 4.0 | 1152 | - | 0.6259 |
| 5.0 | 1440 | - | 0.6356 |
| 5.2083 | 1500 | 0.034 | - |
| 6.0 | 1728 | - | 0.6329 |
| 6.9444 | 2000 | 0.0217 | 0.6375 |
| 7.0 | 2016 | - | 0.6382 |
| 8.0 | 2304 | - | 0.6468 |
| 8.6806 | 2500 | 0.0137 | - |
| 9.0 | 2592 | - | 0.6348 |
| 10.0 | 2880 | - | 0.6332 |
| 10.4167 | 3000 | 0.0102 | 0.6427 |
| 11.0 | 3168 | - | 0.6370 |
| 12.0 | 3456 | - | 0.6515 |
| 12.1528 | 3500 | 0.0084 | - |
| 13.0 | 3744 | - | 0.6546 |
| 13.8889 | 4000 | 0.0069 | 0.6400 |
| 14.0 | 4032 | - | 0.6610 |
| 15.0 | 4320 | - | 0.6495 |
| 15.625 | 4500 | 0.006 | - |
| 16.0 | 4608 | - | 0.6574 |
| 17.0 | 4896 | - | 0.6486 |
| 17.3611 | 5000 | 0.0053 | 0.6589 |
| 18.0 | 5184 | - | 0.6592 |
| 19.0 | 5472 | - | 0.6488 |
| 19.0972 | 5500 | 0.0047 | - |
| 20.0 | 5760 | - | 0.6436 |
| 20.8333 | 6000 | 0.0044 | 0.6576 |
| 21.0 | 6048 | - | 0.6515 |
| 22.0 | 6336 | - | 0.6541 |
| 22.5694 | 6500 | 0.0041 | - |
| 23.0 | 6624 | - | 0.6549 |
| 24.0 | 6912 | - | 0.6571 |
| 24.3056 | 7000 | 0.0037 | 0.6603 |
| 25.0 | 7200 | - | 0.6699 |
| 26.0 | 7488 | - | 0.6653 |
| 26.0417 | 7500 | 0.0037 | - |
| 27.0 | 7776 | - | 0.6609 |
| 27.7778 | 8000 | 0.0033 | 0.6578 |
| 28.0 | 8064 | - | 0.6606 |
| 29.0 | 8352 | - | 0.6614 |
| 29.5139 | 8500 | 0.0031 | - |
| 30.0 | 8640 | - | 0.6579 |
| 31.0 | 8928 | - | 0.6688 |
| 31.25 | 9000 | 0.0028 | 0.6650 |
| 32.0 | 9216 | - | 0.6639 |
| 32.9861 | 9500 | 0.0027 | - |
| 33.0 | 9504 | - | 0.6624 |
| 34.0 | 9792 | - | 0.6646 |
| 34.7222 | 10000 | 0.0025 | 0.6530 |
| 35.0 | 10080 | - | 0.6587 |
| 36.0 | 10368 | - | 0.6671 |
| 36.4583 | 10500 | 0.0025 | - |
| 37.0 | 10656 | - | 0.6614 |
| 38.0 | 10944 | - | 0.6602 |
| 38.1944 | 11000 | 0.0024 | 0.6576 |
| 39.0 | 11232 | - | 0.6665 |
| 39.9306 | 11500 | 0.0023 | - |
| 40.0 | 11520 | - | 0.6663 |
| 41.0 | 11808 | - | 0.6734 |
| 41.6667 | 12000 | 0.0021 | 0.6633 |
| 42.0 | 12096 | - | 0.6667 |
| 43.0 | 12384 | - | 0.6679 |
| 43.4028 | 12500 | 0.002 | - |
| 44.0 | 12672 | - | 0.6701 |
| 45.0 | 12960 | - | 0.6650 |
| 45.1389 | 13000 | 0.0019 | 0.6680 |
| 46.0 | 13248 | - | 0.6631 |
| 46.875 | 13500 | 0.0018 | - |
| 47.0 | 13536 | - | 0.6643 |
| 48.0 | 13824 | - | 0.6631 |
| 48.6111 | 14000 | 0.0017 | 0.6648 |
| 49.0 | 14112 | - | 0.6648 |
| 50.0 | 14400 | - | 0.6619 |
| 50.3472 | 14500 | 0.0017 | - |
| 51.0 | 14688 | - | 0.6633 |
| 52.0 | 14976 | - | 0.6622 |
| 52.0833 | 15000 | 0.0016 | 0.6612 |
| 53.0 | 15264 | - | 0.6670 |
| 53.8194 | 15500 | 0.0015 | - |
| 54.0 | 15552 | - | 0.6618 |
| 55.0 | 15840 | - | 0.6641 |
| 55.5556 | 16000 | 0.0015 | 0.6617 |
| 56.0 | 16128 | - | 0.6669 |
| 57.0 | 16416 | - | 0.6645 |
| 57.2917 | 16500 | 0.0014 | - |
| 58.0 | 16704 | - | 0.6642 |
| 59.0 | 16992 | - | 0.6579 |
| 59.0278 | 17000 | 0.0013 | 0.6592 |
| 60.0 | 17280 | - | 0.6589 |
| 60.7639 | 17500 | 0.0014 | - |
| 61.0 | 17568 | - | 0.6685 |
| 62.0 | 17856 | - | 0.6673 |
| 62.5 | 18000 | 0.0012 | 0.6669 |
| 63.0 | 18144 | - | 0.6665 |
| 64.0 | 18432 | - | 0.6626 |
| 64.2361 | 18500 | 0.0012 | - |
| 65.0 | 18720 | - | 0.6619 |
| 65.9722 | 19000 | 0.0012 | 0.6643 |
| 66.0 | 19008 | - | 0.6651 |
| 67.0 | 19296 | - | 0.6628 |
| 67.7083 | 19500 | 0.0011 | - |
| 68.0 | 19584 | - | 0.6658 |
| 69.0 | 19872 | - | 0.6615 |
| 69.4444 | 20000 | 0.0011 | 0.6627 |
| 70.0 | 20160 | - | 0.6657 |
| 71.0 | 20448 | - | 0.6663 |
| 71.1806 | 20500 | 0.0011 | - |
| 72.0 | 20736 | - | 0.6634 |
| 72.9167 | 21000 | 0.001 | 0.6649 |
| 73.0 | 21024 | - | 0.6632 |
| 74.0 | 21312 | - | 0.6658 |
| 74.6528 | 21500 | 0.001 | - |
| 75.0 | 21600 | - | 0.6639 |
| 76.0 | 21888 | - | 0.6601 |
| 76.3889 | 22000 | 0.001 | 0.6623 |
| 77.0 | 22176 | - | 0.6607 |
| 78.0 | 22464 | - | 0.6613 |
| 78.125 | 22500 | 0.0009 | - |
| 79.0 | 22752 | - | 0.6613 |
| 79.8611 | 23000 | 0.0009 | 0.6615 |
| 80.0 | 23040 | - | 0.6615 |
| 81.0 | 23328 | - | 0.6617 |
| 81.5972 | 23500 | 0.0008 | - |
| 82.0 | 23616 | - | 0.6604 |
| 83.0 | 23904 | - | 0.6605 |
| 83.3333 | 24000 | 0.0008 | 0.6602 |
| 84.0 | 24192 | - | 0.6628 |
| 85.0 | 24480 | - | 0.6603 |
| 85.0694 | 24500 | 0.0008 | - |
| 86.0 | 24768 | - | 0.6602 |
| 86.8056 | 25000 | 0.0008 | 0.6592 |
| 87.0 | 25056 | - | 0.6611 |
| 88.0 | 25344 | - | 0.6612 |
| 88.5417 | 25500 | 0.0008 | - |
| 89.0 | 25632 | - | 0.6607 |
| 90.0 | 25920 | - | 0.6598 |
| 90.2778 | 26000 | 0.0008 | 0.6607 |
| 91.0 | 26208 | - | 0.6615 |
| 92.0 | 26496 | - | 0.6615 |
| 92.0139 | 26500 | 0.0007 | - |
| 93.0 | 26784 | - | 0.6609 |
| 93.75 | 27000 | 0.0007 | 0.6607 |
| 94.0 | 27072 | - | 0.6612 |
| 95.0 | 27360 | - | 0.6624 |
| 95.4861 | 27500 | 0.0007 | - |
| 96.0 | 27648 | - | 0.6627 |
| 97.0 | 27936 | - | 0.6618 |
| 97.2222 | 28000 | 0.0007 | 0.6619 |
| 98.0 | 28224 | - | 0.6621 |
| 98.9583 | 28500 | 0.0007 | - |
| 99.0 | 28512 | - | 0.6623 |
| 100.0 | 28800 | - | 0.6623 |
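The log can be sanity-checked with a little arithmetic. The sketch below assumes training on a single device (not stated in the card) and uses the logged values of 288 steps per epoch, batch size 16, `gradient_accumulation_steps: 1`, and `dataloader_drop_last: False`. It also picks the best of a few rows copied from the table; the variable names are illustrative.

```python
steps_per_epoch = 288  # first epoch boundary in the log
batch_size = 16        # per_device_train_batch_size
epochs = 100           # num_train_epochs

# Total optimizer steps, matching the final row of the log.
total_steps = steps_per_epoch * epochs

# With drop_last disabled the last batch may be partial, so the training set
# size is only bounded (under the single-device assumption).
max_samples = steps_per_epoch * batch_size
min_samples = (steps_per_epoch - 1) * batch_size + 1

# Fractional epochs in the log are simply step / steps_per_epoch.
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

# A few (epoch, step, spearman_max) rows copied from the table above.
rows = [
    (25.0, 7200, 0.6699),
    (41.0, 11808, 0.6734),
    (44.0, 12672, 0.6701),
    (100.0, 28800, 0.6623),
]
best = max(rows, key=lambda row: row[2])

print(total_steps)                # 28800
print(min_samples, max_samples)   # 4593 4608
print(epoch_at_step_500)          # 1.7361
print(best)                       # (41.0, 11808, 0.6734)
```

The highest logged score (0.6734 at epoch 41) is above the final-epoch score of 0.6623. Since `load_best_model_at_end` is False, the published evaluation figure appears to correspond to the final checkpoint rather than the best one.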
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}