metadata
language:
- en
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:404290
- loss:OnlineContrastiveLoss
base_model: sentence-transformers/stsb-distilbert-base
widget:
- source_sentence: Why Modi is putting a ban on 500 and 1000 notes?
  sentences:
  - Why making multiple fake accounts on Quora is illegal?
  - >-
    What are the advantages of the decision taken by the Government of India
    to scrap out 500 and 1000 rupees notes?
  - Why should I go for internships?
- source_sentence: Where can I buy cheap t-shirts?
  sentences:
  - Where can I buy cheap wholesale t-shirts?
  - How can I make money from a blog?
  - What are the best places to shop in Charleston, SC?
- source_sentence: What are the most important mobile applications?
  sentences:
  - How can I tell if my wife's vagina had a bigger penis inside?
  - What is the most important apps in your phone?
  - >-
    What do you think Ned Stark would have done or said to Jon Snow if he
    was able to join the Night’s Watch or escaped his beheading?
- source_sentence: What is the whole process for making Android games with high graphics?
  sentences:
  - What lf I don't accept Jesus as God?
  - >-
    I have to masturbate3 times to feel an orgasm sometimes only2 times what
    is wrong with me I went to the doctor and they do not believe meWhat's
    wrong?
  - What does a healthy diet consist of?
- source_sentence: Why do so many religious people believe in healing miracles?
  sentences:
  - Is Warframe better than Destiny?
  - What do you like about China?
  - Is believing in God a bad thing?
datasets:
- sentence-transformers/quora-duplicates
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- cosine_mcc
- average_precision
- f1
- precision
- recall
- threshold
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on sentence-transformers/stsb-distilbert-base
  results:
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: quora duplicates
      type: quora-duplicates
    metrics:
    - type: cosine_accuracy
      value: 0.877
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.7857047319412231
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.8516284680337757
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.774639368057251
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.8209302325581396
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8847117794486216
      name: Cosine Recall
    - type: cosine_ap
      value: 0.8988328505183655
      name: Cosine Ap
    - type: cosine_mcc
      value: 0.7483655051498526
      name: Cosine Mcc
  - task:
      type: paraphrase-mining
      name: Paraphrase Mining
    dataset:
      name: quora duplicates dev
      type: quora-duplicates-dev
    metrics:
    - type: average_precision
      value: 0.5483042026376685
      name: Average Precision
    - type: f1
      value: 0.5606415792720543
      name: F1
    - type: precision
      value: 0.5539301735907939
      name: Precision
    - type: recall
      value: 0.5675176100314733
      name: Recall
    - type: threshold
      value: 0.8631762564182281
      name: Threshold
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.9308
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.969
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9778
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9854
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9308
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4145333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.26696000000000003
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.14144
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8008592901379665
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9314231047351341
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9558165998609235
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9743579383296442
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9511384841680516
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9511976190476192
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.939071878001028
      name: Cosine Map@100
SentenceTransformer based on sentence-transformers/stsb-distilbert-base
This is a sentence-transformers model fine-tuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/stsb-distilbert-base
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: quora-duplicates
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
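The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name mean_pool and the toy 2-dimensional vectors are hypothetical, and excluding padding tokens through an attention mask is an assumption about how the averaging is done.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against division by zero when every position is padding.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input: three real tokens and one padding token,
# with 2 dimensions instead of the model's 768.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [3.0, 4.0]
```

The padding vector is ignored, so the result depends only on the three real tokens.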
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'Why do so many religious people believe in healing miracles?',
    'Is believing in God a bad thing?',
    'What do you like about China?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
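Because the similarity function listed for this model is cosine similarity, the [3, 3] result above can be read as a matrix of pairwise cosine scores. The sketch below illustrates that computation on toy vectors in pure Python so it runs without downloading the model. The helper names cosine_similarity and similarity_matrix are hypothetical and are not part of the library.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Score every embedding against every other one, including itself."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy 2-dimensional "embeddings" standing in for the 768-dimensional ones.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 4) for score in row])
```

The diagonal is 1 up to floating-point error, since each vector is compared with itself, and the matrix is symmetric.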
Evaluation
Metrics
Binary Classification
- Dataset: quora-duplicates
- Evaluated with BinaryClassificationEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.877 |
| cosine_accuracy_threshold | 0.7857 |
| cosine_f1 | 0.8516 |
| cosine_f1_threshold | 0.7746 |
| cosine_precision | 0.8209 |
| cosine_recall | 0.8847 |
| cosine_ap | 0.8988 |
| cosine_mcc | 0.7484 |
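The thresholds in this table can be used as decision boundaries on a cosine score, and the reported F1 follows from the reported precision and recall. The sketch below shows both under stated assumptions: the function names are hypothetical, and treating a score at or above the threshold as a duplicate is an assumption rather than something the card specifies.

```python
# Values copied from the table above.
COSINE_ACCURACY_THRESHOLD = 0.7857
COSINE_PRECISION = 0.8209
COSINE_RECALL = 0.8847


def is_duplicate(cosine_score: float, threshold: float = COSINE_ACCURACY_THRESHOLD) -> bool:
    """Label a question pair as duplicate when its score reaches the threshold (assumed rule)."""
    return cosine_score >= threshold


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(is_duplicate(0.91))  # True
print(is_duplicate(0.42))  # False

# Should land close to the reported cosine_f1 of 0.8516.
f1 = f1_score(COSINE_PRECISION, COSINE_RECALL)
print(round(f1, 4))
```

Note that the table reports separate thresholds for accuracy (0.7857) and F1 (0.7746), so the best cut-off depends on which metric matters for the application.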
Paraphrase Mining
- Dataset: quora-duplicates-dev
- Evaluated with ParaphraseMiningEvaluator
| Metric | Value |
|---|---|
| average_precision | 0.5483 |
| f1 | 0.5606 |
| precision | 0.5539 |
| recall | 0.5675 |
| threshold | 0.8632 |
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.9308 |
| cosine_accuracy@3 | 0.969 |
| cosine_accuracy@5 | 0.9778 |
| cosine_accuracy@10 | 0.9854 |
| cosine_precision@1 | 0.9308 |
| cosine_precision@3 | 0.4145 |
| cosine_precision@5 | 0.267 |
| cosine_precision@10 | 0.1414 |
| cosine_recall@1 | 0.8009 |
| cosine_recall@3 | 0.9314 |
| cosine_recall@5 | 0.9558 |
| cosine_recall@10 | 0.9744 |
| cosine_ndcg@10 | 0.9511 |
| cosine_mrr@10 | 0.9512 |
| cosine_map@100 | 0.9391 |
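To help read the table above, here is a small sketch of how accuracy@k, precision@k, recall@k, and MRR@k are commonly defined for a single query. It uses the standard textbook definitions and is not the evaluator's code. The document IDs and function names are hypothetical. It also shows one likely reason precision@3 (0.4145) sits far below accuracy@3 (0.969): when a query has only one or two relevant documents, precision@k is capped at that count divided by k.

```python
from typing import List, Set


def accuracy_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k positions occupied by relevant documents."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def mrr_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking for one query with two relevant documents.
ranked_docs = ["d3", "d1", "d7", "d2"]
relevant_docs = {"d1", "d2"}

print(accuracy_at_k(ranked_docs, relevant_docs, 3))   # 1.0
print(precision_at_k(ranked_docs, relevant_docs, 3))  # one hit in three slots
print(recall_at_k(ranked_docs, relevant_docs, 3))     # 0.5
print(mrr_at_k(ranked_docs, relevant_docs, 10))       # 0.5
```

The reported values are averages of such per-query scores over all evaluation queries.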
Training Details
Training Dataset
quora-duplicates
- Dataset: quora-duplicates at 451a485
- Size: 404,290 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 6 tokens, mean: 15.73 tokens, max: 65 tokens | min: 6 tokens, mean: 15.93 tokens, max: 85 tokens | 0: ~61.60%, 1: ~38.40% |
- Samples:
| sentence1 | sentence2 | label |
|---|---|---|
| How can Trump supporters claim he didn't mock a disabled reporter when there is live footage of him mocking a disabled reporter? | Why don't people actually watch the Trump video of him allegedly mocking a disabled reporter? | 0 |
| Where can I get the best digital marketing course (online & offline) in India? | Which is the best digital marketing institute for professionals in India? | 1 |
| What best two liner shayri? | What does "senile dementia, uncomplicated" mean in medical terms? | 0 |
- Loss: OnlineContrastiveLoss
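As a rough illustration of the loss named above, the sketch below computes a simplified online contrastive loss over one batch of pair distances. It is a pure-Python approximation, not the library's code. The general idea is that only "hard" pairs contribute: duplicate pairs that are farther apart than the closest non-duplicate pair, and non-duplicate pairs that are closer than the farthest duplicate pair. The margin of 0.5 and the use of cosine distance (1 minus cosine similarity) are assumptions, since the card does not state them, and edge cases such as batches with a single positive or negative are handled more simply here than a real implementation might.

```python
from typing import List


def online_contrastive_loss(distances: List[float], labels: List[int], margin: float = 0.5) -> float:
    """Simplified sketch: sum squared penalties over hard positive and hard negative pairs.

    distances: one distance per sentence pair (assumed: 1 - cosine similarity).
    labels:    1 for duplicate pairs, 0 for non-duplicate pairs.
    margin:    assumed value; negatives farther apart than this incur no penalty.
    """
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("this sketch needs at least one positive and one negative pair")

    # Hard negatives: non-duplicates that sit closer than the farthest duplicate.
    hard_negatives = [d for d in negatives if d < max(positives)]
    # Hard positives: duplicates that sit farther than the closest non-duplicate.
    hard_positives = [d for d in positives if d > min(negatives)]

    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss


# Toy batch: two duplicate pairs and two non-duplicate pairs.
batch_distances = [0.1, 0.6, 0.2, 0.9]
batch_labels = [1, 1, 0, 0]
loss = online_contrastive_loss(batch_distances, batch_labels)
print(loss)  # approximately 0.45: 0.6**2 from the hard positive plus (0.5 - 0.2)**2 from the hard negative
```

In this toy batch the easy pairs (distance 0.1 for a duplicate, 0.9 for a non-duplicate) contribute nothing, which is the point of selecting hard pairs online.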
Evaluation Dataset
quora-duplicates
- Dataset: quora-duplicates at 451a485
- Size: 404,290 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 6 tokens, mean: 16.14 tokens, max: 70 tokens | min: 6 tokens, mean: 15.92 tokens, max: 74 tokens | 0: ~60.10%, 1: ~39.90% |
- Samples:
| sentence1 | sentence2 | label |
|---|---|---|
| What are some must subscribe RSS feeds? | What are RSS feeds? | 0 |
| How close are Madonna and Hillary Clinton? | Why do people say Hillary Clinton is a crook? | 0 |
| Can you share best day of your life? | What is the Best Day of your life till date? | 1 |
- Loss: OnlineContrastiveLoss
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
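The hyperparameters above combine learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_ratio: 0.1. The sketch below shows what a linear schedule with warmup typically looks like under those settings. It is an illustration, not the trainer's code. The function name is hypothetical, rounding the warmup length up is an assumption, and the total of 1563 optimizer steps is an assumption inferred from the training log, which reaches step 1500 at epoch 0.9597.

```python
import math

BASE_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # warmup_ratio from the list above
TOTAL_STEPS = 1563     # assumption: roughly 1500 / 0.9597, inferred from the training log


def linear_schedule_lr(step: int, total_steps: int = TOTAL_STEPS,
                       base_lr: float = BASE_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Ramp linearly from 0 to base_lr during warmup, then decay linearly back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 100, 157, 800, 1563):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 5e-05 after 157 steps and returns to 0 at the end of the single epoch.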
Training Logs
| Epoch | Step | Training Loss | Validation Loss | quora-duplicates_cosine_ap | quora-duplicates-dev_average_precision | cosine_ndcg@10 |
|---|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.7458 | 0.4200 | 0.9390 |
| 0.0640 | 100 | 2.5263 | - | - | - | - |
| 0.1280 | 200 | 2.1489 | - | - | - | - |
| 0.1599 | 250 | - | 1.8621 | 0.8433 | 0.3907 | 0.9329 |
| 0.1919 | 300 | 2.0353 | - | - | - | - |
| 0.2559 | 400 | 1.7831 | - | - | - | - |
| 0.3199 | 500 | 1.8887 | 1.7744 | 0.8662 | 0.4924 | 0.9379 |
| 0.3839 | 600 | 1.7814 | - | - | - | - |
| 0.4479 | 700 | 1.7775 | - | - | - | - |
| 0.4798 | 750 | - | 1.6468 | 0.8766 | 0.4945 | 0.9399 |
| 0.5118 | 800 | 1.6835 | - | - | - | - |
| 0.5758 | 900 | 1.6974 | - | - | - | - |
| 0.6398 | 1000 | 1.5704 | 1.4925 | 0.8895 | 0.5283 | 0.9460 |
| 0.7038 | 1100 | 1.6771 | - | - | - | - |
| 0.7678 | 1200 | 1.619 | - | - | - | - |
| 0.7997 | 1250 | - | 1.4311 | 0.8982 | 0.5252 | 0.9466 |
| 0.8317 | 1300 | 1.6119 | - | - | - | - |
| 0.8957 | 1400 | 1.6043 | - | - | - | - |
| 0.9597 | 1500 | 1.6848 | 1.4070 | 0.8988 | 0.5483 | 0.9511 |
Framework Versions
- Python: 3.9.18
- Sentence Transformers: 3.4.1
- Transformers: 4.44.2
- PyTorch: 2.2.1+cu121
- Accelerate: 1.3.0
- Datasets: 2.19.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}