This is a sentence-transformers model finetuned from microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
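The Pooling module above enables only pooling_mode_mean_tokens, so each sentence vector is the average of its token vectors. A minimal sketch of that averaging in plain Python, assuming padded positions are excluded through an attention mask; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative and not part of the library (the real model uses 768 dimensions):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [total / count for total in totals]


# Three token positions; the last one is padding and must not affect the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```

Because the mask removes the padded position, the padding value has no influence on the resulting sentence vector.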
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'AbstractBackgroundCoronavirus disease 2019 (COVID-19) is a strong risk factor for venous thromboembolism (VTE). Few studies have evaluated the effectiveness of COVID-19 vaccination in preventing hospitalization for COVID-19 with VTE.MethodsAdults hospitalized at 21 sites between March 2021 and October 2022 with symptoms of acute respiratory illness were assessed for COVID-19, completion of the original monovalent messenger RNA (mRNA) COVID-19 vaccination series, and VTE. Prevalence of VTE was compared between unvaccinated and vaccinated patients with COVID-19. The vaccine effectiveness (VE) in preventing COVID-19 hospitalization with VTE was calculated using a test-negative design. The VE was also stratified by predominant circulating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant.ResultsAmong 18 811 patients (median age [interquartile range], 63 [50–73] years; 49% women; 59% non-Hispanic white, 20% non-Hispanic black, and 14% Hispanic; and median of 2 comorbid conditions [interquartile range, 1–3]), 9792 were admitted with COVID-19 (44% vaccinated), and 9019 were test-negative controls (73% vaccinated). Among patients with COVID-19, 601 had VTE diagnosed by hospital day 28, of whom 170 were vaccinated. VTE was more common among unvaccinated than vaccinated patients with COVID-19 (7.8% vs 4.0%; P = .001). The VE against COVID-19 hospitalization with VTE was 84% overall (95% confidence interval, 80%–87%), and VE stratified by predominant circulating variant was 88% (73%–95%) for Alpha, 93% (90%–95%) for Delta, and 68% (58%–76%) for Omicron variants.ConclusionsVaccination with the original monovalent mRNA series was associated with a decrease in COVID-19 hospitalization with VTE, though data detailing prior history of VTE and use of anticoagulation were not available. These findings will inform risk-benefit considerations for those considering vaccination.',
'PurposeProgrammed death-ligand 1 (PD-L1) expression may influence the prognosis of patients with localized esophageal cancer. The current study compared the prognostic value of PD-L1 expression between tumor cells and immune cells.MethodsArchival esophageal tumor tissue samples were collected from patients who received paclitaxel and cisplatin-based neoadjuvant chemoradiotherapy (CRT) for locally advanced esophageal squamous cell carcinoma (ESCC) in three prospective phase II trials. PD-L1 expression on tumor and immune cells was examined immunohistochemically by using the SP142 antibody and scored by two independent pathologists. The association of PD-L1 expression with patient’s outcomes was analyzed using a log-rank test and Cox regression multivariate analysis.ResultsA total of 100 patients were included. PD-L1 expression on tumor cells was positive (≥ 1%, TC-positive) in 55 patients; PD-L1 expression on immune cells was high (≥ 5%, IC-high) in 30 patients. TC-positive status was associated with poor overall survival (OS) (HR: 1.63, P = 0.035), whereas IC-high status was associated with improved OS (HR: 0.44, P = 0.0024). Multivariate analysis revealed that TC-positive, IC-high, and performance status were independent prognostic factors for progression-free survival and that IC-high and performance status were independent factors for OS. Furthermore, the combination of IC-high and TC-negative status was associated with the optimal OS, whereas that of TC-positive and IC-low status was associated with the worst OS.ConclusionPD-L1 expression on tumor and immune cells may have different prognostic value for patients with locally advanced ESCC receiving neoadjuvant CRT. A combination of these two indexes may further improve the prognostic prediction.Supplementary InformationThe online version contains supplementary material available at 10.1007/s00432-021-03772-7.',
'PurposeLymphocyte-monocyte ratio (LMR) has previously been used as a prognostic predictor in various solid tumors. This research aims in comparing the prognostic predictive Please check and conability of several inflammatory parameters and clinical parameters to validate further the excellent prognostic value of LMR in patients with gastric cancer treated with apatinib.MethodsMonitor inflammatory, nutritional parameters and tumor markers. Cutoff values of the parameters concerned were identified with the X-tile program. Subgroup analysis was made via Kaplan–Meier curves, and univariate and multivariate Cox regression analyses were used to find independent prognostic factors. The nomogram of logistic regression models was constructed according to the results.ResultsA total of 192 patients (115 divided into training group and 77 into validation group) who received the second- or later-line regimen of apatinib were retrospectively analyzed. The optimal cutoff value for LMR was 1.33. Patients with high LMR (LMR-H) were significantly longer than those with low LMR (LMR-L) in progression-free survival (median 121.0 days vs. median 44.5 days, P < 0.001). The predictive value of LMR was generally uniform across subgroups. Meanwhile, LMR and CA19-9 were the only hematological parameters with significant prognostic value in multivariate analysis. The area under the LMR curve (0.60) was greatest for all inflammatory indices. Adding LMR to the base model significantly enhanced the predictive power of the 6-month probability of disease progression (PD). The LMR-based nomogram showed good predictive power and discrimination in external validation.ConclusionLMR is a simple but effective predictor of prognosis for patients treated with apatinib.Supplementary InformationThe online version contains supplementary material available at 10.1007/s00432-023-04976-9.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.1375, 0.1558],
#         [0.1375, 1.0000, 0.8143],
#         [0.1558, 0.8143, 1.0000]])
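The similarity matrix above is symmetric with ones on the diagonal, which is what pairwise cosine similarity produces. A small sketch of that computation, assuming cosine similarity is the similarity function in use; the helper names and the toy 2-dimensional vectors are illustrative stand-ins for the 768-dimensional embeddings:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; symmetric, with ones on the diagonal."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors: the first two are orthogonal, the third lies between them.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 4) for score in row])
```

Vector length does not matter here, only direction, which is why the second toy vector can be scaled without changing its scores.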
Evaluation on medical-embedding-eval with EmbeddingSimilarityEvaluator:

| Metric | Value |
|---|---|
| pearson_cosine | 0.695 |
| spearman_cosine | 0.7048 |
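The two metric names conventionally denote the Pearson and Spearman correlations between the model's cosine similarity scores and the gold labels. The sketch below shows those two statistics in plain Python; it is not the evaluator's actual code, and the scores and labels are hypothetical values chosen for illustration:

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical cosine scores and gold labels, for illustration only.
predicted = [0.91, 0.15, 0.78, 0.33, 0.60]
gold = [1.0, 0.0, 1.0, 0.0, 1.0]
print(round(pearson(predicted, gold), 4), round(spearman(predicted, gold), 4))
```

Spearman depends only on the ordering of the scores, so it rewards a model that ranks similar pairs above dissimilar ones even when the raw scores are poorly calibrated.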
Columns: sentence_0, sentence_1, and label

|  | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

| sentence_0 | sentence_1 | label |
|---|---|---|
| ABSTRACTKirsten rat sarcoma viral oncogene homolog (KRAS) mutation is prognostic of poor survival for patients with non-small cell lung cancer (NSCLC). KRAS G12C mutations occur in 13% of NSCLC cases and despite the frequency of this mutation, advances in drug development against KRAS have historically been impeded due to the extremely high affinity of KRAS for guanosine triphosphate (GTP) and the lack of a binding pocket on the surface of KRAS that is suitable for drug binding. Sotorasib, a first-in-class, highly selective KRAS G12C inhibitor overcomes this issue by irreversibly binding in the switch-II pocket. Sotorasib was granted accelerated FDA approval for the treatment of KRASG12C-mutated locally advanced/metastatic NSCLC who have received at least one prior systemic therapy. This review summarizes the pharmacology, clinical efficacy, adverse effects, and clinical considerations of sotorasib. | Lung cancer is among the most common instances of cancer subtypes and is associated with high mortality rates. Due to the availability of fewer therapies and delayed clinical investigations, the number of cancer incidences is rising dramatically. This is possibly an effect of immune modulations and chemotherapeutic drugs that raises cancer resistance. Among the list, IL-6 and IL-17 are host-derived paradoxical effectors that attune immune responses in malignant lung cells. Their excessive release in the cytokine milieu stabilizes immunosuppressive phenotypes, resulting in cellular perturbations. During tumor development, the significance of these molecules is reflected in their potential to regulate oncogenesis by initiating a myriad of signaling events that influence tumor growth and the metastatic ability of benign cancer cells. Moreover, their transactivation contributes to antiapoptotic mechanisms and favors cancer cell survival via constitutive expression of immunoregulatory molec... | 1.0 |
| BackgroundTrophoblast cell-surface antigen 2 (TROP2) is expressed on the surface of trophoblast cells and many malignant tumor cells. However, data on TROP2 expression in advanced lung cancer are insufficient, and its changes have not been fully evaluated.MethodsWe assessed the prevalence and changes in TROP2 expression in patients with lung cancer who received anti-cancer treatments using immunohistochemical (IHC) analysis with an anti-TROP2 antibody (clone: SP295). IHC scores were graded from 0 to 3; grade ≥ 2 was considered positive for TROP2 expression. We defined a difference in IHC score, before and after anti-cancer treatments, as the change in TROP2 expression.ResultsBefore anti-cancer treatment, TROP2 expression was observed in 89% (143/160) of the patients and was significantly more common in adenocarcinoma and squamous cell carcinoma than in neuroendocrine carcinoma (P < 0.001). After anti-cancer treatment, TROP2 expression was observed in 87% (139/160) of the patients. The ... | PurposeTo investigate the prognostic value of the neutrophil to lymphocyte ratio (NLR) and platelet to lymphocyte ratio (PLR) in patients with hepatocellular carcinoma (HCC) treated with stereotactic body radiotherapy (SBRT).MethodsThe medical records of HCC patients treated with SBRT between 2008 and 2019 were reviewed retrospectively. The NLR and PLR were calculated from the serum complete blood count before and after SBRT, and the prognostic values of the NLR and PLR for the treatment outcomes were evaluated.ResultsThirty-nine patients with 49 HCC lesions were included. After a median follow-up of 26.8 months (range, 8.4-80.0 months), three-year local control, overall survival (OS), and progression-free survival (PFS) rate were 97.4%, 78.3%, and 35.2%, respectively. Both NLR and PLR increased significantly after SBRT and decreased slowly to the pre-SBRT value at 6 months. Univariable analysis showed that gross tumor volume (GTV) >14 cc, post-SBRT PLR >90, and PLR change >30 were ass... | 0.0 |
| PurposeDissociated response (DR, reduction at baseline or increase < 20% in target lesions compared with nadir in the presence of new lesions) was observed in 20–34% of patients treated with immune checkpoint inhibitors (ICIs). DRs were defined as progression disease (PD) per response evaluation criteria in solid tumors (RECIST v1.1), while evaluation criteria related to immunotherapy incorporated the new lesions into the total tumor burden or conducted further evaluation after 4–8 weeks rather than declaring PD immediately. The main objective of this study is to compare survival between people who continuing initial ICIs treatment and those who switched to other anticancer therapy at the time of DR.Patients and methods235 patients with advanced lung cancer (LC) treated with ICIs were evaluated. Propensity score matching (PSM) was used to minimize potential confounding factors. Post-DR OS, target lesion changes were evaluated.Results52 patients had been estimated as DRs. After PSM, the... | PurposeAging is closely related to the occurrence of many diseases, including cancer, and involves changes in the immune microenvironment. γδT cells are important components of resident lymphocytes in mucosal tissues. However, little is known about the effects that the aged lung has on γδT cells and their prognostic significance in non-small cell lung cancer.MethodsIn the current study, the expression of γδTCR and IL-17A was measured by immunohistochemistry in paraffin-embedded lung tissues from 168 patients with adenocarcinoma (LUAD) and 144 patients with squamous cell carcinoma (LUSC). Furthermore, gene transcription patterns in LUAD and LUSC tumors and normal controls were extracted from TCGA and GTEx databases and were analyzed.ResultsHigh frequency of γδT cells was observed in patients with LUAD and LUSC, whereas the levels of CD4 + T cells, CD8 + T cells and CD56 + cells were decreased. Elevated γδT cells in tumors were mainly IL-17A-releasing γδT17 cells, which were found to be ... | 1.0 |

CosineSimilarityLoss with these parameters:
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
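With MSELoss as loss_fct, the objective compares the cosine similarity of each sentence pair against its float label. A minimal sketch of that idea in plain Python, assuming the similarity is used untransformed and the squared errors are averaged over the batch; the function names and the toy vectors are illustrative, not the library's implementation:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_mse(pairs, labels):
    """Mean squared error between each pair's cosine similarity and its label."""
    errors = [
        (cosine_similarity(u, v) - label) ** 2
        for (u, v), label in zip(pairs, labels)
    ]
    return sum(errors) / len(errors)


# Toy batch: the first pair points the same way, the second pair is orthogonal.
batch = [([1.0, 0.0], [2.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
matching_labels = [1.0, 0.0]   # labels agree with the geometry
swapped_labels = [0.0, 1.0]    # labels contradict the geometry
print(cosine_similarity_mse(batch, matching_labels))
print(cosine_similarity_mse(batch, swapped_labels))
```

The loss is lowest when pairs labeled 1.0 point in the same direction and pairs labeled 0.0 are orthogonal, which is the geometry fine-tuning pushes the embeddings toward.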
Non-default hyperparameters:

- eval_strategy: steps
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | medical-embedding-eval_spearman_cosine |
|---|---|---|
| 0.4 | 50 | 0.5811 |
| 0.8 | 100 | 0.6263 |
| 1.0 | 125 | 0.6923 |
| 1.2 | 150 | 0.6450 |
| 1.6 | 200 | 0.6659 |
| 2.0 | 250 | 0.7048 |
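A few derived quantities follow from the logged steps and the hyperparameters. The sketch below is rough arithmetic, assuming a single training process (per_device_train_batch_size 8, gradient_accumulation_steps 1) and a linear schedule that decays to zero over the total step count; the helper name `linear_lr` and the constants' names are illustrative:

```python
BASE_LR = 5e-05          # learning_rate
TOTAL_STEPS = 250        # last logged step, reached at epoch 2.0
WARMUP_STEPS = 0         # warmup_steps
BATCH_SIZE = 8           # per_device_train_batch_size
STEPS_PER_EPOCH = 125    # step logged at epoch 1.0


def linear_lr(step):
    """Linear warmup (none here) followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# With dataloader_drop_last False the final batch may be partial, so the
# step count only bounds the number of training pairs under these assumptions.
max_samples = STEPS_PER_EPOCH * BATCH_SIZE
min_samples = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
print(min_samples, max_samples)

# Learning rate at a few of the steps that appear in the log.
for step in (0, 50, 125, 250):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate has already halved by the end of the first epoch, and the step count only bounds the training set size rather than fixing it exactly.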
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}