# SentenceTransformer based on jhu-clsp/mmBERT-base
This is a sentence-transformers model finetuned from jhu-clsp/mmBERT-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: jhu-clsp/mmBERT-base
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
### Model Sources

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
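The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): the token embeddings from the transformer are averaged into one sentence vector, with padding positions masked out. A minimal NumPy sketch of that operation (the function name and toy data are illustrative, not part of the library):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

# Toy example: two sequences of length 2; the second is padded after one token.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0]],
                   [[5.0, 6.0], [0.0, 0.0]]])
mask = np.array([[1, 1], [1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.] [5. 6.]]
```

Because the padded position is excluded from both the sum and the count, the second sentence's embedding is exactly its single real token, not a padding-diluted average.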
## Usage

### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference
sentences = [
    'attenuated vaccines:',
    'कम संवेदनशील टीकेः',
    '६.५% दसादशे',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor of shape [3, 3]
```
## Evaluation

### Metrics

#### Translation
| Metric           | Value  |
|:-----------------|:-------|
| src2trg_accuracy | 0.595  |
| trg2src_accuracy | 0.573  |
| mean_accuracy    | 0.584  |
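`src2trg_accuracy` is the fraction of source sentences whose nearest neighbor, by cosine similarity among all target-language embeddings, is their aligned translation (here English→Sanskrit, judging by the `eval-en-sa` evaluator name in the training logs); `trg2src_accuracy` is the same in the reverse direction, and `mean_accuracy` averages the two. A minimal sketch of one direction, with toy embeddings and an illustrative function name:

```python
import numpy as np

def retrieval_accuracy(src: np.ndarray, trg: np.ndarray) -> float:
    """Fraction of source embeddings whose cosine-nearest target is the aligned translation.

    src, trg: (n, dim) arrays where row i of trg is the translation of row i of src.
    """
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    trg_n = trg / np.linalg.norm(trg, axis=1, keepdims=True)
    sims = src_n @ trg_n.T                       # (n, n) cosine similarity matrix
    preds = sims.argmax(axis=1)                  # nearest target per source
    return float((preds == np.arange(len(src))).mean())

src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
trg = np.array([[0.9, 0.1], [0.2, 1.0], [-1.0, 0.2]])  # third pair is deliberately mismatched
print(retrieval_accuracy(src, trg))  # 2 of 3 retrieved correctly → accuracy ≈ 0.667
```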
## Training Details

### Training Dataset

#### Unnamed Dataset
- Size: 3,749,530 training samples
- Columns: `sentence1` and `sentence2`
- Approximate statistics based on the first 1000 samples:
  |         | sentence1                                          | sentence2                                           |
  |:--------|:---------------------------------------------------|:----------------------------------------------------|
  | type    | string                                             | string                                              |
  | details | min: 12 tokens, mean: 31.26 tokens, max: 88 tokens | min: 19 tokens, mean: 67.93 tokens, max: 128 tokens |
- Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | There was no Mughal tradition of primogeniture, the systematic passing of rule, upon an emperor's death, to his eldest son. | चक्रवर्तिनः मृत्योः अनन्तरं तस्य शासनस्य व्यवस्थितरूपेण सङ्क्रमणस्य, मुघलपरम्परायाः ज्येष्ठपुत्राधिकारपद्धतिः नासीत्। |
  | The four sons of Shah Jahan all held governorships during their father's reign. | शाह्-जहाँ-नामकस्य चत्वारः पुत्राः, सर्वे पितुः शासनकाले शासकपदम् अधारयन्। |
  | In this regard he discusses the correlation between social opportunities of education and health and how both of these complement economic and political freedoms as a healthy and well-educated person is better suited to make informed economic decisions and be involved in fruitful political demonstrations etc. | अस्मिन् विषये सः शिक्षणस्य स्वास्थ्यस्य च सामाजिकावकाशानाम् अन्योन्य-सम्बन्धस्य, तथा च एतद्द्वयम् अपि आर्थिक-राजनैतिक-स्वातन्त्र्ययोः कथं पूरकं भवतः इति च चर्चां करोति, यतोहि स्वस्था सुशिक्षिता च व्यक्तिः ज्ञानपूर्वम् आर्थिकविषयान् निर्णेतुं तथा फलप्रदेषु राजनैतिकेषु प्रतिपादनादिषु संलग्नः भवितुं च अधिकारी भवति इति। |
- Loss: `main.InfoCELoss`
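`main.InfoCELoss` is defined in the training script rather than in the Sentence Transformers library, so its exact form isn't shown here. Assuming it is a standard InfoNCE-style in-batch contrastive objective — each `sentence1` must pick out its own `sentence2` among all `sentence2` embeddings in the batch via cross-entropy over similarity scores — a minimal NumPy sketch (the function name and temperature value are illustrative guesses):

```python
import numpy as np

def info_nce_loss(emb1: np.ndarray, emb2: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss over cosine similarities.

    emb1, emb2: (n, dim) arrays where row i of emb2 is the positive pair of row i of emb1;
    every other row of emb2 serves as an in-batch negative.
    """
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature             # (n, n); diagonal entries are the positives
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability before softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())   # cross-entropy with targets on the diagonal
```

With perfectly aligned, well-separated embeddings the diagonal dominates each row and the loss approaches zero; with mismatched pairs the loss grows, which is what drives translation pairs toward each other in embedding space.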
### Evaluation Dataset

#### Unnamed Dataset
- Size: 1,000 evaluation samples
- Columns: `sentence1` and `sentence2`
- Approximate statistics based on the first 1000 samples:
  |         | sentence1                                        | sentence2                                         |
  |:--------|:-------------------------------------------------|:--------------------------------------------------|
  | type    | string                                           | string                                            |
  | details | min: 5 tokens, mean: 11.9 tokens, max: 67 tokens | min: 5 tokens, mean: 23.13 tokens, max: 128 tokens |
|
- Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | plus 2 tempered glass screen protectors: | 6 पश्चात तापाभिसंतप्तॊ विदुर समार कर्शितः |
  | "Take sadaqah (alms) from their wealth in order to purify them with it." (p. | अप्येकाङ्गेऽप्यधोवस्तुमिच्छामि च सुकुत्सिते" ॥ |
  | "Who could it possibly be?" | कश्च तासेः सम्भवति ? |
- Loss: `main.InfoCELoss`
### Training Hyperparameters

#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 32
- `num_train_epochs`: 5
- `max_steps`: 12000
- `learning_rate`: 2e-05
- `warmup_steps`: 500
- `gradient_accumulation_steps`: 4
- `bf16`: True
- `eval_strategy`: steps
- `load_best_model_at_end`: True
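With `per_device_train_batch_size: 32` and `gradient_accumulation_steps: 4`, the effective batch size is 32 × 4 = 128 per device. Accumulation works because averaging the gradients of equally sized micro-batches reproduces the full-batch gradient of a mean-reduced loss; a toy check with a linear least-squares loss (all names and data illustrative):

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared error of a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(128, 8)), rng.normal(size=128), rng.normal(size=8)

# Full-batch gradient over all 128 samples...
full = grad(w, X, y)

# ...equals the average over 4 micro-batches of 32 (i.e. gradient_accumulation_steps=4).
micro = np.mean([grad(w, X[i*32:(i+1)*32], y[i*32:(i+1)*32]) for i in range(4)], axis=0)
print(np.allclose(full, micro))  # True
```

The equivalence holds exactly only when micro-batches are equally sized and the loss averages over samples, but it is why accumulation is a drop-in substitute for a larger batch when GPU memory is the constraint.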
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 5
- `max_steps`: 12000
- `learning_rate`: 2e-05
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 500
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 4
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: trackio
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 8
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: False
- `hub_private_repo`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
### Training Logs
| Epoch | Step | Training Loss | Validation Loss | eval-en-sa_mean_accuracy |
|:------:|:-----:|:-------------:|:---------------:|:------------------------:|
| 0.0034 | 100 | 3.1753 | - | - |
| 0.0068 | 200 | 2.8850 | - | - |
| 0.0102 | 300 | 2.0349 | - | - |
| 0.0137 | 400 | 1.3482 | - | - |
| 0.0171 | 500 | 1.0376 | - | - |
| 0.0205 | 600 | 0.8672 | - | - |
| 0.0239 | 700 | 0.7578 | - | - |
| 0.0273 | 800 | 0.6927 | - | - |
| 0.0307 | 900 | 0.6515 | - | - |
| 0.0341 | 1000 | 0.5976 | 0.2567 | 0.4845 |
| 0.0376 | 1100 | 0.5816 | - | - |
| 0.0410 | 1200 | 0.5644 | - | - |
| 0.0444 | 1300 | 0.5165 | - | - |
| 0.0478 | 1400 | 0.5173 | - | - |
| 0.0512 | 1500 | 0.5060 | - | - |
| 0.0546 | 1600 | 0.4878 | - | - |
| 0.0580 | 1700 | 0.4761 | - | - |
| 0.0614 | 1800 | 0.4638 | - | - |
| 0.0649 | 1900 | 0.4735 | - | - |
| 0.0683 | 2000 | 0.4498 | 0.1812 | 0.549 |
| 0.0717 | 2100 | 0.4382 | - | - |
| 0.0751 | 2200 | 0.4320 | - | - |
| 0.0785 | 2300 | 0.4421 | - | - |
| 0.0819 | 2400 | 0.4269 | - | - |
| 0.0853 | 2500 | 0.4297 | - | - |
| 0.0888 | 2600 | 0.4279 | - | - |
| 0.0922 | 2700 | 0.4264 | - | - |
| 0.0956 | 2800 | 0.4193 | - | - |
| 0.0990 | 2900 | 0.4187 | - | - |
| 0.1024 | 3000 | 0.4129 | 0.1540 | 0.5675 |
| 0.1058 | 3100 | 0.4009 | - | - |
| 0.1092 | 3200 | 0.3989 | - | - |
| 0.1127 | 3300 | 0.4021 | - | - |
| 0.1161 | 3400 | 0.3985 | - | - |
| 0.1195 | 3500 | 0.3875 | - | - |
| 0.1229 | 3600 | 0.3855 | - | - |
| 0.1263 | 3700 | 0.3945 | - | - |
| 0.1297 | 3800 | 0.4050 | - | - |
| 0.1331 | 3900 | 0.4000 | - | - |
| 0.1366 | 4000 | 0.3927 | 0.1436 | 0.572 |
| 0.1400 | 4100 | 0.3937 | - | - |
| 0.1434 | 4200 | 0.3891 | - | - |
| 0.1468 | 4300 | 0.3874 | - | - |
| 0.1502 | 4400 | 0.3821 | - | - |
| 0.1536 | 4500 | 0.3845 | - | - |
| 0.1570 | 4600 | 0.3817 | - | - |
| 0.1604 | 4700 | 0.3848 | - | - |
| 0.1639 | 4800 | 0.3800 | - | - |
| 0.1673 | 4900 | 0.3769 | - | - |
| 0.1707 | 5000 | 0.3820 | 0.1403 | 0.5805 |
| 0.1741 | 5100 | 0.3898 | - | - |
| 0.1775 | 5200 | 0.3821 | - | - |
| 0.1809 | 5300 | 0.3811 | - | - |
| 0.1843 | 5400 | 0.3781 | - | - |
| 0.1878 | 5500 | 0.3814 | - | - |
| 0.1912 | 5600 | 0.3915 | - | - |
| 0.1946 | 5700 | 0.3790 | - | - |
| 0.1980 | 5800 | 0.3787 | - | - |
| 0.2014 | 5900 | 0.3763 | - | - |
| 0.2048 | 6000 | 0.3809 | 0.1376 | 0.5815 |
| 0.2082 | 6100 | 0.3691 | - | - |
| 0.2117 | 6200 | 0.3794 | - | - |
| 0.2151 | 6300 | 0.3711 | - | - |
| 0.2185 | 6400 | 0.3682 | - | - |
| 0.2219 | 6500 | 0.3808 | - | - |
| 0.2253 | 6600 | 0.3883 | - | - |
| 0.2287 | 6700 | 0.3691 | - | - |
| 0.2321 | 6800 | 0.3589 | - | - |
| 0.2355 | 6900 | 0.3678 | - | - |
| 0.2390 | 7000 | 0.3878 | 0.1357 | 0.582 |
| 0.2424 | 7100 | 0.3725 | - | - |
| 0.2458 | 7200 | 0.3743 | - | - |
| 0.2492 | 7300 | 0.3724 | - | - |
| 0.2526 | 7400 | 0.3687 | - | - |
| 0.2560 | 7500 | 0.3732 | - | - |
| 0.2594 | 7600 | 0.3749 | - | - |
| 0.2629 | 7700 | 0.3652 | - | - |
| 0.2663 | 7800 | 0.3720 | - | - |
| 0.2697 | 7900 | 0.3714 | - | - |
| 0.2731 | 8000 | 0.3632 | 0.1345 | 0.5835 |
| 0.2765 | 8100 | 0.3756 | - | - |
| 0.2799 | 8200 | 0.3689 | - | - |
| 0.2833 | 8300 | 0.3638 | - | - |
| 0.2868 | 8400 | 0.3763 | - | - |
| 0.2902 | 8500 | 0.3702 | - | - |
| 0.2936 | 8600 | 0.3795 | - | - |
| 0.2970 | 8700 | 0.3674 | - | - |
| 0.3004 | 8800 | 0.3824 | - | - |
| 0.3038 | 8900 | 0.3661 | - | - |
| 0.3072 | 9000 | 0.375 | 0.1343 | 0.5845 |
| 0.3107 | 9100 | 0.3718 | - | - |
| 0.3141 | 9200 | 0.3689 | - | - |
| 0.3175 | 9300 | 0.3726 | - | - |
| 0.3209 | 9400 | 0.3761 | - | - |
| 0.3243 | 9500 | 0.3706 | - | - |
| 0.3277 | 9600 | 0.3672 | - | - |
| 0.3311 | 9700 | 0.3682 | - | - |
| 0.3345 | 9800 | 0.3783 | - | - |
| 0.3380 | 9900 | 0.3700 | - | - |
| 0.3414 | 10000 | 0.3771 | 0.1348 | 0.5825 |
| 0.3448 | 10100 | 0.3668 | - | - |
| 0.3482 | 10200 | 0.3670 | - | - |
| 0.3516 | 10300 | 0.3666 | - | - |
| 0.3550 | 10400 | 0.3792 | - | - |
| 0.3584 | 10500 | 0.3796 | - | - |
| 0.3619 | 10600 | 0.3733 | - | - |
| 0.3653 | 10700 | 0.3728 | - | - |
| 0.3687 | 10800 | 0.3631 | - | - |
| 0.3721 | 10900 | 0.3674 | - | - |
| 0.3755 | 11000 | 0.3719 | 0.1350 | 0.583 |
| 0.3789 | 11100 | 0.3673 | - | - |
| 0.3823 | 11200 | 0.3688 | - | - |
| 0.3858 | 11300 | 0.3604 | - | - |
| 0.3892 | 11400 | 0.3666 | - | - |
| 0.3926 | 11500 | 0.3669 | - | - |
| 0.3960 | 11600 | 0.3664 | - | - |
| 0.3994 | 11700 | 0.3690 | - | - |
| 0.4028 | 11800 | 0.3726 | - | - |
| 0.4062 | 11900 | 0.3768 | - | - |
| **0.4097** | **12000** | **0.3727** | **0.1346** | **0.584** |
- The bold row denotes the saved checkpoint.
## Framework Versions
- Python: 3.10.18
- Sentence Transformers: 5.2.3
- Transformers: 5.2.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.12.0
- Datasets: 3.3.2
- Tokenizers: 0.22.1
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```