SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the mitrasamgraha English–Sanskrit parallel dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: ~0.6B parameters (F32)
  • Training Dataset:
    • mitrasamgraha

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
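
Because the pipeline ends in a Normalize() module, every embedding is scaled to unit L2 norm, so cosine similarity and dot product give identical rankings. A minimal sketch to verify this (the test sentence is illustrative):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sanganaka/bge-m3-sanskritFT")

# CLS-token pooling followed by Normalize() yields unit-length vectors
emb = model.encode(["An illustrative test sentence"])
print(np.linalg.norm(emb[0]))  # ~1.0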

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sanganaka/bge-m3-sanskritFT")
# Run inference
sentences = [
    'O Śākyamuni, conquering the powerful host of Māra, You found peace, immortality, and the happiness of that supreme enlightenment',
    'मारस् त्वयास्तु विजितस् सबलो मुनीन्द्रः प्राप्ता शिवा अमृतशान्तवराग्रबोधिः ।',
    'न हि तथता द्वयप्रभाविता नानात्वप्रभाविता ।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
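
Because English and Sanskrit sentences share one embedding space, the model also supports cross-lingual retrieval out of the box. A minimal sketch, with an illustrative English query against a small Sanskrit corpus:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sanganaka/bge-m3-sanskritFT")

query = "The mind is restless, turbulent, strong and obstinate"
corpus = [
    "चञ्चलं हि मनः कृष्ण प्रमाथि बलवद् दृढम् ।",
    "न हि तथता द्वयप्रभाविता नानात्वप्रभाविता ।",
]

query_emb = model.encode([query])
corpus_emb = model.encode(corpus)

# model.similarity uses cosine similarity, this model's default
scores = model.similarity(query_emb, corpus_emb)  # shape [1, 2]
best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))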

Evaluation

Metrics

Translation (validation split, logged as translate-val)

Metric             Value
src2trg_accuracy   0.9423
trg2src_accuracy   0.9381
mean_accuracy      0.9402

Translation (test split, logged as translate-test)

Metric             Value
src2trg_accuracy   0.9305
trg2src_accuracy   0.9270
mean_accuracy      0.9288
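
Here src2trg_accuracy is the fraction of English sentences whose paired Sanskrit sentence is the nearest neighbor among all Sanskrit candidates, trg2src_accuracy is the same in the opposite direction, and mean_accuracy averages the two. A minimal sketch of how such numbers can be computed with the library's TranslationEvaluator (the placeholder lists stand in for the actual parallel data):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TranslationEvaluator

model = SentenceTransformer("sanganaka/bge-m3-sanskritFT")

# sentences_en[i] and sentences_sa[i] must be translations of each other
sentences_en = ["..."]  # English side of the parallel data
sentences_sa = ["..."]  # Sanskrit (Devanagari) side

evaluator = TranslationEvaluator(
    source_sentences=sentences_en,
    target_sentences=sentences_sa,
    name="translate-val",
)
print(evaluator(model))  # reports src2trg, trg2src and mean accuracy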

Training Details

Training Dataset

mitrasamgraha

  • Dataset: mitrasamgraha
  • Size: 477,170 training samples
  • Columns: english and sanskrit_Deva
  • Approximate statistics based on the first 1000 samples:
    • english: string; min: 20 tokens, mean: 43.11 tokens, max: 90 tokens
    • sanskrit_Deva: string; min: 19 tokens, mean: 33.88 tokens, max: 78 tokens
  • Samples:
    • english: My patience is almost worn out, like that of a creeper under the winter frost. It is decayed, and neither lives nor perishes at once.
      sanskrit_Deva: जर्जरीकृत्य वस्तूनि त्यजन्ती विभ्रती तथा । मार्गशीर्षान्तवल्लीव धृतिर्विधुरतां गता ॥
    • english: Our minds are partly settled in worldly things, and partly fixed in their giver (the Supreme soul). This divided state of the mind is termed its half waking condition.
      sanskrit_Deva: अपहस्तितसर्वार्थमनवस्थितिरास्थिता । गृहीत्वोत्सृज्य चात्मानं भवस्थितिरवस्थिता ॥
    • english: My mind is in a state of suspense, being unable to ascertain the real nature of my soul. I am like one in the dark, who is deceived by the stump of a fallen tree at a distance, to think it a human figure.
      sanskrit_Deva: चलिताचलितेनान्तरवष्टम्भेन मे मतिः । दरिद्रा छिन्नवृक्षस्य मूलेनेव विडम्ब्यते ॥
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
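
The loss treats each (english, sanskrit_Deva) pair as a positive and uses every other Sanskrit sentence in the batch as an in-batch negative; the scale of 20.0 multiplies the cosine similarities before the softmax. A minimal sketch of how this loss is constructed:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-m3")

# Other in-batch sanskrit_Deva entries act as negatives for each pair
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,                  # multiplier on similarities before softmax
    similarity_fct=util.cos_sim,
)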
    

Evaluation Dataset

mitrasamgraha

  • Dataset: mitrasamgraha
  • Size: 5,560 evaluation samples
  • Columns: english and sanskrit_Deva
  • Approximate statistics based on the first 1000 samples:
    • english: string; min: 5 tokens, mean: 58.68 tokens, max: 387 tokens
    • sanskrit_Deva: string; min: 6 tokens, mean: 44.88 tokens, max: 257 tokens
  • Samples:
    • english: Thereupon he takes the winnowing basket and the Agnihotra ladle , with the text : 'For the work (I take) you, for pervasion (or accomplishment) you two! ' For the sacrifice is a work: hence, in saying 'for the work you two, ' he says, 'for the sacrifice. ' And 'for pervasion you two, ' he says, because he, as it were, pervades (goes through, accomplishes) the sacrifice. He then restrains his speech; for (restrained) speech means undisturbed sacrifice; so that (in so doing) he thinks: 'May I accomplish the sacrifice! ' He now heats (the two objects on the Grhapatya), with the formula : 'Scorched is the Rakshas, scorched are the enemies! ' or : 'Burnt out is the Rakshas, burnt out are the enemies! '
      sanskrit_Deva: अग्ने व्रतपते व्रतं चरिष्यामि तचकेयं तन्मे राध्यतामित्यग्निर्वै देवानां व्रतपतिस्तस्मा एवैतत्प्राह व्रतं चरिष्यामि तच्चकेयं तन्मे राध्यतामिति नात्र तिरोहितमिवास्ति ॥ अथ संस्थिते विसृजते । अग्ने व्रतपते व्रतमचारिषं तादशकम् तन्मे राधीत्यशकद्येतद्यो यज्ञस्य संस्थामगन्नराधि ह्यस्मै यो यज्ञस्य संस्थामगन्नेतेन न्वेव भूयिष्ठा इव व्रतमुपयन्त्यनेन त्वेवोपेयात् ॥ द्वयं वा इदं न तृतीयमस्ति ।
    • english: For the gods, when they were performing the sacrifice, were afraid of a disturbance on the part of the Asuras and Rakshas: hence by this means he expels from here, at the very opening of the sacrifice, the evil spirits, the Rakshas.
      sanskrit_Deva: एतद्धवै देवा व्रतं चरन्ति यत्सत्यं तस्मात्ते यशो यशो ह भवति य एवं विद्वांत्सत्यंवदति ॥ अथ संस्थिते विसृजते ।
    • english: He now steps forward (to the cart ), with the text : 'I move along the wide arial realm. ' For the Rakshas roams about in the air, rootless and unfettered in both directions (below and above); and in order that this man (the Adhvaryu) may move about the air, rootless and unfettered in both directions, he by this very prayer renders the atmosphere free from danger and evil spirits.
      sanskrit_Deva: स वा आरण्यमेवाश्नीयात् । या वारण्या ओषधयो यद्वा वृक्ष्यं तदु ह स्माहापि बर्कुर्वार्ष्णो मासान्मे पचत न वा एतेसां हविर्गृह्णन्तीति तदु तथा न कुर्याद्व्रीहियवयोर्वा एतदुपजं यचमीधान्यं तद्व्रीहियवावेवैतेन भूयांसौ करोति तस्मादारण्यमेवाश्नीयात् ॥
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
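
These settings map directly onto SentenceTransformerTrainingArguments. A minimal end-to-end sketch, assuming the mitrasamgraha pairs are available locally as a JSON Lines file with english and sanskrit_Deva columns (the file name is hypothetical):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-m3")

# Hypothetical local copy of the parallel corpus
splits = load_dataset(
    "json", data_files="mitrasamgraha.jsonl", split="train"
).train_test_split(test_size=5560)

loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-m3-sanskritFT",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="epoch",
    # no_duplicates avoids repeating a sentence within a batch,
    # which would create false negatives for the ranking loss
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    loss=loss,
)
trainer.train()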

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss translate-val_mean_accuracy translate-test_mean_accuracy
0 0 - - 0.3724 -
0.0335 500 1.7546 - - -
0.0671 1000 0.8427 - - -
0.1006 1500 0.301 - - -
0.1341 2000 0.1892 - - -
0.1677 2500 0.0985 - - -
0.2012 3000 0.283 - - -
0.2347 3500 0.4843 - - -
0.2682 4000 0.3884 - - -
0.3018 4500 0.5331 - - -
0.3353 5000 0.6926 - - -
0.3688 5500 0.4398 - - -
0.4024 6000 0.3152 - - -
0.4359 6500 0.2488 - - -
0.4694 7000 0.3297 - - -
0.5030 7500 0.2496 - - -
0.5365 8000 0.2058 - - -
0.5700 8500 0.2032 - - -
0.6035 9000 0.3762 - - -
0.6371 9500 0.4035 - - -
0.6706 10000 0.5922 - - -
0.7041 10500 0.0894 - - -
0.7377 11000 0.2658 - - -
0.7712 11500 0.2099 - - -
0.8047 12000 0.4648 - - -
0.8383 12500 0.5967 - - -
0.8718 13000 0.0863 - - -
0.9053 13500 0.0626 - - -
0.9388 14000 0.2336 - - -
0.9724 14500 0.3032 - - -
1.0 14912 - 0.2874 0.8858 -
1.0059 15000 0.2268 - - -
1.0394 15500 0.4782 - - -
1.0730 16000 0.2226 - - -
1.1065 16500 0.0766 - - -
1.1400 17000 0.0589 - - -
1.1736 17500 0.0248 - - -
1.2071 18000 0.1875 - - -
1.2406 18500 0.2958 - - -
1.2741 19000 0.2065 - - -
1.3077 19500 0.4541 - - -
1.3412 20000 0.5509 - - -
1.3747 20500 0.1221 - - -
1.4083 21000 0.1986 - - -
1.4418 21500 0.1263 - - -
1.4753 22000 0.1777 - - -
1.5089 22500 0.1165 - - -
1.5424 23000 0.1017 - - -
1.5759 23500 0.1309 - - -
1.6094 24000 0.2304 - - -
1.6430 24500 0.3245 - - -
1.6765 25000 0.3282 - - -
1.7100 25500 0.0163 - - -
1.7436 26000 0.1357 - - -
1.7771 26500 0.1302 - - -
1.8106 27000 0.4238 - - -
1.8442 27500 0.3066 - - -
1.8777 28000 0.0305 - - -
1.9112 28500 0.0279 - - -
1.9447 29000 0.1823 - - -
1.9783 29500 0.151 - - -
2.0 29824 - 0.2112 0.9160 -
2.0118 30000 0.169 - - -
2.0453 30500 0.2848 - - -
2.0789 31000 0.0858 - - -
2.1124 31500 0.0363 - - -
2.1459 32000 0.0208 - - -
2.1795 32500 0.01 - - -
2.2130 33000 0.1198 - - -
2.2465 33500 0.2025 - - -
2.2800 34000 0.1131 - - -
2.3136 34500 0.3647 - - -
2.3471 35000 0.3397 - - -
2.3806 35500 0.0507 - - -
2.4142 36000 0.1101 - - -
2.4477 36500 0.0832 - - -
2.4812 37000 0.0977 - - -
2.5148 37500 0.0666 - - -
2.5483 38000 0.0546 - - -
2.5818 38500 0.0868 - - -
2.6153 39000 0.1504 - - -
2.6489 39500 0.2462 - - -
2.6824 40000 0.1835 - - -
2.7159 40500 0.0279 - - -
2.7495 41000 0.0594 - - -
2.7830 41500 0.0889 - - -
2.8165 42000 0.4076 - - -
2.8501 42500 0.1206 - - -
2.8836 43000 0.0143 - - -
2.9171 43500 0.013 - - -
2.9506 44000 0.1479 - - -
2.9842 44500 0.0626 - - -
3.0 44736 - 0.1816 0.9262 -
3.0177 45000 0.1422 - - -
3.0512 45500 0.1636 - - -
3.0848 46000 0.0266 - - -
3.1183 46500 0.0145 - - -
3.1518 47000 0.0096 - - -
3.1854 47500 0.0055 - - -
3.2189 48000 0.0728 - - -
3.2524 48500 0.1368 - - -
3.2859 49000 0.0739 - - -
3.3195 49500 0.2677 - - -
3.3530 50000 0.2339 - - -
3.3865 50500 0.0283 - - -
3.4201 51000 0.0654 - - -
3.4536 51500 0.0659 - - -
3.4871 52000 0.0445 - - -
3.5207 52500 0.0355 - - -
3.5542 53000 0.0307 - - -
3.5877 53500 0.0577 - - -
3.6212 54000 0.129 - - -
3.6548 54500 0.1727 - - -
3.6883 55000 0.0952 - - -
3.7218 55500 0.03 - - -
3.7554 56000 0.0263 - - -
3.7889 56500 0.059 - - -
3.8224 57000 0.3222 - - -
3.8560 57500 0.0727 - - -
3.8895 58000 0.0072 - - -
3.9230 58500 0.0229 - - -
3.9565 59000 0.0877 - - -
3.9901 59500 0.0273 - - -
4.0 59648 - 0.1633 0.9357 -
4.0236 60000 0.111 - - -
4.0571 60500 0.0897 - - -
4.0907 61000 0.0117 - - -
4.1242 61500 0.0077 - - -
4.1577 62000 0.005 - - -
4.1913 62500 0.0115 - - -
4.2248 63000 0.0463 - - -
4.2583 63500 0.097 - - -
4.2918 64000 0.0713 - - -
4.3254 64500 0.1869 - - -
4.3589 65000 0.1845 - - -
4.3924 65500 0.0267 - - -
4.4260 66000 0.041 - - -
4.4595 66500 0.0463 - - -
4.4930 67000 0.0239 - - -
4.5266 67500 0.0276 - - -
4.5601 68000 0.0176 - - -
4.5936 68500 0.0409 - - -
4.6271 69000 0.107 - - -
4.6607 69500 0.1604 - - -
4.6942 70000 0.0495 - - -
4.7277 70500 0.0268 - - -
4.7613 71000 0.0259 - - -
4.7948 71500 0.0478 - - -
4.8283 72000 0.3 - - -
4.8619 72500 0.0436 - - -
4.8954 73000 0.0059 - - -
4.9289 73500 0.0295 - - -
4.9624 74000 0.0926 - - -
4.9960 74500 0.0191 - - -
5.0 74560 - 0.1558 0.9402 0.9288

Framework Versions

  • Python: 3.12.4
  • Sentence Transformers: 3.2.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}