SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
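The Pooling module above is configured for mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the average of the transformer's token embeddings over non-padding positions. A minimal numpy sketch of that operation, with toy inputs rather than this model's actual tokenizer output:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (seq_len, dim) output of the Transformer module
    attention_mask:   (seq_len,) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()

# Toy example: the padded position (mask 0) is ignored in the average.
tokens = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 3.]
```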

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("NetherQuartz/paraphrase-MiniLM-tokipona")
# Run inference
sentences = [
    '我只想暖和一下。',
    'mi wile kama seli taso.',
    'tomo tawa sina li lon ni.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7361, 0.2725],
#         [0.7361, 1.0000, 0.2417],
#         [0.2725, 0.2417, 1.0000]])
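For SentenceTransformer models, `model.similarity` defaults to cosine similarity, which is why the diagonal above is 1.0. A simplified numpy stand-in for that computation (a sketch, not the library's implementation):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n, dim) matrix."""
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normalized @ normalized.T

emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
sims = cosine_similarity_matrix(emb)
print(np.round(sims, 4))
# diagonal is 1.0; off-diagonal entries are the cosine of the angle between rows
```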

Evaluation

Metrics

Knowledge Distillation

| Metric       | Value   |
|:-------------|:--------|
| negative_mse | -1.9607 |
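The `negative_mse` metric comes from sentence-transformers' `MSEEvaluator`, which reports the negated mean squared error between student and teacher embeddings, scaled by 100 (treat the ×100 factor as a library-version detail). A numpy sketch of that computation on hypothetical vectors:

```python
import numpy as np

def negative_mse(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Negated MSE between student and teacher embeddings, scaled by 100
    (mirroring sentence-transformers' MSEEvaluator; the scaling factor is
    an assumption about the library version used here)."""
    return float(-np.mean((student_emb - teacher_emb) ** 2) * 100)

student = np.array([[0.1, 0.2], [0.3, 0.4]])
teacher = np.array([[0.1, 0.2], [0.3, 0.5]])
print(negative_mse(student, teacher))  # close to -0.25
```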

Translation

| Metric           | Value  |
|:-----------------|:-------|
| src2trg_accuracy | 0.6918 |
| trg2src_accuracy | 0.6346 |
| mean_accuracy    | 0.6632 |
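`src2trg_accuracy` measures, for each source-language embedding, whether its most similar target-language embedding is the aligned translation (and vice versa for `trg2src_accuracy`); `mean_accuracy` averages the two directions. A simplified sketch of one direction with hypothetical aligned pairs (sentence-transformers' `TranslationEvaluator` performs the equivalent retrieval):

```python
import numpy as np

def retrieval_accuracy(src_emb: np.ndarray, trg_emb: np.ndarray) -> float:
    """Fraction of source rows whose nearest target row (by cosine
    similarity) is the aligned translation at the same index."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    trg = trg_emb / np.linalg.norm(trg_emb, axis=1, keepdims=True)
    predictions = (src @ trg.T).argmax(axis=1)
    return float((predictions == np.arange(len(src_emb))).mean())

# Hypothetical aligned pairs: each target is a slightly perturbed source.
src = np.array([[1.0, 0.0], [0.0, 1.0]])
trg = np.array([[0.9, 0.1], [0.1, 0.9]])
print(retrieval_accuracy(src, trg))  # 1.0
```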

Training Details

Training Dataset

Unnamed Dataset

  • Size: 82,069 training samples
  • Columns: natural, tok, and label
  • Approximate statistics based on the first 1000 samples:
    |         | natural                                           | tok                                               | label              |
    |:--------|:--------------------------------------------------|:--------------------------------------------------|:-------------------|
    | type    | string                                            | string                                            | list               |
    | details | min: 4 tokens, mean: 10.99 tokens, max: 45 tokens | min: 4 tokens, mean: 15.64 tokens, max: 55 tokens | size: 384 elements |
  • Samples:
    | natural                         | tok                           | label                                                                                                            |
    |:--------------------------------|:------------------------------|:-----------------------------------------------------------------------------------------------------------------|
    | Я держу руку.                   | mi sewi e luka mi.            | [-0.17412713170051575, 0.2601699233055115, 0.3189601004123688, 0.009355960413813591, -0.030796436592936516, ...] |
    | Я змарыўся ад працы.            | tan pali mi la mi pilin lape. | [0.1258312165737152, 0.173202782869339, 0.16050441563129425, 0.2519824206829071, -0.035661786794662476, ...]     |
    | Mi bolso necesita ser reparado. | poki mi li pakala.            | [-0.22065182030200958, 0.3290186822414398, -0.006242208182811737, 0.18535998463630676, 0.3087056577205658, ...]  |
  • Loss: MSELoss
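MSELoss fits the knowledge-distillation setup of this dataset: the `label` column holds a teacher model's embedding of the `natural` sentence, and the student is trained to produce the same vector for the Toki Pona (`tok`) sentence. A numpy sketch of the objective with random stand-in vectors (hypothetical, not actual model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: `label` plays the role of the teacher's 384-d embedding of the
# natural-language sentence; the student's output for the `tok` sentence is
# driven toward it during training.
teacher_label = rng.standard_normal(384)
student_output = teacher_label + 0.1 * rng.standard_normal(384)

# MSELoss: mean squared difference between student output and teacher label.
loss = float(np.mean((student_output - teacher_label) ** 2))
print(loss)  # small when the student closely matches the teacher
```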

Evaluation Dataset

Unnamed Dataset

  • Size: 4,267 evaluation samples
  • Columns: natural, tok, and label
  • Approximate statistics based on the first 1000 samples:
    |         | natural                                          | tok                                              | label              |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-------------------|
    | type    | string                                           | string                                           | list               |
    | details | min: 5 tokens, mean: 11.0 tokens, max: 44 tokens | min: 4 tokens, mean: 15.4 tokens, max: 66 tokens | size: 384 elements |
  • Samples:
    | natural                                | tok                                      | label                                                                                                           |
    |:---------------------------------------|:-----------------------------------------|:------------------------------------------------------------------------------------------------------------------|
    | Da quanto tempo sei/state in Germania? | tenpo pi suli seme la sina lon ma Tosi?  | [0.43582403659820557, 0.4226286709308624, 0.06436676532030106, -0.38238099217414856, -0.13951840996742249, ...] |
    | Habesne difficultatem hac re?          | ni li ike tawa sina anu seme?            | [0.22038640081882477, 0.03845325857400894, 0.20817194879055023, 0.08335897326469421, -0.10346948355436325, ...] |
    | אני לא הולך להפסיד.                    | mi kama ala anpa.                        | [0.3058338761329651, 0.06292764097452164, 0.019105680286884308, -0.04162227734923363, -0.10258055478334427, ...] |
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 12
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
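The non-default values above correspond to a `SentenceTransformerTrainingArguments` configuration along these lines (a sketch; `output_dir` is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    num_train_epochs=12,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)
```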

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 12
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Click to expand
Epoch Step Training Loss Validation Loss eval_data_negative_mse eval_data_mean_accuracy
0.0779 100 0.0257 - - -
0.1559 200 0.0235 - - -
0.2338 300 0.0221 - - -
0.3118 400 0.0217 - - -
0.3897 500 0.0209 - - -
0.4677 600 0.0201 - - -
0.5456 700 0.0192 - - -
0.6235 800 0.0186 - - -
0.7015 900 0.0176 - - -
0.7794 1000 0.0171 - - -
0.8574 1100 0.0166 - - -
0.9353 1200 0.0159 - - -
1.0133 1300 0.0154 - - -
1.0912 1400 0.015 - - -
1.1691 1500 0.0145 - - -
1.2471 1600 0.0143 - - -
1.3250 1700 0.014 - - -
1.4030 1800 0.0139 - - -
1.4809 1900 0.0136 - - -
1.5588 2000 0.0134 0.0122 -2.3609 0.5603
1.6368 2100 0.0133 - - -
1.7147 2200 0.0133 - - -
1.7927 2300 0.0132 - - -
1.8706 2400 0.0131 - - -
1.9486 2500 0.0131 - - -
2.0265 2600 0.0129 - - -
2.1044 2700 0.0127 - - -
2.1824 2800 0.0125 - - -
2.2603 2900 0.0125 - - -
2.3383 3000 0.0124 - - -
2.4162 3100 0.0123 - - -
2.4942 3200 0.0122 - - -
2.5721 3300 0.0121 - - -
2.6500 3400 0.0123 - - -
2.7280 3500 0.0122 - - -
2.8059 3600 0.0122 - - -
2.8839 3700 0.0121 - - -
2.9618 3800 0.0122 - - -
3.0398 3900 0.012 - - -
3.1177 4000 0.0119 0.0110 -2.1275 0.6289
3.1956 4100 0.0118 - - -
3.2736 4200 0.0118 - - -
3.3515 4300 0.0117 - - -
3.4295 4400 0.0117 - - -
3.5074 4500 0.0116 - - -
3.5853 4600 0.0116 - - -
3.6633 4700 0.0117 - - -
3.7412 4800 0.0117 - - -
3.8192 4900 0.0116 - - -
3.8971 5000 0.0117 - - -
3.9751 5100 0.0115 - - -
4.0530 5200 0.0115 - - -
4.1309 5300 0.0113 - - -
4.2089 5400 0.0113 - - -
4.2868 5500 0.0114 - - -
4.3648 5600 0.0114 - - -
4.4427 5700 0.0113 - - -
4.5207 5800 0.0112 - - -
4.5986 5900 0.0113 - - -
4.6765 6000 0.0113 0.0107 -2.0522 0.6478
4.7545 6100 0.0112 - - -
4.8324 6200 0.0112 - - -
4.9104 6300 0.0113 - - -
4.9883 6400 0.0113 - - -
5.0663 6500 0.011 - - -
5.1442 6600 0.011 - - -
5.2221 6700 0.011 - - -
5.3001 6800 0.0109 - - -
5.3780 6900 0.0111 - - -
5.4560 7000 0.0111 - - -
5.5339 7100 0.011 - - -
5.6118 7200 0.0109 - - -
5.6898 7300 0.011 - - -
5.7677 7400 0.011 - - -
5.8457 7500 0.0111 - - -
5.9236 7600 0.011 - - -
6.0016 7700 0.0112 - - -
6.0795 7800 0.0108 - - -
6.1574 7900 0.0108 - - -
6.2354 8000 0.0107 0.0105 -2.0098 0.6524
6.3133 8100 0.0108 - - -
6.3913 8200 0.0108 - - -
6.4692 8300 0.0108 - - -
6.5472 8400 0.0109 - - -
6.6251 8500 0.0108 - - -
6.7030 8600 0.0108 - - -
6.7810 8700 0.0108 - - -
6.8589 8800 0.0107 - - -
6.9369 8900 0.0109 - - -
7.0148 9000 0.0108 - - -
7.0928 9100 0.0106 - - -
7.1707 9200 0.0107 - - -
7.2486 9300 0.0106 - - -
7.3266 9400 0.0105 - - -
7.4045 9500 0.0105 - - -
7.4825 9600 0.0107 - - -
7.5604 9700 0.0107 - - -
7.6383 9800 0.0108 - - -
7.7163 9900 0.0107 - - -
7.7942 10000 0.0106 0.0103 -1.9857 0.6582
7.8722 10100 0.0106 - - -
7.9501 10200 0.0106 - - -
8.0281 10300 0.0106 - - -
8.1060 10400 0.0105 - - -
8.1839 10500 0.0103 - - -
8.2619 10600 0.0105 - - -
8.3398 10700 0.0105 - - -
8.4178 10800 0.0105 - - -
8.4957 10900 0.0105 - - -
8.5737 11000 0.0105 - - -
8.6516 11100 0.0105 - - -
8.7295 11200 0.0106 - - -
8.8075 11300 0.0104 - - -
8.8854 11400 0.0106 - - -
8.9634 11500 0.0106 - - -
9.0413 11600 0.0105 - - -
9.1193 11700 0.0103 - - -
9.1972 11800 0.0102 - - -
9.2751 11900 0.0105 - - -
9.3531 12000 0.0104 0.0103 -1.9715 0.6624
9.4310 12100 0.0104 - - -
9.5090 12200 0.0104 - - -
9.5869 12300 0.0105 - - -
9.6648 12400 0.0104 - - -
9.7428 12500 0.0103 - - -
9.8207 12600 0.0105 - - -
9.8987 12700 0.0105 - - -
9.9766 12800 0.0104 - - -
10.0546 12900 0.0103 - - -
10.1325 13000 0.0103 - - -
10.2104 13100 0.0102 - - -
10.2884 13200 0.0103 - - -
10.3663 13300 0.0105 - - -
10.4443 13400 0.0103 - - -
10.5222 13500 0.0104 - - -
10.6002 13600 0.0104 - - -
10.6781 13700 0.0103 - - -
10.7560 13800 0.0103 - - -
10.8340 13900 0.0103 - - -
10.9119 14000 0.0103 0.0102 -1.9607 0.6632
10.9899 14100 0.0102 - - -
11.0678 14200 0.0103 - - -
11.1458 14300 0.0103 - - -
11.2237 14400 0.0103 - - -
11.3016 14500 0.0102 - - -
11.3796 14600 0.0104 - - -
11.4575 14700 0.0103 - - -
11.5355 14800 0.0103 - - -
11.6134 14900 0.0103 - - -
11.6913 15000 0.0102 - - -
11.7693 15100 0.0102 - - -
11.8472 15200 0.0103 - - -
11.9252 15300 0.0102 - - -
12.0 15396 - 0.0102 -1.9607 0.6632
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.13.7
  • Sentence Transformers: 5.3.0
  • Transformers: 4.55.2
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.7.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
Model size: 0.1B params (F32, Safetensors)