SentenceTransformer based on intfloat/multilingual-e5-large

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
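
The card lists cosine similarity as the model's similarity function. As a minimal sketch (plain NumPy, independent of the model itself), cosine similarity is the dot product of the two L2-normalized vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of L2 norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])
c = np.array([-3.0, 0.0, 1.0])

print(cosine_similarity(a, b))  # identical direction -> 1.0
print(cosine_similarity(a, c))  # orthogonal vectors -> 0.0
```

Scores range from -1 (opposite) through 0 (orthogonal) to 1 (identical direction); vector magnitude does not matter.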

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
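
The Pooling module above has pooling_mode_mean_tokens set to True: the sentence embedding is the mean of the token embeddings, with padding positions excluded via the attention mask. A minimal NumPy sketch of masked mean pooling:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (seq_len, dim); attention_mask: (seq_len,), 1 = real token.
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = mask.sum()                              # number of real tokens
    return summed / count

# Toy example: 3 real tokens + 1 padding token, dim = 2
tokens = np.array([[1.0, 0.0], [3.0, 0.0], [2.0, 6.0], [9.0, 9.0]])
mask = np.array([1, 1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 2.] -- the padding row is ignored
```

In the real model, token_embeddings come from the XLMRobertaModel transformer and dim is 1024, matching the card's output dimensionality.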

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("iara-project/e5-large-simcse-sts-pt-ckpt-22000")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 1.0000],
#         [1.0000, 1.0000, 1.0000],
#         [1.0000, 1.0000, 1.0000]])
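
model.similarity returns the pairwise cosine similarities between the rows of two embedding matrices (a torch tensor in the library; the sketch below is an equivalent NumPy computation under that assumption):

```python
import numpy as np

def similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of emb_a and emb_b."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T  # (len(emb_a), len(emb_b))

emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = similarity(emb, emb)
print(sims.shape)  # (3, 3)
print(np.round(sims, 4))
```

The diagonal is always 1.0 (each embedding compared to itself); off-diagonal entries measure cross-sentence similarity.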

Training Details

Training Dataset

Unnamed Dataset

  • Columns: sentence1 and sentence2
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
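
MultipleNegativesRankingLoss treats, for each (sentence1, sentence2) pair in a batch, every other sentence2 as an in-batch negative: it is a cross-entropy loss over scaled cosine similarities, where the scale of 20.0 above acts as an inverse temperature. A minimal NumPy sketch of that objective (not the library implementation):

```python
import numpy as np

def mnr_loss(query_emb: np.ndarray, doc_emb: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives cross-entropy over scaled cosine similarities.

    query_emb, doc_emb: (batch, dim); row i of doc_emb is the positive for row i.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = scale * (q @ d.T)  # (batch, batch) similarity matrix
    # Log-softmax per row; the correct "class" for row i is column i.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.0, 1.0]])  # perfectly aligned positives
print(mnr_loss(q, d))        # near zero: each positive dominates its row
print(mnr_loss(q, d[::-1]))  # large: positives swapped, loss ~ scale
```

Because negatives come from the batch itself, larger batches give harder training signal for free.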
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • max_steps: 140625
  • warmup_steps: 0.05
  • optim: adamw_torch
  • weight_decay: 0.01
  • gradient_accumulation_steps: 2
  • fp16: True
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • data_seed: 42
  • accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • remove_unused_columns: False
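
Two of the hyperparameters above interact: gradient accumulation multiplies the effective batch size per optimizer step, but each accumulated micro-batch is still a separate forward pass, and gather_across_devices is false, so the pool of in-batch negatives stays at the per-device micro-batch size. A small arithmetic sketch:

```python
# Values from the non-default hyperparameters above.
per_device_train_batch_size = 64
gradient_accumulation_steps = 2

# Effective batch per optimizer step, per device (times device count,
# which the card does not state).
effective_batch_per_device = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_per_device)  # 128

# In-batch negatives seen by MultipleNegativesRankingLoss per example:
# one micro-batch minus the example's own positive.
in_batch_negatives = per_device_train_batch_size - 1
print(in_batch_negatives)  # 63
```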

All Hyperparameters

  • per_device_train_batch_size: 64
  • num_train_epochs: 3.0
  • max_steps: 140625
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 2
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: no
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: 42
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: False
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
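
The linear scheduler ramps the learning rate up over the warmup period, then decays it linearly to zero at max_steps. The logged warmup_steps of 0.05 reads like a fraction rather than a step count; assuming it is interpreted as a warmup ratio of max_steps (an assumption, not something the card confirms), the schedule looks like this:

```python
def linear_lr(step: int, base_lr: float, warmup_steps: int, max_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))

base_lr, max_steps = 5e-5, 140625       # learning_rate and max_steps above
warmup = int(0.05 * max_steps)          # assumption: 0.05 is a warmup fraction

print(linear_lr(0, base_lr, warmup, max_steps))          # 0.0 at step 0
print(linear_lr(warmup, base_lr, warmup, max_steps))     # peak: 5e-05
print(linear_lr(max_steps, base_lr, warmup, max_steps))  # 0.0 at the end
```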

Training Logs

Epoch Step Training Loss
0.0007 100 0.7908
0.0014 200 0.0416
0.0021 300 0.0083
0.0028 400 0.0032
0.0036 500 0.0026
0.0043 600 0.0029
0.0050 700 0.0019
0.0057 800 0.0010
0.0064 900 0.0007
0.0071 1000 0.0006
0.0078 1100 0.0006
0.0085 1200 0.0007
0.0092 1300 0.0006
0.0100 1400 0.0148
0.0107 1500 0.0616
0.0114 1600 0.0006
0.0121 1700 0.0005
0.0128 1800 0.0004
0.0135 1900 0.0002
0.0142 2000 0.0008
0.0149 2100 0.0006
0.0156 2200 0.0004
0.0164 2300 0.0004
0.0171 2400 0.0005
0.0178 2500 0.0007
0.0185 2600 0.0004
0.0192 2700 0.0004
0.0199 2800 0.0009
0.0206 2900 0.0008
0.0213 3000 0.0007
0.0220 3100 0.0007
0.0228 3200 0.0003
0.0235 3300 0.0006
0.0242 3400 0.0003
0.0249 3500 0.0007
0.0256 3600 0.0005
0.0263 3700 0.0004
0.0270 3800 0.0003
0.0277 3900 0.0004
0.0284 4000 0.0002
0.0292 4100 0.0004
0.0299 4200 0.0004
0.0306 4300 0.0222
0.0313 4400 0.0602
0.0320 4500 0.0015
0.0327 4600 0.0006
0.0334 4700 0.0012
0.0341 4800 0.0005
0.0348 4900 0.0006
0.0356 5000 0.0020
0.0363 5100 0.0011
0.0370 5200 0.0052
0.0377 5300 0.0056
0.0384 5400 0.0012
0.0391 5500 0.0023
0.0398 5600 0.0008
0.0405 5700 0.0022
0.0412 5800 0.0028
0.0420 5900 0.0013
0.0427 6000 0.0006
0.0434 6100 0.0010
0.0441 6200 0.0018
0.0448 6300 0.0012
0.0455 6400 0.0850
0.0462 6500 0.2135
0.0469 6600 0.0009
0.0476 6700 0.0006
0.0484 6800 0.0008
0.0491 6900 0.0007
0.0498 7000 0.0007
0.0505 7100 0.0023
0.0512 7200 0.0324
0.0519 7300 3.5332
0.0526 7400 3.8028
0.0533 7500 3.5173
0.0540 7600 3.3454
0.0548 7700 3.2329
0.0555 7800 3.1247
0.0562 7900 3.0590
0.0569 8000 3.0476
0.0576 8100 3.0130
0.0583 8200 3.0243
0.0590 8300 3.0732
0.0597 8400 2.9507
0.0604 8500 2.9262
0.0612 8600 2.8962
0.0619 8700 2.8804
0.0626 8800 2.9169
0.0633 8900 2.8266
0.0640 9000 2.7225
0.0647 9100 2.7237
0.0654 9200 2.7453
0.0661 9300 2.6963
0.0668 9400 2.6756
0.0676 9500 2.6574
0.0683 9600 2.6483
0.0690 9700 2.6565
0.0697 9800 2.7075
0.0704 9900 2.6720
0.0711 10000 2.5925
0.0718 10100 2.6393
0.0725 10200 2.6691
0.0732 10300 2.6803
0.0740 10400 2.5933
0.0747 10500 2.5682
0.0754 10600 2.5590
0.0761 10700 2.5377
0.0768 10800 2.5846
0.0775 10900 2.5702
0.0782 11000 2.5857
0.0789 11100 2.6180
0.0796 11200 2.5984
0.0804 11300 2.5591
0.0811 11400 2.5416
0.0818 11500 2.4725
0.0825 11600 2.5454
0.0832 11700 2.5108
0.0839 11800 2.5175
0.0846 11900 2.4648
0.0853 12000 2.4761
0.0860 12100 2.5200
0.0868 12200 2.4817
0.0875 12300 2.4515
0.0882 12400 2.4157
0.0889 12500 2.4040
0.0896 12600 2.4159
0.0903 12700 2.4135
0.0910 12800 2.3847
0.0917 12900 2.3521
0.0924 13000 2.3916
0.0932 13100 2.4054
0.0939 13200 2.3786
0.0946 13300 2.3229
0.0953 13400 2.3404
0.0960 13500 2.3150
0.0967 13600 2.3440
0.0974 13700 2.3156
0.0981 13800 2.2733
0.0988 13900 2.2865
0.0996 14000 2.2950
0.1003 14100 2.4035
0.1010 14200 2.2394
0.1017 14300 2.2683
0.1024 14400 2.3263
0.1031 14500 2.3141
0.1038 14600 2.3244
0.1045 14700 2.2082
0.1052 14800 2.1171
0.1060 14900 2.1066
0.1067 15000 2.1475
0.1074 15100 2.0911
0.1081 15200 2.0580
0.1088 15300 2.0421
0.1095 15400 2.0824
0.1102 15500 2.0614
0.1109 15600 2.1394
0.1116 15700 2.0768
0.1124 15800 2.0223
0.1131 15900 1.9565
0.1138 16000 1.9910
0.1145 16100 2.0546
0.1152 16200 2.1003
0.1159 16300 1.9791
0.1166 16400 1.9593
0.1173 16500 1.9520
0.1180 16600 1.9481
0.1188 16700 1.9796
0.1195 16800 2.0056
0.1202 16900 2.1229
0.1209 17000 2.6600
0.1216 17100 2.4135
0.1223 17200 2.2025
0.1230 17300 2.1887
0.1237 17400 2.1336
0.1244 17500 2.2373
0.1252 17600 2.1003
0.1259 17700 2.0335
0.1266 17800 2.1719
0.1273 17900 2.8578
0.1280 18000 2.5652
0.1287 18100 3.3793
0.1294 18200 3.4910
0.1301 18300 3.4708
0.1308 18400 3.8807
0.1316 18500 4.1589
0.1323 18600 4.1589
0.1330 18700 4.1589
0.1337 18800 4.1589
0.1344 18900 4.1589
0.1351 19000 4.1574
0.1358 19100 4.1192
0.1365 19200 4.0591
0.1372 19300 3.9932
0.1380 19400 3.8817
0.1387 19500 3.7952
0.1394 19600 3.8317
0.1401 19700 3.8522
0.1408 19800 3.8386
0.1415 19900 3.6811
0.1422 20000 3.6110
0.1429 20100 3.6770
0.1436 20200 3.8502
0.1444 20300 3.8473
0.1451 20400 3.7811
0.1458 20500 3.7746
0.1465 20600 3.8177
0.1472 20700 3.9349
0.1479 20800 3.9037
0.1486 20900 3.9405
0.1493 21000 3.8764
0.1500 21100 3.8224
0.1508 21200 3.7619
0.1515 21300 3.7261
0.1522 21400 3.7147
0.1529 21500 3.6446
0.1536 21600 3.6096
0.1543 21700 3.6247
0.1550 21800 3.5986
0.1557 21900 3.5659
0.1564 22000 3.5389

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 5.3.0
  • Transformers: 5.3.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.3
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}