SentenceTransformer based on intfloat/multilingual-e5-large

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: intfloat/multilingual-e5-large
Maximum Sequence Length: 512 tokens
Output Dimensionality: 1024 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
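The Pooling module above is configured with pooling_mode_mean_tokens: True. The sentence embedding is therefore the average of the token embeddings produced by the XLMRobertaModel, with padding positions ignored. The following is a minimal plain-Python sketch of that masked mean, for illustration only. The function name mean_pool is hypothetical, and the actual module operates on PyTorch tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of per-token vectors (seq_len x dim)
    attention_mask:   list of 0/1 flags (seq_len); 0 marks padding
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding tokens entirely
            for i, value in enumerate(vec):
                summed[i] += value
            count += 1
    # Guard against division by zero for an all-padding input
    count = max(count, 1)
    return [s / count for s in summed]


# Toy example: 3 tokens with 2-dim embeddings; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padded third token does not affect the result. The real model applies the same operation to 1024-dimensional token vectors.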

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("iara-project/e5-large-simcse-sts-pt-ckpt-22000")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 1.0000],
#         [1.0000, 1.0000, 1.0000],
#         [1.0000, 1.0000, 1.0000]])
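With cosine similarity as the similarity function, model.similarity returns a pairwise matrix: entry (i, j) is the cosine of the angle between embedding i and embedding j. The following sketch reproduces that computation on toy 2-dimensional vectors in plain Python. The helper name cosine_similarity_matrix is hypothetical, and the library itself works on tensors.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return the len(a) x len(b) matrix of cosine similarities."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


toy = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
for row in cosine_similarity_matrix(toy, toy):
    print([round(x, 4) for x in row])
# [1.0, 1.0, 0.0]
# [1.0, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```

Parallel vectors score 1.0 regardless of length, and orthogonal vectors score 0.0. In the output printed for this checkpoint, every entry is 1.0000. To four decimal places, this means that the model maps all three sentences, including the unrelated third one, to the same direction.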

Training Details

Training Dataset

Unnamed Dataset

Columns: sentence1 and sentence2

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
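MultipleNegativesRankingLoss treats each (sentence1, sentence2) row as an anchor and its positive. It uses the other sentence2 entries in the same batch as negatives. With the parameters above, the logits are cosine similarities multiplied by scale = 20.0, and only the query_to_doc direction is scored. The following is a plain-Python illustration of that cross-entropy over in-batch candidates. It is a sketch under these assumptions, not the library's implementation, and the helper names cos_sim and mnrl_loss are hypothetical.

```python
import math


def cos_sim(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch-negatives loss, query_to_doc direction only.

    anchors[i] is paired with positives[i]; every other positives[j]
    in the batch acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp (subtract the max logit first)
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


aligned = mnrl_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])  # close to 0
collapsed = mnrl_loss([[1, 0]] * 4, [[1, 0]] * 4)        # equals log(4)
print(aligned, collapsed)
```

Well-separated pairs give a loss near 0. A batch in which every embedding is identical gives exactly log(batch size), because all logits in each row are equal and the softmax becomes uniform.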

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 64
max_steps: 140625
warmup_steps: 0.05
optim: adamw_torch
weight_decay: 0.01
gradient_accumulation_steps: 2
fp16: True
gradient_checkpointing: True
gradient_checkpointing_kwargs: {'use_reentrant': False}
data_seed: 42
accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
remove_unused_columns: False
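The batch-related settings above can be combined into a quick arithmetic check. The sketch assumes training ran on a single device (NUM_DEVICES = 1), which the card does not state. It also assumes that, because gather_across_devices is false and gradient accumulation does not enlarge the in-batch pool, each anchor is scored against the 64 positives of its own micro-batch. The constant names are hypothetical.

```python
PER_DEVICE_BATCH = 64
GRAD_ACCUM = 2
NUM_DEVICES = 1  # assumption: not stated in the card
MAX_STEPS = 140_625

pairs_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
pairs_total = pairs_per_step * MAX_STEPS
negatives_per_anchor = PER_DEVICE_BATCH - 1  # in-batch negatives per micro-batch

print(pairs_per_step, pairs_total, negatives_per_anchor, round(22_000 / MAX_STEPS, 4))
# 128 18000000 63 0.1564
```

The last value, 22000 / 140625 rounded to four places, matches the Epoch column logged at step 22000.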

All Hyperparameters


per_device_train_batch_size: 64
num_train_epochs: 3.0
max_steps: 140625
learning_rate: 5e-05
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0.05
optim: adamw_torch
optim_args: None
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 2
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: False
fp16: True
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: True
gradient_checkpointing_kwargs: {'use_reentrant': False}
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: trackio
eval_strategy: no
per_device_eval_batch_size: 8
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: False
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 42
data_seed: 42
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: True
dataloader_num_workers: 0
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: False
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
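The run uses lr_scheduler_type: linear with learning_rate: 5e-05 and warmup_steps: 0.05, while warmup_ratio is None. The following sketch assumes that the fractional warmup_steps value is interpreted as a fraction of max_steps and rounded up. This is an assumption, not something the card states. After warmup, the rate decays linearly to zero at max_steps. The function name linear_warmup_lr is hypothetical.

```python
import math

LEARNING_RATE = 5e-05
MAX_STEPS = 140_625
WARMUP_FRACTION = 0.05  # assumed meaning of "warmup_steps: 0.05"


def linear_warmup_lr(step, peak=LEARNING_RATE, total=MAX_STEPS, warmup_fraction=WARMUP_FRACTION):
    warmup_steps = math.ceil(total * warmup_fraction)  # assumed rounding: 7032
    if step < warmup_steps:
        # Linear ramp from 0 to the peak rate
        return peak * step / max(1, warmup_steps)
    # Linear decay from the peak rate to 0 at the final step
    return peak * max(0.0, (total - step) / max(1, total - warmup_steps))


for step in (0, 3_516, 7_032, 22_000, 140_625):
    print(step, f"{linear_warmup_lr(step):.3e}")
```

Under this assumption, the peak rate is reached at step 7,032, shortly before the loss jump logged at step 7,300. The card does not say whether the two are related.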

Training Logs


Epoch	Step	Training Loss
0.0007	100	0.7908
0.0014	200	0.0416
0.0021	300	0.0083
0.0028	400	0.0032
0.0036	500	0.0026
0.0043	600	0.0029
0.0050	700	0.0019
0.0057	800	0.0010
0.0064	900	0.0007
0.0071	1000	0.0006
0.0078	1100	0.0006
0.0085	1200	0.0007
0.0092	1300	0.0006
0.0100	1400	0.0148
0.0107	1500	0.0616
0.0114	1600	0.0006
0.0121	1700	0.0005
0.0128	1800	0.0004
0.0135	1900	0.0002
0.0142	2000	0.0008
0.0149	2100	0.0006
0.0156	2200	0.0004
0.0164	2300	0.0004
0.0171	2400	0.0005
0.0178	2500	0.0007
0.0185	2600	0.0004
0.0192	2700	0.0004
0.0199	2800	0.0009
0.0206	2900	0.0008
0.0213	3000	0.0007
0.0220	3100	0.0007
0.0228	3200	0.0003
0.0235	3300	0.0006
0.0242	3400	0.0003
0.0249	3500	0.0007
0.0256	3600	0.0005
0.0263	3700	0.0004
0.0270	3800	0.0003
0.0277	3900	0.0004
0.0284	4000	0.0002
0.0292	4100	0.0004
0.0299	4200	0.0004
0.0306	4300	0.0222
0.0313	4400	0.0602
0.0320	4500	0.0015
0.0327	4600	0.0006
0.0334	4700	0.0012
0.0341	4800	0.0005
0.0348	4900	0.0006
0.0356	5000	0.0020
0.0363	5100	0.0011
0.0370	5200	0.0052
0.0377	5300	0.0056
0.0384	5400	0.0012
0.0391	5500	0.0023
0.0398	5600	0.0008
0.0405	5700	0.0022
0.0412	5800	0.0028
0.0420	5900	0.0013
0.0427	6000	0.0006
0.0434	6100	0.0010
0.0441	6200	0.0018
0.0448	6300	0.0012
0.0455	6400	0.0850
0.0462	6500	0.2135
0.0469	6600	0.0009
0.0476	6700	0.0006
0.0484	6800	0.0008
0.0491	6900	0.0007
0.0498	7000	0.0007
0.0505	7100	0.0023
0.0512	7200	0.0324
0.0519	7300	3.5332
0.0526	7400	3.8028
0.0533	7500	3.5173
0.0540	7600	3.3454
0.0548	7700	3.2329
0.0555	7800	3.1247
0.0562	7900	3.0590
0.0569	8000	3.0476
0.0576	8100	3.0130
0.0583	8200	3.0243
0.0590	8300	3.0732
0.0597	8400	2.9507
0.0604	8500	2.9262
0.0612	8600	2.8962
0.0619	8700	2.8804
0.0626	8800	2.9169
0.0633	8900	2.8266
0.0640	9000	2.7225
0.0647	9100	2.7237
0.0654	9200	2.7453
0.0661	9300	2.6963
0.0668	9400	2.6756
0.0676	9500	2.6574
0.0683	9600	2.6483
0.0690	9700	2.6565
0.0697	9800	2.7075
0.0704	9900	2.6720
0.0711	10000	2.5925
0.0718	10100	2.6393
0.0725	10200	2.6691
0.0732	10300	2.6803
0.0740	10400	2.5933
0.0747	10500	2.5682
0.0754	10600	2.5590
0.0761	10700	2.5377
0.0768	10800	2.5846
0.0775	10900	2.5702
0.0782	11000	2.5857
0.0789	11100	2.6180
0.0796	11200	2.5984
0.0804	11300	2.5591
0.0811	11400	2.5416
0.0818	11500	2.4725
0.0825	11600	2.5454
0.0832	11700	2.5108
0.0839	11800	2.5175
0.0846	11900	2.4648
0.0853	12000	2.4761
0.0860	12100	2.5200
0.0868	12200	2.4817
0.0875	12300	2.4515
0.0882	12400	2.4157
0.0889	12500	2.4040
0.0896	12600	2.4159
0.0903	12700	2.4135
0.0910	12800	2.3847
0.0917	12900	2.3521
0.0924	13000	2.3916
0.0932	13100	2.4054
0.0939	13200	2.3786
0.0946	13300	2.3229
0.0953	13400	2.3404
0.0960	13500	2.3150
0.0967	13600	2.3440
0.0974	13700	2.3156
0.0981	13800	2.2733
0.0988	13900	2.2865
0.0996	14000	2.2950
0.1003	14100	2.4035
0.1010	14200	2.2394
0.1017	14300	2.2683
0.1024	14400	2.3263
0.1031	14500	2.3141
0.1038	14600	2.3244
0.1045	14700	2.2082
0.1052	14800	2.1171
0.1060	14900	2.1066
0.1067	15000	2.1475
0.1074	15100	2.0911
0.1081	15200	2.0580
0.1088	15300	2.0421
0.1095	15400	2.0824
0.1102	15500	2.0614
0.1109	15600	2.1394
0.1116	15700	2.0768
0.1124	15800	2.0223
0.1131	15900	1.9565
0.1138	16000	1.9910
0.1145	16100	2.0546
0.1152	16200	2.1003
0.1159	16300	1.9791
0.1166	16400	1.9593
0.1173	16500	1.9520
0.1180	16600	1.9481
0.1188	16700	1.9796
0.1195	16800	2.0056
0.1202	16900	2.1229
0.1209	17000	2.6600
0.1216	17100	2.4135
0.1223	17200	2.2025
0.1230	17300	2.1887
0.1237	17400	2.1336
0.1244	17500	2.2373
0.1252	17600	2.1003
0.1259	17700	2.0335
0.1266	17800	2.1719
0.1273	17900	2.8578
0.1280	18000	2.5652
0.1287	18100	3.3793
0.1294	18200	3.4910
0.1301	18300	3.4708
0.1308	18400	3.8807
0.1316	18500	4.1589
0.1323	18600	4.1589
0.1330	18700	4.1589
0.1337	18800	4.1589
0.1344	18900	4.1589
0.1351	19000	4.1574
0.1358	19100	4.1192
0.1365	19200	4.0591
0.1372	19300	3.9932
0.1380	19400	3.8817
0.1387	19500	3.7952
0.1394	19600	3.8317
0.1401	19700	3.8522
0.1408	19800	3.8386
0.1415	19900	3.6811
0.1422	20000	3.6110
0.1429	20100	3.6770
0.1436	20200	3.8502
0.1444	20300	3.8473
0.1451	20400	3.7811
0.1458	20500	3.7746
0.1465	20600	3.8177
0.1472	20700	3.9349
0.1479	20800	3.9037
0.1486	20900	3.9405
0.1493	21000	3.8764
0.1500	21100	3.8224
0.1508	21200	3.7619
0.1515	21300	3.7261
0.1522	21400	3.7147
0.1529	21500	3.6446
0.1536	21600	3.6096
0.1543	21700	3.6247
0.1550	21800	3.5986
0.1557	21900	3.5659
0.1564	22000	3.5389
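The loss value 4.1589 logged from step 18500 to step 18900 equals ln 64 to four decimal places. Assume, as the configuration suggests, that each anchor is scored against the 64 positives of its per-device batch with logits s_j = 20 · cos_sim. If every logit in a row takes the same value s, for example because all embeddings coincide, the softmax is uniform and the loss becomes:

```latex
\mathcal{L} = -\log \frac{e^{s}}{\sum_{j=1}^{64} e^{s}} = -\log \frac{1}{64} = \ln 64 \approx 4.1589
```

This reading is consistent with the all-1.0000 similarity matrix in the usage example. However, the card itself does not state that the embeddings collapsed.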

Framework Versions

Python: 3.11.12
Sentence Transformers: 5.3.0
Transformers: 5.3.0
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.8.3
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}