SentenceTransformer based on neuralmind/bert-large-portuguese-cased

This is a sentence-transformers model finetuned from neuralmind/bert-large-portuguese-cased. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: neuralmind/bert-large-portuguese-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
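The Pooling module above builds the sentence embedding by mean-pooling the token embeddings (pooling_mode_mean_tokens), averaging only over non-padding positions. A minimal NumPy sketch of that step, using made-up token embeddings and an attention mask:

```python
import numpy as np

# Hypothetical inputs: 2 sentences, 4 token positions, 1024-dim token embeddings.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 1024))
# Attention mask: 1 for real tokens, 0 for padding.
attention_mask = np.array([[1, 1, 1, 0],
                           [1, 1, 0, 0]])

# Mean pooling: sum the unmasked token embeddings, divide by the token count.
mask = attention_mask[:, :, None]               # (2, 4, 1), broadcast over dims
summed = (token_embeddings * mask).sum(axis=1)  # (2, 1024)
counts = mask.sum(axis=1)                       # (2, 1) real tokens per sentence
sentence_embeddings = summed / counts

print(sentence_embeddings.shape)  # (2, 1024)
```

Padding tokens are excluded from the average, so sentences of different lengths produce comparable embeddings.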

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("iara-project/BERTimbau-large-simcse-pt-ckpt-32000")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 1.0000],
#         [1.0000, 1.0000, 1.0000],
#         [1.0000, 1.0000, 1.0000]])
# Note: at this checkpoint every pairwise similarity is ~1.0, even for the
# unrelated third sentence, which suggests the embeddings have collapsed
# (compare the loss plateau in the training logs).
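The similarity call computes pairwise cosine similarity between the embedding rows. A self-contained NumPy equivalent, with toy 3-dimensional vectors standing in for the 1024-dimensional model output (the function name is illustrative, not the library's API):

```python
import numpy as np

def cos_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy embeddings: rows 0 and 1 overlap, row 2 is orthogonal to both.
embeddings = np.array([[1.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
similarities = cos_sim(embeddings, embeddings)
print(similarities.round(4))
# Diagonal is 1.0 (self-similarity); rows 0 and 1 score ~0.7071;
# row 2 scores 0.0 against the others.
```

A healthy embedding model produces a matrix like this one: high scores for related pairs, low scores for unrelated ones.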

Training Details

Training Dataset

Unnamed Dataset

  • Columns: sentence1 and sentence2
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
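MultipleNegativesRankingLoss treats each sentence2 as the positive for its paired sentence1 and every other in-batch sentence2 as a negative: the cosine-similarity matrix is multiplied by the scale (20.0 here) and cross-entropy is applied with the diagonal as the target. A minimal NumPy sketch of that computation (the function name is illustrative, not the library's API):

```python
import numpy as np

def mnr_loss(query_emb, doc_emb, scale=20.0):
    """In-batch-negatives ranking loss: row i of query_emb should rank
    row i of doc_emb above every other row of doc_emb."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = scale * (q @ d.T)                    # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (the true pair) as the label.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
queries = rng.normal(size=(64, 128))
print(mnr_loss(queries, queries))                 # matched pairs: loss near 0

collapsed = np.ones((64, 128))                    # all embeddings identical
print(round(mnr_loss(collapsed, collapsed), 4))   # 4.1589, i.e. ln(64)
```

Note that with the batch size of 64 used here, a fully collapsed model (all pairwise similarities 1.0) scores exactly ln(64) ≈ 4.1589, which is the value at which the training loss below plateaus for a long stretch.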
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • max_steps: 140625
  • warmup_steps: 0.05
  • optim: adamw_torch
  • weight_decay: 0.01
  • fp16: True
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • data_seed: 42
  • accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • remove_unused_columns: False
  • ddp_find_unused_parameters: False
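For reproduction, these non-default values map onto SentenceTransformerTrainingArguments roughly as follows. This is a sketch, not the original training script: output_dir and the dataset are placeholders, and the logged warmup_steps of 0.05 is expressed as warmup_ratio on the assumption it was intended as a ratio (warmup_steps must be an integer).

```python
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

model = SentenceTransformer("neuralmind/bert-large-portuguese-cased")
loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    per_device_train_batch_size=64,
    max_steps=140625,
    warmup_ratio=0.05,  # logged as warmup_steps: 0.05, likely meant as a ratio
    optim="adamw_torch",
    weight_decay=0.01,
    fp16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    data_seed=42,
    remove_unused_columns=False,
    ddp_find_unused_parameters=False,
)

# trainer = SentenceTransformerTrainer(
#     model=model, args=args, loss=loss,
#     train_dataset=train_dataset,  # columns: sentence1, sentence2
# )
# trainer.train()
```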

All Hyperparameters

  • per_device_train_batch_size: 64
  • num_train_epochs: 3.0
  • max_steps: 140625
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: no
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: 42
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: False
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0007 100 1.5420
0.0014 200 0.3312
0.0021 300 0.1085
0.0028 400 0.0245
0.0036 500 0.0090
0.0043 600 0.0028
0.0050 700 0.0041
0.0057 800 0.0022
0.0064 900 0.0019
0.0071 1000 0.0021
0.0078 1100 0.0032
0.0085 1200 0.0018
0.0092 1300 0.0012
0.0100 1400 0.0008
0.0107 1500 0.0008
0.0114 1600 0.0012
0.0121 1700 0.0006
0.0128 1800 0.0012
0.0135 1900 0.0007
0.0142 2000 0.0008
0.0149 2100 0.0006
0.0156 2200 0.0008
0.0164 2300 0.0006
0.0171 2400 0.0012
0.0178 2500 0.0008
0.0185 2600 0.0004
0.0192 2700 0.0007
0.0199 2800 0.0286
0.0206 2900 0.0496
0.0213 3000 0.0741
0.0220 3100 0.0006
0.0228 3200 0.0006
0.0235 3300 0.0006
0.0242 3400 0.0005
0.0249 3500 0.0008
0.0256 3600 0.0005
0.0263 3700 0.0002
0.0270 3800 0.0002
0.0277 3900 0.0003
0.0284 4000 0.0016
0.0292 4100 0.0005
0.0299 4200 0.0005
0.0306 4300 0.0002
0.0313 4400 0.0003
0.0320 4500 0.0006
0.0327 4600 0.0003
0.0334 4700 0.0007
0.0341 4800 0.0006
0.0348 4900 0.0009
0.0356 5000 0.0007
0.0363 5100 0.0003
0.0370 5200 0.0006
0.0377 5300 0.0005
0.0384 5400 0.0004
0.0391 5500 0.0007
0.0398 5600 2.6074
0.0405 5700 4.1589
0.0412 5800 4.1589
0.0420 5900 4.1589
0.0427 6000 4.1589
0.0434 6100 4.1735
0.0441 6200 4.1589
0.0448 6300 4.1589
0.0455 6400 4.1589
0.0462 6500 4.1589
0.0469 6600 4.1589
0.0476 6700 4.1589
0.0484 6800 4.1599
0.0491 6900 4.1589
0.0498 7000 4.1589
0.0505 7100 4.1589
0.0512 7200 4.1589
0.0519 7300 4.1589
0.0526 7400 4.1589
0.0533 7500 4.1589
0.0540 7600 4.1589
0.0548 7700 4.1589
0.0555 7800 4.1589
0.0562 7900 4.1589
0.0569 8000 4.1589
0.0576 8100 4.1589
0.0583 8200 4.1589
0.0590 8300 4.1589
0.0597 8400 4.1589
0.0604 8500 4.1589
0.0612 8600 4.1589
0.0619 8700 4.1589
0.0626 8800 4.1589
0.0633 8900 4.1589
0.0640 9000 4.1589
0.0647 9100 4.1589
0.0654 9200 4.1589
0.0661 9300 4.1589
0.0668 9400 4.1589
0.0676 9500 4.1589
0.0683 9600 4.1589
0.0690 9700 4.1589
0.0697 9800 4.1589
0.0704 9900 4.1589
0.0711 10000 4.1589
0.0718 10100 4.1589
0.0725 10200 4.1589
0.0732 10300 4.1589
0.0740 10400 4.1590
0.0747 10500 4.1589
0.0754 10600 4.1589
0.0761 10700 4.1589
0.0768 10800 4.1589
0.0775 10900 4.1589
0.0782 11000 4.1589
0.0789 11100 4.1589
0.0796 11200 4.1589
0.0804 11300 4.1589
0.0811 11400 4.1589
0.0818 11500 4.1589
0.0825 11600 4.1589
0.0832 11700 4.1589
0.0839 11800 4.1589
0.0846 11900 4.1589
0.0853 12000 4.1589
0.0860 12100 4.1589
0.0868 12200 4.1589
0.0875 12300 4.1589
0.0882 12400 4.1589
0.0889 12500 4.1589
0.0896 12600 4.1589
0.0903 12700 4.1590
0.0910 12800 4.1589
0.0917 12900 4.1589
0.0924 13000 4.1589
0.0932 13100 4.1589
0.0939 13200 4.1589
0.0946 13300 4.1589
0.0953 13400 4.1589
0.0960 13500 4.1589
0.0967 13600 4.1589
0.0974 13700 4.1589
0.0981 13800 4.1589
0.0988 13900 4.1589
0.0996 14000 4.1589
0.1003 14100 4.1589
0.1010 14200 4.1589
0.1017 14300 4.1589
0.1024 14400 4.1589
0.1031 14500 4.1589
0.1038 14600 4.1559
0.1045 14700 3.9817
0.1052 14800 3.7477
0.1060 14900 3.5543
0.1067 15000 3.4032
0.1074 15100 3.3075
0.1081 15200 3.3079
0.1088 15300 3.2541
0.1095 15400 3.1995
0.1102 15500 3.1708
0.1109 15600 3.1008
0.1116 15700 3.0541
0.1124 15800 3.0049
0.1131 15900 3.0122
0.1138 16000 2.9579
0.1145 16100 2.9453
0.1152 16200 2.9591
0.1159 16300 2.9199
0.1166 16400 2.9170
0.1173 16500 2.9973
0.1180 16600 2.9514
0.1188 16700 2.8438
0.1195 16800 2.8325
0.1202 16900 2.8506
0.1209 17000 2.8035
0.1216 17100 2.8410
0.1223 17200 2.8628
0.1230 17300 2.8425
0.1237 17400 2.8264
0.1244 17500 2.8800
0.1252 17600 2.8725
0.1259 17700 2.8837
0.1266 17800 2.7461
0.1273 17900 2.7399
0.1280 18000 2.6857
0.1287 18100 2.7086
0.1294 18200 2.7013
0.1301 18300 2.7102
0.1308 18400 2.6981
0.1316 18500 2.6890
0.1323 18600 2.6908
0.1330 18700 2.6565
0.1337 18800 2.6742
0.1344 18900 2.6655
0.1351 19000 2.6397
0.1358 19100 2.6440
0.1365 19200 2.6420
0.1372 19300 2.6494
0.1380 19400 2.6536
0.1387 19500 2.7161
0.1394 19600 2.6934
0.1401 19700 2.6851
0.1408 19800 2.6709
0.1415 19900 2.6001
0.1422 20000 2.5986
0.1429 20100 2.6043
0.1436 20200 2.6304
0.1444 20300 2.6342
0.1451 20400 2.6488
0.1458 20500 2.6486
0.1465 20600 2.6653
0.1472 20700 2.6159
0.1479 20800 2.5898
0.1486 20900 2.5710
0.1493 21000 2.5618
0.1500 21100 2.5528
0.1508 21200 2.5666
0.1515 21300 2.5606
0.1522 21400 2.5554
0.1529 21500 2.5547
0.1536 21600 2.5721
0.1543 21700 2.5669
0.1550 21800 2.5349
0.1557 21900 2.5704
0.1564 22000 2.5753
0.1572 22100 2.6133
0.1579 22200 2.6561
0.1586 22300 2.6255
0.1593 22400 2.5821
0.1600 22500 2.5775
0.1607 22600 2.5793
0.1614 22700 2.6094
0.1621 22800 2.5806
0.1628 22900 2.5689
0.1636 23000 2.5078
0.1643 23100 2.5385
0.1650 23200 2.6389
0.1657 23300 2.5860
0.1664 23400 2.6136
0.1671 23500 2.5865
0.1678 23600 2.5382
0.1685 23700 2.5487
0.1692 23800 2.5074
0.1700 23900 2.5503
0.1707 24000 2.5343
0.1714 24100 2.5583
0.1721 24200 2.5519
0.1728 24300 2.5200
0.1735 24400 2.5175
0.1742 24500 2.5105
0.1749 24600 2.5066
0.1756 24700 2.4882
0.1764 24800 2.4950
0.1771 24900 2.5010
0.1778 25000 2.5041
0.1785 25100 2.5198
0.1792 25200 2.4849
0.1799 25300 2.4970
0.1806 25400 2.4799
0.1813 25500 2.4916
0.1820 25600 2.4756
0.1828 25700 2.4758
0.1835 25800 2.4501
0.1842 25900 2.4944
0.1849 26000 2.5023
0.1856 26100 2.5249
0.1863 26200 2.5321
0.1870 26300 2.4801
0.1877 26400 2.4995
0.1884 26500 2.4712
0.1892 26600 2.4323
0.1899 26700 2.4498
0.1906 26800 2.4945
0.1913 26900 2.4686
0.1920 27000 2.4653
0.1927 27100 2.4965
0.1934 27200 2.5110
0.1941 27300 2.4948
0.1948 27400 2.4674
0.1956 27500 2.4425
0.1963 27600 2.4405
0.1970 27700 2.4515
0.1977 27800 2.4596
0.1984 27900 2.4545
0.1991 28000 2.5055
0.1998 28100 2.5854
0.2005 28200 2.5503
0.2012 28300 2.4410
0.2020 28400 2.4647
0.2027 28500 2.4935
0.2034 28600 2.4540
0.2041 28700 2.5213
0.2048 28800 2.5248
0.2055 28900 2.5340
0.2062 29000 2.5297
0.2069 29100 2.5805
0.2076 29200 2.5471
0.2084 29300 2.5704
0.2091 29400 2.4198
0.2098 29500 2.4242
0.2105 29600 2.3726
0.2112 29700 2.3942
0.2119 29800 2.3843
0.2126 29900 2.4082
0.2133 30000 2.4050
0.2140 30100 2.4028
0.2148 30200 2.3963
0.2155 30300 2.3762
0.2162 30400 2.4019
0.2169 30500 2.3814
0.2176 30600 2.3791
0.2183 30700 2.3902
0.2190 30800 2.3966
0.2197 30900 2.3916
0.2204 31000 2.4011
0.2212 31100 2.4296
0.2219 31200 2.4453
0.2226 31300 2.4584
0.2233 31400 2.4195
0.2240 31500 2.3436
0.2247 31600 2.3490
0.2254 31700 2.3700
0.2261 31800 2.3618
0.2268 31900 2.3903
0.2276 32000 2.4286

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 5.3.0
  • Transformers: 5.3.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}