SentenceTransformer based on lorenzocc/NeoBERTugues

This is a sentence-transformers model finetuned from lorenzocc/NeoBERTugues. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: lorenzocc/NeoBERTugues
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
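The Pooling module above uses mean pooling (pooling_mode_mean_tokens: True): the transformer's token embeddings are averaged into a single 768-dimensional sentence vector, with padding tokens excluded via the attention mask. A minimal NumPy sketch of that step, with illustrative shapes and random values standing in for real activations:

```python
import numpy as np

# Illustrative token embeddings: batch of 2 sequences, 4 tokens, hidden size 768
token_embeddings = np.random.rand(2, 4, 768).astype(np.float32)
# Attention mask: the second sequence ends with one padding token
attention_mask = np.array([[1, 1, 1, 1],
                           [1, 1, 1, 0]], dtype=np.float32)

# Masked mean pooling: average only over real (non-padding) tokens
mask = attention_mask[:, :, None]               # (2, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)  # (2, 768)
counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
sentence_embeddings = summed / counts           # (2, 768)

print(sentence_embeddings.shape)  # (2, 768)
```

Because the mask zeroes out padding positions before the sum, the second sentence's embedding is the mean of its three real tokens only.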

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("iara-project/NeoBERTugues-simcse-pt-ckpt-6000")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.0221, 0.3829],
#         [0.0221, 1.0000, 0.3356],
#         [0.3829, 0.3356, 1.0000]])
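model.similarity applies the cosine similarity function listed above. The same matrix can be reproduced from the raw embeddings by L2-normalizing each row and taking a dot product; a NumPy sketch with placeholder embeddings in place of model.encode(...) output:

```python
import numpy as np

# Placeholder embeddings standing in for model.encode(sentences) output
embeddings = np.random.rand(3, 768).astype(np.float32)

# Cosine similarity = dot product of L2-normalized vectors
normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarities = normalized @ normalized.T  # (3, 3), symmetric, diagonal ~1.0

print(similarities.shape)  # (3, 3)
```

The diagonal is 1.0 (each embedding compared with itself), matching the tensor printed above.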

Training Details

Training Dataset

Unnamed Dataset

  • Columns: sentence1 and sentence2
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
    
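MultipleNegativesRankingLoss treats each (sentence1, sentence2) pair as a positive and uses the other in-batch sentence2 entries as negatives: cosine similarities are scaled (here by 20.0, per the "scale" parameter above) and scored with cross-entropy against the diagonal. A NumPy sketch of that objective under those assumptions; mnrl_loss is an illustrative name, not the library API:

```python
import numpy as np

def mnrl_loss(emb_a, emb_b, scale=20.0):
    """In-batch negatives loss: for anchor i, emb_b[i] is the positive
    and every other emb_b[j] in the batch is a negative."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    scores = scale * (a @ b.T)  # (batch, batch) scaled cosine similarities
    # Cross-entropy with the diagonal (matching pair) as the correct class
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(42)
anchors = rng.standard_normal((4, 8))
# Loss is near zero when positives equal anchors, higher for unrelated pairs
print(mnrl_loss(anchors, anchors.copy()))
print(mnrl_loss(anchors, rng.standard_normal((4, 8))))
```

Larger batches give more negatives per anchor, which is why this loss tends to benefit from the large effective batch size used here (64 per device with 2 accumulation steps).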

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • max_steps: 150000
  • warmup_steps: 0.05
  • optim: adamw_torch
  • weight_decay: 0.01
  • gradient_accumulation_steps: 2
  • fp16: True
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • data_seed: 42
  • accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • remove_unused_columns: False
  • ddp_find_unused_parameters: False

All Hyperparameters

  • per_device_train_batch_size: 64
  • num_train_epochs: 3.0
  • max_steps: 150000
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 2
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: no
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: 42
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': False, 'even_batches': False, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: False
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0007 100 1.3221
0.0013 200 0.2384
0.0020 300 0.0310
0.0027 400 0.0069
0.0033 500 0.0019
0.0040 600 0.0018
0.0047 700 0.0009
0.0053 800 0.0005
0.0060 900 0.0007
0.0067 1000 0.0004
0.0073 1100 0.0003
0.0080 1200 0.0003
0.0087 1300 0.0006
0.0093 1400 0.0151
0.0100 1500 0.0594
0.0107 1600 0.0003
0.0113 1700 0.0002
0.0120 1800 0.0003
0.0127 1900 0.0003
0.0133 2000 0.0003
0.0140 2100 0.0002
0.0147 2200 0.0002
0.0153 2300 0.0003
0.0160 2400 0.0005
0.0167 2500 0.0005
0.0173 2600 0.0002
0.0180 2700 0.0000
0.0187 2800 0.0004
0.0193 2900 0.0002
0.0200 3000 0.0001
0.0207 3100 0.0004
0.0213 3200 0.0001
0.0220 3300 0.0002
0.0227 3400 0.0003
0.0233 3500 0.0005
0.0240 3600 0.0004
0.0247 3700 0.0004
0.0253 3800 0.0014
0.0260 3900 0.0002
0.0267 4000 0.0001
0.0273 4100 0.0006
0.0280 4200 0.0001
0.0287 4300 0.0211
0.0293 4400 0.0523
0.0300 4500 0.0005
0.0307 4600 0.0013
0.0313 4700 0.0006
0.0320 4800 0.0006
0.0327 4900 0.0006
0.0333 5000 0.0010
0.0340 5100 0.0015
0.0347 5200 0.0056
0.0353 5300 0.0006
0.0360 5400 0.0002
0.0367 5500 0.0002
0.0373 5600 0.0064
0.0380 5700 0.0005
0.0387 5800 0.0002
0.0393 5900 0.0007
0.0400 6000 0.0077

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 5.3.0
  • Transformers: 5.3.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}