SentenceTransformer based on thenlper/gte-large

This is a sentence-transformers model fine-tuned from thenlper/gte-large. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
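
The Pooling and Normalize modules above can be illustrated without loading the model. The following is a minimal pure-Python sketch of mean pooling over non-padding tokens followed by L2 normalization; the function names and the toy 2-dimensional vectors are assumptions made for illustration, not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    return [value / max(count, 1) for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / max(norm, eps) for value in vector]

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the final vectors have unit length, the dot product of two embeddings equals their cosine similarity.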

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JFernandoGRE/gtelarge-colombian-elitenames")
# Run inference
sentences = [
    'ABEL VERA DURAN',
    'VERA JUDITH PADILLA DE MARTINEZ',
    ' JAIMEGARCIA FERNANDEZ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
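
For name matching, the similarity matrix is typically reduced to a list of candidate pairs. The sketch below is one way to do that, assuming the matrix has been converted to nested lists (for example with `model.similarity(embeddings, embeddings).tolist()`). The helper name `likely_matches`, the 0.8 threshold, and the scores are hypothetical; a suitable threshold would need to be tuned on labeled pairs.

```python
def likely_matches(names, similarity_matrix, threshold=0.8):
    """Return (name_a, name_b, score) for each distinct pair at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        # Upper triangle only: skips self-pairs and mirrored duplicates.
        for j in range(i + 1, len(names)):
            score = similarity_matrix[i][j]
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Made-up scores for illustration only; they are not outputs of this model.
names = ["NAME A", "NAME B", "NAME C"]
scores = [
    [1.0, 0.91, 0.12],
    [0.91, 1.0, 0.35],
    [0.12, 0.35, 1.0],
]
print(likely_matches(names, scores))
```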

Training Details

Training Dataset

Unnamed Dataset

  • Size: 18,200 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 4 tokens, mean 8.06 tokens, max 17 tokens
    • sentence2 (string): min 4 tokens, mean 8.57 tokens, max 15 tokens
    • label (int): 0: ~82.60%; 1: ~17.40%
  • Samples:
    sentence1 sentence2 label
    ABDEL CHAHIN GARCIA JAVIER ERNESTO GARCIA RESTREPOC 0
    ALEJANDRO FELIPE VALDERRAMA RUGELES FELIPE VERGARA WILLIAMS 0
    JUAN CARLOS RESTREPO JUAN CARLOS SALAZAR RESTREPO 0
  • Loss: OnlineContrastiveLoss
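
To make the loss more concrete, here is a simplified pure-Python sketch of the idea behind an online contrastive loss: within a batch, only the "hard" pairs contribute. The margin of 0.5 and the use of cosine distance (1 - cosine similarity) are assumptions, since this card does not list the loss parameters, and the library's handling of edge cases may differ from this sketch.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss over hard pairs only (label 1 = same person, 0 = different)."""
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    # Hard negatives: non-matching pairs closer than the farthest matching pair.
    hard_negatives = [d for d in negatives if not positives or d < max(positives)]
    # Hard positives: matching pairs farther apart than the closest non-matching pair.
    hard_positives = [d for d in positives if not negatives or d > min(negatives)]
    # Pull hard positives together; push hard negatives out to at least the margin.
    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss

# Toy batch: cosine distances for four pairs and their labels.
distances = [0.1, 0.6, 0.3, 0.9]
labels = [1, 1, 0, 0]
loss = online_contrastive_loss(distances, labels)
```

In this toy batch only the positive at distance 0.6 and the negative at distance 0.3 are hard, so the loss is 0.6² + (0.5 - 0.3)² = 0.40.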

Evaluation Dataset

Unnamed Dataset

  • Size: 4,551 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 4 tokens, mean 8.1 tokens, max 17 tokens
    • sentence2 (string): min 4 tokens, mean 8.62 tokens, max 15 tokens
    • label (int): 0: ~85.20%; 1: ~14.80%
  • Samples:
    sentence1 sentence2 label
    ABDON AUGUSTO DE JESUS DUQUE ESCOBAR JESUS EFREN TRIVINO DIAZ 0
    LUZ MERY ROJAS CARDENAS LUZ DARY ROA CARDENAS 0
    AGUSTÍN PELÁEZ GAVIRIA FRNACISCO JAVIER GAVIRIA ECHEVERRY 0
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 1e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.182
  • fp16: True
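
The step counts implied by these settings can be worked out directly. The sketch below assumes a single training device, no gradient accumulation, and that the warmup length is the rounded-up product of the ratio and the total step count; the `learning_rate` function is an illustrative reimplementation of a linear warmup and decay schedule, not the trainer's own code.

```python
import math

train_samples = 18_200
batch_size = 16
epochs = 5
warmup_ratio = 0.182
peak_lr = 1e-5

steps_per_epoch = math.ceil(train_samples / batch_size)  # 1138
total_steps = steps_per_epoch * epochs                   # 5690
warmup_steps = math.ceil(total_steps * warmup_ratio)     # 1036

def learning_rate(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
```

The figure of 1138 steps per epoch agrees with the training logs, where step 100 corresponds to epoch 0.0879 (100 / 1138 ≈ 0.0879).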

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.182
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0879 100 0.274 0.3006
0.1757 200 0.1968 0.3069
0.2636 300 0.1497 0.2908
0.3515 400 0.1252 0.2683
0.4394 500 0.1234 0.2578
0.5272 600 0.1185 0.2231
0.6151 700 0.1065 0.1549
0.7030 800 0.0884 0.1871
0.7909 900 0.0818 0.1573
0.8787 1000 0.107 0.1911
0.9666 1100 0.0878 0.1615
1.0545 1200 0.0823 0.1305
1.1424 1300 0.0831 0.1072
1.2302 1400 0.0673 0.1263
1.3181 1500 0.0644 0.1494
1.4060 1600 0.0629 0.1311
1.4938 1700 0.0653 0.1148
1.5817 1800 0.0748 0.0927
1.6696 1900 0.0707 0.1075
1.7575 2000 0.0771 0.1024
1.8453 2100 0.0703 0.1214
1.9332 2200 0.0724 0.1226
2.0211 2300 0.0498 0.1192
2.1090 2400 0.0404 0.1124
2.1968 2500 0.0481 0.1028
2.2847 2600 0.0418 0.1123
2.3726 2700 0.0526 0.0947
2.4605 2800 0.0404 0.0891
2.5483 2900 0.036 0.0930
2.6362 3000 0.046 0.0915
2.7241 3100 0.0406 0.0983
2.8120 3200 0.0394 0.0982
2.8998 3300 0.0473 0.0913
2.9877 3400 0.0396 0.0895
3.0756 3500 0.022 0.1009
3.1634 3600 0.0524 0.0997
3.2513 3700 0.0351 0.0943
3.3392 3800 0.0375 0.0966
3.4271 3900 0.0405 0.0925
3.5149 4000 0.036 0.0923
3.6028 4100 0.0334 0.0917
3.6907 4200 0.0323 0.0956
3.7786 4300 0.0315 0.0964
3.8664 4400 0.0337 0.0962
3.9543 4500 0.036 0.0916
4.0422 4600 0.0318 0.0927
4.1301 4700 0.0185 0.0900
4.2179 4800 0.0294 0.0910
4.3058 4900 0.0267 0.0929
4.3937 5000 0.0208 0.0919
4.4815 5100 0.0233 0.0926
4.5694 5200 0.0256 0.0927
4.6573 5300 0.0341 0.0903
4.7452 5400 0.0308 0.0909
4.8330 5500 0.0329 0.0886
4.9209 5600 0.0188 0.0890
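
A log in this whitespace-separated layout is easy to scan programmatically. The following sketch parses a six-row excerpt of the table above and picks the row with the lowest validation loss; the variable names are illustrative.

```python
# Excerpt of the table above: epoch, step, training loss, validation loss.
log_excerpt = """\
1.5817 1800 0.0748 0.0927
2.4605 2800 0.0404 0.0891
2.9877 3400 0.0396 0.0895
4.1301 4700 0.0185 0.0900
4.8330 5500 0.0329 0.0886
4.9209 5600 0.0188 0.0890
"""

rows = []
for line in log_excerpt.splitlines():
    epoch, step, train_loss, val_loss = line.split()
    rows.append({
        "epoch": float(epoch),
        "step": int(step),
        "train_loss": float(train_loss),
        "val_loss": float(val_loss),
    })

# Select the logged step with the lowest validation loss.
best = min(rows, key=lambda row: row["val_loss"])
print(best["step"], best["val_loss"])  # 5500 0.0886
```

Across the full table the minimum is also at step 5500 (validation loss 0.0886). Since `load_best_model_at_end` is False, the published weights would presumably come from the end of training rather than from that step.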

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 0.3B params (Safetensors, tensor type F32)