SentenceTransformer based on BSC-LT/MrBERT-es

This is a sentence-transformers model finetuned from BSC-LT/MrBERT-es. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

About This Project

This model was trained with the Transformer Encoder Frankenstein framework, a config-driven training library and CLI for end-to-end NLP workflows.

The Frankenstein Transformer provides:

  • Schema-driven configuration: Strict YAML schema validation for reproducible training
  • Thermal stability controls: GPU temperature management for safe long-term training
  • Advanced optimizer support: Multiple optimizer implementations (AdamW, AdaFactor, GaLore, Lion, Muon, Sophia, and more)
  • SBERT workflows: Specialized sentence-embedding fine-tuning and inference tools
  • Deployment artifact generation: Model quantization and deployment utilities
  • Inference modes: Single text, batch, and benchmark inference capabilities

Visit the Transformer Encoder Frankenstein repository for more information, documentation, and usage examples.

Evaluation Results (STSES Dataset)

This model achieves strong performance on the Spanish Semantic Textual Similarity Evaluation Set (STSES):

| Metric                       | Score              |
|------------------------------|--------------------|
| Pearson Cosine Similarity    | 0.7527             |
| Spearman Cosine Similarity   | 0.7166             |
| Manhattan Pearson            | 0.7514             |
| Manhattan Spearman           | 0.7162             |
| Euclidean Pearson            | 0.7499             |
| Euclidean Spearman           | 0.7166             |
| Main Score (Spearman Cosine) | 0.7166             |
| Evaluation Time              | 1.15 seconds       |
| Languages                    | Spanish (spa-Latn) |
| MTEB Version                 | 1.39.7             |
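
The table reports MTEB 1.39.7, so the evaluation can in principle be re-run with the mteb package. The snippet below is a minimal sketch; it assumes the Spanish STS task is registered in mteb under the name "STSES", and the exact entry point may vary slightly between mteb versions. The output folder is illustrative.

# Sketch: re-running the Spanish STS evaluation with MTEB.
# Assumes the task is registered as "STSES"; adjust if your mteb version differs.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("erickfmm/mrbert-es-sbert-ft")
tasks = mteb.get_tasks(tasks=["STSES"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/mrbert-es-sbert-ft")
print(results)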

Training Configuration

This model was trained using the following Frankenstein Transformer YAML configuration:

base_model: BSC-LT/MrBERT-es
training:
  task: sbert
  switch_on_thermal: true
  gpu_temp_guard_enabled: true
  gpu_temp_resume_threshold_c: 75
  gpu_temp_pause_threshold_c: 85
  gpu_temp_critical_threshold_c: 88
  gpu_temp_poll_interval_seconds: 30
  telemetry_log_interval: 1
  sbert:
    dataset_name: "erickfmm/agentlans__multilingual-sentences__paired_10_sts"
    dataset_type: paired_similarity
    columns:
      sentence1: sentence1
      sentence2: sentence2
      similarity: similarity
    output_dir: "./output/sbert_modernbert"
    batch_size: 512
    gradient_accumulation_steps: 1
    max_grad_norm: 2.0
    epochs: 10
    warmup_steps: 250
    evaluation_steps: 5000
    checkpoint_save_steps: 1000
    resume_from_checkpoint: true
    learning_rate: 1.6e-6
    max_train_samples: null
    max_eval_samples: 20000
    max_seq_length: 8192
    pooling_mode: mean
    use_amp: false
    resample_balanced: false
    resample_std: 0.3
    standardize_scores: true
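
The gpu_temp_* keys above describe a pause/resume hysteresis around GPU temperature: training pauses at 85°C, resumes once the GPU cools to 75°C, aborts at 88°C, and polls every 30 seconds. The sketch below only illustrates that logic with pynvml; it is not the Frankenstein implementation, and the helper names are made up for the example.

# Illustration of the thermal-guard semantics expressed by the gpu_temp_* settings.
# Not the Frankenstein implementation; helper names are illustrative.
import time
import pynvml

PAUSE_C, RESUME_C, CRITICAL_C, POLL_S = 85, 75, 88, 30

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def gpu_temp_c() -> int:
    return pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

def wait_if_too_hot() -> None:
    """Call between training steps: block while the GPU is above the pause threshold."""
    temp = gpu_temp_c()
    if temp >= CRITICAL_C:
        raise RuntimeError(f"GPU at {temp}°C, above the critical threshold of {CRITICAL_C}°C")
    if temp < PAUSE_C:
        return
    # Paused: wait until the temperature drops back below the resume threshold.
    while gpu_temp_c() > RESUME_C:
        time.sleep(POLL_S)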

Configuration Details

  • Base Model: BSC-LT/MrBERT-es, a Spanish ModernBERT-based encoder
  • Task: Sentence-BERT (SBERT) fine-tuning for semantic similarity
  • Thermal Management: Enabled with safeguards (pause at 85°C, resume at 75°C, critical at 88°C)
  • Dataset: Multilingual sentence pairs with similarity scores
  • Batch Size: 512 samples per batch
  • Training Duration: 10 epochs
  • Sequence Length: Up to 8,192 tokens (extended from the standard 512)
  • Learning Rate: 1.6e-6 (very low for stable fine-tuning)
  • Pooling: Mean pooling over token embeddings
  • Output Dimensionality: 768 dimensions

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BSC-LT/MrBERT-es
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Dataset Size: 1,175,405 sentence pairs
  • Loss Function: Cosine Similarity Loss

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
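
The stack above is a Transformer encoder followed by mean pooling and L2 normalization. As a rough equivalence check, the same embedding can be computed by hand with transformers; this is only a sketch, assuming the transformer weights load with AutoModel as is typical for Sentence Transformers checkpoints, and the example texts are arbitrary.

# Sketch: reproducing the Transformer -> mean pooling -> Normalize stack by hand.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "erickfmm/mrbert-es-sbert-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

texts = ["La botánica estudia las plantas.", "El sudor enfría el cuerpo."]
batch = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling over non-padding tokens, then L2 normalization (the Normalize module).
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 768])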

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("erickfmm/mrbert-es-sbert-ft")
# Run inference
sentences = [
    'Historia La botánica moderna Significado de la botánica como ciencia Los distintos grupos de vegetales participan de manera fundamental en los ciclos de la biosfera.',
    'El COPINH exige a las autoridades judiciales y fiscales proceder judicialmente contra los alcaldes municipales, altos funcionarios de SERNA, y contra las empresas y demás sectores involucrados en esta agresión contra el pueblo lenca.',
    'Durante la transpiración, el sudor elimina el calor del cuerpo humano por evaporación.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2126, 0.2099],
#         [0.2126, 1.0000, 0.0278],
#         [0.2099, 0.0278, 1.0000]])
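
Beyond pairwise similarity, the same embeddings can drive semantic search over a corpus. The snippet below is a small illustrative example using the sentence_transformers.util helper; the corpus and query texts are placeholders.

# Minimal semantic-search example; corpus and query are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("erickfmm/mrbert-es-sbert-ft")

corpus = [
    "La fotosíntesis convierte la luz solar en energía química.",
    "El tribunal aplazó la audiencia hasta el próximo mes.",
    "La evaporación del sudor ayuda a regular la temperatura corporal.",
]
queries = ["¿Cómo regula el cuerpo humano el calor?"]

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(queries, convert_to_tensor=True)

# Returns, per query, a ranked list of {"corpus_id": ..., "score": ...} dicts.
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")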

Evaluation

Metrics

Semantic Similarity

| Metric          | Value  |
|-----------------|--------|
| pearson_cosine  | 0.4611 |
| spearman_cosine | 0.2749 |
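
These values come from the semantic-similarity evaluator run during training (the sts_eval_* keys in the training logs below). As a hedged sketch, the same metrics can be recomputed on held-out pairs with EmbeddingSimilarityEvaluator from Sentence Transformers; the three pairs and gold scores below are placeholders, not the actual evaluation data.

# Sketch: computing Pearson/Spearman cosine metrics on held-out pairs.
# The pairs and gold scores are placeholders.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("erickfmm/mrbert-es-sbert-ft")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["El gato duerme.", "Llueve mucho hoy.", "Compré pan fresco."],
    sentences2=["Un felino descansa.", "Hace sol y calor.", "Adquirí una barra de pan."],
    scores=[0.9, 0.1, 0.8],  # gold similarity scores
    name="sts_eval",
)
results = evaluator(model)
print(results)  # includes keys such as "sts_eval_pearson_cosine" and "sts_eval_spearman_cosine"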

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,175,405 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0                                         | sentence_1                                         | label                            |
    |---------|----------------------------------------------------|----------------------------------------------------|----------------------------------|
    | type    | string                                             | string                                             | float                            |
    | details | min: 5 tokens, mean: 37.17 tokens, max: 290 tokens | min: 5 tokens, mean: 38.26 tokens, max: 375 tokens | min: -0.75, mean: 0.17, max: 1.0 |
  • Samples:
    • sentence_0: Los ahorros de la jubilación podrán usarse para este fin.
      sentence_1: Sony Ericsson W8 además de todo eso presenta una pantalla táctil de tipo HVGA de 320 x 480 píxeles y la pantalla posee 16.777.216 colores.
      label: 0.2533760964870453
    • sentence_0: Programas de desarrollo en el cerebelo La transición célula progenitora a neurona madura, implica una serie de cambios morfológicos y moleculares altamente regulada espacial y temporalmente.
      sentence_1: Dos ejemplos en los que el principio de exclusión relaciona la materia con la ocupación del espacio son las estrellas enanas blancas y las estrellas de neutrones, que se analizan más adelante.
      label: 0.1902337223291397
    • sentence_0: Bolsa inmobiliaria online en Distrito Federal df, inmuebles en venta y renta, casas, departamentos, locales, terrenos, inmobiliarias, desarrollos, anunciar inmuebles.
      sentence_1: Otros prefieren hablar de "régimen" o "sistema feudal", para diferenciarlo sutilmente del feudalismo estricto, o de síntesis feudal, para marcar el hecho de que sobreviven en ella rasgos de la antigüedad clásica mezclados con contribuciones germánicas, implicando tanto a instituciones como a elementos productivos, y significó la especificidad del feudalismo europeo occidental como formación económico social frente a otras también feudales, con consecuencias trascendentales en el futuro devenir histórico.
      label: 0.21721388399600983
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
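
CosineSimilarityLoss regresses the cosine similarity of the two sentence embeddings onto the float label with MSE, which is how the (sentence_0, sentence_1, label) triples above are consumed. The snippet below is a minimal training sketch with the current Sentence Transformers trainer API; the tiny in-memory dataset stands in for the real training data.

# Sketch: fine-tuning with CosineSimilarityLoss on (sentence_0, sentence_1, label)
# triples. The in-memory dataset is a placeholder for the real one.
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("BSC-LT/MrBERT-es")

train_dataset = Dataset.from_dict({
    "sentence_0": ["Los ahorros podrán usarse para este fin.", "El sudor enfría el cuerpo."],
    "sentence_1": ["El teléfono tiene pantalla táctil HVGA.", "La evaporación elimina calor corporal."],
    "label": [0.25, 0.85],  # float similarity targets
})

# Cosine similarity of the two embeddings is regressed onto `label` via MSE.
loss = losses.CosineSimilarityLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()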
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • max_grad_norm: 2.0
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 2.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss sts_eval_spearman_cosine
3.9714 583500 0.0253 0.2725
3.9748 584000 0.0274 0.2733
3.9782 584500 0.0279 0.2711
3.9816 585000 0.0248 0.2708
3.9850 585500 0.0264 0.2676
3.9884 586000 0.0267 0.2713
3.9918 586500 0.0276 0.2703
3.9952 587000 0.0273 0.2674
3.9986 587500 0.0278 0.2688
4.0 587704 - 0.2672
4.0020 588000 0.0259 0.2675
4.0054 588500 0.0257 0.2697
4.0088 589000 0.0268 0.2694
4.0122 589500 0.0256 0.2706
4.0156 590000 0.0254 0.2706
4.0190 590500 0.0263 0.2695
4.0224 591000 0.0274 0.2691
4.0258 591500 0.0255 0.2712
4.0292 592000 0.0253 0.2696
4.0326 592500 0.025 0.2692
4.0360 593000 0.0263 0.2679
4.0394 593500 0.028 0.2689
4.0429 594000 0.0275 0.2696
4.0463 594500 0.0268 0.2699
4.0497 595000 0.025 0.2686
4.0531 595500 0.0277 0.2683
4.0565 596000 0.0276 0.2690
4.0599 596500 0.0242 0.2686
4.0633 597000 0.0264 0.2691
4.0667 597500 0.0273 0.2681
4.0701 598000 0.0269 0.2693
4.0735 598500 0.0274 0.2698
4.0769 599000 0.0252 0.2704
4.0803 599500 0.0268 0.2708
4.0837 600000 0.0259 0.2696
4.0871 600500 0.0277 0.2689
4.0905 601000 0.0262 0.2663
4.0939 601500 0.0266 0.2697
4.0973 602000 0.0269 0.2700
4.1007 602500 0.0253 0.2673
4.1041 603000 0.0281 0.2684
4.1075 603500 0.0263 0.2687
4.1109 604000 0.028 0.2677
4.1143 604500 0.0277 0.2701
4.1177 605000 0.0273 0.2686
4.1211 605500 0.0253 0.2681
4.1245 606000 0.0264 0.2694
4.1279 606500 0.0281 0.2706
4.1313 607000 0.0262 0.2714
4.1347 607500 0.0265 0.2673
4.1381 608000 0.0254 0.2685
4.1415 608500 0.0279 0.2674
4.1449 609000 0.0284 0.2692
4.1483 609500 0.0283 0.2680
4.1517 610000 0.0277 0.2673
4.1552 610500 0.0264 0.2692
4.1586 611000 0.0261 0.2687
4.1620 611500 0.0273 0.2697
4.1654 612000 0.027 0.2697
4.1688 612500 0.0274 0.2696
4.1722 613000 0.0273 0.2698
4.1756 613500 0.0255 0.2659
4.1790 614000 0.0274 0.2660
4.1824 614500 0.0284 0.2666
4.1858 615000 0.0268 0.2680
4.1892 615500 0.0278 0.2674
4.1926 616000 0.0276 0.2684
4.1960 616500 0.026 0.2700
4.1994 617000 0.0266 0.2686
4.2028 617500 0.0266 0.2680
4.2062 618000 0.0277 0.2678
4.2096 618500 0.0291 0.2649
4.2130 619000 0.0281 0.2635
4.2164 619500 0.0291 0.2659
4.2198 620000 0.0281 0.2672
4.2232 620500 0.0282 0.2655
4.2266 621000 0.0287 0.2648
4.2300 621500 0.0285 0.2640
4.2334 622000 0.0282 0.2645
4.2368 622500 0.027 0.2674
4.2402 623000 0.0268 0.2669
4.2436 623500 0.0291 0.2663
4.2470 624000 0.0291 0.2645
4.2504 624500 0.0277 0.2677
4.2538 625000 0.0273 0.2631
4.2572 625500 0.0265 0.2653
4.2606 626000 0.0276 0.2665
4.2641 626500 0.027 0.2654
4.2675 627000 0.0271 0.2659
4.2709 627500 0.0279 0.2659
4.2743 628000 0.0274 0.2648
4.2777 628500 0.0263 0.2659
4.2811 629000 0.0279 0.2665
4.2845 629500 0.028 0.2677
4.2879 630000 0.0299 0.2701
4.2913 630500 0.0284 0.2688
4.2947 631000 0.0269 0.2683
4.2981 631500 0.0271 0.2689
4.3015 632000 0.0288 0.2680
4.3049 632500 0.0274 0.2674
4.3083 633000 0.0277 0.2675
4.3117 633500 0.0282 0.2671
4.3151 634000 0.0266 0.2658
4.3185 634500 0.0284 0.2648
4.3219 635000 0.0283 0.2637
4.3253 635500 0.0283 0.2647
4.3287 636000 0.0281 0.2641
4.3321 636500 0.0275 0.2620
4.3355 637000 0.0272 0.2630
4.3389 637500 0.0282 0.2642
4.3423 638000 0.0294 0.2664
4.3457 638500 0.0283 0.2639
4.3491 639000 0.0262 0.2663
4.3525 639500 0.0275 0.2671
4.3559 640000 0.0298 0.2669
4.3593 640500 0.0292 0.2693
4.3627 641000 0.0283 0.2673
4.3661 641500 0.027 0.2687
4.3695 642000 0.0278 0.2663
4.3729 642500 0.0301 0.2652
4.3764 643000 0.0275 0.2676
4.3798 643500 0.0292 0.2680
4.3832 644000 0.0266 0.2680
4.3866 644500 0.0283 0.2668
4.3900 645000 0.0303 0.2677
4.3934 645500 0.0299 0.2701
4.3968 646000 0.0284 0.2680
4.4002 646500 0.0272 0.2664
4.4036 647000 0.0297 0.2662
4.4070 647500 0.029 0.2661
4.4104 648000 0.0281 0.2678
4.4138 648500 0.0282 0.2683
4.4172 649000 0.0278 0.2699
4.4206 649500 0.0309 0.2684
4.4240 650000 0.0288 0.2693
4.4274 650500 0.0307 0.2697
4.4308 651000 0.0272 0.2722
4.4342 651500 0.0289 0.2726
4.4376 652000 0.0288 0.2716
4.4410 652500 0.0289 0.2729
4.4444 653000 0.0297 0.2699
4.4478 653500 0.0286 0.2724
4.4512 654000 0.0298 0.2702
4.4546 654500 0.0302 0.2738
4.4580 655000 0.0292 0.2713
4.4614 655500 0.0297 0.2712
4.4648 656000 0.0286 0.2705
4.4682 656500 0.0285 0.2735
4.4716 657000 0.0294 0.2733
4.4750 657500 0.0291 0.2722
4.4784 658000 0.0283 0.2708
4.4818 658500 0.028 0.2714
4.4853 659000 0.0298 0.2716
4.4887 659500 0.0275 0.2721
4.4921 660000 0.0314 0.2731
4.4955 660500 0.0292 0.2730
4.4989 661000 0.029 0.2749

Framework Versions

  • Python: 3.9.25
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.6
  • PyTorch: 2.6.0+cu118
  • Accelerate: 1.10.1
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}