# SentenceTransformer based on BSC-LT/MrBERT-es
This is a sentence-transformers model finetuned from BSC-LT/MrBERT-es. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## About This Project
This model was trained using the Transformer Encoder Frankenstein framework - a config-driven training library and CLI for end-to-end NLP workflows.
The Frankenstein Transformer provides:
- Schema-driven configuration: Strict YAML schema validation for reproducible training
- Thermal stability controls: GPU temperature management for safe long-term training
- Advanced optimizer support: Multiple optimizer implementations (AdamW, AdaFactor, GaLore, Lion, Muon, Sophia, and more)
- SBERT workflows: Specialized sentence-embedding fine-tuning and inference tools
- Deployment artifact generation: Model quantization and deployment utilities
- Inference modes: Single text, batch, and benchmark inference capabilities
Visit the Transformer Encoder Frankenstein repository for more information, documentation, and usage examples.
## Evaluation Results (STSES Dataset)
This model achieves strong performance on the Spanish Semantic Textual Similarity Evaluation Set (STSES):
| Metric | Score |
|---|---|
| Pearson Cosine Similarity | 0.7527 |
| Spearman Cosine Similarity | 0.7166 |
| Manhattan Pearson | 0.7514 |
| Manhattan Spearman | 0.7162 |
| Euclidean Pearson | 0.7499 |
| Euclidean Spearman | 0.7166 |
| Main Score (Spearman Cosine) | 0.7166 |
| Evaluation Time | 1.15 seconds |
| Languages | Spanish (spa-Latn) |
| MTEB Version | 1.39.7 |
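Note that the Euclidean and cosine Spearman scores are identical (0.7166). This is expected: the model L2-normalizes its embeddings, and for unit vectors Euclidean distance is a strictly decreasing function of cosine similarity, so the two produce the same rankings. A minimal numpy sketch of this relationship (synthetic vectors, not real model output):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 768))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize, as the model's Normalize() module does

q = emb[0]
cos = emb[1:] @ q                          # cosine similarity to the first vector
euc = np.linalg.norm(emb[1:] - q, axis=1)  # Euclidean distance to the first vector

# For unit vectors: ||u - v||^2 = 2 - 2*cos(u, v), a strictly decreasing map,
# so ranking by cosine (descending) equals ranking by distance (ascending).
assert np.allclose(euc**2, 2 - 2 * cos)
print((np.argsort(-cos) == np.argsort(euc)).all())  # identical rankings
```

This is why rank-based metrics (Spearman) agree exactly across the two distance functions, while the value-based Pearson scores differ slightly.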
## Training Configuration
This model was trained using the following Frankenstein Transformer YAML configuration:
```yaml
base_model: BSC-LT/MrBERT-es

training:
  task: sbert
  switch_on_thermal: true
  gpu_temp_guard_enabled: true
  gpu_temp_resume_threshold_c: 75
  gpu_temp_pause_threshold_c: 85
  gpu_temp_critical_threshold_c: 88
  gpu_temp_poll_interval_seconds: 30
  telemetry_log_interval: 1

sbert:
  dataset_name: "erickfmm/agentlans__multilingual-sentences__paired_10_sts"
  dataset_type: paired_similarity
  columns:
    sentence1: sentence1
    sentence2: sentence2
    similarity: similarity
  output_dir: "./output/sbert_modernbert"
  batch_size: 512
  gradient_accumulation_steps: 1
  max_grad_norm: 2.0
  epochs: 10
  warmup_steps: 250
  evaluation_steps: 5000
  checkpoint_save_steps: 1000
  resume_from_checkpoint: true
  learning_rate: 1.6e-6
  max_train_samples: null
  max_eval_samples: 20000
  max_seq_length: 8192
  pooling_mode: mean
  use_amp: false
  resample_balanced: false
  resample_std: 0.3
  standardize_scores: true
```
### Configuration Details
- Base Model: BSC-LT/MrBERT-es - Spanish BERT variant
- Task: Sentence-BERT (SBERT) fine-tuning for semantic similarity
- Thermal Management: Enabled with safeguards (pause at 85°C, resume at 75°C, critical at 88°C)
- Dataset: Multilingual sentence pairs with similarity scores
- Batch Size: 512 samples per batch
- Training Duration: 10 epochs
- Sequence Length: Up to 8,192 tokens (extended from standard 512)
- Learning Rate: 1.6e-6 (very low for stable fine-tuning)
- Pooling: Mean pooling over token embeddings
- Output Dimensionality: 768 dimensions
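The thermal safeguard described above behaves as a small hysteresis state machine: pause above 85°C, resume only after cooling back to 75°C, abort at 88°C. A hypothetical re-implementation of that logic (not the framework's actual code) might look like:

```python
# Hypothetical sketch of the pause/resume hysteresis described above;
# the real Frankenstein implementation may differ.
PAUSE_C, RESUME_C, CRITICAL_C = 85, 75, 88

def thermal_step(state: str, temp_c: float) -> str:
    """Return the next state ('running', 'paused', 'aborted') for one temperature reading."""
    if temp_c >= CRITICAL_C:
        return "aborted"                      # critical threshold: stop the run
    if state == "running" and temp_c >= PAUSE_C:
        return "paused"                       # too hot: pause training
    if state == "paused" and temp_c <= RESUME_C:
        return "running"                      # cooled down: resume
    return state                              # inside the hysteresis band: keep current state

# Simulated readings, polled every 30 s per the config above
state = "running"
for t in [70, 86, 80, 74, 90]:
    state = thermal_step(state, t)
# 70 -> running, 86 -> paused, 80 -> still paused (hysteresis), 74 -> running, 90 -> aborted
print(state)  # aborted
```

The gap between the pause and resume thresholds prevents rapid pause/resume flapping when the GPU hovers near a single cutoff.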
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: BSC-LT/MrBERT-es
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Dataset Size: 1,175,405 sentence pairs
- Loss Function: Cosine Similarity Loss
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
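The three modules above form a simple pipeline: the Transformer produces contextual token embeddings, the Pooling module averages them over non-padding tokens (`pooling_mode_mean_tokens: True`), and Normalize rescales each sentence vector to unit length. A minimal numpy sketch of the pooling and normalization stages, assuming token embeddings and an attention mask are already computed:

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Masked mean pooling over tokens, then L2 normalization (modules (1) and (2) above)."""
    mask = attention_mask[..., None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # sum real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero on empty rows
    pooled = summed / counts                         # mean over non-padding tokens
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Toy batch: 1 sentence, 4 token slots (last 2 are padding), embedding dim 3
tok = np.array([[[1., 0., 0.], [0., 1., 0.], [9., 9., 9.], [9., 9., 9.]]])
mask = np.array([[1, 1, 0, 0]])
emb = mean_pool_and_normalize(tok, mask)
# Padding rows are ignored: result is the mean of the first two tokens, normalized
print(emb)
```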
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("erickfmm/mrbert-es-sbert-ft")

# Run inference
sentences = [
    'Historia La botánica moderna Significado de la botánica como ciencia Los distintos grupos de vegetales participan de manera fundamental en los ciclos de la biosfera.',
    'El COPINH exige a las autoridades judiciales y fiscales proceder judicialmente contra los alcaldes municipales, altos funcionarios de SERNA, y contra las empresas y demás sectores involucrados en esta agresión contra el pueblo lenca.',
    'Durante la transpiración, el sudor elimina el calor del cuerpo humano por evaporación.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2126, 0.2099],
#         [0.2126, 1.0000, 0.0278],
#         [0.2099, 0.0278, 1.0000]])
```
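Because the model's final module is `Normalize()`, `encode()` returns unit-length vectors, so the cosine similarity matrix above reduces to plain dot products. A numpy sketch with stand-in unit vectors (not real model output):

```python
import numpy as np

# Two hand-picked unit vectors standing in for model embeddings
emb = np.array([[0.6, 0.8],
                [0.8, 0.6]])

# For unit vectors, cosine similarity is just the Gram matrix of dot products
sims = emb @ emb.T
print(sims)  # diagonal is 1.0 (each vector with itself), off-diagonal is 0.96
```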
## Evaluation

### Metrics

#### Semantic Similarity

- Dataset: `sts_eval`
- Evaluated with `EmbeddingSimilarityEvaluator`

| Metric | Value |
|---|---|
| pearson_cosine | 0.4611 |
| spearman_cosine | 0.2749 |
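Pearson measures linear correlation between predicted similarities and gold scores, while Spearman is Pearson applied to ranks, so it rewards any monotone agreement regardless of scale. A minimal numpy sketch (no tie handling, unlike the real evaluator):

```python
import numpy as np

def pearson(x, y):
    """Linear correlation of raw scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def spearman(x, y):
    """Pearson correlation of rank-transformed scores (distinct values assumed)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

gold = [0.1, 0.4, 0.5, 0.9]
pred = [0.2, 0.3, 0.6, 0.8]  # same ordering as gold, different spacing

print(round(spearman(gold, pred), 4))  # 1.0 -- perfect rank agreement
print(round(pearson(gold, pred), 4))   # < 1.0 -- spacing differs, so linear fit is imperfect
```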
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 1,175,405 training samples
- Columns: `sentence_0`, `sentence_1`, and `label`
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | min: 5 tokens<br>mean: 37.17 tokens<br>max: 290 tokens | min: 5 tokens<br>mean: 38.26 tokens<br>max: 375 tokens | min: -0.75<br>mean: 0.17<br>max: 1.0 |

- Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Los ahorros de la jubilación podrán usarse para este fin. | Sony Ericsson W8 además de todo eso presenta una pantalla táctil de tipo HVGA de 320 x 480 píxeles y la pantalla posee 16.777.216 colores. | 0.2533760964870453 |
| Programas de desarrollo en el cerebelo La transición célula progenitora a neurona madura, implica una serie de cambios morfológicos y moleculares altamente regulada espacial y temporalmente. | Dos ejemplos en los que el principio de exclusión relaciona la materia con la ocupación del espacio son las estrellas enanas blancas y las estrellas de neutrones, que se analizan más adelante. | 0.1902337223291397 |
| Bolsa inmobiliaria online en Distrito Federal df, inmuebles en venta y renta, casas, departamentos, locales, terrenos, inmobiliarias, desarrollos, anunciar inmuebles. | Otros prefieren hablar de "régimen" o "sistema feudal", para diferenciarlo sutilmente del feudalismo estricto, o de síntesis feudal, para marcar el hecho de que sobreviven en ella rasgos de la antigüedad clásica mezclados con contribuciones germánicas, implicando tanto a instituciones como a elementos productivos, y significó la especificidad del feudalismo europeo occidental como formación económico social frente a otras también feudales, con consecuencias trascendentales en el futuro devenir histórico. | 0.21721388399600983 |

- Loss: `CosineSimilarityLoss` with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
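`CosineSimilarityLoss` with `MSELoss` computes the cosine similarity between the two sentence embeddings of each pair and penalizes its squared deviation from the gold score. A numpy sketch of that computation (a re-derivation for illustration, not the sentence-transformers source):

```python
import numpy as np

def cosine_similarity_loss(emb1, emb2, labels):
    """MSE between cos(u, v) and the gold similarity score, averaged over the batch."""
    cos = (emb1 * emb2).sum(axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    )
    return float(((cos - labels) ** 2).mean())  # the MSELoss named above

# Toy batch of one pair: identical vectors (cos = 1.0) but gold label 0.5
e1 = np.array([[1.0, 0.0]])
e2 = np.array([[1.0, 0.0]])
print(cosine_similarity_loss(e1, e2, np.array([0.5])))  # 0.25, i.e. (1.0 - 0.5)^2
```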
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `max_grad_norm`: 2.0
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

Click to expand

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 2.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}
### Training Logs
Click to expand
| Epoch | Step | Training Loss | sts_eval_spearman_cosine |
|---|---|---|---|
| 3.9714 | 583500 | 0.0253 | 0.2725 |
| 3.9748 | 584000 | 0.0274 | 0.2733 |
| 3.9782 | 584500 | 0.0279 | 0.2711 |
| 3.9816 | 585000 | 0.0248 | 0.2708 |
| 3.9850 | 585500 | 0.0264 | 0.2676 |
| 3.9884 | 586000 | 0.0267 | 0.2713 |
| 3.9918 | 586500 | 0.0276 | 0.2703 |
| 3.9952 | 587000 | 0.0273 | 0.2674 |
| 3.9986 | 587500 | 0.0278 | 0.2688 |
| 4.0 | 587704 | - | 0.2672 |
| 4.0020 | 588000 | 0.0259 | 0.2675 |
| 4.0054 | 588500 | 0.0257 | 0.2697 |
| 4.0088 | 589000 | 0.0268 | 0.2694 |
| 4.0122 | 589500 | 0.0256 | 0.2706 |
| 4.0156 | 590000 | 0.0254 | 0.2706 |
| 4.0190 | 590500 | 0.0263 | 0.2695 |
| 4.0224 | 591000 | 0.0274 | 0.2691 |
| 4.0258 | 591500 | 0.0255 | 0.2712 |
| 4.0292 | 592000 | 0.0253 | 0.2696 |
| 4.0326 | 592500 | 0.025 | 0.2692 |
| 4.0360 | 593000 | 0.0263 | 0.2679 |
| 4.0394 | 593500 | 0.028 | 0.2689 |
| 4.0429 | 594000 | 0.0275 | 0.2696 |
| 4.0463 | 594500 | 0.0268 | 0.2699 |
| 4.0497 | 595000 | 0.025 | 0.2686 |
| 4.0531 | 595500 | 0.0277 | 0.2683 |
| 4.0565 | 596000 | 0.0276 | 0.2690 |
| 4.0599 | 596500 | 0.0242 | 0.2686 |
| 4.0633 | 597000 | 0.0264 | 0.2691 |
| 4.0667 | 597500 | 0.0273 | 0.2681 |
| 4.0701 | 598000 | 0.0269 | 0.2693 |
| 4.0735 | 598500 | 0.0274 | 0.2698 |
| 4.0769 | 599000 | 0.0252 | 0.2704 |
| 4.0803 | 599500 | 0.0268 | 0.2708 |
| 4.0837 | 600000 | 0.0259 | 0.2696 |
| 4.0871 | 600500 | 0.0277 | 0.2689 |
| 4.0905 | 601000 | 0.0262 | 0.2663 |
| 4.0939 | 601500 | 0.0266 | 0.2697 |
| 4.0973 | 602000 | 0.0269 | 0.2700 |
| 4.1007 | 602500 | 0.0253 | 0.2673 |
| 4.1041 | 603000 | 0.0281 | 0.2684 |
| 4.1075 | 603500 | 0.0263 | 0.2687 |
| 4.1109 | 604000 | 0.028 | 0.2677 |
| 4.1143 | 604500 | 0.0277 | 0.2701 |
| 4.1177 | 605000 | 0.0273 | 0.2686 |
| 4.1211 | 605500 | 0.0253 | 0.2681 |
| 4.1245 | 606000 | 0.0264 | 0.2694 |
| 4.1279 | 606500 | 0.0281 | 0.2706 |
| 4.1313 | 607000 | 0.0262 | 0.2714 |
| 4.1347 | 607500 | 0.0265 | 0.2673 |
| 4.1381 | 608000 | 0.0254 | 0.2685 |
| 4.1415 | 608500 | 0.0279 | 0.2674 |
| 4.1449 | 609000 | 0.0284 | 0.2692 |
| 4.1483 | 609500 | 0.0283 | 0.2680 |
| 4.1517 | 610000 | 0.0277 | 0.2673 |
| 4.1552 | 610500 | 0.0264 | 0.2692 |
| 4.1586 | 611000 | 0.0261 | 0.2687 |
| 4.1620 | 611500 | 0.0273 | 0.2697 |
| 4.1654 | 612000 | 0.027 | 0.2697 |
| 4.1688 | 612500 | 0.0274 | 0.2696 |
| 4.1722 | 613000 | 0.0273 | 0.2698 |
| 4.1756 | 613500 | 0.0255 | 0.2659 |
| 4.1790 | 614000 | 0.0274 | 0.2660 |
| 4.1824 | 614500 | 0.0284 | 0.2666 |
| 4.1858 | 615000 | 0.0268 | 0.2680 |
| 4.1892 | 615500 | 0.0278 | 0.2674 |
| 4.1926 | 616000 | 0.0276 | 0.2684 |
| 4.1960 | 616500 | 0.026 | 0.2700 |
| 4.1994 | 617000 | 0.0266 | 0.2686 |
| 4.2028 | 617500 | 0.0266 | 0.2680 |
| 4.2062 | 618000 | 0.0277 | 0.2678 |
| 4.2096 | 618500 | 0.0291 | 0.2649 |
| 4.2130 | 619000 | 0.0281 | 0.2635 |
| 4.2164 | 619500 | 0.0291 | 0.2659 |
| 4.2198 | 620000 | 0.0281 | 0.2672 |
| 4.2232 | 620500 | 0.0282 | 0.2655 |
| 4.2266 | 621000 | 0.0287 | 0.2648 |
| 4.2300 | 621500 | 0.0285 | 0.2640 |
| 4.2334 | 622000 | 0.0282 | 0.2645 |
| 4.2368 | 622500 | 0.027 | 0.2674 |
| 4.2402 | 623000 | 0.0268 | 0.2669 |
| 4.2436 | 623500 | 0.0291 | 0.2663 |
| 4.2470 | 624000 | 0.0291 | 0.2645 |
| 4.2504 | 624500 | 0.0277 | 0.2677 |
| 4.2538 | 625000 | 0.0273 | 0.2631 |
| 4.2572 | 625500 | 0.0265 | 0.2653 |
| 4.2606 | 626000 | 0.0276 | 0.2665 |
| 4.2641 | 626500 | 0.027 | 0.2654 |
| 4.2675 | 627000 | 0.0271 | 0.2659 |
| 4.2709 | 627500 | 0.0279 | 0.2659 |
| 4.2743 | 628000 | 0.0274 | 0.2648 |
| 4.2777 | 628500 | 0.0263 | 0.2659 |
| 4.2811 | 629000 | 0.0279 | 0.2665 |
| 4.2845 | 629500 | 0.028 | 0.2677 |
| 4.2879 | 630000 | 0.0299 | 0.2701 |
| 4.2913 | 630500 | 0.0284 | 0.2688 |
| 4.2947 | 631000 | 0.0269 | 0.2683 |
| 4.2981 | 631500 | 0.0271 | 0.2689 |
| 4.3015 | 632000 | 0.0288 | 0.2680 |
| 4.3049 | 632500 | 0.0274 | 0.2674 |
| 4.3083 | 633000 | 0.0277 | 0.2675 |
| 4.3117 | 633500 | 0.0282 | 0.2671 |
| 4.3151 | 634000 | 0.0266 | 0.2658 |
| 4.3185 | 634500 | 0.0284 | 0.2648 |
| 4.3219 | 635000 | 0.0283 | 0.2637 |
| 4.3253 | 635500 | 0.0283 | 0.2647 |
| 4.3287 | 636000 | 0.0281 | 0.2641 |
| 4.3321 | 636500 | 0.0275 | 0.2620 |
| 4.3355 | 637000 | 0.0272 | 0.2630 |
| 4.3389 | 637500 | 0.0282 | 0.2642 |
| 4.3423 | 638000 | 0.0294 | 0.2664 |
| 4.3457 | 638500 | 0.0283 | 0.2639 |
| 4.3491 | 639000 | 0.0262 | 0.2663 |
| 4.3525 | 639500 | 0.0275 | 0.2671 |
| 4.3559 | 640000 | 0.0298 | 0.2669 |
| 4.3593 | 640500 | 0.0292 | 0.2693 |
| 4.3627 | 641000 | 0.0283 | 0.2673 |
| 4.3661 | 641500 | 0.027 | 0.2687 |
| 4.3695 | 642000 | 0.0278 | 0.2663 |
| 4.3729 | 642500 | 0.0301 | 0.2652 |
| 4.3764 | 643000 | 0.0275 | 0.2676 |
| 4.3798 | 643500 | 0.0292 | 0.2680 |
| 4.3832 | 644000 | 0.0266 | 0.2680 |
| 4.3866 | 644500 | 0.0283 | 0.2668 |
| 4.3900 | 645000 | 0.0303 | 0.2677 |
| 4.3934 | 645500 | 0.0299 | 0.2701 |
| 4.3968 | 646000 | 0.0284 | 0.2680 |
| 4.4002 | 646500 | 0.0272 | 0.2664 |
| 4.4036 | 647000 | 0.0297 | 0.2662 |
| 4.4070 | 647500 | 0.029 | 0.2661 |
| 4.4104 | 648000 | 0.0281 | 0.2678 |
| 4.4138 | 648500 | 0.0282 | 0.2683 |
| 4.4172 | 649000 | 0.0278 | 0.2699 |
| 4.4206 | 649500 | 0.0309 | 0.2684 |
| 4.4240 | 650000 | 0.0288 | 0.2693 |
| 4.4274 | 650500 | 0.0307 | 0.2697 |
| 4.4308 | 651000 | 0.0272 | 0.2722 |
| 4.4342 | 651500 | 0.0289 | 0.2726 |
| 4.4376 | 652000 | 0.0288 | 0.2716 |
| 4.4410 | 652500 | 0.0289 | 0.2729 |
| 4.4444 | 653000 | 0.0297 | 0.2699 |
| 4.4478 | 653500 | 0.0286 | 0.2724 |
| 4.4512 | 654000 | 0.0298 | 0.2702 |
| 4.4546 | 654500 | 0.0302 | 0.2738 |
| 4.4580 | 655000 | 0.0292 | 0.2713 |
| 4.4614 | 655500 | 0.0297 | 0.2712 |
| 4.4648 | 656000 | 0.0286 | 0.2705 |
| 4.4682 | 656500 | 0.0285 | 0.2735 |
| 4.4716 | 657000 | 0.0294 | 0.2733 |
| 4.4750 | 657500 | 0.0291 | 0.2722 |
| 4.4784 | 658000 | 0.0283 | 0.2708 |
| 4.4818 | 658500 | 0.028 | 0.2714 |
| 4.4853 | 659000 | 0.0298 | 0.2716 |
| 4.4887 | 659500 | 0.0275 | 0.2721 |
| 4.4921 | 660000 | 0.0314 | 0.2731 |
| 4.4955 | 660500 | 0.0292 | 0.2730 |
| 4.4989 | 661000 | 0.029 | 0.2749 |
## Framework Versions
- Python: 3.9.25
- Sentence Transformers: 5.1.2
- Transformers: 4.57.6
- PyTorch: 2.6.0+cu118
- Accelerate: 1.10.1
- Datasets: 4.5.0
- Tokenizers: 0.22.2
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```