SentenceTransformer based on lightonai/modernbert-embed-large

This is a sentence-transformers model finetuned from lightonai/modernbert-embed-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: lightonai/modernbert-embed-large
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
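The three modules above correspond to: contextual token embeddings from ModernBERT, attention-mask-aware mean pooling, and L2 normalization. A minimal pure-Python sketch of the pooling and normalization steps, using toy vectors rather than real model outputs:

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping masked (padding) positions."""
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, v in enumerate(vec):
                pooled[i] += v
    return [v / max(count, 1) for v in pooled]

def l2_normalize(vec):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

# Toy example: three "tokens" of dimension 2; the third is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = l2_normalize(mean_pool(tokens, mask))
```

Because the final module L2-normalizes every embedding, cosine similarity between two outputs reduces to a plain dot product.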

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the πŸ€— Hub
model = SentenceTransformer("barealek/peftech-v1-plus")
# Run inference
sentences = [
    'Instruct: Retrieve semantically similar text based on safety profile\nQuery: We will see women as popes  when Islamic men stop raping women, playing tahharush, killing gays, forcing them to cover up, not allowing them to drive, stop performing genital mutilation etc. Jk we know neither will ever happen. Now run along and make my white self a sandwich honey!!',
    'Instruct: Retrieve semantically similar text based on safety profile\nQuery: It\'s official. Canadians have been driven to mental illness. \n\n(The test for civility here is confusing. I rated a comment referring to the left as disgusting and vile as not good but civil. There were no personal names. "Vile" means disgusted, and the poster was disgusted. So ??? If that\'s not OK, why is "Fiberals" allowed? That\'s referring to the Liberals as liars.)',
    "Instruct: Retrieve semantically similar text based on safety profile\nQuery: Right, so instead of proving your claim about the Hawaiian haters in this thread, you instead respond with the typical racist Hawaiian stereotypes about haoles.\n\nI see racism in Hawaii because ignorant racists like you pretend that your blind hatred is something other than what it is.\n\nThe fact is I have said nothing different than the kanaka maoil who were arrested outside of the Na'i Aupuni convention, but the difference is, you can not stand hearing it from someone who you assume does not have the magic blood.\n\nNews flash. Every single person on this planet is indigenous. Get over yourself.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9453, 0.9609],
#         [0.9453, 1.0000, 0.9609],
#         [0.9609, 0.9609, 1.0000]], dtype=torch.bfloat16)
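Since the `Normalize()` module makes every embedding unit-length, the cosine similarity returned by `model.similarity` is just a dot product. A toy illustration with hand-made unit vectors (not real embeddings):

```python
def dot(a, b):
    # For unit-length vectors, the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

a = [0.6, 0.8]  # unit length: 0.36 + 0.64 = 1
b = [0.8, 0.6]
cos_sim = dot(a, b)  # 0.48 + 0.48 = 0.96
```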

Training Details

Training Dataset

Unnamed Dataset

  • Size: 12,868 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min 20 tokens, mean 82.03 tokens, max 256 tokens
    • sentence_1: string; min 20 tokens, mean 84.69 tokens, max 256 tokens
    • label: float; min 0.0, mean 0.51, max 1.0
  • Samples (sentence_0 / sentence_1 / label):
    1. sentence_0: Instruct: Retrieve semantically similar text based on safety profile
       Query: You threatened to kill people when you made this statement. "Except maybe criminals who have to consider whether their next victim is packing heat. Which by the way, I don't. Probably." Too many of you white folks refuse to take responsibility for your choices. I consider the GOP, in the 21st century, to be little more than a white identity death cult. You love being white and armed more than being human beings or Americans. You have made it clear that you have no problem taking a human life. Do you support the state sponsored killing of unarmed black people? BTW, you didn't respond to my comments about the Vietnam war. Are you not brave?
       sentence_1: Instruct: Retrieve semantically similar text based on safety profile
       Query: Shoot the messenger--an ancient tradition.
       label: 1.0
    2. sentence_0: Instruct: Retrieve semantically similar text based on safety profile
       Query: This land is OUR land, David. If Seneca wants to do logging, then do it on private land, the problem is they have over cut their private lands and now they want to get their greedy hands on the last of OUR public forests. I have a job, I have been working since I was eleven years old and am nearing retirement age. I have also been a volunteer for over 25 years working to help us transition our economy and consciousness to one that is truly sustainable. We are in the midst of the most devastating crisis humans have ever faced due to burning fossil fuels, deforestation, over fishing, pollution, chemical farming and over human population. We have created the sixth mass species extinction! Wake up, David and the rest of you who think working to protect our last wildlands- and the biosphere in general- is "wacko." What is "wacko" is those of you who are in denial of this global crisis. Face facts.
       sentence_1: Instruct: Retrieve semantically similar text based on safety profile
       Query: Well that's a new level of ineptitude. They literally didn't get their own memo. I'm sure it was entirely accidental...
       label: 1.0
    3. sentence_0: Instruct: Retrieve semantically similar text based on safety profile
       Query: What's the matter? Did he get too close to the truth for you by characterizing Trump as a rich narcissist, materialist, demagogue who doesn't really have any coherent ''program'' other than staying at the center of attention, expanding his family's wealth and influence, and being applauded by his core constituency?
       sentence_1: Instruct: Retrieve semantically similar text based on safety profile
       Query: He can get a translator. Trudeau just speaks nonsense in both.
       label: 1.0
  • Loss: main.OptimizedContrastiveDistillationLoss
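`OptimizedContrastiveDistillationLoss` is a custom objective defined in the training script (`main`) and is not documented in this card. As a rough, illustrative stand-in only, a similarity-regression loss of the kind commonly used for distillation with soft labels in [0, 1] can be sketched as follows (the function names and toy data below are assumptions, not the actual loss):

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def similarity_regression_loss(pairs, labels):
    # Mean squared error between predicted cosine similarity and the
    # soft label; NOT the actual OptimizedContrastiveDistillationLoss.
    errors = [(cosine(a, b) - t) ** 2 for (a, b), t in zip(pairs, labels)]
    return sum(errors) / len(errors)

pairs = [([1.0, 0.0], [1.0, 0.0]),   # identical -> similarity 1.0
         ([1.0, 0.0], [0.0, 1.0])]   # orthogonal -> similarity 0.0
labels = [1.0, 0.0]
loss = similarity_regression_loss(pairs, labels)  # 0.0: predictions match labels
```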

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
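The non-default settings above map directly onto `SentenceTransformerTrainingArguments`. A configuration sketch, assuming a standard Sentence Transformers training setup (the output directory is a placeholder; everything not listed stays at its default, e.g. learning rate 5e-5, linear schedule, no warmup):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Mirrors the non-default hyperparameters from this card.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    multi_dataset_batch_sampler="round_robin",
)
```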

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.6211 500 0.0186

Framework Versions

  • Python: 3.14.4
  • Sentence Transformers: 5.1.0
  • Transformers: 4.57.6
  • PyTorch: 2.11.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
