SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
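The Pooling module averages the transformer's per-token embeddings over non-padding positions, and Normalize() scales the result to unit length. A minimal NumPy sketch of those two steps (illustrative toy values, not the actual model code):

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token embeddings over non-padding positions, then L2-normalize.

    token_embeddings: (seq_len, dim) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding).
    """
    mask = attention_mask[:, None].astype(float)      # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)    # sum over real tokens only
    pooled = summed / mask.sum()                      # mean over real tokens
    return pooled / np.linalg.norm(pooled)            # unit-length sentence vector

# Toy example: 3 tokens (the last is padding), dim 4
tokens = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [9.0, 9.0, 9.0, 9.0]])  # padding row, must be ignored
vec = mean_pool_and_normalize(tokens, np.array([1, 1, 0]))
print(np.linalg.norm(vec))  # 1.0: embeddings leave the model unit-length
```

Because the masked mean ignores the padding row, the pooled vector is [0.5, 0.5, 0, 0] before normalization.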

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hyrinmansoor/changAI-nomic-embed-text-v1.5-finetuned")
# Run inference
sentences = [
    'Who is responsible for the crane in sharjah?',
    '[FIELD] custodian | [TABLE] tabAsset',
    '[TABLE] tabQuality Inspection | desc: Quality Inspection Record',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6763, -0.0788],
#         [ 0.6763,  1.0000, -0.0196],
#         [-0.0788, -0.0196,  1.0000]])
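Because the model ends with a Normalize() module, its embeddings are unit vectors, so the cosine similarity computed by model.similarity reduces to a plain dot product. A small NumPy sketch with made-up 2-D vectors (stand-ins for real 768-dimensional embeddings) showing how such a similarity matrix is formed and how to pick the best candidate for a query:

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity; each row of `embeddings` is one vector."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T  # dot products of unit vectors = cosines

emb = np.array([[1.0, 0.0],    # stand-in for the query embedding
                [0.8, 0.6],    # a relevant candidate
                [0.0, 1.0]])   # an unrelated candidate
sims = cosine_similarity_matrix(emb)
print(sims[0])                       # query's similarity to every row
best = int(np.argmax(sims[0, 1:]))   # best candidate for the query
print(best)                          # 0 -> the first candidate wins
```

This argmax-over-candidates pattern is how the field/table descriptors in the usage example would be ranked against a natural-language question.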

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine nan
spearman_cosine nan
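NaN correlations usually indicate a degenerate evaluation input, such as constant gold scores: both Pearson and Spearman divide by a standard deviation that is then zero. A minimal illustration (NumPy; the score arrays are made up):

```python
import math
import numpy as np

gold = np.array([1.0, 1.0, 1.0])   # constant labels: zero variance
pred = np.array([0.2, 0.7, 0.9])   # hypothetical model similarity scores

with np.errstate(divide="ignore", invalid="ignore"):
    r = np.corrcoef(gold, pred)[0, 1]  # Pearson r = cov / (std_gold * std_pred)

print(math.isnan(r))  # True: std_gold == 0, so the correlation is undefined
```

Checking the variance of the evaluation labels is a quick way to diagnose this.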

Training Details

Training Dataset

Unnamed Dataset

  • Size: 24,529 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 5 tokens, mean: 11.98 tokens, max: 28 tokens
    • sentence2 (string): min: 7 tokens, mean: 16.47 tokens, max: 41 tokens
  • Samples:
    • sentence1: which customers have pending follow-ups scheduled for today?
      sentence2: [FIELD] status | [TABLE] tabLead | desc: Filters for leads requiring follow-up
    • sentence1: Give me all milestones due this week across active projects.
      sentence2: [TABLE] tabMilestone Tracker | desc: Milestone tracking
    • sentence1: show me cement bags sent to site last week
      sentence2: [FIELD] posting_date | [TABLE] tabDelivery Note | desc: Filters for last week.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
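MultipleNegativesRankingLoss treats each (sentence1, sentence2) row as a positive pair and every other in-batch sentence2 as a negative: it scales the pairwise cosine similarities (here by 20.0) and applies cross-entropy with the matching pair on the diagonal as the target. A self-contained NumPy sketch of that computation (illustrative only, not the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """Multiple-negatives ranking loss over a batch of embedding pairs.

    anchors, positives: (batch, dim); row i of positives matches row i of anchors.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    # Cross-entropy with diagonal labels: pair i is the true match for row i
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
loss_random = mnrl_loss(anchors, rng.normal(size=(4, 8)))   # unrelated pairs
loss_aligned = mnrl_loss(anchors, anchors)                  # perfect pairs
print(loss_aligned < loss_random)  # True: matched pairs score far lower loss
```

Larger batches give the loss more in-batch negatives per positive, which is why it tends to benefit from the batch size of 32 used here.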
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • fp16: True
  • dataloader_drop_last: True
  • dataloader_num_workers: 2
  • load_best_model_at_end: True
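The non-default hyperparameters above map directly onto SentenceTransformerTrainingArguments fields. A hedged reproduction sketch, assuming placeholder names for the base checkpoint, output directory, datasets, and an epoch-level save strategy (none of which the card states):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

# Placeholder: the card does not state which base checkpoint was fine-tuned.
model = SentenceTransformer("base-model-id")

# Toy pairs in the card's (sentence1, sentence2) format; the real dataset has 24,529 rows.
train_dataset = Dataset.from_dict({
    "sentence1": ["show me cement bags sent to site last week"],
    "sentence2": ["[FIELD] posting_date | [TABLE] tabDelivery Note | desc: Filters for last week."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",       # placeholder
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",     # assumed: load_best_model_at_end needs matching strategies
    dataloader_drop_last=True,
    dataloader_num_workers=2,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; a held-out split belongs here
    loss=MultipleNegativesRankingLoss(model, scale=20.0),  # cos_sim is the default similarity_fct
)
trainer.train()
```

This is a configuration sketch under stated assumptions, not the run's actual training script.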

All Hyperparameters

  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 2
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss eval_spearman_cosine
0.0653 50 1.4199 -
0.1305 100 1.1590 -
0.1958 150 1.1426 -
0.2611 200 0.9971 -
0.3264 250 0.9385 -
0.3916 300 0.8183 -
0.4569 350 0.8028 -
0.5222 400 0.7773 -
0.5875 450 0.7353 -
0.6527 500 0.7948 -
0.7180 550 0.7055 -
0.7833 600 0.6792 -
0.8486 650 0.7091 -
0.9138 700 0.6767 -
0.9791 750 0.6608 -
1.0 766 - nan
1.0444 800 0.5842 -
1.1097 850 0.5622 -
1.1749 900 0.5592 -
1.2402 950 0.5860 -
1.3055 1000 0.5701 -
1.3708 1050 0.5920 -
1.4360 1100 0.5430 -
1.5013 1150 0.5260 -
1.5666 1200 0.5249 -
1.6319 1250 0.5350 -
1.6971 1300 0.5543 -
1.7624 1350 0.5162 -
1.8277 1400 0.5190 -
1.8930 1450 0.5258 -
1.9582 1500 0.5090 -
2.0 1532 - nan
2.0235 1550 0.5103 -
2.0888 1600 0.4416 -
2.1540 1650 0.4374 -
2.2193 1700 0.4092 -
2.2846 1750 0.4072 -
2.3499 1800 0.4321 -
2.4151 1850 0.4519 -
2.4804 1900 0.3977 -
2.5457 1950 0.4490 -
2.6110 2000 0.4542 -
2.6762 2050 0.3865 -
2.7415 2100 0.4113 -
2.8068 2150 0.4366 -
2.8721 2200 0.3925 -
2.9373 2250 0.4150 -
3.0 2298 - nan
3.0026 2300 0.4228 -
3.0679 2350 0.3294 -
3.1332 2400 0.3256 -
3.1984 2450 0.3555 -
3.2637 2500 0.3770 -
3.3290 2550 0.3339 -
3.3943 2600 0.3764 -
3.4595 2650 0.3544 -
3.5248 2700 0.3567 -
3.5901 2750 0.3539 -
3.6554 2800 0.3585 -
3.7206 2850 0.3224 -
3.7859 2900 0.3383 -
3.8512 2950 0.3584 -
3.9164 3000 0.3584 -
3.9817 3050 0.3108 -
4.0 3064 - nan
4.0470 3100 0.3361 -
4.1123 3150 0.3048 -
4.1775 3200 0.3154 -
4.2428 3250 0.2866 -
4.3081 3300 0.2981 -
4.3734 3350 0.3149 -
4.4386 3400 0.3128 -
4.5039 3450 0.3237 -
4.5692 3500 0.3054 -
4.6345 3550 0.3015 -
4.6997 3600 0.2885 -
4.7650 3650 0.2745 -
4.8303 3700 0.3062 -
4.8956 3750 0.2996 -
4.9608 3800 0.3058 -
5.0 3830 - nan

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}