SentenceTransformer based on facebook/mcontriever-msmarco

This is a sentence-transformers model fine-tuned from facebook/mcontriever-msmarco. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: facebook/mcontriever-msmarco
  • Maximum Sequence Length: 100 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 100, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
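The Pooling module above is configured for mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the average of the token embeddings, with padding tokens masked out via the attention mask. A minimal pure-Python sketch of that operation, using toy 2-dimensional vectors rather than the real 768-dimensional embeddings:

```python
# Mean pooling: average the token embeddings of a sentence,
# counting only real tokens (attention_mask == 1), not padding.
def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask == 1:
            count += 1
            for i in range(dim):
                summed[i] += emb[i]
    return [s / count for s in summed]

# Toy example: 3 token vectors of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0] — padding vector is ignored
```

Because padding is masked out, the same sentence produces the same embedding regardless of how much it is padded inside a batch.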

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("minjeongB/recomp-mcontriever-kisti")
# Run inference
sentences = [
    '펠릿 이송의 한계는 무엇인가?',  # "What are the limitations of pellet conveying?"
    '그러나 펠릿 이송은 매우 복잡한 유동형태로서 해석 및 실험적 접근 방법을 통해 해결하는 것이 바람직하지만, 지금까지는 주로 현장 경험에 의존하고 있어 효율 향상에 한계가 있어왔다.',  # "However, pellet conveying involves a very complex flow pattern; while it is best addressed through analytical and experimental approaches, practice has so far relied mainly on field experience, limiting efficiency improvements."
    '한계점을 가지고 있다',  # "has limitations"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
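The card lists the similarity function as cosine similarity, so `model.similarity` returns the pairwise cosine similarity matrix of the embeddings. As a reference for what that value is, a self-contained sketch (the function name is illustrative):

```python
import math

# Cosine similarity: dot product of the two vectors divided by the
# product of their L2 norms. model.similarity(embeddings, embeddings)
# computes the full pairwise matrix of these values.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

For semantic search, you would encode a query and a corpus separately and rank corpus entries by this score.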

Training Details

Training Dataset

Unnamed Dataset

  • Size: 462,780 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0 | sentence_1 | sentence_2 |
    |---------|------------|------------|------------|
    | type    | string     | string     | string     |
    | details | min: 6 tokens, mean: 22.32 tokens, max: 100 tokens | min: 4 tokens, mean: 76.74 tokens, max: 100 tokens | min: 6 tokens, mean: 60.36 tokens, max: 100 tokens |
  • Samples:

    | sentence_0 | sentence_1 | sentence_2 |
    |------------|------------|------------|
    | 블록 조립의 품질 및 생산성 향상을 위해 수행된 연구는? | 블록 조립의 품질 및 생산성 향상을 위해서 자동화 용접 로봇, 설비 자동화, 공정 및 공법 시뮬레이션 등에 관한 연구는 다수 수행되어 왔다 (Lee & Kim. | 관한 연구가 문헌상으로 발표되지 않는 이유로는 각 조선소의 생산공정 상일정 오차 범위 내를 만족하는 조립 블록을 여러 개 모으는 탑재 작업에서 블록 조립에서 발생하는 오차를 없애는 후작업이 이루어지고 있기 때문으로 판단된다. |
    | 설마천 시험유역은 어디에 위치하는가? | 설마천 시험유역은 한국건설기술연구원에서 1995년부터 운영해 온 유역으로 경기도 파주시 적성면 설마리와 구읍리에 위치하고 있다 | 설마천 시험유역은 1995년부터 한국건설기술연구원에서 운영하고 있는 유역으로 매우 작은 산지 유역임에도 우량관측소 6개소, 하천수위관측소 2개소(사방교 수위관측소는 2011년 7월 26일∼7월 28일 호우 시 유실), 지하수위관측소 2개소 및 자동기상관측소(AWS) 1개소를 운영하고 있으므로 비교적 많은 관측자료를 확보할 수 있어 향후 다양한 분석이 가능한 유역이다. |
    | 노인의 인지기능을 측정하기 위한 두 가지 도구는 무엇인가? | 첫째 MMSE-K1이라는 도구를 사용하였다. 이는 연령과 교육기간에 따라 인지기능 검사 도구를 이용하여 치매, 인지수준을 검사할 때 여러 요인에 의해 영향을 받는데 연령, 지능, 교육수준, 성별 등이다. | 권과 박(1989)은 정상노인에서 MMSE-K가 25,1± 3.9로 밝히고 있으나 본 연구 대상자는 이보다 높아 26.92±1.69를 나타내고 있다. |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 1,
        "similarity_fct": "dot_score"
    }
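With `scale: 1` and `similarity_fct: dot_score`, MultipleNegativesRankingLoss reduces to cross-entropy over raw in-batch dot products: each anchor's positive must outscore every other candidate in the batch (including any explicit hard negatives from `sentence_2`). A pure-Python sketch of one batch, with an illustrative helper name:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# MultipleNegativesRankingLoss for one batch: candidates[i] is the
# positive for anchors[i]; all other candidates (including appended
# hard negatives) act as negatives. With scale=1 and dot_score the
# logits are raw dot products, and the loss is cross-entropy with
# target index i for anchor i.
def mnr_loss(anchors, candidates, scale=1.0):
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * dot(a, c) for c in candidates]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_z)  # -log softmax probability of the positive
    return total / len(anchors)

# Toy batch: 2 anchors, their 2 positives, plus 1 hard negative.
anchors = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # last entry = hard negative
print(mnr_loss(anchors, candidates))
```

This is why larger batches tend to help with this loss: every extra in-batch example is a free negative for every anchor.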
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
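With `lr_scheduler_type: linear` and zero warmup, the learning rate decays linearly from 5e-05 to 0 over training. A sketch of that schedule, assuming the standard Hugging Face linear-with-warmup shape (the step count below is arithmetic from the card: ceil(462,780 / 64) = 7,231 steps per epoch, times 10 epochs ≈ 72,310 steps, consistent with the training logs ending near step 72,000 at epoch 9.96):

```python
# Linear LR schedule: linear warmup (none here), then linear decay to 0.
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

total = 72310  # ~= ceil(462780 / 64) * 10 epochs
print(linear_lr(0, total))            # full base LR (5e-05) at the start
print(linear_lr(total // 2, total))   # half the base LR at the midpoint
print(linear_lr(total, total))        # 0.0 at the final step
```

The steadily shrinking losses in the log below reflect both convergence and this decaying learning rate.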

Training Logs

Epoch Step Training Loss
0.0691 500 1.2496
0.1383 1000 0.5955
0.2074 1500 0.5039
0.2766 2000 0.4379
0.3457 2500 0.4055
0.4149 3000 0.3666
0.4840 3500 0.348
0.5532 4000 0.3099
0.6223 4500 0.2886
0.6915 5000 0.2592
0.7606 5500 0.2563
0.8298 6000 0.2291
0.8989 6500 0.2167
0.9681 7000 0.2001
1.0372 7500 0.1795
1.1063 8000 0.1616
1.1755 8500 0.1286
1.2446 9000 0.1195
1.3138 9500 0.1141
1.3829 10000 0.1067
1.4521 10500 0.1051
1.5212 11000 0.1007
1.5904 11500 0.0952
1.6595 12000 0.0861
1.7287 12500 0.0851
1.7978 13000 0.0883
1.8670 13500 0.0782
1.9361 14000 0.0776
2.0053 14500 0.0709
2.0744 15000 0.0702
2.1435 15500 0.0642
2.2127 16000 0.0548
2.2818 16500 0.0534
2.3510 17000 0.0539
2.4201 17500 0.0518
2.4893 18000 0.0544
2.5584 18500 0.0459
2.6276 19000 0.0476
2.6967 19500 0.0436
2.7659 20000 0.047
2.8350 20500 0.0403
2.9042 21000 0.0438
2.9733 21500 0.0439
3.0425 22000 0.035
3.1116 22500 0.0399
3.1807 23000 0.0335
3.2499 23500 0.0307
3.3190 24000 0.0334
3.3882 24500 0.032
3.4573 25000 0.0308
3.5265 25500 0.0282
3.5956 26000 0.0279
3.6648 26500 0.0281
3.7339 27000 0.0274
3.8031 27500 0.0284
3.8722 28000 0.0297
3.9414 28500 0.0289
4.0105 29000 0.0242
4.0797 29500 0.0246
4.1488 30000 0.0237
4.2180 30500 0.0211
4.2871 31000 0.0217
4.3562 31500 0.021
4.4254 32000 0.0206
4.4945 32500 0.0215
4.5637 33000 0.0176
4.6328 33500 0.0198
4.7020 34000 0.0194
4.7711 34500 0.0184
4.8403 35000 0.0189
4.9094 35500 0.0185
4.9786 36000 0.0184
5.0477 36500 0.0154
5.1169 37000 0.0168
5.1860 37500 0.0153
5.2552 38000 0.0156
5.3243 38500 0.0135
5.3934 39000 0.0126
5.4626 39500 0.0151
5.5317 40000 0.0139
5.6009 40500 0.0133
5.6700 41000 0.0119
5.7392 41500 0.0123
5.8083 42000 0.0138
5.8775 42500 0.0135
5.9466 43000 0.0112
6.0158 43500 0.011
6.0849 44000 0.0109
6.1541 44500 0.0116
6.2232 45000 0.0106
6.2924 45500 0.0109
6.3615 46000 0.01
6.4306 46500 0.0107
6.4998 47000 0.0095
6.5689 47500 0.0109
6.6381 48000 0.0092
6.7072 48500 0.0079
6.7764 49000 0.0102
6.8455 49500 0.0091
6.9147 50000 0.0091
6.9838 50500 0.0083
7.0530 51000 0.0072
7.1221 51500 0.0086
7.1913 52000 0.0087
7.2604 52500 0.0069
7.3296 53000 0.0084
7.3987 53500 0.0077
7.4678 54000 0.0078
7.5370 54500 0.0062
7.6061 55000 0.0064
7.6753 55500 0.006
7.7444 56000 0.0092
7.8136 56500 0.0063
7.8827 57000 0.007
7.9519 57500 0.005
8.0210 58000 0.0048
8.0902 58500 0.0062
8.1593 59000 0.0056
8.2285 59500 0.0045
8.2976 60000 0.0061
8.3668 60500 0.0066
8.4359 61000 0.0053
8.5050 61500 0.006
8.5742 62000 0.0059
8.6433 62500 0.0049
8.7125 63000 0.0038
8.7816 63500 0.0046
8.8508 64000 0.0045
8.9199 64500 0.0038
8.9891 65000 0.004
9.0582 65500 0.0036
9.1274 66000 0.0047
9.1965 66500 0.0039
9.2657 67000 0.0041
9.3348 67500 0.0038
9.4040 68000 0.0057
9.4731 68500 0.004
9.5422 69000 0.0044
9.6114 69500 0.0042
9.6805 70000 0.0038
9.7497 70500 0.0029
9.8188 71000 0.0035
9.8880 71500 0.0032
9.9571 72000 0.0037

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 0.2B params (F32, Safetensors)
Model: minjeongB/recomp-mcontriever-kisti (fine-tuned from facebook/mcontriever-msmarco)