SentenceTransformer based on facebook/mcontriever-msmarco

This is a sentence-transformers model fine-tuned from facebook/mcontriever-msmarco. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: facebook/mcontriever-msmarco
  • Maximum Sequence Length: 100 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 100, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
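The Pooling module above is configured for mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the average of the token embeddings, with padding tokens masked out via the attention mask. A minimal pure-Python sketch of that operation, using toy 2-dimensional vectors rather than the real 768-dimensional embeddings:

```python
# Mean pooling: average the token embeddings of a sentence,
# counting only real tokens (attention_mask == 1), not padding.
def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask == 1:
            count += 1
            for i in range(dim):
                summed[i] += emb[i]
    return [s / count for s in summed]

# Toy example: 3 token vectors of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0] — padding vector is ignored
```

Because padding is masked out, the same sentence produces the same embedding regardless of how much it is padded inside a batch.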

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("minjeongB/recomp-mcontriever-kisti")
# Run inference
sentences = [
    '펠릿 이송의 한계는 무엇인가?',  # "What are the limitations of pellet conveying?"
    '그러나 펠릿 이송은 매우 복잡한 유동형태로서 해석 및 실험적 접근 방법을 통해 해결하는 것이 바람직하지만, 지금까지는 주로 현장 경험에 의존하고 있어 효율 향상에 한계가 있어왔다.',  # "However, pellet conveying involves a very complex flow pattern; while it is best addressed through analytical and experimental approaches, practice has so far relied mainly on field experience, limiting efficiency improvements."
    '한계점을 가지고 있다',  # "has limitations"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
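The card lists the similarity function as cosine similarity, so `model.similarity` returns the pairwise cosine similarity matrix of the embeddings. As a reference for what that value is, a self-contained sketch (the function name is illustrative):

```python
import math

# Cosine similarity: dot product of the two vectors divided by the
# product of their L2 norms. model.similarity(embeddings, embeddings)
# computes the full pairwise matrix of these values.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

For semantic search, you would encode a query and a corpus separately and rank corpus entries by this score.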

Training Details

Training Dataset

Unnamed Dataset

  • Size: 462,780 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0 | sentence_1 | sentence_2 |
    |---------|------------|------------|------------|
    | type    | string     | string     | string     |
    | details | min: 6 tokens, mean: 22.32 tokens, max: 100 tokens | min: 4 tokens, mean: 76.74 tokens, max: 100 tokens | min: 6 tokens, mean: 60.36 tokens, max: 100 tokens |
  • Samples:

    | sentence_0 | sentence_1 | sentence_2 |
    |------------|------------|------------|
    | 블록 조립의 품질 및 생산성 향상을 위해 수행된 연구는? | 블록 조립의 품질 및 생산성 향상을 위해서 자동화 용접 로봇, 설비 자동화, 공정 및 공법 시뮬레이션 등에 관한 연구는 다수 수행되어 왔다 (Lee & Kim. | 관한 연구가 문헌상으로 발표되지 않는 이유로는 각 조선소의 생산공정 상일정 오차 범위 내를 만족하는 조립 블록을 여러 개 모으는 탑재 작업에서 블록 조립에서 발생하는 오차를 없애는 후작업이 이루어지고 있기 때문으로 판단된다. |
    | 설마천 시험유역은 어디에 위치하는가? | 설마천 시험유역은 한국건설기술연구원에서 1995년부터 운영해 온 유역으로 경기도 파주시 적성면 설마리와 구읍리에 위치하고 있다 | 설마천 시험유역은 1995년부터 한국건설기술연구원에서 운영하고 있는 유역으로 매우 작은 산지 유역임에도 우량관측소 6개소, 하천수위관측소 2개소(사방교 수위관측소는 2011년 7월 26일∼7월 28일 호우 시 유실), 지하수위관측소 2개소 및 자동기상관측소(AWS) 1개소를 운영하고 있으므로 비교적 많은 관측자료를 확보할 수 있어 향후 다양한 분석이 가능한 유역이다. |
    | 노인의 인지기능을 측정하기 위한 두 가지 도구는 무엇인가? | 첫째 MMSE-K1이라는 도구를 사용하였다. 이는 연령과 교육기간에 따라 인지기능 검사 도구를 이용하여 치매, 인지수준을 검사할 때 여러 요인에 의해 영향을 받는데 연령, 지능, 교육수준, 성별 등이다. | 권과 박(1989)은 정상노인에서 MMSE-K가 25,1± 3.9로 밝히고 있으나 본 연구 대상자는 이보다 높아 26.92±1.69를 나타내고 있다. |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 1,
        "similarity_fct": "dot_score"
    }
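With `scale: 1` and `similarity_fct: dot_score`, MultipleNegativesRankingLoss reduces to cross-entropy over raw in-batch dot products: each anchor's positive must outscore every other candidate in the batch (including any explicit hard negatives from `sentence_2`). A pure-Python sketch of one batch, with an illustrative helper name:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# MultipleNegativesRankingLoss for one batch: candidates[i] is the
# positive for anchors[i]; all other candidates (including appended
# hard negatives) act as negatives. With scale=1 and dot_score the
# logits are raw dot products, and the loss is cross-entropy with
# target index i for anchor i.
def mnr_loss(anchors, candidates, scale=1.0):
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * dot(a, c) for c in candidates]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_z)  # -log softmax probability of the positive
    return total / len(anchors)

# Toy batch: 2 anchors, their 2 positives, plus 1 hard negative.
anchors = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # last entry = hard negative
print(mnr_loss(anchors, candidates))
```

This is why larger batches tend to help with this loss: every extra in-batch example is a free negative for every anchor.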
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
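With `lr_scheduler_type: linear` and zero warmup, the learning rate decays linearly from 5e-05 to 0 over training. A sketch of that schedule, assuming the standard Hugging Face linear-with-warmup shape (the step count below is arithmetic from the card: ceil(462,780 / 64) = 7,231 steps per epoch, times 10 epochs ≈ 72,310 steps, consistent with the training logs ending near step 72,000 at epoch 9.96):

```python
# Linear LR schedule: linear warmup (none here), then linear decay to 0.
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

total = 72310  # ~= ceil(462780 / 64) * 10 epochs
print(linear_lr(0, total))            # full base LR (5e-05) at the start
print(linear_lr(total // 2, total))   # half the base LR at the midpoint
print(linear_lr(total, total))        # 0.0 at the final step
```

The steadily shrinking losses in the log below reflect both convergence and this decaying learning rate.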

Training Logs

Epoch Step Training Loss
0.0691 500 1.2496
0.1383 1000 0.5955
0.2074 1500 0.5039
0.2766 2000 0.4379
0.3457 2500 0.4055
0.4149 3000 0.3666
0.4840 3500 0.348
0.5532 4000 0.3099
0.6223 4500 0.2886
0.6915 5000 0.2592
0.7606 5500 0.2563
0.8298 6000 0.2291
0.8989 6500 0.2167
0.9681 7000 0.2001
1.0372 7500 0.1795
1.1063 8000 0.1616
1.1755 8500 0.1286
1.2446 9000 0.1195
1.3138 9500 0.1141
1.3829 10000 0.1067
1.4521 10500 0.1051
1.5212 11000 0.1007
1.5904 11500 0.0952
1.6595 12000 0.0861
1.7287 12500 0.0851
1.7978 13000 0.0883
1.8670 13500 0.0782
1.9361 14000 0.0776
2.0053 14500 0.0709
2.0744 15000 0.0702
2.1435 15500 0.0642
2.2127 16000 0.0548
2.2818 16500 0.0534
2.3510 17000 0.0539
2.4201 17500 0.0518
2.4893 18000 0.0544
2.5584 18500 0.0459
2.6276 19000 0.0476
2.6967 19500 0.0436
2.7659 20000 0.047
2.8350 20500 0.0403
2.9042 21000 0.0438
2.9733 21500 0.0439
3.0425 22000 0.035
3.1116 22500 0.0399
3.1807 23000 0.0335
3.2499 23500 0.0307
3.3190 24000 0.0334
3.3882 24500 0.032
3.4573 25000 0.0308
3.5265 25500 0.0282
3.5956 26000 0.0279
3.6648 26500 0.0281
3.7339 27000 0.0274
3.8031 27500 0.0284
3.8722 28000 0.0297
3.9414 28500 0.0289
4.0105 29000 0.0242
4.0797 29500 0.0246
4.1488 30000 0.0237
4.2180 30500 0.0211
4.2871 31000 0.0217
4.3562 31500 0.021
4.4254 32000 0.0206
4.4945 32500 0.0215
4.5637 33000 0.0176
4.6328 33500 0.0198
4.7020 34000 0.0194
4.7711 34500 0.0184
4.8403 35000 0.0189
4.9094 35500 0.0185
4.9786 36000 0.0184
5.0477 36500 0.0154
5.1169 37000 0.0168
5.1860 37500 0.0153
5.2552 38000 0.0156
5.3243 38500 0.0135
5.3934 39000 0.0126
5.4626 39500 0.0151
5.5317 40000 0.0139
5.6009 40500 0.0133
5.6700 41000 0.0119
5.7392 41500 0.0123
5.8083 42000 0.0138
5.8775 42500 0.0135
5.9466 43000 0.0112
6.0158 43500 0.011
6.0849 44000 0.0109
6.1541 44500 0.0116
6.2232 45000 0.0106
6.2924 45500 0.0109
6.3615 46000 0.01
6.4306 46500 0.0107
6.4998 47000 0.0095
6.5689 47500 0.0109
6.6381 48000 0.0092
6.7072 48500 0.0079
6.7764 49000 0.0102
6.8455 49500 0.0091
6.9147 50000 0.0091
6.9838 50500 0.0083
7.0530 51000 0.0072
7.1221 51500 0.0086
7.1913 52000 0.0087
7.2604 52500 0.0069
7.3296 53000 0.0084
7.3987 53500 0.0077
7.4678 54000 0.0078
7.5370 54500 0.0062
7.6061 55000 0.0064
7.6753 55500 0.006
7.7444 56000 0.0092
7.8136 56500 0.0063
7.8827 57000 0.007
7.9519 57500 0.005
8.0210 58000 0.0048
8.0902 58500 0.0062
8.1593 59000 0.0056
8.2285 59500 0.0045
8.2976 60000 0.0061
8.3668 60500 0.0066
8.4359 61000 0.0053
8.5050 61500 0.006
8.5742 62000 0.0059
8.6433 62500 0.0049
8.7125 63000 0.0038
8.7816 63500 0.0046
8.8508 64000 0.0045
8.9199 64500 0.0038
8.9891 65000 0.004
9.0582 65500 0.0036
9.1274 66000 0.0047
9.1965 66500 0.0039
9.2657 67000 0.0041
9.3348 67500 0.0038
9.4040 68000 0.0057
9.4731 68500 0.004
9.5422 69000 0.0044
9.6114 69500 0.0042
9.6805 70000 0.0038
9.7497 70500 0.0029
9.8188 71000 0.0035
9.8880 71500 0.0032
9.9571 72000 0.0037

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 0.2B params (F32, Safetensors)
Model: minjeongB/recomp-mcontriever-kisti (fine-tuned from facebook/mcontriever-msmarco)