SentenceTransformer based on sergeyzh/rubert-mini-frida

This is a sentence-transformers model fine-tuned from sergeyzh/rubert-mini-frida on the duplicates-checker-finetuning-preview dataset. It maps sentences and paragraphs to a 312-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sergeyzh/rubert-mini-frida
Maximum Sequence Length: 2048 tokens
Output Dimensionality: 312 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- duplicates-checker-finetuning-preview

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
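
The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes the token vectors have already been produced by the Transformer module; the helper names `mean_pool` and `l2_normalize` are hypothetical and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention mask is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length, as a Normalize step would."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the first two vectors
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
```

Because the final vectors have unit length, the cosine similarity of two embeddings equals their dot product.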

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (the model ID below is a placeholder)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'USSD-команда для проверки баланса СберМобайл - *100#.',
    'Чтобы узнать баланс СберМобайл, наберите *100#.',
    'statement_statement',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 312)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
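
For duplicate checking, one possible way to turn a similarity matrix into decisions is to apply a cosine threshold. The sketch below is an illustration rather than part of the library: `find_duplicate_pairs` is a hypothetical helper, the default threshold 0.7227 is the `cosine_accuracy_threshold` reported for binary-sts-test in the Evaluation section, and the toy matrix stands in for `model.similarity(embeddings, embeddings).tolist()`.

```python
def find_duplicate_pairs(similarities, threshold=0.7227):
    """Return (i, j, score) for every pair i < j whose score reaches the threshold."""
    pairs = []
    n = len(similarities)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: skip self-pairs and mirrors
            score = similarities[i][j]
            if score >= threshold:
                pairs.append((i, j, score))
    # Most similar pairs first
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Toy similarity matrix (made-up values, not real model output)
toy_similarities = [
    [1.00, 0.91, 0.12],
    [0.91, 1.00, 0.08],
    [0.12, 0.08, 1.00],
]
duplicates = find_duplicate_pairs(toy_similarities)
```

The threshold is an assumption to tune on your own data; the reported validation and test thresholds differ noticeably.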

Evaluation

Metrics

Binary Classification

Datasets: binary-sts-validation and binary-sts-test
Evaluated with BinaryClassificationEvaluator

Metric	binary-sts-validation	binary-sts-test
cosine_accuracy	0.911	0.8926
cosine_accuracy_threshold	0.6444	0.7227
cosine_f1	0.9143	0.8932
cosine_f1_threshold	0.5794	0.7205
cosine_precision	0.8858	0.8881
cosine_recall	0.9448	0.8984
cosine_ap	0.9112	0.9168
cosine_mcc	0.8237	0.7853
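
As a rough illustration of how the threshold-based metrics in this table relate to each other, the sketch below computes accuracy, precision, recall, F1, and MCC from cosine scores at a fixed threshold. It is a simplified stand-in, not the BinaryClassificationEvaluator implementation; `binary_metrics` and the toy scores are hypothetical.

```python
import math

def binary_metrics(scores, labels, threshold):
    """Classify score >= threshold as a duplicate (1) and compare with the labels."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }

# Toy cosine scores and gold labels (made up for illustration)
metrics = binary_metrics([0.9, 0.8, 0.6, 0.3], [1, 0, 1, 0], threshold=0.7)

# Consistency check on the table: F1 is the harmonic mean of precision and recall.
reported_f1 = 2 * 0.8858 * 0.9448 / (0.8858 + 0.9448)  # close to the reported 0.9143
```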

Training Details

Training Dataset

duplicates-checker-finetuning-preview

Dataset: duplicates-checker-finetuning-preview
Size: 6,921 training samples
Columns: sentence1, sentence2, label, task_type, product, and stratify_col

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label	task_type	product	stratify_col
type	string	string	int	string	string	string
details	min: 4 tokens mean: 17.09 tokens max: 36 tokens	min: 5 tokens mean: 17.8 tokens max: 37 tokens	0: ~49.60% 1: ~50.40%	min: 5 tokens mean: 5.67 tokens max: 7 tokens	min: 3 tokens mean: 5.4 tokens max: 9 tokens	min: 9 tokens mean: 12.08 tokens max: 17 tokens

Samples:

sentence1	sentence2	label	task_type	product	stratify_col
`Облигации Федерального Займа выпускает Министерство финансов РФ, а не Центральный Банк.`	`Облигации Федерального Займа выпускает Министерство финансов РФ, а не СберБанк.`	`0`	`correction_correction`	`Облигации`	`0_correction_correction_Облигации`
`Льгота на долгосрочное владение паями ОПИФ действует при владении более 3 лет, а не 1 года.`	`Лимит дохода для ЛДВ по ОПИФ составляет 3 млн рублей за каждый год владения, а не 1 млн.`	`0`	`correction_correction`	`Открытый паевой инвестиционный фонд`	`0_correction_correction_Открытый паевой инвестиционный фонд`
`Продажа паев ЗПИФ на бирже не требует поиска покупателя, в отличие от продажи по договору купли-продажи.`	`Потенциальный доход от фонда Современный 8 включает рентный доход и доход от роста стоимости, а не только рентный.`	`0`	`correction_correction`	`Закрытый паевой инвестиционный фонд`	`0_correction_correction_Закрытый паевой инвестиционный фонд`

Loss: CosineSimilarityLoss with these parameters:

{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
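
To make the objective concrete: with `MSELoss` as `loss_fct`, this loss is the mean squared error between the cosine similarity of the two sentence embeddings and the label. The pure-Python sketch below shows that computation on toy vectors; the function names are hypothetical and the library itself operates on batched tensors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cos(u, v) and the target label for each pair."""
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Toy embeddings: the first pair is identical, the second pair is orthogonal.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]

loss_correct = cosine_similarity_loss(pairs, [1.0, 0.0])  # labels agree with geometry
loss_wrong = cosine_similarity_loss(pairs, [0.0, 1.0])    # labels contradict geometry
```

With labels of 0 and 1, as in this dataset, the loss pushes duplicates toward cosine 1 and non-duplicates toward cosine 0.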

Evaluation Dataset

duplicates-checker-finetuning-preview

Dataset: duplicates-checker-finetuning-preview
Size: 865 evaluation samples
Columns: sentence1, sentence2, label, task_type, product, and stratify_col

Approximate statistics based on the first 865 samples:

	sentence1	sentence2	label	task_type	product	stratify_col
type	string	string	int	string	string	string
details	min: 7 tokens mean: 17.05 tokens max: 36 tokens	min: 7 tokens mean: 17.79 tokens max: 33 tokens	0: ~49.71% 1: ~50.29%	min: 5 tokens mean: 5.69 tokens max: 7 tokens	min: 3 tokens mean: 5.41 tokens max: 9 tokens	min: 9 tokens mean: 12.1 tokens max: 17 tokens

Samples:

sentence1	sentence2	label	task_type	product	stratify_col
`Какой тариф Сбера подходит для начинающих инвесторов на ИИС-3?`	`Какой тарифный план Сбера рекомендован для новичков, использующих ИИС-3?`	`1`	`question_question`	`Индивидуальный инвестиционный счёт`	`1_question_question_Индивидуальный инвестиционный счёт`
`Какие типы кредитных карт Сбера вы предлагаете, и какие преимущества у каждой из них?`	`Расскажите о видах Кредитных СберКарт и их плюсах.`	`1`	`question_question`	`Кредитная СберКарта`	`1_question_question_Кредитная СберКарта`
`При отсутствии трудовой книжки стаж подтверждается справками из архива.`	`При отсутствии трудовой книжки стаж подтверждается устными показаниями свидетелей.`	`0`	`statement_statement`	`Перевод пенсии`	`0_statement_statement_Перевод пенсии`

Loss: CosineSimilarityLoss with these parameters:

{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 9.98500910083967e-05
weight_decay: 0.27015230802651624
num_train_epochs: 25
warmup_ratio: 0.13341980194519668
load_best_model_at_end: True
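
The effect of `learning_rate`, `warmup_ratio`, and the linear scheduler can be sketched as follows. This is an approximation under stated assumptions: 217 optimizer steps per epoch is inferred from the training log (step 50 at epoch 0.2304), not stated in the card, and the warmup step count is assumed to be the ceiling of `total_steps * warmup_ratio`.

```python
import math

PEAK_LR = 9.98500910083967e-05
WARMUP_RATIO = 0.13341980194519668
NUM_EPOCHS = 25
STEPS_PER_EPOCH = 217  # assumption: inferred from the log, 50 / 0.2304 is about 217

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def learning_rate(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)

# Sample the schedule at the start, the end of warmup, the midpoint, and the end.
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{learning_rate(step):.3e}")
```

Under these assumptions, warmup covers roughly the first 3.3 epochs before the rate starts to decay.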

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 9.98500910083967e-05
weight_decay: 0.27015230802651624
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 25
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.13341980194519668
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	binary-sts-validation_cosine_ap	binary-sts-test_cosine_ap
0.2304	50	0.2346	-	-	-
0.4608	100	0.2214	0.2321	0.7873	-
0.6912	150	0.193	-	-	-
0.9217	200	0.1788	0.1722	0.8259	-
1.1521	250	0.1643	-	-	-
1.3825	300	0.1579	0.1469	0.8467	-
1.6129	350	0.1499	-	-	-
1.8433	400	0.1429	0.1371	0.8447	-
2.0737	450	0.1299	-	-	-
2.3041	500	0.1216	0.1261	0.8494	-
2.5346	550	0.121	-	-	-
2.7650	600	0.1099	0.1182	0.8761	-
2.9954	650	0.115	-	-	-
3.2258	700	0.0932	0.1114	0.8760	-
3.4562	750	0.0926	-	-	-
3.6866	800	0.0878	0.1068	0.8873	-
3.9171	850	0.0897	-	-	-
4.1475	900	0.0733	0.1013	0.9007	-
4.3779	950	0.069	-	-	-
4.6083	1000	0.0683	0.0987	0.8955	-
4.8387	1050	0.0706	-	-	-
5.0691	1100	0.0643	0.0962	0.8999	-
5.2995	1150	0.0541	-	-	-
5.5300	1200	0.0558	0.0933	0.9067	-
5.7604	1250	0.0572	-	-	-
5.9908	1300	0.0579	0.0928	0.9040	-
6.2212	1350	0.0434	-	-	-
6.4516	1400	0.047	0.0938	0.9049	-
6.6820	1450	0.0466	-	-	-
6.9124	1500	0.044	0.0917	0.9062	-
7.1429	1550	0.0395	-	-	-
7.3733	1600	0.0365	0.0876	0.9117	-
7.6037	1650	0.0368	-	-	-
7.8341	1700	0.0372	0.0882	0.9116	-
8.0645	1750	0.0393	-	-	-
8.2949	1800	0.0312	0.0856	0.9112	-
8.5253	1850	0.0315	-	-	-
8.7558	1900	0.0311	0.0860	0.9116	-
8.9862	1950	0.0341	-	-	-
9.2166	2000	0.0272	0.0850	0.9153	-
9.4470	2050	0.0272	-	-	-
9.6774	2100	0.0244	0.0876	0.9117	-
9.9078	2150	0.0284	-	-	-
10.1382	2200	0.0232	0.0860	0.9167	-
10.3687	2250	0.0253	-	-	-
10.5991	2300	0.0228	0.0856	0.9166	-
10.8295	2350	0.0224	-	-	-
11.0599	2400	0.0257	0.0856	0.9156	-
11.2903	2450	0.019	-	-	-
11.5207	2500	0.0187	0.0870	0.9129	-
11.7512	2550	0.0228	-	-	-
11.9816	2600	0.0214	0.0858	0.9173	-
12.2120	2650	0.0181	-	-	-
12.4424	2700	0.0197	0.0850	0.9249	-
12.6728	2750	0.0186	-	-	-
12.9032	2800	0.0174	0.0872	0.9233	-
13.1336	2850	0.0186	-	-	-
13.3641	2900	0.0132	0.0851	0.9280	-
13.5945	2950	0.0151	-	-	-
13.8249	3000	0.0184	0.0865	0.9210	-
14.0553	3050	0.0168	-	-	-
14.2857	3100	0.0136	0.0849	0.9252	-
14.5161	3150	0.0161	-	-	-
14.7465	3200	0.0157	0.0826	0.9318	-
14.9770	3250	0.0168	-	-	-
15.2074	3300	0.0134	0.0842	0.9302	-
15.4378	3350	0.0133	-	-	-
15.6682	3400	0.0129	0.0852	0.9263	-
15.8986	3450	0.0146	-	-	-
16.1290	3500	0.0121	0.0847	0.9274	-
16.3594	3550	0.0104	-	-	-
16.5899	3600	0.012	0.0840	0.9299	-
16.8203	3650	0.0119	-	-	-
17.0507	3700	0.0137	0.0852	0.9292	-
17.2811	3750	0.012	-	-	-
17.5115	3800	0.0118	0.0843	0.9281	-
17.7419	3850	0.0122	-	-	-
17.9724	3900	0.0106	0.0852	0.9280	-
18.2028	3950	0.0112	-	-	-
18.4332	4000	0.0099	0.0847	0.9311	-
18.6636	4050	0.0093	-	-	-
18.8940	4100	0.012	0.0860	0.9304	-
19.1244	4150	0.0107	-	-	-
19.3548	4200	0.0105	0.0852	0.9289	-
19.5853	4250	0.0092	-	-	-
19.8157	4300	0.0101	0.0860	0.9303	-
20.0461	4350	0.0099	-	-	-
20.2765	4400	0.01	0.0856	0.9319	-
20.5069	4450	0.0108	-	-	-
20.7373	4500	0.0084	0.0853	0.9301	-
20.9677	4550	0.0097	-	-	-
21.1982	4600	0.0071	0.0849	0.9308	-
21.4286	4650	0.0088	-	-	-
21.6590	4700	0.0094	0.0850	0.9310	-
21.8894	4750	0.0085	-	-	-
22.1198	4800	0.0099	0.0856	0.9304	-
22.3502	4850	0.0091	-	-	-
22.5806	4900	0.0086	0.0851	0.9309	-
22.8111	4950	0.0082	-	-	-
23.0415	5000	0.008	0.0857	0.9305	-
23.2719	5050	0.0084	-	-	-
23.5023	5100	0.0084	0.0855	0.9305	-
23.7327	5150	0.0078	-	-	-
23.9631	5200	0.0086	0.0857	0.9303	-
24.1935	5250	0.0082	-	-	-
24.4240	5300	0.0078	0.0855	0.9306	-
24.6544	5350	0.0077	-	-	-
24.8848	5400	0.0074	0.0855	0.9305	-
-1	-1	-	-	0.9112	0.9168

The bold row denotes the saved checkpoint.

Framework Versions

Python: 3.11.13
Sentence Transformers: 4.1.0
Transformers: 4.52.4
PyTorch: 2.6.0+cu124
Accelerate: 1.8.1
Datasets: 3.6.0
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}