SentenceTransformer

This is a sentence-transformers model trained on the csv dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Maximum Sequence Length: 8194 tokens
Output Dimensionality: 1024 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- csv

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): XLMRobertaLoRA(
      (roberta): XLMRobertaModel(
        (embeddings): XLMRobertaEmbeddings(
          (word_embeddings): ParametrizedEmbedding(
            250002, 1024, padding_idx=1
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (token_type_embeddings): ParametrizedEmbedding(
            1, 1024
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
        )
        (emb_drop): Dropout(p=0.1, inplace=False)
        (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): XLMRobertaEncoder(
          (layers): ModuleList(
            (0-23): 24 x Block(
              (mixer): MHA(
                (rotary_emb): RotaryEmbedding()
                (Wqkv): ParametrizedLinearResidual(
                  in_features=1024, out_features=3072, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (inner_attn): FlashSelfAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (inner_cross_attn): FlashCrossAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (out_proj): ParametrizedLinear(
                  in_features=1024, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout1): Dropout(p=0.1, inplace=False)
              (drop_path1): StochasticDepth(p=0.0, mode=row)
              (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): ParametrizedLinear(
                  in_features=1024, out_features=4096, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (fc2): ParametrizedLinear(
                  in_features=4096, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout2): Dropout(p=0.1, inplace=False)
              (drop_path2): StochasticDepth(p=0.0, mode=row)
              (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (pooler): XLMRobertaPooler(
          (dense): ParametrizedLinear(
            in_features=1024, out_features=1024, bias=True
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (activation): Tanh()
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
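
The last two stages of the architecture, mean pooling (`pooling_mode_mean_tokens: True`) and `Normalize()`, can be illustrated with a minimal pure-Python sketch. The toy vectors have width 4 instead of 1024, and the helper names `mean_pool` and `l2_normalize` are illustrative assumptions, not functions from the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length, mirroring the final Normalize() stage."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

# Toy input: three token vectors of width 4; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # [2.0, 1.0, 0.0, 2.0]
sentence_embedding = l2_normalize(pooled)   # unit-length vector
```

Because the output is unit length, the dot product of two sentence embeddings equals their cosine similarity.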

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("damon6/de_shop_api_v3_jina-embeddings-v3-base-finetuned")
# Run inference
sentences = [
    'DIGITUS Mini GBIC SFP Modul 10G Leistung',
    'Die DIGITUS 10G Mini GBIC SFP Transceiver Module bieten hohe Qualität und Zuverlässigkeit.',
    'eine 325 mm lange GPU',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
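
For semantic search, the similarity scores are typically used to rank candidate documents against a query. The sketch below shows that ranking step in plain Python. The small vectors are hypothetical stand-ins for `model.encode(...)` output, and `rank_documents` is an illustrative helper, not part of the library.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    scored = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical unit-length vectors standing in for real embeddings.
query = [0.6, 0.8, 0.0]
docs = [
    [0.0, 0.0, 1.0],  # unrelated direction
    [0.6, 0.8, 0.0],  # same direction as the query
    [0.8, 0.6, 0.0],  # close to the query
]

ranking = rank_documents(query, docs, top_k=2)
for index, score in ranking:
    print(index, round(score, 2))
```

With real embeddings, the same function can be applied to one row of `model.encode(queries)` and the rows of `model.encode(documents)`.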

Evaluation

Metrics

Information Retrieval

Dataset: dim_1024
Evaluated with InformationRetrievalEvaluator with these parameters:
```
{
    "truncate_dim": 1024
}
```

Metric	Value
cosine_accuracy@1	0.5348
cosine_accuracy@3	0.7476
cosine_accuracy@5	0.8063
cosine_accuracy@10	0.8881
cosine_precision@1	0.5348
cosine_precision@3	0.2492
cosine_precision@5	0.1613
cosine_precision@10	0.0888
cosine_recall@1	0.5348
cosine_recall@3	0.7476
cosine_recall@5	0.8063
cosine_recall@10	0.8881
cosine_ndcg@10	0.7126
cosine_mrr@10	0.6564
cosine_map@100	0.6597
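
In the table above, accuracy@k equals recall@k and precision@k is accuracy@k divided by k. That pattern is what one would expect if every query has exactly one relevant document, which is an assumption here since the card does not describe the evaluation set. Under that assumption, the metrics can be sketched as follows; `ir_metrics` is an illustrative helper, not the evaluator's implementation.

```python
import math

def ir_metrics(ranks, k):
    """Metrics at cutoff k for queries with exactly one relevant document each.

    ranks: 1-based rank of the relevant document per query, or None if it
    was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results is relevant
        "recall": accuracy,         # the single relevant doc is found or not
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # Ideal DCG is 1 with a single relevant doc, so NDCG is 1 / log2(rank + 1).
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

# Toy example: four queries, relevant document found at ranks 1, 2, never, 4.
m3 = ir_metrics([1, 2, None, 4], k=3)

# Reported dim_1024 values: precision@k agrees with accuracy@k / k up to rounding.
reported = {3: (0.7476, 0.2492), 5: (0.8063, 0.1613), 10: (0.8881, 0.0888)}
consistent = all(abs(acc / k - prec) < 5e-5 for k, (acc, prec) in reported.items())
```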

Information Retrieval

Dataset: dim_768
Evaluated with InformationRetrievalEvaluator with these parameters:
```
{
    "truncate_dim": 768
}
```

Metric	Value
cosine_accuracy@1	0.5307
cosine_accuracy@3	0.7381
cosine_accuracy@5	0.8145
cosine_accuracy@10	0.884
cosine_precision@1	0.5307
cosine_precision@3	0.246
cosine_precision@5	0.1629
cosine_precision@10	0.0884
cosine_recall@1	0.5307
cosine_recall@3	0.7381
cosine_recall@5	0.8145
cosine_recall@10	0.884
cosine_ndcg@10	0.7091
cosine_mrr@10	0.6529
cosine_map@100	0.6566
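
The `truncate_dim` setting means that only the leading dimensions of each embedding are scored. A minimal sketch of that idea, using toy 4-dimensional vectors cut to 3 dimensions instead of 1024 cut to 768, might look like this; `truncate_and_normalize` is an illustrative helper name.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy unit-length embeddings (width 4 here, 1024 in the real model).
a = [0.5, 0.5, 0.5, 0.5]
b = [0.5, 0.5, -0.5, 0.5]

full_cosine = dot(a, b)  # cosine over all 4 dimensions
short_cosine = dot(truncate_and_normalize(a, 3), truncate_and_normalize(b, 3))
print(full_cosine, short_cosine)
```

Scores generally shift after truncation, as they do in this toy case, so embeddings should be compared at the same dimensionality. When loading the model with Sentence Transformers, the `truncate_dim` constructor argument should apply the same cut at encode time.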

Training Details

Training Dataset

csv

Dataset: csv
Size: 6,592 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:

	anchor	positive
type	string	string
details	min: 8 tokens mean: 18.13 tokens max: 47 tokens	min: 4 tokens mean: 35.75 tokens max: 565 tokens

Samples:

anchor	positive
`Poly Studio X30 Halterung VESA Wandmontage`	`Poly Studio X30 VESA and Wall Mount.`
`ALOGIC Elements Pro USB-C zu USB-A Kabel`	`Das ALOGIC Elements Pro USB-C zu USB-A Kabel ermöglicht es Ihnen, die neuesten USB-C Geräte wie Telefone, Tablets und Laptops mit Ihrem kompatiblen Zubehör oder Peripheriegerät zu verbinden.`
`Equip VGA Splitter Signal-Bandbreite 450MHz`	`Der Video Splitter bietet eine Signal-Bandbreite von 450MHz.`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        1024,
        768
    ],
    "matryoshka_weights": [
        1,
        1
    ],
    "n_dims_per_step": -1
}
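
The parameters above describe a ranking loss with in-batch negatives that is evaluated once per Matryoshka dimension and summed with the given weights. The following pure-Python sketch shows the shape of that computation on toy vectors. The similarity scale of 20 and the use of cosine similarity are assumptions (the card does not state them), and the function names are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the target for anchors[i] and
    every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated copies of the embeddings."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs with width 4, truncated to 4 and 3
# in place of the card's 1024 and 768.
anchors = [[1.0, 0.0, 0.0, 0.1], [0.0, 1.0, 0.0, 0.1]]
positives = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.2]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 3], weights=[1, 1])
shuffled = matryoshka_loss(anchors, positives[::-1], dims=[4, 3], weights=[1, 1])
print(aligned < shuffled)
```

Correctly paired batches yield a lower loss than mismatched ones at every truncation length, which is what pushes the leading dimensions to carry most of the signal.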

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_train_batch_size: 64
per_device_eval_batch_size: 16
gradient_accumulation_steps: 4
learning_rate: 2e-05
num_train_epochs: 4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: True
tf32: True
load_best_model_at_end: True
optim: adamw_torch_fused
batch_sampler: no_duplicates
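
As a rough illustration of how these settings interact, the sketch below computes the effective batch size and a linear-warmup, cosine-decay learning-rate curve. It assumes a single device and takes the schedule length to be 100 optimizer steps, the last step shown in the training logs. The formula follows the common warmup-plus-cosine shape and may differ in detail from what the trainer computed.

```python
import math

PER_DEVICE_BATCH = 64
GRAD_ACCUM = 4
BASE_LR = 2e-5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 100  # assumption: last step listed in the training logs

# Pairs consumed per optimizer step on one device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at(step):
    """Learning rate after `step` optimizer steps."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

for step in (0, 5, 10, 55, 100):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 after 10 steps and decays to zero by step 100.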

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 4
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: True
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	dim_1024_cosine_ndcg@10	dim_768_cosine_ndcg@10
0.0388	1	2.9827	-	-
0.0777	2	3.3738	-	-
0.1165	3	3.7603	-	-
0.1553	4	3.826	-	-
0.1942	5	3.7338	-	-
0.2330	6	3.3327	-	-
0.2718	7	3.0444	-	-
0.3107	8	2.2803	-	-
0.3495	9	3.3083	-	-
0.3883	10	2.9835	-	-
0.4272	11	2.4352	-	-
0.4660	12	2.1565	-	-
0.5049	13	2.6124	-	-
0.5437	14	2.264	-	-
0.5825	15	1.9145	-	-
0.6214	16	1.8587	-	-
0.6602	17	1.4001	-	-
0.6990	18	1.8256	-	-
0.7379	19	1.1961	-	-
0.7767	20	1.3109	-	-
0.8155	21	1.5597	-	-
0.8544	22	1.4735	-	-
0.8932	23	1.0223	-	-
0.9320	24	1.1257	-	-
0.9709	25	1.3598	-	-
1.0	26	1.1203	-	-
1.0097	27	0.0	0.6978	0.6932
1.0388	28	0.7806	-	-
1.0777	29	1.3211	-	-
1.1165	30	1.4871	-	-
1.1553	31	0.935	-	-
1.1942	32	1.7934	-	-
1.2330	33	1.1227	-	-
1.2718	34	1.3105	-	-
1.3107	35	1.103	-	-
1.3495	36	1.3717	-	-
1.3883	37	0.9901	-	-
1.4272	38	1.3036	-	-
1.4660	39	1.2308	-	-
1.5049	40	1.2515	-	-
1.5437	41	1.1814	-	-
1.5825	42	1.2111	-	-
1.6214	43	0.9332	-	-
1.6602	44	1.3395	-	-
1.6990	45	0.7583	-	-
1.7379	46	1.3086	-	-
1.7767	47	0.9326	-	-
1.8155	48	0.9746	-	-
1.8544	49	0.6618	-	-
1.8932	50	0.7228	-	-
1.9320	51	0.7546	-	-
1.9709	52	1.0044	-	-
2.0	53	0.6009	-	-
2.0097	54	0.0467	0.7122	0.7100
2.0388	55	0.9867	-	-
2.0777	56	0.9411	-	-
2.1165	57	0.8141	-	-
2.1553	58	0.743	-	-
2.1942	59	1.0353	-	-
2.2330	60	1.2375	-	-
2.2718	61	0.9801	-	-
2.3107	62	1.2372	-	-
2.3495	63	0.8672	-	-
2.3883	64	1.0209	-	-
2.4272	65	0.8059	-	-
2.4660	66	0.8108	-	-
2.5049	67	1.1173	-	-
2.5437	68	1.2396	-	-
2.5825	69	0.7141	-	-
2.6214	70	0.9623	-	-
2.6602	71	0.7726	-	-
2.6990	72	1.0766	-	-
2.7379	73	0.8263	-	-
2.7767	74	0.8879	-	-
2.8155	75	1.5984	-	-
2.8544	76	1.0657	-	-
2.8932	77	1.1301	-	-
2.9320	78	0.8932	-	-
2.9709	79	1.0989	-	-
3.0	80	0.7175	-	-
3.0097	81	0.0	0.7123	0.7106
3.0388	82	0.9822	-	-
3.0777	83	0.9128	-	-
3.1165	84	0.8309	-	-
3.1553	85	0.8732	-	-
3.1942	86	1.004	-	-
3.2330	87	0.8509	-	-
3.2718	88	1.3577	-	-
3.3107	89	1.3243	-	-
3.3495	90	0.7953	-	-
3.3883	91	1.0733	-	-
3.4272	92	0.821	-	-
3.4660	93	1.1915	-	-
3.5049	94	1.1763	-	-
3.5437	95	0.9508	-	-
3.5825	96	0.6898	-	-
3.6214	97	0.7401	-	-
3.6602	98	1.1549	-	-
3.6990	99	1.1053	-	-
3.7379	100	0.7245	0.7126	0.7091

Framework Versions

Python: 3.10.16
Sentence Transformers: 4.1.0
Transformers: 4.51.3
PyTorch: 2.6.0+cu124
Accelerate: 1.7.0
Datasets: 3.6.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

Model size: 0.6B params
Tensor type: BF16
