Cross Encoder reranker finetuned from cross-electra-ms-marco-german-uncased

This is a Cross Encoder model finetuned from svalabs/cross-electra-ms-marco-german-uncased using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: svalabs/cross-electra-ms-marco-german-uncased
  • Training data: 38,267 (anchor, positive) pairs with binary relevance labels

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("damon6/reranker-cross-electra-ms-marco-german-uncased-shop_api_v3-bce")
# Get scores for pairs of texts
pairs = [
    ['HPE ANW Networks Startup SVC U4832E Merkmal', 'HPE Care Pack Services sind leicht zu erwerben und zeichnen sich durch hohe Benutzerfreundlichkeit aus.'],
    ['HPE ANW Networks Startup SVC U4832E Merkmal', 'Dieses USB 2.0 Kabel von Delock dient zum Anschluss von verschiedenen USB Geräten, wie z. B. Drucker oder Scanner, an einen freien USB Port.'],
    ['HPE ANW Networks Startup SVC U4832E Merkmal', 'HPE ANW FC 1Y NBD EXCH 7220DC Contr SVC H3FQ1E.'],
    ['HPE ANW Networks Startup SVC U4832E Merkmal', 'Die SanDisk Extreme PRO Portable SSD ist eine robuste, zuverlässige Speicherlösung mit hoher SSD-Performance aus dem Hause SanDisk - der Marke, der professionelle Fotografen aus aller Welt vertrauen.'],
    ['HPE ANW Networks Startup SVC U4832E Merkmal', 'Farbe und schwarze Texte'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'HPE ANW Networks Startup SVC U4832E Merkmal',
    [
        'HPE Care Pack Services sind leicht zu erwerben und zeichnen sich durch hohe Benutzerfreundlichkeit aus.',
        'Dieses USB 2.0 Kabel von Delock dient zum Anschluss von verschiedenen USB Geräten, wie z. B. Drucker oder Scanner, an einen freien USB Port.',
        'HPE ANW FC 1Y NBD EXCH 7220DC Contr SVC H3FQ1E.',
        'Die SanDisk Extreme PRO Portable SSD ist eine robuste, zuverlässige Speicherlösung mit hoher SSD-Performance aus dem Hause SanDisk - der Marke, der professionelle Fotografen aus aller Welt vertrauen.',
        'Farbe und schwarze Texte',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
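Because the training loss used an Identity activation (see Training Details below), the scores returned by predict may be raw logits rather than probabilities, depending on how the model's inference activation is configured. A minimal sketch (plain Python, no model download; the helper name is illustrative) of mapping logits onto calibrated (0, 1) relevance scores:

```python
import math

def to_probability(logit: float) -> float:
    """Map a raw cross-encoder logit to a (0, 1) relevance probability via sigmoid."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical raw scores, as model.predict(pairs) might return them
logits = [4.2, -3.1, 0.0]
probs = [to_probability(x) for x in logits]
print([round(p, 3) for p in probs])  # [0.985, 0.043, 0.5]
```

Either form preserves the ranking order, so this step only matters if downstream code thresholds on an absolute score.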

Evaluation

Metrics

Cross Encoder Reranking

Metric Value
map 0.9977 (+0.9975)
mrr@10 0.9977 (+0.9975)
ndcg@10 0.9983 (+0.9979)
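For context on what the table reports: both metrics are computed per query over the reranked candidate list and then averaged. A minimal sketch of MRR@10 and NDCG@10 for binary relevance labels (function names are illustrative, not part of the evaluator API):

```python
import math

def mrr_at_10(first_relevant_ranks):
    """Mean reciprocal rank of the first relevant hit, cutoff 10 (ranks are 1-indexed)."""
    return sum(1.0 / r if r <= 10 else 0.0 for r in first_relevant_ranks) / len(first_relevant_ranks)

def ndcg_at_10(relevances):
    """NDCG@10 for one query, given relevance labels in ranked order."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:10]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:10]))
    return dcg / idcg if idcg > 0 else 0.0

# Toy query: the single relevant document is ranked first -> perfect scores
print(mrr_at_10([1]))            # 1.0
print(ndcg_at_10([1, 0, 0, 0]))  # 1.0
```

Values near 1.0, as above, mean the relevant document is almost always ranked first.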

Training Details

Training Dataset

Unnamed Dataset

  • Size: 38,267 training samples
  • Columns: anchor, positive, and label
  • Approximate statistics based on the first 1000 samples:
      anchor: string; min 27, mean 62.01, max 149 characters
      positive: string; min 5, mean 123.87, max 2044 characters
      label: int; 0: ~83.00%, 1: ~17.00%
  • Samples:
    anchor: HPE ANW Networks Startup SVC U4832E Merkmal
      positive: HPE Care Pack Services sind leicht zu erwerben und zeichnen sich durch hohe Benutzerfreundlichkeit aus.
      label: 1
    anchor: HPE ANW Networks Startup SVC U4832E Merkmal
      positive: Dieses USB 2.0 Kabel von Delock dient zum Anschluss von verschiedenen USB Geräten, wie z. B. Drucker oder Scanner, an einen freien USB Port.
      label: 0
    anchor: HPE ANW Networks Startup SVC U4832E Merkmal
      positive: HPE ANW FC 1Y NBD EXCH 7220DC Contr SVC H3FQ1E.
      label: 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
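The pos_weight of 5 up-weights the positive class to offset the roughly 83%/17% negative/positive label imbalance noted in the dataset statistics. A sketch of the per-example loss this corresponds to, mirroring torch.nn.BCEWithLogitsLoss with pos_weight (the helper name here is illustrative):

```python
import math

def weighted_bce(logit: float, label: int, pos_weight: float = 5.0) -> float:
    """Binary cross-entropy on a raw logit, up-weighting positive examples.

    Positives (label=1) contribute pos_weight times more to the loss,
    so the model is penalized harder for missing a relevant pair.
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    if label == 1:
        return -pos_weight * math.log(p)
    return -math.log(1.0 - p)

# A confidently wrong positive costs 5x what the unweighted loss would charge
print(round(weighted_bce(-3.0, 1), 4))
print(round(weighted_bce(3.0, 1), 4))
```

With activation_fn set to Identity, the sigmoid is applied inside the loss rather than by the model head, which is why inference scores may come back as raw logits.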
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
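The combination of warmup_ratio: 0.1 and the (default) linear scheduler means the learning rate climbs to 2e-05 over the first 10% of training steps, then decays linearly to zero. An illustrative sketch of that schedule (the function name is hypothetical, not a Transformers API):

```python
def learning_rate(step: int, total_steps: int, peak_lr: float = 2e-5,
                  warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr over the first warmup_ratio of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 4784  # 2 epochs x ~2392 steps (38,267 samples / batch size 16)
print(learning_rate(0, total), learning_rate(478, total), learning_rate(total, total))
```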

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss shop_api_v3_ndcg@10
-1 -1 - 0.9694 (+0.9689)
0.0004 1 0.2568 -
0.0836 200 0.3495 -
0.1672 400 0.1825 -
0.2508 600 0.163 -
0.3344 800 0.1344 -
0.4181 1000 0.145 0.9963 (+0.9958)
0.5017 1200 0.1787 -
0.5853 1400 0.1644 -
0.6689 1600 0.1566 -
0.7525 1800 0.1058 -
0.8361 2000 0.1154 0.9981 (+0.9977)
0.9197 2200 0.1144 -
1.0033 2400 0.1295 -
1.0870 2600 0.0308 -
1.1706 2800 0.0331 -
1.2542 3000 0.0374 0.9973 (+0.9968)
1.3378 3200 0.0498 -
1.4214 3400 0.0655 -
1.5050 3600 0.0545 -
1.5886 3800 0.0486 -
1.6722 4000 0.018 0.9983 (+0.9979)
1.7559 4200 0.0424 -
1.8395 4400 0.0269 -
1.9231 4600 0.0572 -
-1 -1 - 0.9983 (+0.9979)
  • The saved checkpoint is the row with the best ndcg@10 (step 4000: 0.9983 (+0.9979)), per load_best_model_at_end.
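The epoch and step columns line up with the dataset and batch size: 38,267 samples at a per-device batch size of 16 give about 2,392 optimizer steps per epoch, so logging step 200 lands at epoch 0.0836, exactly as the table shows. A quick arithmetic check:

```python
import math

samples = 38_267
batch_size = 16

steps_per_epoch = math.ceil(samples / batch_size)
print(steps_per_epoch)                   # 2392
print(round(200 / steps_per_epoch, 4))   # 0.0836, matching the log table
```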

Framework Versions

  • Python: 3.10.16
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}