CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the msmarco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: microsoft/MiniLM-L12-H384-uncased
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 1 label
  • Supported Modality: Text
  • Training Dataset:
    • msmarco

Model Sources

Full Model Architecture

CrossEncoder(
  (0): Transformer({'transformer_task': 'sequence-classification', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'logits'}}, 'module_output_name': 'scores', 'architecture': 'BertForSequenceClassification'})
)
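The architecture above produces one raw logit per (query, passage) pair and exposes it as `scores`. The sketch below shows how such a logit can map to a relevance score. It assumes that `predict` applies the default sigmoid activation Sentence Transformers uses for single-label cross-encoders. That assumption fits the 0 to 1 scores printed in the usage example, but this card does not state it. Both helper names are hypothetical.

```python
import math


def logit_to_score(logit: float) -> float:
    """Numerically stable sigmoid: maps a raw logit to a score in [0, 1]."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # safe: logit < 0, so z is in (0, 1)
    return z / (1.0 + z)


def score_to_logit(score: float) -> float:
    """Inverse sigmoid (logit function): recovers the raw logit from a score in (0, 1)."""
    return math.log(score / (1.0 - score))


if __name__ == "__main__":
    # A logit of 0 sits on the decision boundary.
    print(logit_to_score(0.0))  # 0.5
    # Round-trip one of the scores printed in the usage example.
    print(round(logit_to_score(score_to_logit(0.974)), 4))  # 0.974
```

The sigmoid is monotonic, so ranking candidates by these scores gives the same order as ranking them by the raw logits.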

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("rorry-brenner/reranker-MiniLM-L12-H384-uncased-msmarco-bce")
# Get scores for pairs of inputs
pairs = [
    ['play stevie nicks', "Stevie Nicks. Stephanie Lynn Stevie Nicks is an American singer-songwriter. Often regarded as the Queen of Rock n' Roll, Nicks is best known for both her work as frontwoman of Fleetwood Mac and for her solo career. She is also known for her distinctive voice, mystical visual style, and symbolic lyrics."],
    ['average cost to replace bay window', 'Cost of Bay and Bow Windows. Bay windows are not cheap and the average price for a basic bay window can be anywhere from $1200 to $3000 for just the window. The cost of the bay window will depend on how large the window is and the materials that are used for creating the window frame.ow windows are great because they offer a bit more dimension and a unique look for any home. The average cost of a typical bow window is pretty much the same as bay windows or even slightly more $1400 to $3200.'],
    ['what is the indian subcontinent', 'The Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.Within the subcontinent itself, there is a wide variety of peoples, languages and religions.he Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.'],
    ['what are the various types of language', 'UML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.There are 14 UML diagram types to help you model these behavior.ML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.'],
    ['can i track my package with a order number', 'Track Your UPS Package By Your Order Number. This order tracker will only track UPS shipments. If your order was shipped using USPS, please visit USPS.com and enter in the tracking number e-mailed to you.rder Number: If you placed order on-line, your order number is the 6 digits order number you received in your e-mail.'],
]
scores = model.predict(pairs)
print(scores)
# [0.974  0.8187 0.9724 0.1225 0.9689]

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'play stevie nicks',
    [
        "Stevie Nicks. Stephanie Lynn Stevie Nicks is an American singer-songwriter. Often regarded as the Queen of Rock n' Roll, Nicks is best known for both her work as frontwoman of Fleetwood Mac and for her solo career. She is also known for her distinctive voice, mystical visual style, and symbolic lyrics.",
        'Cost of Bay and Bow Windows. Bay windows are not cheap and the average price for a basic bay window can be anywhere from $1200 to $3000 for just the window. The cost of the bay window will depend on how large the window is and the materials that are used for creating the window frame.ow windows are great because they offer a bit more dimension and a unique look for any home. The average cost of a typical bow window is pretty much the same as bay windows or even slightly more $1400 to $3200.',
        'The Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.Within the subcontinent itself, there is a wide variety of peoples, languages and religions.he Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.',
        'UML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.There are 14 UML diagram types to help you model these behavior.ML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.',
        'Track Your UPS Package By Your Order Number. This order tracker will only track UPS shipments. If your order was shipped using USPS, please visit USPS.com and enter in the tracking number e-mailed to you.rder Number: If you placed order on-line, your order number is the 6 digits order number you received in your e-mail.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
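Conceptually, `rank` scores the query against every candidate text and returns the candidates sorted from highest to lowest score, using the `corpus_id`/`score` shape shown in the comment above. The sketch below covers only that final sorting step, applied to scores you already have. `rank_by_scores` is a hypothetical helper, not the library's implementation, and the scores in the example are made up.

```python
from typing import Optional, Sequence


def rank_by_scores(scores: Sequence[float], top_k: Optional[int] = None) -> list[dict]:
    """Return [{'corpus_id': i, 'score': s}, ...] sorted by descending score.

    Hypothetical helper that mirrors the output shape of CrossEncoder.rank;
    it does not call the model.
    """
    results = [{"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)]
    # sorted() stays stable with reverse=True, so ties keep their original corpus order.
    results = sorted(results, key=lambda r: r["score"], reverse=True)
    return results if top_k is None else results[:top_k]


# Made-up scores for five candidate passages.
print(rank_by_scores([0.12, 0.97, 0.55, 0.03, 0.81], top_k=3))
# [{'corpus_id': 1, 'score': 0.97}, {'corpus_id': 4, 'score': 0.81}, {'corpus_id': 2, 'score': 0.55}]
```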

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   NanoMSMARCO_R100  NanoNFCorpus_R100  NanoNQ_R100
map      0.5128 (+0.0232)  0.3559 (+0.0949)   0.6672 (+0.2476)
mrr@10   0.5046 (+0.0271)  0.6367 (+0.1369)   0.6882 (+0.2615)
ndcg@10  0.6016 (+0.0612)  0.4343 (+0.1092)   0.7187 (+0.2181)
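The three metrics in this table can be sketched for a single query. The input is a list of binary relevance labels in the order the reranker returned the passages. The sketch uses common textbook definitions and is not the evaluator's code, so details such as tie handling or graded relevance may differ. It also assumes that `always_rerank_positives` puts every relevant passage into the candidate list, which is why relevant items are counted from the list itself.

```python
import math
from typing import Sequence


def mrr_at_k(rels: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k (0 if none)."""
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(rels: Sequence[int], k: int = 10) -> float:
    """Binary-relevance nDCG@k: DCG of this ranking divided by DCG of an ideal ranking."""
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels[:k], start=1))
    n_relevant = min(sum(1 for r in rels if r), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, n_relevant + 1))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision(rels: Sequence[int]) -> float:
    """Mean of precision@rank over the ranks that hold relevant items (0 if none)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# One query whose only relevant passage was ranked second.
ranking = [0, 1, 0, 0]
print(mrr_at_k(ranking), average_precision(ranking))  # 0.5 0.5
```

A dataset-level score such as NanoMSMARCO_R100 map is typically the mean of these per-query values over all queries.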

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "dataset_id": "sentence-transformers/NanoBEIR-en",
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   Value
map      0.5120 (+0.1219)
mrr@10   0.6098 (+0.1418)
ndcg@10  0.5849 (+0.1295)
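The NanoBEIR_R100_mean values are the unweighted mean of the three per-dataset results in the Cross Encoder Reranking table. This includes the parenthesised deltas, which are averaged the same way. The worked check below reproduces the mean values from the per-dataset numbers.

```python
# Per-dataset results copied from the Cross Encoder Reranking table
# (order: NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100).
per_dataset = {
    "map": [0.5128, 0.3559, 0.6672],
    "mrr@10": [0.5046, 0.6367, 0.6882],
    "ndcg@10": [0.6016, 0.4343, 0.7187],
}
deltas = {
    "map": [0.0232, 0.0949, 0.2476],
    "mrr@10": [0.0271, 0.1369, 0.2615],
    "ndcg@10": [0.0612, 0.1092, 0.2181],
}

for metric, values in per_dataset.items():
    mean = sum(values) / len(values)
    mean_delta = sum(deltas[metric]) / len(deltas[metric])
    print(f"{metric}: {mean:.4f} (+{mean_delta:.4f})")
# map: 0.5120 (+0.1219)
# mrr@10: 0.6098 (+0.1418)
# ndcg@10: 0.5849 (+0.1295)
```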

Training Details

Training Dataset

msmarco

  • Dataset: msmarco
  • Size: 1,990,000 training samples
  • Columns: query, passage, and score
  • Approximate statistics based on the first 100 samples:
    • query: type string, modality text; min: 5 tokens, mean: 9.36 tokens, max: 29 tokens
    • passage: type string, modality text; min: 25 tokens, mean: 80.91 tokens, max: 162 tokens
    • score: type float; min: 0.0, mean: 0.45, max: 1.0
  • Samples:
    • query: does azithromycin have sulfa in it
      passage: 1 They are found in antibiotic medications, as they treat bacteria…. 2 Allergies Related to Sulfa If you have allergies to sulfa there are many things that should be avoided.
      score: 1.0
    • query: pediatrician average salary
      passage: $37K Jones International University's Average Admissions Counselor Salary (20 salaries) +$1K (2%) more than national average Admissions Counselor salary ($36K) -$4K (10%) less than average Jones International University salary ($41K)
      score: 0.0
    • query: what are chrysler brands
      passage: In the event of a lost or stolen key, contact a certified Chrysler dealer, ... How to Replace Chrysler Car Keys. Chrysler... 1 How to Replace the Key in the Chrysler Crossfire Key FOB.n the event of a lost or stolen key, contact a certified Chrysler dealer, ... How to Replace Chrysler Car Keys. Chrysler... 1 How to Replace the Key in the Chrysler Crossfire Key FOB.
      score: 0.0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
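With `activation_fn` set to Identity, the loss is computed on the raw logits. This assumes the loss wraps a binary cross-entropy-with-logits objective (PyTorch's `torch.nn.BCEWithLogitsLoss`), here with `pos_weight` left unset. The plain-Python sketch below writes that per-example loss in its numerically stable form. `softplus` and `bce_with_logits` are hypothetical helpers, and the optional `pos_weight` argument mirrors PyTorch's up-weighting of positive labels.

```python
import math


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit: float, label: float, pos_weight: float = 1.0) -> float:
    """Binary cross-entropy on a raw logit, in the stable BCE-with-logits form.

    Equals -(pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))).
    """
    return (1.0 - label) * logit + (1.0 + (pos_weight - 1.0) * label) * softplus(-logit)


# An untrained head outputs logits near 0, which costs ln 2 for either label.
print(round(bce_with_logits(0.0, 1.0), 4), round(bce_with_logits(0.0, 0.0), 4))  # 0.6931 0.6931
```

This is consistent with the first logged training loss in the training logs (0.6905 at step 1), which is close to ln 2 ≈ 0.6931.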
    

Evaluation Dataset

msmarco

  • Dataset: msmarco
  • Size: 10,000 evaluation samples
  • Columns: query, passage, and score
  • Approximate statistics based on the first 100 samples:
    • query: type string, modality text; min: 5 tokens, mean: 8.6 tokens, max: 22 tokens
    • passage: type string, modality text; min: 26 tokens, mean: 81.51 tokens, max: 263 tokens
    • score: type float; min: 0.0, mean: 0.53, max: 1.0
  • Samples:
    • query: play stevie nicks
      passage: Stevie Nicks. Stephanie Lynn Stevie Nicks is an American singer-songwriter. Often regarded as the Queen of Rock n' Roll, Nicks is best known for both her work as frontwoman of Fleetwood Mac and for her solo career. She is also known for her distinctive voice, mystical visual style, and symbolic lyrics.
      score: 1.0
    • query: average cost to replace bay window
      passage: Cost of Bay and Bow Windows. Bay windows are not cheap and the average price for a basic bay window can be anywhere from $1200 to $3000 for just the window. The cost of the bay window will depend on how large the window is and the materials that are used for creating the window frame.ow windows are great because they offer a bit more dimension and a unique look for any home. The average cost of a typical bow window is pretty much the same as bay windows or even slightly more $1400 to $3200.
      score: 1.0
    • query: what is the indian subcontinent
      passage: The Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.Within the subcontinent itself, there is a wide variety of peoples, languages and religions.he Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.
      score: 1.0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • num_train_epochs: 1
  • learning_rate: 2e-05
  • warmup_steps: 0.1
  • bf16: True
  • per_device_eval_batch_size: 16
  • load_best_model_at_end: True
  • seed: 12
  • dataloader_num_workers: 4
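One epoch over 1,990,000 training samples at a batch size of 16 gives 1,990,000 / 16 = 124,375 optimizer steps. That matches the final step in the training logs and is consistent with training on a single device with `gradient_accumulation_steps: 1`. The sketch below computes that step count and the learning rate under the linear scheduler. It assumes the fractional `warmup_steps: 0.1` is treated as a ratio of total steps and rounded up, as Transformers does for `warmup_ratio`. Both the ratio reading and the rounding are assumptions. `linear_lr` is a hypothetical helper that follows the shape of Transformers' linear warmup/decay schedule.

```python
import math

NUM_SAMPLES = 1_990_000  # training samples (Training Dataset section)
BATCH_SIZE = 16          # per_device_train_batch_size, assuming one device
PEAK_LR = 2e-05          # learning_rate
WARMUP_FRACTION = 0.1    # warmup_steps: 0.1, read here as a ratio of total steps

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE)        # one epoch, no gradient accumulation
warmup_steps = math.ceil(total_steps * WARMUP_FRACTION)  # rounding up is an assumption


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps)  # 124375 12438
```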

All Hyperparameters

  • per_device_train_batch_size: 16
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 16
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 12
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step    Training Loss  Validation Loss  NanoMSMARCO_R100_ndcg@10  NanoNFCorpus_R100_ndcg@10  NanoNQ_R100_ndcg@10  NanoBEIR_R100_mean_ndcg@10
-1      -1      -              -                0.0300 (-0.5104)          0.2528 (-0.0723)           0.0168 (-0.4839)     0.0999 (-0.3555)
0.0000  1       0.6905         -                -                         -                          -                    -
0.0322  4000    0.4362         -                -                         -                          -                    -
0.0643  8000    0.2573         -                -                         -                          -                    -
0.0804  10000   -              0.2379           0.5997 (+0.0593)          0.4138 (+0.0887)           0.6684 (+0.1677)     0.5606 (+0.1053)
0.0965  12000   0.2365         -                -                         -                          -                    -
0.1286  16000   0.2301         -                -                         -                          -                    -
0.1608  20000   0.2201         0.2005           0.6586 (+0.1182)          0.3775 (+0.0524)           0.6952 (+0.1945)     0.5771 (+0.1217)
0.1930  24000   0.2163         -                -                         -                          -                    -
0.2251  28000   0.2059         -                -                         -                          -                    -
0.2412  30000   -              0.1832           0.6016 (+0.0612)          0.4343 (+0.1092)           0.7187 (+0.2181)     0.5849 (+0.1295)
0.2573  32000   0.2023         -                -                         -                          -                    -
0.2894  36000   0.2022         -                -                         -                          -                    -
0.3216  40000   0.1983         0.1821           0.6194 (+0.0790)          0.3857 (+0.0606)           0.6927 (+0.1921)     0.5659 (+0.1106)
0.3538  44000   0.1959         -                -                         -                          -                    -
0.3859  48000   0.1926         -                -                         -                          -                    -
0.4020  50000   -              0.1949           0.6366 (+0.0962)          0.3895 (+0.0644)           0.6998 (+0.1991)     0.5753 (+0.1199)
0.4181  52000   0.1906         -                -                         -                          -                    -
0.4503  56000   0.1904         -                -                         -                          -                    -
0.4824  60000   0.1810         0.1785           0.6540 (+0.1135)          0.3851 (+0.0601)           0.6875 (+0.1868)     0.5755 (+0.1201)
0.5146  64000   0.1842         -                -                         -                          -                    -
0.5467  68000   0.1845         -                -                         -                          -                    -
0.5628  70000   -              0.1726           0.6260 (+0.0856)          0.3718 (+0.0468)           0.7048 (+0.2041)     0.5675 (+0.1122)
0.5789  72000   0.1806         -                -                         -                          -                    -
0.6111  76000   0.1765         -                -                         -                          -                    -
0.6432  80000   0.1743         0.1791           0.6537 (+0.1132)          0.3686 (+0.0436)           0.7080 (+0.2074)     0.5768 (+0.1214)
0.6754  84000   0.1739         -                -                         -                          -                    -
0.7075  88000   0.1717         -                -                         -                          -                    -
0.7236  90000   -              0.1842           0.6470 (+0.1066)          0.3684 (+0.0433)           0.7055 (+0.2049)     0.5736 (+0.1183)
0.7397  92000   0.1696         -                -                         -                          -                    -
0.7719  96000   0.1657         -                -                         -                          -                    -
0.8040  100000  0.1671         0.1739           0.6541 (+0.1137)          0.3770 (+0.0520)           0.7131 (+0.2125)     0.5814 (+0.1260)
0.8362  104000  0.1675         -                -                         -                          -                    -
0.8683  108000  0.1662         -                -                         -                          -                    -
0.8844  110000  -              0.1672           0.6522 (+0.1118)          0.3750 (+0.0500)           0.7023 (+0.2016)     0.5765 (+0.1211)
0.9005  112000  0.1627         -                -                         -                          -                    -
0.9327  116000  0.1642         -                -                         -                          -                    -
0.9648  120000  0.1644         0.1675           0.6528 (+0.1124)          0.3671 (+0.0421)           0.7161 (+0.2154)     0.5787 (+0.1233)
0.9970  124000  0.1639         -                -                         -                          -                    -
1.0     124375  -              0.1652           0.6624 (+0.1220)          0.3655 (+0.0405)           0.7266 (+0.2260)     0.5848 (+0.1295)
-1      -1      -              -                0.6016 (+0.0612)          0.4343 (+0.1092)           0.7187 (+0.2181)     0.5849 (+0.1295)
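  • The step 30000 row (shown in bold on the original model page) denotes the saved checkpoint; its scores are identical to those in the last row.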
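The saved checkpoint's scores match the step 30000 row. That row has the highest NanoBEIR_R100_mean_ndcg@10 (0.5849) among the evaluation rows, while the lowest validation loss (0.1652) occurs at the final step. The sketch below makes that comparison explicit using values copied from the table. Because `metric_for_best_model` is not listed in the hyperparameters, the reading that `load_best_model_at_end` selected by NanoBEIR mean nDCG@10 rather than by validation loss is an assumption.

```python
# (step, validation loss, NanoBEIR_R100_mean_ndcg@10) for each evaluation row in the training logs.
eval_rows = [
    (10000, 0.2379, 0.5606),
    (20000, 0.2005, 0.5771),
    (30000, 0.1832, 0.5849),
    (40000, 0.1821, 0.5659),
    (50000, 0.1949, 0.5753),
    (60000, 0.1785, 0.5755),
    (70000, 0.1726, 0.5675),
    (80000, 0.1791, 0.5768),
    (90000, 0.1842, 0.5736),
    (100000, 0.1739, 0.5814),
    (110000, 0.1672, 0.5765),
    (120000, 0.1675, 0.5787),
    (124375, 0.1652, 0.5848),
]

best_by_ndcg = max(eval_rows, key=lambda row: row[2])[0]  # highest mean nDCG@10
best_by_loss = min(eval_rows, key=lambda row: row[1])[0]  # lowest validation loss
print(best_by_ndcg, best_by_loss)  # 30000 124375
```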

Training Time

  • Training: 45.3 minutes
  • Evaluation: 3.2 minutes
  • Total: 48.5 minutes

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.5.0
  • Transformers: 5.8.0
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Additional Resources

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 33.4M params (Safetensors, tensor type F32)