CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the msmarco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Type: Cross Encoder
Base model: microsoft/MiniLM-L12-H384-uncased
Maximum Sequence Length: 512 tokens
Number of Output Labels: 1 label
Supported Modality: Text
Training Dataset:
- msmarco

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Full Model Architecture

CrossEncoder(
  (0): Transformer({'transformer_task': 'sequence-classification', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'logits'}}, 'module_output_name': 'scores', 'architecture': 'BertForSequenceClassification'})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("rorry-brenner/reranker-MiniLM-L12-H384-uncased-msmarco-bce")
# Get scores for pairs of inputs
pairs = [
    ['play stevie nicks', "Stevie Nicks. Stephanie Lynn Stevie Nicks is an American singer-songwriter. Often regarded as the Queen of Rock n' Roll, Nicks is best known for both her work as frontwoman of Fleetwood Mac and for her solo career. She is also known for her distinctive voice, mystical visual style, and symbolic lyrics."],
    ['average cost to replace bay window', 'Cost of Bay and Bow Windows. Bay windows are not cheap and the average price for a basic bay window can be anywhere from $1200 to $3000 for just the window. The cost of the bay window will depend on how large the window is and the materials that are used for creating the window frame.ow windows are great because they offer a bit more dimension and a unique look for any home. The average cost of a typical bow window is pretty much the same as bay windows or even slightly more $1400 to $3200.'],
    ['what is the indian subcontinent', 'The Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.Within the subcontinent itself, there is a wide variety of peoples, languages and religions.he Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.'],
    ['what are the various types of language', 'UML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.There are 14 UML diagram types to help you model these behavior.ML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.'],
    ['can i track my package with a order number', 'Track Your UPS Package By Your Order Number. This order tracker will only track UPS shipments. If your order was shipped using USPS, please visit USPS.com and enter in the tracking number e-mailed to you.rder Number: If you placed order on-line, your order number is the 6 digits order number you received in your e-mail.'],
]
scores = model.predict(pairs)
print(scores)
# [0.974  0.8187 0.9724 0.1225 0.9689]

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'play stevie nicks',
    [
        "Stevie Nicks. Stephanie Lynn Stevie Nicks is an American singer-songwriter. Often regarded as the Queen of Rock n' Roll, Nicks is best known for both her work as frontwoman of Fleetwood Mac and for her solo career. She is also known for her distinctive voice, mystical visual style, and symbolic lyrics.",
        'Cost of Bay and Bow Windows. Bay windows are not cheap and the average price for a basic bay window can be anywhere from $1200 to $3000 for just the window. The cost of the bay window will depend on how large the window is and the materials that are used for creating the window frame.ow windows are great because they offer a bit more dimension and a unique look for any home. The average cost of a typical bow window is pretty much the same as bay windows or even slightly more $1400 to $3200.',
        'The Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.Within the subcontinent itself, there is a wide variety of peoples, languages and religions.he Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.',
        'UML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.There are 14 UML diagram types to help you model these behavior.ML stands for Unified Modeling Language which is used in object oriented software engineering. Although typically used in software engineering it is a rich language that can be used to model an application structures, behavior and even business processes.',
        'Track Your UPS Package By Your Order Number. This order tracker will only track UPS shipments. If your order was shipped using USPS, please visit USPS.com and enter in the tracking number e-mailed to you.rder Number: If you placed order on-line, your order number is the 6 digits order number you received in your e-mail.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
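The ranking step can also be written by hand, which helps when the candidate passages come from a separate first-stage retriever. The following is a minimal sketch, not the library's implementation: `rerank` and `overlap_score` are hypothetical helpers, and `overlap_score` is a toy stand-in scorer so the example runs without downloading the model.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[List[Pair]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Dict[str, float]]:
    """Score every (query, document) pair and sort by descending score."""
    pairs = [(query, doc) for doc in documents]
    scores = [float(s) for s in score_fn(pairs)]
    ranked = sorted(
        ({"corpus_id": i, "score": s} for i, s in enumerate(scores)),
        key=lambda item: item["score"],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


def overlap_score(pairs: List[Pair]) -> List[float]:
    """Toy stand-in scorer (assumption): fraction of query terms found in the document."""
    scores = []
    for query, doc in pairs:
        q_terms = set(query.lower().split())
        d_terms = set(doc.lower().split())
        scores.append(len(q_terms & d_terms) / max(len(q_terms), 1))
    return scores


docs = [
    "bay window cost guide",
    "stevie nicks is a singer",
    "play music by stevie nicks",
]
result = rerank("play stevie nicks", docs, overlap_score, top_k=2)
print(result)
```

With the real model, passing `model.predict` as `score_fn` should give the same kind of output, since `predict` takes a list of text pairs and returns one score per pair.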

Evaluation

Metrics

Cross Encoder Reranking

Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100

Evaluated with CrossEncoderRerankingEvaluator with these parameters:

{
    "at_k": 10,
    "always_rerank_positives": true
}

Metric	NanoMSMARCO_R100	NanoNFCorpus_R100	NanoNQ_R100
map	0.5128 (+0.0232)	0.3559 (+0.0949)	0.6672 (+0.2476)
mrr@10	0.5046 (+0.0271)	0.6367 (+0.1369)	0.6882 (+0.2615)
ndcg@10	0.6016 (+0.0612)	0.4343 (+0.1092)	0.7187 (+0.2181)
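For readers unfamiliar with the three metrics, here is a minimal sketch of their textbook definitions for a single query with binary relevance labels. It assumes the standard formulas; the evaluator's exact implementation may differ in details such as tie handling or how missing positives are counted.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of the ideal ordering."""
    def dcg(labels: Sequence[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))

    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Relevance labels of a reranked list, best-scored passage first (made-up example).
ranked_labels = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_labels), ndcg_at_k(ranked_labels), average_precision(ranked_labels))
```

The reported `map`, `mrr@10` and `ndcg@10` values are these per-query quantities averaged over all queries in a dataset.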

Cross Encoder Nano BEIR

Dataset: NanoBEIR_R100_mean

Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq"
    ],
    "dataset_id": "sentence-transformers/NanoBEIR-en",
    "rerank_k": 100,
    "at_k": 10,
    "always_rerank_positives": true
}

Metric	Value
map	0.5120 (+0.1219)
mrr@10	0.6098 (+0.1418)
ndcg@10	0.5849 (+0.1295)
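The mean values appear to be the unweighted average of the three per-dataset scores reported under Cross Encoder Reranking. A quick arithmetic check, using only numbers from the tables in this card:

```python
# Per-dataset scores in the order NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100.
per_dataset = {
    "map": [0.5128, 0.3559, 0.6672],
    "mrr@10": [0.5046, 0.6367, 0.6882],
    "ndcg@10": [0.6016, 0.4343, 0.7187],
}

# Unweighted mean per metric, rounded to the four decimals used in the card.
means = {metric: round(sum(values) / len(values), 4) for metric, values in per_dataset.items()}

for metric, value in means.items():
    print(f"{metric}: {value:.4f}")
```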

Training Details

Training Dataset

msmarco

Dataset: msmarco
Size: 1,990,000 training samples
Columns: query, passage, and score

Approximate statistics based on the first 100 samples:

	query	passage	score
type	string	string	float
modality	text	text
details	min: 5 tokens mean: 9.36 tokens max: 29 tokens	min: 25 tokens mean: 80.91 tokens max: 162 tokens	min: 0.0 mean: 0.45 max: 1.0

Samples:

query	passage	score
`does azithromycin have sulfa in it`	`1 They are found in antibiotic medications, as they treat bacteria…. 2 Allergies Related to Sulfa If you have allergies to sulfa there are many things that should be avoided.`	`1.0`
`pediatrician average salary`	`$37K Jones International University's Average Admissions Counselor Salary (20 salaries) +$1K (2%) more than national average Admissions Counselor salary ($36K) -$4K (10%) less than average Jones International University salary ($41K)`	`0.0`
`what are chrysler brands`	`In the event of a lost or stolen key, contact a certified Chrysler dealer, ... How to Replace Chrysler Car Keys. Chrysler... 1 How to Replace the Key in the Chrysler Crossfire Key FOB.n the event of a lost or stolen key, contact a certified Chrysler dealer, ... How to Replace Chrysler Car Keys. Chrysler... 1 How to Replace the Key in the Chrysler Crossfire Key FOB.`	`0.0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}
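With an identity activation, the loss operates on raw logits, so it behaves like binary cross-entropy with a built-in sigmoid. The sketch below shows that computation for a single example in plain Python. It is an illustration of the formula, not the library's code, and `bce_with_logits` and `sigmoid` are names chosen here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; maps a logit to a score in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy on a raw logit, written in the overflow-safe form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# A zero logit carries no information, so the loss is ln(2) for either label.
print(round(bce_with_logits(0.0, 1.0), 4))
# A confident, correct prediction gives a small loss and a score near 1.
print(round(bce_with_logits(3.0, 1.0), 4), round(sigmoid(3.0), 4))
```

The value ln(2) ≈ 0.6931 is close to the training loss of 0.6905 logged at step 1, which is what one would expect from a freshly initialized classification head.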

Evaluation Dataset

msmarco

Dataset: msmarco
Size: 10,000 evaluation samples
Columns: query, passage, and score

Approximate statistics based on the first 100 samples:

	query	passage	score
type	string	string	float
modality	text	text
details	min: 5 tokens mean: 8.6 tokens max: 22 tokens	min: 26 tokens mean: 81.51 tokens max: 263 tokens	min: 0.0 mean: 0.53 max: 1.0

Samples:

query	passage	score
`play stevie nicks`	`Stevie Nicks. Stephanie Lynn Stevie Nicks is an American singer-songwriter. Often regarded as the Queen of Rock n' Roll, Nicks is best known for both her work as frontwoman of Fleetwood Mac and for her solo career. She is also known for her distinctive voice, mystical visual style, and symbolic lyrics.`	`1.0`
`average cost to replace bay window`	`Cost of Bay and Bow Windows. Bay windows are not cheap and the average price for a basic bay window can be anywhere from $1200 to $3000 for just the window. The cost of the bay window will depend on how large the window is and the materials that are used for creating the window frame.ow windows are great because they offer a bit more dimension and a unique look for any home. The average cost of a typical bow window is pretty much the same as bay windows or even slightly more $1400 to $3200.`	`1.0`
`what is the indian subcontinent`	`The Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.Within the subcontinent itself, there is a wide variety of peoples, languages and religions.he Indian subcontinent. The Indian subcontinent is a vast area the size of Europe, and is today divided into the separate countries of India, Pakistan and Bangladesh.`	`1.0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
num_train_epochs: 1
learning_rate: 2e-05
warmup_steps: 0.1
bf16: True
per_device_eval_batch_size: 16
load_best_model_at_end: True
seed: 12
dataloader_num_workers: 4
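These settings determine the length of the run. The sketch below derives the number of optimizer steps from the dataset size and batch size, assuming a single device (the device count is not stated in this card) and the `gradient_accumulation_steps: 1` value listed under All Hyperparameters.

```python
import math

train_samples = 1_990_000        # size of the training dataset
per_device_batch_size = 16
num_devices = 1                  # assumption: not stated in the card
gradient_accumulation_steps = 1
num_train_epochs = 1

effective_batch = per_device_batch_size * num_devices * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs


def epoch_fraction(step: int) -> float:
    """Fraction of an epoch completed after the given optimizer step."""
    return step / steps_per_epoch


print(total_steps)
print(round(epoch_fraction(4000), 4), round(epoch_fraction(10000), 4))
```

The result matches the final step (124375) and the epoch fractions shown in the Training Logs table, which supports the single-device assumption.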

All Hyperparameters

per_device_train_batch_size: 16
num_train_epochs: 1
max_steps: -1
learning_rate: 2e-05
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0.1
optim: adamw_torch_fused
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 1
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: None
trackio_bucket_id: None
trackio_static_space_id: None
per_device_eval_batch_size: 16
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: True
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 12
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: False
dataloader_num_workers: 4
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_static_graph: None
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	NanoMSMARCO_R100_ndcg@10	NanoNFCorpus_R100_ndcg@10	NanoNQ_R100_ndcg@10	NanoBEIR_R100_mean_ndcg@10
-1	-1	-	-	0.0300 (-0.5104)	0.2528 (-0.0723)	0.0168 (-0.4839)	0.0999 (-0.3555)
0.0000	1	0.6905	-	-	-	-	-
0.0322	4000	0.4362	-	-	-	-	-
0.0643	8000	0.2573	-	-	-	-	-
0.0804	10000	-	0.2379	0.5997 (+0.0593)	0.4138 (+0.0887)	0.6684 (+0.1677)	0.5606 (+0.1053)
0.0965	12000	0.2365	-	-	-	-	-
0.1286	16000	0.2301	-	-	-	-	-
0.1608	20000	0.2201	0.2005	0.6586 (+0.1182)	0.3775 (+0.0524)	0.6952 (+0.1945)	0.5771 (+0.1217)
0.1930	24000	0.2163	-	-	-	-	-
0.2251	28000	0.2059	-	-	-	-	-
0.2412	30000	-	0.1832	0.6016 (+0.0612)	0.4343 (+0.1092)	0.7187 (+0.2181)	0.5849 (+0.1295)
0.2573	32000	0.2023	-	-	-	-	-
0.2894	36000	0.2022	-	-	-	-	-
0.3216	40000	0.1983	0.1821	0.6194 (+0.0790)	0.3857 (+0.0606)	0.6927 (+0.1921)	0.5659 (+0.1106)
0.3538	44000	0.1959	-	-	-	-	-
0.3859	48000	0.1926	-	-	-	-	-
0.4020	50000	-	0.1949	0.6366 (+0.0962)	0.3895 (+0.0644)	0.6998 (+0.1991)	0.5753 (+0.1199)
0.4181	52000	0.1906	-	-	-	-	-
0.4503	56000	0.1904	-	-	-	-	-
0.4824	60000	0.1810	0.1785	0.6540 (+0.1135)	0.3851 (+0.0601)	0.6875 (+0.1868)	0.5755 (+0.1201)
0.5146	64000	0.1842	-	-	-	-	-
0.5467	68000	0.1845	-	-	-	-	-
0.5628	70000	-	0.1726	0.6260 (+0.0856)	0.3718 (+0.0468)	0.7048 (+0.2041)	0.5675 (+0.1122)
0.5789	72000	0.1806	-	-	-	-	-
0.6111	76000	0.1765	-	-	-	-	-
0.6432	80000	0.1743	0.1791	0.6537 (+0.1132)	0.3686 (+0.0436)	0.7080 (+0.2074)	0.5768 (+0.1214)
0.6754	84000	0.1739	-	-	-	-	-
0.7075	88000	0.1717	-	-	-	-	-
0.7236	90000	-	0.1842	0.6470 (+0.1066)	0.3684 (+0.0433)	0.7055 (+0.2049)	0.5736 (+0.1183)
0.7397	92000	0.1696	-	-	-	-	-
0.7719	96000	0.1657	-	-	-	-	-
0.8040	100000	0.1671	0.1739	0.6541 (+0.1137)	0.3770 (+0.0520)	0.7131 (+0.2125)	0.5814 (+0.1260)
0.8362	104000	0.1675	-	-	-	-	-
0.8683	108000	0.1662	-	-	-	-	-
0.8844	110000	-	0.1672	0.6522 (+0.1118)	0.3750 (+0.0500)	0.7023 (+0.2016)	0.5765 (+0.1211)
0.9005	112000	0.1627	-	-	-	-	-
0.9327	116000	0.1642	-	-	-	-	-
0.9648	120000	0.1644	0.1675	0.6528 (+0.1124)	0.3671 (+0.0421)	0.7161 (+0.2154)	0.5787 (+0.1233)
0.9970	124000	0.1639	-	-	-	-	-
1.0	124375	-	0.1652	0.6624 (+0.1220)	0.3655 (+0.0405)	0.7266 (+0.2260)	0.5848 (+0.1295)
-1	-1	-	-	0.6016 (+0.0612)	0.4343 (+0.1092)	0.7187 (+0.2181)	0.5849 (+0.1295)

The saved checkpoint is the step 30000 row (shown in bold in the original rendering); its metrics match the final evaluation row.
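Since `load_best_model_at_end` is enabled, the saved checkpoint depends on which metric is used for selection. The card does not state that metric, so the sketch below assumes it is `NanoBEIR_R100_mean_ndcg@10` and compares the outcome with selecting on validation loss. The rows are copied from the evaluation steps in the table above.

```python
# (step, validation loss, NanoBEIR_R100_mean_ndcg@10) for each evaluation row.
eval_rows = [
    (10000, 0.2379, 0.5606),
    (20000, 0.2005, 0.5771),
    (30000, 0.1832, 0.5849),
    (40000, 0.1821, 0.5659),
    (50000, 0.1949, 0.5753),
    (60000, 0.1785, 0.5755),
    (70000, 0.1726, 0.5675),
    (80000, 0.1791, 0.5768),
    (90000, 0.1842, 0.5736),
    (100000, 0.1739, 0.5814),
    (110000, 0.1672, 0.5765),
    (120000, 0.1675, 0.5787),
    (124375, 0.1652, 0.5848),
]

# Assumed selection rule: highest mean ndcg@10 wins.
best_by_ndcg = max(eval_rows, key=lambda row: row[2])
# Alternative rule for comparison: lowest validation loss wins.
best_by_loss = min(eval_rows, key=lambda row: row[1])

print("best by mean ndcg@10:", best_by_ndcg[0])
print("best by validation loss:", best_by_loss[0])
```

Under the ndcg assumption the selected step is 30000, which agrees with the final row of the table; selecting on validation loss would instead have picked the last step.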

Training Time

Training: 45.3 minutes
Evaluation: 3.2 minutes
Total: 48.5 minutes

Framework Versions

Python: 3.12.3
Sentence Transformers: 5.5.0
Transformers: 5.8.0
PyTorch: 2.9.1+cu128
Accelerate: 1.13.0
Datasets: 4.8.5
Tokenizers: 0.22.2

Additional Resources

Training and Finetuning Reranker Models with Sentence Transformers: the end-to-end guide for training or finetuning Cross Encoder (reranker) models.
Multimodal Embedding & Reranker Models with Sentence Transformers: use text, image, audio, and video reranker models through the same API.
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers: training multimodal Cross Encoders.

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Safetensors

Model size: 33.4M params
Tensor type: F32
