# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** microsoft/MiniLM-L12-H384-uncased
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:** ms_marco
- **Language:** en
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import CrossEncoder

# Download from the Hugging Face Hub
model = CrossEncoder("yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-sigmoid")

# Score pairs of (query, passage) texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)  # (3,)

# Or rank a list of passages for a single query
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ],
)
```
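Conceptually, the `rank` call above is equivalent to scoring each (query, passage) pair with `predict` and sorting by descending score. A minimal sketch of that equivalence with placeholder scores (the numbers below are illustrative, not actual model outputs):

```python
docs = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]
scores = [0.92, 0.41, 0.67]  # placeholder relevance scores, one per document

# Sort document indices by descending score, mirroring what model.rank(...) does
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"{scores[i]:.2f}  {docs[i]}")
```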
## Evaluation

### Metrics

#### Cross Encoder Reranking

| Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
|:-------|:-----------------|:------------------|:------------|
| map | 0.4636 (-0.0260) | 0.3174 (+0.0564) | 0.5700 (+0.1504) |
| mrr@10 | 0.4500 (-0.0275) | 0.4912 (-0.0086) | 0.5739 (+0.1472) |
| ndcg@10 | 0.5191 (-0.0213) | 0.3169 (-0.0081) | 0.6383 (+0.1377) |
#### Cross Encoder Nano BEIR

- Dataset: `NanoBEIR_R100_mean`
- Evaluated with `CrossEncoderNanoBEIREvaluator` with these parameters:

  ```json
  {
      "dataset_names": ["msmarco", "nfcorpus", "nq"],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric | Value |
|:-------|:------|
| map | 0.4503 (+0.0603) |
| mrr@10 | 0.5051 (+0.0371) |
| ndcg@10 | 0.4915 (+0.0361) |
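The NanoBEIR mean is the unweighted average of the three per-dataset scores, up to rounding of the reported values; for ndcg@10:

```python
# Per-dataset ndcg@10 scores from the reranking table above
ndcg_at_10 = {
    "NanoMSMARCO_R100": 0.5191,
    "NanoNFCorpus_R100": 0.3169,
    "NanoNQ_R100": 0.6383,
}
mean = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(round(mean, 4))  # 0.4914 -- the reported 0.4915 differs only by rounding of the inputs
```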
## Training Details

### Training Dataset

#### ms_marco

- Dataset: ms_marco at revision `a47ee7a`
- Size: 78,704 training samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:------|:-----|:-------|
| type | string | list | list |
| details | min: 11 characters, mean: 33.74 characters, max: 100 characters | min: 3 elements, mean: 6.50 elements, max: 10 elements | min: 3 elements, mean: 6.50 elements, max: 10 elements |

- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| cost of installing central air | ['Central Air Average Costs. The actual cost of central air installation depends on a number of factors, including the size of the home as well as the unit’s tonnage and SEER rating. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 2 If ductwork is additionally required, costs could reach $6,000 to $10,000 or more. 3 Mini-split central air conditioner prices average $1,500 to $3', 'For example, homes with forced hot air heating will have the duct work necessary for a fast and easy installation, when the project involves the running of ducts however the prices climb significantly. The average price to install a central air conditioner will range from $2650 to upwards of $15K. This installation cannot be considered a DIY project, and it is traditional for a homeowner to hire a contractor for the job. Central ai... | [1, 0, 0, 0, 0, ...] |
| how much does it cost to set up a cabinet shop | ['According to Kennedy, most cabinets range from $500 to $1,500 per cabinet box. Based on an estimated 30 cabinets in an average-size kitchen, you can be looking at a cost of about $15,000-$45,000, she says. Discover everything you need to know about cabinets with our free guide! 1. Measure the dimensions of your kitchen', "December 28, 2005 Question Those of you who consider your operation small, what type of machinery is the minimum for what you do? I'm starting a one man shop, 2,400 square feet, and know what I would like to have to start, but am curious how the rest of you get by. A simple streamlined operation that worked for professional builders, and sell some to DIYers for a retail price. I am a one man shop that builds cabinets, furniture and exterior/interior doors. My shop is 1600 sq ft with 300 sq ft of it being a small spray room.", 'Seven years later and I moved out of the garage to a more legitimate setting in an industrial park. Today, 25 years after starting out, my co... | [1, 0, 0, 0, 0, ...] |
| how close can a gas meter be to a condensing unit | ['Is it dangerous if it is close to the gas meter/pipe? Thanks! It should be 3 feet from the gas meter vent, and not the actual gas meter itself. The gas company can come out later to extend this vent further away from the meter if it is within 3 feet. But the chance that anything actually happening because of the ac too close to the vent is insanely remote. I would be more worried about getting hit by lightning than any problems with the gas.', 'Condensing Unit Too Close to House – Bad air conditioner installation jobs such as this one proves that it is in the best interest of the homeowner to hire competent HVAC air conditioner and heating installers so that the job is done correctly.', "Re: Condensing furnace Exhaust, Distances from window, electric and gas meters. Joel, 3 ft from operable window is what I have on the Electrical Service. Gas meter looks OK. Install instructions in your post says if below 100,000 btu clearance is 12', and 36' if over 100,000 btu.", 'Condensing Unit Too Close to House. This condensing unit was too close to the house to effectively reject heat. It was a bad HVAC condensing unit installation job by the HVAC installers. A mechanical inspector rejected the final for the permit until the condensing unit was correctly installed. It is recommended that condensing units have at least 2 feet of space so that it can'] | [1, 0, 0, 0] |
- Loss: `ListMLELoss` with these parameters:

  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
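ListMLELoss maximizes the likelihood, under a Plackett–Luce model, of the permutation that sorts documents by their relevance labels; the `activation_fct` above is applied to the raw logits first. A minimal NumPy sketch of the core computation (an illustration, not the library implementation: it omits the position-aware `lambda_weight` and mini-batching, and the function names are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def list_mle_loss(logits, labels):
    """Negative log-likelihood of the label-sorted permutation under a
    Plackett-Luce model (plain ListMLE, no position-aware weighting)."""
    scores = sigmoid(np.asarray(logits, dtype=float))  # activation_fct: Sigmoid
    # The target permutation: documents ordered by descending relevance label.
    order = np.argsort(-np.asarray(labels), kind="stable")
    s = scores[order]
    loss = 0.0
    for i in range(len(s)):
        # -log P(choosing doc i next from the remaining docs) = logsumexp(s[i:]) - s[i]
        loss -= s[i] - np.log(np.exp(s[i:]).sum())
    return loss

# Lower loss when the model scores the relevant document highest:
print(float(list_mle_loss([2.0, -1.0, 0.5], [1, 0, 0])))
```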
### Evaluation Dataset

#### ms_marco

- Dataset: ms_marco at revision `a47ee7a`
- Size: 1,000 evaluation samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:------|:-----|:-------|
| type | string | list | list |
| details | min: 11 characters, mean: 34.38 characters, max: 99 characters | min: 2 elements, mean: 6.00 elements, max: 10 elements | min: 2 elements, mean: 6.00 elements, max: 10 elements |

- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| how long does an iva stay on your credit file | ['For example your payments to your mobile phone (if you’re on a contract) and electricity companies will also appear in your credit report. Your IVA will show on your credit file for six years from the day it started. So if your IVA was five years long it will only be listed on your credit file for a further 12 months. The idea behind asking creditors to correct the dates on default notices is to make sure that these too will be gone within 12 months. Post IVA credit file clean up. It’s a happy day when your individual voluntary arrangement (IVA) finally ends, you’re well and truly free and clear and your money is your own again. You can also take satisfaction from the fact that you have done your best by your creditors.', 'LinkedIn0. An Individual Voluntary Arrangement (IVA) is recorded on your credit file for 6 years. During this time your credit rating will be negatively affected. Unfortunately your credit rating will not suddenly become good again after your Arrangement has ended ... | [1, 0, 0, 0, 0, ...] |
| Plants which produce their gametes in flowers are called what? | ['Plants which produce their gametes in flowers are called: antheridium, gymnosperms, angiosperms, or vascular. They are called angiosperms.', 'In humans, cells that do not produce gametes are collectively called somatic cells. Somatic cells do not include sperm and ova, the cells from which they are made, and und … ifferentiated stem cells.', 'This event is called fertilization. The male gametes produced by animals and some plants (e.g., club mosses, horsetails, ferns) are called spermatozoa (plural of spermatozoon), or simply sperm. Their female gametes are called ova (plural of ovum). Ova are often called eggs. Most plants produce male gametes called pollen grains.', 'Unlike animals, plants have multicellular haploid and multicellular diploid stages in their life cycle. Gametes develop from the multicellular haploid gametophytes (Greek phyton, plant). Fertilization gives rise to a multicellular, diploid sporophyte that produces haploid spores via meiosis.', 'Original conversation... | [1, 0, 0, 0, 0, ...] |
| what is a dts sound system | ['DTS is a series of multichannel audio technologies owned by DTS, Inc. (formerly known as D igital T heater S ystems, Inc.), an American company specializing in digital surround sound formats used for both commercial/theatrical and consumer grade applications. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, but DTS was not part of the original DVD specification, so early DVD players do not recognize DTS audio tracks at all.', 'DTS Connect is a blanket name for a two-part system used on the computer platform only, in order to convert PC audio into the DTS format, transported via a single S/PDIF cable. The two components of the system are DTS Interactive and DTS Neo:PC. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, bu... | [1, 0, 0, 0, 0, ...] |
- Loss: `ListMLELoss` with these parameters:

  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
## Training Hyperparameters

### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
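With `warmup_ratio: 0.1`, the warmup length is derived from the total number of optimizer steps. A quick sanity check of the schedule, assuming a single device and no gradient accumulation (and the `transformers` convention of rounding warmup steps up):

```python
import math

num_samples = 78_704   # training samples, from the Training Dataset section
batch_size = 16        # per_device_train_batch_size
num_epochs = 1
warmup_ratio = 0.1

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
print(total_steps, warmup_steps)  # 4919 492

# Consistent with the training log: step 500 corresponds to epoch 500/4919
print(round(500 / total_steps, 4))  # 0.1016
```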
## Training Logs

| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------|:-----|:--------------|:----------------|:-------------------------|:--------------------------|:--------------------|:---------------------------|
| -1 | -1 | - | - | 0.0407 (-0.4997) | 0.2816 (-0.0435) | 0.0231 (-0.4775) | 0.1151 (-0.3402) |
| 0.0002 | 1 | 883.6996 | - | - | - | - | - |
| 0.0508 | 250 | 921.6613 | - | - | - | - | - |
| 0.1016 | 500 | 904.6479 | 856.3090 | 0.1094 (-0.4310) | 0.2034 (-0.1216) | 0.2049 (-0.2957) | 0.1726 (-0.2828) |
| 0.1525 | 750 | 900.1757 | - | - | - | - | - |
| 0.2033 | 1000 | 892.1912 | 847.0684 | 0.3615 (-0.1789) | 0.2856 (-0.0394) | 0.5605 (+0.0598) | 0.4025 (-0.0528) |
| 0.2541 | 1250 | 891.0896 | - | - | - | - | - |
| 0.3049 | 1500 | 882.4826 | 844.2736 | 0.4446 (-0.0959) | 0.3072 (-0.0178) | 0.6115 (+0.1108) | 0.4544 (-0.0009) |
| 0.3558 | 1750 | 878.0654 | - | - | - | - | - |
| 0.4066 | 2000 | 878.2091 | 840.3965 | 0.4614 (-0.0791) | 0.3450 (+0.0200) | 0.6472 (+0.1466) | 0.4845 (+0.0292) |
| 0.4574 | 2250 | 878.5553 | - | - | - | - | - |
| 0.5082 | 2500 | 877.2454 | 841.2769 | 0.4602 (-0.0802) | 0.3123 (-0.0127) | 0.5765 (+0.0759) | 0.4497 (-0.0057) |
| 0.5591 | 2750 | 864.5746 | - | - | - | - | - |
| 0.6099 | 3000 | 899.3305 | 838.2897 | 0.4752 (-0.0652) | 0.3152 (-0.0099) | 0.6333 (+0.1326) | 0.4746 (+0.0192) |
| 0.6607 | 3250 | 870.9701 | - | - | - | - | - |
| **0.7115** | **3500** | **873.4406** | **835.9516** | **0.5191 (-0.0213)** | **0.3169 (-0.0081)** | **0.6383 (+0.1377)** | **0.4915 (+0.0361)** |
| 0.7624 | 3750 | 882.9871 | - | - | - | - | - |
| 0.8132 | 4000 | 881.5676 | 836.2292 | 0.5024 (-0.0380) | 0.3269 (+0.0019) | 0.6350 (+0.1343) | 0.4881 (+0.0327) |
| 0.8640 | 4250 | 884.8231 | - | - | - | - | - |
| 0.9148 | 4500 | 875.8995 | 834.7368 | 0.5028 (-0.0376) | 0.3284 (+0.0034) | 0.6200 (+0.1193) | 0.4837 (+0.0284) |
| 0.9656 | 4750 | 868.8395 | - | - | - | - | - |
| -1 | -1 | - | - | 0.5191 (-0.0213) | 0.3169 (-0.0081) | 0.6383 (+0.1377) | 0.4915 (+0.0361) |

- The bold row denotes the saved checkpoint.
## Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.4.0
- Tokenizers: 0.21.1
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListMLELoss

```bibtex
@inproceedings{lan2013position,
    title = {Position-aware ListMLE: a sequential learning process for ranking},
    author = {Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
    booktitle = {Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
    pages = {333--342},
    year = {2013}
}
```