This is a Cross Encoder model finetuned from GaborMadarasz/ModernBERT-base-hungarian using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("GaborMadarasz/reranker-ModernBERT-base-hungarian")
# Get scores for pairs of texts
pairs = [
    ['Milyen halmazállapotú a klór szobahőmérsékleten?', 'Gáz'],
    ['Milyen halmazállapotú a klór szobahőmérsékleten?', 'Gáz.'],
    ['Mi az izoméria fogalma?', 'Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok. '],
    ['Melyik elektronhéjon található a hidrogénatom egyetlen elektronja?', 'Az első héjon.'],
    ['Milyen felhasználási területei vannak a szilíciumnak?', 'Ötvözőelemként, tranzisztorok, integrált áramkörök, fényelemek előállítására.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'Milyen halmazállapotú a klór szobahőmérsékleten?',
    [
        'Gáz',
        'Gáz.',
        'Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok. ',
        'Az első héjon.',
        'Ötvözőelemként, tranzisztorok, integrált áramkörök, fényelemek előállítására.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
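In practice a cross encoder is too slow to score every document in a large corpus, so it is typically paired with a bi-encoder in a retrieve-then-rerank pipeline. The sketch below illustrates this; the multilingual bi-encoder and the three-document corpus are illustrative stand-ins, not part of this model card.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stand-in retriever; any Hungarian-capable bi-encoder would do here.
retriever = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
reranker = CrossEncoder("GaborMadarasz/reranker-ModernBERT-base-hungarian")

corpus = [
    "Gáz",
    "Az első héjon.",
    "Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok.",
]
query = "Milyen halmazállapotú a klór szobahőmérsékleten?"

# Stage 1: fast vector search shortlists candidates from the corpus.
hits = util.semantic_search(retriever.encode(query), retriever.encode(corpus), top_k=2)[0]

# Stage 2: the cross encoder rescores the shortlist to produce the final order.
candidates = [corpus[hit["corpus_id"]] for hit in hits]
print(reranker.rank(query, candidates))
```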
The model was evaluated on the `chem-dev` dataset with `CrossEncoderRerankingEvaluator`, using these parameters:

{
    "at_k": 10,
    "always_rerank_positives": false
}
| Metric | Value |
|---|---|
| map | 0.4646 (+0.0929) |
| mrr@10 | 0.4614 (+0.0966) |
| ndcg@10 | 0.4928 (+0.0910) |

The values in parentheses are the gain over the candidate ordering before reranking.
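The evaluation can be reproduced along these lines; the sample below is a minimal illustrative stand-in for the actual `chem-dev` data.

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

# Illustrative sample: each entry lists a query, its known-good answers,
# and the candidate documents to rerank.
samples = [
    {
        "query": "Milyen halmazállapotú a klór szobahőmérsékleten?",
        "positive": ["Gáz"],
        "documents": ["Az első héjon.", "Gáz", "Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok."],
    },
]

evaluator = CrossEncoderRerankingEvaluator(
    samples=samples,
    at_k=10,
    always_rerank_positives=False,
    name="chem-dev",
)
model = CrossEncoder("GaborMadarasz/reranker-ModernBERT-base-hungarian")
print(evaluator(model))  # dict containing map, mrr@10 and ndcg@10
```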
The training data has three columns: `query`, `answer`, and `label`.

| | query | answer | label |
|---|---|---|---|
| type | string | string | int |

Samples:

| query | answer | label |
|---|---|---|
| Milyen halmazállapotú a klór szobahőmérsékleten? | Gáz | 1 |
| Milyen halmazállapotú a klór szobahőmérsékleten? | Gáz. | 1 |
| Mi az izoméria fogalma? | Azonos összegképletű, de eltérő szerkezetű és tulajdonságú anyagok. | 1 |
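A dataset in this layout can be built with the `datasets` library, for example; the rows below are illustrative, and the label-0 row assumes that mismatched query-answer pairs were labeled 0.

```python
from datasets import Dataset

# Toy dataset with the query/answer/label layout described above.
train_dataset = Dataset.from_dict({
    "query": [
        "Milyen halmazállapotú a klór szobahőmérsékleten?",
        "Milyen halmazállapotú a klór szobahőmérsékleten?",
    ],
    "answer": ["Gáz", "Az első héjon."],
    "label": [1, 0],  # 1 = matching pair; 0 assumed for mismatched pairs
})
print(train_dataset[0])
```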
Loss: `BinaryCrossEntropyLoss` with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": 5
}
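A `pos_weight` of 5 up-weights positive pairs five-fold in the binary cross-entropy loss, which helps when negatives heavily outnumber positives. A minimal sketch of this setup; the tensor form for `pos_weight` follows torch's `BCEWithLogitsLoss` convention.

```python
import torch
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("GaborMadarasz/ModernBERT-base-hungarian", num_labels=1)
# Identity activation is the default, so only pos_weight needs to be set.
loss = BinaryCrossEntropyLoss(model, pos_weight=torch.tensor(5.0))
```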
Non-default training hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 8
- learning_rate: 2e-05
- warmup_ratio: 0.1
- seed: 12
- dataloader_num_workers: 2
- load_best_model_at_end: True

All other options were left at their Transformers defaults (3 training epochs, `adamw_torch` optimizer, linear learning-rate scheduler, `max_grad_norm: 1.0`, no mixed precision, and so on).

Training logs:

| Epoch | Step | Training Loss | chem-dev_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.1188 (-0.2831) |
| 0.0005 | 1 | 1.9222 | - |
| 0.0498 | 100 | 1.8084 | - |
| 0.0996 | 200 | 1.2947 | 0.2862 (-0.1157) |
| 0.1495 | 300 | 1.1573 | - |
| 0.1993 | 400 | 1.17 | 0.3567 (-0.0452) |
| 0.2491 | 500 | 1.0609 | - |
| 0.2989 | 600 | 1.01 | 0.3747 (-0.0272) |
| 0.3488 | 700 | 0.9806 | - |
| 0.3986 | 800 | 0.9208 | 0.3963 (-0.0056) |
| 0.4484 | 900 | 0.9022 | - |
| 0.4982 | 1000 | 0.8722 | 0.4106 (+0.0087) |
| 0.5480 | 1100 | 0.9325 | - |
| 0.5979 | 1200 | 0.768 | 0.4316 (+0.0298) |
| 0.6477 | 1300 | 0.8151 | - |
| 0.6975 | 1400 | 0.7569 | 0.4506 (+0.0487) |
| 0.7473 | 1500 | 0.7216 | - |
| 0.7972 | 1600 | 0.7571 | 0.4643 (+0.0625) |
| 0.8470 | 1700 | 0.6993 | - |
| 0.8968 | 1800 | 0.6709 | 0.4713 (+0.0694) |
| 0.9466 | 1900 | 0.7021 | - |
| 0.9965 | 2000 | 0.7693 | 0.4805 (+0.0787) |
| 1.0458 | 2100 | 0.5179 | - |
| 1.0957 | 2200 | 0.4932 | 0.4800 (+0.0781) |
| 1.1455 | 2300 | 0.5568 | - |
| 1.1953 | 2400 | 0.4191 | 0.4821 (+0.0803) |
| 1.2451 | 2500 | 0.4702 | - |
| 1.2949 | 2600 | 0.4126 | 0.4851 (+0.0833) |
| 1.3448 | 2700 | 0.4744 | - |
| 1.3946 | 2800 | 0.4404 | 0.4907 (+0.0888) |
| 1.4444 | 2900 | 0.4712 | - |
| 1.4942 | 3000 | 0.4382 | 0.4913 (+0.0894) |
| 1.5441 | 3100 | 0.5049 | - |
| 1.5939 | 3200 | 0.4714 | 0.4886 (+0.0868) |
| 1.6437 | 3300 | 0.3885 | - |
| 1.6935 | 3400 | 0.4361 | 0.4924 (+0.0906) |
| 1.7434 | 3500 | 0.4207 | - |
| 1.7932 | 3600 | 0.4384 | 0.4928 (+0.0910) |
| 1.8430 | 3700 | 0.4187 | - |
| 1.8928 | 3800 | 0.4271 | 0.4937 (+0.0919) |
| 1.9426 | 3900 | 0.3581 | - |
| 1.9925 | 4000 | 0.3751 | 0.4910 (+0.0891) |
| 2.0419 | 4100 | 0.2494 | - |
| 2.0917 | 4200 | 0.2045 | 0.4869 (+0.0850) |
| 2.1415 | 4300 | 0.1532 | - |
| 2.1913 | 4400 | 0.1268 | 0.4838 (+0.0820) |
| 2.2411 | 4500 | 0.2108 | - |
| 2.2910 | 4600 | 0.2292 | 0.4889 (+0.0870) |
| 2.3408 | 4700 | 0.2154 | - |
| 2.3906 | 4800 | 0.1574 | 0.4921 (+0.0902) |
| 2.4404 | 4900 | 0.1677 | - |
| 2.4903 | 5000 | 0.1596 | 0.4826 (+0.0807) |
| 2.5401 | 5100 | 0.1456 | - |
| 2.5899 | 5200 | 0.2177 | 0.4867 (+0.0849) |
| 2.6397 | 5300 | 0.1227 | - |
| 2.6895 | 5400 | 0.1638 | 0.4880 (+0.0862) |
| 2.7394 | 5500 | 0.1192 | - |
| 2.7892 | 5600 | 0.2003 | 0.4848 (+0.0829) |
| 2.8390 | 5700 | 0.2717 | - |
| 2.8888 | 5800 | 0.1546 | 0.4841 (+0.0822) |
| 2.9387 | 5900 | 0.268 | - |
| 2.9885 | 6000 | 0.2253 | 0.4858 (+0.0840) |
| -1 | -1 | - | 0.4928 (+0.0910) |
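For reference, a self-contained sketch of a training run with the non-default hyperparameters above, using the `CrossEncoderTrainer` API from sentence-transformers v4+; the toy dataset, toy eval split, and output directory are placeholders.

```python
import torch
from datasets import Dataset
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder import (
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Toy stand-in for the real query/answer/label training data.
train_dataset = Dataset.from_dict({
    "query": ["Milyen halmazállapotú a klór szobahőmérsékleten?"] * 2,
    "answer": ["Gáz", "Az első héjon."],
    "label": [1, 0],
})

model = CrossEncoder("GaborMadarasz/ModernBERT-base-hungarian", num_labels=1)
loss = BinaryCrossEntropyLoss(model, pos_weight=torch.tensor(5.0))

# Only the non-default hyperparameters listed above are set explicitly.
args = CrossEncoderTrainingArguments(
    output_dir="reranker-ModernBERT-base-hungarian",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    dataloader_num_workers=2,
    load_best_model_at_end=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # toy; the real run used a held-out dev split
    loss=loss,
)
trainer.train()
```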
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: answerdotai/ModernBERT-base