BASILISK EL Cross-Encoder (BiomedBERT AB) v1

This model is a biomedical entity-linking (EL) cross-encoder used by BASILISK to rerank UMLS concept candidates for a mention in context.

It is fine-tuned from:

  • microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
  • Base revision: e1354b7a3a09615f6aba48dfad4b7a613eef7062

Published model repo:

  • Bam3752/basilisk-el-ce-biomedbert-ab-v1

Pinned release revision:

  • 9b11786be83352196058af654d255e8441a75356

Task

Binary classification over a (mention/context, candidate concept) pair:

  • label 1: candidate concept is correct for the mention in context
  • label 0: candidate concept is incorrect

At BASILISK runtime, the candidate ranking score is:

  • score = sigmoid(logit_1 - logit_0)
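As a minimal sketch, this score is just the sigmoid of the logit margin (the function name here is illustrative, not part of the released code):

```python
import math

def ce_score(logit_0: float, logit_1: float) -> float:
    """BASILISK CE ranking score: sigmoid of the logit margin."""
    return 1.0 / (1.0 + math.exp(-(logit_1 - logit_0)))

# Equal logits give a neutral score of 0.5; a positive margin
# (logit_1 > logit_0) pushes the score toward 1.
print(f"{ce_score(-1.2, 2.3):.4f}")
```

Because sigmoid is monotonic in the margin, ranking candidates by this score is equivalent to ranking by logit_1 - logit_0 directly.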

Input Serialization

The model is trained and used with paired text:

  • Left sequence: mention: <mention> ; context: <local_context>
  • Right sequence: candidate: <concept_name> ; tuis: <TUI list>

This serialization must be kept identical between training and inference.
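A small helper that builds the two sequences in the documented format (the function name is an illustration, not part of the released code):

```python
def serialize_pair(mention: str, context: str, concept_name: str, tuis: list[str]) -> tuple[str, str]:
    """Build the left/right sequences in the documented BASILISK CE format."""
    left = f"mention: {mention} ; context: {context}"
    right = f"candidate: {concept_name} ; tuis: {','.join(tuis)}"
    return left, right

left, right = serialize_pair(
    "aspirin",
    "patient was started on aspirin for secondary prevention",
    "Aspirin",
    ["T121", "T109"],
)
print(left)
print(right)
```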

Training Data and Splits

Data was generated by the BASILISK EL training-set pipeline.

Split sizes:

  • Train: 281,902
  • Dev: 35,420
  • Test: 35,100

Class balance:

  • Train positives: 124,487, negatives: 157,415 (neg_per_pos=1.2645)
  • Dev positives: 15,608, negatives: 19,812 (neg_per_pos=1.2693)
  • Test positives: 15,405, negatives: 19,695 (neg_per_pos=1.2785)
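The split sizes and neg_per_pos ratios are internally consistent and can be reproduced from the positive/negative counts (a quick sanity check, not part of the pipeline):

```python
# (positives, negatives) per split, from the counts above
splits = {
    "train": (124_487, 157_415),
    "dev": (15_608, 19_812),
    "test": (15_405, 19_695),
}

for name, (pos, neg) in splits.items():
    print(f"{name}: total={pos + neg}, neg_per_pos={neg / pos:.4f}")
```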

Training Configuration

Key settings:

  • Profile: quality_max
  • Objective (this run): binary cross-entropy
  • Epochs: 4
  • Batch size: 24
  • Gradient accumulation: 4 (effective batch 96)
  • Max length: 256
  • Learning rate: 2e-5 with cosine schedule
  • Warmup ratio: 0.1
  • Weight decay: 0.01
  • Seed: 13
  • Device: mps
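The training script itself is not published; as a rough sketch, these settings map onto `transformers.TrainingArguments` as follows (the argument names are standard Transformers, but the mapping itself is an assumption, and `output_dir` is illustrative):

```python
from transformers import TrainingArguments

# Assumed mapping of the documented hyperparameters onto the
# Hugging Face Trainer; the actual BASILISK training code may differ.
args = TrainingArguments(
    output_dir="el_ce_ab_biomedbert",   # illustrative path
    num_train_epochs=4,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=4,      # effective batch: 24 * 4 = 96
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    seed=13,
    load_best_model_at_end=True,
    metric_for_best_model="f1",         # selection metric: dev F1
)
```

Note that max length (256) is enforced at tokenization time rather than through `TrainingArguments`.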

Model selection:

  • Metric: dev_f1
  • Best checkpoint step: 5500
  • Best selected dev F1: 0.9174

Calibration

A calibration artifact is provided at:

  • calibration/biomedbert_ab.json

Calibration method:

  • global temperature scaling + bucket offsets
  • temperature: 1.25

Dev calibration effect:

  • Raw ECE: 0.017000 -> Calibrated ECE: 0.003967
  • Raw Brier: 0.052839 -> Calibrated Brier: 0.051863
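As a sketch of how temperature scaling with an additive bucket offset could be applied to the logit margin (the artifact's JSON schema and the exact offset scheme are assumptions; consult calibration/biomedbert_ab.json for the real fields):

```python
import math

def calibrate(margin: float, temperature: float = 1.25, bucket_offset: float = 0.0) -> float:
    """Temperature-scale the logit margin, then apply a per-bucket offset.

    The additive-offset form is an assumption about the artifact; the
    temperature value 1.25 comes from the calibration section above.
    """
    scaled = margin / temperature + bucket_offset
    return 1.0 / (1.0 + math.exp(-scaled))

# With T > 1, confident raw scores are pulled toward 0.5.
print(f"{calibrate(3.5):.4f}")
```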

Evaluation Summary

Raw metrics:

  Split  Accuracy  Precision  Recall  F1      ECE       Brier
  Dev    0.9292    0.9446     0.8918  0.9174  0.017000  0.052839
  Test   0.9282    0.9456     0.8875  0.9156  0.017083  0.053027

Calibrated metrics:

  Split              Accuracy  Precision  Recall  F1      ECE       Brier
  Dev (calibrated)   0.9296    0.9455     0.8916  0.9178  0.003967  0.051863
  Test (calibrated)  0.9282    0.9458     0.8874  0.9156  0.006504  0.052334

Usage (Transformers)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "Bam3752/basilisk-el-ce-biomedbert-ab-v1"
revision = "9b11786be83352196058af654d255e8441a75356"

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForSequenceClassification.from_pretrained(repo_id, revision=revision)
model.eval()

left = "mention: aspirin ; context: patient was started on aspirin for secondary prevention"
right = "candidate: Aspirin ; tuis: T121,T109"
enc = tokenizer(left, right, truncation=True, padding=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits.squeeze(0)
    # BASILISK CE probability
    p = torch.sigmoid(logits[1] - logits[0]).item()

print(f"ce_probability={p:.4f}")
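In practice the model scores a whole candidate list per mention; a minimal sketch of that reranking loop, with the model call stubbed out (substitute the Transformers snippet above for the stub; `rerank` and `score_pair` are illustrative names, not released code):

```python
def rerank(left: str, candidates: list[tuple[str, str]], score_pair) -> list[tuple[str, float]]:
    """Score each candidate's right sequence against the mention's left
    sequence and sort by descending CE probability."""
    scored = [
        (name, score_pair(left, f"candidate: {name} ; tuis: {tuis}"))
        for name, tuis in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Stub scorer for illustration; in BASILISK this is the cross-encoder call.
stub = lambda left, right: 0.9 if "Aspirin" in right else 0.1
ranked = rerank(
    "mention: aspirin ; context: started on aspirin for secondary prevention",
    [("Aspirin", "T121,T109"), ("Asparaginase", "T121,T116")],
    stub,
)
print(ranked[0][0])  # top-ranked candidate
```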

Intended Use

Intended for:

  • biomedical entity-linking candidate reranking in BASILISK
  • high-recall candidate sets where contextual disambiguation is needed

Not intended for:

  • standalone medical diagnosis or clinical decision support
  • domains far outside biomedical literature/terminologies
  • direct use without candidate generation and ontology constraints

Limitations

  • Performance depends on candidate generator recall.
  • Calibration is fit on this training pipeline's dev distribution and may drift on different data.
  • Ambiguous mentions and rare concepts may still require ontology constraints or additional signals.

Ethics and Safety

This is a research/engineering model for NLP ranking, not a medical device. Outputs may be wrong and require human review in sensitive workflows.

Reproducibility Notes

Primary artifact sources:

  • artifacts/el_ce_ab_biomedbert/manifest.json
  • artifacts/el_ce_ab_biomedbert/metrics.json
  • artifacts/el_ce_ab_biomedbert/calibration.json
Weights

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32