# BASILISK EL Cross-Encoder (BiomedBERT AB) v1
This model is a biomedical entity-linking (EL) cross-encoder used by BASILISK to rerank UMLS concept candidates for a mention in context.
It is fine-tuned from:

- Base model: `microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext`
- Base revision: `e1354b7a3a09615f6aba48dfad4b7a613eef7062`
- Published model repo: `Bam3752/basilisk-el-ce-biomedbert-ab-v1`
- Pinned release revision: `9b11786be83352196058af654d255e8441a75356`
## Task

Binary classification over a (mention/context, candidate concept) pair:

- label `1`: the candidate concept is correct for the mention in context
- label `0`: the candidate concept is incorrect

At BASILISK runtime, the candidate ranking score is:

`score = sigmoid(logit_1 - logit_0)`
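This margin score is mathematically the two-class softmax probability of label 1. A minimal sketch showing the equivalence (function names are illustrative, not part of the BASILISK codebase):

```python
import math

def ce_score(logit_0: float, logit_1: float) -> float:
    """Sigmoid of the logit margin, as used for candidate ranking."""
    return 1.0 / (1.0 + math.exp(-(logit_1 - logit_0)))

def softmax_p1(logit_0: float, logit_1: float) -> float:
    """Two-class softmax probability of label 1 (same value)."""
    e0, e1 = math.exp(logit_0), math.exp(logit_1)
    return e1 / (e0 + e1)
```

Both forms give the same number, so ranking by the margin score is ranking by the classifier's probability of label 1.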
## Input Serialization

The model is trained and used with paired text:

- Left sequence: `mention: <mention> ; context: <local_context>`
- Right sequence: `candidate: <concept_name> ; tuis: <TUI list>`

This format should be kept consistent between training and inference.
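A minimal serialization sketch following the templates above (the helper names are hypothetical; only the delimiter format comes from this card):

```python
def serialize_left(mention: str, context: str) -> str:
    # Left sequence: the mention plus its local context
    return f"mention: {mention} ; context: {context}"

def serialize_right(concept_name: str, tuis: list[str]) -> str:
    # Right sequence: candidate concept name plus its UMLS semantic type (TUI) list
    return f"candidate: {concept_name} ; tuis: {','.join(tuis)}"
```

Centralizing serialization in helpers like these is one way to guarantee the train/inference consistency the card calls for.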
## Training Data and Splits

Data was generated by the BASILISK EL training-set pipeline.

Split sizes:

- Train: 281,902
- Dev: 35,420
- Test: 35,100

Class balance:

- Train: 124,487 positives, 157,415 negatives (neg_per_pos = 1.2645)
- Dev: 15,608 positives, 19,812 negatives (neg_per_pos = 1.2693)
- Test: 15,405 positives, 19,695 negatives (neg_per_pos = 1.2785)
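The split totals and negative-to-positive ratios above are internally consistent and can be checked directly:

```python
# (positives, negatives) per split, from the counts above
splits = {
    "train": (124_487, 157_415),
    "dev": (15_608, 19_812),
    "test": (15_405, 19_695),
}

for name, (pos, neg) in splits.items():
    # pos + neg reproduces the split size; neg / pos reproduces neg_per_pos
    print(f"{name}: total={pos + neg}, neg_per_pos={neg / pos:.4f}")
```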
## Training Configuration

Key settings:

- Profile: `quality_max`
- Objective (this run): binary cross-entropy
- Epochs: 4
- Batch size: 24
- Gradient accumulation: 4 (effective batch size 96)
- Max length: 256
- Learning rate: 2e-5 with cosine schedule
- Warmup ratio: 0.1
- Weight decay: 0.01
- Seed: 13
- Device: `mps`

Model selection:

- Metric: `dev_f1`
- Best checkpoint step: 5500
- Best selected dev F1: 0.9174
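The step budget implied by these settings can be sketched as follows (exact step accounting, e.g. drop-last behavior in the data loader, is an assumption; only the hyperparameter values come from this card):

```python
import math

train_size, batch_size, grad_accum = 281_902, 24, 4
epochs, warmup_ratio = 4, 0.1

effective_batch = batch_size * grad_accum             # 96, as stated above
steps_per_epoch = math.ceil(train_size / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)
print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under this accounting the best checkpoint (step 5500) falls roughly midway through the second of four epochs.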
## Calibration

A calibration artifact is provided at `calibration/biomedbert_ab.json`.

Calibration method:

- global temperature scaling + bucket offsets
- temperature: 1.25

Dev calibration effect:

- ECE: 0.017000 (raw) -> 0.003967 (calibrated)
- Brier: 0.052839 (raw) -> 0.051863 (calibrated)
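A sketch of how such a calibration could be applied at inference. The temperature (1.25) is from this card; the additive form of the bucket offset is an assumption, since the exact mechanics live in `calibration/biomedbert_ab.json`:

```python
import math

def calibrated_probability(margin: float, temperature: float = 1.25,
                           bucket_offset: float = 0.0) -> float:
    # Global temperature scaling of the logit margin (logit_1 - logit_0),
    # plus a per-bucket offset; the offset's additive form is assumed here.
    return 1.0 / (1.0 + math.exp(-(margin / temperature + bucket_offset)))
```

With a temperature above 1, confident raw scores are pulled toward 0.5, which is consistent with the ECE drop reported above.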
## Evaluation Summary

Raw metrics:
| Split | Accuracy | Precision | Recall | F1 | ECE | Brier |
|---|---|---|---|---|---|---|
| Dev | 0.9292 | 0.9446 | 0.8918 | 0.9174 | 0.017000 | 0.052839 |
| Test | 0.9282 | 0.9456 | 0.8875 | 0.9156 | 0.017083 | 0.053027 |
Calibrated metrics:
| Split | Accuracy | Precision | Recall | F1 | ECE | Brier |
|---|---|---|---|---|---|---|
| Dev (calibrated) | 0.9296 | 0.9455 | 0.8916 | 0.9178 | 0.003967 | 0.051863 |
| Test (calibrated) | 0.9282 | 0.9458 | 0.8874 | 0.9156 | 0.006504 | 0.052334 |
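ECE here presumably refers to expected calibration error over equal-width confidence bins; a generic sketch (the bin count and binning scheme used for the reported numbers are assumptions):

```python
def expected_calibration_error(probs, labels, n_bins=10):
    # Equal-width bins over [0, 1]; ECE = sum_b (|b|/N) * |acc(b) - conf(b)|
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)   # mean predicted probability
        acc = sum(y for _, y in b) / len(b)    # empirical positive rate
        ece += (len(b) / n) * abs(acc - conf)
    return ece
```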
## Usage (Transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "Bam3752/basilisk-el-ce-biomedbert-ab-v1"
revision = "9b11786be83352196058af654d255e8441a75356"

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForSequenceClassification.from_pretrained(repo_id, revision=revision)
model.eval()

left = "mention: aspirin ; context: patient was started on aspirin for secondary prevention"
right = "candidate: Aspirin ; tuis: T121,T109"

enc = tokenizer(left, right, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits.squeeze(0)

# BASILISK CE probability: sigmoid of the logit margin
p = torch.sigmoid(logits[1] - logits[0]).item()
print(f"ce_probability={p:.4f}")
```
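Given per-candidate logit pairs from the model, reranking a candidate set reduces to sorting by the margin score. A hypothetical helper (the candidate names and logits below are illustrative only):

```python
import math

def rerank(candidates):
    # candidates: list of (concept_name, logit_0, logit_1) triples
    scored = [(name, 1.0 / (1.0 + math.exp(-(l1 - l0))))
              for name, l0, l1 in candidates]
    # highest margin score first
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranked = rerank([("Aspirin", -1.2, 2.3), ("Aspirin allergy", 1.5, -0.4)])
```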
## Intended Use
Intended for:
- biomedical entity-linking candidate reranking in BASILISK
- high-recall candidate sets where contextual disambiguation is needed
Not intended for:
- standalone medical diagnosis or clinical decision support
- domains far outside biomedical literature/terminologies
- direct use without candidate generation and ontology constraints
## Limitations
- Performance depends on candidate generator recall.
- Calibration is fit on this training pipeline's dev distribution and may drift on different data.
- Ambiguous mentions and rare concepts may still require ontology constraints or additional signals.
## Ethics and Safety
This is a research/engineering model for NLP ranking, not a medical device. Outputs may be wrong and require human review in sensitive workflows.
## Reproducibility Notes

Primary artifact sources:

- `artifacts/el_ce_ab_biomedbert/manifest.json`
- `artifacts/el_ce_ab_biomedbert/metrics.json`
- `artifacts/el_ce_ab_biomedbert/calibration.json`