BASILISK EL Cross-Encoder (BiomedBERT AB) v1

This model is a biomedical entity-linking (EL) cross-encoder used by BASILISK to rerank UMLS concept candidates for a mention in context.

It is fine-tuned from:

  • microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
  • Base revision: e1354b7a3a09615f6aba48dfad4b7a613eef7062

Published model repo:

  • Bam3752/basilisk-el-ce-biomedbert-ab-v1

Pinned release revision:

  • 9b11786be83352196058af654d255e8441a75356

Task

Binary classification over a (mention/context, candidate concept) pair:

  • label 1: candidate concept is correct for the mention in context
  • label 0: candidate concept is incorrect

At BASILISK runtime, the candidate ranking score is:

  • score = sigmoid(logit_1 - logit_0)
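As a minimal sketch, this score is just the sigmoid of the logit margin (the function name here is illustrative, not part of the released code):

```python
import math

def ce_score(logit_0: float, logit_1: float) -> float:
    """BASILISK CE ranking score: sigmoid of the logit margin."""
    return 1.0 / (1.0 + math.exp(-(logit_1 - logit_0)))

# Equal logits give a neutral score of 0.5; a positive margin
# (logit_1 > logit_0) pushes the score toward 1.
print(f"{ce_score(-1.2, 2.3):.4f}")
```

Because sigmoid is monotonic in the margin, ranking candidates by this score is equivalent to ranking by logit_1 - logit_0 directly.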

Input Serialization

The model is trained and used with paired text:

  • Left sequence: mention: <mention> ; context: <local_context>
  • Right sequence: candidate: <concept_name> ; tuis: <TUI list>

This serialization must be kept identical between training and inference.
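A small helper that builds the two sequences in the documented format (the function name is an illustration, not part of the released code):

```python
def serialize_pair(mention: str, context: str, concept_name: str, tuis: list[str]) -> tuple[str, str]:
    """Build the left/right sequences in the documented BASILISK CE format."""
    left = f"mention: {mention} ; context: {context}"
    right = f"candidate: {concept_name} ; tuis: {','.join(tuis)}"
    return left, right

left, right = serialize_pair(
    "aspirin",
    "patient was started on aspirin for secondary prevention",
    "Aspirin",
    ["T121", "T109"],
)
print(left)
print(right)
```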

Training Data and Splits

Data was generated by the BASILISK EL training-set pipeline.

Split sizes:

  • Train: 281,902
  • Dev: 35,420
  • Test: 35,100

Class balance:

  • Train positives: 124,487, negatives: 157,415 (neg_per_pos=1.2645)
  • Dev positives: 15,608, negatives: 19,812 (neg_per_pos=1.2693)
  • Test positives: 15,405, negatives: 19,695 (neg_per_pos=1.2785)
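The split sizes and neg_per_pos ratios are internally consistent and can be reproduced from the positive/negative counts (a quick sanity check, not part of the pipeline):

```python
# (positives, negatives) per split, from the counts above
splits = {
    "train": (124_487, 157_415),
    "dev": (15_608, 19_812),
    "test": (15_405, 19_695),
}

for name, (pos, neg) in splits.items():
    print(f"{name}: total={pos + neg}, neg_per_pos={neg / pos:.4f}")
```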

Training Configuration

Key settings:

  • Profile: quality_max
  • Objective (this run): binary cross-entropy
  • Epochs: 4
  • Batch size: 24
  • Gradient accumulation: 4 (effective batch 96)
  • Max length: 256
  • Learning rate: 2e-5 with cosine schedule
  • Warmup ratio: 0.1
  • Weight decay: 0.01
  • Seed: 13
  • Device: mps
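The training script itself is not published; as a rough sketch, these settings map onto `transformers.TrainingArguments` as follows (the argument names are standard Transformers, but the mapping itself is an assumption, and `output_dir` is illustrative):

```python
from transformers import TrainingArguments

# Assumed mapping of the documented hyperparameters onto the
# Hugging Face Trainer; the actual BASILISK training code may differ.
args = TrainingArguments(
    output_dir="el_ce_ab_biomedbert",   # illustrative path
    num_train_epochs=4,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=4,      # effective batch: 24 * 4 = 96
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    seed=13,
    load_best_model_at_end=True,
    metric_for_best_model="f1",         # selection metric: dev F1
)
```

Note that max length (256) is enforced at tokenization time rather than through `TrainingArguments`.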

Model selection:

  • Metric: dev_f1
  • Best checkpoint step: 5500
  • Best selected dev F1: 0.9174

Calibration

A calibration artifact is provided at:

  • calibration/biomedbert_ab.json

Calibration method:

  • global temperature scaling + bucket offsets
  • temperature: 1.25

Dev calibration effect:

  • Raw ECE: 0.017000 -> Calibrated ECE: 0.003967
  • Raw Brier: 0.052839 -> Calibrated Brier: 0.051863
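As a sketch of how temperature scaling with an additive bucket offset could be applied to the logit margin (the artifact's JSON schema and the exact offset scheme are assumptions; consult calibration/biomedbert_ab.json for the real fields):

```python
import math

def calibrate(margin: float, temperature: float = 1.25, bucket_offset: float = 0.0) -> float:
    """Temperature-scale the logit margin, then apply a per-bucket offset.

    The additive-offset form is an assumption about the artifact; the
    temperature value 1.25 comes from the calibration section above.
    """
    scaled = margin / temperature + bucket_offset
    return 1.0 / (1.0 + math.exp(-scaled))

# With T > 1, confident raw scores are pulled toward 0.5.
print(f"{calibrate(3.5):.4f}")
```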

Evaluation Summary

Raw metrics:

  Split  Accuracy  Precision  Recall  F1      ECE       Brier
  Dev    0.9292    0.9446     0.8918  0.9174  0.017000  0.052839
  Test   0.9282    0.9456     0.8875  0.9156  0.017083  0.053027

Calibrated metrics:

  Split              Accuracy  Precision  Recall  F1      ECE       Brier
  Dev (calibrated)   0.9296    0.9455     0.8916  0.9178  0.003967  0.051863
  Test (calibrated)  0.9282    0.9458     0.8874  0.9156  0.006504  0.052334

Usage (Transformers)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "Bam3752/basilisk-el-ce-biomedbert-ab-v1"
revision = "9b11786be83352196058af654d255e8441a75356"

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForSequenceClassification.from_pretrained(repo_id, revision=revision)
model.eval()

left = "mention: aspirin ; context: patient was started on aspirin for secondary prevention"
right = "candidate: Aspirin ; tuis: T121,T109"
enc = tokenizer(left, right, truncation=True, padding=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits.squeeze(0)
    # BASILISK CE probability
    p = torch.sigmoid(logits[1] - logits[0]).item()

print(f"ce_probability={p:.4f}")
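In practice the model scores a whole candidate list per mention; a minimal sketch of that reranking loop, with the model call stubbed out (substitute the Transformers snippet above for the stub; `rerank` and `score_pair` are illustrative names, not released code):

```python
def rerank(left: str, candidates: list[tuple[str, str]], score_pair) -> list[tuple[str, float]]:
    """Score each candidate's right sequence against the mention's left
    sequence and sort by descending CE probability."""
    scored = [
        (name, score_pair(left, f"candidate: {name} ; tuis: {tuis}"))
        for name, tuis in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Stub scorer for illustration; in BASILISK this is the cross-encoder call.
stub = lambda left, right: 0.9 if "Aspirin" in right else 0.1
ranked = rerank(
    "mention: aspirin ; context: started on aspirin for secondary prevention",
    [("Aspirin", "T121,T109"), ("Asparaginase", "T121,T116")],
    stub,
)
print(ranked[0][0])  # top-ranked candidate
```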

Intended Use

Intended for:

  • biomedical entity-linking candidate reranking in BASILISK
  • high-recall candidate sets where contextual disambiguation is needed

Not intended for:

  • standalone medical diagnosis or clinical decision support
  • domains far outside biomedical literature/terminologies
  • direct use without candidate generation and ontology constraints

Limitations

  • Performance depends on candidate generator recall.
  • Calibration is fit on this training pipeline's dev distribution and may drift on different data.
  • Ambiguous mentions and rare concepts may still require ontology constraints or additional signals.

Ethics and Safety

This is a research/engineering model for NLP ranking, not a medical device. Outputs may be wrong and require human review in sensitive workflows.

Reproducibility Notes

Primary artifact sources:

  • artifacts/el_ce_ab_biomedbert/manifest.json
  • artifacts/el_ce_ab_biomedbert/metrics.json
  • artifacts/el_ce_ab_biomedbert/calibration.json
Weights

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32