argos-dentsight-stage2-conditions-v1

Status: research preview, NOT clinical-grade. Three of four hard per-class acceptance gates failed; the global mAP@0.5 = 0.53 number is ranking-only (the classification head's biases never moved off their focal-loss prior — see Limitations). v2 retrain at 1664×928 with cls-head LR-multiplier + focal_alpha=0.25 is in progress.

Stage-2 of the argos-dentsight two-stage transformer detector for panoramic dental X-rays (OPGs). Stage-1 (Mobe1/argos-dentsight-stage1-fdi-v1) localizes the 32 FDI tooth positions; this model detects 13 condition labels (caries, calculus, RC-treated, impacted, restoration, crown, periapical-radiolucency, root-stump, bridge, implant, tooth-bud, missing, other-finding) within the same OPG.

Model description

D-FINE-Large fine-tuned from ustc-community/dfine-large-coco for 18 epochs on Mobe1/argos-dentsight-opg-v1 (v8 OPG REPORTING corpus, 13-class condition subset). Input 1280×704 (aspect-preserving FitInBox + pad). Class-balanced WeightedRandomSampler per epoch. Focal-loss γ=2.0, α=0.75. DENTEX 2023 pretraining was deliberately skipped — see "path B" decision in docs/SESSION_HANDOFF_2026-05-03.md (license + per-class overlap).
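
The 1280×704 input and the "FitInBox + pad" step can be illustrated with a small sketch. This is not the project's preprocessing code: the function names are hypothetical, and right/bottom padding is an assumption, since the card does not say where the padding is placed.

```python
def floor_to_stride(size: int, stride: int = 32) -> int:
    """Round a side length down to the nearest multiple of the stride."""
    return (size // stride) * stride


def fit_in_box(width: int, height: int, box_w: int = 1280, box_h: int = 704):
    """Aspect-preserving resize into (box_w, box_h), then pad the remainder.

    Returns (new_w, new_h, pad_right, pad_bottom). Padding on the right and
    bottom is an assumption made for this sketch.
    """
    scale = min(box_w / width, box_h / height)
    new_w = min(box_w, round(width * scale))
    new_h = min(box_h, round(height * scale))
    return new_w, new_h, box_w - new_w, box_h - new_h


print(floor_to_stride(720))    # 704
print(fit_in_box(2000, 1000))  # (1280, 640, 0, 64)
```

A 2:1 image is limited by its width here, so the resized image is 1280×640 and the remaining 64 rows are padding.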

Intended use

Research companion to the "AI The Dentist" MSc dissertation (Tun Ye Minn, University of Essex).
Per-tooth condition ranking for downstream LLM grounding-caption generation, after IoU-matching to stage-1 tooth boxes.
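
The IoU-matching step mentioned above could look roughly like the sketch below. It is only an illustration: the box format (x1, y1, x2, y2), the function names, and the `min_iou` threshold are all assumptions, not the project's actual matching rule.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_to_teeth(condition_boxes, tooth_boxes, min_iou=0.1):
    """Assign each condition box the FDI number of its best-overlapping tooth.

    tooth_boxes maps FDI number -> box. min_iou is a hypothetical threshold;
    condition boxes with no tooth at or above it are mapped to None.
    """
    matches = []
    for cbox in condition_boxes:
        best_fdi, best_score = None, min_iou
        for fdi, tbox in tooth_boxes.items():
            score = iou(cbox, tbox)
            if score >= best_score:
                best_fdi, best_score = fdi, score
        matches.append(best_fdi)
    return matches


# Toy boxes for illustration only.
teeth = {11: (0, 0, 10, 20), 21: (10, 0, 20, 20)}
conditions = [(1, 1, 9, 19), (100, 100, 110, 110)]
print(match_to_teeth(conditions, teeth))  # [11, None]
```

Returning None for unmatched boxes keeps findings that sit outside every stage-1 tooth box visible to the caller instead of silently attaching them to the nearest tooth.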

Out of scope

Any clinical decision-making. Not validated against dentist agreement.
Standalone use without IoU-matching to stage-1 — predicted boxes do not encode tooth identity.
Surfacing caries / calculus / periapical-radiolucency outputs to a clinician without explicit "low-confidence advisory" framing — these classes failed the project's per-class acceptance gates (see below).
Billing, insurance, or treatment-recommendation pipelines.

Training data

Mobe1/argos-dentsight-opg-v1 (private), v8 of the Roboflow "OPG REPORTING" corpus normalized into a 32-FDI + 13-condition unified label space. Splits: train 1549 / valid 443 / test 221. Stage-2 training filters each row to condition annotations only; tooth-FDI annotations are dropped (handled by stage-1). Class-balanced sampling: per-image weights = 1 / class_count over the labels in that image, drawn with WeightedRandomSampler(replacement=True).
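
One way to read the class-balanced sampling rule is sketched below. The card does not say how the per-label terms are combined or whether class_count counts annotations or images; this sketch assumes a sum over the distinct labels in an image and annotation-level counts. The function name and toy data are hypothetical.

```python
import random
from collections import Counter


def image_weights(labels_per_image):
    """Per-image weight = sum of 1 / class_count over the distinct labels."""
    class_count = Counter(l for labels in labels_per_image for l in labels)
    return [
        sum(1.0 / class_count[l] for l in set(labels))
        for labels in labels_per_image
    ]


# Toy label lists, one per image (illustrative data, not from the corpus).
labels_per_image = [["crown", "crown"], ["caries"], ["crown"]]
weights = image_weights(labels_per_image)
print(weights)

# The same weights would be passed to
# torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True);
# random.choices is used here only as a standard-library stand-in.
rng = random.Random(42)
epoch_indices = rng.choices(range(len(weights)), weights=weights, k=len(weights))
```

Under this reading, an image with no condition annotations gets weight 0 and is never drawn, so a real implementation would need to decide how to handle such images.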

Evaluation results

Best validation epoch was epoch 15 (map_50 = 0.5298). The published checkpoint is that best-eval checkpoint, selected via --load_best_model_at_end.

class	mAP@0.5:0.95	mAR@100	floor (acceptance gate)	pass?
crown	0.7437	0.8826	—	strong
implant	0.7405	0.8197	—	strong
bridge	0.6884	0.8894	—	strong
RC-treated	0.6797	0.8435	—	strong
tooth-bud	0.6159	0.8792	—	strong
impacted	0.5421	0.8970	—	strong
root-stump	0.4563	0.6972	—	strong
restoration	0.2117	0.4632	—	acceptable
missing	0.2033	0.4006	—	acceptable
other-finding	0.0934	0.5577	≥ 0.20 (soft)	FAIL
periapical-radiolucency	0.0728	0.2477	≥ 0.25	FAIL
caries	0.0372	0.1996	≥ 0.30	FAIL
calculus	0.0143	0.1209	≥ 0.10	FAIL
macro mAP@0.5 (not mAP@0.5:0.95)	0.5298	—	≥ 0.45	pass

Acceptance-gate verdict (in plain language): The global mAP@0.5 = 0.53 clears the project's 0.45 macro-mAP gate. Three of four hard per-class floors (caries, calculus, periapical-radiolucency) and the soft other-finding floor all fail. Caries / calculus / periapical-radiolucency outputs from this v1 checkpoint should be considered advisory-only and not surfaced to a clinician without explicit "low-confidence" annotation.
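
The verdict can be reproduced from the table with a few lines. This sketch assumes the per-class floors are compared against the per-class mAP column shown in the table, which the card does not state explicitly; it covers only the floors listed there.

```python
# Per-class mAP and floors copied from the table above.
# Each entry: (per-class mAP, floor, is_hard_gate).
gates = {
    "other-finding": (0.0934, 0.20, False),  # soft floor
    "periapical-radiolucency": (0.0728, 0.25, True),
    "caries": (0.0372, 0.30, True),
    "calculus": (0.0143, 0.10, True),
}
macro_map_50, macro_floor = 0.5298, 0.45

hard_fails = [c for c, (m, f, hard) in gates.items() if hard and m < f]
soft_fails = [c for c, (m, f, hard) in gates.items() if not hard and m < f]
macro_pass = macro_map_50 >= macro_floor

print(hard_fails)  # ['periapical-radiolucency', 'caries', 'calculus']
print(soft_fails)  # ['other-finding']
print(macro_pass)  # True
```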

Limitations (v1)

Classification head bias collapse (head not learning class-specific scores). All 13 logits in decoder.class_embed[5].bias drifted < 0.008 from the focal-loss bias prior −log(13) ≈ −2.565 over 18 epochs; enc_score_head.bias shows the same pattern. Inference on real OPGs caps at ~0.034 sigmoid score (below the 0.071 prior). The per-class mAP differences therefore reflect ordinal ranking + box-overlap quality, not calibrated class-specific confidence. v2 tracks this as the primary fix.
Class imbalance for low-frequency findings (caries, calculus, periapical-radiolucency) was only partially mitigated by class-balanced sampling; the head-collapse issue is upstream of imbalance.
Image resolution capped at 1280×704 (D-FINE stride-32 constraint; the spec called for 1280×720, adjusted on 2026-05-03 in commit a5dd338). Spec §4 recommends a retrain at 1664×936 if recall@P=0.95 < 0.6 on caries / calculus / periapical-radiolucency; that retrain is in progress as v2, at the stride-32-compatible 1664×928 given in the Status note.
Trained without DENTEX pretraining (path B; license + class-overlap).
No Tier-2 release-gate eval (recall@P=0.95, calibration ECE, left/right consistency, empty-mouth FP) — that is Phase 6.
No Tier-3 dentist-agreement evaluation has been run.
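
The numbers in the bias-collapse item are consistent with the common focal-loss bias initialization b = −log((1 − π) / π) with π = 1 / (13 + 1). The sketch below checks that arithmetic and shows a simple drift measure. Whether the training code set the prior exactly this way is an assumption, and the example bias values are made up for illustration, not read from the checkpoint.

```python
import math


def focal_prior_bias(prior_prob: float) -> float:
    """Bias whose sigmoid equals prior_prob: b = -log((1 - p) / p)."""
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def max_drift(biases, prior_bias: float) -> float:
    """Largest absolute distance of any class bias from the prior."""
    return max(abs(b - prior_bias) for b in biases)


num_classes = 13
prior = focal_prior_bias(1.0 / (num_classes + 1))
print(round(prior, 3))           # -2.565
print(round(sigmoid(prior), 3))  # 0.071

# Hypothetical bias values, standing in for a real class_embed bias vector.
example_biases = [prior + 0.005, prior - 0.007]
print(max_drift(example_biases, prior) < 0.008)  # True
```

A drift this small relative to the prior means the bias terms contribute almost nothing class-specific, which matches the card's reading that per-class differences come from ranking and box quality.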

Training procedure

Hyperparameters (v1)

learning_rate: 5e-05
train_batch_size: 4 (effective 4; no gradient accumulation)
eval_batch_size: 2
seed: 42
optimizer: AdamW (fused), betas=(0.9,0.999), eps=1e-08
lr_scheduler: cosine, warmup_ratio=0.05
num_epochs: 18
focal-loss α=0.75 (inherited), γ=2.0
sampler: class-balanced WeightedRandomSampler (rare-condition oversampling)
input resolution: 1280×704 (FitInBox + pad)
HF Jobs flavor: a10g-large (~71 min wall-clock, ~$1.89)
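
For reference, the list above maps onto Hugging Face Trainer settings roughly as follows. This is a sketch, not the project's training script: the keys mirror `transformers.TrainingArguments` field names, `metric_for_best_model` is an assumed value, and the step arithmetic assumes a single device with the sampler drawing one sample per training image each epoch.

```python
import math

# Keys mirror transformers.TrainingArguments field names; values come from the
# hyperparameter list above unless marked as assumed.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "num_train_epochs": 18,
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_map_50",  # assumed metric key
}

train_images = 1549  # train split size from the card
steps_per_epoch = math.ceil(
    train_images / training_kwargs["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
# Warmup length rounded up (assumed rounding).
warmup_steps = math.ceil(total_steps * training_kwargs["warmup_ratio"])
print(steps_per_epoch, total_steps, warmup_steps)  # 388 6984 350
```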

Framework versions

Transformers 4.57.6
PyTorch 2.11.0+cu130
Datasets 4.8.5
Tokenizers 0.22.2

Citation

Companion model to the AI The Dentist MSc dissertation (Tun Ye Minn, University of Essex, MSc AI). Predecessor codebase: Raghu2411/AI_The_Dentist.

License

Inherits apache-2.0 from base model ustc-community/dfine-large-coco. Project-license decision is open per docs/specs/2026-05-02-dental-detection-v1-design.md (open item §10 row 8); confirm before any public release.

Model size: 31.2M params (Safetensors, tensor type F32)

Self-reported evaluation result: best validation mAP@0.5 = 0.530 (epoch 15, threshold=0.0) on argos-dentsight-opg-v1 (v8 OPG REPORTING corpus).