argos-dentsight-stage2-conditions-v1

Status: research preview, NOT clinical-grade. Three of four hard per-class acceptance gates failed; the global mAP@0.5 = 0.53 number is ranking-only (the classification head's biases never moved off their focal-loss prior; see Limitations). A v2 retrain at 1664×928 with a cls-head LR-multiplier and focal_alpha=0.25 is in progress.

Stage-2 of the argos-dentsight two-stage transformer detector for panoramic dental X-rays (OPGs). Stage-1 (Mobe1/argos-dentsight-stage1-fdi-v1) localizes the 32 FDI tooth positions; this model detects 13 condition labels (caries, calculus, RC-treated, impacted, restoration, crown, periapical-radiolucency, root-stump, bridge, implant, tooth-bud, missing, other-finding) within the same OPG.

Model description

D-FINE-Large fine-tuned from ustc-community/dfine-large-coco for 18 epochs on Mobe1/argos-dentsight-opg-v1 (v8 OPG REPORTING corpus, 13-class condition subset). Input 1280×704 (aspect-preserving FitInBox + pad). Class-balanced WeightedRandomSampler per epoch. Focal-loss γ=2.0, α=0.75. DENTEX 2023 pretraining was deliberately skipped; see the "path B" decision in docs/SESSION_HANDOFF_2026-05-03.md (license + per-class overlap).

Intended use

  • Research / dissertation companion to the AI The Dentist MSc dissertation (Tun Ye Minn, University of Essex).
  • Per-tooth condition ranking for downstream LLM grounding-caption generation, after IoU-matching to stage-1 tooth boxes.
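The IoU-matching step mentioned above could look roughly like the following. This is a minimal sketch, not the project's matching code: the box format, the `match_to_tooth` helper, the `min_iou` threshold, and the example coordinates are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_to_tooth(cond_box: Box, teeth: Dict[int, Box],
                   min_iou: float = 0.1) -> Optional[int]:
    """Return the FDI number of the best-overlapping tooth box, or None.

    min_iou is an assumed cut-off; the card does not state one.
    """
    best_fdi, best_iou = None, min_iou
    for fdi, tooth_box in teeth.items():
        score = iou(cond_box, tooth_box)
        if score >= best_iou:
            best_fdi, best_iou = fdi, score
    return best_fdi


# Hypothetical stage-1 output: FDI number -> tooth box.
teeth = {11: (100.0, 50.0, 140.0, 150.0), 12: (145.0, 50.0, 180.0, 150.0)}
# Hypothetical stage-2 condition box lying on the crown of tooth 11.
matched = match_to_tooth((105.0, 55.0, 138.0, 100.0), teeth)
# A box that overlaps no tooth stays unassigned.
far_away = match_to_tooth((500.0, 500.0, 520.0, 520.0), teeth)
```

Here `matched` resolves to tooth 11 and `far_away` to `None`. A condition box is often much smaller than its tooth box, so plain IoU values stay low even for correct matches; any threshold would need tuning on real stage-1 output.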

Out of scope

  • Any clinical decision-making. Not validated against dentist agreement.
  • Standalone use without IoU-matching to stage-1: predicted boxes do not encode tooth identity.
  • Surfacing caries / calculus / periapical-radiolucency outputs to a clinician without explicit "low-confidence advisory" framing; these classes failed the project's per-class acceptance gates (see below).
  • Billing, insurance, or treatment-recommendation pipelines.

Training data

Mobe1/argos-dentsight-opg-v1 (private), v8 of the Roboflow "OPG REPORTING" corpus normalized into a 32-FDI + 13-condition unified label space. Splits: train 1549 / valid 443 / test 221. Stage-2 training filters each row to condition annotations only; tooth-FDI annotations are dropped (handled by stage-1). Class-balanced sampling: per-image weights = 1 / class_count over the labels in that image, drawn with WeightedRandomSampler(replacement=True).
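The class-balanced sampling can be illustrated with a small, dependency-free sketch. The toy dataset is hypothetical, and the aggregation (summing 1 / class_count over the distinct labels of an image) is only one plausible reading of the description; the training code may aggregate differently. `random.choices` stands in for the sampler here. With PyTorch, the same `weights` list could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`.

```python
import random
from collections import Counter

# Hypothetical toy dataset: each entry lists the condition labels of one image.
image_labels = [
    ["crown", "restoration"],
    ["crown"],
    ["crown", "caries"],
    ["restoration"],
]

# Number of images in which each class appears.
class_count = Counter(label for labels in image_labels for label in set(labels))

# Assumed aggregation: sum of 1 / class_count over the distinct labels of an image.
# An image with no condition labels would get weight 0 and never be drawn.
weights = [
    sum(1.0 / class_count[label] for label in set(labels))
    for labels in image_labels
]

# Draw one epoch of indices with replacement, proportional to the weights.
rng = random.Random(42)
epoch_indices = rng.choices(range(len(image_labels)), weights=weights,
                            k=len(image_labels))
```

In this toy set the only image containing the rare class (caries) gets the largest weight, so it is drawn more often than the crown-only image.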

Evaluation results

Best validation epoch was epoch 15 (map_50 = 0.5298); checkpoint published is the best-eval one per --load_best_model_at_end.

class                     mAP@0.5:0.95   mAR@100   floor (acceptance gate)   pass?
crown                     0.7437         0.8826    -                         strong
implant                   0.7405         0.8197    -                         strong
bridge                    0.6884         0.8894    -                         strong
RC-treated                0.6797         0.8435    -                         strong
tooth-bud                 0.6159         0.8792    -                         strong
impacted                  0.5421         0.8970    -                         strong
root-stump                0.4563         0.6972    -                         strong
restoration               0.2117         0.4632    -                         acceptable
missing                   0.2033         0.4006    -                         acceptable
other-finding             0.0934         0.5577    ≥ 0.20 (soft)             FAIL
periapical-radiolucency   0.0728         0.2477    ≥ 0.25                    FAIL
caries                    0.0372         0.1996    ≥ 0.30                    FAIL
calculus                  0.0143         0.1209    ≥ 0.10                    FAIL
macro mAP@0.5             0.5298         -         ≥ 0.45                    pass

Acceptance-gate verdict (in plain language): The global mAP@0.5 = 0.53 clears the project's 0.45 macro-mAP gate. Three of four hard per-class floors (caries, calculus, periapical-radiolucency) and the soft other-finding floor all fail. Caries / calculus / periapical-radiolucency outputs from this v1 checkpoint should be considered advisory-only and not surfaced to a clinician without explicit "low-confidence" annotation.
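The verdict can be reproduced from the table with a few lines of Python. One assumption is made here: the per-class floors are compared against the mAP@0.5:0.95 column. The table does not say this explicitly, but it is the only reading consistent with calculus failing a 0.10 floor while its mAR@100 is 0.1209.

```python
# Per-class mAP@0.5:0.95 values and floors copied from the results table.
per_class_map = {
    "other-finding": 0.0934,
    "periapical-radiolucency": 0.0728,
    "caries": 0.0372,
    "calculus": 0.0143,
}
floors = {
    "other-finding": 0.20,            # soft floor
    "periapical-radiolucency": 0.25,
    "caries": 0.30,
    "calculus": 0.10,
}
soft_floors = {"other-finding"}

# Macro gate: reported macro mAP@0.5 against the 0.45 threshold.
macro_map50, macro_gate = 0.5298, 0.45

# Assumption: each floor is checked against the mAP@0.5:0.95 value.
verdicts = {
    name: "pass" if per_class_map[name] >= floors[name] else "FAIL"
    for name in floors
}
hard_failures = sorted(
    name for name, verdict in verdicts.items()
    if verdict == "FAIL" and name not in soft_floors
)
macro_pass = macro_map50 >= macro_gate
```

Under that assumption `hard_failures` contains caries, calculus, and periapical-radiolucency, the soft other-finding floor also fails, and `macro_pass` is true, matching the verdict stated above.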

Limitations (v1)

  • Classification head bias collapse (head not learning class-specific scores). All 13 entries of decoder.class_embed[5].bias drifted by less than 0.008 from the focal-loss bias prior −log(13) ≈ −2.565 over 18 epochs; enc_score_head.bias shows the same pattern. Inference on real OPGs caps at ~0.034 sigmoid score, below the prior score sigmoid(−log 13) = 1/14 ≈ 0.071. The per-class mAP differences therefore reflect ordinal ranking + box-overlap quality, not calibrated class-specific confidence. v2 tracks this as the primary fix.
  • Class imbalance for low-frequency findings (caries, calculus, periapical-radiolucency) was only partially mitigated by class-balanced sampling; the head-collapse issue is upstream of imbalance.
  • Image resolution capped at 1280×704 (D-FINE stride-32 constraint; spec called for 1280×720, adjusted on 2026-05-03 commit a5dd338). Spec §4 recommends a retrain at 1664×936 if recall@P=0.95 < 0.6 on caries / calculus / periapical; that retrain is in progress as v2.
  • Trained without DENTEX pretraining (path B; license + class-overlap).
  • No Tier-2 release-gate eval (recall@P=0.95, calibration ECE, left/right consistency, empty-mouth FP); that is Phase 6.
  • No Tier-3 dentist-agreement evaluation has been run.
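The numbers in the first limitation can be checked with plain arithmetic. The sketch below uses only the values quoted in this card. Reading the gap between the two logits as a "feature term" assumes the head is a linear layer (logit = w·x + b), which the card does not confirm.

```python
import math

num_classes = 13

# Bias prior quoted in the card: -log(13).
prior_bias = -math.log(num_classes)

# Sigmoid of the prior bias: 1 / (1 + 13) = 1/14.
prior_score = 1.0 / (1.0 + math.exp(-prior_bias))

# Maximum sigmoid score reported on real OPGs.
observed_cap = 0.034

# Inverse sigmoid (logit) of the observed cap.
observed_logit = math.log(observed_cap / (1.0 - observed_cap))

# Under the linear-head assumption, this is the largest contribution
# the features add on top of the (nearly unchanged) bias.
feature_term = observed_logit - prior_bias
```

`prior_score` comes out at about 0.0714 and `feature_term` is negative, so even the most confident prediction scores below what the untouched bias alone would give.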

Training procedure

Hyperparameters (v1)

  • learning_rate: 5e-05
  • train_batch_size: 4 (effective 4; no gradient accumulation)
  • eval_batch_size: 2
  • seed: 42
  • optimizer: AdamW (fused), betas=(0.9,0.999), eps=1e-08
  • lr_scheduler: cosine, warmup_ratio=0.05
  • num_epochs: 18
  • focal-loss α=0.75 (inherited), γ=2.0
  • sampler: class-balanced WeightedRandomSampler (rare-condition oversampling)
  • input resolution: 1280×704 (FitInBox + pad)
  • HF Jobs flavor: a10g-large (~71 min wall-clock, ~$1.89)
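The input-resolution entry combines two ideas: snapping the target size to a multiple of the stride, and an aspect-preserving resize followed by padding. The sketch below illustrates both under stated assumptions: the helper names are hypothetical, rounding down is inferred from the 720 → 704 adjustment rather than taken from the training code, and the 2900×1500 source image is an invented example.

```python
from typing import Tuple


def floor_to_stride(size: int, stride: int = 32) -> int:
    """Round a dimension down to the nearest multiple of the stride."""
    return (size // stride) * stride


def fit_in_box(w: int, h: int, box_w: int, box_h: int) -> Tuple[int, int, int, int]:
    """Aspect-preserving resize into (box_w, box_h).

    Returns (new_w, new_h, pad_w, pad_h). Where the padding is placed
    (right/bottom or centred) is left open; the card does not specify it.
    """
    scale = min(box_w / w, box_h / h)
    new_w = min(box_w, round(w * scale))
    new_h = min(box_h, round(h * scale))
    return new_w, new_h, box_w - new_w, box_h - new_h


# Spec height 720 is not a multiple of 32; rounding down gives 704.
target_h = floor_to_stride(720)
target_w = floor_to_stride(1280)

# Hypothetical 2900x1500 panoramic image fitted into the 1280x704 box.
resized = fit_in_box(2900, 1500, target_w, target_h)
```

For this example the width is the limiting side, so the image is resized to 1280×662 and 42 rows of padding fill the rest of the 704-pixel height.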

Framework versions

  • Transformers 4.57.6
  • PyTorch 2.11.0+cu130
  • Datasets 4.8.5
  • Tokenizers 0.22.2

Citation

Companion model to the AI The Dentist MSc dissertation (Tun Ye Minn, University of Essex, MSc AI). Predecessor codebase: Raghu2411/AI_The_Dentist.

License

Inherits apache-2.0 from base model ustc-community/dfine-large-coco. Project-license decision is open per docs/specs/2026-05-02-dental-detection-v1-design.md (open item §10 row 8); confirm before any public release.
