argos-dentsight-stage2-conditions-v1
Status: research preview, NOT clinical-grade. Three of four hard per-class acceptance gates failed; the global mAP@0.5 = 0.53 number is ranking-only (the classification head's biases never moved off their focal-loss prior; see Limitations). A v2 retrain at 1664×928 with a cls-head LR multiplier and focal_alpha=0.25 is in progress.
Stage-2 of the argos-dentsight two-stage transformer detector for panoramic
dental X-rays (OPGs). Stage-1 (Mobe1/argos-dentsight-stage1-fdi-v1) localizes
the 32 FDI tooth positions; this model detects 13 condition labels (caries,
calculus, RC-treated, impacted, restoration, crown, periapical-radiolucency,
root-stump, bridge, implant, tooth-bud, missing, other-finding) within the
same OPG.
Model description
D-FINE-Large fine-tuned from ustc-community/dfine-large-coco for 18 epochs
on Mobe1/argos-dentsight-opg-v1 (v8 OPG REPORTING corpus, 13-class condition
subset). Input 1280×704 (aspect-preserving FitInBox + pad). Class-balanced
WeightedRandomSampler per epoch. Focal loss with γ=2.0, α=0.75. DENTEX 2023
pretraining was deliberately skipped; see the "path B" decision in
docs/SESSION_HANDOFF_2026-05-03.md (license + per-class overlap).
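The "FitInBox + pad" preprocessing can be sketched roughly as below. This is a minimal illustration, not the project's actual transform: the function names, the right/bottom pad placement, and the rounding are assumptions. The stride check reflects the stride-32 constraint mentioned under Limitations.

```python
def fit_in_box(width, height, box_w=1280, box_h=704, stride=32):
    """Aspect-preserving resize into (box_w, box_h), then pad the remainder.

    Returns (scale, new_w, new_h, pad_right, pad_bottom).
    """
    if box_w % stride or box_h % stride:
        raise ValueError(f"{box_w}x{box_h} is not a multiple of stride {stride}")
    # Use the smaller ratio so the whole image fits inside the box.
    scale = min(box_w / width, box_h / height)
    new_w = min(box_w, max(1, round(width * scale)))
    new_h = min(box_h, max(1, round(height * scale)))
    # Assumption: padding goes on the right and bottom edges only.
    return scale, new_w, new_h, box_w - new_w, box_h - new_h


def scale_box(box, scale):
    """Map an (x1, y1, x2, y2) box from original pixels to resized pixels."""
    return tuple(v * scale for v in box)


if __name__ == "__main__":
    # Hypothetical 3000x1500 OPG: width is the limiting side.
    print(fit_in_box(3000, 1500))
```

With right/bottom padding, box coordinates only need the single `scale` factor; a centered pad would also need an offset.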
Intended use
- Research / dissertation companion to the AI The Dentist MSc dissertation (Tun Ye Minn, University of Essex).
- Per-tooth condition ranking for downstream LLM grounding-caption generation, after IoU-matching to stage-1 tooth boxes.
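As a rough sketch of the IoU-matching step, each predicted condition box can be assigned to the stage-1 tooth box it overlaps most. The function names and the `min_iou` threshold are hypothetical; the actual matching rule used downstream is not specified in this card.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def assign_teeth(condition_boxes, tooth_boxes, min_iou=0.1):
    """Return, per condition box, the best-overlapping FDI label or None.

    tooth_boxes maps an FDI label (e.g. "11") to its stage-1 box.
    min_iou is an illustrative threshold, not a project setting.
    """
    assigned = []
    for cond in condition_boxes:
        best_fdi, best_iou = None, 0.0
        for fdi, tooth in tooth_boxes.items():
            overlap = iou(cond, tooth)
            if overlap > best_iou:
                best_fdi, best_iou = fdi, overlap
        assigned.append(best_fdi if best_iou >= min_iou else None)
    return assigned


if __name__ == "__main__":
    teeth = {"11": (0, 0, 10, 20), "21": (10, 0, 20, 20)}
    print(assign_teeth([(1, 1, 9, 10), (100, 100, 110, 110)], teeth))
```

A single best match is a simplification: findings such as bridges can span several teeth and may need a one-to-many rule.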
Out of scope
- Any clinical decision-making. Not validated against dentist agreement.
- Standalone use without IoU-matching to stage-1: predicted boxes do not encode tooth identity.
- Surfacing caries / calculus / periapical-radiolucency outputs to a clinician without explicit "low-confidence advisory" framing; these classes failed the project's per-class acceptance gates (see below).
- Billing, insurance, or treatment-recommendation pipelines.
Training data
Mobe1/argos-dentsight-opg-v1 (private), v8 of the Roboflow "OPG REPORTING"
corpus normalized into a 32-FDI + 13-condition unified label space.
Splits: train 1549 / valid 443 / test 221.
Stage-2 training filters each row to condition annotations only; tooth-FDI
annotations are dropped (handled by stage-1). Class-balanced sampling: per-image
weights = 1 / class_count over the labels in that image, drawn with
WeightedRandomSampler(replacement=True).
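One plausible reading of the weighting rule is sketched below, assuming the per-image weight is the sum of 1 / class_count over that image's labels (the card does not say whether it is a sum, mean, or max). The helper name and toy data are hypothetical, and `random.choices` stands in for the sampler so the block runs without PyTorch.

```python
import random
from collections import Counter


def image_weights(labels_per_image):
    """Per-image weight = sum of 1 / class_count over the image's labels.

    Assumption: the sum is used. Under this rule an image with no
    condition labels gets weight 0 and is never drawn.
    """
    counts = Counter(label for labels in labels_per_image for label in labels)
    return [sum(1.0 / counts[label] for label in labels)
            for labels in labels_per_image]


if __name__ == "__main__":
    # Toy dataset: one rare "caries" label among common "crown" labels.
    labels = [["crown"], ["crown"], ["crown", "caries"]]
    weights = image_weights(labels)
    # Sampling with replacement, as WeightedRandomSampler(replacement=True) does.
    rng = random.Random(42)
    draws = rng.choices(range(len(labels)), weights=weights, k=len(labels))
    print(weights, draws)
```

In training, the same weights list would be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)`.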
Evaluation results
The best validation epoch was epoch 15 (map_50 = 0.5298); the published
checkpoint is that best-eval one, selected via --load_best_model_at_end.
| class | mAP@0.5:0.95 | mAR@100 | floor (acceptance gate) | pass? |
|---|---|---|---|---|
| crown | 0.7437 | 0.8826 | - | strong |
| implant | 0.7405 | 0.8197 | - | strong |
| bridge | 0.6884 | 0.8894 | - | strong |
| RC-treated | 0.6797 | 0.8435 | - | strong |
| tooth-bud | 0.6159 | 0.8792 | - | strong |
| impacted | 0.5421 | 0.8970 | - | strong |
| root-stump | 0.4563 | 0.6972 | - | strong |
| restoration | 0.2117 | 0.4632 | - | acceptable |
| missing | 0.2033 | 0.4006 | - | acceptable |
| other-finding | 0.0934 | 0.5577 | ≥ 0.20 (soft) | FAIL |
| periapical-radiolucency | 0.0728 | 0.2477 | ≥ 0.25 | FAIL |
| caries | 0.0372 | 0.1996 | ≥ 0.30 | FAIL |
| calculus | 0.0143 | 0.1209 | ≥ 0.10 | FAIL |
| macro mAP@0.5 | 0.5298 | - | ≥ 0.45 | pass |
Acceptance-gate verdict (in plain language): The global mAP@0.5 = 0.53 clears the project's 0.45 macro-mAP gate. Three of four hard per-class floors (caries, calculus, periapical-radiolucency) and the soft other-finding floor all fail. Caries / calculus / periapical-radiolucency outputs from this v1 checkpoint should be considered advisory-only and not surfaced to a clinician without explicit "low-confidence" annotation.
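The gate comparison can be reproduced from the table with a few lines. This is only an illustrative check: the dictionary and function names are hypothetical, and only the floors actually listed in the table are encoded.

```python
# Per-class scores and floors copied from the evaluation table above.
SCORES = {
    "other-finding": 0.0934,
    "periapical-radiolucency": 0.0728,
    "caries": 0.0372,
    "calculus": 0.0143,
}
FLOORS = {
    "other-finding": 0.20,           # soft floor
    "periapical-radiolucency": 0.25,
    "caries": 0.30,
    "calculus": 0.10,
}
MACRO_MAP50, MACRO_FLOOR = 0.5298, 0.45


def gate_report(scores, floors):
    """Return {class_name: True if the score meets its floor}."""
    return {name: scores[name] >= floor for name, floor in floors.items()}


if __name__ == "__main__":
    for name, passed in gate_report(SCORES, FLOORS).items():
        print(f"{name:26s} {'pass' if passed else 'FAIL'}")
    print("macro gate:", "pass" if MACRO_MAP50 >= MACRO_FLOOR else "FAIL")
```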
Limitations (v1)
- Classification head bias collapse (head not learning class-specific
scores). All 13 bias entries in decoder.class_embed[5].bias drifted by
less than 0.008 from the focal-loss bias prior −log(13) ≈ −2.565 over 18 epochs; enc_score_head.bias shows the same pattern. Inference on real OPGs caps at ~0.034 sigmoid score (below the 0.071 prior). The per-class mAP differences therefore reflect ordinal ranking + box-overlap quality, not calibrated class-specific confidence. v2 tracks this as the primary fix.
- Class imbalance for low-frequency findings (caries, calculus, periapical-radiolucency) was only partially mitigated by class-balanced sampling; the head-collapse issue is upstream of imbalance.
- Image resolution capped at 1280×704 (D-FINE stride-32 constraint; spec
called for 1280×720, adjusted on 2026-05-03 in commit
a5dd338). Spec §4 recommends a retrain at 1664×936 if recall@P=0.95 < 0.6 on caries / calculus / periapical; that retrain is in progress as v2.
- Trained without DENTEX pretraining (path B; license + class-overlap).
- No Tier-2 release-gate eval (recall@P=0.95, calibration ECE, left/right consistency, empty-mouth FP); that is Phase 6.
- No Tier-3 dentist-agreement evaluation has been run.
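The bias-prior numbers quoted in the first limitation can be checked with a short calculation. This sketch only reproduces the arithmetic (prior bias −log(13) and its sigmoid); it does not load the checkpoint, and the variable names are illustrative.

```python
import math

NUM_CLASSES = 13


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Focal-loss bias prior as stated in the card: -log(13).
prior_bias = -math.log(NUM_CLASSES)
prior_score = sigmoid(prior_bias)  # equals 1 / (1 + 13)

# Largest bias drift reported over 18 epochs, and the score change
# that a drift of this size alone would produce.
max_drift = 0.008
score_shift = sigmoid(prior_bias + max_drift) - prior_score

if __name__ == "__main__":
    print(f"prior bias  = {prior_bias:.3f}")
    print(f"prior score = {prior_score:.3f}")
    print(f"score shift from a {max_drift} bias drift = {score_shift:.5f}")
```

A drift this small changes the bias-only score by well under 0.001, which is consistent with the head's outputs staying pinned near the prior.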
Training procedure
Hyperparameters (v1)
- learning_rate: 5e-05
- train_batch_size: 4 (effective 4; no gradient accumulation)
- eval_batch_size: 2
- seed: 42
- optimizer: AdamW (fused), betas=(0.9,0.999), eps=1e-08
- lr_scheduler: cosine, warmup_ratio=0.05
- num_epochs: 18
- focal-loss α=0.75 (inherited), γ=2.0
- sampler: class-balanced WeightedRandomSampler (rare-condition oversampling)
- input resolution: 1280×704 (FitInBox + pad)
- HF Jobs flavor: a10g-large (~71 min wall-clock, ~$1.89)
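For orientation, the learning-rate curve implied by "cosine, warmup_ratio=0.05" might look like the sketch below: linear warmup followed by a half-cosine decay to zero. The step count is an assumption (it supposes the sampler draws one train-set's worth of images, 1549, per epoch at batch size 4), and the function name is hypothetical.

```python
import math

BASE_LR = 5e-5
WARMUP_RATIO = 0.05
# Assumption: 1549 samples per epoch at batch size 4, for 18 epochs.
STEPS_PER_EPOCH = math.ceil(1549 / 4)
TOTAL_STEPS = 18 * STEPS_PER_EPOCH


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR,
          warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 1, 9, 15, 18):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate is already in the low part of the cosine tail by epoch 15, the best-eval epoch.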
Framework versions
- Transformers 4.57.6
- PyTorch 2.11.0+cu130
- Datasets 4.8.5
- Tokenizers 0.22.2
Citation
Companion model to the AI The Dentist MSc dissertation
(Tun Ye Minn, University of Essex, MSc AI). Predecessor codebase:
Raghu2411/AI_The_Dentist.
License
Inherits apache-2.0 from base model ustc-community/dfine-large-coco.
Project-license decision is open per
docs/specs/2026-05-02-dental-detection-v1-design.md (open item §10 row 8);
confirm before any public release.