Model Card: Nematostella Rosette Detector
Model Description
- Model type: Attention U-Net (Oktay et al. 2018)
- Task: Semantic segmentation — pixel-wise detection of epithelial rosette structures in Nematostella vectensis confocal microscopy images
- Input: 2-channel binary boundary representation (512×512×2): thin inner cell boundary lines + morphologically thickened cell boundaries. No fluorescence intensity used.
- Output: Pixel-wise probability map (0–1) of rosette likelihood
- Framework: PyTorch
- License: MIT
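The 2-channel boundary input can be sketched as follows. This is a minimal illustration, not the original preprocessing pipeline: the function name, the `thickness` parameter, and the use of `scipy.ndimage` are my own assumptions; the card does not state the dilation radius used.

```python
import numpy as np
from scipy import ndimage

def boundary_channels(labels: np.ndarray, thickness: int = 3) -> np.ndarray:
    """Build a 2-channel binary boundary stack from an integer label image.

    Channel 0: thin (1 px) boundaries between neighbouring cell labels.
    Channel 1: the same boundaries morphologically dilated.
    `thickness` is a hypothetical parameter, not a documented value.
    """
    # A pixel lies on a boundary if any 4-neighbour carries a different label.
    thin = np.zeros(labels.shape, dtype=bool)
    thin[:-1, :] |= labels[:-1, :] != labels[1:, :]
    thin[1:, :] |= labels[1:, :] != labels[:-1, :]
    thin[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    thin[:, 1:] |= labels[:, 1:] != labels[:, :-1]
    thick = ndimage.binary_dilation(thin, iterations=thickness)
    return np.stack([thin, thick], axis=-1).astype(np.float32)  # H×W×2
```

Note that no fluorescence intensity enters the model; both channels are binary, which is what allows the detector to ignore staining variation.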
Intended Use
- Primary use: Generating candidate rosette proposals for expert human-in-the-loop review and annotation in napari
- Out-of-scope: Direct automated quantification without expert review; application to other organisms, tissue types, or imaging modalities without retraining
Training Data
- 214 confocal microscopy images of Nematostella vectensis juvenile epidermis
- Acquired on Olympus IX83 FV3000, 60× silicone objective, 1024×1024 px, 0.134 µm/px
- Ground truth: manually annotated rosette instance masks (napari), minimum 5 cells sharing a common central axis or coalescing around an extruding cell
- Will be deposited on Zenodo upon publication
Evaluation
Evaluated on held-out validation set (54 images, 269 rosette instances, 20% of total dataset):
| Metric | Value |
| --- | --- |
| Pixel-level Dice | 0.51 |
| Pixel-level F1 | 0.61 |
| Pixel-level Recall | 0.64 |
| Event-level Recall (≥1 px, threshold 0.5) | 88.8% (239/269) |
| Rosettes with ≥10% pixel coverage (threshold 0.4) | 83.3% (224/269) |
| Rosettes with >80% pixel coverage (threshold 0.5) | 50.2% (135/269) |
| Rosettes with >40% pixel coverage (threshold 0.5) | 67.7% (182/269) |
| Completely missed (no heatmap signal) | 11.2% (30/269) |
Note: Pixel-level recall (0.64) reflects boundary imprecision in detected rosettes, not missed detection events. Event-level recall (88.8%) is the operationally relevant metric for the human-in-the-loop workflow. Inference uses 512×512 sliding window with 128px overlap.
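The event-level criterion (an instance counts as detected if at least one of its pixels exceeds the threshold) can be made concrete with a short sketch. The function name and the list-of-masks input format are my own; the card does not specify how instance masks are stored:

```python
import numpy as np

def event_level_recall(instance_masks, prob_map, threshold=0.5, min_pixels=1):
    """Fraction of annotated rosette instances touched by the prediction.

    An instance counts as detected when at least `min_pixels` of its pixels
    exceed `threshold` in the probability map (the card's "≥1 px" criterion).
    `instance_masks` is a list of boolean masks, one per rosette instance.
    """
    pred = prob_map >= threshold
    hits = sum(int((pred & m).sum() >= min_pixels) for m in instance_masks)
    return hits / len(instance_masks)
```

Raising `min_pixels` (or switching to a coverage fraction) reproduces the stricter ≥10%, >40%, and >80% coverage rows in the table above.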
Architecture
- 4-level encoder-decoder (U-Net)
- Additive attention gates at 3 upsampling junctions
- Feature maps: 64 → 128 → 256 → 512 → 1024 (bottleneck)
- Final layer: 1×1 convolution + Sigmoid
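The additive attention gate (Oktay et al. 2018) at each upsampling junction can be sketched as below. This is illustrative only: the intermediate width `f_int` is not stated in the card, and for brevity the gating signal and skip features are assumed to share spatial dimensions (the original paper resamples them to match).

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: alpha = sigmoid(psi(relu(W_g·g + W_x·x))),
    applied multiplicatively to the encoder skip features x.
    Channel widths are illustrative, not documented values."""

    def __init__(self, f_g: int, f_l: int, f_int: int):
        super().__init__()
        self.w_g = nn.Conv2d(f_g, f_int, kernel_size=1)  # gating signal (decoder)
        self.w_x = nn.Conv2d(f_l, f_int, kernel_size=1)  # skip features (encoder)
        self.psi = nn.Conv2d(f_int, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        alpha = self.sigmoid(self.psi(self.relu(self.w_g(g) + self.w_x(x))))
        return x * alpha  # attention-weighted skip connection
```

The gate learns to suppress skip-connection activations outside candidate rosette regions before they are concatenated into the decoder.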
Training Configuration
| Parameter | Value |
| --- | --- |
| Loss | 0.5 × BCE + 0.5 × Dice |
| Optimizer | AdamW |
| Learning rate | 1×10⁻⁴ |
| Batch size | 4 |
| Early stopping | Patience 15 epochs (validation loss) |
| Input patch size | 512×512 |
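A minimal sketch of the combined loss, assuming a standard soft-Dice formulation; the exact variant (per-image vs per-batch reduction, smoothing constant) is not stated in the card, so `eps` here is my own choice:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """0.5 × BCE + 0.5 × soft-Dice loss, as listed in the training table.
    `eps` is a hypothetical smoothing constant, not a documented value."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = (2 * inter + eps) / (probs.sum() + target.sum() + eps)
    return 0.5 * bce + 0.5 * (1 - dice)
```

Blending BCE with Dice is a common choice for sparse foreground classes such as rosettes, where BCE alone is dominated by background pixels.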
Data Augmentation
Random rotation (p=0.5), elastic deformation (α=120, σ=6, p=0.4), affine transforms (p=0.6), coarse dropout (p=0.3) via Albumentations.
Limitations
- Trained exclusively on a single laboratory's images (single instrument, single organism, single staining protocol)
- Generalisation to other imaging setups not evaluated
- 11.2% of rosette events (30/269) receive no predicted pixels at threshold 0.5, so expert review of the full image remains required
- Validation set was also used for early stopping (standard practice); the model was never trained on validation images
Hardware
Apple MacBook Pro M2 Max (64 GB unified memory), PyTorch MPS backend. Training: a few hours. Inference: <1 min/image.
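Per-image inference tiles each 1024×1024 image with a 512×512 window and 128 px overlap. A minimal sketch of the tile-origin computation along one axis (function name and edge handling are my own assumptions):

```python
def tile_origins(length: int, window: int = 512, overlap: int = 128) -> list:
    """Start offsets of sliding windows along one axis, covering the full
    extent. Stride = window - overlap; a final window is snapped to the
    image edge so no pixels are skipped."""
    stride = window - overlap
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] != length - window:
        origins.append(length - window)  # snap last window to the edge
    return origins
```

For the 1024 px training-image width this yields origins 0, 384, and 512, so every pixel falls inside at least one window and interior pixels are typically predicted more than once.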
References
Oktay, O. et al. (2018). Attention U-Net: Learning Where to Look for the Pancreas. arXiv:1804.03999.