⚠️ IMPORTANT WARNING: Model Effectiveness
These bootstrap models are part of a degeneracy chain. The ICI-DC bootstrap S1 was trained on synthetic data generated by a model (S2-coeff1.5) that had itself underfit during training. The bootstrap S2 was then fine-tuned from that degraded S1. Each generation of fine-tuning further degraded the base model's innate mutation discrimination capability: base Omni-DNA-20M achieves 0.951 AUC on raw DNA, while these models achieve ~0.30 AUC.
The SAD coefficient reported for the bootstrap S2 (~12.03 LR-adjusted) is a mathematical artifact of the training configuration, not an indicator of genuine training convergence.
These models are preserved for historical and reproducibility purposes only.
Omni-DNA SAD Bootstrap Checkpoint
Omni-DNA-20M fine-tuned via the ICI-DC → SAD pipeline using bootstrap synthetic data.
Training Details
- Base model: Nhoodie/omni-dna-ici-dc-bootstrap (ICI-DC pre-trained on bootstrap synthetic data)
- Training data: 3,317 real mutation pairs (SAD attenuation)
- Best checkpoint: Epoch 3.37, eval loss 0.308
- Hyperparameters: LR=1e-5, epochs=5, batch_size=16, grad_accum=2, cosine schedule
- LR-adjusted SAD coefficient: 12.03
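To make the hyperparameters above concrete, the following is a minimal sketch of the arithmetic they imply: the effective batch size, the number of optimizer steps, and a cosine learning-rate curve. It is not the training script used for this checkpoint. The function names are hypothetical, and the sketch assumes no warmup, decay to zero, and that a final partial batch still counts as a step; the model card states none of these.

```python
import math

# Values taken from the Training Details list.
NUM_PAIRS = 3317
BATCH_SIZE = 16
GRAD_ACCUM = 2
EPOCHS = 5
PEAK_LR = 1e-5


def effective_batch_size(batch_size: int, grad_accum: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return batch_size * grad_accum


def optimizer_steps_per_epoch(num_examples: int, batch_size: int, grad_accum: int) -> int:
    """Optimizer steps per epoch.

    Assumption: the last partial batch is kept, and a trailing partial
    accumulation group still triggers an optimizer step.
    """
    micro_batches = math.ceil(num_examples / batch_size)
    return math.ceil(micro_batches / grad_accum)


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to zero (assumption: no warmup phase)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


steps_per_epoch = optimizer_steps_per_epoch(NUM_PAIRS, BATCH_SIZE, GRAD_ACCUM)
total_steps = steps_per_epoch * EPOCHS

print("effective batch size:", effective_batch_size(BATCH_SIZE, GRAD_ACCUM))
print("optimizer steps per epoch:", steps_per_epoch)
print("total optimizer steps:", total_steps)
print("LR at the midpoint:", cosine_lr(total_steps // 2, total_steps, PEAK_LR))
```

Under these assumptions, the best checkpoint at epoch 3.37 falls roughly two thirds of the way through the schedule, where the learning rate is already well below its peak.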
Benchmark (826 test pairs)
| Axis | Metric | Score |
|---|---|---|
| Mutation Detection | F1 | 0.660 |
| Embedding Distance | Seq AUC | 0.358 |
| Masked Prediction | Surprise Δ | −1.79 |
| Discriminative | AUC | 0.304 |
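Both AUC values in the table are below 0.5, the level expected from random ranking, which means the scores tend to rank negatives above positives. The sketch below illustrates this with a plain pairwise AUC on made-up toy scores. The `roc_auc` helper and the data are hypothetical and are not the evaluation code or data behind the benchmark.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: the fraction of positive/negative
    pairs in which the positive example gets the higher score.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: positives mostly receive LOWER scores than negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.4, 0.1, 0.9, 0.3, 0.8]

auc = roc_auc(labels, scores)
flipped_auc = roc_auc(labels, [-s for s in scores])

print("AUC:", auc)
print("AUC with negated scores:", flipped_auc)
```

Negating the scores turns an AUC of a into 1 − a, so a value near 0.30 reflects a systematically inverted ranking and not an absence of signal. That is consistent with the warning that these checkpoints should not be used for mutation discrimination as they stand.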
Related Models
- Nhoodie/omni-dna-ici-dc-bootstrap: ICI-DC pre-training checkpoint (before SAD)
- Nhoodie/omni-dna-sad-mutation: Original SAD checkpoint (coeff 4.89)
Model tree for Nhoodie/omni-dna-sad-mutation-bootstrap
Base model
zehui127/Omni-DNA-20M