⚠️ IMPORTANT WARNING: Model Effectiveness

These bootstrap models are part of a degeneracy chain. The ICI-DC bootstrap S1 was trained on synthetic data generated by a model (S2-coeff1.5) that had itself underfit during training. The bootstrap S2 was then fine-tuned from that degraded S1. Each generation of fine-tuning further degraded the base model's innate mutation discrimination capability: base Omni-DNA-20M achieves 0.951 AUC on raw DNA, whereas these models achieve ~0.30 AUC.

The SAD coefficient reported for the bootstrap S2 (~12.03 LR-adjusted) is a mathematical artifact of the training configuration, not an indicator of genuine training convergence.

These models are preserved for historical and reproducibility purposes only.


Omni-DNA SAD Bootstrap Checkpoint

Omni-DNA-20M fine-tuned via ICI-DC → SAD pipeline using bootstrap synthetic data.

Training Details

  • Base model: Nhoodie/omni-dna-ici-dc-bootstrap (ICI-DC pre-trained on bootstrap synthetic data)
  • Training data: 3,317 real mutation pairs (SAD attenuation)
  • Best checkpoint: Epoch 3.37, eval loss 0.308
  • Hyperparameters: LR=1e-5, epochs=5, batch_size=16, grad_accum=2, cosine schedule
  • LR-adjusted SAD coefficient: 12.03
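
To make the hyperparameters above concrete, the following sketch derives the effective batch size, the approximate number of optimizer steps, and the learning rate near the best checkpoint. It is only an illustration under stated assumptions: the card does not say whether all 3,317 pairs were used for training, whether a warmup phase was applied, or how the last partial batch was handled, so the step counts are estimates rather than the actual training log.

```python
import math

# Values taken from the Training Details list.
NUM_PAIRS = 3317
BATCH_SIZE = 16
GRAD_ACCUM = 2
EPOCHS = 5
PEAK_LR = 1e-5
BEST_EPOCH = 3.37

# Assumptions (not stated in the card): every pair is a training example,
# the last partial batch is kept, and the cosine schedule has no warmup.
effective_batch = BATCH_SIZE * GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_PAIRS / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step, total=total_steps, peak=PEAK_LR):
    """Cosine decay from the peak learning rate down to zero (no warmup assumed)."""
    progress = min(max(step / total, 0.0), 1.0)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


best_step = round(BEST_EPOCH * steps_per_epoch)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"approximate step of best checkpoint: {best_step}")
print(f"learning rate at that step: {cosine_lr(best_step):.2e}")
```

Under these assumptions the best checkpoint falls about two thirds of the way through the schedule, where the learning rate has decayed to roughly a quarter of its peak.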

Benchmark (826 test pairs)

| Axis | Metric | Score |
|---|---|---|
| Mutation Detection | F1 | 0.660 |
| Embedding Distance | Seq AUC | 0.358 |
| Masked Prediction | Surprise Δ | −1.79 |
| Discriminative | AUC | 0.304 |
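
For readers interpreting the AUC rows, the sketch below computes AUC using its standard pairwise definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The card does not describe how the benchmark computes its scores, so the function name `pairwise_auc` and the toy numbers are hypothetical and are not the benchmark data. The point is only that 0.5 corresponds to chance-level ranking, and values below 0.5 mean positives are ranked systematically below negatives.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This is the standard pairwise definition of
    ROC AUC, written as a plain double loop for clarity rather than speed.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores invented for illustration only (NOT the benchmark data).
mutated_scores = [0.2, 0.4, 0.1, 0.7]    # hypothetical positives
reference_scores = [0.6, 0.8, 0.3, 0.9]  # hypothetical negatives

auc = pairwise_auc(mutated_scores, reference_scores)
print(f"toy AUC: {auc}")
```

In this toy case only 3 of the 16 pairs rank the positive above the negative, so the AUC falls well below the 0.5 chance level.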
