Joblib

yinuozhang committed on
Commit e719470 · verified · 1 Parent(s): 96bdf50

Update README.md

Files changed (1): README.md (+15 -10)

README.md CHANGED
Predicts peptide-protein binding affinity. Requires both peptide and target protein.

**Interpretation:**<br>
- Scores ≥ 9 correspond to tight binders (K ≤ 10⁻⁹ M, nanomolar to picomolar range)<br>
- Scores between 7 and 9 correspond to medium binders (K between 10⁻⁷ and 10⁻⁹ M, sub-micromolar to nanomolar range)<br>
- Scores < 7 correspond to weak binders (K ≥ 10⁻⁶ M, micromolar and weaker)<br>
- A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity, i.e. the score behaves like −log₁₀ K.<br>
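Read as a −log₁₀ K scale, these bins are straightforward to apply in code. A minimal sketch with hypothetical helper names (not part of the package):

```python
# Hypothetical helpers illustrating the README's affinity bins.
# The score is interpreted as -log10(K), so 1 unit ~ tenfold affinity change.

def classify_binder(score: float) -> str:
    """Bin a predicted affinity score into tight / medium / weak."""
    if score >= 9:       # K <= 1e-9 M (nanomolar to picomolar)
        return "tight"
    elif score >= 7:     # 1e-9 M < K <= 1e-7 M (sub-micromolar to nanomolar)
        return "medium"
    else:                # K >= 1e-6 M and weaker (micromolar)
        return "weak"

def score_to_K(score: float) -> float:
    """Convert a score back to an approximate dissociation constant in M."""
    return 10 ** (-score)
```

For example, a predicted score of 8.3 falls in the medium bin, with K roughly 5 × 10⁻⁹ M.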
---

### Uncertainty Interpretation<br>
#### Entropy (classifiers)<br>
Binary predictive entropy of the output probability $\bar{p}$:<br>

$$\mathcal{H} = -\bar{p}\log\bar{p} - (1 - \bar{p})\log(1 - \bar{p})$$<br>

- For **DNN classifiers**: $\bar{p}$ is the mean probability across 5 independently seeded models (deep ensemble). High entropy reflects both epistemic uncertainty (seed disagreement) and aleatoric uncertainty (collectively diffuse predictions).<br>

- For **XGBoost / SVM / ElasticNet classifiers**: $\bar{p}$ is the single model's output probability (or the sigmoid of the decision function for ElasticNet). Entropy reflects output confidence of a single model only.<br>
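As a sketch, the entropy above can be computed directly (natural log, matching the formula; the `eps` clipping is an assumption added here to avoid `log(0)` at $\bar{p} \in \{0, 1\}$):

```python
import math

def binary_entropy(p_bar: float, eps: float = 1e-12) -> float:
    """Binary predictive entropy H = -p*log(p) - (1-p)*log(1-p), natural log.

    `eps` clipping is an implementation assumption, not part of the formula:
    it keeps the expression finite when p_bar is exactly 0 or 1.
    """
    p = min(max(p_bar, eps), 1.0 - eps)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

# A maximally uncertain classifier (p = 0.5) attains the maximum, log(2) ~ 0.693;
# confident predictions (p near 0 or 1) give entropy near 0.
```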
| Range | Interpretation |
|---|---|
Returned as a tuple `(lo, hi)` with a 90% marginal coverage guarantee.<br>
We implement the **residual normalised conformity score** following [Lei et al. (2018)](https://doi.org/10.1080/01621459.2017.1307116) and [Cordier et al. (2023) / MAPIE](https://proceedings.mlr.press/v204/cordier23a.html). An auxiliary XGBoost model $\hat{\sigma}(\mathbf{x})$ is trained on held-out embeddings and absolute residuals $|y_i - \hat{y}_i|$. At inference:<br>

$$[\hat{y}(\mathbf{x}) - q \cdot \hat{\sigma}(\mathbf{x}),\ \hat{y}(\mathbf{x}) + q \cdot \hat{\sigma}(\mathbf{x})]$$

where $q$ is the $\lceil(n+1)(1-\alpha)\rceil / n$ quantile of the normalised scores $s_i = |y_i - \hat{y}_i| / \hat{\sigma}(\mathbf{x}_i)$.<br>
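A minimal sketch of the calibration step, under the split-conformal assumptions above (hypothetical function names; the actual pipeline uses MAPIE with an auxiliary XGBoost $\hat{\sigma}$):

```python
import math

def conformal_quantile(residuals, sigmas, alpha=0.1):
    """Finite-sample quantile q of normalised scores s_i = |r_i| / sigma_i.

    Uses the ceil((n+1)(1-alpha)) order statistic from split conformal
    prediction; `residuals` and `sigmas` come from a held-out calibration set.
    """
    scores = sorted(abs(r) / s for r, s in zip(residuals, sigmas))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return scores[min(k, n) - 1]          # clamp in case k > n (tiny n)

def interval(y_hat, sigma_hat, q):
    """Per-input interval [y_hat - q*sigma_hat, y_hat + q*sigma_hat]."""
    return (y_hat - q * sigma_hat, y_hat + q * sigma_hat)
```

Because $\hat{\sigma}(\mathbf{x})$ scales the fixed quantile $q$, the interval width adapts to each input, which is what produces the per-molecule behaviour described below.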
 
615
 
616
  - **Interval width varies per input** -- molecules more dissimilar to training data tend to receive wider intervals<br>
617
- - **Coverage guarantee**: on exchangeable data, $P(y \in [\hat{y} - q\hat{\sigma},\ \hat{y} + q\hat{\sigma}]) \geq 0.90$<br>
 
 
618
  - **The guarantee is marginal**, not conditional, as an unusually narrow interval on an out-of-distribution molecule does not guarantee correctness<br>
 
619
  - **Full access**: We already computed MAPIE for all regression models; users are allowed to directly use them for customized model lists.<br>
620
 
621
  ---
 