Update README.md
**Interpretation:**<br>

- Scores ≥ 9 correspond to tight binders (K ≤ 10⁻⁹ M, nanomolar to picomolar range)<br>
- Scores between 7 and 9 correspond to medium binders (K between 10⁻⁹ and 10⁻⁷ M, nanomolar to sub-micromolar range)<br>
- Scores < 7 correspond to weak binders (K > 10⁻⁷ M, sub-micromolar to micromolar and weaker)<br>
- A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity.<br>
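Assuming the score is the negative log₁₀ of the binding constant K (consistent with the tenfold-per-unit rule above), the mapping can be sketched as follows; `interpret_score` is an illustrative helper, not part of the package API:

```python
def interpret_score(score: float) -> tuple[float, str]:
    """Map a predicted affinity score (assumed -log10 K) to K in molar
    units plus a qualitative label, using the thresholds listed above."""
    k_molar = 10.0 ** (-score)
    if score >= 9:
        label = "tight"      # K <= 1e-9 M
    elif score >= 7:
        label = "medium"     # 1e-9 M <= K <= 1e-7 M
    else:
        label = "weak"       # K > 1e-7 M
    return k_molar, label

k, label = interpret_score(8.0)  # K = 1e-8 M (10 nM), labeled "medium"
```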

---

### Uncertainty Interpretation<br>

#### Entropy (classifiers)<br>

Binary predictive entropy of the output probability p̄:<br>

$$\mathcal{H} = -\bar{p}\log\bar{p} - (1 - \bar{p})\log(1 - \bar{p})$$<br>

- For **DNN classifiers**: p̄ is the mean probability across 5 independently seeded models (deep ensemble). High entropy reflects both epistemic uncertainty (seed disagreement) and aleatoric uncertainty (collectively diffuse predictions).<br>

- For **XGBoost / SVM / ElasticNet classifiers**: p̄ is the single model's output probability (or the sigmoid of the decision function for ElasticNet). Entropy reflects the output confidence of a single model only.<br>
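As a sketch, the entropy and the ensemble averaging for the DNN case look like this (the probability values are illustrative, not package output):

```python
import numpy as np

def binary_entropy(p_bar: float, eps: float = 1e-12) -> float:
    """H = -p*log(p) - (1-p)*log(1-p), clipped to avoid log(0)."""
    p = min(max(p_bar, eps), 1.0 - eps)
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

# DNN case: average the 5 seed probabilities first, then take the entropy.
seed_probs = np.array([0.62, 0.71, 0.55, 0.68, 0.64])  # illustrative values
p_bar = seed_probs.mean()                               # 0.64
h = binary_entropy(p_bar)

# The maximum, log(2) ≈ 0.693 nats, occurs at p̄ = 0.5 (maximal uncertainty).
```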

| Range | Interpretation |
|---|---|

Returned as a tuple `(lo, hi)` with a 90% marginal coverage guarantee.<br>
We implement the **residual normalised conformity score** following [Lei et al. (2018)](https://doi.org/10.1080/01621459.2017.1307116) and [Cordier et al. (2023) / MAPIE](https://proceedings.mlr.press/v204/cordier23a.html). An auxiliary XGBoost model $\hat{\sigma}(\mathbf{x})$ is trained on held-out embeddings and absolute residuals $|y_i - \hat{y}_i|$. At inference, the interval is<br>
$$[\hat{y}(\mathbf{x}) - q \cdot \hat{\sigma}(\mathbf{x}),\ \hat{y}(\mathbf{x}) + q \cdot \hat{\sigma}(\mathbf{x})]$$

where $q$ is the $\lceil (n+1)(1-\alpha) \rceil / n$ empirical quantile of the normalized scores $s_i = |y_i - \hat{y}_i| / \hat{\sigma}(\mathbf{x}_i)$.<br>
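A minimal sketch of this split-conformal computation on held-out (calibration) data; the array names and the helper are illustrative, and the package's own MAPIE-based implementation may differ:

```python
import numpy as np

def conformal_interval(y_cal, yhat_cal, sigma_cal, yhat_new, sigma_new, alpha=0.1):
    """Residual-normalised split conformal interval.

    q is the ceil((n+1)(1-alpha))/n empirical quantile of the calibration
    scores s_i = |y_i - yhat_i| / sigma_i; the interval is yhat +/- q*sigma.
    """
    n = len(y_cal)
    scores = np.abs(y_cal - yhat_cal) / sigma_cal
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, q_level, method="higher")
    return yhat_new - q * sigma_new, yhat_new + q * sigma_new
```

With `alpha=0.1` this yields the 90% intervals returned as `(lo, hi)`.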

- **Interval width varies per input**: molecules more dissimilar to the training data tend to receive wider intervals<br>

- **Coverage guarantee**: on exchangeable data, P(y ∈ [ŷ − qσ̂, ŷ + qσ̂]) ≥ 0.90<br>

- **The guarantee is marginal**, not conditional: an unusually narrow interval on an out-of-distribution molecule does not guarantee correctness<br>

- **Full access**: we have already computed MAPIE intervals for all regression models; users can apply them directly to customized model lists.<br>
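The marginal (not conditional) nature of the guarantee can be checked on synthetic exchangeable data; a hypothetical simulation, unrelated to the package's models:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_cal, n_test = 0.1, 500, 5000

# Toy exchangeable task: y = x + N(0, 1) noise, predictor yhat(x) = x,
# and a deliberately crude constant sigma-hat = 1.
x_cal = rng.normal(size=n_cal)
y_cal = x_cal + rng.normal(size=n_cal)
x_test = rng.normal(size=n_test)
y_test = x_test + rng.normal(size=n_test)

scores = np.abs(y_cal - x_cal)                      # s_i with sigma-hat = 1
q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
q = np.quantile(scores, q_level, method="higher")

coverage = np.mean((y_test >= x_test - q) & (y_test <= x_test + q))
# coverage concentrates near 0.90 averaged over test points; individual
# intervals carry no per-input (conditional) promise.
```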

---