Joblib

yinuozhang committed on
Commit e719470 · verified · 1 Parent(s): 96bdf50

Update README.md

Files changed (1): README.md (+15 -10)

README.md CHANGED
Predicts peptide-protein binding affinity. Requires both peptide and target protein.

**Interpretation:**<br>
- Scores ≥ 9 correspond to tight binders (K ≤ 10⁻⁹ M, nanomolar to picomolar range)<br>
- Scores between 7 and 9 correspond to medium binders (K between 10⁻⁷ and 10⁻⁹ M, sub-micromolar to nanomolar range)<br>
- Scores < 7 correspond to weak binders (K ≥ 10⁻⁶ M, micromolar and weaker)<br>
- A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity, i.e. the score behaves like −log₁₀ K.<br>
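Read as a −log₁₀ K scale, these bins are straightforward to apply in code. A minimal sketch with hypothetical helper names (not part of the package):

```python
# Hypothetical helpers illustrating the README's affinity bins.
# The score is interpreted as -log10(K), so 1 unit ~ tenfold affinity change.

def classify_binder(score: float) -> str:
    """Bin a predicted affinity score into tight / medium / weak."""
    if score >= 9:       # K <= 1e-9 M (nanomolar to picomolar)
        return "tight"
    elif score >= 7:     # 1e-9 M < K <= 1e-7 M (sub-micromolar to nanomolar)
        return "medium"
    else:                # K >= 1e-6 M and weaker (micromolar)
        return "weak"

def score_to_K(score: float) -> float:
    """Convert a score back to an approximate dissociation constant in M."""
    return 10 ** (-score)
```

For example, a predicted score of 8.3 falls in the medium bin, with K roughly 5 × 10⁻⁹ M.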
---

### Uncertainty Interpretation<br>
#### Entropy (classifiers)<br>
Binary predictive entropy of the output probability $\bar{p}$:<br>

$$\mathcal{H} = -\bar{p}\log\bar{p} - (1 - \bar{p})\log(1 - \bar{p})$$<br>

- For **DNN classifiers**: $\bar{p}$ is the mean probability across 5 independently seeded models (deep ensemble). High entropy reflects both epistemic uncertainty (seed disagreement) and aleatoric uncertainty (collectively diffuse predictions).<br>

- For **XGBoost / SVM / ElasticNet classifiers**: $\bar{p}$ is the single model's output probability (or the sigmoid of the decision function for ElasticNet). Entropy reflects output confidence of a single model only.<br>
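As a sketch, the entropy above can be computed directly (natural log, matching the formula; the `eps` clipping is an assumption added here to avoid `log(0)` at $\bar{p} \in \{0, 1\}$):

```python
import math

def binary_entropy(p_bar: float, eps: float = 1e-12) -> float:
    """Binary predictive entropy H = -p*log(p) - (1-p)*log(1-p), natural log.

    `eps` clipping is an implementation assumption, not part of the formula:
    it keeps the expression finite when p_bar is exactly 0 or 1.
    """
    p = min(max(p_bar, eps), 1.0 - eps)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

# A maximally uncertain classifier (p = 0.5) attains the maximum, log(2) ~ 0.693;
# confident predictions (p near 0 or 1) give entropy near 0.
```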
| Range | Interpretation |
|---|---|
Returned as a tuple `(lo, hi)` with a 90% marginal coverage guarantee.<br>
We implement the **residual normalised conformity score** following [Lei et al. (2018)](https://doi.org/10.1080/01621459.2017.1307116) and [Cordier et al. (2023) / MAPIE](https://proceedings.mlr.press/v204/cordier23a.html). An auxiliary XGBoost model $\hat{\sigma}(\mathbf{x})$ is trained on held-out embeddings and absolute residuals $|y_i - \hat{y}_i|$. At inference:<br>

$$[\hat{y}(\mathbf{x}) - q \cdot \hat{\sigma}(\mathbf{x}),\ \hat{y}(\mathbf{x}) + q \cdot \hat{\sigma}(\mathbf{x})]$$

where $q$ is the $\lceil(n+1)(1-\alpha)\rceil / n$ quantile of the normalised scores $s_i = |y_i - \hat{y}_i| / \hat{\sigma}(\mathbf{x}_i)$.<br>
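A minimal sketch of the calibration step, under the split-conformal assumptions above (hypothetical function names; the actual pipeline uses MAPIE with an auxiliary XGBoost $\hat{\sigma}$):

```python
import math

def conformal_quantile(residuals, sigmas, alpha=0.1):
    """Finite-sample quantile q of normalised scores s_i = |r_i| / sigma_i.

    Uses the ceil((n+1)(1-alpha)) order statistic from split conformal
    prediction; `residuals` and `sigmas` come from a held-out calibration set.
    """
    scores = sorted(abs(r) / s for r, s in zip(residuals, sigmas))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return scores[min(k, n) - 1]          # clamp in case k > n (tiny n)

def interval(y_hat, sigma_hat, q):
    """Per-input interval [y_hat - q*sigma_hat, y_hat + q*sigma_hat]."""
    return (y_hat - q * sigma_hat, y_hat + q * sigma_hat)
```

Because $\hat{\sigma}(\mathbf{x})$ scales the fixed quantile $q$, the interval width adapts to each input, which is what produces the per-molecule behaviour described below.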
 
615
 
616
  - **Interval width varies per input** -- molecules more dissimilar to training data tend to receive wider intervals<br>
617
- - **Coverage guarantee**: on exchangeable data, $P(y \in [\hat{y} - q\hat{\sigma},\ \hat{y} + q\hat{\sigma}]) \geq 0.90$<br>
 
 
618
  - **The guarantee is marginal**, not conditional, as an unusually narrow interval on an out-of-distribution molecule does not guarantee correctness<br>
 
619
  - **Full access**: We already computed MAPIE for all regression models; users are allowed to directly use them for customized model lists.<br>
620
 
621
  ---
 