Lal, Claude Opus 4.6 committed
Commit 7140455 · 1 parent: 637e71b
Rewrite model card with correct information
The previous README was incorrectly copied from human-atac-catlas-model.
- Fix model name (human-chromhmm-fullstack-model, not human-atac-catlas)
- Fix dataset reference (human-chromhmm-fullstack-data)
- Fix task description (16 chromatin states, not 204 cell types)
- Fix repo_id in loading code
- Add weights_only=False to loading code
- Add test and validation metrics (accuracy, AUROC, avg precision)
- Add per-class test metrics for all 16 states
- Add training hyperparameters
- Add parameter count (71.5M)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md
CHANGED
@@ -1,5 +1,4 @@
 ---
-# 1. Metadata Block
 license: mit
 library_name: pytorch-lightning
 pipeline_tag: tabular-classification
@@ -7,19 +6,76 @@ tags:
 - biology
 - genomics
 datasets:
-- Genentech/human-
+- Genentech/human-chromhmm-fullstack-data
 base_model:
 - Genentech/enformer-model
 ---
 
-# human-
+# human-chromhmm-fullstack-model
 
 ## Model Description
-This model is a multi-
+This model is a multi-class classifier trained to predict chromatin state annotations for genomic DNA sequences. It classifies sequences into 16 chromatin states based on the ChromHMM fullstack annotation. It was trained by fine-tuning the Enformer model using the `grelu` library.
 
-- **Architecture:** Fine-tuned Enformer
+- **Architecture:** Fine-tuned Enformer (EnformerPretrainedModel)
 - **Input:** Genomic sequences (hg38)
-- **Output:**
+- **Output:** Probability distribution over 16 chromatin states
+- **Parameters:** 71.5M total (all trainable)
+
+### Chromatin States
+Acet, BivProm, DNase, EnhA, EnhWk, GapArtf, HET, PromF, Quies, ReprPC, TSS, Tx, TxEnh, TxEx, TxWk, znf
+
+## Performance
+
+Metrics are computed per chromatin state and averaged across all 16 states.
+
+### Test Set
+| Metric | Mean | Std | Min | Max |
+|--------|------|-----|-----|-----|
+| Accuracy | 0.4373 | 0.2162 | 0.2455 | 0.8528 |
+| AUROC | 0.8609 | 0.0767 | 0.7652 | 0.9952 |
+| Average Precision | 0.4113 | 0.1974 | 0.1362 | 0.8015 |
+
+### Validation Set
+| Metric | Mean | Std | Min | Max |
+|--------|------|-----|-----|-----|
+| Accuracy | 0.4487 | 0.2098 | 0.2164 | 0.8696 |
+| AUROC | 0.8654 | 0.0763 | 0.7594 | 0.9950 |
+| Average Precision | 0.4155 | 0.1848 | 0.1241 | 0.7812 |
+
+### Per-class Test Metrics
+| State | Accuracy | AUROC | AvgPrec |
+|-------|----------|-------|---------|
+| Acet | 0.2939 | 0.7973 | 0.2091 |
+| BivProm | 0.5431 | 0.9373 | 0.3575 |
+| DNase | 0.8528 | 0.9905 | 0.7527 |
+| EnhA | 0.2950 | 0.8145 | 0.3368 |
+| EnhWk | 0.2683 | 0.8144 | 0.2947 |
+| GapArtf | 0.7988 | 0.9517 | 0.7029 |
+| HET | 0.2455 | 0.8236 | 0.4982 |
+| PromF | 0.5940 | 0.9557 | 0.6369 |
+| Quies | 0.3662 | 0.8512 | 0.3610 |
+| ReprPC | 0.2874 | 0.7652 | 0.2522 |
+| TSS | 0.8302 | 0.9952 | 0.8015 |
+| Tx | 0.2590 | 0.8072 | 0.3197 |
+| TxEnh | 0.2694 | 0.8252 | 0.2770 |
+| TxEx | 0.5336 | 0.8821 | 0.3563 |
+| TxWk | 0.2510 | 0.7781 | 0.2880 |
+| znf | 0.3079 | 0.7851 | 0.1362 |
+
+## Training Details
+
+| Parameter | Value |
+|-----------|-------|
+| Task | Multiclass classification |
+| Loss | Binary Cross-Entropy (with class weights) |
+| Optimizer | Adam |
+| Learning rate | 0.0001 |
+| Batch size | 512 |
+| Max epochs | 10 |
+| Devices | 4 |
+| n_transformers | 1 |
+| crop_len | 0 |
+| grelu version | 1.0.4.post1.dev39 |
 
 ## Repository Content
 1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
@@ -34,10 +90,10 @@ from grelu.lightning import LightningModel
 from huggingface_hub import hf_hub_download
 
 ckpt_path = hf_hub_download(
-    repo_id="Genentech/human-
+    repo_id="Genentech/human-chromhmm-fullstack-model",
     filename="model.ckpt"
 )
 
-model = LightningModel.load_from_checkpoint(ckpt_path)
+model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
 model.eval()
-```
+```
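Review note: the README added by this commit says the summary metrics are "computed per chromatin state and averaged across all 16 states". The sketch below is one way to cross-check that claim against the per-class test table. It is not part of the commit; the dictionary `PER_CLASS_TEST` and the helper `summarize` are hypothetical names introduced here, and the values are copied by hand from the table in the diff. The Std column is left out because the README does not say which estimator (sample or population) was used.

```python
from statistics import mean

# Per-class test metrics copied from the README table in this commit.
# Tuple order: (accuracy, AUROC, average precision).
# PER_CLASS_TEST and summarize are illustrative names, not part of grelu.
PER_CLASS_TEST = {
    "Acet": (0.2939, 0.7973, 0.2091),
    "BivProm": (0.5431, 0.9373, 0.3575),
    "DNase": (0.8528, 0.9905, 0.7527),
    "EnhA": (0.2950, 0.8145, 0.3368),
    "EnhWk": (0.2683, 0.8144, 0.2947),
    "GapArtf": (0.7988, 0.9517, 0.7029),
    "HET": (0.2455, 0.8236, 0.4982),
    "PromF": (0.5940, 0.9557, 0.6369),
    "Quies": (0.3662, 0.8512, 0.3610),
    "ReprPC": (0.2874, 0.7652, 0.2522),
    "TSS": (0.8302, 0.9952, 0.8015),
    "Tx": (0.2590, 0.8072, 0.3197),
    "TxEnh": (0.2694, 0.8252, 0.2770),
    "TxEx": (0.5336, 0.8821, 0.3563),
    "TxWk": (0.2510, 0.7781, 0.2880),
    "znf": (0.3079, 0.7851, 0.1362),
}

METRIC_NAMES = ("accuracy", "auroc", "avg_precision")


def summarize(column: int) -> dict:
    """Return the mean (rounded to 4 decimals), min, and max of one metric column."""
    values = [row[column] for row in PER_CLASS_TEST.values()]
    return {"mean": round(mean(values), 4), "min": min(values), "max": max(values)}


summary = {name: summarize(i) for i, name in enumerate(METRIC_NAMES)}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.4f} min={stats['min']:.4f} max={stats['max']:.4f}")
```

Under these assumptions the recomputed means, minima, and maxima match the Test Set summary rows in the README (for example, accuracy mean 0.4373, min 0.2455, max 0.8528), which supports reading the summary as an unweighted average over the 16 states.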