---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-classification
tags:
- biology
- genomics
datasets:
- Genentech/human-chromhmm-fullstack-data
base_model:
- Genentech/enformer-model
---

# human-chromhmm-fullstack-model

## Model Description

This model is a multi-class classifier trained to predict chromatin state annotations for genomic DNA sequences. It classifies sequences into 16 chromatin states based on the ChromHMM fullstack annotation, and was trained by fine-tuning the Enformer model using the `grelu` library.

- **Architecture:** Fine-tuned Enformer (EnformerPretrainedModel)
- **Input:** Genomic sequences (hg38)
- **Output:** Probability distribution over 16 chromatin states
- **Parameters:** 71.5M total (all trainable)

### Chromatin States

Acet, BivProm, DNase, EnhA, EnhWk, GapArtf, HET, PromF, Quies, ReprPC, TSS, Tx, TxEnh, TxEx, TxWk, znf

## Performance

Metrics are computed per chromatin state and averaged across all 16 states.

### Test Set

| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.4373 | 0.2162 | 0.2455 | 0.8528 |
| AUROC | 0.8609 | 0.0767 | 0.7652 | 0.9952 |
| Average Precision | 0.4113 | 0.1974 | 0.1362 | 0.8015 |

### Validation Set

| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.4487 | 0.2098 | 0.2164 | 0.8696 |
| AUROC | 0.8654 | 0.0763 | 0.7594 | 0.9950 |
| Average Precision | 0.4155 | 0.1848 | 0.1241 | 0.7812 |

### Per-class Test Metrics

| State | Accuracy | AUROC | AvgPrec |
|-------|----------|-------|---------|
| Acet | 0.2939 | 0.7973 | 0.2091 |
| BivProm | 0.5431 | 0.9373 | 0.3575 |
| DNase | 0.8528 | 0.9905 | 0.7527 |
| EnhA | 0.2950 | 0.8145 | 0.3368 |
| EnhWk | 0.2683 | 0.8144 | 0.2947 |
| GapArtf | 0.7988 | 0.9517 | 0.7029 |
| HET | 0.2455 | 0.8236 | 0.4982 |
| PromF | 0.5940 | 0.9557 | 0.6369 |
| Quies | 0.3662 | 0.8512 | 0.3610 |
| ReprPC | 0.2874 | 0.7652 | 0.2522 |
| TSS | 0.8302 | 0.9952 | 0.8015 |
| Tx | 0.2590 | 0.8072 | 0.3197 |
| TxEnh | 0.2694 | 0.8252 | 0.2770 |
| TxEx | 0.5336 | 0.8821 | 0.3563 |
| TxWk | 0.2510 | 0.7781 | 0.2880 |
| znf | 0.3079 | 0.7851 | 0.1362 |

## Training Details

| Parameter | Value |
|-----------|-------|
| Task | Multiclass classification |
| Loss | Binary cross-entropy (with class weights) |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 10 |
| Devices | 4 |
| n_transformers | 1 |
| crop_len | 0 |
| grelu version | 1.0.4.post1.dev39 |

## Repository Content

1. `model.ckpt`: Trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2. `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
3. `output.log`: Training logs.

## How to use

To load this model for inference or fine-tuning, use the `grelu` interface:

```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
ckpt_path = hf_hub_download(
    repo_id="Genentech/human-chromhmm-fullstack-model",
    filename="model.ckpt"
)

# weights_only=False is needed because the checkpoint stores
# hyperparameters alongside the model weights
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()
```
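Since the model emits a probability distribution over the 16 states, a common post-processing step is to take the argmax per sequence and map it back to a state name. A minimal NumPy sketch of that step is below; note that the alphabetical state order and the probability values are illustrative assumptions, not taken from the checkpoint — verify the actual class order (e.g. via the model's stored hyperparameters) before relying on it.

```python
import numpy as np

# Assumed class order: the 16 states as listed in this card.
# Confirm against the checkpoint's hyperparameters before use.
STATES = [
    "Acet", "BivProm", "DNase", "EnhA", "EnhWk", "GapArtf", "HET", "PromF",
    "Quies", "ReprPC", "TSS", "Tx", "TxEnh", "TxEx", "TxWk", "znf",
]

def label_predictions(probs: np.ndarray) -> list:
    """Map a (batch, 16) array of class probabilities to state names."""
    return [STATES[i] for i in probs.argmax(axis=-1)]

# Hypothetical probabilities for two sequences
probs = np.zeros((2, 16))
probs[0, STATES.index("TSS")] = 0.9
probs[1, STATES.index("Quies")] = 0.8
print(label_predictions(probs))  # ['TSS', 'Quies']
```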