---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-classification
tags:
- biology
- genomics
datasets:
- Genentech/human-chromhmm-fullstack-data
base_model:
- Genentech/enformer-model
---

# human-chromhmm-fullstack-model

## Model Description

This model is a multi-class classifier trained to predict chromatin state annotations for genomic DNA sequences. It classifies sequences into 16 chromatin states based on the ChromHMM fullstack annotation, and was trained by fine-tuning the Enformer model using the `grelu` library.

- **Architecture:** Fine-tuned Enformer (EnformerPretrainedModel)
- **Input:** Genomic sequences (hg38)
- **Output:** Probability distribution over 16 chromatin states
- **Parameters:** 71.5M total (all trainable)

### Chromatin States

Acet, BivProm, DNase, EnhA, EnhWk, GapArtf, HET, PromF, Quies, ReprPC, TSS, Tx, TxEnh, TxEx, TxWk, znf

## Performance

Metrics are computed per chromatin state and averaged across all 16 states.

### Test Set

| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.4373 | 0.2162 | 0.2455 | 0.8528 |
| AUROC | 0.8609 | 0.0767 | 0.7652 | 0.9952 |
| Average Precision | 0.4113 | 0.1974 | 0.1362 | 0.8015 |

### Validation Set

| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Accuracy | 0.4487 | 0.2098 | 0.2164 | 0.8696 |
| AUROC | 0.8654 | 0.0763 | 0.7594 | 0.9950 |
| Average Precision | 0.4155 | 0.1848 | 0.1241 | 0.7812 |

### Per-class Test Metrics

| State | Accuracy | AUROC | AvgPrec |
|-------|----------|-------|---------|
| Acet | 0.2939 | 0.7973 | 0.2091 |
| BivProm | 0.5431 | 0.9373 | 0.3575 |
| DNase | 0.8528 | 0.9905 | 0.7527 |
| EnhA | 0.2950 | 0.8145 | 0.3368 |
| EnhWk | 0.2683 | 0.8144 | 0.2947 |
| GapArtf | 0.7988 | 0.9517 | 0.7029 |
| HET | 0.2455 | 0.8236 | 0.4982 |
| PromF | 0.5940 | 0.9557 | 0.6369 |
| Quies | 0.3662 | 0.8512 | 0.3610 |
| ReprPC | 0.2874 | 0.7652 | 0.2522 |
| TSS | 0.8302 | 0.9952 | 0.8015 |
| Tx | 0.2590 | 0.8072 | 0.3197 |
| TxEnh | 0.2694 | 0.8252 | 0.2770 |
| TxEx | 0.5336 | 0.8821 | 0.3563 |
| TxWk | 0.2510 | 0.7781 | 0.2880 |
| znf | 0.3079 | 0.7851 | 0.1362 |

## Training Details

| Parameter | Value |
|-----------|-------|
| Task | Multiclass classification |
| Loss | Binary cross-entropy (with class weights) |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 10 |
| Devices | 4 |
| n_transformers | 1 |
| crop_len | 0 |
| grelu version | 1.0.4.post1.dev39 |

## Repository Content

1. `model.ckpt`: Trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2. `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
3. `output.log`: Training logs.

## How to use

To load this model for inference or fine-tuning, use the `grelu` interface:

```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
ckpt_path = hf_hub_download(
    repo_id="Genentech/human-chromhmm-fullstack-model",
    filename="model.ckpt"
)

# weights_only=False is needed because the checkpoint stores
# hyperparameters alongside the model weights
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()
```
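Since the model emits a probability distribution over the 16 states, a common post-processing step is to take the argmax per sequence and map it back to a state name. A minimal NumPy sketch of that step is below; note that the alphabetical state order and the probability values are illustrative assumptions, not taken from the checkpoint — verify the actual class order (e.g. via the model's stored hyperparameters) before relying on it.

```python
import numpy as np

# Assumed class order: the 16 states as listed in this card.
# Confirm against the checkpoint's hyperparameters before use.
STATES = [
    "Acet", "BivProm", "DNase", "EnhA", "EnhWk", "GapArtf", "HET", "PromF",
    "Quies", "ReprPC", "TSS", "Tx", "TxEnh", "TxEx", "TxWk", "znf",
]

def label_predictions(probs: np.ndarray) -> list:
    """Map a (batch, 16) array of class probabilities to state names."""
    return [STATES[i] for i in probs.argmax(axis=-1)]

# Hypothetical probabilities for two sequences
probs = np.zeros((2, 16))
probs[0, STATES.index("TSS")] = 0.9
probs[1, STATES.index("Quies")] = 0.8
print(label_predictions(probs))  # ['TSS', 'Quies']
```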