Fixes hyperlink
README.md CHANGED
@@ -96,7 +96,7 @@ BlueSTARR was originally inspired by applying the [DeepSTARR model](https://doi.
The default architecture consists of a convolutional neural network (CNN) that accepts one-hot encoded 300 bp DNA sequences as input. Our default model comprises five one-dimensional convolutional layers with 1024, 512, 256, 128 and 64 filters, and kernel sizes of 8, 16, 32, 64 and 128, respectively. This choice was made to maintain a large receptive field without using pooling after each layer (in contrast to DeepSTARR). Each convolutional layer is followed by batch normalization and ReLU activation, and a dropout layer is applied before each convolution. All layers use "same" padding, with no dilation, no residual connections and no intermediate pooling. The final convolutional layer's outputs are aggregated by global average pooling and connected directly to a single output neuron for prediction.
-The model was trained using the Adam optimizer
+The model was trained using the [Adam optimizer](https://doi.org/10.48550/arXiv.1412.6980) with a learning rate of 0.002, a mean-squared error (MSE) loss function, a batch size of 128, and an early-stopping "patience" value of 10. All of the foregoing hyperparameters can be changed via the model configuration text file.
Variations of the above architecture were explored to evaluate different receptive fields, sequence lengths, numbers of layers, the inclusion of an attention mechanism, and a custom loss function based on the negative log-likelihood (NLL).
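The default architecture described in the context lines above can be sketched in Keras. The filter counts, kernel sizes, block order (dropout before each convolution, batch normalization and ReLU after), "same" padding, global average pooling, and single output neuron come from the text; the dropout rate and the function name are assumptions, not BlueSTARR's actual code.

```python
# Sketch of the default CNN described above (not the actual BlueSTARR source).
# Assumed: dropout rate of 0.1 and the helper name `build_model`.
import tensorflow as tf
from tensorflow.keras import layers


def build_model(seq_len=300, dropout_rate=0.1):
    # One-hot encoded DNA: one channel per base (A, C, G, T).
    inputs = layers.Input(shape=(seq_len, 4))
    x = inputs
    # Five 1D convolutional blocks: dropout -> conv -> batch norm -> ReLU.
    # "same" padding and no dilation/pooling preserve a large receptive field.
    for filters, kernel_size in zip([1024, 512, 256, 128, 64],
                                    [8, 16, 32, 64, 128]):
        x = layers.Dropout(dropout_rate)(x)
        x = layers.Conv1D(filters, kernel_size, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # Aggregate across positions and predict a single scalar.
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)
```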
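The training setup on the changed line can likewise be sketched. The Adam learning rate (0.002), MSE loss, batch size (128), and early-stopping patience (10) come from the text; the tiny stand-in model and random arrays are placeholders for illustration only, not STARR-seq data.

```python
# Training sketch with the hyperparameters stated above.
# The model and data are minimal stand-ins, not BlueSTARR's real pipeline.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(300, 4)),
    tf.keras.layers.Conv1D(8, 8, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
# Adam with learning rate 0.002 and mean-squared-error loss.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.002),
              loss="mse")

# Random placeholder sequences and targets.
rng = np.random.default_rng(0)
x = rng.random((256, 300, 4)).astype("float32")
y = rng.random((256, 1)).astype("float32")

# Stop when the validation loss has not improved for 10 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(patience=10,
                                              restore_best_weights=True)
history = model.fit(x, y, validation_split=0.2, batch_size=128,
                    epochs=2, callbacks=[early_stop], verbose=0)
```

In a real run `epochs` would be much larger; early stopping, not the epoch cap, would end training.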