Pulse88-40M-Alpha-Preview Architectural Variant E is a high-efficiency, 40.8-million-parameter causal piano continuation model trained on 86k pieces from the Godzilla MIDI Dataset. It uses a hybrid architecture that combines Gated Delta Networks (GDN) with sparse Grouped-Query Attention (GQA) anchors to achieve long-context musical coherence.
Key Specifications
- Architecture: Hybrid Gated Delta Network + Sparse GQA
- Parameters: 40,784,528 (~40.8M)
- Vocabulary: 171-token event vocabulary (delta onset, pitch, duration, velocity)
- Context Window: 2048 tokens (512 seed / 1536 continuation)
- Training Data: Godzilla MIDI Dataset Piano Subset (86k piano pieces)
Variant E 40M Architecture Summary
Variant E is a decoder-only autoregressive piano MIDI model built on a custom 171-token event vocabulary with event quads (delta, pitch, duration, velocity) and event size 4. The 40M profile uses d_model 640, 13 layers, and a 2048-token context window (512 seed plus 1536 continuation). Each layer is pre-norm residual and stacks two Gated Delta Net blocks, with sparse grouped-query attention anchors inserted every 2 layers and always in the final layer. In the 13-layer profile this gives attention anchors at layers 2, 4, 6, 8, 10, 12, and 13. Token embedding and output head are tied, dropout is 0.1, and output logits use 1/sqrt(d_model) scaling.
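The anchor-placement rule described above (an attention anchor every 2 layers, plus always the final layer) can be sketched as a small helper; the function name is illustrative, not taken from the training code:

```python
def attention_anchor_layers(n_layers: int, every: int = 2) -> list[int]:
    """Return the 1-indexed layers that receive a sparse GQA anchor:
    every `every` layers, with the final layer always included."""
    anchors = set(range(every, n_layers + 1, every))
    anchors.add(n_layers)  # the last layer is always an anchor
    return sorted(anchors)
```

For the 13-layer profile this reproduces the anchor set quoted above: layers 2, 4, 6, 8, 10, 12, and 13.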
For the 40M shape, GDN runs with inner_dim 320 and 4 heads; attention runs with 8 query heads and grouped KV sharing (group size 4, effective KV heads 2). The training notebook enforces strict real GDN kernels (flash-linear-attention required) and blocks fallback when strict mode is enabled. Training is configured for pretokenized NPZ manifests of up to 100k pieces, with AdamW (learning rate 2e-4, cosine decay, dynamic warmup resolution, weight decay 0.01), label smoothing 0.1, and max grad norm 1.0. The run is set up for dual-T4 DDP on Kaggle (one process per GPU), with a checkpoint resume flow to support training across two sessions.
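The shape numbers above can be collected into a single config sketch, which also shows how the effective KV head count follows from the query heads and group size. Field names here are illustrative and assumed, not the training notebook's own:

```python
from dataclasses import dataclass

# Hedged sketch of the 40M profile, collecting the hyperparameters quoted
# above. This is not the actual config class from the training code.


@dataclass
class Pulse88Config:
    vocab_size: int = 171
    d_model: int = 640
    n_layers: int = 13
    context_window: int = 2048   # 512 seed + 1536 continuation
    gdn_inner_dim: int = 320
    gdn_heads: int = 4
    attn_q_heads: int = 8
    attn_group_size: int = 4     # query heads sharing one KV head

    @property
    def attn_kv_heads(self) -> int:
        # Grouped-query attention: KV heads = query heads / group size.
        return self.attn_q_heads // self.attn_group_size
```

With the defaults above, `Pulse88Config().attn_kv_heads` yields the 2 effective KV heads stated in the summary.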
Training
See training_logs.txt for exact loss numbers.
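The learning-rate schedule used in training (cosine decay with warmup, as noted in the architecture summary) can be sketched as follows. The notebook's "dynamic warmup resolution" logic is not reproduced here; a fixed warmup step count stands in for it:

```python
import math


def lr_at(step: int, max_steps: int, warmup_steps: int,
          base_lr: float = 2e-4) -> float:
    """Linear warmup to base_lr, then cosine decay to zero.
    A simplified stand-in for the run's actual schedule."""
    if step < warmup_steps:
        # Linear ramp over the warmup window.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```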

Dataset
The model was trained on the Godzilla MIDI Dataset, specifically 86,000 pieces from the piano subset. This dataset, created by Project Los Angeles (Aleksandr Lev), provides a massive and diverse corpus of MIDI data that allows the model to learn complex harmonic structures and temporal continuity.
Demo
Bluebird Continuation
Single Note Continuation
In this generation the model was given only a single note (C4).
For optimal results, a longer seed is recommended.
---
Other Generations
God Rest Ye Merry Gentlemen Continuation
Continuation of a simple motif
Sabrina by John Williams
Wii Channel Continuation
Audio rendered with Advanced MIDI Renderer
Intended Use
- Research and experimentation in symbolic piano continuation
- Evaluation of GDN + sparse attention design choices
Limitations
- The model is limited to piano-only MIDI data and does not generalize to multi-instrument compositions.
- Performance degrades with very short or highly irregular input seeds.
- The model may produce repetitive or unstable outputs over long continuations.
- As an alpha preview, the model has not been extensively optimized for musical quality or stylistic control.
Project Future and Purpose
The purpose of this project is to research novel architectures in the symbolic-music machine learning space. This release is a small-scale preview of what is to come; I plan to keep developing the architecture to stay on the bleeding edge.
Warranty
This model is intended for research purposes only. It is provided “as is,” without any warranties, express or implied. The authors make no guarantees regarding its performance, reliability, or fitness for a particular purpose. Use at your own risk.
Citation & Credits
If you use this model, please credit the original data source:
@misc{GodzillaMIDIDataset2025,
  title = {Godzilla MIDI Dataset: Enormous, comprehensive, normalized and searchable MIDI dataset for MIR and symbolic music AI purposes},
  author = {Alex Lev},
  publisher = {Project Los Angeles / Tegridy Code},
  year = {2025},
  url = {https://huggingface.co/datasets/projectlosangeles/Godzilla-MIDI-Dataset}
}
@inproceedings{lev2026tegridytools,
  title = {tegridy-tools: Symbolic Music NLP Artificial Intelligence Toolkit},
  author = {Aleksandr Lev},
  booktitle = {GitHub},
  year = {2026},
}
