PredANNpp-Pretrain-Entropy-ctx16-ep10000-seed42
Model description
This repository contains a PredANN++ PyTorch Lightning checkpoint for EEG-based music representation learning and/or song identification.
- Canonical repository: Shogo-Noguchi/PredANNpp-Pretrain-Entropy-ctx16-ep10000-seed42
- Checkpoint file: predannpp_pretrain_entropy_ctx16_ep10000_seed42.ckpt
- Stage: pretrain-only
- Target / teacher representation: Entropy (ctx16)
- Architecture: EncoderDecoder multitask
- Random seed: 42
- SHA256: 7779e76f065f75b8049861c794de77db9343f452e469398b42f153a17ca89d12
This is a multitask pretraining checkpoint, not an encoder-only finetuned Song ID classifier. It keeps the encoder/decoder components needed for masked teacher-token prediction, plus the auxiliary Song ID pathway used during multitask training.
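To confirm which components a downloaded checkpoint actually contains (encoder, decoder, auxiliary Song ID pathway), you can group the state_dict keys by module prefix. The sketch below assumes the standard PyTorch Lightning layout, in which weights are stored under the "state_dict" key. Loading requires torch (weights_only needs torch >= 1.13). The actual prefixes depend on the PredANN++ model definitions, so the names used in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def load_state_dict(path: str) -> dict:
    """Load the raw state_dict from a PyTorch Lightning .ckpt file (needs torch)."""
    import torch  # imported lazily so group_by_prefix works without torch

    # Lightning checkpoints are pickled dicts that also carry metadata such as
    # hyper_parameters, so weights_only=False may be required.
    # Only load checkpoints from sources you trust in this mode.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    return ckpt["state_dict"]


def group_by_prefix(keys: Iterable[str], depth: int = 1) -> Dict[str, int]:
    """Count parameter tensors per module prefix (first `depth` dotted parts)."""
    counts = Counter(".".join(k.split(".")[:depth]) for k in keys)
    return dict(sorted(counts.items()))
```

For example, group_by_prefix(load_state_dict("predannpp_pretrain_entropy_ctx16_ep10000_seed42.ckpt")) lists the top-level modules. Encoder and decoder groups plus a Song ID head group would be consistent with the multitask description above.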
The repository name intentionally omits NMED-T because this checkpoint is positioned as a general-purpose representation pretraining checkpoint. For provenance, the card still lists NMED-T as the training dataset.
Capabilities
Masked prediction of MusicGen Entropy token sequences, plus auxiliary Song ID classification during multitask training. For direct Song ID inference from 3-second EEG segments, use an EncoderOnly finetuned checkpoint instead.
Input and output
- Input EEG: 128 channels, 125 Hz, 3-second segments, following the PredANN++ / NMED-T preprocessing pipeline.
- Output: depends on the stage. This pretraining checkpoint exposes the multitask pretraining outputs (masked teacher-token predictions and the auxiliary Song ID pathway). The full-scratch checkpoint instead outputs 10-class Song ID logits.
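At 125 Hz, a 3-second segment is 3 × 125 = 375 samples per channel. The sketch below cuts a channels-first recording into such windows. Non-overlapping windows (hop equal to the window length) are an assumption, not taken from the PredANN++ pipeline. The filtering and referencing steps of the NMED-T preprocessing are also not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

FS_HZ = 125        # sampling rate stated on this card
SEGMENT_SEC = 3    # segment length stated on this card
N_CHANNELS = 128   # channel count stated on this card


def segment_bounds(n_samples: int, fs: int = FS_HZ, seconds: int = SEGMENT_SEC,
                   hop: Optional[int] = None) -> List[Tuple[int, int]]:
    """(start, end) sample indices of every full-length window.

    hop defaults to the window length, i.e. non-overlapping windows (assumption).
    A trailing partial window is dropped.
    """
    win = fs * seconds  # 3 s * 125 Hz = 375 samples
    hop = win if hop is None else hop
    if win <= 0 or hop <= 0:
        raise ValueError("window length and hop must be positive")
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]


def split_recording(eeg: Sequence[Sequence[float]], **kwargs) -> List[List[Sequence[float]]]:
    """Slice a channels-first recording (128 per-channel sequences) into segments."""
    if len(eeg) != N_CHANNELS:
        raise ValueError(f"expected {N_CHANNELS} channels, got {len(eeg)}")
    n_samples = len(eeg[0])
    return [[ch[s:e] for ch in eeg] for s, e in segment_bounds(n_samples, **kwargs)]
```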
Training data
- Dataset: NMED-T (Naturalistic Music EEG Dataset – Tempo), 10 songs, 20 subjects, trial=1, as used in the PredANN++ experiments.
- Teacher / target source: MusicGen Entropy token sequences.
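The card does not define the Entropy target further. One plausible reading is the Shannon entropy H_t = -Σ_k p_t(k) log p_t(k) of MusicGen's next-token predictive distribution at each step. This is an assumption: the release path mentions a Surprisal_Model, and "ctx16" may denote a 16-step context, but neither is confirmed here. Under that assumption, the per-step values could be computed from logits as below. How these values become the discrete token sequences used as targets is defined in the PredANN++ repository and is not reproduced here. The sketch also ignores MusicGen's multiple codebooks.

```python
import math
from typing import List, Sequence


def predictive_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)                       # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_sequence(step_logits: Sequence[Sequence[float]]) -> List[float]:
    """One entropy value per time step of a model's next-token predictions."""
    return [predictive_entropy(logits) for logits in step_logits]
```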
Training procedure
Multitask pretraining for 10000 epochs with 50% masking, seed 42. No downstream finetuning checkpoint is included in this repository.
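The card states 50% masking but not how the masked positions are chosen. The minimal sketch below assumes uniformly random positions within one teacher-token sequence and uses a hypothetical MASK_ID placeholder. The default seed mirrors the card's seed 42 but does not reproduce the actual training RNG stream, and the PredANN++ code may use span masking or per-batch sampling instead.

```python
import random
from typing import List, Sequence, Tuple

MASK_ID = -1  # hypothetical placeholder; the real mask token is defined in PredANN++


def mask_tokens(tokens: Sequence[int], ratio: float = 0.5,
                seed: int = 42) -> Tuple[List[int], List[int]]:
    """Replace floor(len(tokens) * ratio) random positions with MASK_ID.

    Returns (masked_tokens, sorted_masked_positions). A masked-prediction loss
    would typically be computed only at the returned positions (assumption).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    n_mask = int(len(tokens) * ratio)
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_ID
    return masked, positions
```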
Intended use
Continuing pretraining; finetuning for NMED-T Song ID or other downstream EEG tasks; reproducing PredANN++ pretraining ablations.
Not intended use
- Medical diagnosis, clinical decision making, or biometric identification.
- Commercial use without checking the PredANN++ code license, NMED-T terms, and upstream model/feature licenses.
- Immediate Song ID inference as a final classifier without loading the correct multitask module or performing downstream finetuning.
License and upstream dependencies
Upstream dependency: the teacher targets are MusicGen / AudioCraft-derived features. The released checkpoint is kept under CC-BY-NC-4.0 for compatibility with the existing PredANN++ Hugging Face collection.
Reproducibility notes
- The original source path at release time was: /data/Backup_AkamaUbuntu/home/sony_csl/workspace/noguchi/work/mind-model/Surprisal_Model/codes_3s/best_checkpoints/newMF_ctx16/multitask/EntropyMultitask_newMF/SongAcc/last.ckpt
- metadata.json stores the standardized release metadata.
- SHA256SUMS stores the checkpoint checksum.
- Use the PredANN++ GitHub repository for model definitions and evaluation scripts.
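To check a download, you can recompute the checkpoint's digest and compare it with the SHA256 value listed on this card and with the SHA256SUMS manifest. The sketch below uses hashlib and assumes SHA256SUMS follows the coreutils sha256sum line format ("<digest>  <filename>", where a "*" before the name marks binary mode). If that assumption holds, sha256sum -c SHA256SUMS performs the same check from the shell.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Digest published on this card for the checkpoint file.
EXPECTED_CKPT_SHA256 = "7779e76f065f75b8049861c794de77db9343f452e469398b42f153a17ca89d12"


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sha256sums(text: str) -> Dict[str, str]:
    """Map filename -> lowercase digest from sha256sum-style lines (assumed format)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        entries[name.lstrip("*")] = digest.lower()
    return entries


def verify_dir(directory) -> Dict[str, bool]:
    """Check every file listed in <directory>/SHA256SUMS against its digest."""
    root = Path(directory)
    manifest = parse_sha256sums((root / "SHA256SUMS").read_text())
    return {name: sha256_file(root / name) == digest for name, digest in manifest.items()}
```

To compare against the card directly, check that sha256_file("predannpp_pretrain_entropy_ctx16_ep10000_seed42.ckpt") == EXPECTED_CKPT_SHA256.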
Links
- Project page: https://shogonoguchi.github.io/PredANNpp/
- GitHub: https://github.com/ShogoNoguchi/PredANNpp
- Hugging Face collection: https://huggingface.co/collections/Shogo-Noguchi/predann-models
- Paper: https://arxiv.org/abs/2603.03190
Citation
If you use this checkpoint, cite the PredANN++ paper and the NMED-T dataset. For MuQ / MusicGen-derived teacher features, also cite the relevant upstream model or toolkit.
@misc{noguchi2026expectationacousticneuralnetwork,
title={Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity},
author={Shogo Noguchi and Taketo Akama and Tai Nakamura and Shun Minamikawa and Natalia Polouliakh},
year={2026},
eprint={2603.03190},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2603.03190}
}