TheArtist Music Transformer — F5 (Jazz Only, no pop rehearsal)
Jazz-only fine-tune with no pop rehearsal. Serves as the catastrophic-forgetting reference point in the companion paper. Dominated by F4 on every evaluation axis: F4 matches its jazz top-1 accuracy and retains more pop accuracy.
One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.
Model summary
| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 7 |
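The card names relative positional attention but does not show how it is computed. The following is a minimal sketch, assuming the memory-efficient "skewing" formulation from the original Music Transformer paper (Huang et al., 2018) and one learned embedding per relative distance shared across heads. The function names `skew` and `relative_causal_attention` are hypothetical, and the released `model.py` may differ (for example, by using per-head embeddings). With this card's settings, each head has 512 / 8 = 64 dimensions and L is at most 256.

```python
import math

import torch
import torch.nn.functional as F


def skew(rel_logits: torch.Tensor) -> torch.Tensor:
    """Re-index (B, H, L, L) logits from [query, relative distance] to [query, key].

    Column r of the input holds relative distance r - (L - 1), so the last
    column is distance 0. Entries above the diagonal (future keys) are
    meaningless after skewing and must be masked by the caller.
    """
    b, h, l, _ = rel_logits.shape
    padded = F.pad(rel_logits, (1, 0))         # prepend a zero column: (B, H, L, L + 1)
    reshaped = padded.reshape(b, h, l + 1, l)  # reinterpret memory: (B, H, L + 1, L)
    return reshaped[:, :, 1:, :]               # drop the first row: (B, H, L, L)


def relative_causal_attention(q, k, v, rel_emb):
    """Causal self-attention with an added relative-position logit term.

    q, k, v: (B, H, L, D_head). rel_emb: (L_max, D_head) with L_max >= L,
    one row per relative distance -(L_max - 1) .. 0, shared across heads.
    """
    l, d_head = q.shape[-2], q.shape[-1]
    rel = rel_emb[-l:]                             # keep distances -(L - 1) .. 0
    content = q @ k.transpose(-2, -1)              # (B, H, L, L)
    position = skew(q @ rel.transpose(0, 1))       # (B, H, L, L)
    logits = (content + position) / math.sqrt(d_head)
    causal = torch.tril(torch.ones(l, l, device=q.device)).bool()
    logits = logits.masked_fill(~causal, float("-inf"))
    return torch.softmax(logits, dim=-1) @ v
```

Skewing avoids materializing an (L, L, D_head) tensor of per-pair embeddings, which is why the Music Transformer paper introduced it for long sequences.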
Training data
All 1,513 jazz training sequences. No pop rehearsal data.
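For contrast with the rehearsal checkpoints, the sketch below shows one way a jazz fine-tune set with pop rehearsal could be assembled. In this sketch, F5 corresponds to a pop fraction of 0, which yields the 1,513 jazz sequences alone. The function name `build_finetune_set`, the definition of the ratio as pop's share of the final set, and the random subsampling are all assumptions for illustration, not the paper's procedure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_finetune_set(
    jazz: Sequence[T], pop: Sequence[T], pop_fraction: float, seed: int = 0
) -> List[T]:
    """Hypothetical rehearsal mixer: every jazz sequence plus enough randomly
    sampled pop sequences that pop makes up `pop_fraction` of the result."""
    if not 0.0 <= pop_fraction < 1.0:
        raise ValueError("pop_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_pop / (n_jazz + n_pop) = pop_fraction for n_pop.
    n_pop = round(len(jazz) * pop_fraction / (1.0 - pop_fraction))
    n_pop = min(n_pop, len(pop))
    mixed = list(jazz) + rng.sample(list(pop), n_pop)
    rng.shuffle(mixed)
    return mixed
```

For example, `build_finetune_set(jazz, pop, 0.0)` returns only the jazz sequences, as in F5, while a fraction of 0.5 pairs each jazz sequence with one pop sequence.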
Evaluation (held-out per-genre test sets)
| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 82.10% | 81.30% |
| Top-5 accuracy | 96.31% | 92.44% |
| Perplexity | 1.96 | 2.24 |
| Δ accuracy vs. Phase 0 baseline (points) | −2.14 | +8.44 |
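The card does not state exactly how these metrics are computed. Under the usual definitions, top-k accuracy counts how often the gold next token is among the k highest-scoring predictions, and perplexity is the exponential of the mean cross-entropy. A sketch under those definitions, assuming padded positions are excluded, follows. The function name `next_token_metrics` is hypothetical, and the paper's exact masking and averaging may differ.

```python
import math

import torch
import torch.nn.functional as F


def next_token_metrics(logits: torch.Tensor, targets: torch.Tensor, pad_id: int) -> dict:
    """Top-1 / top-5 accuracy and perplexity over non-pad positions.

    logits: (N, V) next-token logits; targets: (N,) gold token ids.
    """
    keep = targets != pad_id                 # drop padded positions
    logits, targets = logits[keep], targets[keep]
    top1 = logits.argmax(dim=-1)
    top5 = logits.topk(5, dim=-1).indices    # (N, 5)
    top1_acc = (top1 == targets).float().mean().item()
    top5_acc = (top5 == targets.unsqueeze(-1)).any(dim=-1).float().mean().item()
    nll = F.cross_entropy(logits, targets).item()  # mean negative log-likelihood, nats
    return {"top1": top1_acc, "top5": top5_acc, "perplexity": math.exp(nll)}
```

Read the other way, a perplexity of 1.96 on pop corresponds to a mean cross-entropy of ln 1.96 ≈ 0.673 nats per token, and 2.24 on jazz corresponds to about 0.806 nats.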
F5 illustrates the catastrophic-forgetting failure mode that motivated the paper. Pop accuracy drops by 2.14 points within the first fine-tune epoch and stays at that level. Jazz top-1 reaches 81.30%, but F4 matches this figure while retaining 0.92 more points of pop accuracy. Because F4 is at least as good as F5 on every operating axis, F5 should not be selected as a production checkpoint. It is released for replicating the per-epoch forgetting curve and for researchers who want to inspect the failure mode directly.
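The Δ row and the comparison with F4 can be cross-checked with simple arithmetic. The sketch below assumes that the Δ row is measured in top-1 accuracy points and that the 0.92-point figure refers to pop top-1. It recovers the implied Phase 0 baseline and checks Pareto dominance on the two top-1 axes only, because F4's other metrics are not listed on this card.

```python
# Top-1 accuracies (%) from the evaluation table; Δ row assumed to be top-1 points.
f5 = {"pop": 82.10, "jazz": 81.30}
delta_vs_phase0 = {"pop": -2.14, "jazz": +8.44}

# Implied Phase 0 baseline: F5 score minus its change from the baseline.
phase0 = {g: round(f5[g] - delta_vs_phase0[g], 2) for g in f5}

# F4 matches F5's jazz top-1 and keeps 0.92 more pop points (per the text above).
f4 = {"pop": round(f5["pop"] + 0.92, 2), "jazz": f5["jazz"]}


def dominates(a: dict, b: dict) -> bool:
    """Pareto dominance: a is >= b on every axis and > b on at least one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


print(phase0)            # {'pop': 84.24, 'jazz': 72.86}
print(f4)                # {'pop': 83.02, 'jazz': 81.3}
print(dominates(f4, f5)) # True
```

Under these assumptions, the Phase 0 baseline scored 84.24% on pop and 72.86% on jazz, so F5 trades 2.14 pop points for 8.44 jazz points. F4 obtains the same jazz gain at a smaller pop cost.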
Known failure modes (this checkpoint specifically)
Chord progressions trend toward dense chromatic voicings that are commercially niche. Generations from pop prompts retain diatonic structure but show persistent chromatic substitution. See paper §6.4 and §7.6 for representative continuations.
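To quantify the chromatic-substitution tendency on your own generations, one crude proxy is the share of chord roots that fall outside the prompt's major key. The sketch below is a hypothetical diagnostic, not a metric from the paper. It assumes generations have been decoded to chord symbols such as "Dm7" or "Db7" (the tokenizer's actual string format may differ), and it ignores chord quality, minor keys, and modulation.

```python
# Natural-note pitch classes and major-scale degrees (semitones above the tonic).
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}


def root_pitch_class(symbol: str) -> int:
    """Parse the root of a chord symbol such as 'Dm7', 'F#7' or 'Bbmaj7'."""
    pc = PITCH_CLASS[symbol[0]]
    for ch in symbol[1:]:  # consume accidentals directly after the letter
        if ch == "#":
            pc += 1
        elif ch == "b":
            pc -= 1
        else:
            break
    return pc % 12


def chromatic_root_ratio(chords: list, key_root: str) -> float:
    """Fraction of chords whose root lies outside the major scale of key_root."""
    tonic = root_pitch_class(key_root)
    outside = sum((root_pitch_class(c) - tonic) % 12 not in MAJOR_SCALE for c in chords)
    return outside / len(chords)
```

For example, in C major, `["C", "Am", "F", "G7"]` scores 0.0, while the tritone-substituted `["Dm7", "Db7", "Cmaj7"]` scores 1/3. Comparing this ratio between F5 and the rehearsal checkpoints on the same pop prompts gives one rough view of the failure mode.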
Usage
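```python
import torch
from huggingface_hub import hf_hub_download

# `model` and `tokenizer` are assumed to be the project's local source files,
# importable from the working directory.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the best-epoch checkpoint (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-jazz-only",
    filename="best.pt",
)

tokenizer = ChordTokenizer()

# weights_only=False unpickles arbitrary Python objects; only load trusted files.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters must match the Model summary table above.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # inference mode: disables dropout and other train-only behavior

# Sanity check: the Model summary lists 25,661,440 parameters.
print(sum(p.numel() for p in model.parameters()))
```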
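The usage example stops after loading the weights. The following generic top-k sampling loop is a sketch under several stated assumptions: calling the model on a (1, T) tensor of token ids returns (1, T, V) next-token logits, the model runs on CPU, and prompts are already encoded as integer ids. The `ChordTokenizer` encode and decode methods are not documented here, so they are left out. The function name `sample_continuation` and its default sampling settings are hypothetical. The context is truncated to the last 256 tokens to respect the card's max sequence length.

```python
import torch


@torch.no_grad()
def sample_continuation(model, prompt_ids, n_new, max_seq_len=256,
                        temperature=1.0, top_k=5, generator=None):
    """Extend prompt_ids (a list of ints) by n_new sampled tokens.

    Assumes model(ids) with ids of shape (1, T) returns logits of shape (1, T, V).
    """
    ids = list(prompt_ids)
    for _ in range(n_new):
        # Keep only the most recent max_seq_len tokens as context.
        context = torch.tensor([ids[-max_seq_len:]], dtype=torch.long)
        logits = model(context)[0, -1] / temperature   # (V,) logits for the next token
        top_vals, top_idx = logits.topk(top_k)          # restrict to the k best tokens
        probs = torch.softmax(top_vals, dim=-1)
        choice = torch.multinomial(probs, 1, generator=generator)
        ids.append(int(top_idx[choice]))
    return ids
```

With the model loaded as above, a call would look like `sample_continuation(model, prompt_ids, n_new=32)`, where `prompt_ids` comes from whatever encoding the project's tokenizer provides. Lower `top_k` or `temperature` values make continuations more conservative.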
Training-data licenses
| Dataset | License |
|---|---|
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |
Citation
Preprint: arXiv:2605.04998.
```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```