TheArtist Music Transformer — F5 (Jazz Only, no pop rehearsal)

Jazz-only fine-tune with no pop rehearsal, serving as the catastrophic-forgetting reference point in the companion paper. Dominated by F4, which matches its jazz top-1 accuracy while retaining more pop accuracy.

One of six checkpoints released alongside the paper "Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation" (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 tokens |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 7 |
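Relative positional attention adds a term that depends on the query-key distance to the usual content logits. The NumPy sketch below shows a single-head, causal version using the memory-efficient "skewing" step described in the original Music Transformer paper (Huang et al., 2018). It is illustrative only: the function names, the single-head layout, and the ordering of the relative embeddings `e_rel` (distances −(L−1) to 0) are assumptions, and the released `model.py` may organize the computation differently.

```python
import numpy as np


def skew(rel_logits: np.ndarray) -> np.ndarray:
    """Turn Q @ E_rel.T (L x L, column c = relative distance c - (L-1))
    into S_rel with S_rel[i, j] = rel_logits[i, j - i + L - 1] for j <= i.
    Entries above the diagonal are meaningless and must be masked."""
    L = rel_logits.shape[0]
    padded = np.pad(rel_logits, ((0, 0), (1, 0)))  # (L, L+1): zero column on the left
    reshaped = padded.reshape(L + 1, L)             # reinterpret the row-major layout
    return reshaped[1:]                             # drop the first row -> (L, L)


def relative_causal_attention(q, k, v, e_rel):
    """Single-head causal attention with relative position logits.
    q, k, v, e_rel: arrays of shape (L, d); e_rel rows cover distances -(L-1)..0."""
    L, d = q.shape
    logits = (q @ k.T + skew(q @ e_rel.T)) / np.sqrt(d)
    future = np.triu(np.ones((L, L), dtype=bool), k=1)  # positions j > i
    logits = np.where(future, -np.inf, logits)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

Assuming the standard equal split across heads, this checkpoint's d_model of 512 over 8 heads gives 64-dimensional heads, each scoring a 256 × 256 logit matrix at full context.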

Training data

All 1,513 jazz training sequences. No pop rehearsal data.

Evaluation (held-out per-genre test sets)

| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 82.10% | 81.30% |
| Top-5 accuracy | 96.31% | 92.44% |
| Perplexity | 1.96 | 2.24 |
| Δ top-1 vs. Phase 0 baseline (points) | −2.14 | +8.44 |
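The metrics in the table follow standard next-token definitions: top-k accuracy is the share of positions where the true token is among the k highest-scoring predictions, and perplexity is the exponential of the mean per-token cross-entropy. The pure-Python sketch below computes them from per-position logits. It is a reference for the definitions, not the paper's evaluation script; details such as padding handling are omitted.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (shift by the max before exponentiating)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def evaluate(
    logits_per_step: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 5,
) -> Tuple[float, float, float]:
    """Return (top-1 accuracy, top-k accuracy, perplexity) over next-token predictions."""
    top1 = topk = 0
    nll = 0.0
    for logits, target in zip(logits_per_step, targets):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top1 += ranked[0] == target
        topk += target in ranked[:k]
        nll -= log_softmax(logits)[target]  # cross-entropy in nats
    n = len(targets)
    return top1 / n, topk / n, math.exp(nll / n)
```

If the paper uses this definition, the reported perplexities of 1.96 (pop) and 2.24 (jazz) correspond to mean cross-entropies of about ln 1.96 ≈ 0.67 and ln 2.24 ≈ 0.81 nats per token.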

F5 illustrates the catastrophic-forgetting failure mode that motivated the paper. Pop accuracy drops 2.14 points below the Phase 0 baseline within a single fine-tune epoch and stays at that level. Jazz top-1 reaches 81.30%, the same as F4, whose pop accuracy is 0.92 points higher. Because F4 matches F5 on jazz and beats it on pop, F5 should not be selected as a production checkpoint. It is released for replicating the per-epoch forgetting curve and for researchers who want to inspect the failure mode directly.
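The figures above imply a few values that are not listed directly. Assuming the Δ row is measured in top-1 percentage points, the Phase 0 baseline is the F5 score minus its Δ, and F4's pop top-1 is F5's plus 0.92. The sketch below does that arithmetic and checks Pareto dominance on the (pop, jazz) top-1 pair; F4's jazz score is taken as exactly equal to F5's, which is how "matched" is read here.

```python
# Top-1 accuracy (%) on the held-out test sets, from the evaluation table.
F5 = {"pop": 82.10, "jazz": 81.30}
DELTA_VS_PHASE0 = {"pop": -2.14, "jazz": +8.44}  # assumed to be top-1 points

# Implied Phase 0 baseline: F5 score minus its delta.
phase0 = {g: round(F5[g] - DELTA_VS_PHASE0[g], 2) for g in F5}

# F4 as described in the text: same jazz top-1, pop 0.92 points higher.
F4 = {"pop": round(F5["pop"] + 0.92, 2), "jazz": F5["jazz"]}


def dominates(a: dict, b: dict) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    return all(a[g] >= b[g] for g in a) and any(a[g] > b[g] for g in a)


print(phase0)  # {'pop': 84.24, 'jazz': 72.86}
print(F4)      # {'pop': 83.02, 'jazz': 81.3}
print(dominates(F4, F5), dominates(F5, F4))  # True False
```

The dominance is weak on jazz (equal scores) and strict on pop, which is enough for `dominates(F4, F5)` to hold while the reverse does not.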

Known failure modes (this checkpoint specifically)

Generated chord progressions trend toward dense chromatic voicings with niche commercial appeal. Continuations of pop prompts retain their diatonic structure but show persistent chromatic substitution. See paper §6.4 and §7.6 for representative continuations.
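One crude way to quantify "persistent chromatic substitution" in decoded output is the fraction of chord roots that fall outside the major scale of the prompt's key. The helper below is hypothetical: it is not a metric from the paper, it assumes chords have already been decoded to symbols such as `Dbmaj7` or `F#m7b5` (the tokenizer's actual output format is not documented here), and it ignores voicing density and minor keys.

```python
from typing import Iterable

# Pitch classes of natural note names; '#' and 'b' after the letter shift by a semitone.
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_SCALE_STEPS = {0, 2, 4, 5, 7, 9, 11}


def root_pitch_class(symbol: str) -> int:
    """Parse the root of a chord symbol like 'Dbmaj7', 'F#m7b5', or 'G7'."""
    pc = NATURAL[symbol[0].upper()]
    for ch in symbol[1:]:
        if ch == "#":
            pc += 1
        elif ch == "b":
            pc -= 1
        else:
            break  # the chord quality starts here
    return pc % 12


def chromatic_root_ratio(chords: Iterable[str], key_root: str) -> float:
    """Fraction of chord roots outside the major scale built on key_root."""
    tonic = root_pitch_class(key_root)
    roots = [root_pitch_class(c) for c in chords]
    if not roots:
        raise ValueError("no chords given")
    outside = sum((r - tonic) % 12 not in MAJOR_SCALE_STEPS for r in roots)
    return outside / len(roots)
```

A ratio near 0 means every root is diatonic to the key; a tritone substitution such as `Db7` in C major counts as chromatic.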

Usage

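```python
import torch
from huggingface_hub import hf_hub_download

# `model` and `tokenizer` are local modules (model.py, tokenizer.py), not pip
# packages; they must be importable from your working directory.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the best-epoch checkpoint from the Hugging Face Hub (cached locally).
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-jazz-only",
    filename="best.pt",
)
tokenizer = ChordTokenizer()

# weights_only=False unpickles arbitrary Python objects; only use it on
# checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters must match the model summary table.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,  # 351 per the model summary
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # inference mode: disables dropout
```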
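The loading snippet stops once the weights are in place. A minimal autoregressive sampling loop might look like the sketch below. It is written against a plain `logits_fn(context) -> list of floats` callable so it runs without the checkpoint; the commented wrapper shows how it could be connected to the loaded `model`, assuming the forward pass returns logits of shape (batch, seq_len, vocab_size), which this card does not document. The loop crops the context to the 256-token maximum sequence length from the model summary.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

MAX_SEQ_LEN = 256  # context window from the model summary table

# Hypothetical wrapper for the checkpoint loaded above, ASSUMING
# model(ids) returns logits of shape (batch, seq_len, vocab_size):
#
#   def logits_fn(context):
#       with torch.no_grad():
#           return model(torch.tensor([context]))[0, -1].tolist()


def sample_continuation(
    logits_fn: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    temperature: float = 1.0,
    eos_id: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Extend prompt_ids token by token; temperature <= 0 means greedy decoding."""
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-MAX_SEQ_LEN:]  # keep only the most recent tokens the model can see
        logits = logits_fn(context)
        if temperature <= 0:
            next_id = max(range(len(logits)), key=lambda i: logits[i])
        else:
            # Softmax with temperature, shifted by the max for numerical stability.
            scaled = [x / temperature for x in logits]
            top = max(scaled)
            weights = [math.exp(s - top) for s in scaled]
            next_id = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

`temperature=0` selects greedy decoding; higher temperatures flatten the distribution. Prompt and end-of-sequence token IDs would come from `ChordTokenizer`, whose encoding API is not shown in this card.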

Training-data licenses

| Dataset | License |
|---|---|
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |

Citation

Preprint: arXiv:2605.04998.

```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```