TheArtist Music Transformer — F5 (Jazz Only, no pop rehearsal)

Jazz-only fine-tune with no pop rehearsal. Reference point for catastrophic forgetting in the companion paper. Dominated by F4, which matches its jazz top-1 accuracy while retaining more pop accuracy.

One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

Field	Value
Architecture	Music Transformer with relative positional attention
Parameters	25,661,440
Vocabulary size	351 tokens
Max sequence length	256
d_model / heads / FFN / layers	512 / 8 / 2048 / 8
Fine-tune resumed from	Phase 0 pop baseline
Best epoch	7

Training data

All 1,513 jazz training sequences. No pop rehearsal data.

Evaluation (held-out per-genre test sets)

Metric	Pop test	Jazz test
Top-1 accuracy	82.10%	81.30%
Top-5 accuracy	96.31%	92.44%
Perplexity	1.96	2.24
Δ accuracy vs. Phase 0 baseline (points)	−2.14	+8.44
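
For readers who want to recompute the table's metrics on their own predictions, the following is a minimal sketch of the standard definitions of top-k accuracy and perplexity. It is not the paper's evaluation script: the card does not say how padding, sequence boundaries, or averaging are handled, and the function names and toy numbers here are illustrative assumptions.

```python
import math

def top_k_accuracy(logits_per_step, targets, k):
    """Fraction of steps whose target id is among the k highest-scoring ids."""
    hits = 0
    for logits, target in zip(logits_per_step, targets):
        # Rank token ids by descending logit and keep the first k.
        top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
        hits += target in top_ids
    return hits / len(targets)

def perplexity(logits_per_step, targets):
    """exp of the mean negative log-likelihood of the targets under softmax(logits)."""
    total_nll = 0.0
    for logits, target in zip(logits_per_step, targets):
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total_nll += log_z - logits[target]
    return math.exp(total_nll / len(targets))

# Toy data (made up): 3 prediction steps over a 4-token vocabulary.
toy_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 0.0, 3.0, 0.0],
    [1.0, 1.5, 0.2, 0.1],
]
toy_targets = [0, 2, 0]

top1 = top_k_accuracy(toy_logits, toy_targets, k=1)
top2 = top_k_accuracy(toy_logits, toy_targets, k=2)
ppl = perplexity(toy_logits, toy_targets)
print(f"top-1={top1:.2%} top-2={top2:.2%} perplexity={ppl:.2f}")
```

A perplexity of 1.0 would mean the model puts all probability on the correct next chord at every step, while a uniform guess over the vocabulary gives a perplexity equal to the vocabulary size.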

F5 illustrates the catastrophic-forgetting failure mode that motivated the paper. Pop accuracy drops by 2.14 points within a single fine-tune epoch and stabilizes there. Jazz top-1 reaches 81.30%, but F4 matches that figure while retaining an extra 0.92 points of pop accuracy. F4 is therefore at least as good as F5 on every operating axis, so F5 should not be selected as a production checkpoint. It is released for replication of the per-epoch forgetting curve and for researchers who want to inspect the failure mode directly.
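
The comparison above can be made concrete with a little arithmetic. The sketch below derives the implied Phase 0 and F4 figures from the numbers on this card and checks dominance. Two assumptions are involved: that the Δ row refers to top-1 accuracy, and that "matched by F4" means exactly 81.30% on jazz. The F4 and Phase 0 values are derived here, not quoted from the paper, and the `dominates` helper is illustrative.

```python
# Figures taken from this card (percent).
f5 = {"pop_top1": 82.10, "jazz_top1": 81.30}
delta_vs_phase0 = {"pop_top1": -2.14, "jazz_top1": +8.44}

# Implied Phase 0 baseline: baseline = F5 - delta (assumes the delta row is top-1).
phase0 = {k: round(f5[k] - delta_vs_phase0[k], 2) for k in f5}

# Implied F4: same jazz top-1 as F5, plus 0.92 points of pop (derived, not quoted).
f4 = {"pop_top1": round(f5["pop_top1"] + 0.92, 2), "jazz_top1": f5["jazz_top1"]}

def dominates(a, b):
    """True if a is at least as good as b on every metric and better on at least one."""
    return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)

print("Implied Phase 0:", phase0)
print("Implied F4:", f4)
print("F4 dominates F5:", dominates(f4, f5))
```

Under these assumptions F4 ties F5 on jazz and beats it on pop, which is dominance in the weak (Pareto) sense: no metric is worse and at least one is better.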

Known failure modes (this checkpoint specifically)

Chord progressions trend toward dense chromatic voicings that are commercially niche. Generations on pop prompts retain diatonic structure but with persistent chromatic substitution. See paper §6.4 and §7.6 for representative continuations.

Usage

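import torch
from huggingface_hub import hf_hub_download
# `model` and `tokenizer` are the project's own modules and must be on the import path.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-jazz-only",
    filename="best.pt",
)
tokenizer = ChordTokenizer()
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters match the model summary table above.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # inference mode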
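
Once the model is loaded, generation is a matter of repeatedly sampling the next chord token. This card does not document the forward signature of `MusicTransformer` or the encoding methods of `ChordTokenizer`, so the sketch below abstracts the model as a hypothetical callable `next_token_logits(context_ids)` that returns one score per vocabulary entry. The function name, the top-k sampling strategy, and truncating the context to the last 256 tokens are all assumptions for illustration, not the project's generation code.

```python
import math
import random

MAX_SEQ_LEN = 256  # max sequence length from the model summary

def sample_continuation(next_token_logits, prompt_ids, n_new,
                        top_k=5, temperature=1.0, seed=0):
    """Autoregressive top-k sampling with a sliding context window."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(n_new):
        # Keep only the most recent tokens so the context fits the model.
        context = ids[-MAX_SEQ_LEN:]
        logits = next_token_logits(context)
        # Restrict sampling to the k highest-scoring token ids.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        # Temperature-scaled softmax weights over the kept ids (max-subtracted for stability).
        m = max(logits[i] for i in top)
        weights = [math.exp((logits[i] - m) / temperature) for i in top]
        ids.append(rng.choices(top, weights=weights, k=1)[0])
    return ids

def toy_logits(context, vocab_size=351):
    """Stand-in for the real model: strongly prefers (last id + 1) mod vocab_size."""
    nxt = (context[-1] + 1) % vocab_size
    return [5.0 if i == nxt else 0.0 for i in range(vocab_size)]

# With top_k=1 the sampler is greedy, so the toy model simply counts upward.
out = sample_continuation(toy_logits, [10, 11], n_new=4, top_k=1)
print(out)
```

To use this with the real checkpoint, `toy_logits` would be replaced by a small wrapper that runs the loaded model on the context and returns the logits for the final position; how to write that wrapper depends on the model's forward signature, which is not given here.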

Training-data licenses

Dataset	License
Jazz Harmony Treebank	Public
JazzStandards (iReal Pro)	Community redistribution
Weimar Jazz Database	ODbL
JAAH	Research-use public

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
