# JAZZMUS — OMR for Jazz Lead Sheets

An end-to-end Optical Music Recognition (OMR) model that transcribes handwritten jazz lead sheet images (melody + chords) into Humdrum `**kern` / `**mxhm` notation.
Paper: Optical Music Recognition of Jazz Lead Sheets — ISMIR 2025
## Overview

This model addresses the challenge of recognising handwritten jazz lead sheets, a score type that combines a melody with chord symbols. Chord symbols are a component that existing OMR systems do not handle, and handwritten lead sheet images exhibit high variability and quality issues.
The model is a Sheet Music Transformer (SMT) — a ConvNeXt encoder paired with a Transformer decoder — pretrained on polyphonic piano scores (antoniorv6/smt-grandstaff) and fine-tuned on the JAZZMUS dataset (handwritten + synthetic lead sheets) using medium-level tokenisation.
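At the medium tokenisation level, each output symbol is split into a few sub-tokens rather than emitted whole or character by character. A purely illustrative sketch of such a split — the function name and the exact sub-token boundaries are hypothetical; the real vocabulary is defined by the JAZZMUS training code:

```python
import re

def split_kern_token(tok: str):
    """Hypothetical medium-level split of a **kern note token into a
    duration part and a pitch/accidental part. Non-note tokens
    (clefs, barlines, ...) fall back to being emitted whole."""
    m = re.match(r"(?P<dur>\d+\.*)(?P<pitch>[A-Ga-gr]+[#\-n]*)$", tok)
    if not m:
        return [tok]                      # emit the symbol as-is
    return [m.group("dur"), m.group("pitch")]

print(split_kern_token("4.cc#"))          # dotted quarter, c#5
print(split_kern_token("8r"))             # eighth rest
print(split_kern_token("*clefG2"))        # non-note token, kept whole
```

This sits between a symbol-level vocabulary (one token per `4.cc#`) and a character-level one (`4`, `.`, `c`, `c`, `#`), trading vocabulary size against sequence length.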
A YOLOv11 staff detector is bundled in this repository to enable full-page inference.
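Conceptually, full-page inference detects staff regions, crops them in reading order, and transcribes each crop independently. A minimal sketch of that orchestration, with `detect_staves` and `transcribe_staff` as hypothetical stand-ins for the YOLO detector and the SMT model that `predict.py` actually wraps:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixel coordinates

def transcribe_page(
    page,                                        # full-page image, row-major
    detect_staves: Callable[[object], List[Box]],
    transcribe_staff: Callable[[object], str],
) -> str:
    """Detect staff regions, then transcribe each crop top-to-bottom."""
    boxes = sorted(detect_staves(page), key=lambda b: b[1])  # reading order
    lines = []
    for (x0, y0, x1, y1) in boxes:
        crop = [row[x0:x1] for row in page[y0:y1]]           # one staff region
        lines.append(transcribe_staff(crop))
    return "\n".join(lines)
```

The real script additionally handles model loading and batching; this sketch only shows the detect–crop–transcribe flow.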
## Quick Start

### Command-line inference
Clone the repository and install dependencies:

```shell
git clone https://github.com/JuanCarlosMartinezSevilla/ISMIR-Jazzmus.git
cd ISMIR-Jazzmus
pip install -e ".[predict]"
```
Transcribe a full-page lead sheet image:

```shell
# Print to stdout
python predict.py page.jpg

# Save to a .krn file
python predict.py page.jpg -o output.krn

# Print and save
python predict.py page.jpg -o output.krn -p

# Process multiple pages
python predict.py page1.jpg page2.jpg -o results/
```
The script automatically downloads the SMT model and YOLO staff detector weights from this repository.
### Programmatic usage
```python
from safetensors.torch import load_file
from torch.nn import Conv1d
from huggingface_hub import hf_hub_download

from jazzmus.model.smt.configuration_smt import SMTConfig
from jazzmus.model.smt.modeling_smt import SMTModelForCausalLM

REPO_ID = "JuanCarlosMartinezSevilla/jazzmus-model"

# Load config and weights
config = SMTConfig.from_pretrained(REPO_ID)
weights_path = hf_hub_download(repo_id=REPO_ID, filename="model.safetensors")
sd = load_file(weights_path)

# Build the model, handling a size mismatch between the pretrained
# embedding and the fine-tuned output layer
embed_size = sd["decoder.embedding.weight"].shape[0]
out_size = sd["decoder.out_layer.weight"].shape[0]
config.out_categories = embed_size
model = SMTModelForCausalLM(config)
if embed_size != out_size:
    model.decoder.out_layer = Conv1d(config.d_model, out_size, kernel_size=1)
model.load_state_dict(sd, strict=True)
config.out_categories = out_size
model.eval()
```
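The head swap above works because a `kernel_size=1` `Conv1d` is simply a per-timestep linear projection from `d_model` to the vocabulary size, so replacing it only changes the number of output categories while every other weight still loads. A small self-contained check of that equivalence, using toy sizes rather than the model's real dimensions:

```python
import torch
from torch.nn import Conv1d, Linear

# Toy sizes, not the model's real d_model / vocabulary
d_model, vocab, seq = 8, 5, 3
head = Conv1d(d_model, vocab, kernel_size=1)

x = torch.randn(2, d_model, seq)
y = head(x)                                  # (2, vocab, seq): logits per step

# The same projection expressed as a Linear layer with shared weights
lin = Linear(d_model, vocab)
lin.weight.data = head.weight.data.squeeze(-1)   # (vocab, d_model, 1) -> (vocab, d_model)
lin.bias.data = head.bias.data
y_lin = lin(x.transpose(1, 2)).transpose(1, 2)

assert torch.allclose(y, y_lin, atol=1e-5)
```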
## Datasets
This model was trained on:
| Dataset | Description |
|---|---|
| PRAIG/JAZZMUS | 293 handwritten jazz lead sheet page images with ground truth |
| PRAIG/JAZZMUS_staffLevel | 2021 region-level staff crops used for training |
| PRAIG/JAZZMUS_Synthetic | 326 synthetic lead sheet images (Classical + MuseJazz fonts) |
## Limitations
- The model operates at staff-region level; full-page inference depends on the bundled YOLO staff detector.
- Chord recognition is harder than melody recognition — most errors involve chord symbols (see qualitative analysis in the paper).
- Implicit accidentals from key signatures are sometimes missed.
- Performance may degrade on score styles not represented in the training set (e.g., printed commercial lead sheets, non-jazz genres).
## Citation
```bibtex
@inproceedings{juan_carlos_martinez_sevilla_2025_17811464,
  author    = {Juan Carlos Martinez-Sevilla and Francesco Foscarin and
               Patricia Garcia-Iasci and David Rizo and
               Jorge Calvo-Zaragoza and Gerhard Widmer},
  title     = {Optical Music Recognition of Jazz Lead Sheets},
  booktitle = {Proceedings of the 26th International Society for
               Music Information Retrieval Conference},
  year      = {2025},
  pages     = {710--716},
  publisher = {ISMIR},
  month     = sep,
  venue     = {Daejeon, South Korea and Online},
  doi       = {10.5281/zenodo.17811464},
  url       = {https://doi.org/10.5281/zenodo.17811464},
}
```
## Links

- Paper: https://doi.org/10.5281/zenodo.17811464
- arXiv: https://arxiv.org/abs/2509.05329
- Code: https://github.com/JuanCarlosMartinezSevilla/ISMIR-Jazzmus
- Project page: https://grfia.dlsi.ua.es/jazz-omr/
- Pretrained base model: antoniorv6/smt-grandstaff