JAZZMUS — OMR for Jazz Lead Sheets

An end-to-end Optical Music Recognition model that transcribes handwritten jazz lead sheet images (melody + chords) into Humdrum **kern / **mxhm notation.

Paper: Optical Music Recognition of Jazz Lead Sheets — ISMIR 2025

Overview

This model addresses the challenge of recognising handwritten jazz lead sheets, a score type that combines a melody line with chord symbols. Chord symbols are a score component not handled by existing OMR systems, and handwritten images exhibit high variability in writing style and image quality.

The model is a Sheet Music Transformer (SMT) — a ConvNeXt encoder paired with a Transformer decoder — pretrained on polyphonic piano scores (antoniorv6/smt-grandstaff) and fine-tuned on the JAZZMUS dataset (handwritten + synthetic lead sheets) using medium-level tokenisation.

A YOLOv11 staff detector is bundled in this repository to enable full-page inference.
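Full-page inference follows a detect-then-transcribe scheme: the YOLO model finds staff regions, which are then cropped in reading order and transcribed one by one. The sketch below illustrates only the ordering/cropping step; `transcribe_staff` is a hypothetical stand-in for the SMT model, and the helper names are not part of the actual codebase.

```python
# Sketch of the detect-then-transcribe pipeline for full-page inference.
# The ordering logic is generic; `transcribe_staff` is hypothetical.

def order_staves(boxes):
    """Sort detected staff bounding boxes top-to-bottom (reading order).

    Each box is (x1, y1, x2, y2) in pixel coordinates.
    """
    return sorted(boxes, key=lambda b: b[1])

def crop(image, box):
    """Crop a staff region from a page image (image as a nested list/array)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

# Toy example: three "detected" staves returned in arbitrary order
page = [[0] * 100 for _ in range(100)]
boxes = [(0, 60, 100, 80), (0, 10, 100, 30), (0, 35, 100, 55)]
for box in order_staves(boxes):
    staff_img = crop(page, box)
    # transcription = transcribe_staff(staff_img)  # hypothetical SMT call
```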

Quick Start

Command-line inference

Clone the repository and install dependencies:

git clone https://github.com/JuanCarlosMartinezSevilla/ISMIR-Jazzmus.git
cd ISMIR-Jazzmus
pip install -e ".[predict]"

Transcribe a full-page lead sheet image:

# Print to stdout
python predict.py page.jpg

# Save to a .krn file
python predict.py page.jpg -o output.krn

# Print and save
python predict.py page.jpg -o output.krn -p

# Process multiple pages
python predict.py page1.jpg page2.jpg -o results/

The script automatically downloads the SMT model and YOLO staff detector weights from this repository.

Programmatic usage

import torch
from safetensors.torch import load_file
from torch.nn import Conv1d
from huggingface_hub import hf_hub_download

from jazzmus.model.smt.configuration_smt import SMTConfig
from jazzmus.model.smt.modeling_smt import SMTModelForCausalLM

REPO_ID = "JuanCarlosMartinezSevilla/jazzmus-model"

# Load config and weights
config = SMTConfig.from_pretrained(REPO_ID)
weights_path = hf_hub_download(repo_id=REPO_ID, filename="model.safetensors")
sd = load_file(weights_path)

# Build model. The checkpoint's embedding table and output layer can differ
# in size, so the config is first set to the embedding size (used at
# construction time) and the output layer is swapped afterwards if needed.
embed_size = sd["decoder.embedding.weight"].shape[0]
out_size = sd["decoder.out_layer.weight"].shape[0]
config.out_categories = embed_size
model = SMTModelForCausalLM(config)
if embed_size != out_size:
    # Replace the output projection so its shape matches the checkpoint
    model.decoder.out_layer = Conv1d(config.d_model, out_size, kernel_size=1)
model.load_state_dict(sd, strict=True)
config.out_categories = out_size  # restore the true output vocabulary size

model.eval()
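The size-mismatch handling above can be exercised without downloading any weights. The sketch below rebuilds the same swap on a toy module with made-up sizes; `ToyDecoder` and all dimensions are illustrative and not part of the real model.

```python
import torch
from torch.nn import Conv1d, Embedding, Module

# Toy decoder reproducing the embedding/out_layer shape relationship that
# the loading code above reconciles. All sizes here are illustrative.
D_MODEL = 8

class ToyDecoder(Module):
    def __init__(self, vocab):
        super().__init__()
        self.embedding = Embedding(vocab, D_MODEL)
        self.out_layer = Conv1d(D_MODEL, vocab, kernel_size=1)

# A checkpoint whose output layer (10 classes) is smaller than its
# embedding table (12 entries)
sd = {
    "embedding.weight": torch.zeros(12, D_MODEL),
    "out_layer.weight": torch.zeros(10, D_MODEL, 1),
    "out_layer.bias": torch.zeros(10),
}

# Build at embedding size, then swap the output layer to match the checkpoint
embed_size = sd["embedding.weight"].shape[0]
out_size = sd["out_layer.weight"].shape[0]
model = ToyDecoder(embed_size)
if embed_size != out_size:
    model.out_layer = Conv1d(D_MODEL, out_size, kernel_size=1)
model.load_state_dict(sd, strict=True)
```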

Datasets

This model was trained on:

Dataset                      Description
PRAIG/JAZZMUS                293 handwritten jazz lead sheet page images with ground truth
PRAIG/JAZZMUS_staffLevel     2,021 region-level staff crops used for training
PRAIG/JAZZMUS_Synthetic      326 synthetic lead sheet images (Classical and MuseJazz fonts)

Limitations

  • The model operates at staff-region level; full-page inference depends on the bundled YOLO staff detector.
  • Chord recognition is harder than melody recognition — most errors involve chord symbols (see qualitative analysis in the paper).
  • Implicit accidentals from key signatures are sometimes missed.
  • Performance may degrade on score styles not represented in the training set (e.g., printed commercial lead sheets, non-jazz genres).

Citation

@inproceedings{juan_carlos_martinez_sevilla_2025_17811464,
  author       = {Juan Carlos Martinez-Sevilla and
                  Francesco Foscarin and
                  Patricia Garcia-Iasci and
                  David Rizo and
                  Jorge Calvo-Zaragoza and
                  Gerhard Widmer},
  title        = {Optical Music Recognition of Jazz Lead Sheets},
  booktitle    = {Proceedings of the 26th International Society for
                   Music Information Retrieval Conference},
  year         = 2025,
  pages        = {710-716},
  publisher    = {ISMIR},
  month        = sep,
  venue        = {Daejeon, South Korea and Online},
  doi          = {10.5281/zenodo.17811464},
  url          = {https://doi.org/10.5281/zenodo.17811464},
}
