# MassSpecGym Ranker: Contrastive Spectral Transformer
This model is a Spectral Transformer trained for high-precision molecular identification from tandem mass spectrometry (MS/MS) spectra. It was developed as part of the MassSpecGym benchmark.
## Model Details
- Architecture: Transformer Encoder (2 layers, 4 heads) with Fourier Feature m/z encoding.
- Objective: Contrastive Learning (InfoNCE) with a temperature of 0.1.
- Input: MS/MS fragment peaks (m/z and intensity) + Precursor mass.
- Output: 4096-dimensional molecular embedding for candidate ranking.
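The contrastive objective above can be sketched as a standard batch-wise InfoNCE loss. This is a minimal illustration, not the repository's actual training code: each spectrum embedding is pulled toward the embedding of its true molecule, with the other molecules in the batch acting as negatives, at the stated temperature of 0.1.

```python
import torch
import torch.nn.functional as F

def info_nce(spec_emb: torch.Tensor, mol_emb: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Batch-wise InfoNCE: row i of spec_emb is positive with row i of
    mol_emb; all other rows in the batch serve as in-batch negatives."""
    spec = F.normalize(spec_emb, dim=-1)
    mol = F.normalize(mol_emb, dim=-1)
    logits = spec @ mol.T / temperature          # (B, B) cosine similarities
    targets = torch.arange(spec.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```

With perfectly aligned, mutually orthogonal embeddings the loss approaches zero, which is a quick sanity check when wiring up a training loop.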
## Performance (MassSpecGym Test Set)
The model significantly outperforms standard MLP baselines:
- Hit@1: 8.38%
- Hit@5: 20.35%
- Hit@20: 42.00%
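The Hit@k metrics above count how often the true molecule appears among the top-k candidates when ranked by similarity to the spectrum embedding. A minimal sketch of that computation (the function name and cosine-similarity scoring are assumptions, not the benchmark's official evaluation code):

```python
import torch
import torch.nn.functional as F

def hit_at_k(query_emb: torch.Tensor, cand_embs: torch.Tensor,
             true_idx: int, k: int) -> int:
    """Rank candidates by cosine similarity to the spectrum embedding;
    returns 1 if the true molecule is in the top-k, else 0."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(cand_embs, dim=-1)
    scores = c @ q                               # (num_candidates,)
    topk = scores.topk(min(k, scores.numel())).indices
    return int(true_idx in topk)
```

Averaging this indicator over all test spectra yields the Hit@k percentages reported above.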
## Key Features
- Fourier Features: Captures high-precision mass differences essential for isotope identification.
- Precursor Injection: Provides global context to every spectral peak.
- Attention Pooling: Dynamically weights diagnostic peaks while down-weighting noise.
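The three features above can be combined into a single encoder sketch. The module below is a hypothetical illustration under the stated hyperparameters (2 layers, 4 heads, 4096-d output); the class names, frequency range, and model width are assumptions, not the repository's implementation. Each m/z value gets sin/cos Fourier features at log-spaced wavelengths, the precursor embedding is added to every peak token, and attention pooling collapses the peak sequence into one embedding.

```python
import math
import torch
import torch.nn as nn

class FourierMZEncoding(nn.Module):
    """Sin/cos features of m/z at log-spaced wavelengths, so the model
    can resolve small, high-precision mass differences."""
    def __init__(self, num_freqs: int = 32, max_mz: float = 2000.0):
        super().__init__()
        wavelengths = torch.exp(
            torch.linspace(math.log(0.01), math.log(max_mz), num_freqs))
        self.register_buffer("omega", 2 * math.pi / wavelengths)

    def forward(self, mz: torch.Tensor) -> torch.Tensor:
        # (B, P) -> (B, P, 2 * num_freqs)
        angles = mz.unsqueeze(-1) * self.omega
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

class SpectralEncoder(nn.Module):
    """Hypothetical end-to-end sketch: Fourier m/z features + intensity
    become peak tokens, the precursor-mass embedding is injected into
    every token, a small Transformer mixes the peaks, and attention
    pooling produces one 4096-d spectrum embedding."""
    def __init__(self, d_model: int = 128, num_freqs: int = 32,
                 out_dim: int = 4096):
        super().__init__()
        self.mz_enc = FourierMZEncoding(num_freqs)
        self.peak_proj = nn.Linear(2 * num_freqs + 1, d_model)  # + intensity
        self.prec_proj = nn.Linear(2 * num_freqs, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.attn_score = nn.Linear(d_model, 1)  # attention pooling weights
        self.out = nn.Linear(d_model, out_dim)

    def forward(self, mz, intensity, precursor_mz):
        # Peak tokens: Fourier m/z features concatenated with intensity.
        tokens = self.peak_proj(
            torch.cat([self.mz_enc(mz), intensity.unsqueeze(-1)], dim=-1))
        # Precursor injection: add the precursor embedding to every peak.
        tokens = tokens + self.prec_proj(self.mz_enc(precursor_mz.unsqueeze(-1)))
        h = self.encoder(tokens)
        # Attention pooling: softmax over peaks, weighted sum of tokens.
        w = self.attn_score(h).softmax(dim=1)    # (B, P, 1)
        return self.out((w * h).sum(dim=1))      # (B, out_dim)
```

In this sketch the learned softmax weights let diagnostic peaks dominate the pooled representation while low-information noise peaks receive near-zero weight.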
## Usage
For full implementation, training scripts, and inference notebooks, visit the GitHub Repository.