Music Descriptor Module 3 v2

Dual-stream cross-attention model over:

  • M1 DeBERTa-small scene_vector (256-d) from wrathofgod/scene-perception-m1-unfreeze-deberta-small
  • M2 small-BERT narrative context_vector (256-d) from wrathofgod/narrative-context-m2

Architecture improvements over v1

| Feature | v1 | v2 |
|---|---|---|
| Fusion | cat+Linear(512→256) | CrossAttentionFusion (4-head, bidirectional) |
| Head depth | 2-layer | 3-layer residual |
| Orch threshold | fixed 0.5 | learned per-instrument (14 params) |
| Aux supervision | none | M2 tension/arousal/valence (weight=0.15) |
| Label smoothing | no | ε=0.1 on all CLS heads |
| LR schedule | CosineAnnealing | Warmup(3ep) + CosineAnnealing |
| SWA | no | last 5 epochs |
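
As a rough illustration of the v2 LR schedule row, here is a minimal sketch of a 3-epoch warmup followed by cosine annealing. The linear warmup shape, the total of 20 epochs, the base learning rate of 1e-3, and the function name `lr_at_epoch` are all assumptions made for the example; the card does not state them, and the SWA phase over the last 5 epochs is not modeled here.

```python
import math


def lr_at_epoch(epoch, total_epochs, base_lr, warmup_epochs=3, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine annealing.

    Hypothetical helper: the linear warmup shape and min_lr are assumptions.
    """
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the epochs that remain after warmup.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed training length and base LR, for illustration only.
schedule = [lr_at_epoch(e, total_epochs=20, base_lr=1e-3) for e in range(20)]
```

The rate rises over the first three epochs, peaks at the base value, and then decays monotonically along a half cosine.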

8 Music Descriptor Heads

| # | Head | Type | Output |
|---|---|---|---|
| 1 | tempo_bpm | regression | 45–170 BPM |
| 2 | musical_valence | regression | -1.0 to 1.0 |
| 3 | tonality | 3-class | atonal, major, minor |
| 4 | harmonic_style | 7-class | atonal…whole_tone |
| 5 | dynamic_shape_m4 | 8-class | crescendo…terraced |
| 6 | rhythm_style | 6-class | drive…sparse |
| 7 | texture | 5-class | ambient…solo |
| 8 | orchestration | 14-label | ambient_pad…woodwinds |
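
To illustrate how the 14-label orchestration head might be decoded with per-instrument thresholds instead of a fixed 0.5 cutoff, here is a minimal sketch. It assumes the head emits one logit per instrument and that each learned threshold is a probability in (0, 1); the card does not say how the thresholds are parameterized, and `decode_orchestration` is a hypothetical helper, not part of the released model. Labels are referred to by index because the card elides most of the label names.

```python
import math

NUM_INSTRUMENTS = 14  # size of the orchestration head, from the table above


def sigmoid(x):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_orchestration(logits, thresholds):
    """Return indices of instruments whose probability meets their own threshold.

    Assumption: thresholds are per-instrument probabilities in (0, 1).
    """
    if len(logits) != NUM_INSTRUMENTS or len(thresholds) != NUM_INSTRUMENTS:
        raise ValueError("expected 14 logits and 14 thresholds")
    return [
        i
        for i, (logit, threshold) in enumerate(zip(logits, thresholds))
        if sigmoid(logit) >= threshold
    ]


# v1-style behavior: one fixed cutoff shared by every instrument.
fixed = decode_orchestration([0.0] * 14, [0.5] * 14)

# v2-style behavior: each instrument has its own cutoff (values made up here).
per_instrument = decode_orchestration([0.0] * 14, [0.4] * 7 + [0.6] * 7)
```

With identical logits, the fixed cutoff activates every instrument, while the per-instrument cutoffs keep only those whose threshold sits below the predicted probability.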