# Music Descriptor Module 3 v2
Dual-stream cross-attention model over:

- M1 DeBERTa-small `scene_vector` (256-d) from `wrathofgod/scene-perception-m1-unfreeze-deberta-small`
- M2 small-BERT narrative `context_vector` (256-d) from `wrathofgod/narrative-context-m2`
## Architecture improvements over v1
| Feature | v1 | v2 |
|---|---|---|
| Fusion | cat+Linear(512→256) | CrossAttentionFusion (4-head, bidirectional) |
| Head depth | 2-layer | 3-layer residual |
| Orch threshold | fixed 0.5 | learned per-instrument (14 params) |
| Aux supervision | none | M2 tension/arousal/valence (weight=0.15) |
| Label smoothing | no | ε=0.1 on all CLS heads |
| LR schedule | CosineAnnealing | Warmup(3ep) + CosineAnnealing |
| SWA | no | last 5 epochs |
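The bidirectional 4-head fusion in the table can be sketched as follows. This is a minimal illustration, not the released code: the module name matches the table, but the single-token sequence treatment, the averaging of the two attended streams, and the final LayerNorm are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Bidirectional 4-head cross-attention over two 256-d stream vectors.

    Sketch under assumptions: each stream attends to the other as a
    length-1 sequence; the two attended outputs are averaged and normalized.
    """

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.scene_to_ctx = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ctx_to_scene = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, scene: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # Treat each 256-d vector as a length-1 sequence: (B, 1, 256).
        s, c = scene.unsqueeze(1), ctx.unsqueeze(1)
        s_att, _ = self.scene_to_ctx(s, c, c)  # scene queries context
        c_att, _ = self.ctx_to_scene(c, s, s)  # context queries scene
        # Average the two directions and normalize back to (B, 256).
        return self.norm(0.5 * (s_att + c_att).squeeze(1))
```

Compared with v1's `cat+Linear(512→256)`, this keeps the fused representation at 256-d without a projection, letting each stream reweight the other before the residual head stack.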
## 8 Music Descriptor Heads
| # | Head | Type | Output |
|---|---|---|---|
| 1 | tempo_bpm | regression | 45–170 BPM |
| 2 | musical_valence | regression | -1.0 to 1.0 |
| 3 | tonality | 3-class | atonal, major, minor |
| 4 | harmonic_style | 7-class | atonal…whole_tone |
| 5 | dynamic_shape_m4 | 8-class | crescendo…terraced |
| 6 | rhythm_style | 6-class | drive…sparse |
| 7 | texture | 5-class | ambient…solo |
| 8 | orchestration | 14-label | ambient_pad…woodwinds |
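The learned per-instrument thresholds for the 14-label orchestration head (the "14 params" in the v2 column above) might look like the sketch below. The class name, the sigmoid-then-compare decision rule, and the 0.5 initialization (matching v1's fixed threshold) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class OrchestrationHead(nn.Module):
    """14-label multi-label head with a learned decision threshold per label.

    Sketch under assumptions: a linear projection produces per-instrument
    logits; each of the 14 labels has its own learnable threshold applied
    to the sigmoid probabilities at inference time.
    """

    def __init__(self, dim: int = 256, n_labels: int = 14):
        super().__init__()
        self.proj = nn.Linear(dim, n_labels)
        # One threshold per instrument label, initialized at v1's fixed 0.5.
        self.thresholds = nn.Parameter(torch.full((n_labels,), 0.5))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.proj(fused))
        # Binary mask over the 14 instrument labels: (B, 14).
        return (probs >= self.thresholds).float()
```

During training the thresholds receive gradients only if the comparison is replaced by a differentiable surrogate (e.g. `sigmoid((probs - thresholds) / t)`); the hard comparison shown here is the inference path.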