earthlyframes committed on
Commit 8463d89 · verified · 1 Parent(s): 20bf3f2

fix: rewrite README with clean line endings and corrected YAML frontmatter

Files changed (1): README.md (+94 -89)

README.md CHANGED
@@ -1,108 +1,113 @@
- ---
- language: en
- tags:
- - audio
- - music
- - classification
  - onnx
- - chromatic
- license: other
  base_model:
- - laion/larger_clap_music
  - microsoft/deberta-v3-base
- ---
-
  # Refractor CDM

- **Refractor CDM** (Compact Disc Module) is a lightweight MLP calibration head that classifies full-mix audio recordings into one of nine "rainbow colors" — a chromatic taxonomy used in *The Rainbow
- Table*, an AI-assisted album series.
-
- The CDM is a companion to the base Refractor ONNX model (a multimodal fusion network trained on short catalog segments). The base model works well for MIDI and short audio clips but predicts poorly on
- full-mix audio because CLAP embeddings are optimized for short segments. The CDM corrects this by training directly on chunked full-mix audio.
-
- ## Model Details

  | Property | Value |
  |---|---|
  | Architecture | 2-layer MLP (256 → 128 → 9) |
- | Parameters | 361,993 |
  | Input | CLAP audio (512-dim) + DeBERTa concept (768-dim) = 1280-dim |
- | Output | Softmax probabilities over 9 colors (`color_probs`, shape `[batch, 9]`) |
- | Format | ONNX (`refractor_cdm.onnx`, 1.4 MB) |
- | Training data | 3,450 chunks from 78 full-mix songs across all 9 colors |
- | Loss | CrossEntropyLoss with label smoothing (0.1) + inverse-frequency class weights |
-
- ## Color Classes
-
- Index Color CHROMATIC_TARGETS (temporal / spatial / ontological)
  0 Red Past-heavy / Thing-heavy / Known-heavy
- 1 Orange Present-heavy / Thing-heavy / Known-heavy
- 2 Yellow Present-heavy / Place-heavy / Known-heavy
- 3 Green Present-heavy / Place-heavy / Known-heavy ← same as Yellow
- 4 Blue Future-heavy / Place-heavy / Forgotten-heavy
- 5 Indigo Future-heavy / Future-heavy / Forgotten-heavy
  6 Violet Future-heavy / Future-heavy / Imagined-heavy
- 7 White Uniform across all axes
- 8 Black Present-heavy / Thing-heavy / Imagined-heavy
-
- ## Validation Results
-
- Evaluated on 78 labeled songs from `staged_raw_material` using 30s/5s-stride chunked scoring with confidence-weighted aggregation.
-
- | Color | Correct | Total | Accuracy |
- |---|---|---|---|
- | Red | 11 | 12 | 91.7% |
- | Orange | 4 | 4 | 100.0% |
- | Yellow | 10 | 10 | 100.0% |
  | Green | 0 | 8 | 0.0% ⚠️ |
- | Blue | 11 | 11 | 100.0% |
- | Indigo | 10 | 11 | 90.9% |
  | Violet | 11 | 12 | 91.7% |
- | White | 0 | 10 | 0.0% ⚠️ |
- | **Overall** | **57** | **78** | **73.1%** |
-
- **Green (0%)** — all predicted as Yellow. This is pipeline-safe: Green and Yellow share identical CHROMATIC_TARGETS distributions, so downstream chromatic match and drift scores are unaffected.
-
- **White (0%)** — all predicted as Yellow or Blue. White's uniform `[0.33, 0.34, 0.33]` targets are meaningfully different, so this is a known open issue. White albums are musically intentionally
- diverse, which makes them acoustically diffuse in CLAP's feature space.
-
- ## Usage
-
- The CDM is used via the `Refractor` wrapper in `training/refractor.py`. It auto-loads when `training/data/refractor_cdm.onnx` is present.
-
- ```python
- from training.refractor import Refractor
-
- scorer = Refractor() # CDM auto-detected
-
- # Score a full-mix wav with concept text
- result = scorer.score(
  audio_emb=scorer.prepare_audio(waveform, sr=48000),
- concept_emb=scorer.prepare_concept("A song about forgetting the future"),
- )
- # result: {"temporal": {...}, "spatial": {...}, "ontological": {...}, "confidence": 0.93}
-
- For full-mix WAV files, use chunk_audio + aggregate_chunk_scores from app/generators/midi/production/score_mix.py to score in overlapping windows and pool results.
-
- Training
-
  # Phase 1 — extract CLAP + concept embeddings from staged_raw_material/
- python training/extract_cdm_embeddings.py
-
- # Phase 2 — train on Modal (A10G)
- modal run training/modal_train_refractor_cdm.py
-
- # Validate
  python training/validate_mix_scoring.py
-
- Limitations

- - CLAP embeddings have a maximum internal window of ~10s; chunking is essential for full-length tracks
  - Green and White classification are unreliable (see validation results above)
- - Training data is drawn from a single artist's catalog — generalization to other music is untested
- - The concept embedding path requires a DeBERTa-v3-base inference pass (~600 MB model)
-
- Citation
-
- Part of The Rainbow Table generative music pipeline. See https://github.com/brotherclone/white.
- ```
+ ---
+ language:
+ - en
+ license: other
+ license_name: earthlyframes-collaborative-intelligence-license
+ license_link: https://github.com/brotherclone/white/blob/main/COLLABORATIVE_INTELLIGENCE_LICENSE.md
+ pipeline_tag: audio-classification
+ tags:
+ - audio
+ - music
  - onnx
+ - chromatic
+ - rainbow-table
  base_model:
+ - laion/larger_clap_music
  - microsoft/deberta-v3-base
+ ---
+
  # Refractor CDM

+ **Refractor CDM** (Compact Disc Module) is a lightweight MLP calibration head that classifies full-mix audio recordings into one of nine "rainbow colors" — a chromatic taxonomy used in *The Rainbow Table*, an AI-assisted album series.
+
+ The CDM is a companion to the base Refractor ONNX model (a multimodal fusion network trained on short catalog segments). The base model works well for MIDI and short audio clips but predicts poorly on full-mix audio because CLAP embeddings are optimized for short segments. The CDM corrects this by training directly on chunked full-mix audio.
+
+ ## Model Details

  | Property | Value |
  |---|---|
  | Architecture | 2-layer MLP (256 → 128 → 9) |
+ | Parameters | 361,993 |
  | Input | CLAP audio (512-dim) + DeBERTa concept (768-dim) = 1280-dim |
+ | Output | Softmax probabilities over 9 colors (`color_probs`, shape `[batch, 9]`) |
+ | Format | ONNX (`refractor_cdm.onnx`, 1.4 MB) |
+ | Training data | 3,450 chunks from 78 full-mix songs across all 9 colors |
+ | Loss | CrossEntropyLoss with label smoothing (0.1) + inverse-frequency class weights |
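The dimensions in the table imply three Linear layers (1280 → 256 → 128 → 9). A minimal NumPy sketch, with random stand-in weights and an assumed ReLU activation (the trained values live in `refractor_cdm.onnx`), reproduces the quoted parameter count:

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [1280, 256, 128, 9]  # 1280-dim fused input -> 9 color logits

# Random stand-in weights; the real ones come from refractor_cdm.onnx.
layers = [(rng.standard_normal((n_in, n_out)) * 0.02, np.zeros(n_out))
          for n_in, n_out in zip(dims, dims[1:])]

# Weights + biases match the table: 361,993 parameters.
n_params = sum(w.size + b.size for w, b in layers)

def color_probs(x):
    """Forward pass: ReLU on the hidden layers, softmax over the 9 colors."""
    for w, b in layers[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = layers[-1]
    logits = x @ w + b
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

batch = rng.standard_normal((2, 1280))  # CLAP 512 + DeBERTa 768, concatenated
probs = color_probs(batch)
print(n_params, probs.shape)  # 361993 (2, 9)
```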
+
+ ## Color Classes
+
+ ```
+ Index Color CHROMATIC_TARGETS (temporal / spatial / ontological)
  0 Red Past-heavy / Thing-heavy / Known-heavy
+ 1 Orange Present-heavy / Thing-heavy / Known-heavy
+ 2 Yellow Present-heavy / Place-heavy / Known-heavy
+ 3 Green Present-heavy / Place-heavy / Known-heavy <- same targets as Yellow
+ 4 Blue Future-heavy / Place-heavy / Forgotten-heavy
+ 5 Indigo Future-heavy / Future-heavy / Forgotten-heavy
  6 Violet Future-heavy / Future-heavy / Imagined-heavy
+ 7 White Uniform across all axes
+ 8 Black Present-heavy / Thing-heavy / Imagined-heavy
+ ```
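The class indices above map `color_probs` positions to color names. A hypothetical snippet (the probability vector here is made up for illustration):

```python
# Class order follows the table above.
COLORS = ["Red", "Orange", "Yellow", "Green", "Blue",
          "Indigo", "Violet", "White", "Black"]

# Made-up 9-way probability vector; a real one comes from the model's color_probs.
probs = [0.02, 0.03, 0.05, 0.04, 0.10, 0.06, 0.55, 0.10, 0.05]
top = max(range(len(COLORS)), key=probs.__getitem__)
print(COLORS[top])  # Violet
```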
+
+ ## Validation Results
+
+ Evaluated on 78 labeled songs from `staged_raw_material` using 30s/5s-stride chunked scoring with confidence-weighted aggregation.
+
+ | Color | Correct | Total | Accuracy |
+ |---|---|---|---|
+ | Red | 11 | 12 | 91.7% |
+ | Orange | 4 | 4 | 100.0% |
+ | Yellow | 10 | 10 | 100.0% |
  | Green | 0 | 8 | 0.0% ⚠️ |
+ | Blue | 11 | 11 | 100.0% |
+ | Indigo | 10 | 11 | 90.9% |
  | Violet | 11 | 12 | 91.7% |
+ | White | 0 | 10 | 0.0% ⚠️ |
+ | **Overall** | **57** | **78** | **73.1%** |
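The per-color rows are consistent with the overall line; a quick arithmetic check (the table lists the eight colors present in the validation set):

```python
# Correct / total counts copied from the validation table.
correct = {"Red": 11, "Orange": 4, "Yellow": 10, "Green": 0,
           "Blue": 11, "Indigo": 10, "Violet": 11, "White": 0}
total = {"Red": 12, "Orange": 4, "Yellow": 10, "Green": 8,
         "Blue": 11, "Indigo": 11, "Violet": 12, "White": 10}

overall = 100 * sum(correct.values()) / sum(total.values())
print(sum(correct.values()), sum(total.values()), f"{overall:.1f}%")  # 57 78 73.1%
```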
+
+ **Green (0%)** — all predicted as Yellow. This is pipeline-safe: Green and Yellow share identical CHROMATIC_TARGETS distributions, so downstream chromatic match and drift scores are unaffected.
+
+ **White (0%)** — all predicted as Yellow or Blue. White's uniform `[0.33, 0.34, 0.33]` targets are meaningfully different, so this is a known open issue. White albums are intentionally musically diverse, which makes them acoustically diffuse in CLAP's feature space.
+
+ ## Usage
+
+ The CDM is used via the `Refractor` wrapper. It auto-loads when `refractor_cdm.onnx` is present alongside `refractor.onnx`.
+
+ ```python
+ from training.refractor import Refractor
+
+ scorer = Refractor()  # CDM auto-detected
+
+ result = scorer.score(
+     audio_emb=scorer.prepare_audio(waveform, sr=48000),
+     concept_emb=scorer.prepare_concept("A song about forgetting the future"),
+ )
+ # result: {"temporal": {...}, "spatial": {...}, "ontological": {...}, "confidence": 0.93}
+ ```
+
+ For full-mix WAV files, use `chunk_audio` + `aggregate_chunk_scores` from `score_mix.py` to score in overlapping windows and pool results.
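That chunk-and-pool step can be sketched roughly as follows. This is a simplified stand-in, not the actual `score_mix.py` code; the 30 s window and 5 s stride come from the validation setup, and the toy inputs are made up:

```python
import numpy as np

SR = 48_000        # sample rate expected by prepare_audio
WINDOW = 30 * SR   # 30 s window
STRIDE = 5 * SR    # 5 s stride -> overlapping chunks

def chunk_audio(waveform):
    """Slice a full mix into overlapping 30 s windows at a 5 s stride."""
    last_start = max(len(waveform) - WINDOW, 0)
    return [waveform[s:s + WINDOW] for s in range(0, last_start + 1, STRIDE)]

def aggregate_chunk_scores(chunk_probs, confidences):
    """Pool per-chunk color probabilities with a confidence-weighted mean."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()
    return (np.asarray(chunk_probs) * w[:, None]).sum(axis=0)

chunks = chunk_audio(np.zeros(60 * SR))  # a silent 60 s "mix"
print(len(chunks))                       # 7 windows: starts at 0, 5, ..., 30 s
pooled = aggregate_chunk_scores(np.full((7, 9), 1 / 9), np.linspace(0.5, 1.0, 7))
print(pooled.shape)                      # (9,) -- still a distribution over colors
```

Weighting by per-chunk confidence lets high-confidence segments dominate the song-level color call, which is what makes the aggregation robust to quiet or ambiguous passages.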
+
+ ## Training
+
+ ```bash
  # Phase 1 — extract CLAP + concept embeddings from staged_raw_material/
+ python training/extract_cdm_embeddings.py
+
+ # Phase 2 — train on Modal (A10G GPU)
+ modal run training/modal_train_refractor_cdm.py
+
+ # Validate
  python training/validate_mix_scoring.py
+ ```
+
+ ## Limitations

+ - CLAP embeddings have a maximum internal window of ~10s; chunked scoring is essential for full-length tracks
  - Green and White classification are unreliable (see validation results above)
+ - Training data is drawn from a single artist's catalog — generalization to other music is untested
+ - The concept embedding path requires a DeBERTa-v3-base inference pass (~600 MB model)
+
+ ## Citation
+
+ Part of *The Rainbow Table* generative music pipeline.
+ See [brotherclone/white](https://github.com/brotherclone/white) and [earthlyframes/white-training-data](https://huggingface.co/datasets/earthlyframes/white-training-data).