fix: rewrite README with clean line endings and corrected YAML frontmatter
README.md
CHANGED
|
---
language:
- en
license: other
license_name: earthlyframes-collaborative-intelligence-license
license_link: https://github.com/brotherclone/white/blob/main/COLLABORATIVE_INTELLIGENCE_LICENSE.md
pipeline_tag: audio-classification
tags:
- audio
- music
- onnx
- chromatic
- rainbow-table
base_model:
- laion/larger_clap_music
- microsoft/deberta-v3-base
---

# Refractor CDM

**Refractor CDM** (Compact Disc Module) is a lightweight MLP calibration head that classifies full-mix audio recordings into one of nine "rainbow colors", a chromatic taxonomy used in *The Rainbow Table*, an AI-assisted album series.

The CDM is a companion to the base Refractor ONNX model (a multimodal fusion network trained on short catalog segments). The base model works well for MIDI and short audio clips but predicts poorly on full-mix audio because CLAP embeddings are optimized for short segments. The CDM corrects this by training directly on chunked full-mix audio.

## Model Details

| Property | Value |
|---|---|
| Architecture | 2-layer MLP (256 → 128 → 9) |
| Parameters | 361,993 |
| Input | CLAP audio (512-dim) + DeBERTa concept (768-dim) = 1280-dim |
| Output | Softmax probabilities over 9 colors (`color_probs`, shape `[batch, 9]`) |
| Format | ONNX (`refractor_cdm.onnx`, 1.4 MB) |
| Training data | 3,450 chunks from 78 full-mix songs across all 9 colors |
| Loss | CrossEntropyLoss with label smoothing (0.1) + inverse-frequency class weights |
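The table above implies the full forward path: concatenate the 512-dim CLAP audio embedding with the 768-dim DeBERTa concept embedding into a 1280-dim input, pass it through the 256 → 128 → 9 head, and softmax the 9 logits. A minimal numpy sketch, with randomly initialized weights standing in for the trained head and ReLU assumed for the hidden activations (the actual activation is not stated here); note the parameter count of this shape matches the 361,993 in the table:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings (the real ones come from the CLAP / DeBERTa encoders).
audio_emb = rng.standard_normal((1, 512)).astype(np.float32)    # CLAP audio
concept_emb = rng.standard_normal((1, 768)).astype(np.float32)  # DeBERTa concept

x = np.concatenate([audio_emb, concept_emb], axis=1)  # shape (1, 1280)

# Random stand-in for the trained head: 1280 -> 256 -> 128 -> 9, with biases.
dims = [1280, 256, 128, 9]
weights = [(rng.standard_normal((i, o)).astype(np.float32) * 0.02,
            np.zeros(o, dtype=np.float32)) for i, o in zip(dims, dims[1:])]

h = x
for k, (w, b) in enumerate(weights):
    h = h @ w + b
    if k < len(weights) - 1:
        h = np.maximum(h, 0.0)  # assumed ReLU on hidden layers

# Softmax over the 9 color logits -> color_probs, shape (1, 9).
logits = h - h.max(axis=1, keepdims=True)
color_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Sanity check: this shape accounts for exactly 361,993 parameters.
n_params = sum(w.size + b.size for w, b in weights)
print(color_probs.shape, n_params)
```

This is a shape/arithmetic sketch only; real inference goes through `refractor_cdm.onnx` via the `Refractor` wrapper shown in Usage below.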

## Color Classes

```
Index  Color   CHROMATIC_TARGETS (temporal / spatial / ontological)
0      Red     Past-heavy / Thing-heavy / Known-heavy
1      Orange  Present-heavy / Thing-heavy / Known-heavy
2      Yellow  Present-heavy / Place-heavy / Known-heavy
3      Green   Present-heavy / Place-heavy / Known-heavy  <- same targets as Yellow
4      Blue    Future-heavy / Place-heavy / Forgotten-heavy
5      Indigo  Future-heavy / Future-heavy / Forgotten-heavy
6      Violet  Future-heavy / Future-heavy / Imagined-heavy
7      White   Uniform across all axes
8      Black   Present-heavy / Thing-heavy / Imagined-heavy
```
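Downstream code presumably decodes `color_probs` by mapping the argmax index back to a color name. A minimal sketch; the `COLORS` list is transcribed from the table above and `decode_color` is a hypothetical helper, not part of the pipeline's API:

```python
import numpy as np

# Index order transcribed from the Color Classes table above.
COLORS = ["Red", "Orange", "Yellow", "Green", "Blue",
          "Indigo", "Violet", "White", "Black"]

def decode_color(color_probs):
    """Return (name, probability) for the top-scoring color."""
    idx = int(np.argmax(color_probs))
    return COLORS[idx], float(color_probs[idx])

name, p = decode_color([0.01, 0.02, 0.05, 0.02, 0.60, 0.10, 0.10, 0.05, 0.05])
print(name)  # Blue
```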

## Validation Results

Evaluated on 78 labeled songs from `staged_raw_material` using 30s-window / 5s-stride chunked scoring with confidence-weighted aggregation.

| Color | Correct | Total | Accuracy |
|---|---|---|---|
| Red | 11 | 12 | 91.7% |
| Orange | 4 | 4 | 100.0% |
| Yellow | 10 | 10 | 100.0% |
| Green | 0 | 8 | 0.0% ⚠️ |
| Blue | 11 | 11 | 100.0% |
| Indigo | 10 | 11 | 90.9% |
| Violet | 11 | 12 | 91.7% |
| White | 0 | 10 | 0.0% ⚠️ |
| **Overall** | **57** | **78** | **73.1%** |

**Green (0%)**: all predicted as Yellow. This is pipeline-safe: Green and Yellow share identical CHROMATIC_TARGETS distributions, so downstream chromatic match and drift scores are unaffected.

**White (0%)**: all predicted as Yellow or Blue. White's uniform `[0.33, 0.34, 0.33]` targets are meaningfully different, so this is a known open issue. White albums are intentionally diverse musically, which makes them acoustically diffuse in CLAP's feature space.
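The README does not spell out the confidence-weighted aggregation used for validation. One plausible sketch, assuming each chunk's 9-way probability vector is pooled by a confidence-weighted average (the weighting scheme here is an assumption, not the actual `aggregate_chunk_scores` implementation):

```python
import numpy as np

def aggregate_chunk_probs(chunk_probs, confidences):
    """Pool per-chunk color probabilities into one song-level distribution,
    weighting each chunk by its confidence (hypothetical scheme)."""
    p = np.asarray(chunk_probs, dtype=np.float64)   # (n_chunks, n_colors)
    w = np.asarray(confidences, dtype=np.float64)   # (n_chunks,)
    pooled = (p * w[:, None]).sum(axis=0) / w.sum()
    return pooled  # still a valid distribution: sums to 1

# Toy 3-class example: the high-confidence chunk dominates the pooled vote.
probs = [[0.7, 0.1, 0.2],
         [0.2, 0.6, 0.2]]
pooled = aggregate_chunk_probs(probs, confidences=[0.9, 0.3])
print(pooled.argmax())  # 0
```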

## Usage

The CDM is used via the `Refractor` wrapper. It auto-loads when `refractor_cdm.onnx` is present alongside `refractor.onnx`.

```python
from training.refractor import Refractor

scorer = Refractor()  # CDM auto-detected

# waveform: a mono audio array loaded elsewhere (e.g. with soundfile)
result = scorer.score(
    audio_emb=scorer.prepare_audio(waveform, sr=48000),
    concept_emb=scorer.prepare_concept("A song about forgetting the future"),
)
# result: {"temporal": {...}, "spatial": {...}, "ontological": {...}, "confidence": 0.93}
```

For full-mix WAV files, use `chunk_audio` + `aggregate_chunk_scores` from `score_mix.py` to score in overlapping windows and pool results.
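The window geometry behind that chunking can be sketched as follows. This `chunk_bounds` helper is hypothetical (not the actual `chunk_audio`); it computes start/end sample indices for 30s windows at a 5s stride, matching the validation setup, with a final tail window so the end of the track is always covered:

```python
def chunk_bounds(n_samples, sr=48000, window_s=30.0, stride_s=5.0):
    """Start/end sample indices for overlapping scoring windows."""
    win, hop = int(window_s * sr), int(stride_s * sr)
    if n_samples <= win:
        return [(0, n_samples)]  # short track: one window covers it all
    bounds = [(s, s + win) for s in range(0, n_samples - win + 1, hop)]
    if bounds[-1][1] < n_samples:
        bounds.append((n_samples - win, n_samples))  # tail window at the end
    return bounds

# A 65s track at 48 kHz yields 8 windows starting at 0s, 5s, ..., 35s.
bounds = chunk_bounds(48000 * 65)
print(len(bounds))  # 8
```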

## Training

```bash
# Phase 1: extract CLAP + concept embeddings from staged_raw_material/
python training/extract_cdm_embeddings.py

# Phase 2: train on Modal (A10G GPU)
modal run training/modal_train_refractor_cdm.py

# Validate
python training/validate_mix_scoring.py
```

## Limitations

- CLAP embeddings have a maximum internal window of ~10s; chunked scoring is essential for full-length tracks
- Green and White classification is unreliable (see validation results above)
- Training data is drawn from a single artist's catalog; generalization to other music is untested
- The concept embedding path requires a DeBERTa-v3-base inference pass (~600 MB model)

## Citation

Part of *The Rainbow Table* generative music pipeline.
See [brotherclone/white](https://github.com/brotherclone/white) and [earthlyframes/white-training-data](https://huggingface.co/datasets/earthlyframes/white-training-data).