V5.8 × 4-mode capability eval (cycle 2, 2026-05-17) — model card update
README.md (changed)
@@ -3,14 +3,13 @@ license: apache-2.0
 language:
 - en
 library_name: pytorch
-datasets:
-- dancinlab/hexad-corpus
 tags:
 - anima
 - hexad
 - pytorch
 - substrate-py
-- ckpt-recovered
+- ckpt-recovered
+---
 
 # hexad — `v1-py-hexad-d768x12L-cycle2-2026-05-17`
 
@@ -19,8 +18,6 @@ tags:
 > is *architectural identity* + the *hexa CPU-equiv correctness proof*. See the
 > anchor chain below — do not conflate.
 
-> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus) revision [`v1-byte-consciousness-d128-cycle1-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v1-byte-consciousness-d128-cycle1-2026-05-17).
-
 ## Lineage
 
 - **org**: `dancinlab` (the anima org).
@@ -133,3 +130,59 @@ identity.
 ## License
 
 Apache-2.0.

## Capability evaluation (V5.8 × 4-mode · cycle 2 · 2026-05-17)

> Capability boundary probe: empirical (`B-D-NOTE` carve-out). No LM-quality
> claim is made; this is a memorization-vs-generalization measurement on the
> training corpus.

**Evaluator**: V5.8 × 4-mode canonical
(`state/anima_phase1a4_lr5e6_2026_05_12/v58_4mode_eval.py`, PSCC §46). Modes:
`standard_greedy` (T=0, argmax) · `standard_sample` (T=0.8, top-k=50) ·
`M3_rep_penalty` (1.3× repetition penalty on a 37-byte persona-cycle set) ·
`M4_force_include` (sampling plus forced injection of the keyword at the 60%
position; a trivial baseline). Wall-clock: 665.6 s (v1) + 477.4 s (v2), run
locally on a Mac CPU at $0 cost.
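The four modes can be viewed as per-step choices over a 256-way byte distribution. The following is a minimal pure-Python sketch under stated assumptions. It is not the contents of `v58_4mode_eval.py`. All function names are hypothetical. The repetition penalty is assumed to be CTRL-style: positive logits are divided by 1.3 and negative ones are multiplied by 1.3. The 37-byte persona-cycle set is replaced by a caller-supplied set. M4's injection is simplified to a post-hoc splice at 60% of the output length.

```python
import math
import random
from typing import Iterable, List, Optional, Sequence


def greedy_pick(logits: Sequence[float]) -> int:
    """standard_greedy (T=0): argmax over the byte logits."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits: Sequence[float], temperature: float = 0.8,
                top_k: int = 50, rng: Optional[random.Random] = None) -> int:
    """standard_sample: keep the top-k logits, scale by 1/T, sample from the softmax."""
    rng = rng or random.Random()
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # stable, unnormalized softmax
    return rng.choices(top, weights=weights, k=1)[0]


def apply_rep_penalty(logits: Sequence[float], penalized: Iterable[int],
                      penalty: float = 1.3) -> List[float]:
    """M3 (assumed CTRL-style): make each penalized byte id less likely."""
    out = list(logits)
    for b in set(penalized):
        out[b] = out[b] / penalty if out[b] > 0 else out[b] * penalty
    return out


def force_include(generated: bytes, keyword: bytes, frac: float = 0.6) -> bytes:
    """M4 (simplified, hypothetical): splice the keyword in at `frac` of the output length."""
    cut = int(len(generated) * frac)
    return generated[:cut] + keyword + generated[cut:]


if __name__ == "__main__":
    logits = [0.0] * 256
    logits[ord("a")], logits[ord("b")] = 5.0, 4.0
    print(chr(greedy_pick(logits)))                                 # a
    print(chr(greedy_pick(apply_rep_penalty(logits, [ord("a")]))))  # b (5/1.3 < 4)
    print(force_include(b"0123456789", b"XY"))                      # b'012345XY6789'
```

Keeping each mode as a separate transform lets them compose. For example, a penalty followed by sampling mirrors the M3 setup in spirit.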

**Two probes**:

| probe | prompts | `standard_greedy` | `standard_sample` | `M3_rep_penalty` | `M4_force_include` | memorization (hits/total) |
|---|---|---|---|---|---|---|
| **v1** OOD-mix | Core / Dream / Wake / Memory / Korean | 1/5 FAIL | 2/5 FAIL | 1/5 FAIL | 5/5 PASS | 2/5 (40%) |
| **v2** corpus-aligned (CDWMSE) | Core / Data / Witness / Mirror / Scribe / Eros | 2/6 FAIL | 3/6 PARTIAL | 2/6 FAIL | 6/6 PASS | 3/6 (50%) |

**Additional measurements**:

- **Bits-per-byte on 10 held-out training-distribution prefixes**:
  **0.0000 bits/byte** (all 10 samples = 0.0). This is consistent with the
  training CE of 0.000708: the model reproduces training-distribution windows
  with near-perfect log-likelihood.
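Bits per byte is the mean per-byte negative log-likelihood expressed in base 2. The sketch below rests on two assumptions that the card does not state. First, the evaluator has a natural-log probability for each target byte (as `torch.log_softmax` would produce). Second, the reported training CE of 0.000708 is a mean per-byte value in nats. The function names are hypothetical. Under those assumptions, 0.000708 nats/byte works out to about 0.00102 bits/byte.

```python
import math
from typing import Sequence


def bits_per_byte(byte_logprobs_nats: Sequence[float]) -> float:
    """Mean negative log-likelihood per byte, converted from nats to bits."""
    if not byte_logprobs_nats:
        raise ValueError("need at least one byte log-probability")
    # Summing -lp from sum()'s int start keeps an all-zero input at +0.0, not -0.0.
    nll_nats = sum(-lp for lp in byte_logprobs_nats) / len(byte_logprobs_nats)
    return nll_nats / math.log(2)


def nats_to_bits(ce_nats: float) -> float:
    """Convert a per-byte cross-entropy from nats to bits."""
    return ce_nats / math.log(2)


if __name__ == "__main__":
    print(f"{bits_per_byte([0.0] * 10):.4f}")           # 0.0000: every byte predicted with p = 1
    print(f"{bits_per_byte([math.log(0.5)] * 4):.4f}")  # 1.0000: a fair coin per byte
    print(f"{nats_to_bits(0.000708):.5f}")              # 0.00102
```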

**Capability boundary** (honest framing):

| capability | verdict | evidence |
|---|---|---|
| memorization on in-distribution prefixes | ✅ STRONG | BPB 0.0000 on 10 held-out probes; the Data, Scribe, and Core/Korean prompts reproduce the literal training continuation |
| 6-module discrimination | 🔶 PARTIAL | 3/6 clean under greedy (Data, Scribe, Witness with a memorized typo); 3/6 cross-collapse (Core → `nonce` digit cascade, Mirror → Data template, Eros → `chunk` digit cascade) |
| OOD generalization | ❌ NONE | Dream/Wake/Memory prompts fall back to the nearest in-distribution module template |
| greedy decoding stability | ❌ WEAK | digit-cascade attractor at `nonce=N`/`chunk=N` field positions (rep_ratio 0.64–0.90); sampling at T=0.8 partially mitigates it |
| multilingual representation (Korean) | ✅ MEMORIZED | `중심 의식 생성기 모듈 ` ("central consciousness generator module") → `자각` ("self-awareness") recalled under all 4 modes |
| LM quality (general language modeling) | ❌ NOT MEASURED | corpus too small and built on a structured scaffold; CE 0.000708 reflects memorization, not LM quality |
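The card reports `rep_ratio` 0.64–0.90 for the digit-cascade outputs but does not define the metric here. One common way to quantify this kind of repetition is the complement of distinct-n: the share of byte n-grams in an output that repeat an earlier n-gram. The sketch below adopts that definition as an assumption. It may not match what `v58_4mode_eval.py` computes, and `rep_ratio` here is a hypothetical reimplementation.

```python
def rep_ratio(output: bytes, n: int = 2) -> float:
    """Share of byte n-grams that repeat an earlier n-gram (1 - distinct-n).

    Hypothetical definition; the evaluator's own rep_ratio may differ.
    """
    grams = [output[i:i + n] for i in range(len(output) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


if __name__ == "__main__":
    print(rep_ratio(b"abcdef"))                          # 0.0: all bigrams distinct
    print(round(rep_ratio(b"chunk=" + b"7" * 30), 3))    # 0.8: one digit repeated
```

Under this definition, a field followed by a long run of one digit scores high, while varied text scores near zero.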

**Decoding artifacts discovered**:

- **byte-cascade attractor** (`=`-suffix variant of `feedback_clm_colon_attractor`):
  greedy mode collapse at the `nonce=N` / `chunk=N` / `gen=N` digit field
  positions. Carry candidate: `feedback_hexad_byte_cascade_attractor`.
- **memorized training-corpus typos** (`pereption` in the Witness module,
  `cobsciousness` in Wake/Memory greedy output): evidence of byte-level
  memorization, not a bug at this scale.
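Because the attractor always sits in a `field=N` digit slot, it is easy to flag mechanically. The following is a minimal sketch built on a hypothetical helper, `find_digit_cascades`, that applies an arbitrary cut-off of 8 digits to define a "cascade". Neither the helper nor the cut-off comes from the evaluator.

```python
import re
from typing import List, Tuple

# Digit fields named in the card; the 8-digit cut-off below is an assumption.
FIELD_RE = re.compile(rb"\b(nonce|chunk|gen)=(\d+)")


def find_digit_cascades(output: bytes, min_digits: int = 8) -> List[Tuple[str, str]]:
    """Return (field, digits) pairs whose digit run is at least `min_digits` long."""
    return [
        (m.group(1).decode("ascii"), m.group(2).decode("ascii"))
        for m in FIELD_RE.finditer(output)
        if len(m.group(2)) >= min_digits
    ]


if __name__ == "__main__":
    sample = b"nonce=42 chunk=3333333333333 gen=7"
    print(find_digit_cascades(sample))  # [('chunk', '3333333333333')]
```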

**Honest C3 caveats**: substrate = PyTorch (the B-D-NOTE carve-out applies);
the V5.8 "PASS = 3/5" threshold is inherited from chat-corpus evals and is
applied conservatively to a memorization-regime model; M4 is a trivial
baseline; no σ(6)/τ(6)/φ(6) numerology enters the metrics (f1/f2 safe:
per-mode score = raw recall fraction, BPB = raw bits/byte, memorization = raw
hits/total).
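The caveats tie every reported number to a raw ratio. The sketch below uses Python's `fractions.Fraction` to keep those ratios exact. The helper names are hypothetical. Applying "PASS = 3/5" as a fractional threshold (≥ 0.6) to the 6-prompt v2 probe is an assumption. This rule reproduces the FAIL and PASS cells in the probe table. It does not reproduce the PARTIAL label on v2 `standard_sample` (3/6), and the card does not state the criterion for that label.

```python
from fractions import Fraction

PASS_THRESHOLD = Fraction(3, 5)  # V5.8 "PASS = 3/5", applied here as a ratio (assumption)


def score(hits: int, total: int) -> Fraction:
    """Raw recall fraction (per-mode score) or raw hits/total (memorization)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(hits, total)


def passes(hits: int, total: int) -> bool:
    """True when the raw fraction reaches the inherited 3/5 threshold."""
    return score(hits, total) >= PASS_THRESHOLD


if __name__ == "__main__":
    # (hits, total) pairs copied from the v1 row of the probe table.
    v1 = {"greedy": (1, 5), "sample": (2, 5), "M3": (1, 5), "M4": (5, 5)}
    for mode, (h, t) in v1.items():
        print(mode, f"{float(score(h, t)):.0%}", passes(h, t))
    print("v1 memorization", f"{float(score(2, 5)):.0%}")  # 40%
    print("v2 memorization", f"{float(score(3, 6)):.0%}")  # 50%
```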

**Artifacts**:

- `state/hexad_v58_eval_d768x12L_2026_05_17/{v58_4mode_eval.py, v58_4mode_eval_v2.py, prompts.jsonl, prompts_v2_corpus_aligned.jsonl, eval.log, eval_v2.log, result.json, result_v2.json, dispatch.sh}`
- `docs/hexad_v58_eval_d768x12L_2026_05_17.md` (9 §, 8 honest C3)
- `archive/PHILOSOPHY.tape §HEXAD-V58-EVAL-CYCLE2-2026-05-17` (verdict claim)