dancinlife committed on
Commit 8cf11a1 · verified · 1 Parent(s): 99114a7

V5.8 × 4-mode capability eval (cycle 2, 2026-05-17) — model card update

Files changed (1)
  1. README.md +58 -5
README.md CHANGED
@@ -3,14 +3,13 @@ license: apache-2.0
  language:
  - en
  library_name: pytorch
- datasets:
- - dancinlab/hexad-corpus
  tags:
  - anima
  - hexad
  - pytorch
  - substrate-py
- - ckpt-recovered---
 
  # hexad — `v1-py-hexad-d768x12L-cycle2-2026-05-17`
 
@@ -19,8 +18,6 @@ tags:
  > is *architectural identity* + the *hexa CPU-equiv correctness proof*. See the
  > anchor chain below — do not conflate.
 
- > **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus) revision [`v1-byte-consciousness-d128-cycle1-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v1-byte-consciousness-d128-cycle1-2026-05-17).
-
  ## Lineage
 
  - **org**: `dancinlab` (the anima org).
@@ -133,3 +130,59 @@ identity.
  ## License
 
  Apache-2.0.

  language:
  - en
  library_name: pytorch
  tags:
  - anima
  - hexad
  - pytorch
  - substrate-py
+ - ckpt-recovered
+ ---
 
  # hexad — `v1-py-hexad-d768x12L-cycle2-2026-05-17`
 

  > is *architectural identity* + the *hexa CPU-equiv correctness proof*. See the
  > anchor chain below — do not conflate.
 

  ## Lineage
 
  - **org**: `dancinlab` (the anima org).

  ## License
 
  Apache-2.0.
+
+ ## Capability evaluation (V5.8 × 4-mode · cycle 2 · 2026-05-17)
+
+ > Empirical capability-boundary probe (`B-D-NOTE` carve-out). No LM-quality
+ > claim is made; this is a memorization-vs-generalization measurement on the
+ > training corpus.
+
+ **Evaluator**: V5.8 × 4-mode canonical
+ (`state/anima_phase1a4_lr5e6_2026_05_12/v58_4mode_eval.py`, PSCC §46). Modes:
+ `standard_greedy` (T=0, argmax) · `standard_sample` (T=0.8, top-k=50) ·
+ `M3_rep_penalty` (1.3× repetition penalty on the 37-byte persona-cycle set) ·
+ `M4_force_include` (sampling plus a keyword force-injected at the 60% position;
+ a trivial baseline). Wall time: 665.6 s (v1) + 477.4 s (v2), run locally on a
+ Mac CPU at $0 cost.
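
The decoding modes listed above can be illustrated with a minimal, standard-library sketch. It is not taken from `v58_4mode_eval.py`: the function names are hypothetical, the repetition penalty follows the common divide-positive/multiply-negative convention (an assumption about how the 1.3× penalty is applied), and the force-include mode is left out because its mechanics are not described in enough detail here.

```python
import math
import random

def greedy(logits):
    """T=0 decoding: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_probs(logits, temperature=0.8, top_k=50):
    """Softmax over the top_k temperature-scaled logits; all others get 0."""
    scaled = [x / temperature for x in logits]
    k = min(top_k, len(scaled))
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = set(order[:k])
    peak = scaled[order[0]]  # subtract the max for numerical stability
    weights = [math.exp(s - peak) if i in keep else 0.0
               for i, s in enumerate(scaled)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_top_k(logits, rng, temperature=0.8, top_k=50):
    """Draw one index from the temperature/top-k distribution."""
    probs = top_k_probs(logits, temperature, top_k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def apply_repetition_penalty(logits, penalized_ids, penalty=1.3):
    """Assumed convention: shrink positive logits, push negative ones lower."""
    out = list(logits)
    for i in penalized_ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # toy values, not model output
    print(greedy(toy_logits))
    print(sample_top_k(toy_logits, rng, top_k=2))
    print(apply_repetition_penalty(toy_logits, penalized_ids=[0, 3]))
```

In a byte-level model the logits list would have 256 entries, and `penalized_ids` would hold the byte values in the penalized set.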
+
+ **Two probes**:
+
+ | probe | prompts | greedy | sample | M3 | M4 | memorization |
+ |---|---|---|---|---|---|---|
+ | **v1** OOD-mix | Core / Dream / Wake / Memory / Korean | 1/5 FAIL | 2/5 FAIL | 1/5 FAIL | 5/5 PASS | 2/5 (40%) |
+ | **v2** corpus-aligned CDWMSE | Core / Data / Witness / Mirror / Scribe / Eros | 2/6 FAIL | 3/6 PARTIAL | 2/6 FAIL | 6/6 PASS | 3/6 (50%) |
+
+ **Additional measurements**:
+ - **Bits-per-byte on 10 held-out training-distribution prefixes**:
+   **0.0000 bits/byte** (all 10 samples = 0.0). This is consistent with the
+   training CE of 0.000708: near-perfect log-likelihood reproduction on
+   training-distribution windows.
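
For reference, a cross-entropy value and a bits-per-byte value are related by a factor of ln 2 when the cross-entropy is measured in nats per byte. The sketch below assumes that unit (PyTorch's default natural-log loss on a byte-level vocabulary); the model card does not state the unit, and the function names are illustrative.

```python
import math

def nats_to_bits_per_byte(ce_nats_per_byte: float) -> float:
    """Convert a per-byte cross-entropy in nats to bits per byte."""
    return ce_nats_per_byte / math.log(2)

def bits_per_byte(byte_log_probs_nats) -> float:
    """BPB of a sequence from the natural-log probability of each byte."""
    total_nats = -sum(byte_log_probs_nats)
    return total_nats / (len(byte_log_probs_nats) * math.log(2))

# 0.000708 is the training CE quoted in the model card;
# treating it as nats per byte is an assumption.
train_ce = 0.000708
print(f"{nats_to_bits_per_byte(train_ce):.6f}")
```

Under that assumption the training CE corresponds to roughly 0.001 bits/byte. The 0.0000 figure is a separate measurement on the 10 probe prefixes, so the two numbers need not match exactly.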
+
+ **Capability boundary** (honest framing):
+
+ | capability | verdict | evidence |
+ |---|---|---|
+ | memorization on in-distribution prefixes | ✅ STRONG | BPB 0.0000 on 10 held-out probes; Data + Scribe + Core/Korean reproduce literal training continuation |
+ | 6-module discrimination | 🔶 PARTIAL | 3/6 clean under greedy (Data/Scribe/Witness-w-typo); 3/6 cross-collapse (Core→nonce digit cascade, Mirror→Data template, Eros→chunk digit cascade) |
+ | OOD generalization | ❌ NONE | Dream/Wake/Memory → default to nearest in-distribution module template |
+ | greedy decoding stability | ❌ WEAK | digit-cascade attractor on `nonce=N`/`chunk=N` field positions (rep_ratio 0.64-0.90); sampling temperature 0.8 partially mitigates |
+ | multilingual representation (Korean) | ✅ MEMORIZED | `중심 의식 생성기 모듈 ` → `자각` recalled under all 4 modes |
+ | LM-quality (general language modeling) | ❌ NOT MEASURED | corpus too small + structured scaffold; CE 0.000708 = memorization, not LM quality |
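
The table cites a `rep_ratio` without defining it. One plausible definition, offered only as an assumption and not as the evaluator's actual metric, is the share of byte n-grams in a generation that duplicate an earlier n-gram. The function name and the choice of n=4 below are hypothetical.

```python
def repetition_ratio(data: bytes, n: int = 4) -> float:
    """Fraction of byte n-grams that repeat an n-gram seen earlier.

    Hypothetical metric: 0.0 means every n-gram is unique, values near
    1.0 mean the output is dominated by a repeating pattern.
    """
    if len(data) < n:
        return 0.0
    grams = [data[i:i + n] for i in range(len(data) - n + 1)]
    return 1.0 - len(set(grams)) / len(grams)

if __name__ == "__main__":
    # Toy strings, not real model output.
    print(repetition_ratio(b"nonce=1111111111111111"))
    print(repetition_ratio(b"abcdefgh"))
```

A digit cascade such as a long run of one digit after `nonce=` scores high under this definition, while non-repeating text scores 0.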
+
+ **Decoding artifacts discovered**:
+ - **byte-cascade attractor** (`feedback_clm_colon_attractor` `=`-suffix
+   variant): greedy mode collapse on `nonce=N` / `chunk=N` / `gen=N` digit
+   field positions. Carry candidate: `feedback_hexad_byte_cascade_attractor`.
+ - **memorized training-corpus typos** (`pereption` in the Witness module,
+   `cobsciousness` in Wake/Memory greedy): evidence of byte-level memorization,
+   not a bug at this scale.
+
+ **Honest C3 caveats**: substrate=PyTorch (B-D-NOTE carve-out applies); the V5.8
+ "PASS = 3/5" threshold is inherited from chat-corpus evals and applied
+ conservatively to a memorization-regime model; M4 is a trivial baseline; no
+ σ(6)/τ(6)/φ(6) numerology in metrics (f1/f2 safe: per-mode score = raw
+ recall fraction, BPB = raw bits/byte, memorization = raw hits/total).
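
Because the metrics are described as raw fractions, the scores in the two-probe table can be recomputed directly from the hit counts. The sketch below copies those counts from the table and compares each against the inherited 3/5 threshold. The names are illustrative, and the check does not model how the evaluator assigns the intermediate PARTIAL label.

```python
from fractions import Fraction

# (hits, total) pairs copied from the two-probe table in the model card.
RESULTS = {
    "v1": {"greedy": (1, 5), "sample": (2, 5), "M3": (1, 5),
           "M4": (5, 5), "memorization": (2, 5)},
    "v2": {"greedy": (2, 6), "sample": (3, 6), "M3": (2, 6),
           "M4": (6, 6), "memorization": (3, 6)},
}
PASS_THRESHOLD = Fraction(3, 5)  # the inherited "PASS = 3/5" threshold

def raw_fraction(hits: int, total: int) -> Fraction:
    """Raw recall fraction with no smoothing or weighting."""
    return Fraction(hits, total)

def meets_threshold(hits: int, total: int,
                    threshold: Fraction = PASS_THRESHOLD) -> bool:
    """True when the raw fraction reaches the threshold."""
    return raw_fraction(hits, total) >= threshold

if __name__ == "__main__":
    for probe, columns in RESULTS.items():
        for name, (hits, total) in columns.items():
            frac = raw_fraction(hits, total)
            print(f"{probe} {name}: {hits}/{total} = {float(frac):.0%}, "
                  f"meets 3/5: {meets_threshold(hits, total)}")
```

Using `Fraction` keeps the threshold comparison exact, which matters for borderline cases such as 3/5 itself.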
+
+ **Artifacts**:
+ `state/hexad_v58_eval_d768x12L_2026_05_17/{v58_4mode_eval.py, v58_4mode_eval_v2.py, prompts.jsonl, prompts_v2_corpus_aligned.jsonl, eval.log, eval_v2.log, result.json, result_v2.json, dispatch.sh}` +
+ `docs/hexad_v58_eval_d768x12L_2026_05_17.md` (9 §, 8 honest C3) +
+ `archive/PHILOSOPHY.tape §HEXAD-V58-EVAL-CYCLE2-2026-05-17` verdict-claim.