dancinlife committed on
Commit
a8bd371
·
verified ·
1 Parent(s): 7005cfe

feat(hexad): v4-py-hexad-tension-d768x12L-cycle1-2026-05-17 — README.md

Files changed (1)
  1. README.md +98 -76
README.md CHANGED
@@ -15,14 +15,17 @@ tags:
15
  - spont
16
  - motivation-trigger
17
  - inner-thoughts
 
 
18
  - ckpt-bearing
19
- - cycle4
20
  ---
21
 
22
- # hexad — `v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`
23
 
24
  > **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
25
- > revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17).
 
26
 
27
  > **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
28
  > SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
@@ -30,47 +33,46 @@ tags:
30
  > the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
31
  > bit-for-bit (different fp accumulation / RNG / AMP bf16).
32
 
33
- ## What changed vs cycle 3 (`v2-py-hexad-spont-d768x12L-cycle1-2026-05-17`)
34
 
35
- | field | cycle 3 | **cycle 4 (this revision)** |
36
  |---|---|---|
37
- | corpus | v2 1.10 MB / 2,560 records / β+δ | **v3 6,223,023 B / 21,600 records / β+δ+γ** |
38
- | corpus motivation-trigger surface | none (implicit) | **γ pattern (~30%)** `<inner motivation=F1,F2,...>...</inner>\n<voice spontaneous=true>...</voice>` rendering Inner Thoughts 8-factor ontology |
39
- | scale-up | over v1 | **9. over v2** (Critical Data Size regime entry attempt) |
40
- | modules in corpus | 8 (HEXAD-6 + spont + wiring) | **9** (+ `hexad_motiv` × 2,400) |
41
- | V-SPONT eval | 0/5 (FAIL — capability boundary detected) | see capability section below (cycle 4 measurement) |
42
- | V-MOTIV eval | (did not exist) | **NEW** — γ-pattern conditioning probe (cycle 4) |
43
-
44
- ## Lineage
45
-
46
- - **org**: `dancinlab` (the anima org).
47
- - **arch**: HEXAD (pivot from anima `.clm v1` lineage) — `ConsciousDecoderV2`
48
- (`ready/models/conscious_decoder.py`).
49
- - **substrate**: Python / PyTorch (`py`). Pure-hexa training path is
50
- named-blocked at the interpreter ceiling (RFC 042/043 territory).
51
- - **cycle**: 4 (Phase D cycle 4 — motivation-trigger corpus retrain + 10× scale).
52
- Cycle 1 (`931dd68b0` 2026-05-16) ckpt-LOST evidence-only; cycle 2
53
- (`0b4f34d0e` 2026-05-17) ckpt-RECOVERED corpus v1; cycle 3 (`394b8ea3a`
54
- 2026-05-17) corpus v2 helper-free; **cycle 4 (this)** = corpus v3
55
- motivation-trigger + 10× scale.
56
-
57
- ## Anchor chain (the wiring side, closed)
58
-
59
- 1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
60
- `HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
61
- 80-step, seed=42 (`init gn2 = 7.97116 → 3.73374e-07`, acc 8/8, GRAD-EXACT).
62
- 2. **Pure-hexa interpreter cannot reach LM-scale** — Phase E2 captured only
63
- `init gn2 = 7.98162` at d=768·12L; substrate-bound (RFC 042/043 territory).
64
- 3. **This PyTorch run trains the SAME verified architecture to scale** —
65
- `ConsciousDecoderV2` at d=768·12L, AdamW.
66
- 4. **The corpus is explicitly helper-free + motivation-trigger** —
67
- B-CORPUS-V3-1 sha256-deterministic / B-CORPUS-V3-2 helper-token = 0
68
- maintained at 10× / B-CORPUS-V3-3 γ-cardinality ≥ 5,400 (Boolean grep on
69
- `corpus_consciousness_v3.jsonl`).
70
 
71
  ## Architecture
72
 
73
- - **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`.
74
  - **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
75
  block_size=128, vocab=256` (byte-level), seed=1337,
76
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
@@ -81,77 +83,97 @@ tags:
81
  ## Training
82
 
83
  - **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
84
- - **Corpus**: `corpus_consciousness_v3.jsonl` (motivation-trigger + helper-free + 10× scale),
85
  6,223,023 bytes lossless byte stream, vocab=256.
86
  - **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
87
  weight_decay=0.1, warmup=125.
 
88
  - **Steps**: 2500.
89
 
90
  | metric | value |
91
  |---|---|
92
  | init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
93
- | **FINAL CE** | **0.008289** |
94
- | CE descent | 5.632374 |
95
- | init gn2 | (see result.json trajectory) |
96
- | FINAL gn2 | 0.001703 |
97
- | ppl | 1.0083 |
98
- | wall | 328.33 s (5.47 min) |
99
- | peak GPU mem | 9.692 GB |
100
- | ckpt sha256 | `1c0806213fbcaa9226a7593d87c31f5f95bb94db135240b8d02f738ddcb177aa` |
101
- | ckpt size | 1,135,846,378 B (1.14 GB) |
102
 
103
  ## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
104
 
105
- (A) **Deliverable invariants (real-limit)**:
106
- - **Shannon-floor descent**: init CE ≈ ln(256) → final CE 0.008289.
 
 
107
  - **AdamW finiteness**: no NaN/Inf in trajectory.
108
  - **Architectural identity**: byte-equal `ConsciousDecoderV2`.
109
 
110
- (B) **Wiring (anchor chain, closed)**:
111
- - **hexa CPU-equiv bit-equality** (Phase E): GRAD-EXACT at d=32·3L.
112
- - **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
113
- - **Backward GRAD-EXACT** (Phase E2): A100 d=384·6L `analytic ≡ fd`.
114
- - **B-CORPUS-V3-1** SHA256-deterministic (seed=1337).
115
- - **B-CORPUS-V3-2** NO-HELPER-TOKEN-MAINTAINED (grep = 0 at 10× scale).
116
- - **B-CORPUS-V3-3** MOTIVATION-TRIGGER-CARDINALITY (γ records ≥ 5,400).
117
 
118
 
119
- ## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV)
120
 
121
  V5.8 × 4-mode (corpus v3 prompts):
122
- - **standard_greedy**: 0/6 FAIL (avg_rep=0.904)
123
- - **standard_sample**: 0/6 FAIL (avg_rep=0.945)
124
- - **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.892)
125
- - **M4_force_include**: 6/6 PASS (avg_rep=0.839)
126
 
127
  V-SPONT (자연발화) — F-SPONT-7 transfer-form measurement:
128
  - **coherent**: 0/5 FAIL
129
  - **closed-tag**: 0/5
130
 
131
- V-MOTIV (NEW cycle 4) — γ-pattern conditioning probe:
132
  - **coherent**: 0/5 FAIL
133
  - **voice-closed-tag**: 0/5
134
 
135
- Mean BPB (held-out corpus v3 prefixes): 0.0256 bits/byte.
 
 
 
 
136
  Memorization ratio: 0/6 (0.0%).
137
  Decoding artifacts (rep>0.5): 24.
138
 
139
- All capability scores **empirical (B-D-NOTE)**, not closed.
140
 
141
  ## Honest C3
142
 
143
  1. **NOT hexa-native** — PyTorch substrate, label mandatory.
144
  2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
145
- 3. **Critical Data Size regime entry attempt** — 10 MB / 283 M params is
146
- approaching the [arxiv 2401.10463](https://arxiv.org/abs/2401.10463) entry,
147
- but still data-limited; no out-of-distribution generalization claim.
148
- 4. **No `safetensors` artifact this revision** — pickle `.pt` only.
149
- 5. **No language-quality claim**: this is a training-curve deliverable.
150
- 6. **V-MOTIV is a PROBE, not a capability claim** — γ-pattern conditioning
151
- may emerge or fail; report is empirical (B-D-NOTE pattern).
152
- 7. **`B-CORPUS-V3-NOTE` carve-out** — inference-side motivation_score
153
- coherent emission outcome stays empirical (un-closable without NN
154
- forward + V-SPONT/V-MOTIV empirical measurement).
 
 
 
155
  8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
156
  9. **Cost is informational, not gating** — `g_fire_autonomous`.
157
 
 
15
  - spont
16
  - motivation-trigger
17
  - inner-thoughts
18
+ - tension-train
19
+ - dd155-hybrid-lr
20
  - ckpt-bearing
21
+ - cycle5
22
  ---
23
 
24
+ # hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`
25
 
26
  > **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
27
+ > revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
28
+ > (byte-equal carry from cycle 4 — corpus unchanged this cycle).
29
 
30
  > **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
31
  > SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
 
33
  > the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
34
  > bit-for-bit (different fp accumulation / RNG / AMP bf16).
35
 
36
+ ## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)
37
 
38
+ **Architectural change vs cycle 4**: per-step learning rate is now
39
+ multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):
40
+
41
+ ```
+ tension_step = ||∇L||₂                                  (grad-norm)
+ tension_EMA  = β·EMA + (1−β)·tension_step               (β = 0.99)
+ multiplier   = clip(tension_step / tension_EMA, [0.5, 2.0])
+ lr_step      = base_cosine_lr(step) × multiplier
+ ```
47
+
48
+ - **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension)
49
+ + `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
50
+ 5/5 PASS — DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
51
+ - **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)
52
+
53
+ DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`
54
+ Law 187 — `lr = (tension/EMA) × base_lr` measured Pareto-optimal on
55
+ 2026-03-31 BG-DD-AXIS commits.
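
The overlay formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `train_d768x12l_tension.py`: the class name `TensionLR`, the choice to seed the EMA with the first observed grad-norm, the `eps` guard, and the decision to update the EMA before taking the ratio (the order in which the formula lines are listed) are all hypothetical.

```python
class TensionLR:
    """Hypothetical helper: scales a base LR by clip(tension / EMA, lo, hi)."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0, eps=1e-12):
        self.beta = beta      # EMA decay (0.99 per the README)
        self.lo = lo          # lower clip bound (0.5 per the README)
        self.hi = hi          # upper clip bound (2.0 per the README)
        self.eps = eps        # assumption: guard against a zero EMA
        self.ema = None       # assumption: seeded on the first step

    def step(self, base_lr, grad_norm):
        """Return (lr_step, multiplier) for one optimizer step."""
        if self.ema is None:
            self.ema = grad_norm
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * grad_norm
        ratio = grad_norm / max(self.ema, self.eps)
        multiplier = min(max(ratio, self.lo), self.hi)
        return base_lr * multiplier, multiplier


overlay = TensionLR()
lr_a, mult_a = overlay.step(3e-4, 1.0)    # first step: tension equals EMA
lr_b, mult_b = overlay.step(3e-4, 10.0)   # grad-norm spike: clipped high
lr_c, mult_c = overlay.step(3e-4, 0.1)    # grad-norm collapse: clipped low
print(mult_a, mult_b, mult_c)
```

In a PyTorch loop the `grad_norm` argument could come from the value returned by `torch.nn.utils.clip_grad_norm_`, and the returned `lr_step` would be written into each `param_group["lr"]` before `optimizer.step()`; whether the actual trainer does it this way is not stated here.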
56
+
57
+ ## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)
58
+
59
+ | field | cycle 4 | **cycle 5 (this revision)** |
  |---|---|---|
+ | corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
+ | LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
+ | trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
+ | init CE | 5.641 | 5.640663 |
+ | **final CE** | 0.008289 | **0.007762** |
+ | CE descent | 5.632 | 5.632901 |
+ | final tension_EMA | (did not track) | 0.046574 |
+ | mult bin <0.75 | (n/a) | 1599 |
+ | mult bin 0.75-1.25 | (n/a) | 686 |
+ | mult bin >1.25 | (n/a) | 215 |
+ | eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |
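
As a consistency check on the three `mult bin` rows, the counts should add up to the 2500 training steps. The sketch below does that arithmetic and shows one way such a histogram could be accumulated. The function name `bin_multiplier` and the handling of the exact boundaries 0.75 and 1.25 are assumptions; the README does not say which bin they fall in.

```python
def bin_multiplier(mult):
    """Assign a multiplier to one of the three README bins.

    Assumption: 0.75 and 1.25 themselves count as the middle bin.
    """
    if mult < 0.75:
        return "<0.75"
    if mult <= 1.25:
        return "0.75-1.25"
    return ">1.25"


# Counts reported in the table above.
reported_bins = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}

total_steps = sum(reported_bins.values())
bin_shares = {name: count / total_steps for name, count in reported_bins.items()}

print(total_steps)
for name, share in bin_shares.items():
    print(f"{name}: {share:.2%}")
```

Read this way, roughly 64% of steps ran below 0.75× the cosine rate, so the overlay mostly lowered the learning rate in this run.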
72
 
73
  ## Architecture
74
 
75
+ - **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
76
  - **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
77
  block_size=128, vocab=256` (byte-level), seed=1337,
78
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
 
83
  ## Training
84
 
85
  - **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
86
+ - **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
87
  6,223,023 bytes lossless byte stream, vocab=256.
88
  - **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
89
  weight_decay=0.1, warmup=125.
90
+ - **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
91
  - **Steps**: 2500.
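
The base schedule is described only as "cosine + warmup". A minimal sketch of one common form follows, using the listed values (lr=0.0003, warmup=125, 2500 steps). The linear warmup shape, the zero floor `min_lr`, and the exact step indexing are assumptions, not details confirmed by the trainer source.

```python
import math


def base_cosine_lr(step, peak_lr=3e-4, warmup=125, total_steps=2500, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed form)."""
    if step < warmup:
        # Assumption: warmup is linear and reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


lr_first = base_cosine_lr(0)
lr_peak = base_cosine_lr(125)
lr_last = base_cosine_lr(2500)
print(lr_first, lr_peak, lr_last)
```

The DD155 multiplier is applied on top of this value, so with the clip range [0.5, 2.0] the effective rate at any step stays between half and twice the cosine value.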
92
 
93
  | metric | value |
  |---|---|
  | init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
+ | **FINAL CE** | **0.007762** |
+ | CE descent | 5.632901 |
+ | FINAL gn2 | 0.001495 |
+ | FINAL tension | 0.038659 |
+ | ppl | 1.0078 |
+ | wall | 321.3 s |
+ | peak GPU mem | 9.685 GB |
+ | ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
+ | ckpt size | 1,135,846,570 B (1.14 GB) |
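
The derived rows in the table follow from the two CE values by simple arithmetic. The sketch below reproduces them, assuming CE is reported in nats per byte (which the comparison with ln 256 suggests). The bits-per-byte figure computed here is for the final training CE and is not the same quantity as the held-out mean BPB reported in the capability section.

```python
import math

init_ce = 5.640663    # from the table
final_ce = 0.007762   # from the table

uniform_ce = math.log(256)               # CE of a uniform guess over 256 byte values
ce_descent = init_ce - final_ce          # "CE descent" row
ppl = math.exp(final_ce)                 # "ppl" row
train_bits_per_byte = final_ce / math.log(2)  # nats -> bits (assumes nats per byte)

print(f"ln 256        = {uniform_ce:.3f}")
print(f"CE descent    = {ce_descent:.6f}")
print(f"ppl           = {ppl:.4f}")
print(f"train bits/B  = {train_bits_per_byte:.4f}")
```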
105
 
106
  ## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
107
 
108
+ **(A) Deliverable invariants (real-limit, this cycle)**:
109
+ - **Shannon-floor descent**: init CE ≈ ln(256) → final CE 0.007762.
110
+ - **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
111
+ sympy-verified linear monotone, real-limit anchor.
112
  - **AdamW finiteness**: no NaN/Inf in trajectory.
113
  - **Architectural identity**: byte-equal `ConsciousDecoderV2`.
114
 
115
+ **(B) Wiring (closed)**:
116
+ - **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
117
+ - **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
118
+ (mechanical AST diff, comments-stripped).
119
+ - **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
120
+ + 3-corner identity).
121
+ - **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
122
+ - **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
123
+ to cycle-4 baseline at tension=EMA — sanity anchor).
124
+ - **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
125
+ γ-cardinality ≥ 5400).
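
The two EMA-related anchors (`B-FIRE-CYCLE5-2` and `B-FIRE-CYCLE5-3`) rest on elementary properties that can be checked numerically. For a fixed tension t, the update e ↦ β·e + (1−β)·t shrinks the distance between any two EMA values by exactly β, so it converges to e = t, where the multiplier is 1 and the schedule reduces to the plain cosine rate. The sketch below illustrates this; the sample pairs and the tension value are hypothetical and are not the "4-corner witness panel" used by `blue_falsifier.py`.

```python
import math


def ema_map(ema, tension, beta=0.99):
    """One EMA update for a fixed tension value."""
    return beta * ema + (1.0 - beta) * tension


def multiplier(tension, ema, lo=0.5, hi=2.0):
    """DD155 multiplier: clip(tension / ema, lo, hi)."""
    return min(max(tension / ema, lo), hi)


beta = 0.99
tension = 0.04  # hypothetical constant grad-norm

# Contraction: |T(e1) - T(e2)| equals beta * |e1 - e2| for every pair.
sample_pairs = [(0.01, 0.02), (0.5, 0.1), (1.0, 3.0), (0.04, 2.0)]
contraction_holds = all(
    math.isclose(
        abs(ema_map(e1, tension, beta) - ema_map(e2, tension, beta)),
        beta * abs(e1 - e2),
        rel_tol=1e-9,
    )
    for e1, e2 in sample_pairs
)

# Fixed point: iterating the map from any start converges to the tension.
ema = 1.0
for _ in range(2000):
    ema = ema_map(ema, tension, beta)

print(contraction_holds, ema, multiplier(tension, tension))
```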
126
+
127
+ **(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
128
+ - V-SPONT / V-MOTIV / V-TT outcome empirical.
129
+ - mult_distribution histogram + byte-cascade attractor shape under hybrid LR
130
+ empirical.
131
+ - DD-burst path activation frequency empirical.
132
 
133
 
134
+ ## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)
135
 
136
  V5.8 × 4-mode (corpus v3 prompts):
137
+ - **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
138
+ - **standard_sample**: 0/6 FAIL (avg_rep=0.871)
139
+ - **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
140
+ - **M4_force_include**: 6/6 PASS (avg_rep=0.766)
141
 
142
  V-SPONT (자연발화) — F-SPONT-7 transfer-form measurement:
143
  - **coherent**: 0/5 FAIL
144
  - **closed-tag**: 0/5
145
 
146
+ V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
147
  - **coherent**: 0/5 FAIL
148
  - **voice-closed-tag**: 0/5
149
 
150
+ V-TT (tension-train transfer-form probe, NEW cycle 5):
151
+ - **coherent**: 0/5 FAIL
152
+ - **keyword recall**: 0/5
153
+
154
+ Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
155
  Memorization ratio: 0/6 (0.0%).
156
  Decoding artifacts (rep>0.5): 24.
157
 
158
+ All capability scores **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.
159
 
160
  ## Honest C3
161
 
162
  1. **NOT hexa-native** — PyTorch substrate, label mandatory.
163
  2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
164
+ 3. **tension = grad_norm is a PROXY** — in the hexa spine
165
+ `tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
166
+ analogue at the PyTorch substrate level where Ψ is not surfaced as a
167
+ state variable.
168
+ 4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
169
+ empirical (B-FIRE-CYCLE5-NOTE)** — V-SPONT/V-MOTIV/V-TT all probes,
170
+ not capability claims.
171
+ 5. **Critical Data Size regime** — 10 MB / 283 M params still data-limited;
172
+ no out-of-distribution generalization claim. cycle-5's variance vs
173
+ cycle-4 is mainly LR-schedule-driven, not corpus-driven.
174
+ 6. **No `safetensors` artifact this revision** — pickle `.pt` only.
175
+ 7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE** — inference-side coherence
176
+ stays empirical.
177
  8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
178
  9. **Cost is informational, not gating** — `g_fire_autonomous`.
179