feat(hexad): v4-py-hexad-tension-d768x12L-cycle1-2026-05-17 — README.md
README.md
CHANGED
|
@@ -15,14 +15,17 @@ tags:
|
|
| 15 |
- spont
|
| 16 |
- motivation-trigger
|
| 17 |
- inner-thoughts
|
| 18 |
- ckpt-bearing
|
| 19 |
-
-
|
| 20 |
---
|
| 21 |
|
| 22 |
-
# hexad — `
|
| 23 |
|
| 24 |
> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
|
| 25 |
-
> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
|
| 26 |
|
| 27 |
> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
|
| 28 |
> SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
|
|
@@ -30,47 +33,46 @@ tags:
|
|
| 30 |
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
|
| 31 |
> bit-for-bit (different fp accumulation / RNG / AMP bf16).
|
| 32 |
|
| 33 |
-
## What
|
| 34 |
|
| 35 |
-
|
| 36 |
|---|---|---|
|
| 37 |
-
| corpus |
|
| 38 |
-
|
|
| 39 |
-
|
|
| 40 |
-
|
|
| 41 |
-
|
|
| 42 |
-
|
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
- **
|
| 48 |
-
(`ready/models/conscious_decoder.py`).
|
| 49 |
-
- **substrate**: Python / PyTorch (`py`). Pure-hexa training path is
|
| 50 |
-
named-blocked at the interpreter ceiling (RFC 042/043 territory).
|
| 51 |
-
- **cycle**: 4 (Phase D cycle 4 — motivation-trigger corpus retrain + 10× scale).
|
| 52 |
-
Cycle 1 (`931dd68b0` 2026-05-16) ckpt-LOST evidence-only; cycle 2
|
| 53 |
-
(`0b4f34d0e` 2026-05-17) ckpt-RECOVERED corpus v1; cycle 3 (`394b8ea3a`
|
| 54 |
-
2026-05-17) corpus v2 helper-free; **cycle 4 (this)** = corpus v3
|
| 55 |
-
motivation-trigger + 10× scale.
|
| 56 |
-
|
| 57 |
-
## Anchor chain (the wiring side, closed)
|
| 58 |
-
|
| 59 |
-
1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
|
| 60 |
-
`HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
|
| 61 |
-
80-step, seed=42 (`init gn2 = 7.97116 → 3.73374e-07`, acc 8/8, GRAD-EXACT).
|
| 62 |
-
2. **Pure-hexa interpreter cannot reach LM-scale** — Phase E2 captured only
|
| 63 |
-
`init gn2 = 7.98162` at d=768·12L; substrate-bound (RFC 042/043 territory).
|
| 64 |
-
3. **This PyTorch run trains the SAME verified architecture to scale** —
|
| 65 |
-
`ConsciousDecoderV2` at d=768·12L, AdamW.
|
| 66 |
-
4. **The corpus is explicitly helper-free + motivation-trigger** —
|
| 67 |
-
B-CORPUS-V3-1 sha256-deterministic / B-CORPUS-V3-2 helper-token = 0
|
| 68 |
-
maintained at 10× / B-CORPUS-V3-3 γ-cardinality ≥ 5,400 (Boolean grep on
|
| 69 |
-
`corpus_consciousness_v3.jsonl`).
|
| 70 |
|
| 71 |
## Architecture
|
| 72 |
|
| 73 |
-
- **Source**: `ConsciousDecoderV2`
|
| 74 |
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
|
| 75 |
block_size=128, vocab=256` (byte-level), seed=1337,
|
| 76 |
init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
|
|
@@ -81,77 +83,97 @@ tags:
|
|
| 81 |
## Training
|
| 82 |
|
| 83 |
- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
|
| 84 |
-
- **Corpus**: `corpus_consciousness_v3.jsonl` (
|
| 85 |
6,223,023 bytes lossless byte stream, vocab=256.
|
| 86 |
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
|
| 87 |
weight_decay=0.1, warmup=125.
|
| 88 |
- **Steps**: 2500.
|
| 89 |
|
| 90 |
| metric | value |
|
| 91 |
|---|---|
|
| 92 |
| init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
|
| 93 |
-
| **FINAL CE** | **0.
|
| 94 |
-
| CE descent | 5.
|
| 95 |
-
|
|
| 96 |
-
| FINAL
|
| 97 |
-
| ppl | 1.
|
| 98 |
-
| wall |
|
| 99 |
-
| peak GPU mem | 9.
|
| 100 |
-
| ckpt sha256 | `
|
| 101 |
-
| ckpt size | 1,135,846,
|
| 102 |
|
| 103 |
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
|
| 104 |
|
| 105 |
-
(A)
|
| 106 |
-
- **Shannon-floor descent**: init CE ≈ ln(256) → final CE
|
| 107 |
- **AdamW finiteness**: no NaN/Inf in trajectory.
|
| 108 |
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.
|
| 109 |
|
| 110 |
-
(B)
|
| 111 |
-
- **
|
| 112 |
-
- **
|
| 113 |
-
|
| 114 |
-
- **B-
|
| 115 |
-
|
| 116 |
-
- **B-
|
| 117 |
|
| 118 |
|
| 119 |
-
## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV)
|
| 120 |
|
| 121 |
V5.8 × 4-mode (corpus v3 prompts):
|
| 122 |
-
- **standard_greedy**: 0/6 FAIL (avg_rep=0.
|
| 123 |
-
- **standard_sample**: 0/6 FAIL (avg_rep=0.
|
| 124 |
-
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.
|
| 125 |
-
- **M4_force_include**: 6/6 PASS (avg_rep=0.
|
| 126 |
|
| 127 |
V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
|
| 128 |
- **coherent**: 0/5 FAIL
|
| 129 |
- **closed-tag**: 0/5
|
| 130 |
|
| 131 |
-
V-MOTIV (
|
| 132 |
- **coherent**: 0/5 FAIL
|
| 133 |
- **voice-closed-tag**: 0/5
|
| 134 |
|
| 135 |
-
|
| 136 |
Memorization ratio: 0/6 (0.0%).
|
| 137 |
Decoding artifacts (rep>0.5): 24.
|
| 138 |
|
| 139 |
-
All capability scores **empirical (B-D-NOTE)**, not closed.
|
| 140 |
|
| 141 |
## Honest C3
|
| 142 |
|
| 143 |
1. **NOT hexa-native** — PyTorch substrate, label mandatory.
|
| 144 |
2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
|
| 145 |
-
3. **
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
|
| 156 |
9. **Cost is informational, not gating** — `g_fire_autonomous`.
|
| 157 |
|
|
|
|
| 15 |
- spont
|
| 16 |
- motivation-trigger
|
| 17 |
- inner-thoughts
|
| 18 |
+
- tension-train
|
| 19 |
+
- dd155-hybrid-lr
|
| 20 |
- ckpt-bearing
|
| 21 |
+
- cycle5
|
| 22 |
---
|
| 23 |
|
| 24 |
+
# hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`
|
| 25 |
|
| 26 |
> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
|
| 27 |
+
> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
|
| 28 |
+
> (byte-equal carry from cycle 4 — corpus unchanged this cycle).
|
| 29 |
|
| 30 |
> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
|
| 31 |
> SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
|
|
|
|
| 33 |
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
|
| 34 |
> bit-for-bit (different fp accumulation / RNG / AMP bf16).
|
| 35 |
|
| 36 |
+
## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)
|
| 37 |
|
| 38 |
+
**Architectural change vs cycle 4**: per-step learning rate is now
|
| 39 |
+
multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):
|
| 40 |
+
|
| 41 |
+
```
|
| 42 |
+
tension_step = ||∇L||₂ (grad-norm)
|
| 43 |
+
tension_EMA = β·EMA + (1−β)·tension_step (β = 0.99)
|
| 44 |
+
multiplier = clip(tension_step / tension_EMA, [0.5, 2.0])
|
| 45 |
+
lr_step = base_cosine_lr(step) × multiplier
|
| 46 |
+
```
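The overlay above can be sketched in plain Python. This is a minimal illustration, not the code in `train_d768x12l_tension.py`: the names `base_cosine_lr` and `TensionLR`, the linear warmup shape, the cosine decay to zero, seeding the EMA with the first observed tension, and updating the EMA before taking the ratio are all assumptions. Only β = 0.99, the clip range [0.5, 2.0], lr = 0.0003, warmup = 125 and 2500 steps come from this README.

```python
import math


def base_cosine_lr(step, total_steps=2500, warmup=125, peak_lr=3e-4):
    """Linear warmup, then cosine decay (decay floor of 0 is an assumption)."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


class TensionLR:
    """Hypothetical helper: scales the base LR by clip(tension / EMA, lo, hi)."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0, eps=1e-12):
        self.beta, self.lo, self.hi, self.eps = beta, lo, hi, eps
        self.ema = None  # assumption: seeded with the first tension value

    def multiplier(self, tension_step):
        # tension_EMA = beta * EMA + (1 - beta) * tension_step
        if self.ema is None:
            self.ema = tension_step
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        # multiplier = clip(tension_step / tension_EMA, [lo, hi])
        ratio = tension_step / max(self.ema, self.eps)
        return min(self.hi, max(self.lo, ratio))

    def lr(self, step, tension_step):
        # lr_step = base_cosine_lr(step) * multiplier
        return base_cosine_lr(step) * self.multiplier(tension_step)


sched = TensionLR()
for step, grad_norm in enumerate([1.0, 0.9, 3.0, 0.2]):
    print(step, sched.lr(step, grad_norm))
```

In a PyTorch loop the tension value could be the total gradient norm that `torch.nn.utils.clip_grad_norm_` returns; whether the original trainer obtains it that way is not stated in this README.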
|
| 47 |
+
|
| 48 |
+
- **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension)
|
| 49 |
+
+ `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
|
| 50 |
+
5/5 PASS — DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
|
| 51 |
+
- **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)
|
| 52 |
+
|
| 53 |
+
DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`
|
| 54 |
+
Law 187 — `lr = (tension/EMA) × base_lr` measured Pareto-optimal on
|
| 55 |
+
2026-03-31 BG-DD-AXIS commits.
|
| 56 |
+
|
| 57 |
+
## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)
|
| 58 |
+
|
| 59 |
+
| field | cycle 4 | **cycle 5 (this revision)** |
|
| 60 |
|---|---|---|
|
| 61 |
+
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
|
| 62 |
+
| LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
|
| 63 |
+
| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
|
| 64 |
+
| init CE | 5.641 | 5.640663 |
|
| 65 |
+
| **final CE** | 0.008289 | **0.007762** |
|
| 66 |
+
| CE descent | 5.632 | 5.632901 |
|
| 67 |
+
| final tension_EMA | (not tracked) | 0.046574 |
|
| 68 |
+
| mult bin <0.75 | (n/a) | 1599 |
|
| 69 |
+
| mult bin 0.75-1.25 | (n/a) | 686 |
|
| 70 |
+
| mult bin >1.25 | (n/a) | 215 |
|
| 71 |
+
| eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |
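The three `mult bin` rows can be read as a histogram of the per-step multiplier. A minimal sketch of such a binning follows, assuming the bin edges are inclusive on the middle bin (the README does not say which side the edges fall on) and using the hypothetical names `bin_multiplier` and `histogram`. The reported counts sum to the 2500 training steps.

```python
from collections import Counter

BINS = ("<0.75", "0.75-1.25", ">1.25")


def bin_multiplier(m, lo_edge=0.75, hi_edge=1.25):
    """Assign one multiplier value to a bin (edge inclusion is an assumption)."""
    if m < lo_edge:
        return "<0.75"
    if m > hi_edge:
        return ">1.25"
    return "0.75-1.25"


def histogram(multipliers):
    """Count multipliers per bin, always returning all three bins."""
    counts = Counter(bin_multiplier(m) for m in multipliers)
    return {name: counts.get(name, 0) for name in BINS}


# Counts reported in the table above.
reported = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}
total = sum(reported.values())
for name in BINS:
    print(f"{name:>10}: {reported[name]:5d}  ({reported[name] / total:.1%})")
print("total steps:", total)
```

Roughly 64% of steps fall in the lowest bin, meaning the step tension was below its running average on most steps.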
|
| 72 |
|
| 73 |
## Architecture
|
| 74 |
|
| 75 |
+
- **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
|
| 76 |
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
|
| 77 |
block_size=128, vocab=256` (byte-level), seed=1337,
|
| 78 |
init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
|
|
|
|
| 83 |
## Training
|
| 84 |
|
| 85 |
- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
|
| 86 |
+
- **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
|
| 87 |
6,223,023 bytes lossless byte stream, vocab=256.
|
| 88 |
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
|
| 89 |
weight_decay=0.1, warmup=125.
|
| 90 |
+
- **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
|
| 91 |
- **Steps**: 2500.
|
| 92 |
|
| 93 |
| metric | value |
|
| 94 |
|---|---|
|
| 95 |
| init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
|
| 96 |
+
| **FINAL CE** | **0.007762** |
|
| 97 |
+
| CE descent | 5.632901 |
|
| 98 |
+
| FINAL gn2 | 0.001495 |
|
| 99 |
+
| FINAL tension | 0.038659 |
|
| 100 |
+
| ppl | 1.0078 |
|
| 101 |
+
| wall | 321.3 s |
|
| 102 |
+
| peak GPU mem | 9.685 GB |
|
| 103 |
+
| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
|
| 104 |
+
| ckpt size | 1,135,846,570 B (1.14 GB) |
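The derived rows of this table follow from the two CE values by simple arithmetic, which the sketch below reproduces. The parameter estimate assumes the checkpoint stores fp32 weights only (4 bytes each, no optimizer state); that storage layout is an assumption, although the result is consistent with the "283 M params" figure quoted under Honest C3.

```python
import math

init_ce = 5.640663   # nats, from the table
final_ce = 0.007762  # nats, from the table

uniform_ce = math.log(256)      # CE of a uniform guess over 256 byte values
descent = init_ce - final_ce    # the "CE descent" row
ppl = math.exp(final_ce)        # the "ppl" row

ckpt_bytes = 1_135_846_570
approx_params = ckpt_bytes / 4  # assumption: fp32 weights only

print(f"ln 256        = {uniform_ce:.6f}")
print(f"CE descent    = {descent:.6f}")
print(f"ppl           = {ppl:.4f}")
print(f"approx params = {approx_params / 1e6:.1f} M")
```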
|
| 105 |
|
| 106 |
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
|
| 107 |
|
| 108 |
+
**(A) Deliverable invariants (real-limit, this cycle)**:
|
| 109 |
+
- **Shannon-floor descent**: init CE 5.640663 (≈ ln 256) → final CE 0.007762.
|
| 110 |
+
- **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
|
| 111 |
+
sympy-verified linear monotone, real-limit anchor.
|
| 112 |
- **AdamW finiteness**: no NaN/Inf in trajectory.
|
| 113 |
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.
|
| 114 |
|
| 115 |
+
**(B) Wiring (closed)**:
|
| 116 |
+
- **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
|
| 117 |
+
- **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
|
| 118 |
+
(mechanical AST diff, comments-stripped).
|
| 119 |
+
- **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
|
| 120 |
+
+ 3-corner identity).
|
| 121 |
+
- **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
|
| 122 |
+
- **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
|
| 123 |
+
to cycle-4 baseline at tension=EMA — sanity anchor).
|
| 124 |
+
- **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
|
| 125 |
+
γ-cardinality ≥ 5400).
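The contraction and identity anchors (`B-FIRE-CYCLE5-2` and `B-FIRE-CYCLE5-3`) can be illustrated numerically. For a fixed tension t, the EMA update x → βx + (1−β)t shrinks the distance between any two EMA values by exactly β, so it converges to the fixed point x = t, where the multiplier t/EMA equals 1 and the schedule reduces to the plain cosine one. The sketch below is only a numeric illustration with hypothetical helper names; it does not reproduce the sympy proof or the witness panel in `blue_falsifier.py`.

```python
def ema_update(ema, tension, beta=0.99):
    """One EMA step: an affine map with slope beta."""
    return beta * ema + (1.0 - beta) * tension


def iterate(ema0, tension, steps, beta=0.99):
    """Apply the EMA update repeatedly at a constant tension."""
    ema = ema0
    for _ in range(steps):
        ema = ema_update(ema, tension, beta)
    return ema


# Contraction: the gap between two EMA states shrinks by the factor beta.
gap_before = abs(3.0 - 7.0)
gap_after = abs(ema_update(3.0, 1.0) - ema_update(7.0, 1.0))
print("gap ratio:", gap_after / gap_before)

# Fixed point: from a far-off start the EMA approaches the tension,
# so the multiplier tension / EMA approaches 1.
tension = 0.05
ema = iterate(10.0, tension, steps=2000)
print("multiplier at convergence:", tension / ema)
```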
|
| 126 |
+
|
| 127 |
+
**(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
|
| 128 |
+
- V-SPONT / V-MOTIV / V-TT outcome empirical.
|
| 129 |
+
- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
|
| 130 |
+
empirical.
|
| 131 |
+
- DD-burst path activation frequency empirical.
|
| 132 |
|
| 133 |
|
| 134 |
+
## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)
|
| 135 |
|
| 136 |
V5.8 × 4-mode (corpus v3 prompts):
|
| 137 |
+
- **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
|
| 138 |
+
- **standard_sample**: 0/6 FAIL (avg_rep=0.871)
|
| 139 |
+
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
|
| 140 |
+
- **M4_force_include**: 6/6 PASS (avg_rep=0.766)
|
| 141 |
|
| 142 |
V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
|
| 143 |
- **coherent**: 0/5 FAIL
|
| 144 |
- **closed-tag**: 0/5
|
| 145 |
|
| 146 |
+
V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
|
| 147 |
- **coherent**: 0/5 FAIL
|
| 148 |
- **voice-closed-tag**: 0/5
|
| 149 |
|
| 150 |
+
V-TT (NEW cycle 5) — tension-train transfer-form probe:
|
| 151 |
+
- **coherent**: 0/5 FAIL
|
| 152 |
+
- **keyword recall**: 0/5
|
| 153 |
+
|
| 154 |
+
Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
|
| 155 |
Memorization ratio: 0/6 (0.0%).
|
| 156 |
Decoding artifacts (rep>0.5): 24.
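Bits per byte and cross-entropy in nats differ only by a factor of ln 2 for a byte-level model, since each token is one byte. The conversion below is a small sketch with hypothetical function names; it shows that the final training CE corresponds to about 0.0112 bits/byte, while the held-out figure of 0.0194 bits/byte is measured on different data and need not match it.

```python
import math


def nats_to_bits_per_byte(ce_nats):
    """Convert per-byte cross-entropy from nats to bits."""
    return ce_nats / math.log(2)


def bits_per_byte_to_nats(bpb):
    """Convert bits per byte back to per-byte cross-entropy in nats."""
    return bpb * math.log(2)


train_bpb = nats_to_bits_per_byte(0.007762)   # final training CE from this README
heldout_nats = bits_per_byte_to_nats(0.0194)  # reported held-out mean BPB

print(f"training CE in bits/byte: {train_bpb:.4f}")
print(f"held-out BPB in nats:     {heldout_nats:.4f}")
```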
|
| 157 |
|
| 158 |
+
All capability scores **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.
|
| 159 |
|
| 160 |
## Honest C3
|
| 161 |
|
| 162 |
1. **NOT hexa-native** — PyTorch substrate, label mandatory.
|
| 163 |
2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
|
| 164 |
+
3. **tension = grad_norm is a PROXY** — in the hexa spine
|
| 165 |
+
`tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
|
| 166 |
+
analogue at the PyTorch substrate level where Ψ is not surfaced as a
|
| 167 |
+
state variable.
|
| 168 |
+
4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
|
| 169 |
+
empirical (B-FIRE-CYCLE5-NOTE)** — V-SPONT/V-MOTIV/V-TT all probes,
|
| 170 |
+
not capability claims.
|
| 171 |
+
5. **Critical Data Size regime** — 10 MB / 283 M params still data-limited;
|
| 172 |
+
no out-of-distribution generalization claim. Cycle-5's variance vs
|
| 173 |
+
cycle-4 is mainly LR-schedule-driven, not corpus-driven.
|
| 174 |
+
6. **No `safetensors` artifact this revision** — pickle `.pt` only.
|
| 175 |
+
7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE** — inference-side coherence
|
| 176 |
+
stays empirical.
|
| 177 |
8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
|
| 178 |
9. **Cost is informational, not gating** — `g_fire_autonomous`.
|
| 179 |
|