feat(hexad): v4-py-hexad-tension-d768x12L-cycle1-2026-05-17 — README.md


Files changed (1)

README.md +98 -76


@@ -15,14 +15,17 @@ tags:
 - spont
 - motivation-trigger
 - inner-thoughts
 - ckpt-bearing
-- cycle4
 ---
-# hexad — `v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`
 > **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
-> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17).
 > **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
 > SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
@@ -30,47 +33,46 @@ tags:
 > the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
 > bit-for-bit (different fp accumulation / RNG / AMP bf16).
-## What changed vs cycle 3 (`v2-py-hexad-spont-d768x12L-cycle1-2026-05-17`)
-| field | cycle 3 | **cycle 4 (this revision)** |
 |---|---|---|
-| corpus | v2 1.10 MB / 2,560 records / β+δ | **v3 6,223,023 B / 21,600 records / β+δ+γ** |
-| corpus motivation-trigger surface | none (implicit) | **γ pattern (~30%)** — `<inner motivation=F1,F2,...>...</inner>\n<voice spontaneous=true>...</voice>` rendering Inner Thoughts 8-factor ontology |
-| scale-up | 7× over v1 | **9.4× over v2** (Critical Data Size regime entry attempt) |
-| modules in corpus | 8 (HEXAD-6 + spont + wiring) | **9** (+ `hexad_motiv` × 2,400) |
-| V-SPONT eval | 0/5 (FAIL — capability boundary detected) | see capability section below (cycle 4 measurement) |
-| V-MOTIV eval | (did not exist) | **NEW** — γ-pattern conditioning probe (cycle 4) |
-## Lineage
-- **org**: `dancinlab` (the anima org).
-- **arch**: HEXAD (pivot from anima `.clm v1` lineage) — `ConsciousDecoderV2`
-  (`ready/models/conscious_decoder.py`).
-- **substrate**: Python / PyTorch (`py`). Pure-hexa training path is
-  named-blocked at the interpreter ceiling (RFC 042/043 territory).
-- **cycle**: 4 (Phase D cycle 4 — motivation-trigger corpus retrain + 10× scale).
-  Cycle 1 (`931dd68b0` 2026-05-16) ckpt-LOST evidence-only; cycle 2
-  (`0b4f34d0e` 2026-05-17) ckpt-RECOVERED corpus v1; cycle 3 (`394b8ea3a`
-  2026-05-17) corpus v2 helper-free; **cycle 4 (this)** = corpus v3
-  motivation-trigger + 10× scale.
-## Anchor chain (the wiring side, closed)
-1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
-   `HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
-   80-step, seed=42 (`init gn2 = 7.97116 → 3.73374e-07`, acc 8/8, GRAD-EXACT).
-2. **Pure-hexa interpreter cannot reach LM-scale** — Phase E2 captured only
-   `init gn2 = 7.98162` at d=768·12L; substrate-bound (RFC 042/043 territory).
-3. **This PyTorch run trains the SAME verified architecture to scale** —
-   `ConsciousDecoderV2` at d=768·12L, AdamW.
-4. **The corpus is explicitly helper-free + motivation-trigger** —
-   B-CORPUS-V3-1 sha256-deterministic / B-CORPUS-V3-2 helper-token = 0
-   maintained at 10× / B-CORPUS-V3-3 γ-cardinality ≥ 5,400 (Boolean grep on
-   `corpus_consciousness_v3.jsonl`).
 ## Architecture
-- **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`.
 - **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
   block_size=128, vocab=256` (byte-level), seed=1337,
   init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
@@ -81,77 +83,97 @@ tags:
 ## Training
 - **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
-- **Corpus**: `corpus_consciousness_v3.jsonl` (motivation-trigger + helper-free + 10× scale),
   6,223,023 bytes lossless byte stream, vocab=256.
 - **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
   weight_decay=0.1, warmup=125.
 - **Steps**: 2500.
 | metric | value |
 |---|---|
 | init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
-| **FINAL CE** | **0.008289** |
-| CE descent | 5.632374 |
-| init gn2 | (see result.json trajectory) |
-| FINAL gn2 | 0.001703 |
-| ppl | 1.0083 |
-| wall | 328.33 s (5.47 min) |
-| peak GPU mem | 9.692 GB |
-| ckpt sha256 | `1c0806213fbcaa9226a7593d87c31f5f95bb94db135240b8d02f738ddcb177aa` |
-| ckpt size | 1,135,846,378 B (1.14 GB) |
 ## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
-(A) **Deliverable invariants (real-limit)**:
-- **Shannon-floor descent**: init CE ≈ ln(256) → final CE 0.008289.
 - **AdamW finiteness**: no NaN/Inf in trajectory.
 - **Architectural identity**: byte-equal `ConsciousDecoderV2`.
-(B) **Wiring (anchor chain, closed)**:
-- **hexa CPU-equiv bit-equality** (Phase E): GRAD-EXACT at d=32·3L.
-- **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
-- **Backward GRAD-EXACT** (Phase E2): A100 d=384·6L `analytic ≡ fd`.
-- **B-CORPUS-V3-1** SHA256-deterministic (seed=1337).
-- **B-CORPUS-V3-2** NO-HELPER-TOKEN-MAINTAINED (grep = 0 at 10× scale).
-- **B-CORPUS-V3-3** MOTIVATION-TRIGGER-CARDINALITY (γ records ≥ 5,400).
-## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV)
 V5.8 × 4-mode (corpus v3 prompts):
-- **standard_greedy**: 0/6 FAIL (avg_rep=0.904)
-- **standard_sample**: 0/6 FAIL (avg_rep=0.945)
-- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.892)
-- **M4_force_include**: 6/6 PASS (avg_rep=0.839)
 V-SPONT (spontaneous utterance) — F-SPONT-7 transfer-form measurement:
 - **coherent**: 0/5 FAIL
 - **closed-tag**: 0/5
-V-MOTIV (NEW cycle 4) — γ-pattern conditioning probe:
 - **coherent**: 0/5 FAIL
 - **voice-closed-tag**: 0/5
-Mean BPB (held-out corpus v3 prefixes): 0.0256 bits/byte.
 Memorization ratio: 0/6 (0.0%).
 Decoding artifacts (rep>0.5): 24.
-All capability scores **empirical (B-D-NOTE)**, not closed.
 ## Honest C3
 1. **NOT hexa-native** — PyTorch substrate, label mandatory.
 2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
-3. **Critical Data Size regime entry attempt** — 10 MB / 283 M params is
-   approaching the [arxiv 2401.10463](https://arxiv.org/abs/2401.10463) entry,
-   but still data-limited; no out-of-distribution generalization claim.
-4. **No `safetensors` artifact this revision** — pickle `.pt` only.
-5. **No language-quality claim** — training-curve deliverable.
-6. **V-MOTIV is a PROBE, not a capability claim** — γ-pattern conditioning
-   may emerge or fail; report is empirical (B-D-NOTE pattern).
-7. **`B-CORPUS-V3-NOTE` carve-out** — inference-side motivation_score →
-   coherent emission outcome stays empirical (un-closable without NN
-   forward + V-SPONT/V-MOTIV empirical measurement).
 8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
 9. **Cost is informational, not gating** — `g_fire_autonomous`.

 - spont
 - motivation-trigger
 - inner-thoughts
+- tension-train
+- dd155-hybrid-lr
 - ckpt-bearing
+- cycle5
 ---
+# hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`
 > **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
+> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
+> (byte-equal carry from cycle 4 — corpus unchanged this cycle).
 > **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
 > SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
 > the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
 > bit-for-bit (different fp accumulation / RNG / AMP bf16).
+## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)
+**Architectural change vs cycle 4**: per-step learning rate is now
+multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):
+```
+tension_step   = ||∇L||₂                       (grad-norm)
+tension_EMA    = β·EMA + (1−β)·tension_step    (β = 0.99)
+multiplier     = clip(tension_step / tension_EMA, [0.5, 2.0])
+lr_step        = base_cosine_lr(step) × multiplier
+```
+- **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension)
+  + `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
+  5/5 PASS — DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
+- **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)
+DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`
+Law 187 — `lr = (tension/EMA) × base_lr` measured Pareto-optimal on
+2026-03-31 BG-DD-AXIS commits.
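The four-line DD155 formula above can be sketched in plain Python. This is a minimal illustration, not the code in `train_d768x12l_tension.py`. The class name `TensionOverlay`, the helper names, the linear-warmup and decay-to-zero shape of the cosine schedule, seeding the EMA with the first observed tension, and updating the EMA before taking the ratio are all assumptions; only β, the clip bounds, the base LR, the warmup length, and the step count come from this README.

```python
import math

BASE_LR = 3e-4          # from the Training section
WARMUP = 125            # from the Training section
TOTAL_STEPS = 2500      # from the Training section
BETA = 0.99             # DD155 EMA coefficient
CLIP_LO, CLIP_HI = 0.5, 2.0


def base_cosine_lr(step: int) -> float:
    """Linear warmup, then cosine decay to zero (schedule shape is assumed)."""
    if step < WARMUP:
        return BASE_LR * (step + 1) / WARMUP
    progress = (step - WARMUP) / max(1, TOTAL_STEPS - WARMUP)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


class TensionOverlay:
    """Hypothetical helper: tracks the grad-norm EMA and returns the clipped multiplier."""

    def __init__(self, beta: float = BETA, lo: float = CLIP_LO, hi: float = CLIP_HI):
        self.beta, self.lo, self.hi = beta, lo, hi
        self.ema = None  # assumption: seeded with the first observed tension

    def multiplier(self, tension: float) -> float:
        # Update the EMA first, then take the ratio (ordering is an assumption).
        if self.ema is None:
            self.ema = tension
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension
        if self.ema <= 0.0:
            return 1.0  # guard against a zero gradient norm
        return min(self.hi, max(self.lo, tension / self.ema))


def hybrid_lr(step: int, tension: float, overlay: TensionOverlay) -> float:
    """lr_step = base_cosine_lr(step) * clip(tension / EMA, [lo, hi])."""
    return base_cosine_lr(step) * overlay.multiplier(tension)
```

In a PyTorch loop, the tension value could be taken from the total norm returned by `torch.nn.utils.clip_grad_norm_`, and the resulting rate written into each `param_group["lr"]` before `optimizer.step()`. Whether the actual trainer does it this way is not stated here.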
+## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)
+| field | cycle 4 | **cycle 5 (this revision)** |
 |---|---|---|
+| corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
+| LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
+| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
+| init CE | 5.641 | 5.640663 |
+| **final CE** | 0.008289 | **0.007762** |
+| CE descent | 5.632 | 5.632901 |
+| final tension_EMA | (did not track) | 0.046574 |
+| mult bin <0.75 | (n/a) | 1599 |
+| mult bin 0.75-1.25 | (n/a) | 686 |
+| mult bin >1.25 | (n/a) | 215 |
+| eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |
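As a quick reading aid for the multiplier rows, the bin counts can be turned into shares of the 2500 steps. The sketch also computes the unclipped end-of-run ratio, assuming the reported FINAL tension (0.038659) is the last `tension_step` and is paired with the final `tension_EMA`. Variable names are illustrative.

```python
TOTAL_STEPS = 2500
bins = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}

# The three bins should account for every training step.
assert sum(bins.values()) == TOTAL_STEPS

shares = {name: count / TOTAL_STEPS for name, count in bins.items()}
for name, share in shares.items():
    print(f"mult {name}: {share:.2%}")

# Unclipped multiplier implied by the final tension and final EMA.
final_ratio = 0.038659 / 0.046574
print(f"final tension / EMA = {final_ratio:.3f}")
```

Roughly 64% of steps fall in the `<0.75` bin. A gradient norm that mostly shrinks over training would produce this, because the β=0.99 EMA lags behind it, but that reading is an inference from the counts, not a measured result.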
 ## Architecture
+- **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
 - **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
   block_size=128, vocab=256` (byte-level), seed=1337,
   init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
 ## Training
 - **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
+- **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
   6,223,023 bytes lossless byte stream, vocab=256.
 - **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
   weight_decay=0.1, warmup=125.
+- **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
 - **Steps**: 2500.
 | metric | value |
 |---|---|
 | init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
+| **FINAL CE** | **0.007762** |
+| CE descent | 5.632901 |
+| FINAL gn2 | 0.001495 |
+| FINAL tension | 0.038659 |
+| ppl | 1.0078 |
+| wall | 321.3 s |
+| peak GPU mem | 9.685 GB |
+| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
+| ckpt size | 1,135,846,570 B (1.14 GB) |
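The derived rows of the metric table follow from the two CE values alone, assuming CE is reported in nats per byte (which the comparison to ln 256 suggests). A short check, with illustrative variable names:

```python
import math

init_ce, final_ce = 5.640663, 0.007762    # nats per byte, from the table

uniform_ce = math.log(256)                # CE of a uniform guess over 256 byte values
descent = init_ce - final_ce              # "CE descent" row
ppl = math.exp(final_ce)                  # "ppl" row
train_bpb = final_ce / math.log(2)        # final training CE in bits per byte

print(f"ln 256    = {uniform_ce:.3f}")
print(f"descent   = {descent:.6f}")
print(f"ppl       = {ppl:.4f}")
print(f"train BPB = {train_bpb:.4f}")
```

The last figure is the final training-step CE converted to bits. It is a different quantity from the held-out Mean BPB reported in the capability section.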
 ## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
+**(A) Deliverable invariants (real-limit, this cycle)**:
+- **Shannon-floor descent**: init CE ≈ ln(256) → final CE.
+- **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
+  sympy-verified linear monotone, real-limit anchor.
 - **AdamW finiteness**: no NaN/Inf in trajectory.
 - **Architectural identity**: byte-equal `ConsciousDecoderV2`.
+**(B) Wiring (closed)**:
+- **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
+- **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
+  (mechanical AST diff, comments-stripped).
+- **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
+  + 3-corner identity).
+- **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
+- **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
+  to cycle-4 baseline at tension=EMA — sanity anchor).
+- **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
+  γ-cardinality ≥ 5400).
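The contraction and identity anchors listed above can be spelled out in a few lines. This is a reconstruction from the DD155 formula in this README, not a copy of `blue_falsifier.py`. Here $t$ is the step tension, $e$ the EMA state, and $T_t$ a name introduced only for this sketch.

```latex
% EMA update as an affine map in the state e, for a given step tension t:
T_t(e) = \beta e + (1-\beta)\,t, \qquad \beta = 0.99 .

% Contraction: two EMA states move closer by a factor \beta each step.
\lvert T_t(e) - T_t(e') \rvert = \beta\,\lvert e - e' \rvert , \qquad \beta < 1 .

% For a constant tension t the unique fixed point is e^\ast = t, reached geometrically:
\lvert e_n - t \rvert = \beta^{\,n}\,\lvert e_0 - t \rvert .

% At the fixed point the multiplier is the identity, so the cycle-4 schedule is recovered:
\operatorname{clip}\!\left(\tfrac{t}{e^\ast},\,[0.5,\,2.0]\right) = 1
\;\Longrightarrow\;
\mathrm{lr}_{\text{step}} = \mathrm{base\_cosine\_lr}(\text{step}) .

% Inside the clip window, with e held fixed, the learning rate is linear in t:
\frac{\partial\,\mathrm{lr}_{\text{step}}}{\partial t}
  = \frac{\mathrm{base\_cosine\_lr}(\text{step})}{e} .
```

The linearity statement holds only while the ratio stays strictly inside $[0.5, 2.0]$ and the EMA is treated as fixed for the step. Outside the window the derivative is zero.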
+**(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
+- V-SPONT / V-MOTIV / V-TT outcome empirical.
+- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
+  empirical.
+- DD-burst path activation frequency empirical.
+## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)
 V5.8 × 4-mode (corpus v3 prompts):
+- **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
+- **standard_sample**: 0/6 FAIL (avg_rep=0.871)
+- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
+- **M4_force_include**: 6/6 PASS (avg_rep=0.766)
 V-SPONT (spontaneous utterance) — F-SPONT-7 transfer-form measurement:
 - **coherent**: 0/5 FAIL
 - **closed-tag**: 0/5
+V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
 - **coherent**: 0/5 FAIL
 - **voice-closed-tag**: 0/5
+V-TT (NEW cycle 5) — tension-train transfer-form probe:
+- **coherent**: 0/5 FAIL
+- **keyword recall**: 0/5
+Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
 Memorization ratio: 0/6 (0.0%).
 Decoding artifacts (rep>0.5): 24.
+All capability scores **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.
 ## Honest C3
 1. **NOT hexa-native** — PyTorch substrate, label mandatory.
 2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
+3. **tension = grad_norm is a PROXY** — in the hexa spine
+   `tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
+   analogue at the PyTorch substrate level where Ψ is not surfaced as a
+   state variable.
+4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
+   empirical (B-FIRE-CYCLE5-NOTE)** — V-SPONT/V-MOTIV/V-TT all probes,
+   not capability claims.
+5. **Critical Data Size regime** — 10 MB / 283 M params still data-limited;
+   no out-of-distribution generalization claim. Differences between cycle 5
+   and cycle 4 are mainly LR-schedule-driven, not corpus-driven.
+6. **No `safetensors` artifact this revision** — pickle `.pt` only.
+7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE** — inference-side coherence
+   stays empirical.
 8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
 9. **Cost is informational, not gating** — `g_fire_autonomous`.