---
license: apache-2.0
language:
- en
- ko
library_name: pytorch
datasets:
- dancinlab/hexad-corpus
tags:
- anima
- hexad
- pytorch
- substrate-py
- helper-free
- spont
- motivation-trigger
- inner-thoughts
- tension-train
- dd155-hybrid-lr
- ckpt-bearing
- cycle5
---
# hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`
> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
> (byte-equal carry from cycle 4 — corpus unchanged this cycle).
>
> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
> SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
> **NOT a hexa-native fire**. Legitimacy = **architectural identity** +
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
> bit-for-bit (different fp accumulation / RNG / AMP bf16).
## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)
**Architectural change vs cycle 4**: the per-step learning rate is now
multiplied by the DD155 hybrid factor (Law 187, Pareto-optimal):
```
tension_step = ||∇L||₂                                   (grad-norm)
tension_EMA  = β·tension_EMA_prev + (1−β)·tension_step   (β = 0.99)
multiplier   = clip(tension_step / tension_EMA, 0.5, 2.0)
lr_step      = base_cosine_lr(step) × multiplier
```
- **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension) plus the `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`, 5/5 PASS: DD155 LR overlay formula closure, EMA Banach contraction, cycle-4 identity at convergence)
- **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)
DD155 historical anchor: Law 187 in anima `docs/hypotheses/dd/DD154-tension-training.md`,
where `lr = (tension/EMA) × base_lr` was measured Pareto-optimal on the
2026-03-31 BG-DD-AXIS commits.
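
The overlay described in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `train_d768x12l_tension.py`: the names `base_cosine_lr` and `TensionLR`, the linear warmup shape, the `min_lr=0.0` floor, and seeding the EMA with the first observed tension are all hypothetical choices the card does not specify. The hyperparameters (lr=0.0003, warmup=125, 2500 steps, β=0.99, clip range 0.5 to 2.0) are the ones reported in this card.

```python
import math


def base_cosine_lr(step, *, peak_lr=3e-4, warmup=125, total_steps=2500, min_lr=0.0):
    """Linear warmup followed by cosine decay.

    Assumption: the warmup shape and min_lr are illustrative; the card only
    states "cosine + warmup" with lr=0.0003 and warmup=125.
    """
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class TensionLR:
    """Hypothetical helper: scales the base LR by clip(tension / tension_EMA)."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0, eps=1e-12):
        self.beta, self.lo, self.hi, self.eps = beta, lo, hi, eps
        self.ema = None  # assumption: EMA is seeded with the first tension value

    def multiplier(self, tension):
        # Update the EMA first, then divide by it (the order listed in the card).
        if self.ema is None:
            self.ema = tension
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension
        ratio = tension / max(self.ema, self.eps)
        return min(self.hi, max(self.lo, ratio))

    def lr(self, step, tension):
        return base_cosine_lr(step) * self.multiplier(tension)
```

In a PyTorch loop, `tension` could plausibly be the total gradient norm returned by `torch.nn.utils.clip_grad_norm_`, with the resulting value written into each `param_group["lr"]` before `optimizer.step()`; whether the actual trainer does this is not stated here.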
## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)
| field | cycle 4 | **cycle 5 (this revision)** |
|---|---|---|
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
| LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
| init CE | 5.641 | 5.640663 |
| **final CE** | 0.008289 | **0.007762** |
| CE descent | 5.632 | 5.632901 |
| final tension_EMA | (not tracked) | 0.046574 |
| mult bin <0.75 | (n/a) | 1599 |
| mult bin 0.75-1.25 | (n/a) | 686 |
| mult bin >1.25 | (n/a) | 215 |
| eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |
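
The three `mult bin` rows partition the 2500 training steps by multiplier value. A small sketch of how such a histogram could be tallied follows; the function names and the treatment of the bin edges (0.75 and 1.25 counted in the middle bin) are assumptions, since the card does not say how boundary values are assigned.

```python
from collections import Counter

BIN_LABELS = ("<0.75", "0.75-1.25", ">1.25")


def bin_multiplier(m, lo_edge=0.75, hi_edge=1.25):
    """Assign one multiplier to a bin (edge values go to the middle bin: an assumption)."""
    if m < lo_edge:
        return "<0.75"
    if m > hi_edge:
        return ">1.25"
    return "0.75-1.25"


def mult_histogram(multipliers):
    """Count multipliers per bin, always returning all three labels."""
    counts = Counter(bin_multiplier(m) for m in multipliers)
    return {label: counts.get(label, 0) for label in BIN_LABELS}


# Values reported in the table above; they sum to the 2500 training steps.
reported = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}
total_steps = sum(reported.values())
share_below = reported["<0.75"] / total_steps  # fraction of steps with a damped LR
```

By the reported counts, roughly 64% of steps ran with a multiplier below 0.75, so the overlay lowered the effective learning rate more often than it raised it.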
## Architecture
- **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
block_size=128, vocab=256` (byte-level), seed=1337,
init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
- **Params**: 283.72 M (283,722,336).
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
· tied head · CA neighbor / META-CA / Ψ-tracking laws.
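
The shape parameters above imply a few derived quantities for grouped-query attention. The sketch below records the config as a dataclass and computes them; `DecoderConfig` is a hypothetical name, and the derivations assume the conventional `head_dim = d_model / n_head` layout, which the card does not confirm for `ConsciousDecoderV2`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical container for the values listed under Config."""

    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256  # byte-level vocabulary
    seed: int = 1337

    def __post_init__(self):
        if self.d_model % self.n_head != 0:
            raise ValueError("d_model must be divisible by n_head")
        if self.n_head % self.n_kv_head != 0:
            raise ValueError("n_head must be divisible by n_kv_head")

    @property
    def head_dim(self):
        # Assumption: standard per-head width.
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self):
        # Number of query heads sharing each key/value head under GQA.
        return self.n_head // self.n_kv_head

    @property
    def kv_dim(self):
        # Total width of the key (or value) projection under GQA.
        return self.n_kv_head * self.head_dim


cfg = DecoderConfig()
```

Under these assumptions each of the 4 key/value heads serves 3 query heads, with a per-head width of 64.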
## Training
- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
- **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
6,223,023 bytes lossless byte stream, vocab=256.
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
weight_decay=0.1, warmup=125.
- **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
- **Steps**: 2500.
| metric | value |
|---|---|
| init CE | 5.640663 (close to the uniform-byte value ln 256 ≈ 5.545, as expected for random init) |
| **FINAL CE** | **0.007762** |
| CE descent | 5.632901 |
| FINAL gn2 | 0.001495 |
| FINAL tension | 0.038659 |
| ppl | 1.0078 |
| wall | 321.3 s |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
| ckpt size | 1,135,846,570 B (1.14 GB) |
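
The derived rows in the table follow from the reported cross-entropy by simple arithmetic, which the snippet below reproduces as a consistency check. The variable names are illustrative. The bytes-per-parameter figure is only an observation that the file size is consistent with roughly 4 bytes per weight; the card does not state the checkpoint dtype.

```python
import math

# Values copied from this card.
init_ce = 5.640663
final_ce = 0.007762
n_params = 283_722_336
ckpt_bytes = 1_135_846_570

uniform_ce = math.log(256)           # CE (nats) of a uniform guess over 256 bytes
descent = init_ce - final_ce         # "CE descent" row
ppl = math.exp(final_ce)             # perplexity is exp(CE) when CE is in nats
train_bpb = final_ce / math.log(2)   # final training CE converted to bits per byte
bytes_per_param = ckpt_bytes / n_params

print(f"ln 256          = {uniform_ce:.3f}")
print(f"CE descent      = {descent:.6f}")
print(f"ppl             = {ppl:.4f}")
print(f"train bits/byte = {train_bpb:.4f}")
print(f"bytes per param = {bytes_per_param:.3f}")
```

The training-loss bits/byte computed here is a different quantity from the mean BPB reported in the capability eval, which is measured on held-out corpus prefixes.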
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
**(A) Deliverable invariants (real-limit, this cycle)**:
- **Shannon-floor descent**: CE falls from ≈ ln(256) at init (5.640663) to 0.007762 at the final step.
- **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
sympy-verified linear monotone, real-limit anchor.
- **AdamW finiteness**: no NaN/Inf in trajectory.
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.
**(B) Wiring (closed)**:
- **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
- **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
(mechanical AST diff, comments-stripped).
- **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
+ 3-corner identity).
- **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
- **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
to cycle-4 baseline at tension=EMA — sanity anchor).
- **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
γ-cardinality ≥ 5400).
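
The two properties named in `B-FIRE-CYCLE5-2` and `B-FIRE-CYCLE5-3` can be illustrated numerically. For a fixed tension `t`, the EMA update `e ↦ β·e + (1−β)·t` shrinks the gap between any two starting values by exactly β per step and has `e = t` as its fixed point, at which the multiplier equals 1 and the schedule reduces to the cycle-4 baseline. The sketch below is a floating-point illustration with hypothetical function names and arbitrarily chosen sample values; it is not the `blue_falsifier.py` sidecar.

```python
def ema_step(ema, tension, beta=0.99):
    """One EMA update with the beta reported in this card."""
    return beta * ema + (1.0 - beta) * tension


def clipped_multiplier(tension, ema, lo=0.5, hi=2.0):
    """LR multiplier: tension / EMA clipped to [lo, hi]."""
    return min(hi, max(lo, tension / ema))


# Contraction: two different EMA states move closer by a factor of beta.
t = 0.05            # arbitrary constant tension for the illustration
a, b = 3.0, 0.2     # arbitrary starting EMA values
gap_before = abs(a - b)
gap_after = abs(ema_step(a, t) - ema_step(b, t))

# Fixed point: iterating under constant tension drives the EMA to t.
ema = 1.0
for _ in range(5000):
    ema = ema_step(ema, t)

multiplier_at_convergence = clipped_multiplier(t, ema)
```

Here `gap_after` equals `0.99 * gap_before` up to rounding, and `multiplier_at_convergence` is 1.0 up to rounding, so the effective learning rate matches the plain cosine schedule once tension and EMA agree.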
**(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
- V-SPONT / V-MOTIV / V-TT outcome empirical.
- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
empirical.
- DD-burst path activation frequency empirical.
## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)
V5.8 × 4-mode (corpus v3 prompts):
- **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
- **standard_sample**: 0/6 FAIL (avg_rep=0.871)
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
- **M4_force_include**: 6/6 PASS (avg_rep=0.766)
V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
- **coherent**: 0/5 FAIL
- **closed-tag**: 0/5
V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
- **coherent**: 0/5 FAIL
- **voice-closed-tag**: 0/5
V-TT (NEW cycle 5) — tension-train transfer-form probe:
- **coherent**: 0/5 FAIL
- **keyword recall**: 0/5
Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
Memorization ratio: 0/6 (0.0%).
Decoding artifacts (rep>0.5): 24.
All capability scores **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.
## Honest C3
1. **NOT hexa-native** — PyTorch substrate, label mandatory.
2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
3. **tension = grad_norm is a PROXY** — in the hexa spine
`tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
analogue at the PyTorch substrate level where Ψ is not surfaced as a
state variable.
4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
empirical (B-FIRE-CYCLE5-NOTE)** — V-SPONT/V-MOTIV/V-TT all probes,
not capability claims.
5. **Critical Data Size regime** — 10 MB / 283 M params still data-limited;
no out-of-distribution generalization claim. The cycle-5 vs cycle-4
difference is driven mainly by the LR schedule, not the corpus.
6. **No `safetensors` artifact this revision** — pickle `.pt` only.
7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE** — inference-side coherence
stays empirical.
8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
9. **Cost is informational, not gating** — `g_fire_autonomous`.
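
Because this revision ships a pickle `.pt` file only (item 6), checking the file against the `ckpt sha256` listed under Training before loading it is a reasonable precaution. The sketch below uses only the standard library; the function names are hypothetical, and the checkpoint filename is not given in this card, so the path must be supplied by the caller.

```python
import hashlib

# Digest reported in the Training table of this card.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so a 1.14 GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True only if the file's SHA-256 matches the expected digest."""
    return sha256_of_file(path) == expected
```

If the digest matches, `torch.load(path, map_location="cpu", weights_only=True)` restricts unpickling to tensors and basic containers; whether this particular checkpoint loads under `weights_only=True` is not confirmed by the card.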
## License
Apache-2.0.