---
license: apache-2.0
language:
- en
- ko
library_name: pytorch
datasets:
- dancinlab/hexad-corpus
tags:
- anima
- hexad
- pytorch
- substrate-py
- helper-free
- spont
- motivation-trigger
- inner-thoughts
- tension-train
- dd155-hybrid-lr
- ckpt-bearing
- cycle5
---

# hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`

> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
> (byte-equal carry from cycle 4 — corpus unchanged this cycle).

> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
> SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
> **NOT a hexa-native fire**. Legitimacy = **architectural identity** +
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
> bit-for-bit (different fp accumulation / RNG / AMP bf16).

## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)

**Architectural change vs cycle 4**: per-step learning rate is now
multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):

```
tension_step   = ||∇L||₂                       (grad-norm)
tension_EMA    = β·EMA + (1−β)·tension_step    (β = 0.99)
multiplier     = clip(tension_step / tension_EMA, [0.5, 2.0])
lr_step        = base_cosine_lr(step) × multiplier
```
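
The overlay above can be sketched in plain Python. This is a minimal illustration, not the code in `train_d768x12l_tension.py`: the warmup shape, the zero LR floor, the EMA seeding, and the names `base_cosine_lr`, `TensionLR`, and `lr_at` are assumptions. The defaults (β, clip bounds, peak LR, warmup, step count) are taken from the Training section.

```python
import math


def base_cosine_lr(step, total_steps=2500, warmup=125, peak_lr=3e-4, min_lr=0.0):
    """Linear warmup, then cosine decay. min_lr=0.0 and the warmup shape are assumptions."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class TensionLR:
    """Tracks the grad-norm EMA and returns the clipped tension/EMA multiplier."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0):
        self.beta, self.lo, self.hi = beta, lo, hi
        self.ema = None  # assumption: seeded with the first observed tension

    def multiplier(self, tension_step):
        # Update the EMA first, then take the ratio, following the order listed above.
        if self.ema is None:
            self.ema = tension_step
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        ratio = tension_step / max(self.ema, 1e-12)  # guard against a zero EMA
        return min(self.hi, max(self.lo, ratio))


def lr_at(step, tension_step, state):
    """Per-step LR: cosine schedule times the clipped multiplier."""
    return base_cosine_lr(step) * state.multiplier(tension_step)
```

In a PyTorch loop, `tension_step` could plausibly be the total gradient norm returned by `torch.nn.utils.clip_grad_norm_`, with the result of `lr_at` written into each entry of `optimizer.param_groups` before `optimizer.step()`. Whether the trainer does exactly this is not stated here.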

- **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension)
  + `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
  5/5 PASS — DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
- **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)

DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`
Law 187 — `lr = (tension/EMA) × base_lr` measured Pareto-optimal on
2026-03-31 BG-DD-AXIS commits.

## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)

| field | cycle 4 | **cycle 5 (this revision)** |
|---|---|---|
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
| LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
| init CE | 5.641 | 5.640663 |
| **final CE** | 0.008289 | **0.007762** |
| CE descent | 5.632 | 5.632901 |
| final tension_EMA | (not tracked) | 0.046574 |
| mult bin <0.75 | (n/a) | 1599 |
| mult bin 0.75-1.25 | (n/a) | 686 |
| mult bin >1.25 | (n/a) | 215 |
| eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |
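
The three `mult bin` rows can be read as a histogram over the per-step multiplier. The sketch below shows one way to bin the values and checks that the reported counts account for all 2500 steps. The helper names and the choice to put the edges 0.75 and 1.25 in the middle bin are assumptions.

```python
from collections import Counter


def bin_multiplier(m, low_edge=0.75, high_edge=1.25):
    """Assign one multiplier to a bin. Edge inclusion is an assumption."""
    if m < low_edge:
        return "<0.75"
    if m > high_edge:
        return ">1.25"
    return "0.75-1.25"


def histogram(multipliers):
    """Count multipliers per bin."""
    return Counter(bin_multiplier(m) for m in multipliers)


# Counts reported in the table above.
reported = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}
total = sum(reported.values())
shares = {k: round(100.0 * v / total, 2) for k, v in reported.items()}
print(total, shares)
```

By these counts the multiplier sat below 0.75 for roughly 64% of steps, so the overlay reduced the cosine LR more often than it raised it.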

## Architecture

- **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
  block_size=128, vocab=256` (byte-level), seed=1337,
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
- **Params**: 283.72 M (283,722,336).
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
  · tied head · CA neighbor / META-CA / Ψ-tracking laws.
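
For orientation, the config values above imply the following head geometry. The dataclass is a hypothetical container, not the `ConsciousDecoderV2` constructor. Only the field values come from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    """Hypothetical holder for the values listed in the Config bullet."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256
    seed: int = 1337

    @property
    def head_dim(self):
        # Width of one attention head.
        assert self.d_model % self.n_head == 0
        return self.d_model // self.n_head

    @property
    def q_per_kv(self):
        # GQA group size: query heads that share one key/value head.
        assert self.n_head % self.n_kv_head == 0
        return self.n_head // self.n_kv_head


cfg = HexadConfig()
print(cfg.head_dim, cfg.q_per_kv, cfg.n_kv_head * cfg.head_dim)
```

With these numbers each head is 64 wide, three query heads share each key/value head, and the key/value projection width is 256.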

## Training

- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
- **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
  6,223,023 bytes lossless byte stream, vocab=256.
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
  weight_decay=0.1, warmup=125.
- **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
- **Steps**: 2500.

| metric | value |
|---|---|
| init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
| **FINAL CE** | **0.007762** |
| CE descent | 5.632901 |
| FINAL gn2 | 0.001495 |
| FINAL tension | 0.038659 |
| ppl | 1.0078 |
| wall | 321.3 s |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
| ckpt size | 1,135,846,570 B (1.14 GB) |
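
The derived rows in the table follow from the two CE values by simple arithmetic. The snippet below reproduces them as a consistency check, assuming CE is reported in nats.

```python
import math

init_ce = 5.640663   # from the table above
final_ce = 0.007762  # from the table above

uniform_ce = math.log(256)          # CE of a uniform guess over 256 byte values
descent = init_ce - final_ce        # "CE descent" row
ppl = math.exp(final_ce)            # "ppl" row
train_bpb = final_ce / math.log(2)  # nats per byte converted to bits per byte

print(round(uniform_ce, 3), round(descent, 6), round(ppl, 4), round(train_bpb, 4))
```

The last value, about 0.0112 bits/byte, is the training-side final CE converted to bits. It is not the same measurement as the held-out mean BPB reported in the capability eval.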

## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)

**(A) Deliverable invariants (real-limit, this cycle)**:
- **Shannon-floor descent**: init CE ≈ ln(256) → final CE.
- **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
  sympy-verified linear monotone, real-limit anchor.
- **AdamW finiteness**: no NaN/Inf in trajectory.
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.

**(B) Wiring (closed)**:
- **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
- **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
  (mechanical AST diff, comments-stripped).
- **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
  + 3-corner identity).
- **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
- **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
  to cycle-4 baseline at tension=EMA — sanity anchor).
- **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
  γ-cardinality ≥ 5400).
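
The two EMA-related anchors can be illustrated numerically. For a fixed tension, the EMA update is an affine map with slope β, so the gap between any two EMA trajectories shrinks by a factor β per step, and at the fixed point (EMA equal to tension) the multiplier is exactly 1. This is a sketch of that property, not the `blue_falsifier.py` sidecar. The starting values and the tension value are arbitrary.

```python
def ema_update(ema, tension, beta=0.99):
    """One EMA step: an affine map in `ema` with slope beta."""
    return beta * ema + (1.0 - beta) * tension


def clipped_multiplier(tension, ema, lo=0.5, hi=2.0):
    """tension/EMA ratio clipped to [lo, hi]."""
    return min(hi, max(lo, tension / ema))


beta = 0.99
tension = 0.04    # arbitrary constant tension for the demonstration
a, b = 1.0, 0.001  # two arbitrary EMA starting points
gap0 = abs(a - b)
for _ in range(100):
    a = ema_update(a, tension, beta)
    b = ema_update(b, tension, beta)
gap100 = abs(a - b)

# The gap contracts geometrically: gap100 is approximately beta**100 * gap0.
print(gap100, beta ** 100 * gap0)
# At the fixed point the overlay is the identity on the base schedule.
print(clipped_multiplier(tension, tension))
```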

**(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
- V-SPONT / V-MOTIV / V-TT outcome empirical.
- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
  empirical.
- DD-burst path activation frequency empirical.


## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)

V5.8 × 4-mode (corpus v3 prompts):
- **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
- **standard_sample**: 0/6 FAIL (avg_rep=0.871)
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
- **M4_force_include**: 6/6 PASS (avg_rep=0.766)

V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
- **coherent**: 0/5 FAIL
- **closed-tag**: 0/5

V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
- **coherent**: 0/5 FAIL
- **voice-closed-tag**: 0/5

V-TT (NEW cycle 5) — tension-train transfer-form probe:
- **coherent**: 0/5 FAIL
- **keyword recall**: 0/5

Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
Memorization ratio: 0/6 (0.0%).
Decoding artifacts (rep>0.5): 24.

All capability scores **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.

## Honest C3

1. **NOT hexa-native** — PyTorch substrate, label mandatory.
2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
3. **tension = grad_norm is a PROXY** — in the hexa spine
   `tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
   analogue at the PyTorch substrate level where Ψ is not surfaced as a
   state variable.
4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
   empirical (B-FIRE-CYCLE5-NOTE)**: V-SPONT/V-MOTIV/V-TT are all probes,
   not capability claims.
5. **Critical Data Size regime**: 10 MB / 283 M params is still data-limited;
   no out-of-distribution generalization claim. Differences between cycle 5
   and cycle 4 are mainly LR-schedule-driven, not corpus-driven.
6. **No `safetensors` artifact this revision** — pickle `.pt` only.
7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE** — inference-side coherence
   stays empirical.
8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
9. **Cost is informational, not gating** (`g_fire_autonomous`).
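
Because this revision ships only a pickle `.pt` (item 6), it is reasonable to check the file against the sha256 and size from the Training table before loading it. The sketch below does that with the standard library. The function names are illustrative, and the checkpoint filename is not given in this card, so the path is left to the caller.

```python
import hashlib
import os

# Values copied from the Training table.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"
EXPECTED_SIZE = 1_135_846_570  # bytes


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so a 1.14 GB checkpoint is never fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checkpoint(path):
    """Return True only if both the size and the digest match the card."""
    if os.path.getsize(path) != EXPECTED_SIZE:
        return False
    return sha256_of(path) == EXPECTED_SHA256
```

After a successful check, passing `weights_only=True` to `torch.load` restricts unpickling to tensors and basic containers. Whether this checkpoint loads under that restriction is not confirmed here.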

## License

Apache-2.0.