---
license: apache-2.0
language:
- en
- ko
library_name: pytorch
datasets:
- dancinlab/hexad-corpus
tags:
- anima
- hexad
- pytorch
- substrate-py
- helper-free
- spont
- motivation-trigger
- inner-thoughts
- tension-train
- dd155-hybrid-lr
- ckpt-bearing
- cycle5
---

# hexad: `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`

> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
> (byte-equal carry from cycle 4; corpus unchanged this cycle).

> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
> SUBSTRATE** training artifact, an *interim LM-scale executor*. It is
> **NOT a hexa-native fire**. Legitimacy = **architectural identity** +
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
> bit-for-bit (different fp accumulation / RNG / AMP bf16).

## What's new this cycle (cycle 5: DD155 Step+Tension hybrid LR overlay)

**Architectural change vs cycle 4**: the per-step learning rate is now
multiplied by a DD155 hybrid factor (reported as Pareto-optimal under Law 187):

```
tension_step = ||∇L||₂                                  (grad-norm)
tension_EMA  = β·EMA + (1−β)·tension_step               (β = 0.99)
multiplier   = clip(tension_step / tension_EMA, [0.5, 2.0])
lr_step      = base_cosine_lr(step) × multiplier
```

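The overlay above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the trainer's actual code: the class name `TensionLR`, the seeding of the EMA with the first observed tension, the linear warmup shape, and the cosine decay to zero are all assumptions that this card does not specify. The defaults (`base_lr=3e-4`, `warmup=125`, `total_steps=2500`, `beta=0.99`, clip `[0.5, 2.0]`) are taken from the Training section.

```python
import math


def base_cosine_lr(step, base_lr=3e-4, warmup=125, total_steps=2500):
    """Linear warmup, then cosine decay (decay-to-zero floor is an assumption)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


class TensionLR:
    """Hypothetical helper: tracks tension_EMA and scales the base LR."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0, eps=1e-12):
        self.beta, self.lo, self.hi, self.eps = beta, lo, hi, eps
        self.ema = None  # assumption: seeded with the first observed tension

    def multiplier(self, tension_step):
        # Update the EMA first, then take the ratio, as in the formula order.
        if self.ema is None:
            self.ema = tension_step
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        ratio = tension_step / max(self.ema, self.eps)
        return min(self.hi, max(self.lo, ratio))  # clip to [lo, hi]

    def lr(self, step, tension_step):
        return base_cosine_lr(step) * self.multiplier(tension_step)
```

In a PyTorch loop, the total gradient norm returned by `torch.nn.utils.clip_grad_norm_` could serve as `tension_step`, with the result written into each `param_group["lr"]` before `optimizer.step()`. Whether the original trainer wires it this way is not stated in this card.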
- **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension)
  plus the `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
  5/5 PASS: DD155 LR overlay formula closure, EMA Banach contraction, and cycle-4 identity at convergence)
- **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)

DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`,
Law 187: `lr = (tension/EMA) × base_lr`, measured Pareto-optimal on the
2026-03-31 BG-DD-AXIS commits.

## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)

| field | cycle 4 | **cycle 5 (this revision)** |
|---|---|---|
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
| LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
| init CE | 5.641 | 5.640663 |
| **final CE** | 0.008289 | **0.007762** |
| CE descent | 5.632 | 5.632901 |
| final tension_EMA | (not tracked) | 0.046574 |
| mult bin <0.75 (steps) | (n/a) | 1599 |
| mult bin 0.75-1.25 (steps) | (n/a) | 686 |
| mult bin >1.25 (steps) | (n/a) | 215 |
| eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |

## Architecture

- **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
- **Config**: `d_model=768`, `n_head=12`, `n_kv_head=4`, `n_layer=12`,
  `block_size=128`, `vocab=256` (byte-level), `seed=1337`,
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
- **Params**: 283.72 M (283,722,336).
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
  · tied head · CA neighbor / META-CA / Ψ-tracking laws.

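The GQA numbers in the config imply a few derived shapes, sketched below. `HexadConfig` is a hypothetical container, not a class from the `ConsciousDecoderV2` source, and the contiguous mapping of query heads to key/value heads is a common convention assumed here rather than something this card confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    """Hypothetical config holder using the values listed in this card."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256

    @property
    def head_dim(self) -> int:
        # Width of one attention head.
        assert self.d_model % self.n_head == 0
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self) -> int:
        # Number of query heads sharing one key/value head under GQA.
        assert self.n_head % self.n_kv_head == 0
        return self.n_head // self.n_kv_head

    def kv_head_for(self, q_head: int) -> int:
        # Assumption: query heads are grouped contiguously.
        return q_head // self.queries_per_kv
```

With the listed values, each head is 64 wide and three query heads share each key/value head.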
## Training

- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
- **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
  6,223,023 bytes lossless byte stream, vocab=256.
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
  weight_decay=0.1, warmup=125.
- **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
- **Steps**: 2500.

| metric | value |
|---|---|
| init CE | 5.640663 (≈ ln 256 = 5.545, random byte init) |
| **FINAL CE** | **0.007762** |
| CE descent | 5.632901 |
| FINAL gn2 | 0.001495 |
| FINAL tension | 0.038659 |
| ppl | 1.0078 |
| wall | 321.3 s |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
| ckpt size | 1,135,846,570 B (1.14 GB) |

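The reported metrics can be cross-checked with a few lines of arithmetic. The sketch below assumes the CE values are in nats (consistent with the comparison against ln 256) and, for the size check only, that the checkpoint stores weights as fp32; the card states neither explicitly.

```python
import math

# Values copied from the tables in this card.
init_ce, final_ce = 5.640663, 0.007762
n_params, ckpt_bytes = 283_722_336, 1_135_846_570

uniform_ce = math.log(256)      # CE of a uniform guess over 256 byte values
ppl = math.exp(final_ce)        # perplexity from a CE given in nats
descent = init_ce - final_ce    # "CE descent" row

# Assumption: fp32 weights, 4 bytes per parameter.
fp32_bytes = n_params * 4
overhead = ckpt_bytes - fp32_bytes  # remainder left for container/metadata

print(f"ln 256      = {uniform_ce:.3f}")
print(f"ppl         = {ppl:.4f}")
print(f"CE descent  = {descent:.6f}")
print(f"fp32 bytes  = {fp32_bytes:,} (overhead {overhead:,} B)")
```

Under the fp32 assumption, the parameter tensors account for all but roughly 1 MB of the checkpoint file.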
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)

**(A) Deliverable invariants (real-limit, this cycle)**:
- **Shannon-floor descent**: init CE ≈ ln(256) → final CE.
- **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
  sympy-verified linear monotone, real-limit anchor.
- **AdamW finiteness**: no NaN/Inf in trajectory.
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.

**(B) Wiring (closed)**:
- **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
- **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
  (mechanical AST diff, comments-stripped).
- **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
  + 3-corner identity).
- **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
- **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
  to cycle-4 baseline at tension=EMA; sanity anchor).
- **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
  γ-cardinality ≥ 5400).

**(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
- V-SPONT / V-MOTIV / V-TT outcome empirical.
- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
  empirical.
- DD-burst path activation frequency empirical.

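One plausible reading of the `B-FIRE-CYCLE5-2` and `B-FIRE-CYCLE5-3` closures, written out as a short derivation. The symbols $e$ (the EMA state), $t$ (the step tension), and $T_t$ (the EMA update map) are notation introduced here for illustration; the sidecar's actual checks are not reproduced.

$$
T_t(e) = \beta e + (1-\beta)\,t, \qquad
\lvert T_t(e) - T_t(e') \rvert = \beta\,\lvert e - e' \rvert, \qquad \beta = 0.99 < 1 .
$$

So for a fixed tension $t$ the update is an affine contraction with factor $\beta$, and its unique fixed point satisfies $e^\ast = \beta e^\ast + (1-\beta)t$, that is $e^\ast = t$. At that point

$$
\text{multiplier} = \operatorname{clip}\!\left(\tfrac{t}{e^\ast},\,[0.5,\,2.0]\right) = 1,
\qquad
\text{lr}_{\text{step}} = \text{base\_cosine\_lr}(\text{step}),
$$

which is the cycle-4 schedule.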
## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)

V5.8 × 4-mode (corpus v3 prompts):
- **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
- **standard_sample**: 0/6 FAIL (avg_rep=0.871)
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
- **M4_force_include**: 6/6 PASS (avg_rep=0.766)

V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
- **coherent**: 0/5 FAIL
- **closed-tag**: 0/5

V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
- **coherent**: 0/5 FAIL
- **voice-closed-tag**: 0/5

V-TT (NEW cycle 5), tension-train transfer-form probe:
- **coherent**: 0/5 FAIL
- **keyword recall**: 0/5

Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
Memorization ratio: 0/6 (0.0%).
Decoding artifacts (rep>0.5): 24.

All capability scores are **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.

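For a byte-level model, cross-entropy in nats per token converts directly to bits per byte, which makes the training CE comparable with the held-out BPB figure. The helper below is a hypothetical illustration; it assumes the reported CE is in nats and that one token equals one byte (vocab=256).

```python
import math


def nats_to_bits_per_byte(ce_nats_per_token, bytes_per_token=1.0):
    """Convert CE in nats/token to bits/byte (byte-level: 1 token = 1 byte)."""
    return ce_nats_per_token / (math.log(2) * bytes_per_token)


train_bpb = nats_to_bits_per_byte(0.007762)        # final training CE
uniform_bpb = nats_to_bits_per_byte(math.log(256)) # uniform guess over bytes
print(f"train BPB   = {train_bpb:.4f}")
print(f"uniform BPB = {uniform_bpb:.1f}")
```

On these assumptions the final training CE corresponds to about 0.0112 bits/byte, against the 0.0194 bits/byte reported on held-out prefixes.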
## Honest C3

1. **NOT hexa-native**: PyTorch substrate, label mandatory.
2. **PyTorch ≠ hexa bit-for-bit**: different fp / RNG / AMP.
3. **tension = grad_norm is a PROXY**: in the hexa spine
   `tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
   analogue at the PyTorch substrate level, where Ψ is not surfaced as a
   state variable.
4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
   empirical (B-FIRE-CYCLE5-NOTE)**: V-SPONT/V-MOTIV/V-TT are all probes,
   not capability claims.
5. **Critical Data Size regime**: 10 MB / 283 M params is still data-limited;
   no out-of-distribution generalization claim. The cycle-5 difference from
   cycle 4 is mainly LR-schedule-driven, not corpus-driven.
6. **No `safetensors` artifact this revision**: pickle `.pt` only.
7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE**: inference-side coherence
   stays empirical.
8. **No σ(6)=12 / φ(6)=2 derivation**: no lattice numerology.
9. **Cost is informational, not gating**: `g_fire_autonomous`.

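Because this revision ships only a pickle `.pt` checkpoint, it may be worth confirming the file hash before loading it. The sketch below uses only the standard library; the function names are hypothetical, and the expected digest is the `ckpt sha256` value from the Training table.

```python
import hashlib

# Digest copied from the "ckpt sha256" row of this card.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so a ~1 GB checkpoint need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True only if the file's SHA-256 matches the published digest."""
    return sha256_of_file(path) == expected
```

After a successful check, something like `torch.load(path, map_location="cpu", weights_only=True)` restricts unpickling to tensors and basic containers. Whether this checkpoint loads under `weights_only=True` depends on what it stores, which this card does not say.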
## License

Apache-2.0.