---
license: apache-2.0
language:
- en
- ko
library_name: pytorch
datasets:
- dancinlab/hexad-corpus
tags:
- anima
- hexad
- pytorch
- substrate-py
- helper-free
- spont
- motivation-trigger
- inner-thoughts
- tension-train
- dd155-hybrid-lr
- ckpt-bearing
- cycle5
---

# hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`

> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
> (byte-equal carry from cycle 4 — corpus unchanged this cycle).

> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
> SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
> **NOT a hexa-native fire**. Legitimacy = **architectural identity** +
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
> bit-for-bit (different fp accumulation / RNG / AMP bf16).

## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)

**Change vs cycle 4** (training schedule only; the model architecture is
unchanged): the per-step learning rate is now multiplied by a DD155 hybrid
factor (Law 187, Pareto-optimal):

```
tension_step   = ||∇L||₂                       (grad-norm)
tension_EMA    = β·EMA + (1−β)·tension_step    (β = 0.99)
multiplier     = clip(tension_step / tension_EMA, [0.5, 2.0])
lr_step        = base_cosine_lr(step) × multiplier
```

- **transfer-form**: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension)
  + `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
  5/5 PASS — DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
- **outcome**: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)

DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`
Law 187 — `lr = (tension/EMA) × base_lr` measured Pareto-optimal on
2026-03-31 BG-DD-AXIS commits.
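
The overlay formula above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in `train_d768x12l_tension.py`: the class name `TensionOverlay`, the choice to seed the EMA with the first observed tension, and the `eps` guard are all hypothetical. The update order (EMA first, then the ratio) follows the order in which the formula lists its lines.

```python
class TensionOverlay:
    """Hypothetical sketch of the DD155 multiplier: clip(tension / EMA, [lo, hi])."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0, eps=1e-12):
        self.beta = beta
        self.lo = lo
        self.hi = hi
        self.eps = eps      # assumption: guard against division by zero
        self.ema = None     # assumption: EMA is seeded by the first observation

    def multiplier(self, tension_step):
        # Update the EMA first, then form the ratio (order taken from the formula).
        if self.ema is None:
            self.ema = tension_step
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        ratio = tension_step / max(self.ema, self.eps)
        return min(max(ratio, self.lo), self.hi)


def lr_step(base_lr, overlay, grad_norm):
    """Scale the scheduled base learning rate by the clipped tension ratio."""
    return base_lr * overlay.multiplier(grad_norm)


overlay = TensionOverlay()
m_first = overlay.multiplier(1.0)    # EMA == tension, so the ratio is exactly 1
m_spike = overlay.multiplier(10.0)   # large spike, clipped at hi
m_drop = overlay.multiplier(0.1)     # sharp drop, clipped at lo
```

In a PyTorch loop the `grad_norm` argument would typically be the global L2 gradient norm, for example the value returned by `torch.nn.utils.clip_grad_norm_`; whether the actual trainer obtains it that way is not stated here.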

## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)

| field | cycle 4 | **cycle 5 (this revision)** |
|---|---|---|
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | **same** (byte-equal carry, B-CORPUS-V4-1) |
| LR schedule | cosine + warmup | **cosine + warmup + DD155 hybrid (tension/EMA) multiplier** |
| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
| init CE | 5.641 | 5.640663 |
| **final CE** | 0.008289 | **0.007762** |
| CE descent | 5.632 | 5.632901 |
| final tension_EMA | (not tracked) | 0.046574 |
| mult bin <0.75 | (n/a) | 1599 |
| mult bin 0.75-1.25 | (n/a) | 686 |
| mult bin >1.25 | (n/a) | 215 |
| eval probes | V5.8 + V-SPONT + V-MOTIV | **V5.8 + V-SPONT + V-MOTIV + V-TT NEW** |

## Architecture

- **Source**: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
  block_size=128, vocab=256` (byte-level), seed=1337,
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
- **Params**: 283.72 M (283,722,336).
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
  · tied head · CA neighbor / META-CA / Ψ-tracking laws.
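
The head geometry implied by this config can be worked out directly. The sketch below assumes the common conventions that `head_dim = d_model / n_head` and that query heads are grouped contiguously onto KV heads; `ConsciousDecoderV2` may do either differently, and the names `HexadConfig` and `kv_head_for_query_head` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    # Values copied from the Config bullet above.
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256

    @property
    def head_dim(self) -> int:
        # Assumption: model width is split evenly across query heads.
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self) -> int:
        # GQA: several query heads share one key/value head.
        return self.n_head // self.n_kv_head

    @property
    def kv_width(self) -> int:
        # Width of the K (or V) projection output under GQA.
        return self.n_kv_head * self.head_dim


def kv_head_for_query_head(cfg: HexadConfig, q: int) -> int:
    """Assumed contiguous grouping: query heads 0..2 -> KV 0, 3..5 -> KV 1, ..."""
    return q // cfg.queries_per_kv


cfg = HexadConfig()
print(cfg.head_dim, cfg.queries_per_kv, cfg.kv_width)  # 64 3 256
```

Under these assumptions each KV head serves three query heads, so the K and V projections are one third the width of the Q projection.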

## Training

- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
- **Corpus**: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
  6,223,023 bytes lossless byte stream, vocab=256.
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
  weight_decay=0.1, warmup=125.
- **DD155 hybrid**: β=0.99, clip lo=0.5, clip hi=2.0.
- **Steps**: 2500.
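
For orientation, a base schedule with these hyperparameters could look like the sketch below. The linear warmup shape and the decay floor (`min_lr=0.0`) are assumptions, since the card only states "cosine + warmup"; the function name `base_cosine_lr` mirrors the formula in the DD155 section but is not taken from the trainer source.

```python
import math


def base_cosine_lr(step, peak_lr=3e-4, warmup=125, total_steps=2500, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed shape)."""
    if step < warmup:
        # Assumption: warmup ramps linearly and reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    progress = min(progress, 1.0)  # hold at min_lr past the final step
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# The DD155 multiplier (clipped to [0.5, 2.0]) then scales this value per step,
# so the effective LR stays between 0.5x and 2x of the scheduled LR.
schedule = [base_cosine_lr(s) for s in range(2500)]
```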

| metric | value |
|---|---|
| init CE | 5.640663 (near ln 256 ≈ 5.545, the uniform-byte baseline expected at random init) |
| **FINAL CE** | **0.007762** |
| CE descent | 5.632901 |
| FINAL gn2 | 0.001495 |
| FINAL tension | 0.038659 |
| ppl | 1.0078 |
| wall | 321.3 s |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
| ckpt size | 1,135,846,570 B (1.14 GB) |
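
The derived figures in this table can be re-checked from the raw values. The snippet below is only an arithmetic cross-check; the fp32 storage assumption in the last step is a guess used to compare parameter count against checkpoint size, not something the card states.

```python
import math

init_ce = 5.640663            # nats per byte, from the table
final_ce = 0.007762           # nats per byte, from the table
params = 283_722_336          # from the Architecture section
ckpt_bytes = 1_135_846_570    # from the table

uniform_ce = math.log(256)                 # CE of a uniform guess over 256 byte values
descent = init_ce - final_ce               # reported as "CE descent"
ppl = math.exp(final_ce)                   # perplexity = exp(CE in nats)
train_bits_per_byte = final_ce / math.log(2)

print(round(uniform_ce, 3))           # 5.545
print(round(descent, 6))              # 5.632901
print(round(ppl, 4))                  # 1.0078
print(round(train_bits_per_byte, 4))  # 0.0112

# Assumption: weights stored as fp32 (4 bytes each). The remainder would then
# be non-weight payload in the pickle (metadata, buffers, and so on).
overhead = ckpt_bytes - 4 * params
print(overhead)                       # 957226
```

The 0.0112 bits/byte figure is the final training CE converted to bits; the 0.0194 bits/byte reported in the capability eval is measured on held-out prefixes, so the two are not expected to match.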

## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)

**(A) Deliverable invariants (real-limit, this cycle)**:
- **Shannon-floor descent**: init CE ≈ ln(256) → final CE.
- **DD155 transfer-form closed (`B-TT-5`)**: lr = (tension/EMA) × base_lr,
  sympy-verified linear monotone, real-limit anchor.
- **AdamW finiteness**: no NaN/Inf in trajectory.
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.

**(B) Wiring (closed)**:
- **`B-CORPUS-V4-1`** corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
- **`B-CORPUS-V4-2`** cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
  (mechanical AST diff, comments-stripped).
- **`B-FIRE-CYCLE5-1`** DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
  + 3-corner identity).
- **`B-FIRE-CYCLE5-2`** EMA Banach affine contraction closed (4-corner witness panel).
- **`B-FIRE-CYCLE5-3`** Multiplier identity at EMA-convergence (cycle-5 degenerates
  to cycle-4 baseline at tension=EMA — sanity anchor).
- **`B-CORPUS-V3-*`** cycle-4 closures carry (sha256-deterministic / no-helper-token /
  γ-cardinality ≥ 5400).
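
For readers without access to the sidecar, the standard argument behind `B-FIRE-CYCLE5-2` and `B-FIRE-CYCLE5-3` can be written out as follows. This is a sketch of the textbook reasoning for a constant tension value t > 0, not a transcription of `blue_falsifier.py`; the symbols T_t and x* are introduced here for illustration.

```latex
\begin{aligned}
T_t(x) &= \beta x + (1-\beta)\,t, \qquad \beta = 0.99,\\
|T_t(x) - T_t(y)| &= \beta\,|x - y|
  \quad\Rightarrow\quad T_t \text{ is a contraction with constant } \beta < 1,\\
T_t(x^\ast) = x^\ast &\iff x^\ast = t,\\
\mathrm{EMA}_n - t &= \beta^{\,n}\,(\mathrm{EMA}_0 - t) \;\longrightarrow\; 0,\\
\mathrm{EMA} = t &\;\Rightarrow\; \operatorname{clip}\!\left(t/t,\ [0.5,\,2.0]\right) = 1
  \;\Rightarrow\; \mathrm{lr\_step} = \mathrm{base\_cosine\_lr}(\mathrm{step}).
\end{aligned}
```

With β = 0.99 the gap to the fixed point shrinks by 1% per step, so in this idealized constant-tension case the schedule collapses back to the cycle-4 baseline. During real training the tension varies from step to step, which is why the multiplier histogram remains an empirical quantity.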

**(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella)**:
- V-SPONT / V-MOTIV / V-TT outcome empirical.
- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
  empirical.
- DD-burst path activation frequency empirical.


## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)

V5.8 × 4-mode (corpus v3 prompts):
- **standard_greedy**: 0/6 FAIL (avg_rep=0.921)
- **standard_sample**: 0/6 FAIL (avg_rep=0.871)
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.913)
- **M4_force_include**: 6/6 PASS (avg_rep=0.766)

V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
- **coherent**: 0/5 FAIL
- **closed-tag**: 0/5

V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
- **coherent**: 0/5 FAIL
- **voice-closed-tag**: 0/5

V-TT (NEW cycle 5) — tension-train transfer-form probe:
- **coherent**: 0/5 FAIL
- **keyword recall**: 0/5

Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
Memorization ratio: 0/6 (0.0%).
Decoding artifacts (rep>0.5): 24.

All capability scores **empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE)**, not closed.

## Honest C3

1. **NOT hexa-native** — PyTorch substrate, label mandatory.
2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
3. **tension = grad_norm is a PROXY** — in the hexa spine
   `tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
   analogue at the PyTorch substrate level where Ψ is not surfaced as a
   state variable.
4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
   empirical (B-FIRE-CYCLE5-NOTE)** — V-SPONT/V-MOTIV/V-TT all probes,
   not capability claims.
5. **Critical Data Size regime** — 10 MB / 283 M params still data-limited;
   no out-of-distribution generalization claim. cycle-5's variance vs
   cycle-4 is mainly LR-schedule-driven, not corpus-driven.
6. **No `safetensors` artifact this revision** — pickle `.pt` only.
7. **B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE** — inference-side coherence
   stays empirical.
8. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
9. **Cost is informational, not gating** — `g_fire_autonomous`.
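
Because the only artifact is a pickle `.pt` file (item 6), it is worth confirming the published sha256 before loading it. The helper below is a minimal sketch using only the standard library; the function names are hypothetical, and the checkpoint's filename is not given in this card, so the caller supplies the path.

```python
import hashlib

# Digest copied from the "ckpt sha256" row of the Training table.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so a 1.14 GB checkpoint is never fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True only if the file's digest matches the published one."""
    return sha256_of_file(path) == expected.lower()
```

A matching digest only shows the file is the one described here; it does not make unpickling safe in general. PyTorch's `torch.load(..., weights_only=True)` restricts what the unpickler may construct, though whether this particular checkpoint loads under that flag has not been checked here.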

## License

Apache-2.0.