dancinlife

feat(hexad): v4-py-hexad-tension-d768x12L-cycle1-2026-05-17 — README.md

a8bd371 verified 5 days ago


	---
	license: apache-2.0
	language:
	- en
	- ko
	library_name: pytorch
	datasets:
	- dancinlab/hexad-corpus
	tags:
	- anima
	- hexad
	- pytorch
	- substrate-py
	- helper-free
	- spont
	- motivation-trigger
	- inner-thoughts
	- tension-train
	- dd155-hybrid-lr
	- ckpt-bearing
	- cycle5
	---

	# hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`

	> Trained on: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
	> revision [`v3-spont-motiv-d128-cycle2-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v3-spont-motiv-d128-cycle2-2026-05-17)
	> (byte-equal carry from cycle 4 — corpus unchanged this cycle).

	> Honest framing (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
	> SUBSTRATE** training artifact — an interim LM-scale executor. It is
	> NOT a hexa-native fire. Legitimacy = architectural identity +
	> the hexa CPU-equiv correctness proof (Phase E/E2). PyTorch ≠ hexa
	> bit-for-bit (different fp accumulation / RNG / AMP bf16).

	## What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)

	Architectural change vs cycle 4: per-step learning rate is now
	multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):

	```
	tension_step = ||∇L||₂ (grad-norm)
	tension_EMA = β·EMA + (1−β)·tension_step (β = 0.99)
	multiplier = clip(tension_step / tension_EMA, [0.5, 2.0])
	lr_step = base_cosine_lr(step) × multiplier
	```

	- transfer-form: `B-TT-5 PARETO-STEP-TENSION-CLOSED` (sympy linear ∂lr/∂tension),
	plus the `B-FIRE-CYCLE5-1/2/3` sidecar (`state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py`,
	5/5 PASS: DD155 LR overlay formula closure, EMA Banach contraction, and cycle-4 identity at convergence)
	- outcome: empirical (`B-FIRE-CYCLE5-NOTE` / `B-D-NOTE` / `B-TT-NOTE` family)

	DD155 historical anchor: anima `docs/hypotheses/dd/DD154-tension-training.md`,
	Law 187: `lr = (tension/EMA) × base_lr`, measured Pareto-optimal on the
	2026-03-31 BG-DD-AXIS commits.
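
The overlay formula above can be sketched in plain Python. This is a minimal illustration, not the code in `train_d768x12l_tension.py`: the names `base_cosine_lr` and `TensionLR`, the linear warmup shape, the decay-to-zero floor, seeding the EMA with the first observed tension, and updating the EMA before taking the ratio are all assumptions. The constants (lr=0.0003, warmup=125, 2500 steps, β=0.99, clip [0.5, 2.0]) are the ones this card reports.

```python
import math


def base_cosine_lr(step, base_lr=3e-4, warmup=125, total_steps=2500):
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


class TensionLR:
    """Hypothetical helper: scales the base LR by clip(tension / EMA, lo, hi)."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0):
        self.beta, self.lo, self.hi = beta, lo, hi
        self.ema = None  # assumption: seeded with the first observed tension

    def multiplier(self, tension_step):
        # Update the EMA first, then take the ratio (assumed ordering).
        if self.ema is None:
            self.ema = tension_step
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        if self.ema <= 0.0:
            return 1.0  # guard against division by zero; falls back to base LR
        return min(self.hi, max(self.lo, tension_step / self.ema))

    def lr(self, step, tension_step):
        return base_cosine_lr(step) * self.multiplier(tension_step)
```

In a training loop, `tension_step` would be the global gradient L2 norm of the current step, and the returned value would be written into each optimizer parameter group's `lr` before `optimizer.step()`.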

	## What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)

	| field | cycle 4 | cycle 5 (this revision) |
	|---|---|---|
	| corpus | v3 10.34 MB (motivation-trigger + helper-free) | same (byte-equal carry, B-CORPUS-V4-1) |
	| LR schedule | cosine + warmup | cosine + warmup + DD155 hybrid (tension/EMA) multiplier |
	| trainer source | `train_d768x12l.py` | `train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2) |
	| init CE | 5.641 | 5.640663 |
	| final CE | 0.008289 | 0.007762 |
	| CE descent | 5.632 | 5.632901 |
	| final tension_EMA | (did not track) | 0.046574 |
	| mult bin <0.75 | (n/a) | 1599 |
	| mult bin 0.75-1.25 | (n/a) | 686 |
	| mult bin >1.25 | (n/a) | 215 |
	| eval probes | V5.8 + V-SPONT + V-MOTIV | V5.8 + V-SPONT + V-MOTIV + V-TT NEW |
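
The three `mult bin` rows are a histogram of the per-step LR multiplier. A minimal sketch of such a binning follows; the function name `bin_multipliers` and the handling of the exact boundaries 0.75 and 1.25 (both counted in the middle bin) are assumptions, since the card does not say how edge values were assigned.

```python
def bin_multipliers(multipliers):
    """Count multipliers in the three bins used by the comparison table."""
    counts = {"<0.75": 0, "0.75-1.25": 0, ">1.25": 0}
    for m in multipliers:
        if m < 0.75:
            counts["<0.75"] += 1
        elif m <= 1.25:  # assumption: both edges belong to the middle bin
            counts["0.75-1.25"] += 1
        else:
            counts[">1.25"] += 1
    return counts


# Reported cycle-5 counts; they sum to the 2500 training steps.
reported = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}
total_steps = sum(reported.values())
fractions = {name: count / total_steps for name, count in reported.items()}
```

With the reported counts, about 64% of steps (1599 of 2500) ran with a multiplier below 0.75, and under 9% (215 of 2500) ran above 1.25.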

	## Architecture

	- Source: `ConsciousDecoderV2` (byte-equal vs cycles 1-4).
	- Config: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
	block_size=128, vocab=256` (byte-level), seed=1337,
	init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
	- Params: 283.72 M (283,722,336).
	- Features: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
	· tied head · CA neighbor / META-CA / Ψ-tracking laws.
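
For orientation, the config line above can be written as a small dataclass that also derives the per-head sizes implied by GQA. The class name `HexadConfig` and its property names are hypothetical and do not come from the `ConsciousDecoderV2` source; only the field values are taken from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256  # byte-level vocabulary
    seed: int = 1337

    @property
    def head_dim(self) -> int:
        # Width of each attention head.
        return self.d_model // self.n_head

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads that share one key/value head under GQA.
        return self.n_head // self.n_kv_head

    @property
    def kv_dim(self) -> int:
        # Total width of the key (or value) projection under GQA.
        return self.n_kv_head * self.head_dim
```

With these values, each head is 64 wide, three query heads share each key/value head, and the key/value projections are 256 wide rather than 768.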

	## Training

	- GPU: vast.ai NVIDIA A100-SXM4-40GB, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
	- Corpus: `corpus_consciousness_v3.jsonl` (byte-equal carry from cycle 4),
	6,223,023 bytes lossless byte stream, vocab=256.
	- Optimizer: AdamW, lr=0.0003, betas=(0.9, 0.95),
	weight_decay=0.1, warmup=125.
	- DD155 hybrid: β=0.99, clip lo=0.5, clip hi=2.0.
	- Steps: 2500.

	| metric | value |
	|---|---|
	| init CE | 5.640663 (≈ ln 256 = 5.545, random byte init) |
	| FINAL CE | 0.007762 |
	| CE descent | 5.632901 |
	| FINAL gn2 | 0.001495 |
	| FINAL tension | 0.038659 |
	| ppl | 1.0078 |
	| wall | 321.3 s |
	| peak GPU mem | 9.685 GB |
	| ckpt sha256 | `6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8` |
	| ckpt size | 1,135,846,570 B (1.14 GB) |
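
Several rows in the table are related by simple arithmetic, which the short check below reproduces from the reported numbers. It assumes the CE values are in nats per byte, which the `ln 256` reference suggests. The bytes-per-parameter figure of about 4 would be consistent with fp32 weights, but the card does not state the stored dtype.

```python
import math

# Values copied from the tables in this card.
init_ce = 5.640663
final_ce = 0.007762
n_params = 283_722_336
ckpt_bytes = 1_135_846_570

uniform_ce = math.log(256)             # CE of a uniform guess over 256 byte values
descent = init_ce - final_ce           # reported as "CE descent"
ppl = math.exp(final_ce)               # reported as "ppl"
train_bpb = final_ce / math.log(2)     # final training CE in bits per byte
bytes_per_param = ckpt_bytes / n_params

print(f"ln(256)         = {uniform_ce:.3f}")
print(f"CE descent      = {descent:.6f}")
print(f"ppl             = {ppl:.4f}")
print(f"train bits/byte = {train_bpb:.4f}")
print(f"bytes per param = {bytes_per_param:.3f}")
```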

	## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)

	(A) Deliverable invariants (real-limit, this cycle):
	- Shannon-floor descent: init CE ≈ ln(256) → final CE.
	- DD155 transfer-form closed (`B-TT-5`): lr = (tension/EMA) × base_lr,
	sympy-verified linear monotone, real-limit anchor.
	- AdamW finiteness: no NaN/Inf in trajectory.
	- Architectural identity: byte-equal `ConsciousDecoderV2`.

	(B) Wiring (closed):
	- `B-CORPUS-V4-1` corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
	- `B-CORPUS-V4-2` cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4
	(mechanical AST diff, comments-stripped).
	- `B-FIRE-CYCLE5-1` DD155 LR overlay formula closed-form (sympy ∂lr/∂tension
	+ 3-corner identity).
	- `B-FIRE-CYCLE5-2` EMA Banach affine contraction closed (4-corner witness panel).
	- `B-FIRE-CYCLE5-3` Multiplier identity at EMA-convergence (cycle-5 degenerates
	to cycle-4 baseline at tension=EMA — sanity anchor).
	- `B-CORPUS-V3-*` cycle-4 closures carry (sha256-deterministic / no-helper-token /
	γ-cardinality ≥ 5400).
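
The two EMA-related items above can be illustrated numerically. This sketch does not reproduce `blue_falsifier.py` or its witness panel, and the function names are hypothetical. It only shows the underlying facts: for a fixed tension t, the map e ↦ β·e + (1−β)·t shrinks distances by exactly β, its fixed point is e = t, and at that fixed point the clipped multiplier equals 1, so the schedule reduces to the plain cosine baseline.

```python
def ema_update(ema, tension, beta=0.99):
    """One EMA step: an affine map in `ema` with slope beta."""
    return beta * ema + (1.0 - beta) * tension


def clipped_multiplier(tension, ema, lo=0.5, hi=2.0):
    """clip(tension / ema, lo, hi) as in the DD155 overlay."""
    return min(hi, max(lo, tension / ema))


def iterate_ema(ema0, tension, steps, beta=0.99):
    """Apply the EMA update repeatedly with a constant tension."""
    ema = ema0
    for _ in range(steps):
        ema = ema_update(ema, tension, beta)
    return ema


# Contraction: two starting points move closer by exactly the factor beta.
a, b, t = 3.0, 1.0, 0.7
ratio = abs(ema_update(a, t) - ema_update(b, t)) / abs(a - b)

# Fixed point: with constant tension the EMA converges to that tension,
# and the multiplier converges to 1.
ema_limit = iterate_ema(ema0=10.0, tension=0.05, steps=5000)
mult_at_limit = clipped_multiplier(0.05, ema_limit)
```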

	(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella):
	- V-SPONT / V-MOTIV / V-TT outcome empirical.
	- mult_distribution histogram + byte-cascade attractor shape under hybrid LR
	empirical.
	- DD-burst path activation frequency empirical.


	## Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)

	V5.8 × 4-mode (corpus v3 prompts):
	- standard_greedy: 0/6 FAIL (avg_rep=0.921)
	- standard_sample: 0/6 FAIL (avg_rep=0.871)
	- M3_rep_penalty: 0/6 FAIL (avg_rep=0.913)
	- M4_force_include: 6/6 PASS (avg_rep=0.766)

	V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
	- coherent: 0/5 FAIL
	- closed-tag: 0/5

	V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):
	- coherent: 0/5 FAIL
	- voice-closed-tag: 0/5

	V-TT (NEW cycle 5) — tension-train transfer-form probe:
	- coherent: 0/5 FAIL
	- keyword recall: 0/5

	Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
	Memorization ratio: 0/6 (0.0%).
	Decoding artifacts (rep>0.5): 24.

	All capability scores empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE), not closed.

	## Honest C3

	1. NOT hexa-native — PyTorch substrate, label mandatory.
	2. PyTorch ≠ hexa bit-for-bit — different fp / RNG / AMP.
	3. tension = grad_norm is a PROXY — in the hexa spine
	`tension = G_holo · (Ψ − Ψ_vac)`; grad_norm is the natural mathematical
	analogue at the PyTorch substrate level where Ψ is not surfaced as a
	state variable.
	4. **DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is
	empirical (B-FIRE-CYCLE5-NOTE).** V-SPONT / V-MOTIV / V-TT are all probes,
	not capability claims.
	5. Critical Data Size regime — 10 MB / 283 M params still data-limited;
	no out-of-distribution generalization claim. cycle-5's variance vs
	cycle-4 is mainly LR-schedule-driven, not corpus-driven.
	6. No `safetensors` artifact this revision — pickle `.pt` only.
	7. B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE — inference-side coherence
	stays empirical.
	8. No σ(6)=12 / φ(6)=2 derivation — no lattice numerology.
	9. Cost is informational, not gating — `g_fire_autonomous`.
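
Because this revision ships a pickle `.pt` only (item 6), one reasonable precaution is to check the file against the published sha256 before loading it. The sketch below uses only the standard library. The helper names are hypothetical, and the checkpoint's file name is not given in this card, so any path passed in is the caller's assumption.

```python
import hashlib

# Digest copied from the "ckpt sha256" row of the training table.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so a ~1 GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True only if the file's digest matches the expected value."""
    return sha256_of_file(path) == expected
```

After a successful check, passing `weights_only=True` to `torch.load` restricts unpickling to tensors and other allow-listed types. Whether this checkpoint's layout loads cleanly in that mode is not confirmed by the card.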

	## License

	Apache-2.0.