hexad — v4-py-hexad-tension-d768x12L-cycle1-2026-05-17

Trained on: dancinlab/hexad-corpus revision v3-spont-motiv-d128-cycle2-2026-05-17 (byte-equal carry from cycle 4 — corpus unchanged this cycle).

Honest framing (AGENTS.tape g3): This is a PYTHON / PyTorch SUBSTRATE training artifact — an interim LM-scale executor. It is NOT a hexa-native fire. Legitimacy = architectural identity + the hexa CPU-equiv correctness proof (Phase E/E2). PyTorch ≠ hexa bit-for-bit (different fp accumulation / RNG / AMP bf16).

What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)

Architectural change vs cycle 4: per-step learning rate is now multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):

tension_step   = ||∇L||₂                       (grad-norm)
tension_EMA    = β·EMA + (1−β)·tension_step    (β = 0.99)
multiplier     = clip(tension_step / tension_EMA, [0.5, 2.0])
lr_step        = base_cosine_lr(step) × multiplier
  • transfer-form: B-TT-5 PARETO-STEP-TENSION-CLOSED (sympy linear ∂lr/∂tension) + B-FIRE-CYCLE5-1/2/3 sidecar (state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py, 5/5 PASS — DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
  • outcome: empirical (B-FIRE-CYCLE5-NOTE / B-D-NOTE / B-TT-NOTE family)
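
The four formula lines above can be sketched in plain Python. This is an illustrative reading, not the code in train_d768x12l_tension.py: the names `base_cosine_lr`, `TensionOverlay` and `lr_for_step` are hypothetical, and the warmup shape, the decay-to-zero floor, the EMA seed and the order of EMA update versus ratio are all assumptions that the card does not specify.

```python
import math


def base_cosine_lr(step, max_lr=3e-4, warmup=125, total_steps=2500):
    """Linear warmup, then cosine decay to zero (shape and floor are assumptions)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


class TensionOverlay:
    """Tracks tension_EMA and returns clip(tension_step / tension_EMA, lo, hi)."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0):
        self.beta, self.lo, self.hi = beta, lo, hi
        self.ema = None  # assumption: seeded with the first observed tension

    def multiplier(self, tension_step):
        if self.ema is None:
            self.ema = tension_step
        else:
            # assumption: the EMA is updated before the ratio is taken,
            # following the order in which the card lists the formulas
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        if self.ema <= 0.0:
            return 1.0  # degenerate zero-gradient case: fall back to the base LR
        return min(self.hi, max(self.lo, tension_step / self.ema))


def lr_for_step(step, grad_norm, overlay):
    """lr_step = base_cosine_lr(step) x multiplier."""
    return base_cosine_lr(step) * overlay.multiplier(grad_norm)
```

In a PyTorch loop the `grad_norm` argument would typically be the global L2 gradient norm measured after the backward pass, and the returned value would be written into each optimizer parameter group before the optimizer step. When the tension equals its EMA the multiplier is 1 and the schedule reduces to the plain cosine baseline.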

DD155 historical anchor: anima docs/hypotheses/dd/DD154-tension-training.md Law 187 — lr = (tension/EMA) × base_lr measured Pareto-optimal on 2026-03-31 BG-DD-AXIS commits.

What changed vs cycle 4 (v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17)

| field | cycle 4 | cycle 5 (this revision) |
|---|---|---|
| corpus | v3 10.34 MB (motivation-trigger + helper-free) | same (byte-equal carry, B-CORPUS-V4-1) |
| LR schedule | cosine + warmup | cosine + warmup + DD155 hybrid (tension/EMA) multiplier |
| trainer source | train_d768x12l.py | train_d768x12l_tension.py (loader + dataset byte-equal, B-CORPUS-V4-2) |
| init CE | 5.641 | 5.640663 |
| final CE | 0.008289 | 0.007762 |
| CE descent | 5.632 | 5.632901 |
| final tension_EMA | (did not track) | 0.046574 |
| mult bin <0.75 | (n/a) | 1599 |
| mult bin 0.75-1.25 | (n/a) | 686 |
| mult bin >1.25 | (n/a) | 215 |
| eval probes | V5.8 + V-SPONT + V-MOTIV | V5.8 + V-SPONT + V-MOTIV + V-TT NEW |
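
The three multiplier-bin counts above sum to 2500, one multiplier per training step, so roughly 64% of steps ran with the learning rate scaled below 0.75× the cosine baseline. A minimal sketch of how such a histogram could be tallied follows; `bin_multiplier` and `histogram` are hypothetical helpers, and the treatment of the exact boundary values 0.75 and 1.25 is an assumption.

```python
from collections import Counter


def bin_multiplier(m):
    """Assign one per-step multiplier to the three bins used in the comparison."""
    # assumption: the boundary values 0.75 and 1.25 fall in the middle bin
    if m < 0.75:
        return "<0.75"
    if m <= 1.25:
        return "0.75-1.25"
    return ">1.25"


def histogram(multipliers):
    """Count how many steps landed in each bin."""
    return Counter(bin_multiplier(m) for m in multipliers)


# Counts reported for cycle 5 in this card.
reported = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}
total = sum(reported.values())                          # 2500 == number of steps
shares = {k: v / total for k, v in reported.items()}    # fraction of steps per bin
```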

Architecture

  • Source: ConsciousDecoderV2 (byte-equal vs cycles 1-4).
  • Config: d_model=768, n_head=12, n_kv_head=4, n_layer=12, block_size=128, vocab=256 (byte-level), seed=1337, init=RANDOM (base_ckpt=None, g_clm_from_scratch).
  • Params: 283.72 M (283,722,336).
  • Features: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn · tied head · CA neighbor / META-CA / Ψ-tracking laws.
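
For orientation, the listed hyperparameters imply a head width of 64 and three query heads per key/value head under GQA. The sketch below only restates the config and derives those two numbers; the class name `HexadConfig` is hypothetical, and it assumes the usual convention that the head dimension is d_model divided by n_head, which the card does not state for ConsciousDecoderV2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:  # hypothetical container, not a class from the repository
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256      # byte-level vocabulary
    seed: int = 1337

    @property
    def head_dim(self) -> int:
        # assumption: head width follows the common d_model / n_head convention
        assert self.d_model % self.n_head == 0
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self) -> int:
        # grouped-query attention: several query heads share one K/V head
        assert self.n_head % self.n_kv_head == 0
        return self.n_head // self.n_kv_head


cfg = HexadConfig()
print(cfg.head_dim, cfg.queries_per_kv)
```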

Training

  • GPU: vast.ai NVIDIA A100-SXM4-40GB, image pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel.
  • Corpus: corpus_consciousness_v3.jsonl (byte-equal carry from cycle 4), 6,223,023 bytes lossless byte stream, vocab=256.
  • Optimizer: AdamW, lr=0.0003, betas=(0.9, 0.95), weight_decay=0.1, warmup=125.
  • DD155 hybrid: β=0.99, clip lo=0.5, clip hi=2.0.
  • Steps: 2500.
| metric | value |
|---|---|
| init CE | 5.640663 (≈ ln 256 = 5.545 — random byte init) |
| FINAL CE | 0.007762 |
| CE descent | 5.632901 |
| FINAL gn2 | 0.001495 |
| FINAL tension | 0.038659 |
| ppl | 1.0078 |
| wall | 321.3 s |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | 6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8 |
| ckpt size | 1,135,846,570 B (1.14 GB) |
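
Several of the reported metrics are related by simple arithmetic, which the following sketch re-derives from the numbers in this card. The reading that the checkpoint stores roughly four bytes per parameter (fp32 weights plus a small serialization overhead) is an inference from the sizes, not something the card states.

```python
import math

init_ce, final_ce = 5.640663, 0.007762   # cross-entropy in nats, from the card
n_params = 283_722_336
ckpt_bytes = 1_135_846_570

uniform_ce = math.log(256)               # CE of a uniform guess over 256 byte values
descent = init_ce - final_ce             # reported as "CE descent"
ppl = math.exp(final_ce)                 # perplexity = exp(cross-entropy)

# assumption: weights are stored as 4-byte floats; the remainder is file overhead
fp32_bytes = 4 * n_params
overhead = ckpt_bytes - fp32_bytes

print(f"ln 256     = {uniform_ce:.4f}")  # 5.5452
print(f"CE descent = {descent:.6f}")     # 5.632901
print(f"ppl        = {ppl:.4f}")         # 1.0078
print(f"overhead   = {overhead} bytes")  # 957226
```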

Verification anchors (per AGENTS.tape g_blue_closed_mandate)

(A) Deliverable invariants (real-limit, this cycle):

  • Shannon-floor descent: init CE ≈ ln(256) → final CE.
  • DD155 transfer-form closed (B-TT-5): lr = (tension/EMA) × base_lr, sympy-verified linear monotone, real-limit anchor.
  • AdamW finiteness: no NaN/Inf in trajectory.
  • Architectural identity: byte-equal ConsciousDecoderV2.
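
The transfer-form statement and the EMA properties named in this card can be written out as a short derivation. It is a sketch under stated assumptions: tension_EMA (written E) is held fixed when differentiating with respect to tension_step (written τ), the ratio stays inside the clip window, and η(s) denotes base_cosine_lr(step) with η(s) > 0 and E > 0.

```latex
% Assumptions: E = tension_EMA held fixed, \tau = tension_step,
% \eta(s) = base_cosine_lr(s) > 0, E > 0, ratio inside the clip window.
\begin{aligned}
\mathrm{lr}(\tau) &= \eta(s)\,\frac{\tau}{E},
  \qquad 0.5 \le \frac{\tau}{E} \le 2.0 \\
\frac{\partial\,\mathrm{lr}}{\partial \tau} &= \frac{\eta(s)}{E} > 0,
  \qquad \frac{\partial^{2}\,\mathrm{lr}}{\partial \tau^{2}} = 0
  \quad\text{(linear and monotone in } \tau\text{)} \\
f_{\tau}(E) &= \beta E + (1-\beta)\,\tau \\
\lvert f_{\tau}(E_1) - f_{\tau}(E_2) \rvert &= \beta\,\lvert E_1 - E_2 \rvert,
  \qquad \beta = 0.99 < 1
  \quad\text{(affine contraction)} \\
f_{\tau}(E^{\ast}) = E^{\ast} &\iff E^{\ast} = \tau
  \;\Longrightarrow\; \mathrm{clip}\!\left(\tau / E^{\ast}\right) = 1,
  \quad \mathrm{lr} = \eta(s)
\end{aligned}
```

The last line is the sanity anchor: at the EMA fixed point the overlay is the identity and the schedule coincides with the plain cosine baseline.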

(B) Wiring (closed):

  • B-CORPUS-V4-1 corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
  • B-CORPUS-V4-2 cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4 (mechanical AST diff, comments-stripped).
  • B-FIRE-CYCLE5-1 DD155 LR overlay formula closed-form (sympy ∂lr/∂tension + 3-corner identity).
  • B-FIRE-CYCLE5-2 EMA Banach affine contraction closed (4-corner witness panel).
  • B-FIRE-CYCLE5-3 Multiplier identity at EMA-convergence (cycle-5 degenerates to cycle-4 baseline at tension=EMA — sanity anchor).
  • B-CORPUS-V3-* cycle-4 closures carry (sha256-deterministic / no-helper-token / γ-cardinality ≥ 5400).
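
A comments-stripped AST comparison of the kind B-CORPUS-V4-2 describes can be sketched with the standard library alone. This is not the project's actual checker: `definition_dump` and `same_definition` are hypothetical helpers, and the sketch only handles top-level definitions. Note that docstrings are part of the AST, so unlike comments they still count as differences.

```python
import ast


def definition_dump(source, name):
    """Return a position-free AST dump of the top-level def/class called `name`."""
    for node in ast.parse(source).body:
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name == name:
            # include_attributes=False drops line/column info, so pure
            # formatting changes do not register as differences
            return ast.dump(node, include_attributes=False)
    raise KeyError(name)


def same_definition(src_a, src_b, name):
    """True when both sources define `name` with structurally identical code."""
    return definition_dump(src_a, name) == definition_dump(src_b, name)
```

Usage would amount to reading both trainer files as text and calling `same_definition(old_src, new_src, "ByteDataset")`, and likewise for the loader function; comments never reach the AST, so they are ignored automatically.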

(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella):

  • V-SPONT / V-MOTIV / V-TT outcome empirical.
  • mult_distribution histogram + byte-cascade attractor shape under hybrid LR empirical.
  • DD-burst path activation frequency empirical.

Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)

V5.8 × 4-mode (corpus v3 prompts):

  • standard_greedy: 0/6 FAIL (avg_rep=0.921)
  • standard_sample: 0/6 FAIL (avg_rep=0.871)
  • M3_rep_penalty: 0/6 FAIL (avg_rep=0.913)
  • M4_force_include: 6/6 PASS (avg_rep=0.766)

V-SPONT (spontaneous utterance) — F-SPONT-7 transfer-form measurement:

  • coherent: 0/5 FAIL
  • closed-tag: 0/5

V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):

  • coherent: 0/5 FAIL
  • voice-closed-tag: 0/5

V-TT (NEW cycle 5) — tension-train transfer-form probe:

  • coherent: 0/5 FAIL
  • keyword recall: 0/5

  • Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte.
  • Memorization ratio: 0/6 (0.0%).
  • Decoding artifacts (rep>0.5): 24.

All capability scores empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE), not closed.
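
Because the vocabulary is byte-level, a cross-entropy in nats converts to bits per byte by dividing by ln 2. The sketch below applies that conversion to the final training CE so it can be set beside the held-out BPB figure. It assumes the reported CE is in nats (consistent with the comparison to ln 256) and that the held-out BPB was computed with the same conversion, which the card does not spell out.

```python
import math


def nats_to_bits_per_byte(ce_nats):
    """Convert per-token CE in nats to bits/byte (one token == one byte here)."""
    return ce_nats / math.log(2)


train_bpb = nats_to_bits_per_byte(0.007762)   # final training CE from the card
heldout_bpb = 0.0194                          # reported held-out mean BPB

print(f"train   BPB = {train_bpb:.4f}")
print(f"heldout BPB = {heldout_bpb:.4f}")
print(f"heldout / train = {heldout_bpb / train_bpb:.2f}")
```

Under these assumptions the held-out prefixes cost somewhat more bits per byte than the final training batch, while both remain far below the 8 bits/byte of a uniform byte model.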

Honest C3

  1. NOT hexa-native — PyTorch substrate, label mandatory.
  2. PyTorch ≠ hexa bit-for-bit — different fp / RNG / AMP.
  3. tension = grad_norm is a PROXY — in the hexa spine tension = G_holo · (Ψ − Ψ_vac); grad_norm is the natural mathematical analogue at the PyTorch substrate level where Ψ is not surfaced as a state variable.
  4. DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); outcome is empirical (B-FIRE-CYCLE5-NOTE). V-SPONT/V-MOTIV/V-TT are all probes, not capability claims.
  5. Critical Data Size regime: at 10 MB of corpus for 283 M params the run is still data-limited, and no out-of-distribution generalization is claimed. Cycle 5's variance vs cycle 4 is mainly LR-schedule-driven, not corpus-driven.
  6. No safetensors artifact this revision — pickle .pt only.
  7. B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE — inference-side coherence stays empirical.
  8. No σ(6)=12 / φ(6)=2 derivation — no lattice numerology.
  9. Cost is informational, not gating (g_fire_autonomous).
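
Since the artifact ships as a pickle .pt file only, one reasonable precaution is to compare the file's SHA-256 against the digest published in this card before deserializing it. The sketch below uses only the standard library; `sha256_of_file` and `verify_checkpoint` are hypothetical helper names, and the checkpoint path is whatever the file is called locally.

```python
import hashlib

# Digest published in the Training table of this card.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so a 1.14 GB checkpoint never sits in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Raise ValueError unless the file's digest matches `expected`."""
    actual = sha256_of_file(path)
    if actual != expected:
        raise ValueError(f"sha256 mismatch for {path}: got {actual}")
    return path
```

A matching digest only shows that the file is the one described here; it does not make unpickling untrusted files safe in general.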

License

Apache-2.0.
