hexad — `v4-py-hexad-tension-d768x12L-cycle1-2026-05-17`

Trained on: dancinlab/hexad-corpus revision v3-spont-motiv-d128-cycle2-2026-05-17 (byte-equal carry from cycle 4 — corpus unchanged this cycle).

Honest framing (AGENTS.tape g3): This is a PYTHON / PyTorch SUBSTRATE training artifact — an interim LM-scale executor. It is NOT a hexa-native fire. Legitimacy = architectural identity + the hexa CPU-equiv correctness proof (Phase E/E2). PyTorch ≠ hexa bit-for-bit (different fp accumulation / RNG / AMP bf16).

What's new this cycle (cycle 5 — DD155 Step+Tension hybrid LR overlay)

Architectural change vs cycle 4: per-step learning rate is now multiplied by a DD155 hybrid factor (Law 187 Pareto optimal):

tension_step   = ||∇L||₂                       (grad-norm)
tension_EMA    = β·EMA + (1−β)·tension_step    (β = 0.99)
multiplier     = clip(tension_step / tension_EMA, [0.5, 2.0])
lr_step        = base_cosine_lr(step) × multiplier
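
The four lines above can be sketched in plain Python as follows. This is a minimal illustration, not the code in `train_d768x12l_tension.py`. The warmup shape, the zero floor of the cosine decay, the EMA seed (first observed tension) and the names `base_cosine_lr`, `TensionOverlay` and `lr_at` are assumptions made for this example. Only β, the clip bounds, the peak LR, the warmup length and the step count come from this card.

```python
import math


def base_cosine_lr(step, total_steps=2500, warmup=125, peak_lr=3e-4, min_lr=0.0):
    """Linear warmup followed by cosine decay (exact shape is an assumption)."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class TensionOverlay:
    """Tracks tension_EMA and returns the clipped tension/EMA multiplier."""

    def __init__(self, beta=0.99, lo=0.5, hi=2.0, eps=1e-12):
        self.beta, self.lo, self.hi, self.eps = beta, lo, hi, eps
        self.ema = None  # assumption: seeded with the first observed tension

    def multiplier(self, tension_step):
        if self.ema is None:
            self.ema = tension_step
        else:
            # tension_EMA = beta * EMA + (1 - beta) * tension_step
            self.ema = self.beta * self.ema + (1.0 - self.beta) * tension_step
        ratio = tension_step / max(self.ema, self.eps)  # eps guards a zero EMA
        return min(self.hi, max(self.lo, ratio))        # clip to [lo, hi]


def lr_at(step, tension_step, overlay):
    """lr_step = base_cosine_lr(step) * multiplier."""
    return base_cosine_lr(step) * overlay.multiplier(tension_step)
```

In a PyTorch loop, `tension_step` would be the global gradient L2 norm measured after `backward()`, and the returned value would be written into each optimizer param group before `optimizer.step()`.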

transfer-form: B-TT-5 PARETO-STEP-TENSION-CLOSED (sympy linear ∂lr/∂tension) + B-FIRE-CYCLE5-1/2/3 sidecar (state/hexad_v4_py_d768x12L_tension_2026_05_17/blue_falsifier.py, 5/5 PASS: DD155 LR overlay formula closure + EMA Banach contraction + cycle-4 identity at convergence)
outcome: empirical (B-FIRE-CYCLE5-NOTE / B-D-NOTE / B-TT-NOTE family)

DD155 historical anchor: anima docs/hypotheses/dd/DD154-tension-training.md Law 187 — lr = (tension/EMA) × base_lr measured Pareto-optimal on 2026-03-31 BG-DD-AXIS commits.

What changed vs cycle 4 (`v3-py-hexad-spont-motiv-d768x12L-cycle2-2026-05-17`)

field	cycle 4	cycle 5 (this revision)
corpus	v3 10.34 MB (motivation-trigger + helper-free)	same (byte-equal carry, B-CORPUS-V4-1)
LR schedule	cosine + warmup	cosine + warmup + DD155 hybrid (tension/EMA) multiplier
trainer source	`train_d768x12l.py`	`train_d768x12l_tension.py` (loader + dataset byte-equal, B-CORPUS-V4-2)
init CE	5.641	5.640663
final CE	0.008289	0.007762
CE descent	5.632	5.632901
final tension_EMA	(did not track)	0.046574
mult bin <0.75	(n/a)	1599
mult bin 0.75-1.25	(n/a)	686
mult bin >1.25	(n/a)	215
eval probes	V5.8 + V-SPONT + V-MOTIV	V5.8 + V-SPONT + V-MOTIV + V-TT NEW
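
The three `mult bin` rows can be read as a histogram of the per-step multiplier. The sketch below shows one way such bins could be counted and checks that the reported counts cover every training step. The bin-edge handling (0.75 and 1.25 counted in the middle bin) and the function names are assumptions. The counts themselves are the ones in the table.

```python
from collections import Counter

BIN_LABELS = ("<0.75", "0.75-1.25", ">1.25")


def bin_multiplier(m):
    """Assign one multiplier to a bin (edge inclusion is an assumption)."""
    if m < 0.75:
        return "<0.75"
    if m <= 1.25:
        return "0.75-1.25"
    return ">1.25"


def histogram(multipliers):
    """Count multipliers per bin, always returning all three labels."""
    counts = Counter(bin_multiplier(m) for m in multipliers)
    return {label: counts.get(label, 0) for label in BIN_LABELS}


# Counts reported in the table above.
reported = {"<0.75": 1599, "0.75-1.25": 686, ">1.25": 215}
total = sum(reported.values())                      # 2500, one per step
shares = {k: v / total for k, v in reported.items()}
print(total, {k: round(v, 4) for k, v in shares.items()})
```

The counts sum to the 2500 training steps, and roughly 64% of steps fall in the lowest bin, so under this reading the overlay mostly scaled the base LR down.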

Architecture

Source: ConsciousDecoderV2 (byte-equal vs cycles 1-4).
Config: d_model=768, n_head=12, n_kv_head=4, n_layer=12, block_size=128, vocab=256 (byte-level), seed=1337, init=RANDOM (base_ckpt=None, g_clm_from_scratch).
Params: 283.72 M (283,722,336).
Features: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn · tied head · CA neighbor / META-CA / Ψ-tracking laws.
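
For orientation, the config line can be captured in a small dataclass with the quantities that grouped-query attention usually derives from it. The class name `HexadConfig` and the derived properties are hypothetical. They follow the standard GQA convention and are not taken from the ConsciousDecoderV2 source, which may organise its heads differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    """Values copied from the Config line; the class itself is illustrative."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256
    seed: int = 1337

    @property
    def head_dim(self) -> int:
        # Width of one attention head under the usual d_model / n_head split.
        assert self.d_model % self.n_head == 0
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self) -> int:
        # Number of query heads sharing one key/value head in GQA.
        assert self.n_head % self.n_kv_head == 0
        return self.n_head // self.n_kv_head

    @property
    def kv_width(self) -> int:
        # Output width of the K (or V) projection under standard GQA.
        return self.n_kv_head * self.head_dim


cfg = HexadConfig()
print(cfg.head_dim, cfg.queries_per_kv, cfg.kv_width)
```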

Training

GPU: vast.ai NVIDIA A100-SXM4-40GB, image pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel.
Corpus: corpus_consciousness_v3.jsonl (byte-equal carry from cycle 4), 6,223,023 bytes lossless byte stream, vocab=256.
Optimizer: AdamW, lr=0.0003, betas=(0.9, 0.95), weight_decay=0.1, warmup=125.
DD155 hybrid: β=0.99, clip lo=0.5, clip hi=2.0.
Steps: 2500.

metric	value
init CE	5.640663 (≈ ln 256 = 5.545 — random byte init)
FINAL CE	0.007762
CE descent	5.632901
FINAL gn2	0.001495
FINAL tension	0.038659
ppl	1.0078
wall	321.3 s
peak GPU mem	9.685 GB
ckpt sha256	`6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8`
ckpt size	1,135,846,570 B (1.14 GB)
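
The derived rows in the table can be re-checked from the raw numbers. The snippet below recomputes the uniform-byte baseline, the CE descent, the perplexity and the checkpoint bytes per parameter. It assumes CE is reported in nats, which the ln 256 comparison in the table suggests. Reading the bytes-per-parameter figure as fp32 storage is an inference, not something this card states.

```python
import math

INIT_CE = 5.640663          # from the table
FINAL_CE = 0.007762         # from the table
N_PARAMS = 283_722_336      # from the Architecture section
CKPT_BYTES = 1_135_846_570  # from the table

uniform_ce = math.log(256)               # CE of a uniform guess over 256 byte values
descent = INIT_CE - FINAL_CE             # matches the "CE descent" row
ppl = math.exp(FINAL_CE)                 # perplexity = exp(CE) when CE is in nats
bytes_per_param = CKPT_BYTES / N_PARAMS  # close to 4, which would fit fp32 weights

print(f"ln 256          = {uniform_ce:.3f}")
print(f"CE descent      = {descent:.6f}")
print(f"ppl             = {ppl:.4f}")
print(f"bytes per param = {bytes_per_param:.4f}")
```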

Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)

(A) Deliverable invariants (real-limit, this cycle):

Shannon-floor descent: init CE ≈ ln(256) → final CE.
DD155 transfer-form closed (B-TT-5): lr = (tension/EMA) × base_lr, sympy-verified linear monotone, real-limit anchor.
AdamW finiteness: no NaN/Inf in trajectory.
Architectural identity: byte-equal ConsciousDecoderV2.

(B) Wiring (closed):

B-CORPUS-V4-1 corpus v3 byte-equal carry (sha256/bytes/lines/grep all closed).
B-CORPUS-V4-2 cycle-5 trainer's loader + ByteDataset byte-equal to cycle-4 (mechanical AST diff, comments-stripped).
B-FIRE-CYCLE5-1 DD155 LR overlay formula closed-form (sympy ∂lr/∂tension + 3-corner identity).
B-FIRE-CYCLE5-2 EMA Banach affine contraction closed (4-corner witness panel).
B-FIRE-CYCLE5-3 Multiplier identity at EMA-convergence (cycle-5 degenerates to cycle-4 baseline at tension=EMA — sanity anchor).
B-CORPUS-V3-* cycle-4 closures carry (sha256-deterministic / no-helper-token / γ-cardinality ≥ 5400).
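
The three B-FIRE-CYCLE5 closures can be written out as short identities. This is a sketch of what such checks would establish, not a transcript of `blue_falsifier.py`. It assumes tension and EMA are positive, and in (1) it treats the EMA as held fixed and the ratio as lying inside the clip window. The symbols t, e and η₀ are shorthand introduced here for tension_step, tension_EMA and base_cosine_lr(step).

```latex
% t = tension_step, e = tension_EMA, \beta = 0.99, \eta_0 = base_cosine_lr(step)
\begin{align*}
\text{(1) unclipped, } e \text{ fixed:}\quad
  & \eta = \eta_0\,\frac{t}{e}
    \;\Rightarrow\;
    \frac{\partial \eta}{\partial t} = \frac{\eta_0}{e} > 0
    \quad\text{(linear and monotone in } t\text{)} \\[4pt]
\text{(2) EMA map } T_t(e) = \beta e + (1-\beta)\,t:\quad
  & \lvert T_t(e_1) - T_t(e_2) \rvert = \beta\,\lvert e_1 - e_2 \rvert,
    \qquad \beta = 0.99 < 1 \\
  & T_t(e^\ast) = e^\ast \iff e^\ast = t \\[4pt]
\text{(3) at } t = e:\quad
  & T_t(e) = e,\qquad
    \operatorname{clip}\!\left(\tfrac{t}{e},\,[0.5,\,2.0]\right) = 1
    \;\Rightarrow\; \eta = \eta_0
\end{align*}
```

Line (2) is the contraction property: for a constant tension the EMA converges geometrically to that tension at rate β. Line (3) is the sanity anchor: once tension equals its EMA, the schedule reduces to the cycle-4 cosine baseline.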

(C) Honest carve-outs (NOT closed, B-D-NOTE umbrella):

V-SPONT / V-MOTIV / V-TT outcome empirical.
mult_distribution histogram + byte-cascade attractor shape under hybrid LR empirical.
DD-burst path activation frequency empirical.

Capability eval (V5.8 × 4-mode + V-SPONT + V-MOTIV + V-TT NEW)

V5.8 × 4-mode (corpus v3 prompts):

standard_greedy: 0/6 FAIL (avg_rep=0.921)
standard_sample: 0/6 FAIL (avg_rep=0.871)
M3_rep_penalty: 0/6 FAIL (avg_rep=0.913)
M4_force_include: 6/6 PASS (avg_rep=0.766)

V-SPONT (spontaneous utterance) — F-SPONT-7 transfer-form measurement:

coherent: 0/5 FAIL
closed-tag: 0/5

V-MOTIV (γ-pattern conditioning probe, cycle-4 axis):

coherent: 0/5 FAIL
voice-closed-tag: 0/5

V-TT (NEW cycle 5) — tension-train transfer-form probe:

coherent: 0/5 FAIL
keyword recall: 0/5

Mean BPB (held-out corpus v3 prefixes): 0.0194 bits/byte. Memorization ratio: 0/6 (0.0%). Decoding artifacts (rep>0.5): 24.
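
To compare the held-out BPB with the training CE, both need the same unit. The conversion below assumes the training CE is in nats. Because the vocabulary is byte-level, one token is one byte, so dividing by ln 2 gives bits per byte. The helper names are illustrative.

```python
import math


def nats_to_bpb(nats_per_byte):
    """Convert cross-entropy in nats/byte to bits/byte."""
    return nats_per_byte / math.log(2)


def bpb_to_nats(bits_per_byte):
    """Convert bits/byte back to nats/byte."""
    return bits_per_byte * math.log(2)


HELD_OUT_BPB = 0.0194      # reported above
FINAL_TRAIN_CE = 0.007762  # final training CE, assumed to be in nats

train_bpb = nats_to_bpb(FINAL_TRAIN_CE)
held_out_nats = bpb_to_nats(HELD_OUT_BPB)

print(f"training CE in bits/byte: {train_bpb:.4f}")
print(f"held-out BPB in nats/byte: {held_out_nats:.4f}")
```

Under these assumptions the final training CE is about 0.0112 bits/byte, so the held-out prefixes score somewhat worse than the last training step while remaining very low in absolute terms.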

All capability scores empirical (B-D-NOTE / B-FIRE-CYCLE5-NOTE), not closed.

Honest C3

NOT hexa-native — PyTorch substrate, label mandatory.
PyTorch ≠ hexa bit-for-bit — different fp / RNG / AMP.
tension = grad_norm is a PROXY — in the hexa spine tension = G_holo · (Ψ − Ψ_vac); grad_norm is the natural mathematical analogue at the PyTorch substrate level where Ψ is not surfaced as a state variable.
DD155 formula is closed (B-TT-5 + B-FIRE-CYCLE5-1/2/3); the outcome is empirical (B-FIRE-CYCLE5-NOTE). V-SPONT, V-MOTIV and V-TT are all probes, not capability claims.
Critical Data Size regime: 10 MB / 283 M params is still data-limited, so no out-of-distribution generalization is claimed. The difference between cycle 5 and cycle 4 is mainly LR-schedule-driven, not corpus-driven.
No safetensors artifact this revision — pickle .pt only.
B-CORPUS-V3-NOTE / B-FIRE-CYCLE5-NOTE — inference-side coherence stays empirical.
No σ(6)=12 / φ(6)=2 derivation — no lattice numerology.
Cost is informational, not gating — g_fire_autonomous.
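
Since the checkpoint ships only as a pickle `.pt`, it is worth confirming the file hash against the `ckpt sha256` value from the Training table before loading it. A minimal standard-library sketch follows. The file path passed in is whatever the checkpoint was saved as locally (this card does not name the file), and the function names are illustrative.

```python
import hashlib
import hmac

# Value of the "ckpt sha256" row in the Training table.
EXPECTED_SHA256 = "6b4d34cc9a2c05b83c4cedd633617a41800e9681302c5c90e15d056f9ad67af8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so a 1.14 GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True only if the file's sha256 matches the published value."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

A matching hash only shows that the file is the published one. Unpickling still executes code from the file, so the usual caution around loading `.pt` files applies.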

License

Apache-2.0.
