feat(model-card): cross-link dataset dancinlab/hexad-corpus v1-byte-consciousness-d128-cycle1-2026-05-17
README.md
ADDED
@@ -0,0 +1,135 @@
---
license: apache-2.0
language:
- en
library_name: pytorch
datasets:
- dancinlab/hexad-corpus
tags:
- anima
- hexad
- pytorch
- substrate-py
- ckpt-recovered
---

# hexad: `v1-py-hexad-d768x12L-cycle2-2026-05-17`

> **Honest framing**: This is a **PYTHON / PyTorch SUBSTRATE** training artifact,
> an *interim LM-scale executor*. It is **NOT a hexa-native fire**. Its legitimacy
> is *architectural identity* + the *hexa CPU-equiv correctness proof*. See the
> anchor chain below; do not conflate this run with a hexa-native fire.

> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus) revision [`v1-byte-consciousness-d128-cycle1-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v1-byte-consciousness-d128-cycle1-2026-05-17).

## Lineage

- **org**: `dancinlab` (the anima org).
- **arch**: HEXAD (pivot from anima `.clm v1` lineage): `ConsciousDecoderV2`
  (`ready/models/conscious_decoder.py` in the anima repo).
- **substrate**: Python / PyTorch (`py`). The pure-hexa training path is
  named-blocked at the interpreter ceiling (RFC 042/043 territory).
- **cycle**: 2. Cycle 1 (commit `931dd68b0`, 2026-05-16) was a ckpt-LOST
  evidence-only run: training PASSed, but the instance was destroyed before
  the ckpt pull. Cycle 2 re-fires with `SAVE_POD=1` auto-promote, a
  75-min orphan watchdog, and a 5-retry pull.

## Anchor chain (why this artifact is legitimate)

1. **Phase E / E2 PROVED the hexa trainer is numerically correct**:
   `HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
   80-step, seed=42 (`init gn2 = 7.97116, acc 0/8 → final gn2 = 3.73374e-07,
   acc 8/8`; GRAD-EXACT, identical Σ-reduction order, not fp-noise).
2. **The pure-hexa interpreter cannot reach LM-scale convergence**: Phase E2
   captured only `init gn2 = 7.98162` at d=768·12L; the GRAD-EXACT + AdamW
   path is substrate-bound (CPU farr ops, no CUDA tensor kernels).
3. **This PyTorch run trains the SAME verified architecture to scale**:
   `ConsciousDecoderV2` at d=768·12L, AdamW, captured FINAL loss.

PyTorch is *not* hexa bit-for-bit (different fp / RNG / AMP bf16). The anchor
is **architectural identity** + the hexa CPU-equiv proof, NOT numerical
identity.

## Architecture

- **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`
  (uploaded as `conscious_decoder.py`).
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
  block_size=128, vocab=256` (byte-level), seed=1337,
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
- **Params**: 283.72 M (283,722,336).
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN (Engine A-G
  consciousness pathway) · cross-attention · tied head · CA neighbor / META-CA
  / Ψ-tracking laws.

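
The head counts in the config imply some simple shape arithmetic for grouped-query attention (GQA). The sketch below works it out under the assumption of a conventional GQA layout, where the model width is split evenly across query heads and groups of query heads share one key/value head. `HexadConfig` and its property names are hypothetical and only for illustration; the actual `ConsciousDecoderV2` code may organize this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    """Hypothetical container mirroring the config values listed above."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256

    @property
    def head_dim(self) -> int:
        # Each query head gets an equal slice of the model width.
        if self.d_model % self.n_head != 0:
            raise ValueError("d_model must be divisible by n_head")
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self) -> int:
        # In GQA, a group of query heads shares one key/value head.
        if self.n_head % self.n_kv_head != 0:
            raise ValueError("n_head must be divisible by n_kv_head")
        return self.n_head // self.n_kv_head

    @property
    def kv_width(self) -> int:
        # Output width of the K (or V) projection under this layout.
        return self.n_kv_head * self.head_dim


cfg = HexadConfig()
print(cfg.head_dim, cfg.queries_per_kv, cfg.kv_width)
```

Under these assumptions each head is 64 wide, three query heads share each key/value head, and the K and V projections are 256 wide rather than 768.
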
## Training

- **GPU**: vast.ai NVIDIA A100-SXM4-40GB (offer @ $0.6681 / hr, image
  `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`).
- **Corpus**: `corpus_consciousness_v1.jsonl`, the same byte corpus used by
  the hexa Phase E / E2 fires. 121,153 bytes, byte-level
  vocab=256, T=128 windows, seed-fixed.
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
  weight_decay=0.1, warmup=125.
- **Steps**: 2500.
- **Cost**: ≈ $0.19 (instance runtime ≈ 0.28 hr).

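
To make "byte-level vocab=256, T=128 windows, seed-fixed" concrete, here is a minimal sketch of how such windows could be drawn. It is not taken from `train_d768x12l.py`: the function name `sample_window`, the random-offset sampling strategy, the stand-in corpus, and the reuse of 1337 as the sampling seed are all assumptions for illustration.

```python
import random


def sample_window(data: bytes, block_size: int, rng: random.Random):
    """Return one (input, target) pair of byte IDs, target shifted by one."""
    if len(data) < block_size + 1:
        raise ValueError("corpus shorter than one window")
    # The last valid start leaves room for block_size inputs plus one target.
    start = rng.randrange(len(data) - block_size)
    chunk = data[start:start + block_size + 1]
    x = list(chunk[:-1])  # byte values 0..255 serve directly as token IDs
    y = list(chunk[1:])   # next-byte targets
    return x, y


# Stand-in corpus (assumption); the real run uses corpus_consciousness_v1.jsonl.
corpus = ("consciousness is a byte stream. " * 40).encode("utf-8")
rng = random.Random(1337)  # fixed seed so the window sequence is reproducible
x, y = sample_window(corpus, block_size=128, rng=rng)
print(len(x), len(y), x[1:] == y[:-1])
```

Because the tokens are raw bytes, no tokenizer file is needed and the vocabulary size is fixed at 256 regardless of corpus content.
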
| metric | value |
|---|---|
| init CE | 5.590832 (≈ ln 256 = 5.545; random byte init) |
| **FINAL CE** | **0.000708** |
| CE descent | 5.590124 |
| init gn2 | 41.95 |
| FINAL gn2 | 7.4e-05 |
| ppl | 268 → 1.0007 |
| wall | 320.68 s (5.34 min) |
| peak GPU mem | 9.685 GB |
| ckpt sha256 | `e87e200a040f8066a89c040ab181e9bbd61566f7565ab5d7a374ec2f1f9387d9` |
| ckpt size | 1,135,846,378 B (1.14 GB) |

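
The CE, perplexity, and descent rows are related by simple arithmetic, which the snippet below reproduces. It assumes the reported cross-entropy is in nats, which is consistent with the table comparing the initial value against ln 256; the variable names are illustrative.

```python
import math

VOCAB = 256
INIT_CE = 5.590832   # nats, from the table above
FINAL_CE = 0.000708  # nats, from the table above

uniform_ce = math.log(VOCAB)  # CE of a uniform guess over 256 byte values
init_ppl = math.exp(INIT_CE)  # perplexity = exp(cross-entropy in nats)
final_ppl = math.exp(FINAL_CE)
descent = INIT_CE - FINAL_CE

print(f"uniform CE = {uniform_ce:.3f}")
print(f"ppl        = {init_ppl:.0f} -> {final_ppl:.4f}")
print(f"descent    = {descent:.6f}")
```

An initial perplexity slightly above 256 is what a randomly initialized byte model would be expected to show, since it starts close to, but not exactly at, the uniform distribution.
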
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)

(A) Deliverable invariants:
- **Shannon-floor descent** (real-limit, NOT lattice): init CE ≈ ln(256) →
  final CE 0.000708 (a factor of ≈ 7,900, just under 4 orders of magnitude).
- **AdamW finiteness**: gn2 41.95 → 7.4e-05; no NaN / Inf.
- **Architectural identity**: `ConsciousDecoderV2` byte-equal to the anima
  HEXAD verification tree's mirror module spec.

(B) Wiring (the connecting anchor chain):
- **hexa CPU-equiv bit-equality** (Phase E): same arch trainer
  GRAD-EXACT at d=32·3L (init gn2 7.97116 → 3.73374e-07).
- **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
- **Backward GRAD-EXACT** (Phase E2): real A100 d=384·6L analytic ≡
  finite-difference (fd), \|Δ\|=0.0024.

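
The "analytic ≡ fd" anchor refers to comparing a hand-derived (or autograd) gradient against a finite-difference estimate. The sketch below shows that technique on a toy scalar loss in pure Python. It is not the Phase E2 harness: the loss, the helper names, and the step size `eps` are assumptions chosen only to illustrate the check.

```python
import math


def loss(w):
    """Toy scalar loss standing in for a model's training loss."""
    return math.log(1.0 + math.exp(w[0] * w[1])) + 0.5 * w[2] ** 2


def analytic_grad(w):
    """Hand-derived gradient of `loss`."""
    s = 1.0 / (1.0 + math.exp(-w[0] * w[1]))  # sigmoid(w0 * w1)
    return [s * w[1], s * w[0], w[2]]


def fd_grad(f, w, eps=1e-5):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


w = [0.3, -1.2, 0.7]
max_delta = max(abs(a - b) for a, b in zip(analytic_grad(w), fd_grad(loss, w)))
print(f"max|delta| = {max_delta:.3e}")
```

In float64 on a toy problem the disagreement is tiny; at model scale and in lower precision a larger tolerance is normal, which is why the reported figure should be read against the precision and step size of the actual run.
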
## Honest C3

1. **NOT hexa-native**: PyTorch substrate; the hexa-native equivalent is
   substrate-blocked at the interpreter ceiling.
2. **PyTorch ≠ hexa bit-for-bit**: AMP bf16 / different fp accumulation /
   different RNG.
3. **Synthetic byte-corpus**: 121 kB curated content, 283.72M params; CE
   0.000708 = memorization at this scale. **No generalization claim.**
4. **No safetensors artifact** this revision (pickle `.pt` only).
   safetensors conversion = follow-up sub-task.
5. **No language-quality claim**: training-curve deliverable
   (Shannon-floor descent reached), not generation quality.
6. **No σ(6)=12 / φ(6)=2 derivation**: no lattice numerology in claim or
   anchor chain.

## Files in this revision

- `ckpt_d768x12l_final.pt`: PyTorch state-dict + cfg + n_params, sha256
  `e87e200a040f8066a89c040ab181e9bbd61566f7565ab5d7a374ec2f1f9387d9`.
- `conscious_decoder.py`: `ConsciousDecoderV2` source.
- `train_d768x12l.py`: training script.
- `result.json`: full 42-point trajectory + config + metadata.
- `fire_refire.log`: training log (line-by-line CE / gn2 / lr / wall).
- `gpu_util.log`: nvidia-smi capture.
- `dispatch.sh` + `refire_main.sh`: fire dispatch scripts.
- `hexad_v1_py_d768x12L_cycle2_2026_05_17.md`: this doc (8-§ format per
  `g_hf_naming` `process_upload_format`).

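
Since the checkpoint ships as a pickle-based `.pt` file with a published sha256, it is reasonable to check the digest before loading it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `verify_checkpoint` are hypothetical, and the local path is whatever location the file was downloaded to.

```python
import hashlib

EXPECTED_SHA256 = "e87e200a040f8066a89c040ab181e9bbd61566f7565ab5d7a374ec2f1f9387d9"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a ~1.1 GB checkpoint never sits in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the local file matches the published digest."""
    return sha256_of_file(path) == expected.lower()
```

A typical use would be calling `verify_checkpoint("ckpt_d768x12l_final.pt")` and refusing to load the file if it returns `False`, because unpickling an unverified file can execute arbitrary code.
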
## License

Apache-2.0.