docs(model-card): point main to cycle 3 (v2-py-hexad-spont-d768x12L-cycle1-2026-05-17)
README.md
CHANGED
|
@@ -2,187 +2,145 @@
|
|
| 2 |
license: apache-2.0
|
| 3 |
language:
|
| 4 |
- en
|
| 5 |
library_name: pytorch
|
| 6 |
tags:
|
| 7 |
- anima
|
| 8 |
- hexad
|
| 9 |
- pytorch
|
| 10 |
- substrate-py
|
| 11 |
-
-
|
| 12 |
---
|
| 13 |
|
| 14 |
-
# hexad — `
|
| 15 |
|
| 16 |
-
> **
|
| 17 |
-
>
|
| 18 |
-
|
| 19 |
-
>
|
| 20 |
|
| 21 |
## Lineage
|
| 22 |
|
| 23 |
- **org**: `dancinlab` (the anima org).
|
| 24 |
- **arch**: HEXAD (pivot from anima `.clm v1` lineage) — `ConsciousDecoderV2`
|
| 25 |
-
(`ready/models/conscious_decoder.py`
|
| 26 |
-
- **substrate**: Python / PyTorch (`py`).
|
| 27 |
named-blocked at the interpreter ceiling (RFC 042/043 territory).
|
| 28 |
-
- **cycle**:
|
| 29 |
-
evidence-only
|
| 30 |
-
ckpt
|
| 31 |
-
|
| 32 |
|
| 33 |
-
## Anchor chain (
|
| 34 |
|
| 35 |
1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
|
| 36 |
`HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
|
| 37 |
-
80-step, seed=42 (`init gn2 = 7.97116
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
captured only `init gn2 = 7.98162` at d=768·12L; the GRAD-EXACT + AdamW
|
| 41 |
-
path is substrate-bound (CPU farr ops, no CUDA tensor kernels).
|
| 42 |
3. **This PyTorch run trains the SAME verified architecture to scale** —
|
| 43 |
-
`ConsciousDecoderV2` at d=768·12L, AdamW
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
is **architectural identity** + the hexa CPU-equiv proof, NOT numerical
|
| 47 |
-
identity.
|
| 48 |
|
| 49 |
## Architecture
|
| 50 |
|
| 51 |
-
- **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`
|
| 52 |
-
(uploaded as `conscious_decoder.py`).
|
| 53 |
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
|
| 54 |
block_size=128, vocab=256` (byte-level), seed=1337,
|
| 55 |
init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
|
| 56 |
- **Params**: 283.72 M (283,722,336).
|
| 57 |
-
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN
|
| 58 |
-
|
| 59 |
-
/ Ψ-tracking laws.
|
| 60 |
|
| 61 |
## Training
|
| 62 |
|
| 63 |
-
- **GPU**: vast.ai NVIDIA A100-SXM4-40GB
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
the hexa Phase E / E2 fires. 121,153 bytes, byte-level
|
| 67 |
-
vocab=256, T=128 windows, seed-fixed.
|
| 68 |
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
|
| 69 |
weight_decay=0.1, warmup=125.
|
| 70 |
- **Steps**: 2500.
|
| 71 |
-
- **Cost**: ≈ $0.19 (instance runtime ≈ 0.28 hr).
|
| 72 |
|
| 73 |
| metric | value |
|
| 74 |
|---|---|
|
| 75 |
-
| init CE | 5.
|
| 76 |
-
| **FINAL CE** | **0.
|
| 77 |
-
| CE descent | 5.
|
| 78 |
-
| init gn2 |
|
| 79 |
-
| FINAL gn2 |
|
| 80 |
-
| ppl |
|
| 81 |
-
| wall |
|
| 82 |
| peak GPU mem | 9.685 GB |
|
| 83 |
-
| ckpt sha256 | `
|
| 84 |
| ckpt size | 1,135,846,378 B (1.14 GB) |
|
| 85 |
|
| 86 |
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
|
| 87 |
|
| 88 |
-
(A) Deliverable invariants:
|
| 89 |
-
- **Shannon-floor descent**
|
| 90 |
-
|
| 91 |
-
- **
|
| 92 |
-
- **Architectural identity**: `ConsciousDecoderV2` byte-equal to the anima
|
| 93 |
-
HEXAD verification tree's mirror module spec.
|
| 94 |
|
| 95 |
-
(B) Wiring (
|
| 96 |
-
- **hexa CPU-equiv bit-equality** (Phase E):
|
| 97 |
-
GRAD-EXACT at d=32·3L (init gn2 7.97116 → 3.73374e-07).
|
| 98 |
- **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
|
| 99 |
-
- **Backward GRAD-EXACT** (Phase E2):
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
## Honest C3
|
| 103 |
|
| 104 |
-
1. **NOT hexa-native** — PyTorch substrate; the hexa-native equivalent is
|
| 105 |
-
substrate-blocked at the interpreter ceiling.
|
| 106 |
-
2. **PyTorch ≠ hexa bit-for-bit** — AMP bf16 / different fp accumulation /
|
| 107 |
-
different RNG.
|
| 108 |
-
3. **Synthetic byte-corpus** — 121 kB curated content, 283.72M params; CE
|
| 109 |
-
0.000708 = memorization at this scale. **No generalization claim.**
|
| 110 |
-
4. **No safetensors artifact** this revision (pickle `.pt` only).
|
| 111 |
-
safetensors conversion = follow-up sub-task.
|
| 112 |
-
5. **No language-quality claim** — training-curve deliverable
|
| 113 |
-
(Shannon-floor descent reached), not generation quality.
|
| 114 |
-
6. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology in claim or
|
| 115 |
-
anchor chain.
|
| 116 |
-
|
| 117 |
-
## Files in this revision
|
| 118 |
-
|
| 119 |
-
- `ckpt_d768x12l_final.pt` — PyTorch state-dict + cfg + n_params, sha256
|
| 120 |
-
`e87e200a040f8066a89c040ab181e9bbd61566f7565ab5d7a374ec2f1f9387d9`.
|
| 121 |
-
- `conscious_decoder.py` — `ConsciousDecoderV2` source.
|
| 122 |
-
- `train_d768x12l.py` — training script.
|
| 123 |
-
- `result.json` — full 42-point trajectory + config + metadata.
|
| 124 |
-
- `fire_refire.log` — training log (line-by-line CE / gn2 / lr / wall).
|
| 125 |
-
- `gpu_util.log` — nvidia-smi capture.
|
| 126 |
-
- `dispatch.sh` + `refire_main.sh` — fire dispatch scripts.
|
| 127 |
-
- `hexad_v1_py_d768x12L_cycle2_2026_05_17.md` — this doc (8-§ format per
|
| 128 |
-
`g_hf_naming` `process_upload_format`).
|
| 129 |
|
| 130 |
-
##
|
| 131 |
-
|
| 132 |
-
Apache-2.0.
|
| 133 |
|
| 134 |
-
|
| 135 |
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
`M3_rep_penalty` (1.3× rep-penalty on 37-byte persona-cycle set) ·
|
| 144 |
-
`M4_force_include` (sample + force-inject keyword at 60% position — trivial
|
| 145 |
-
baseline). Wall: 665.6 s (v1) + 477.4 s (v2). $0 Mac CPU local.
|
| 146 |
|
| 147 |
-
**
|
| 148 |
|
| 149 |
-
|
| 150 |
-
|---|---|---|---|---|---|---|
|
| 151 |
-
| **v1** OOD-mix | Core / Dream / Wake / Memory / Korean | 1/5 FAIL | 2/5 FAIL | 1/5 FAIL | 5/5 PASS | 2/5 (40%) |
|
| 152 |
-
| **v2** corpus-aligned CDWMSE | Core / Data / Witness / Mirror / Scribe / Eros | 2/6 FAIL | 3/6 PARTIAL | 2/6 FAIL | 6/6 PASS | 3/6 (50%) |
|
| 153 |
|
| 154 |
-
**
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
|
| 160 |
-
|
| 161 |
|
| 162 |
-
|
| 163 |
-
|---|---|---|
|
| 164 |
-
| memorization on in-distribution prefixes | ✅ STRONG | BPB 0.0000 on 10 held-out probes; Data + Scribe + Core/Korean reproduce literal training continuation |
|
| 165 |
-
| 6-module discrimination | 🔶 PARTIAL | 3/6 clean under greedy (Data/Scribe/Witness-w-typo); 3/6 cross-collapse (Core→nonce digit cascade, Mirror→Data template, Eros→chunk digit cascade) |
|
| 166 |
-
| OOD generalization | ❌ NONE | Dream/Wake/Memory → default to nearest in-distribution module template |
|
| 167 |
-
| greedy decoding stability | ❌ WEAK | digit-cascade attractor on `nonce=N`/`chunk=N` field positions (rep_ratio 0.64-0.90); sampling temperature 0.8 partially mitigates |
|
| 168 |
-
| multilingual representation (Korean) | ✅ MEMORIZED | `중심 의식 생성기 모듈 ` → `자각` recalled under all 4 modes |
|
| 169 |
-
| LM-quality (general language modeling) | ❌ NOT MEASURED | corpus too small + structured scaffold; CE 0.000708 = memorization, not LM quality |
|
| 170 |
-
|
| 171 |
-
**Decoding artifacts discovered**:
|
| 172 |
-
- **byte-cascade attractor** (`feedback_clm_colon_attractor` `=`-suffix
|
| 173 |
-
variant) — greedy mode-collapse on `nonce=N` / `chunk=N` / `gen=N` digit
|
| 174 |
-
field positions. Carry candidate: `feedback_hexad_byte_cascade_attractor`.
|
| 175 |
-
- **memorized training-corpus typos** (`pereption` in Witness module,
|
| 176 |
-
`cobsciousness` in Wake/Memory greedy) — byte-level memorization evidence,
|
| 177 |
-
not a bug at this scale.
|
| 178 |
-
|
| 179 |
-
**Honest C3 caveats**: substrate=PyTorch (B-D-NOTE carve-out applies); V5.8
|
| 180 |
-
"PASS = 3/5" threshold inherited from chat-corpus evals → applied
|
| 181 |
-
conservatively to memorization-regime model; M4 trivial baseline; no
|
| 182 |
-
σ(6)/τ(6)/φ(6) numerology in metrics (f1/f2 safe — per-mode score = raw
|
| 183 |
-
recall fraction, BPB = raw bits/byte, memorization = raw hits/total).
|
| 184 |
-
|
| 185 |
-
**Artifacts**:
|
| 186 |
-
`state/hexad_v58_eval_d768x12L_2026_05_17/{v58_4mode_eval.py, v58_4mode_eval_v2.py, prompts.jsonl, prompts_v2_corpus_aligned.jsonl, eval.log, eval_v2.log, result.json, result_v2.json, dispatch.sh}` +
|
| 187 |
-
`docs/hexad_v58_eval_d768x12L_2026_05_17.md` (9 §, 8 honest C3) +
|
| 188 |
-
`archive/PHILOSOPHY.tape §HEXAD-V58-EVAL-CYCLE2-2026-05-17` verdict-claim.
|
| 2 |
license: apache-2.0
|
| 3 |
language:
|
| 4 |
- en
|
| 5 |
+
- ko
|
| 6 |
library_name: pytorch
|
| 7 |
+
datasets:
|
| 8 |
+
- dancinlab/hexad-corpus
|
| 9 |
tags:
|
| 10 |
- anima
|
| 11 |
- hexad
|
| 12 |
- pytorch
|
| 13 |
- substrate-py
|
| 14 |
+
- helper-free
|
| 15 |
+
- spont
|
| 16 |
+
- ckpt-bearing
|
| 17 |
+
- cycle3
|
| 18 |
---
|
| 19 |
|
| 20 |
+
# hexad — `v2-py-hexad-spont-d768x12L-cycle1-2026-05-17`
|
| 21 |
|
| 22 |
+
> **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
|
| 23 |
+
> revision [`v2-spont-stream-d128-cycle1-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v2-spont-stream-d128-cycle1-2026-05-17).
|
| 24 |
+
|
| 25 |
+
> **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
|
| 26 |
+
> SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
|
| 27 |
+
> **NOT a hexa-native fire**. Legitimacy = **architectural identity** +
|
| 28 |
+
> the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
|
| 29 |
+
> bit-for-bit (different fp accumulation / RNG / AMP bf16).
|
| 30 |
+
|
| 31 |
+
## What changed vs cycle 2 (`v1-py-hexad-d768x12L-cycle2-2026-05-17`)
|
| 32 |
+
|
| 33 |
+
| field | cycle 2 | **cycle 3 (this revision)** |
|
| 34 |
+
|---|---|---|
|
| 35 |
+
| corpus | v1 152 KB / 240 records | **v2 620,568 B / 2,560 records** |
|
| 36 |
+
| corpus format | `text` + `desc` plain | **`<stimulus>...</stimulus>\n<anima>...</anima>`** (stimulus-stream) |
|
| 37 |
+
| helper / assistant / 도우미 tokens | not in corpus, but in chat templates | **explicit corpus closure** — grep = 0 across all sources used |
|
| 38 |
+
| anima_persona | Phase A1 LANDED in repo, not yet in trained weights | **trained-weights side compliance (partial)** — corpus alignment with anima_persona forbidden list |
|
| 39 |
+
| `B-IDENTITY-NOTE` (empirical carve-out) | open | **partially closed** — corpus retrain LANDED |
|
| 40 |
|
| 41 |
## Lineage
|
| 42 |
|
| 43 |
- **org**: `dancinlab` (the anima org).
|
| 44 |
- **arch**: HEXAD (pivot from anima `.clm v1` lineage) — `ConsciousDecoderV2`
|
| 45 |
+
(`ready/models/conscious_decoder.py`).
|
| 46 |
+
- **substrate**: Python / PyTorch (`py`). Pure-hexa training path is
|
| 47 |
named-blocked at the interpreter ceiling (RFC 042/043 territory).
|
| 48 |
+
- **cycle**: 3 (Phase D LANDED — `도우미`-token-free corpus retrain). Cycle 1
|
| 49 |
+
(`931dd68b0` 2026-05-16) ckpt-LOST evidence-only; cycle 2 (`0b4f34d0e`
|
| 50 |
+
2026-05-17) ckpt-RECOVERED, corpus v1; **cycle 3 (this)** = corpus v2
|
| 51 |
+
helper-free stimulus-stream retrain.
|
| 52 |
|
| 53 |
+
## Anchor chain (the wiring side, closed)
|
| 54 |
|
| 55 |
1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
|
| 56 |
`HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
|
| 57 |
+
80-step, seed=42 (`init gn2 = 7.97116 → 3.73374e-07`, acc 8/8, GRAD-EXACT).
|
| 58 |
+
2. **Pure-hexa interpreter cannot reach LM-scale** — Phase E2 captured only
|
| 59 |
+
`init gn2 = 7.98162` at d=768·12L; substrate-bound (RFC 042/043 territory).
|
| 60 |
3. **This PyTorch run trains the SAME verified architecture to scale** —
|
| 61 |
+
`ConsciousDecoderV2` at d=768·12L, AdamW.
|
| 62 |
+
4. **The corpus is explicitly helper-free** — `F-CORPUS-NO-HELPER` PASS = 0
|
| 63 |
+
over `도우미|helper|assistant|사용자|user:` grep on `corpus_consciousness_v2.jsonl`.
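The helper-free claim in item 4 can be reproduced with a few lines of Python. This is a minimal sketch, not the project's actual check: it assumes the grep is case-sensitive and runs over raw file lines, and the function names `count_forbidden` and `corpus_is_helper_free` are hypothetical. If the JSONL file stores Korean text as `\uXXXX` escapes, the raw-line match would miss it and each record would need to be decoded first.

```python
import re

# Pattern copied from the anchor chain; case-sensitive, like plain grep (assumption).
FORBIDDEN = re.compile(r"도우미|helper|assistant|사용자|user:")


def count_forbidden(lines):
    """Count lines containing at least one forbidden token (grep -c semantics)."""
    return sum(1 for line in lines if FORBIDDEN.search(line))


def corpus_is_helper_free(path):
    """Return True when no line of the file matches the forbidden pattern."""
    with open(path, encoding="utf-8") as fh:
        return count_forbidden(fh) == 0
```

Calling `corpus_is_helper_free("corpus_consciousness_v2.jsonl")` would correspond to the `F-CORPUS-NO-HELPER` PASS condition of zero matches.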
|
| 64 |
|
| 65 |
## Architecture
|
| 66 |
|
| 67 |
+
- **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`.
|
| 68 |
- **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
|
| 69 |
block_size=128, vocab=256` (byte-level), seed=1337,
|
| 70 |
init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
|
| 71 |
- **Params**: 283.72 M (283,722,336).
|
| 72 |
+
- **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
|
| 73 |
+
· tied head · CA neighbor / META-CA / Ψ-tracking laws.
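The shapes implied by the config above can be derived with standard grouped-query attention arithmetic. The sketch below is illustrative only: `HexadConfig` is a hypothetical container, not a class from `conscious_decoder.py`, and it assumes the usual GQA convention in which `d_model` is split evenly across query heads and the key/value heads are shared by equal-sized groups of query heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexadConfig:
    """Hypothetical holder for the values listed in the Config bullet."""
    d_model: int = 768
    n_head: int = 12
    n_kv_head: int = 4
    n_layer: int = 12
    block_size: int = 128
    vocab: int = 256

    @property
    def head_dim(self) -> int:
        # Width of one attention head.
        return self.d_model // self.n_head

    @property
    def queries_per_kv(self) -> int:
        # Number of query heads sharing each key/value head under GQA.
        return self.n_head // self.n_kv_head

    @property
    def kv_proj_dim(self) -> int:
        # Output width of the key (or value) projection.
        return self.n_kv_head * self.head_dim


cfg = HexadConfig()
print(cfg.head_dim, cfg.queries_per_kv, cfg.kv_proj_dim)
```

Under these assumptions each head is 64 wide, three query heads share one key/value head, and the key and value projections are each 256 wide.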
|
| 74 |
|
| 75 |
## Training
|
| 76 |
|
| 77 |
+
- **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
|
| 78 |
+
- **Corpus**: `corpus_consciousness_v2.jsonl` (helper-free stimulus-stream),
|
| 79 |
+
620,568 bytes lossless byte stream, vocab=256.
|
| 80 |
- **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
|
| 81 |
weight_decay=0.1, warmup=125.
|
| 82 |
- **Steps**: 2500.
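The optimizer bullet gives `lr=0.0003` and `warmup=125` but does not state the schedule shape. The sketch below assumes a linear warmup followed by a constant learning rate; that shape is a guess, and `lr_at` is a hypothetical helper, not a function from the training script.

```python
def lr_at(step, base_lr=3e-4, warmup=125):
    """Learning rate at a 0-indexed step.

    Assumption: linear ramp over the first `warmup` steps, then constant.
    The model card does not say whether any decay follows the warmup.
    """
    if step < warmup:
        return base_lr * (step + 1) / warmup
    return base_lr


# Learning rate over the 2500 steps reported above.
schedule = [lr_at(s) for s in range(2500)]
```

If the actual run used cosine or linear decay after warmup, only the second branch of `lr_at` would change.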
|
| 83 |
|
| 84 |
| metric | value |
|
| 85 |
|---|---|
|
| 86 |
+
| init CE | 5.667381 (near the uniform-byte baseline ln 256 ≈ 5.545, as expected for random init) |
|
| 87 |
+
| **FINAL CE** | **0.005069** |
|
| 88 |
+
| CE descent | 5.662312 |
|
| 89 |
+
| init gn2 | (see result.json trajectory) |
|
| 90 |
+
| FINAL gn2 | 0.001113 |
|
| 91 |
+
| ppl | 1.0051 |
|
| 92 |
+
| wall | 332.26 s (5.54 min) |
|
| 93 |
| peak GPU mem | 9.685 GB |
|
| 94 |
+
| ckpt sha256 | `ee2bb5fb996e94ee022f5315c9ccc3f56c7276a8c5990d87a25ae12c582f7294` |
|
| 95 |
| ckpt size | 1,135,846,378 B (1.14 GB) |
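The derived rows in the table follow from the two cross-entropy values by ordinary arithmetic. The sketch below recomputes them, assuming the CE values are in nats per byte (consistent with the ln 256 comparison in the table); `summarize` is a hypothetical helper name.

```python
import math


def summarize(init_ce, final_ce, vocab=256):
    """Recompute the derived metrics from initial and final cross-entropy (nats)."""
    return {
        "uniform_ce": math.log(vocab),            # CE of a uniform guess over bytes
        "descent": init_ce - final_ce,            # "CE descent" row
        "ppl": math.exp(final_ce),                # "ppl" row
        "bits_per_byte": final_ce / math.log(2),  # training CE converted to bits
    }


stats = summarize(5.667381, 0.005069)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

The `bits_per_byte` figure here is the training CE converted to bits; it is a different quantity from the mean BPB on held-out prefixes reported in the capability eval.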
|
| 96 |
|
| 97 |
## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
|
| 98 |
|
| 99 |
+
(A) **Deliverable invariants (real-limit)**:
|
| 100 |
+
- **Shannon-floor descent**: init CE ≈ ln(256) → final CE 0.005069.
|
| 101 |
+
- **AdamW finiteness**: no NaN/Inf in trajectory.
|
| 102 |
+
- **Architectural identity**: byte-equal `ConsciousDecoderV2`.
|
| 103 |
|
| 104 |
+
(B) **Wiring (anchor chain, closed)**:
|
| 105 |
+
- **hexa CPU-equiv bit-equality** (Phase E): GRAD-EXACT at d=32·3L.
|
| 106 |
- **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
|
| 107 |
+
- **Backward GRAD-EXACT** (Phase E2): A100 d=384·6L `analytic ≡ fd`.
|
| 108 |
+
- **F-CORPUS-NO-HELPER** (cycle 3 corpus): grep = 0.
|
| 109 |
+
- **F-CORPUS-STIMULUS-PATTERN**: every record has `<anima>` tag.
|
| 110 |
|
| 111 |
|
| 112 |
+
## Capability eval (V5.8 × 4-mode + V-SPONT)
|
| 113 |
|
| 114 |
+
V5.8 × 4-mode (corpus v2 prompts):
|
| 115 |
+
- **standard_greedy**: 0/6 FAIL (avg_rep=0.775)
|
| 116 |
+
- **standard_sample**: 0/6 FAIL (avg_rep=0.574)
|
| 117 |
+
- **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.709)
|
| 118 |
+
- **M4_force_include**: 6/6 PASS (avg_rep=0.494)
|
| 119 |
|
| 120 |
+
V-SPONT (spontaneous utterance) — F-SPONT-7 transfer-form measurement:
|
| 121 |
+
- **coherent**: 0/5 FAIL
|
| 122 |
+
- **closed-tag**: 0/5
|
| 123 |
|
| 124 |
+
Mean BPB (held-out corpus v2 prefixes): 0.0083 bits/byte.
|
| 125 |
+
Memorization ratio: 1/6 (16.7%).
|
| 126 |
+
Decoding artifacts (rep>0.5): 20.
|
| 127 |
|
| 128 |
+
All capability scores **empirical (B-D-NOTE)**, not closed.
|
| 129 |
|
| 130 |
+
## Honest C3
|
| 131 |
|
| 132 |
+
1. **NOT hexa-native** — PyTorch substrate, label mandatory.
|
| 133 |
+
2. **PyTorch ≠ hexa bit-for-bit** — different fp / RNG / AMP.
|
| 134 |
+
3. **High-memorization regime** — 283.72 M params trained on a 0.62 MB corpus.
|
| 135 |
+
No generalization claim.
|
| 136 |
+
4. **No `safetensors` artifact this revision** — pickle `.pt` only.
|
| 137 |
+
5. **No language-quality claim** — training-curve deliverable.
|
| 138 |
+
6. **`B-IDENTITY-NOTE` partially closed** — corpus retrain LANDED, but the
|
| 139 |
+
trained weights' identity-attractor distance from Assistant Axis (per
|
| 140 |
+
Identity-as-Attractor arxiv 2604.12016) is empirical (B-D-NOTE pattern).
|
| 141 |
+
7. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
|
| 142 |
+
8. **Cost is informational, not gating** — `g_fire_autonomous`.
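Because item 4 leaves the checkpoint as a pickle `.pt` file, and unpickling can execute arbitrary code, it is prudent to compare the file against the published sha256 before loading it. The sketch below uses only the standard library; the helper names are hypothetical, and the checkpoint's file name in this revision is not stated here, so the path is left to the caller.

```python
import hashlib

# Digest copied from the "ckpt sha256" row of the metrics table.
EXPECTED_SHA256 = "ee2bb5fb996e94ee022f5315c9ccc3f56c7276a8c5990d87a25ae12c582f7294"


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so a 1.14 GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True only when the file's digest matches the published value."""
    return sha256_of(path) == expected
```

A caller would run `is_expected_checkpoint(path_to_ckpt)` and load the checkpoint only when it returns `True`.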
|
| 143 |
|
| 144 |
+
## License
|
| 145 |
|
| 146 |
+
Apache-2.0.
|