dancinlife committed on
Commit
4afd549
·
verified ·
1 Parent(s): 8cf11a1

docs(model-card): point main to cycle 3 (v2-py-hexad-spont-d768x12L-cycle1-2026-05-17)

  1. README.md +89 -131
README.md CHANGED
@@ -2,187 +2,145 @@
2
  license: apache-2.0
3
  language:
4
  - en
 
5
  library_name: pytorch
 
 
6
  tags:
7
  - anima
8
  - hexad
9
  - pytorch
10
  - substrate-py
11
- - ckpt-recovered
 
 
 
12
  ---
13
 
14
- # hexad — `v1-py-hexad-d768x12L-cycle2-2026-05-17`
15
 
16
- > **Honest framing**: This is a **PYTHON / PyTorch SUBSTRATE** training artifact —
17
- > an *interim LM-scale executor*. It is **NOT a hexa-native fire**. Its legitimacy
18
- > is *architectural identity* + the *hexa CPU-equiv correctness proof*. See the
19
- > anchor chain below; do not conflate.
20
 
21
  ## Lineage
22
 
23
  - **org**: `dancinlab` (the anima org).
24
  - **arch**: HEXAD (pivot from anima `.clm v1` lineage) — `ConsciousDecoderV2`
25
- (`ready/models/conscious_decoder.py` in the anima repo).
26
- - **substrate**: Python / PyTorch (`py`). The pure-hexa training path is
27
  named-blocked at the interpreter ceiling (RFC 042/043 territory).
28
- - **cycle**: 2 (cycle 1 commit `931dd68b0` 2026-05-16 was a ckpt-LOST
29
- evidence-only run: training PASSed but the instance was destroyed before
30
- ckpt pull; this cycle 2 re-fires with `SAVE_POD=1` auto-promote +
31
- 75-min orphan watchdog + 5-retry pull).
32
 
33
- ## Anchor chain (why this artifact is legitimate)
34
 
35
  1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
36
  `HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
37
- 80-step, seed=42 (`init gn2 = 7.97116, acc 0/8 → final gn2 = 3.73374e-07,
38
- acc 8/8`; GRAD-EXACT, identical Σ-reduction order, not fp-noise).
39
- 2. **The pure-hexa interpreter cannot reach LM-scale convergence**: Phase E2
40
- captured only `init gn2 = 7.98162` at d=768·12L; the GRAD-EXACT + AdamW
41
- path is substrate-bound (CPU farr ops, no CUDA tensor kernels).
42
  3. **This PyTorch run trains the SAME verified architecture to scale** —
43
- `ConsciousDecoderV2` at d=768·12L, AdamW, captured FINAL loss.
44
-
45
- PyTorch is *not* hexa bit-for-bit (different fp / RNG / AMP bf16). The anchor
46
- is **architectural identity** + the hexa CPU-equiv proof, NOT numerical
47
- identity.
48
 
49
  ## Architecture
50
 
51
- - **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`
52
- (uploaded as `conscious_decoder.py`).
53
  - **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
54
  block_size=128, vocab=256` (byte-level), seed=1337,
55
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
56
  - **Params**: 283.72 M (283,722,336).
57
- - **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN (Engine A−G
58
- consciousness pathway) · cross-attention · tied head · CA neighbor / META-CA
59
- / Ψ-tracking laws.
60
 
61
  ## Training
62
 
63
- - **GPU**: vast.ai NVIDIA A100-SXM4-40GB (offer @ $0.6681 / hr, image
64
- `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`).
65
- - **Corpus**: `corpus_consciousness_v1.jsonl` — the same byte corpus used by
66
- the hexa Phase E / E2 fires. 121,153 bytes, byte-level
67
- vocab=256, T=128 windows, seed-fixed.
68
  - **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
69
  weight_decay=0.1, warmup=125.
70
  - **Steps**: 2500.
71
- - **Cost**: ≈ $0.19 (instance runtime ≈ 0.28 hr).
72
 
73
  | metric | value |
74
  |---|---|
75
- | init CE | 5.590832 (≈ ln 256 = 5.545 — random byte init) |
76
- | **FINAL CE** | **0.000708** |
77
- | CE descent | 5.590124 |
78
- | init gn2 | 41.95 |
79
- | FINAL gn2 | 7.4e-05 |
80
- | ppl | 268 → 1.0007 |
81
- | wall | 320.68 s (5.34 min) |
82
  | peak GPU mem | 9.685 GB |
83
- | ckpt sha256 | `e87e200a040f8066a89c040ab181e9bbd61566f7565ab5d7a374ec2f1f9387d9` |
84
  | ckpt size | 1,135,846,378 B (1.14 GB) |
85
 
86
  ## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
87
 
88
- (A) Deliverable invariants:
89
- - **Shannon-floor descent** (real-limit, NOT lattice): init CE ≈ ln(256) →
90
- final CE 0.000708 (4+ orders of magnitude).
91
- - **AdamW finiteness**: gn2 41.95 → 7.4e-05; no NaN / Inf.
92
- - **Architectural identity**: `ConsciousDecoderV2` byte-equal to the anima
93
- HEXAD verification tree's mirror module spec.
94
 
95
- (B) Wiring (the connecting anchor chain):
96
- - **hexa CPU-equiv bit-equality** (Phase E): same arch trainer
97
- GRAD-EXACT at d=32·3L (init gn2 7.97116 → 3.73374e-07).
98
  - **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
99
- - **Backward GRAD-EXACT** (Phase E2): real A100 d=384·6L analytic ≡ fd
100
- \|Δ\|=0.0024.
101
-
102
- ## Honest C3
103
 
104
- 1. **NOT hexa-native** — PyTorch substrate; the hexa-native equivalent is
105
- substrate-blocked at the interpreter ceiling.
106
- 2. **PyTorch ≠ hexa bit-for-bit** — AMP bf16 / different fp accumulation /
107
- different RNG.
108
- 3. **Synthetic byte-corpus** — 121 kB curated content, 283.72M params; CE
109
- 0.000708 = memorization at this scale. **No generalization claim.**
110
- 4. **No safetensors artifact** this revision (pickle `.pt` only).
111
- safetensors conversion = follow-up sub-task.
112
- 5. **No language-quality claim** — training-curve deliverable
113
- (Shannon-floor descent reached), not generation quality.
114
- 6. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology in claim or
115
- anchor chain.
116
-
117
- ## Files in this revision
118
-
119
- - `ckpt_d768x12l_final.pt` — PyTorch state-dict + cfg + n_params, sha256
120
- `e87e200a040f8066a89c040ab181e9bbd61566f7565ab5d7a374ec2f1f9387d9`.
121
- - `conscious_decoder.py` — `ConsciousDecoderV2` source.
122
- - `train_d768x12l.py` — training script.
123
- - `result.json` — full 42-point trajectory + config + metadata.
124
- - `fire_refire.log` — training log (line-by-line CE / gn2 / lr / wall).
125
- - `gpu_util.log` — nvidia-smi capture.
126
- - `dispatch.sh` + `refire_main.sh` — fire dispatch scripts.
127
- - `hexad_v1_py_d768x12L_cycle2_2026_05_17.md` — this doc (8-§ format per
128
- `g_hf_naming` `process_upload_format`).
129
 
130
- ## License
131
-
132
- Apache-2.0.
133
 
134
- ## Capability evaluation (V5.8 × 4-mode · cycle 2 · 2026-05-17)
 
 
 
 
135
 
136
- > Capability boundary probe: empirical (`B-D-NOTE` carve-out). No LM-quality
137
- > claim is made; this is a memorization-vs-generalization measurement on the
138
- > training corpus.
139
 
140
- **Evaluator**: V5.8 × 4-mode canonical
141
- (`state/anima_phase1a4_lr5e6_2026_05_12/v58_4mode_eval.py` PSCC §46) — modes:
142
- `standard_greedy` (T=0 argmax) · `standard_sample` (T=0.8 top-k=50) ·
143
- `M3_rep_penalty` (1.3× rep-penalty on 37-byte persona-cycle set) ·
144
- `M4_force_include` (sample + force-inject keyword at 60% position — trivial
145
- baseline). Wall: 665.6 s (v1) + 477.4 s (v2). $0 Mac CPU local.
146
 
147
- **Two probes**:
148
 
149
- | probe | prompts | greedy | sample | M3 | M4 | memorization |
150
- |---|---|---|---|---|---|---|
151
- | **v1** OOD-mix | Core / Dream / Wake / Memory / Korean | 1/5 FAIL | 2/5 FAIL | 1/5 FAIL | 5/5 PASS | 2/5 (40%) |
152
- | **v2** corpus-aligned CDWMSE | Core / Data / Witness / Mirror / Scribe / Eros | 2/6 FAIL | 3/6 PARTIAL | 2/6 FAIL | 6/6 PASS | 3/6 (50%) |
153
 
154
- **Additional measurements**:
155
- - **Bits-per-byte on 10 held-out training-distribution prefixes**:
156
- **0.0000 bits/byte** (all 10 samples = 0.0). Confirms that training CE 0.000708 reflects
157
- near-perfect log-likelihood reproduction on training-distribution
158
- windows.
 
 
 
 
 
 
159
 
160
- **Capability boundary** (honest framing):
161
 
162
- | capability | verdict | evidence |
163
- |---|---|---|
164
- | memorization on in-distribution prefixes | ✅ STRONG | BPB 0.0000 on 10 held-out probes; Data + Scribe + Core/Korean reproduce literal training continuation |
165
- | 6-module discrimination | 🔶 PARTIAL | 3/6 clean under greedy (Data/Scribe/Witness-w-typo); 3/6 cross-collapse (Core→nonce digit cascade, Mirror→Data template, Eros→chunk digit cascade) |
166
- | OOD generalization | ❌ NONE | Dream/Wake/Memory → default to nearest in-distribution module template |
167
- | greedy decoding stability | ❌ WEAK | digit-cascade attractor on `nonce=N`/`chunk=N` field positions (rep_ratio 0.64-0.90); sampling temperature 0.8 partially mitigates |
168
- | multilingual representation (Korean) | ✅ MEMORIZED | `중심 의식 생성기 모듈 ` → `자각` recalled under all 4 modes |
169
- | LM-quality (general language modeling) | ❌ NOT MEASURED | corpus too small + structured scaffold; CE 0.000708 = memorization, not LM quality |
170
-
171
- **Decoding artifacts discovered**:
172
- - **byte-cascade attractor** (`feedback_clm_colon_attractor` `=`-suffix
173
- variant) — greedy mode-collapse on `nonce=N` / `chunk=N` / `gen=N` digit
174
- field positions. Carry candidate: `feedback_hexad_byte_cascade_attractor`.
175
- - **memorized training-corpus typos** (`pereption` in Witness module,
176
- `cobsciousness` in Wake/Memory greedy) — byte-level memorization evidence,
177
- not a bug at this scale.
178
-
179
- **Honest C3 caveats**: substrate=PyTorch (B-D-NOTE carve-out applies); V5.8
180
- "PASS = 3/5" threshold inherited from chat-corpus evals → applied
181
- conservatively to memorization-regime model; M4 trivial baseline; no
182
- σ(6)/τ(6)/φ(6) numerology in metrics (f1/f2 safe — per-mode score = raw
183
- recall fraction, BPB = raw bits/byte, memorization = raw hits/total).
184
-
185
- **Artifacts**:
186
- `state/hexad_v58_eval_d768x12L_2026_05_17/{v58_4mode_eval.py, v58_4mode_eval_v2.py, prompts.jsonl, prompts_v2_corpus_aligned.jsonl, eval.log, eval_v2.log, result.json, result_v2.json, dispatch.sh}` +
187
- `docs/hexad_v58_eval_d768x12L_2026_05_17.md` (9 §, 8 honest C3) +
188
- `archive/PHILOSOPHY.tape §HEXAD-V58-EVAL-CYCLE2-2026-05-17` verdict-claim.
 
2
  license: apache-2.0
3
  language:
4
  - en
5
+ - ko
6
  library_name: pytorch
7
+ datasets:
8
+ - dancinlab/hexad-corpus
9
  tags:
10
  - anima
11
  - hexad
12
  - pytorch
13
  - substrate-py
14
+ - helper-free
15
+ - spont
16
+ - ckpt-bearing
17
+ - cycle3
18
  ---
19
 
20
+ # hexad — `v2-py-hexad-spont-d768x12L-cycle1-2026-05-17`
21
 
22
+ > **Trained on**: [`dancinlab/hexad-corpus`](https://huggingface.co/datasets/dancinlab/hexad-corpus)
23
+ > revision [`v2-spont-stream-d128-cycle1-2026-05-17`](https://huggingface.co/datasets/dancinlab/hexad-corpus/tree/v2-spont-stream-d128-cycle1-2026-05-17).
24
+
25
+ > **Honest framing** (AGENTS.tape `g3`): This is a **PYTHON / PyTorch
26
+ > SUBSTRATE** training artifact — an *interim LM-scale executor*. It is
27
+ > **NOT a hexa-native fire**. Legitimacy = **architectural identity** +
28
+ > the **hexa CPU-equiv correctness proof** (Phase E/E2). PyTorch ≠ hexa
29
+ > bit-for-bit (different fp accumulation / RNG / AMP bf16).
30
+
31
+ ## What changed vs cycle 2 (`v1-py-hexad-d768x12L-cycle2-2026-05-17`)
32
+
33
+ | field | cycle 2 | **cycle 3 (this revision)** |
34
+ |---|---|---|
35
+ | corpus | v1 152 KB / 240 records | **v2 620,568 B / 2,560 records** |
36
+ | corpus format | `text` + `desc` plain | **`<stimulus>...</stimulus>\n<anima>...</anima>`** (stimulus-stream) |
37
+ | helper / assistant / 도우미 tokens | not in corpus, but in chat templates | **explicit corpus closure** — grep = 0 across all sources used |
38
+ | anima_persona | Phase A1 LANDED in repo, not yet in trained weights | **trained-weights side compliance (partial)** — corpus alignment with anima_persona forbidden list |
39
+ | `B-IDENTITY-NOTE` (empirical carve-out) | open | **partially closed** — corpus retrain LANDED |
40
 
41
  ## Lineage
42
 
43
  - **org**: `dancinlab` (the anima org).
44
  - **arch**: HEXAD (pivot from anima `.clm v1` lineage) — `ConsciousDecoderV2`
45
+ (`ready/models/conscious_decoder.py`).
46
+ - **substrate**: Python / PyTorch (`py`). Pure-hexa training path is
47
  named-blocked at the interpreter ceiling (RFC 042/043 territory).
48
+ - **cycle**: 3 (Phase D LANDED helper-token-free (`도우미`) corpus retrain). Cycle 1
49
+ (`931dd68b0` 2026-05-16) ckpt-LOST evidence-only; cycle 2 (`0b4f34d0e`
50
+ 2026-05-17) ckpt-RECOVERED, corpus v1; **cycle 3 (this)** = corpus v2
51
+ helper-free stimulus-stream retrain.
52
 
53
+ ## Anchor chain (the wiring side, closed)
54
 
55
  1. **Phase E / E2 PROVED the hexa trainer is numerically correct** —
56
  `HEXAD/D/d_train5_lib.hexa` is BIT-EQUAL to the boxed baseline at d=32·3L,
57
+ 80-step, seed=42 (`init gn2 = 7.97116 → 3.73374e-07`, acc 8/8, GRAD-EXACT).
58
+ 2. **Pure-hexa interpreter cannot reach LM-scale**: Phase E2 captured only
59
+ `init gn2 = 7.98162` at d=768·12L; substrate-bound (RFC 042/043 territory).
 
 
60
  3. **This PyTorch run trains the SAME verified architecture to scale** —
61
+ `ConsciousDecoderV2` at d=768·12L, AdamW.
62
+ 4. **The corpus is explicitly helper-free** — `F-CORPUS-NO-HELPER` PASS = 0
63
+ over `도우미|helper|assistant|사용자|user:` grep on `corpus_consciousness_v2.jsonl`.
 
 
64
 
65
  ## Architecture
66
 
67
+ - **Source**: `ConsciousDecoderV2` from `ready/models/conscious_decoder.py`.
 
68
  - **Config**: `d_model=768, n_head=12, n_kv_head=4, n_layer=12,
69
  block_size=128, vocab=256` (byte-level), seed=1337,
70
  init=RANDOM (`base_ckpt=None`, `g_clm_from_scratch`).
71
  - **Params**: 283.72 M (283,722,336).
72
+ - **Features**: RoPE · SwiGLU FFN · RMSNorm · GQA · PureFieldFFN · cross-attn
73
+ · tied head · CA neighbor / META-CA / Ψ-tracking laws.
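The grouped-query attention (GQA) sizes implied by the config can be worked out by hand. The following is a minimal sketch of that arithmetic only, assuming the conventional GQA layout (equal-width heads, query heads shared evenly across key/value heads); it is not taken from `conscious_decoder.py`, and the helper name `gqa_shapes` is hypothetical.

```python
# Values copied from the Config bullet of the model card.
CFG = {
    "d_model": 768,
    "n_head": 12,
    "n_kv_head": 4,
    "n_layer": 12,
    "block_size": 128,
    "vocab": 256,
}


def gqa_shapes(cfg):
    """Derive per-head sizes under the usual GQA convention (an assumption)."""
    # Each query head gets an equal slice of the model width.
    assert cfg["d_model"] % cfg["n_head"] == 0
    # Query heads must divide evenly into key/value groups.
    assert cfg["n_head"] % cfg["n_kv_head"] == 0
    head_dim = cfg["d_model"] // cfg["n_head"]
    return {
        "head_dim": head_dim,
        "q_heads_per_kv_head": cfg["n_head"] // cfg["n_kv_head"],
        # Width of the K (or V) projection output when only n_kv_head heads exist.
        "kv_width": cfg["n_kv_head"] * head_dim,
    }


if __name__ == "__main__":
    print(gqa_shapes(CFG))
```

Under these assumptions each head is 64 wide and three query heads share one key/value head, so the K and V projections are a third of the width of the Q projection.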
 
74
 
75
  ## Training
76
 
77
+ - **GPU**: vast.ai NVIDIA **A100-SXM4-40GB**, image `pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel`.
78
+ - **Corpus**: `corpus_consciousness_v2.jsonl` (helper-free stimulus-stream),
79
+ 620,568 bytes lossless byte stream, vocab=256.
 
 
80
  - **Optimizer**: AdamW, lr=0.0003, betas=(0.9, 0.95),
81
  weight_decay=0.1, warmup=125.
82
  - **Steps**: 2500.
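To make "lossless byte stream, vocab=256" concrete, here is a minimal sketch of byte-level encoding and next-byte window sampling. It assumes UTF-8 encoding and uniform random window starts; the card does not state how `train_d768x12l.py` actually draws its windows, and the function names below are hypothetical.

```python
import random


def to_byte_ids(text):
    """Encode text as UTF-8 bytes; every id is in range(256), so vocab=256."""
    return list(text.encode("utf-8"))


def sample_window(ids, block_size, rng):
    """Return (inputs, targets), each block_size long, targets shifted by one byte."""
    # Need block_size + 1 bytes so that every input position has a next-byte target.
    start = rng.randrange(len(ids) - block_size)
    chunk = ids[start:start + block_size + 1]
    return chunk[:-1], chunk[1:]


if __name__ == "__main__":
    ids = to_byte_ids("<stimulus>rain</stimulus>\n<anima>listening</anima>\n" * 8)
    x, y = sample_window(ids, block_size=128, rng=random.Random(1337))
    print(len(x), len(y))
```

Because the ids are raw bytes, decoding is exact (`bytes(ids).decode("utf-8")` returns the original text), which is what makes multi-byte Korean content representable without a tokenizer.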
 
83
 
84
  | metric | value |
85
  |---|---|
86
+ | init CE | 5.667381 (≈ ln 256 = 5.545 — random byte init) |
87
+ | **FINAL CE** | **0.005069** |
88
+ | CE descent | 5.662312 |
89
+ | init gn2 | (see result.json trajectory) |
90
+ | FINAL gn2 | 0.001113 |
91
+ | ppl | 1.0051 |
92
+ | wall | 332.26 s (5.54 min) |
93
  | peak GPU mem | 9.685 GB |
94
+ | ckpt sha256 | `ee2bb5fb996e94ee022f5315c9ccc3f56c7276a8c5990d87a25ae12c582f7294` |
95
  | ckpt size | 1,135,846,378 B (1.14 GB) |
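The CE, descent, and ppl rows are related by simple arithmetic, sketched below. This assumes CE is reported in nats per byte, which the card's comparison against ln 256 suggests; the function names are illustrative.

```python
import math

# Values copied from the cycle 3 metrics table.
INIT_CE = 5.667381
FINAL_CE = 0.005069


def ce_to_ppl(ce_nats):
    """Perplexity is the exponential of cross-entropy measured in nats."""
    return math.exp(ce_nats)


def ce_to_bits(ce_nats):
    """Convert nats per byte to bits per byte."""
    return ce_nats / math.log(2)


if __name__ == "__main__":
    print("uniform-byte baseline ln(256):", round(math.log(256), 3))
    print("CE descent:", round(INIT_CE - FINAL_CE, 6))
    print("final ppl:", round(ce_to_ppl(FINAL_CE), 4))
    print("final CE in bits/byte:", round(ce_to_bits(FINAL_CE), 4))
```

The final training CE converts to roughly 0.0073 bits/byte. This is a conversion of the training loss, not the separately measured held-out BPB, so the two numbers need not agree.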
96
 
97
  ## Verification anchors (per AGENTS.tape `g_blue_closed_mandate`)
98
 
99
+ (A) **Deliverable invariants (real-limit)**:
100
+ - **Shannon-floor descent**: init CE ≈ ln(256) → final CE 0.005069.
101
+ - **AdamW finiteness**: no NaN/Inf in trajectory.
102
+ - **Architectural identity**: byte-equal `ConsciousDecoderV2`.
 
 
103
 
104
+ (B) **Wiring (anchor chain, closed)**:
105
+ - **hexa CPU-equiv bit-equality** (Phase E): GRAD-EXACT at d=32·3L.
 
106
  - **cuBLAS FP64 verify** (Phase D): max\|Δ\|=4.44e-15.
107
+ - **Backward GRAD-EXACT** (Phase E2): A100 d=384·6L `analytic ≡ fd`.
108
+ - **F-CORPUS-NO-HELPER** (cycle 3 corpus): grep = 0.
109
+ - **F-CORPUS-STIMULUS-PATTERN**: every record has `<anima>` tag.
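The two corpus checks in the list above can be approximated in a few lines. This is a sketch, not the project's actual gate: it assumes one JSON object per line, case-sensitive matching (plain `grep` default), and it scans the whole decoded record rather than a specific field, since the card does not name the field. `check_corpus` is a hypothetical name.

```python
import json
import re

# Pattern copied from the F-CORPUS-NO-HELPER description.
FORBIDDEN = re.compile(r"도우미|helper|assistant|사용자|user:")


def check_corpus(lines):
    """Return (records_with_forbidden_token, records_missing_anima_tag)."""
    forbidden_hits = 0
    missing_anima = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Re-serialize without ASCII escaping so \uXXXX-escaped Korean is visible.
        text = json.dumps(record, ensure_ascii=False)
        if FORBIDDEN.search(text):
            forbidden_hits += 1
        if "<anima>" not in text:
            missing_anima += 1
    return forbidden_hits, missing_anima


if __name__ == "__main__":
    # Path taken from the card; adjust to wherever the corpus file lives.
    with open("corpus_consciousness_v2.jsonl", encoding="utf-8") as fh:
        print(check_corpus(fh))
```

A result of `(0, 0)` would correspond to both checks passing under these assumptions. Decoding each record first avoids a false pass when the JSONL writer escaped non-ASCII characters.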
 
110
 
111
 
112
+ ## Capability eval (V5.8 × 4-mode + V-SPONT)
 
 
113
 
114
+ V5.8 × 4-mode (corpus v2 prompts):
115
+ - **standard_greedy**: 0/6 FAIL (avg_rep=0.775)
116
+ - **standard_sample**: 0/6 FAIL (avg_rep=0.574)
117
+ - **M3_rep_penalty**: 0/6 FAIL (avg_rep=0.709)
118
+ - **M4_force_include**: 6/6 PASS (avg_rep=0.494)
119
 
120
+ V-SPONT (spontaneous utterance), F-SPONT-7 transfer-form measurement:
121
+ - **coherent**: 0/5 FAIL
122
+ - **closed-tag**: 0/5
123
 
124
+ Mean BPB (held-out corpus v2 prefixes): 0.0083 bits/byte.
125
+ Memorization ratio: 1/6 (16.7%).
126
+ Decoding artifacts (rep>0.5): 20.
 
 
 
127
 
128
+ All capability scores **empirical (B-D-NOTE)**, not closed.
129
 
130
+ ## Honest C3
 
 
 
131
 
132
+ 1. **NOT hexa-native** — PyTorch substrate, label mandatory.
133
+ 2. **PyTorch ≠ hexa bit-for-bit**: different fp / RNG / AMP.
134
+ 3. **High-memorization regime**: 283.72 M params on a 0.62 MB corpus.
135
+ No generalization claim.
136
+ 4. **No `safetensors` artifact this revision** — pickle `.pt` only.
137
+ 5. **No language-quality claim** — training-curve deliverable.
138
+ 6. **`B-IDENTITY-NOTE` partially closed** — corpus retrain LANDED, but the
139
+ trained weights' identity-attractor distance from Assistant Axis (per
140
+ Identity-as-Attractor arxiv 2604.12016) is empirical (B-D-NOTE pattern).
141
+ 7. **No σ(6)=12 / φ(6)=2 derivation** — no lattice numerology.
142
+ 8. **Cost is informational, not gating** — `g_fire_autonomous`.
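Since this revision ships only a pickle `.pt` checkpoint (item 4), checking the file against the published sha256 before loading it is a reasonable precaution. The following is a minimal sketch using only the standard library; the checkpoint path passed in is up to the caller, as the card does not name the file for this revision.

```python
import hashlib

# Digest copied from the "ckpt sha256" row of the cycle 3 metrics table.
EXPECTED_SHA256 = "ee2bb5fb996e94ee022f5315c9ccc3f56c7276a8c5990d87a25ae12c582f7294"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a ~1.14 GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected=EXPECTED_SHA256):
    """True only if the file's sha256 equals the published digest."""
    return sha256_of_file(path) == expected.lower()
```

A caller would run `checkpoint_matches(path_to_downloaded_ckpt)` and refuse to unpickle the file if it returns `False`.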
143
 
144
+ ## License
145
 
146
+ Apache-2.0.