karlesmarin Claude Opus 4.7 (1M context) committed on
Commit abea671 · 1 Parent(s): 3ebadcd

feat(v0.4): add 3 diagnostic recipes from session 29 cross-model panel


New TAF formulas (session 29 findings, 2026-04-28, n=22 LLM panel):
- §28 ν = −1/(2π) learned-imprint slope (DERIVED + empirical err 0.3%)
- §29 K = γ × log(N²·D) Chinchilla-attention invariant (CV=0.329)
- §30 sign(γ_text − γ_random) IH-formation discriminator
- §31 γ-cluster on famous constants (CodeLlama γ≈1−1/φ, etc. — n=4, intriguing)

New Python functions (python/taf_browser.py):
- gamma_random_predict(theta, T_eval, n_params_M) — F1 imprint formula
- imprint_purity(...) — diagnostic with ±0.18 CI
- compute_invariant_K(...) — F2 with z-score vs panel
- ih_phase_check(...) — F4 Δγ probe
- gamma_decompose_v2(...) — 6-axis with imprint + instruct
- famous_constant_proximity(...) — golden-ratio detector
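
A quick composition sketch of the three core additions (not part of the commit; it assumes the signatures listed above, runs from the `python/` directory, and uses hypothetical 8B inputs rather than panel measurements):

```python
from taf_browser import gamma_random_predict, compute_invariant_K, ih_phase_check

# F1: predicted gamma on RANDOM tokens (theta / T_eval / params are illustrative)
g_rand = gamma_random_predict(theta=500000, T_eval=8192, n_params_M=8000)

# F2: Chinchilla-attention invariant; D defaults to 20*N inside the function
inv = compute_invariant_K(gamma=1.045, n_params_M=8000)
print(inv["K"], inv["z_score"], inv["interpretation"])  # ~74.6, ~1.39, high-K outlier

# F4: sign(gamma_text - gamma_random) > 0 <=> post-induction-head
probe = ih_phase_check(gamma_text=0.95, gamma_random=g_rand, n_params_M=8000)
print(probe["phase_observed"], probe["consistent"])
```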

New recipes:
- X-21 Imprint Purity Diagnostic (predicts γ_random, classifies cleanliness)
- X-22 Compute-Context Invariant (K-band membership check)
- X-23 IH-Phase Detector (Δγ probe + size-consistency check)

UI updates:
- Help modal expanded with v0.4 section in 4 languages (EN/ES/FR/ZH)
- Recipe count updated 5 → 8
- New help.recipe.x{21,22,23} keys + help.section.v04 + help.v04.{imprint,invariant,ih_probe,constants}

README adds:
- Diagnostic recipes block (X-21/X-22/X-23) under "What it does"
- "What's new in v0.4" section with formulas and use cases

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (4)
  1. README.md +68 -9
  2. index.html +28 -1
  3. js/i18n.js +48 -4
  4. python/taf_browser.py +355 -0
README.md CHANGED

@@ -33,7 +33,7 @@ language:
 
 **🌐 Live**: https://karlesmarin.github.io/tafagent
 **📦 Source**: https://github.com/karlesmarin/tafagent
-**📄 Paper**: [Transformer Thermodynamics — Marin 2026](https://github.com/karlesmarin/NeurIPS)
+**📄 Paper**: [Predicting How Transformers Attend — Marin 2026](https://zenodo.org/records/19826343)
 
 ---
 
@@ -59,15 +59,21 @@ Drop in a model id (or paste any HuggingFace public model), get a
 falsifiable answer to "**will this work?**" — backed by the
 Thermodynamic Attention Framework (TAF) formulas:
 
+**Decision recipes**
 - *Will Llama-3-8B serve 32K context with NIAH retrieval?* → **X-2**
 - *Should I train a custom 7B model or pay for API access?* → **X-1**
 - *I have $5,000 — what model can I afford to train?* → **X-3**
 - *Cheapest GPU to serve Llama-70B at 100M tokens/day?* → **X-5**
 - *Soft KV decay or hard cutoff for compression?* → **X-19**
 
-Each as a chain of TAF formulas (paper §17, §19, §20, §24, §26) rendered
-with full audit trail. Every number is deterministic Python; nothing
-is hallucinated.
+**Diagnostic recipes** (NEW in v0.4: session 29 findings, 2026-04-28)
+- *How much positional bias did training imprint on this model?* → **X-21**
+- *Does this model fit the empirical compute-context invariant band?* → **X-22**
+- *Is this checkpoint pre- or post-induction-head?* → **X-23**
+
+Each as a chain of TAF formulas (paper §17, §19, §20, §24, §26, §28-§30)
+rendered with full audit trail. Every number is deterministic Python;
+nothing is hallucinated.
 
 ## Four ways to use it
 
@@ -152,9 +158,61 @@ paper (343 JSON files, ~5.5 MB). See `data/README.md` for the layout.
 - ~2 GB free RAM for the synthesis LLM
 - ~350 MB disk for model cache (one-time)
 
+## What's new in v0.4 (2026-04-28)
+
+Three new diagnostic recipes derived from cross-model panel analysis (n=22 LLMs):
+
+### X-21 — Imprint Purity Diagnostic
+Predicts γ on RANDOM-token input via the **learned-imprint formula**:
+
+```
+γ_random = γ_pade(θ, T) + ν · log_10(P / 14M)
+ν = −1/(2π) ≈ −0.1592   (DERIVED from the RoPE rotation period)
+```
+
+Even on random tokens, weights apply a learned positional bias proportional
+to log(N_params). The slope ν is **fixed** (not fitted) — derivable from
+RoPE's 2π rotation period. Empirical validation: n=22 LLMs, p=0.022, |err|=0.3%.
+
+**Use case**: detect anomalous training, format conversion (e.g. OLMo native
+vs HF, Δγ=0.30), or fine-tuning drift by comparing predicted vs measured
+γ_random.
+
+### X-22 — Compute-Context Invariant
+Computes the empirical Chinchilla×attention invariant:
+
+```
+K = γ × log(N² · D)   where D = 20·N (Chinchilla compute-optimal)
+Empirical band: K ∈ [34, 68]   (51.2 ± 16.8, CV=0.329, n=22)
+```
+
+K-outliers indicate scaling/training anomalies. Llama-3-8B with γ=1.045
+gives K=74.6 (z=1.39, high-K OUTLIER) — flags supra-Padé attention.
+
+### X-23 — IH-Phase Detector
+Uses the Δγ probe (cheaper than an ICL benchmark):
+
+```
+sign(γ_text − γ_random) > 0 ⟺ post-induction-head formation
+```
+
+Pre-IH (P<400M, n=7): ⟨Δγ⟩ = −0.19 ± 0.26
+Post-IH (P≥400M, n=15): ⟨Δγ⟩ = +0.03 ± 0.26
+
+**Use case**: monitor training trajectories without running ICL benchmarks;
+detect anomalous checkpoints.
+
+### Other v0.4 additions
+
+- `gamma_decompose_v2(...)` — 6-axis decomposition with the new imprint axis
+- `famous_constant_proximity(...)` — detects the γ-cluster on famous constants
+  (e.g. CodeLlama-13b γ=0.382 ≈ 1−1/φ, the golden conjugate)
+
+---
+
 ## How you can help
 
-This tool is at v0.3. There's a long way to go.
+This tool is at v0.4. There's a long way to go.
 
 - **🐛 Report bugs**: https://github.com/karlesmarin/tafagent/issues
 - **🌐 Translate**: add a language to `js/i18n.js`, send a PR
@@ -171,12 +229,13 @@ This tool is at v0.3. There's a long way to go.
 If this tool helps you — paper or code:
 
 ```bibtex
-@article{marin2026transformer_thermodynamics,
+@article{marin2026predicting_how_transformers_attend,
   author = {Marin, Carles},
-  title = {Transformer Thermodynamics: A Closed-Form Theory of Attention Decay,
-           Phase Transitions, and Context-Length Limits in RoPE Language Models},
+  title = {Predicting How Transformers Attend:
+           Analytic Power-Law Theory, Phase Transitions, and Practical
+           Compression Tools},
   year = {2026},
-  url = {https://github.com/karlesmarin/NeurIPS},
+  url = {https://zenodo.org/records/19826343},
 }
 
 @misc{marin2026tafagent,
index.html CHANGED

@@ -77,7 +77,7 @@
 <p data-i18n="help.modes.ask"><strong>💬 Ask plain English</strong>: free-form question, in-browser LLM picks the recipe. Best for casual exploration.</p>
 <p data-i18n="help.modes.recipe"><strong>📋 Recipe + form</strong>: manual selection, full parameter control. Best when you want exact control.</p>
 
-<h3 data-i18n="help.recipes.title">The 5 recipes available</h3>
+<h3 data-i18n="help.recipes.title">The 8 recipes available</h3>
 
 <p data-i18n="help.recipe.x1.title"><strong>X-1 Custom training vs API</strong> — compares cost of training your own model vs paying for API access.</p>
 <div class="help-example" data-i18n="help.recipe.x1.example">
@@ -110,6 +110,33 @@
 Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
 </div>
 
+<h3 style="margin-top: 1.5em;">— v0.4 (session 29 findings) —</h3>
+
+<p data-i18n="help.section.v04"><strong>What's new in v0.4</strong> (session 29 findings, 2026-04-28): three diagnostic recipes derived from cross-model panel analysis (n=22 LLMs).</p>
+
+<p data-i18n="help.recipe.x21.title"><strong>X-21 Imprint Purity Diagnostic</strong> — predicts γ on RANDOM tokens via ν=−1/(2π); how clean is the model's RoPE prediction?</p>
+<div class="help-example" data-i18n="help.recipe.x21.example">
+Try: <em>"How clean is the RoPE prediction on Llama-3-8B?"</em><br>
+Answer: predicted γ_random + purity diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).
+</div>
+<p data-i18n="help.v04.imprint" style="font-size: 0.9em; opacity: 0.85;"><strong>Learned-imprint slope ν = −1/(2π)</strong>: RoPE's 2π rotation period drives a positional bias on weights, proportional to log(N_params). Even random tokens show this scaling. ν is DERIVED — not fitted (empirical err 0.3%).</p>
+
+<p data-i18n="help.recipe.x22.title"><strong>X-22 Compute-Context Invariant</strong> — does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects scaling/training anomalies.</p>
+<div class="help-example" data-i18n="help.recipe.x22.example">
+Try: <em>"Does Mistral-7B fit the compute-context invariant?"</em><br>
+Answer: K = γ·log(N²·D), z-score, IN-BAND or OUTLIER.
+</div>
+<p data-i18n="help.v04.invariant" style="font-size: 0.9em; opacity: 0.85;"><strong>Chinchilla-attention invariant K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connects compute scaling and the attention exponent into a single dimensionless number.</p>
+
+<p data-i18n="help.recipe.x23.title"><strong>X-23 IH-Phase Detector</strong> — pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).</p>
+<div class="help-example" data-i18n="help.recipe.x23.example">
+Try: <em>"Is Qwen2.5-7B post-induction-head?"</em><br>
+Answer: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (with size-vs-Δγ consistency check).
+</div>
+<p data-i18n="help.v04.ih_probe" style="font-size: 0.9em; opacity: 0.85;"><strong>Δγ as IH probe</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Cheaper than running an in-context-learning benchmark.</p>
+
+<p data-i18n="help.v04.constants" style="font-size: 0.9em; opacity: 0.85;"><strong>γ-cluster on famous constants</strong> (intriguing, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (golden conjugate, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat: could be coincidence.</p>
+
 <h3 data-i18n="help.add_models.title">Adding new models (3 ways)</h3>
 <ul>
 <li data-i18n="help.add_models.preset"><strong>Preset list</strong>: 11 popular models curated. Just select from dropdown.</li>
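
The X-22 band check described in the help text above reduces to a few lines of arithmetic. A self-contained sketch, mirroring `compute_invariant_K` from this commit (the 7B γ value is a placeholder, not a panel measurement):

```python
import math

PANEL_MEAN, PANEL_STD = 51.2, 16.8  # n=22 panel, CV = 0.329

def invariant_K(gamma, n_params, d_tokens=None):
    """K = γ · ln(N²·D); D defaults to the Chinchilla-optimal 20·N."""
    if d_tokens is None:
        d_tokens = 20 * n_params
    K = gamma * math.log(n_params ** 2 * d_tokens)
    z = (K - PANEL_MEAN) / PANEL_STD
    band = "IN-BAND" if abs(z) <= 1.0 else ("high-K OUTLIER" if z > 0 else "low-K OUTLIER")
    return K, z, band

# Placeholder γ for a 7.3B-parameter model
K, z, band = invariant_K(gamma=0.75, n_params=7.3e9)
print(f"K = {K:.1f}, z = {z:+.2f} → {band}")
```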
js/i18n.js CHANGED

@@ -170,7 +170,7 @@ export const TRANSLATIONS = {
   "help.modes.compare": "<strong>🆚 Compare</strong>: 2-3 models side-by-side on same recipe. Best when choosing between candidates.",
   "help.modes.ask": "<strong>💬 Ask plain English</strong>: free-form question, in-browser LLM picks the recipe. Best for casual exploration.",
   "help.modes.recipe": "<strong>📋 Recipe + form</strong>: manual selection, full parameter control. Best when you want exact control.",
-  "help.recipes.title": "The 5 recipes available",
+  "help.recipes.title": "The 8 recipes available",
   "help.recipe.x1.title": "<strong>X-1 Custom training vs API</strong> — compares cost of training your own model vs paying for API access.",
   "help.recipe.x1.example": "Try: <em>\"Should I train an 8B custom model or use GPT-4o for 50M tokens/month?\"</em><br>Answer types: YES (custom) / NO (API) with break-even months.",
   "help.recipe.x2.title": "<strong>X-2 Long Context Viability</strong> — predicts if a model serves a target context length reliably.",
@@ -180,7 +180,18 @@ export const TRANSLATIONS = {
   "help.recipe.x5.title": "<strong>X-5 Hardware selection</strong> — which GPU should I use to serve at target throughput?",
   "help.recipe.x5.example": "Try: <em>\"Cheapest hardware to serve Llama-3-8B at 10M tokens/day\"</em><br>Answer: best GPU + $/Mtok + capacity vs target.",
   "help.recipe.x19.title": "<strong>X-19 KV Compression decision</strong> — should I use soft decay, hard cutoff, or literature methods?",
+  "help.recipe.x21.title": "<strong>X-21 Imprint Purity Diagnostic</strong> — predicts γ on RANDOM tokens via ν=−1/(2π); how clean is the model's RoPE prediction?",
+  "help.recipe.x22.title": "<strong>X-22 Compute-Context Invariant</strong> — does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects scaling/training anomalies.",
+  "help.recipe.x23.title": "<strong>X-23 IH-Phase Detector</strong> — pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).",
   "help.recipe.x19.example": "Try: <em>\"How to compress KV cache for Qwen2.5-7B at 32K?\"</em><br>Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
+  "help.recipe.x21.example": "Try: <em>\"How clean is the RoPE prediction on Llama-3-8B?\"</em><br>Answer: predicted γ_random + purity diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).",
+  "help.recipe.x22.example": "Try: <em>\"Does Mistral-7B fit the compute-context invariant?\"</em><br>Answer: K = γ·log(N²·D), z-score, IN-BAND or OUTLIER.",
+  "help.recipe.x23.example": "Try: <em>\"Is Qwen2.5-7B post-induction-head?\"</em><br>Answer: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (with size-vs-Δγ consistency check).",
+  "help.section.v04": "<strong>What's new in v0.4</strong> (session 29 findings, 2026-04-28): three diagnostic recipes derived from cross-model panel analysis (n=22 LLMs).",
+  "help.v04.imprint": "<strong>Learned-imprint slope ν = −1/(2π)</strong>: RoPE's 2π rotation period drives a positional bias on weights, proportional to log(N_params). Even random tokens show this scaling. ν is DERIVED — not fitted (empirical err 0.3%).",
+  "help.v04.invariant": "<strong>Chinchilla-attention invariant K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connects compute scaling and the attention exponent into a single dimensionless number.",
+  "help.v04.ih_probe": "<strong>Δγ as IH probe</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Cheaper than running an in-context-learning benchmark.",
+  "help.v04.constants": "<strong>γ-cluster on famous constants</strong> (intriguing, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (golden conjugate, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat: could be coincidence.",
   "help.param.theta": "<strong>θ (rope_theta)</strong>: RoPE base frequency. Higher = more long-range capacity. Typical: 10000 (early), 500000 (Llama-3), 1000000 (Qwen2.5).",
   "help.param.T_train": "<strong>T_train</strong>: max context the model was trained on. From <code>max_position_embeddings</code>.",
   "help.param.T_eval": "<strong>T_eval</strong>: <em>your target</em> inference context length. The key knob.",
@@ -368,7 +379,7 @@ export const TRANSLATIONS = {
   "help.modes.compare": "<strong>🆚 Comparar</strong>: 2-3 modelos lado a lado en la misma receta. Mejor al elegir entre candidatos.",
   "help.modes.ask": "<strong>💬 Pregunta libre</strong>: pregunta en lenguaje natural, el LLM del navegador elige la receta. Mejor para exploración casual.",
   "help.modes.recipe": "<strong>📋 Receta + formulario</strong>: selección manual, control total de parámetros. Mejor cuando quieres control exacto.",
-  "help.recipes.title": "Las 5 recetas disponibles",
+  "help.recipes.title": "Las 8 recetas disponibles",
   "help.recipe.x1.title": "<strong>X-1 Entrenamiento custom vs API</strong> — compara coste de entrenar tu propio modelo vs pagar API.",
   "help.recipe.x1.example": "Prueba: <em>\"¿Entrenar 8B custom o usar GPT-4o para 50M tokens/mes?\"</em><br>Respuestas: SÍ (custom) / NO (API) con meses para break-even.",
   "help.recipe.x2.title": "<strong>X-2 Viabilidad contexto largo</strong> — predice si un modelo sirve longitud objetivo de manera fiable.",
@@ -378,6 +389,17 @@ export const TRANSLATIONS = {
   "help.recipe.x5.title": "<strong>X-5 Selección hardware</strong> — ¿qué GPU usar para servir al throughput objetivo?",
   "help.recipe.x5.example": "Prueba: <em>\"Hardware más barato para servir Llama-3-8B a 10M tokens/día\"</em><br>Respuesta: mejor GPU + $/Mtok + capacidad vs objetivo.",
   "help.recipe.x19.title": "<strong>X-19 Decisión compresión KV</strong> — ¿usar soft decay, hard cutoff, o métodos de literatura?",
+  "help.recipe.x21.title": "<strong>X-21 Diagnóstico Pureza Imprint</strong> — predice γ sobre tokens RANDOM via ν=−1/(2π); ¿cuán limpia es la predicción RoPE del modelo?",
+  "help.recipe.x22.title": "<strong>X-22 Invariante Compute-Context</strong> — ¿γ × log(N²·D) está en banda 51.2 ± 16.8? Detecta anomalías de scaling/training.",
+  "help.recipe.x23.title": "<strong>X-23 Detector Fase IH</strong> — ¿pre- o post-induction-head? Probe barato via sign(γ_text − γ_random).",
+  "help.recipe.x21.example": "Prueba: <em>«¿Cuán limpia es la predicción RoPE en Llama-3-8B?»</em><br>Respuesta: γ_random predicho + diagnóstico (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).",
+  "help.recipe.x22.example": "Prueba: <em>«¿Mistral-7B entra en el invariante compute-context?»</em><br>Respuesta: K = γ·log(N²·D), z-score, IN-BAND u OUTLIER.",
+  "help.recipe.x23.example": "Prueba: <em>«¿Qwen2.5-7B es post-induction-head?»</em><br>Respuesta: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (chequeo consistencia tamaño vs Δγ).",
+  "help.section.v04": "<strong>Novedades v0.4</strong> (hallazgos sesión 29 del 2026-04-28): tres recipes diagnósticas derivadas del análisis panel cross-model (n=22 LLMs).",
+  "help.v04.imprint": "<strong>Slope imprint aprendido ν = −1/(2π)</strong>: el periodo de rotación RoPE 2π provoca un sesgo posicional en los pesos, proporcional a log(N_params). Incluso tokens random muestran este scaling. ν es DERIVADO — no ajustado (err empírico 0.3%).",
+  "help.v04.invariant": "<strong>Invariante Chinchilla-atención K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Conecta compute scaling y exponente de atención en un solo número adimensional.",
+  "help.v04.ih_probe": "<strong>Δγ como probe IH</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Más barato que correr un benchmark in-context-learning.",
+  "help.v04.constants": "<strong>γ-cluster en constantes famosas</strong> (intrigante, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (conjugado áureo, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat: podría ser coincidencia.",
   "help.recipe.x19.example": "Prueba: <em>\"¿Cómo comprimir caché KV para Qwen2.5-7B a 32K?\"</em><br>Respuesta: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
   "help.param.theta": "<strong>θ (rope_theta)</strong>: frecuencia base RoPE. Mayor = más capacidad de largo alcance. Típico: 10000 (modelos antiguos), 500000 (Llama-3), 1000000 (Qwen2.5).",
   "help.param.T_train": "<strong>T_train</strong>: contexto máximo que vio el modelo durante entrenamiento. De <code>max_position_embeddings</code>.",
@@ -565,7 +587,7 @@ export const TRANSLATIONS = {
   "help.modes.compare": "<strong>🆚 Comparer</strong>: 2-3 modèles côte à côte sur la même recette. Mieux pour choisir entre candidats.",
   "help.modes.ask": "<strong>💬 Question libre</strong>: question en langage naturel, le LLM du navigateur choisit la recette. Mieux pour exploration casuelle.",
   "help.modes.recipe": "<strong>📋 Recette + formulaire</strong>: sélection manuelle, contrôle total des paramètres. Mieux quand vous voulez un contrôle exact.",
-  "help.recipes.title": "Les 5 recettes disponibles",
+  "help.recipes.title": "Les 8 recettes disponibles",
   "help.recipe.x1.title": "<strong>X-1 Entraînement custom vs API</strong> — compare le coût d'entraîner votre propre modèle vs payer l'accès API.",
   "help.recipe.x1.example": "Essayez: <em>« Dois-je entraîner un 8B custom ou utiliser GPT-4o pour 50M tokens/mois ? »</em><br>Réponses: OUI (custom) / NON (API) avec mois pour break-even.",
   "help.recipe.x2.title": "<strong>X-2 Viabilité contexte long</strong> — prédit si un modèle sert une longueur cible de manière fiable.",
@@ -575,7 +597,18 @@ export const TRANSLATIONS = {
   "help.recipe.x5.title": "<strong>X-5 Sélection hardware</strong> — quel GPU utiliser pour servir au throughput cible ?",
   "help.recipe.x5.example": "Essayez: <em>« Hardware le moins cher pour servir Llama-3-8B à 10M tokens/jour »</em><br>Réponse: meilleur GPU + $/Mtok + capacité vs cible.",
   "help.recipe.x19.title": "<strong>X-19 Décision compression KV</strong> — utiliser soft decay, hard cutoff, ou méthodes de littérature ?",
+  "help.recipe.x21.title": "<strong>X-21 Diagnostic Pureté Imprint</strong> — prédit γ sur tokens RANDOM via ν=−1/(2π); à quel point la prédiction RoPE du modèle est-elle propre ?",
+  "help.recipe.x22.title": "<strong>X-22 Invariant Compute-Context</strong> — γ × log(N²·D) est-il dans la bande 51.2 ± 16.8 ? Détecte anomalies de scaling/training.",
+  "help.recipe.x23.title": "<strong>X-23 Détecteur Phase IH</strong> — pré- ou post-induction-head ? Probe peu coûteux via sign(γ_text − γ_random).",
   "help.recipe.x19.example": "Essayez: <em>« Comment compresser le cache KV pour Qwen2.5-7B à 32K ? »</em><br>Réponse: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
+  "help.recipe.x21.example": "Essayez: <em>« Quelle est la pureté de la prédiction RoPE sur Llama-3-8B ? »</em><br>Réponse: γ_random prédit + diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).",
+  "help.recipe.x22.example": "Essayez: <em>« Mistral-7B entre-t-il dans l'invariant compute-context ? »</em><br>Réponse: K = γ·log(N²·D), z-score, IN-BAND ou OUTLIER.",
+  "help.recipe.x23.example": "Essayez: <em>« Qwen2.5-7B est-il post-induction-head ? »</em><br>Réponse: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY.",
+  "help.section.v04": "<strong>Nouveautés v0.4</strong> (résultats session 29, 2026-04-28) : trois recettes de diagnostic dérivées de l'analyse panel cross-model (n=22 LLMs).",
+  "help.v04.imprint": "<strong>Pente d'imprint apprise ν = −1/(2π)</strong> : la période de rotation RoPE 2π entraîne un biais positionnel dans les poids, proportionnel à log(N_params). Même les tokens aléatoires montrent ce scaling. ν est DÉRIVÉ — non ajusté (erreur empirique 0,3 %).",
+  "help.v04.invariant": "<strong>Invariant Chinchilla-attention K</strong> : γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connecte le scaling de compute et l'exposant d'attention en un seul nombre sans dimension.",
+  "help.v04.ih_probe": "<strong>Δγ comme probe IH</strong> : sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Moins coûteux que de lancer un benchmark in-context-learning.",
+  "help.v04.constants": "<strong>γ-cluster sur constantes célèbres</strong> (intriguant, n=4) : CodeLlama-13b γ=0.382 ≈ 1−1/φ (conjugué doré, err 0,0003) ; pythia-1.4b γ=0.705 ≈ 1/√2 ; Llama-2-7b γ=0.287 ≈ 1−1/√2 ; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat : peut être coïncidence.",
   "help.param.theta": "<strong>θ (rope_theta)</strong>: fréquence de base RoPE. Plus haut = plus de capacité longue portée. Typique: 10000 (anciens), 500000 (Llama-3), 1000000 (Qwen2.5).",
   "help.param.T_train": "<strong>T_train</strong>: contexte max vu par le modèle pendant l'entraînement. De <code>max_position_embeddings</code>.",
   "help.param.T_eval": "<strong>T_eval</strong>: <em>votre</em> longueur de contexte cible en inférence. Le bouton clé.",
@@ -762,7 +795,7 @@ export const TRANSLATIONS = {
   "help.modes.compare": "<strong>🆚 比较</strong>: 2-3 个模型在同一配方上并排。最适合在候选者之间选择。",
   "help.modes.ask": "<strong>💬 自由提问</strong>: 自然语言问题,浏览器 LLM 选择配方。最适合随意探索。",
   "help.modes.recipe": "<strong>📋 配方 + 表单</strong>: 手动选择,完全控制参数。最适合需要精确控制时。",
-  "help.recipes.title": "可用的 5 个配方",
+  "help.recipes.title": "可用的 8 个配方",
   "help.recipe.x1.title": "<strong>X-1 自定义训练 vs API</strong> — 比较训练自己模型的成本与付费使用 API 的成本。",
   "help.recipe.x1.example": "尝试: <em>\"我应该训练 8B 自定义模型还是使用 GPT-4o 处理每月 50M tokens?\"</em><br>答案: 是 (自定义) / 否 (API),含损益平衡月数。",
   "help.recipe.x2.title": "<strong>X-2 长上下文可行性</strong> — 预测模型是否能可靠地服务目标上下文长度。",
@@ -772,7 +805,18 @@ export const TRANSLATIONS = {
   "help.recipe.x5.title": "<strong>X-5 硬件选择</strong> — 应该使用哪个 GPU 以达到目标吞吐量?",
   "help.recipe.x5.example": "尝试: <em>\"以每天 1000 万 tokens 提供 Llama-3-8B 的最便宜硬件\"</em><br>答案: 最佳 GPU + $/Mtok + 容量 vs 目标。",
   "help.recipe.x19.title": "<strong>X-19 KV 压缩决策</strong> — 应该使用 soft decay、hard cutoff 还是文献方法?",
+  "help.recipe.x21.title": "<strong>X-21 Imprint 纯度诊断</strong> — 通过 ν=−1/(2π) 预测 RANDOM token 上的 γ;模型的 RoPE 预测有多干净?",
+  "help.recipe.x22.title": "<strong>X-22 Compute-Context 不变量</strong> — γ × log(N²·D) 是否落在 51.2 ± 16.8 区间内?检测 scaling/training 异常。",
+  "help.recipe.x23.title": "<strong>X-23 IH-Phase 检测器</strong> — 前- 还是后-induction-head?通过 sign(γ_text − γ_random) 进行廉价探测。",
   "help.recipe.x19.example": "尝试: <em>\"如何为 Qwen2.5-7B 在 32K 压缩 KV 缓存?\"</em><br>答案: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
+  "help.recipe.x21.example": "尝试: <em>\"Llama-3-8B 上的 RoPE 预测有多干净?\"</em><br>答案: 预测的 γ_random + 诊断 (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED)。",
+  "help.recipe.x22.example": "尝试: <em>\"Mistral-7B 是否符合 compute-context 不变量?\"</em><br>答案: K = γ·log(N²·D)、z-score、IN-BAND 或 OUTLIER。",
+  "help.recipe.x23.example": "尝试: <em>\"Qwen2.5-7B 是后-induction-head 吗?\"</em><br>答案: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY。",
+  "help.section.v04": "<strong>v0.4 新增</strong> (第 29 次研究会话, 2026-04-28): 来自 cross-model panel 分析 (n=22 LLMs) 的三个诊断 recipes。",
+  "help.v04.imprint": "<strong>学习印记斜率 ν = −1/(2π)</strong>: RoPE 旋转周期 2π 在权重上引发位置偏置, 与 log(N_params) 成正比。即使 random token 也显示此 scaling。ν 是 DERIVED — 非拟合 (经验误差 0.3%)。",
+  "help.v04.invariant": "<strong>Chinchilla-attention 不变量 K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329)。将 compute scaling 和 attention 指数连接为单一无量纲数。",
+  "help.v04.ih_probe": "<strong>Δγ 作为 IH 探测</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head。比运行 in-context-learning 基准更便宜。",
+  "help.v04.constants": "<strong>γ 簇落在著名常数上</strong> (有趣, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (黄金共轭, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e)。Caveat: 可能是巧合。",
   "help.param.theta": "<strong>θ (rope_theta)</strong>: RoPE 基础频率。越高 = 长程能力越强。典型: 10000 (早期),500000 (Llama-3),1000000 (Qwen2.5)。",
   "help.param.T_train": "<strong>T_train</strong>: 模型训练时的最大上下文。来自 <code>max_position_embeddings</code>。",
   "help.param.T_eval": "<strong>T_eval</strong>: <em>您的</em> 目标推理上下文长度。关键旋钮。",
python/taf_browser.py CHANGED

@@ -99,6 +99,170 @@ def kv_soft_decay_regime(theta: float, gamma: float, T_train: int) -> str:
     return "use-hard-cutoff"
 
 
+# ════════════════════════════════════════════════════════════════════════════
+# §28 — Session 29 (2026-04-28): learned-imprint, F2 Chinchilla, Δγ-IH probe
+# ════════════════════════════════════════════════════════════════════════════
+NU_IMPRINT = -1.0 / (2 * math.pi)  # §28 — learned-imprint slope (DERIVED, n=22, err 0.3%)
+P_0_IMPRINT_M = 14.0               # baseline: pythia-14m (smallest panel reference)
+
+
+def gamma_random_predict(theta: float, T_eval: int, n_params_M: float) -> float:
+    """§28.1 — Predicted γ on RANDOM-token input.
+
+    γ_random = γ_pade(θ,T) + ν · log_10(P / P_0), ν = -1/(2π) ≈ -0.1592.
+    Empirical: n=22 LLMs (session 29). Random-input γ scales with model size
+    despite RoPE-Padé predicting only (θ,T) dependence — weights imprint
+    a learned positional bias proportional to log(N_params).
+
+    Predicted CI ≈ ±0.18 (95%).
+    """
+    g_pade = gamma_pade(theta, T_eval)
+    return g_pade + NU_IMPRINT * math.log10(max(n_params_M, 1e-3) / P_0_IMPRINT_M)
+
+
+def imprint_purity(gamma_random_obs: float, theta: float, T_eval: int,
+                   n_params_M: float) -> dict:
+    """§28.2 — Diagnostic: how clean is the model's RoPE-Padé prediction?
+
+    Compares observed γ_random to predicted (γ_pade + ν·log_10(P/P_0)).
+    Negative residual ⇒ extra-strong training imprint (less clean).
+    Positive ⇒ weaker-than-expected imprint (cleaner / less trained).
+    """
+    g_pred = gamma_random_predict(theta, T_eval, n_params_M)
+    g_pade_only = gamma_pade(theta, T_eval)
+    residual = gamma_random_obs - g_pred
+    return {
+        "gamma_random_obs": gamma_random_obs,
+        "gamma_random_pred": g_pred,
+        "gamma_pade_only": g_pade_only,
+        "imprint_predicted": g_pred - g_pade_only,
+        "imprint_residual": residual,
+        "purity": "clean (within CI)" if abs(residual) < 0.18 else
+                  ("over-imprinted" if residual < 0 else "under-imprinted"),
+        "ci_95_half_width": 0.18,
+    }
+
+
+def compute_invariant_K(gamma: float, n_params_M: float,
+                        D_tokens: float = None) -> dict:
+    """§29 — F2 Chinchilla compute-context invariant.
+
+    K = γ × log(N²·D), D = 20·N (Chinchilla compute-optimal) if not given.
+    Empirical: K ≈ 51.2 ± 16.8 (CV=0.329, n=22). In-distribution if K ∈ [34, 68].
+    """
+    N = n_params_M * 1e6
+    if D_tokens is None:
+        D_tokens = 20 * N
+    K = gamma * math.log(N * N * D_tokens)
+    panel_mean, panel_std = 51.2, 16.8
+    z = (K - panel_mean) / panel_std
+    return {
+        "K": K,
+        "panel_mean": panel_mean,
+        "panel_std": panel_std,
+        "z_score": z,
+        "in_distribution": abs(z) <= 1.0,
+        "interpretation": "in-band" if abs(z) <= 1.0 else
+                          ("high-K outlier" if z > 0 else "low-K outlier"),
+    }
+
+
+def ih_phase_check(gamma_text: float, gamma_random: float,
+                   n_params_M: float = None) -> dict:
+    """§30 — IH-formation phase discriminator.
+
+    sign(γ_text − γ_random) > 0 ⟺ post-IH (text concentrates more than random).
+    Pre-IH (P<400M, n=7):   ⟨Δγ⟩ = -0.19 ± 0.26
+    Post-IH (P≥400M, n=15): ⟨Δγ⟩ = +0.03 ± 0.26
+    """
+    delta = gamma_text - gamma_random
+    phase_observed = "post-IH" if delta > 0 else ("pre-IH" if delta < 0 else "ambiguous")
+    phase_expected = None
+    if n_params_M is not None:
+        phase_expected = "post-IH" if n_params_M * 1e6 >= 4e8 else "pre-IH"
+    consistent = (phase_expected is None) or (phase_observed == phase_expected)
+    return {
+        "delta_gamma": delta,
+        "phase_observed": phase_observed,
+        "phase_expected_by_size": phase_expected,
+        "consistent": consistent,
+        "panel_pre_IH_mean": -0.19,
+        "panel_post_IH_mean": +0.03,
+        "panel_std": 0.26,
+    }
+
+
+def gamma_decompose_v2(gamma_pade_val: float, n_params_M: float,
+                       has_GQA: bool = False, has_SWA: bool = False,
+                       corpus: str = "text", is_instruct: bool = False) -> dict:
+    """§28.3 — 6-axis decomposition (session 29 update with imprint axis).
+
+    γ_obs = γ_pade
+          + ν·log_10(P/P_0)·𝟙[corpus=random]   ← NEW imprint axis (DERIVED)
+          + Δ_corpus(text−rand)
+          + δ_arch(GQA, SWA)
+          + δ_circuit(IH phase)
+          + δ_train(steps, RLHF, instruct)
+          + ε
+    Imprint axis activates only on RANDOM input. TEXT input is dominated by corpus.
+    """
+    delta_imprint = NU_IMPRINT * math.log10(max(n_params_M, 1e-3) / P_0_IMPRINT_M) \
+        if corpus == "random" else 0.0
+    delta_GQA = +0.11 if has_GQA else 0.0
+    delta_SWA = -0.21 if has_SWA else 0.0
+    delta_post_IH = -0.15 if n_params_M >= 400 else 0.0
+    delta_instruct = -0.10 if is_instruct else 0.0  # F9 tentative (n=3, p=0.06)
+    return {
+        "pade_centroid": gamma_pade_val,
+        "delta_imprint": delta_imprint,
+        "delta_GQA": delta_GQA,
+        "delta_SWA": delta_SWA,
+        "delta_post_IH": delta_post_IH,
+        "delta_instruct": delta_instruct,
+        "gamma_corrected": gamma_pade_val + delta_imprint + delta_GQA
+                           + delta_SWA + delta_post_IH + delta_instruct,
+        "corpus": corpus,
+        "axes": ["pade", "imprint", "GQA", "SWA", "IH", "instruct"],
+    }
+
+
+def famous_constant_proximity(gamma: float, tolerance: float = 0.01) -> dict:
+    """§31 — Detect proximity to famous constants in the γ-cluster (session 29).
+
+    Empirical hits (n=4 in panel):
+        CodeLlama-13b γ=0.3823 ≈ 1−1/φ     = 0.3820 (golden conjugate)
+        pythia-1.4b   γ=0.7051 ≈ 1/√2      = 0.7071
+        Llama-2-7b    γ=0.2871 ≈ 1−1/√2    = 0.2929
+        Mistral-Nemo  γ=0.4284 ≈ log_10(e) = 0.4343
+    Returns the nearest constants within tolerance (up to 3 hits); the
+    "hits" list is empty when nothing is within tolerance.
+    """
+    phi = (1 + math.sqrt(5)) / 2
+    constants = {
+        "1−1/φ (golden conjugate)": 1 - 1/phi,
+        "1/√2": 1 / math.sqrt(2),
+        "1−1/√2": 1 - 1/math.sqrt(2),
+        "log_10(e)": math.log10(math.e),
+        "1/π": 1 / math.pi,
+        "2/π": 2 / math.pi,
+        "1/φ": 1 / phi,
+        "ln(2)": math.log(2),
+        "z*_Cayley = (√17−3)/2": (math.sqrt(17) - 3) / 2,
+    }
+    hits = []
+    for name, val in constants.items():
+        err = abs(gamma - val)
+        if err <= tolerance:
+            hits.append({"constant": name, "value": val, "error": err})
+    hits.sort(key=lambda h: h["error"])
+    return {
+        "gamma": gamma,
+        "tolerance": tolerance,
+        "n_hits": len(hits),
+        "hits": hits[:3],
+        "caveat": "n=4 hits in panel; could be coincidence (continuous distribution)",
+    }
+
+
 # ════════════════════════════════════════════════════════════════════════════
 # §17 — Pre-training viability formulas
 # ════════════════════════════════════════════════════════════════════════════
@@ -584,6 +748,172 @@ def run_recipe_x19(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
     return _wrap("X-19", "KV compression decision", locals(), chain, verdict, reason, mit)
 
 
+# ─────────────────────────────────────────────────────────────────────
+# X-21 — Imprint Purity Diagnostic (session 29 — uses §28 ν=−1/(2π))
+# ─────────────────────────────────────────────────────────────────────
+def run_recipe_x21(theta, T_train, n_attention_heads, n_kv_heads,
+                   d_head, n_layers, n_params, T_eval=None,
+                   gamma_random_obs=None, **_unused):
+    """X-21: how clean is the model's RoPE-Padé prediction?
+
+    Predicts γ on RANDOM-token input via the learned-imprint formula:
+        γ_random = γ_pade(θ,T) + ν·log_10(P/14M), ν = −1/(2π) ≈ −0.1592
+    If the user provides an observed γ_random, returns a purity diagnostic.
+    """
+    chain = []
+    if T_eval is None:
+        T_eval = T_train
+
+    # Step 1: γ_Padé baseline
+    g_pade = gamma_pade(theta, T_eval)
+    chain.append(_step(1, "§26.1", "γ_Padé", "(2θ−T√2)/(2θ+T√2)",
+                       {"theta": theta, "T_eval": T_eval}, g_pade,
+                       _phase_label(g_pade)))
+
+    # Step 2: predicted imprint shift
+    n_params_M = n_params / 1e6
+    imprint_shift = NU_IMPRINT * math.log10(max(n_params_M, 1e-3) / P_0_IMPRINT_M)
+    chain.append(_step(2, "§28.1", "Imprint shift", "ν·log_10(P/P_0), ν=−1/(2π)",
+                       {"P_M": n_params_M, "P_0_M": P_0_IMPRINT_M, "nu": NU_IMPRINT},
+                       imprint_shift,
+                       "Bigger model → stronger imprint (more negative shift)."))
+
+    # Step 3: predicted γ_random
+    g_pred = g_pade + imprint_shift
+    chain.append(_step(3, "§28.1", "γ_random predicted", "γ_pade + ν·log_10(P/P_0)",
+                       {"gamma_pade": g_pade, "imprint": imprint_shift}, g_pred,
+                       f"Predicted γ_random = {g_pred:.4f} ± 0.18 (95% CI)"))
+
+    # Step 4: purity diagnostic if an observed value is provided
+    if gamma_random_obs is not None:
+        purity = imprint_purity(gamma_random_obs, theta, T_eval, n_params_M)
+        chain.append(_step(4, "§28.2", "Imprint purity",
+                           "obs − pred (purity = within ±0.18)",
+                           {"gamma_random_obs": gamma_random_obs,
+                            "gamma_random_pred": g_pred},
+                           purity["imprint_residual"], purity["purity"]))
+        verdict = "CLEAN" if abs(purity["imprint_residual"]) < 0.18 else \
+                  ("OVER-IMPRINTED" if purity["imprint_residual"] < 0 else "UNDER-IMPRINTED")
+        reason = (f"Residual γ_random_obs − γ_pred = {purity['imprint_residual']:+.4f}. "
+                  f"95% CI is ±0.18.")
+        mit = ("Models far from prediction may have anomalous training (e.g. heavy "
+               "fine-tuning, format conversion). Compare to the native checkpoint.")
+    else:
+        verdict = "PREDICTION ONLY"
+        reason = (f"Predicted γ_random = {g_pred:.4f}. Provide gamma_random_obs to "
+                  f"check purity (measure on RANDOM token sequences, e.g. via the E4 protocol).")
+        mit = ("To measure: run a 150-prompt forward pass on RANDOM-token sequences "
+               "across distances d=10..1000 and fit a power law. "
+               "(See https://github.com/karlesmarin/tafagent for the E4 protocol.)")
+
+    return _wrap("X-21", "Imprint Purity Diagnostic", locals(), chain,
+                 verdict, reason, mit)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# X-22 — Compute-Context Invariant Check (session 29 — F2 Chinchilla)
+# ─────────────────────────────────────────────────────────────────────
+def run_recipe_x22(theta, T_train, n_params, gamma_obs, D_tokens=None,
+                   T_eval=None, **_unused):
+    """X-22: does the model lie in the empirical Chinchilla invariant band?
+
+    K = γ × log(N²·D), D = 20·N if not given.
+    Empirical: K ≈ 51.2 ± 16.8 (CV=0.329, n=22 panel).
+    """
+    chain = []
+    if T_eval is None:
+        T_eval = T_train
+
+    n_params_M = n_params / 1e6
+    if D_tokens is None:
+        D_tokens = 20 * n_params  # Chinchilla compute-optimal
+
+    # Step 1: K computation
+    inv = compute_invariant_K(gamma_obs, n_params_M, D_tokens)
+    chain.append(_step(1, "§29", "K = γ·log(N²·D)", "γ × ln(N²·D)",
+                       {"gamma": gamma_obs, "N": n_params, "D": D_tokens},
+                       inv["K"],
+                       f"K = {inv['K']:.2f} (panel mean {inv['panel_mean']:.1f} ± "
+                       f"{inv['panel_std']:.1f})"))
+
+    # Step 2: z-score interpretation
+    chain.append(_step(2, "§29", "z-score vs panel", "(K − μ)/σ",
+                       {"K": inv["K"], "mean": inv["panel_mean"],
+                        "std": inv["panel_std"]},
+                       inv["z_score"],
+                       inv["interpretation"]))
+
+    # Step 3: γ_pade comparison (anomaly test)
+    g_pade = gamma_pade(theta, T_eval)
+    pade_diff = gamma_obs - g_pade
+    chain.append(_step(3, "§26.1", "γ deviation from Padé", "γ_obs − γ_pade",
+                       {"gamma_obs": gamma_obs, "gamma_pade": g_pade}, pade_diff,
+                       "negative = anomaly (sub-Padé); positive = supra-Padé"))
+
+    if inv["in_distribution"]:
+        verdict = "IN-BAND"
+        reason = f"K = {inv['K']:.2f} within ±1σ of panel mean {inv['panel_mean']:.1f}."
+        mit = "Model conforms to the compute-context invariant. No action needed."
+    else:
+        verdict = "OUTLIER"
+        reason = (f"K = {inv['K']:.2f} ({inv['interpretation']}). "
+                  f"|z| = {abs(inv['z_score']):.2f} > 1.")
+        mit = ("High-K (over-concentrating attention for the given compute) or low-K "
+               "(under-using compute for attention concentration). Check tokenizer, "
+               "training recipe, fine-tuning history.")
+
+    return _wrap("X-22", "Compute-Context Invariant", locals(), chain,
+                 verdict, reason, mit)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# X-23 — IH-Phase Detector (session 29 — F4 Δγ probe)
+# ─────────────────────────────────────────────────────────────────────
+def run_recipe_x23(n_params, gamma_text=None, gamma_random=None, **_unused):
+    """X-23: is this checkpoint pre- or post-induction-head formation?
+
+    Discriminator: sign(γ_text − γ_random) > 0 ⟺ post-IH.
+    Cheaper than an ICL benchmark for monitoring training trajectories.
+    """
+    chain = []
+    n_params_M = n_params / 1e6
+
+    # Step 1: size-based prediction
+    expected = "post-IH" if n_params >= 4e8 else "pre-IH"
+    chain.append(_step(1, "§30", "Size-based phase prediction",
+                       "P ≥ 400M ⇒ post-IH",
+                       {"n_params_M": n_params_M, "threshold_M": 400}, expected))
+
+    # Step 2: γ-based discrimination if both gammas are given
+    if gamma_text is not None and gamma_random is not None:
+        check = ih_phase_check(gamma_text, gamma_random, n_params_M)
+        chain.append(_step(2, "§30", "Δγ discriminator", "sign(γ_text − γ_random)",
+                           {"gamma_text": gamma_text, "gamma_random": gamma_random},
+                           check["delta_gamma"],
+                           f"observed phase: {check['phase_observed']}"))
+
+        if check["consistent"]:
+            verdict = f"CONFIRMED {check['phase_observed'].upper()}"
+            reason = (f"Δγ = {check['delta_gamma']:+.3f}; sign matches the size prediction "
+                      f"({expected}).")
+            mit = "Phase confirmed. Use this checkpoint for downstream tasks accordingly."
+        else:
+            verdict = "ANOMALY"
+            reason = (f"Δγ = {check['delta_gamma']:+.3f} suggests {check['phase_observed']}, "
+                      f"but size predicts {expected}. Investigate.")
+            mit = ("Possible causes: incomplete training, anomalous fine-tuning, "
+                   "format conversion, tokenizer corruption (cf. F5 OLMo Δγ=0.30).")
+    else:
+        verdict = f"PREDICTED {expected.upper()}"
+        reason = (f"Only size given: P = {n_params_M:.0f}M. "
+                  f"Provide gamma_text + gamma_random to verify via the Δγ probe.")
+        mit = ("Run the E4 protocol with corpus=mongo and corpus=random; "
+               "compare γ values.")
+
+    return _wrap("X-23", "IH-Phase Detector", locals(), chain,
+                 verdict, reason, mit)
+
+
 # ════════════════════════════════════════════════════════════════════════════
 # Helpers
 # ════════════════════════════════════════════════════════════════════════════
@@ -669,6 +999,31 @@ RECIPES = {
         "category": "kv-compression",
         "uses_sections": ["§26", "§19"],
     },
+    "X-21": {
+        "name": "Imprint Purity Diagnostic",
+        "description": "How clean is the model's RoPE-Padé prediction? Predicts γ on RANDOM-token input via ν=−1/(2π).",
+        "fn": run_recipe_x21,
+        "params": ["theta", "T_train", "n_attention_heads", "n_kv_heads",
+                   "d_head", "n_layers", "n_params", "T_eval", "gamma_random_obs"],
+        "category": "diagnostic",
+        "uses_sections": ["§26", "§28"],
+    },
+    "X-22": {
+        "name": "Compute-Context Invariant",
+        "description": "Does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects training/scaling anomalies.",
+        "fn": run_recipe_x22,
+        "params": ["theta", "T_train", "n_params", "gamma_obs", "D_tokens", "T_eval"],
+        "category": "diagnostic",
+        "uses_sections": ["§26", "§29"],
+    },
+    "X-23": {
+        "name": "IH-Phase Detector",
+        "description": "Is this model pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).",
+        "fn": run_recipe_x23,
+        "params": ["n_params", "gamma_text", "gamma_random"],
+        "category": "diagnostic",
+        "uses_sections": ["§30"],
+    },
 }
 
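
A closing usage note (illustrative; `gamma_obs` here is a hypothetical measured value, and the expected numbers come from the README example above): the new entries dispatch through the registry's `fn` field like the existing recipes.

```python
from taf_browser import RECIPES

# Drive X-22 directly through the registry added in this commit
result = RECIPES["X-22"]["fn"](theta=500000, T_train=8192,
                               n_params=8.0e9, gamma_obs=1.045)
print(result)  # expected verdict: OUTLIER (K ≈ 74.6, z ≈ +1.39, high-K)
```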