feat(v0.4): add 3 diagnostic recipes from session 29 cross-model panel
New TAF formulas (session 29 findings, 2026-04-28, n=22 LLM panel):
- §28 ν = −1/(2π) learned-imprint slope (DERIVED + empirical err 0.3%)
- §29 K = γ × log(N²·D) Chinchilla-attention invariant (CV=0.329)
- §30 sign(γ_text − γ_random) IH-formation discriminator
- §31 γ-cluster on famous constants (CodeLlama=1−1/φ, etc — n=4 intriguing)
New Python functions (python/taf_browser.py):
- gamma_random_predict(theta, T_eval, n_params_M) — F1 imprint formula
- imprint_purity(...) — diagnostic with ±0.18 CI
- compute_invariant_K(...) — F2 with z-score vs panel
- ih_phase_check(...) — F4 Δγ probe
- gamma_decompose_v2(...) — 6-axis with imprint + instruct
- famous_constant_proximity(...) — golden-ratio detector
New recipes:
- X-21 Imprint Purity Diagnostic (predicts γ_random, classifies cleanliness)
- X-22 Compute-Context Invariant (K-band membership check)
- X-23 IH-Phase Detector (Δγ probe + size-consistency check)
UI updates:
- Help modal expanded with v0.4 section in 4 languages (EN/ES/FR/ZH)
- Recipe count updated 5 → 8
- New help.recipe.x{21,22,23} keys + help.section.v04 + help.v04.{imprint,invariant,ih_probe,constants}
README adds:
- Diagnostic recipes block (X-21/X-22/X-23) under "What it does"
- "What's new in v0.4" section with formulas and use cases
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README.md +68 -9
- index.html +28 -1
- js/i18n.js +48 -4
- python/taf_browser.py +355 -0
README.md
@@ -33,7 +33,7 @@ language:
 
 **🌐 Live**: https://karlesmarin.github.io/tafagent
 **📦 Source**: https://github.com/karlesmarin/tafagent
-**📄 Paper**: [
+**📄 Paper**: [Predicting How Transformers Attend — Marin 2026](https://zenodo.org/records/19826343)
 
 ---
 
@@ -59,15 +59,21 @@ Drop in a model id (or paste any HuggingFace public model), get a
 falsifiable answer to "**will this work?**" — backed by the
 Thermodynamic Attention Framework (TAF) formulas:
 
+**Decision recipes**
 - *Will Llama-3-8B serve 32K context with NIAH retrieval?* → **X-2**
 - *Should I train a custom 7B model or pay for API access?* → **X-1**
 - *I have $5,000 — what model can I afford to train?* → **X-3**
 - *Cheapest GPU to serve Llama-70B at 100M tokens/day?* → **X-5**
 - *Soft KV decay or hard cutoff for compression?* → **X-19**
 
-
-
-
+**Diagnostic recipes** (NEW in v0.4 — session 29 findings, 2026-04-28)
+- *How much positional bias did training imprint on this model?* → **X-21**
+- *Does this model fit the empirical compute-context invariant band?* → **X-22**
+- *Is this checkpoint pre- or post-induction-head?* → **X-23**
+
+Each runs as a chain of TAF formulas (paper §17, §19, §20, §24, §26, §28–§30)
+rendered with a full audit trail. Every number is deterministic Python;
+nothing is hallucinated.
 
 ## Four ways to use it
 
@@ -152,9 +158,61 @@ paper (343 JSON files, ~5.5 MB). See `data/README.md` for the layout.
 - ~2 GB free RAM for the synthesis LLM
 - ~350 MB disk for model cache (one-time)
 
+## What's new in v0.4 (2026-04-28)
+
+Three new diagnostic recipes derived from cross-model panel analysis (n=22 LLMs):
+
+### X-21 — Imprint Purity Diagnostic
+Predicts γ on RANDOM-token input via the **learned-imprint formula**:
+
+```
+γ_random = γ_pade(θ, T) + ν · log_10(P / 14M)
+ν = −1/(2π) ≈ −0.1592   (DERIVED from RoPE rotation period)
+```
+
+Even on random tokens, weights apply a learned positional bias proportional
+to log(N_params). The slope ν is **fixed** (not fitted) — derivable from
+RoPE's 2π rotation period. Empirical validation: n=22 LLMs, p=0.022, |err|=0.3%.
+
+**Use case**: detect anomalous training, format conversion (e.g. OLMo native
+vs HF, Δγ=0.30), or fine-tuning drift by comparing predicted vs measured
+γ_random.
+
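To make the X-21 arithmetic concrete, here is a minimal sketch, assuming the §26.1 Padé form quoted in the X-21 audit chain, (2θ − T√2)/(2θ + T√2); the Llama-3-8B-style config values are illustrative, not measurements:

```python
import math

def gamma_pade(theta: float, T: float) -> float:
    # §26.1 Padé centroid as quoted in the X-21 audit chain: (2θ − T√2)/(2θ + T√2)
    return (2 * theta - T * math.sqrt(2)) / (2 * theta + T * math.sqrt(2))

NU = -1.0 / (2 * math.pi)   # ν = −1/(2π) ≈ −0.1592, derived rather than fitted
P0_M = 14.0                 # baseline pythia-14m, in millions of parameters

def gamma_random_predict(theta: float, T_eval: int, n_params_M: float) -> float:
    # §28.1: γ_random = γ_pade(θ, T) + ν · log_10(P / P_0)
    return gamma_pade(theta, T_eval) + NU * math.log10(n_params_M / P0_M)

# Illustrative Llama-3-8B-style config: θ=500000, T_eval=8192, P≈8030M
print(round(gamma_random_predict(500_000, 8192, 8030), 3))  # ≈ 0.538 (± 0.18, 95% CI)
```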
+### X-22 — Compute-Context Invariant
+Computes the empirical Chinchilla×attention invariant:
+
+```
+K = γ × log(N² · D)   where D = 20·N (Chinchilla compute-optimal)
+Empirical band: K ∈ [34, 68] (51.2 ± 16.8, CV=0.329, n=22)
+```
+
+K-outliers indicate scaling/training anomalies. Llama-3-8B with γ=1.045
+gives K=74.6 (z=1.39, high-K OUTLIER) — flags supra-Padé attention.
+
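The quoted Llama-3-8B number can be reproduced in a few lines — a sketch, assuming natural log (as in the module's `math.log`) and an approximate N = 8×10⁹:

```python
import math

def compute_K(gamma: float, n_params: float, d_tokens=None) -> float:
    # §29: K = γ · ln(N²·D); D defaults to the Chinchilla compute-optimal 20·N
    if d_tokens is None:
        d_tokens = 20 * n_params
    return gamma * math.log(n_params ** 2 * d_tokens)

N = 8.0e9                        # Llama-3-8B, approximate parameter count
K = compute_K(1.045, N)          # γ=1.045 as quoted above
z = (K - 51.2) / 16.8            # z-score against the n=22 panel band
print(f"K={K:.1f}  z={z:.2f}")   # K=74.6  z=1.39 → high-K outlier
```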
+### X-23 — IH-Phase Detector
+Uses the Δγ probe (cheaper than ICL benchmark):
+
+```
+sign(γ_text − γ_random) > 0 ⟺ post-induction-head formation
+```
+
+Pre-IH (P<400M, n=7):   ⟨Δγ⟩ = −0.19 ± 0.26
+Post-IH (P≥400M, n=15): ⟨Δγ⟩ = +0.03 ± 0.26
+
+**Use case**: monitor training trajectories without running ICL benchmarks;
+detect anomalous checkpoints.
+
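A minimal sketch of the probe itself; the two γ values below are hypothetical measurements, not panel data:

```python
def ih_phase(gamma_text: float, gamma_random: float) -> str:
    # §30: sign(γ_text − γ_random) discriminates the induction-head phase
    delta = gamma_text - gamma_random
    if delta > 0:
        return "post-IH"
    if delta < 0:
        return "pre-IH"
    return "ambiguous"

# Hypothetical checkpoint: γ_text=0.62, γ_random=0.54 → Δγ=+0.08
print(ih_phase(0.62, 0.54))  # post-IH
```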
+### Other v0.4 additions
+
+- `gamma_decompose_v2(...)` — 6-axis decomposition with the new imprint axis
+- `famous_constant_proximity(...)` — detects γ-cluster on famous constants
+  (e.g. CodeLlama-13b γ=0.382 ≈ 1−1/φ, the golden conjugate — see the sketch below)
+
+---
+
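A sketch of that nearest-constant check; the constant list follows the §31 docstring, while the tolerance and return shape are illustrative:

```python
import math

PHI = (1 + math.sqrt(5)) / 2
CONSTANTS = {
    "1−1/φ (golden conjugate)": 1 - 1 / PHI,
    "1/√2": 1 / math.sqrt(2),
    "1−1/√2": 1 - 1 / math.sqrt(2),
    "log_10(e)": math.log10(math.e),
}

def nearest_constant(gamma: float):
    # Return the closest famous constant and the absolute error
    name, val = min(CONSTANTS.items(), key=lambda kv: abs(gamma - kv[1]))
    return name, val, abs(gamma - val)

print(nearest_constant(0.3823))  # CodeLlama-13b: 1−1/φ ≈ 0.3820, err ≈ 0.0003
```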
 ## How you can help
 
-This tool is at v0.
+This tool is at v0.4. There's a long way to go.
 
 - **🐛 Report bugs**: https://github.com/karlesmarin/tafagent/issues
 - **🌐 Translate**: add a language to `js/i18n.js`, send a PR
@@ -171,12 +229,13 @@ This tool is at v0.3. There's a long way to go.
 If this tool helps you — paper or code:
 
 ```bibtex
-@article{
+@article{marin2026predicting,
 author = {Marin, Carles},
-title = {
-
+title = {Predicting How Transformers Attend:
+Analytic Power-Law Theory, Phase Transitions, and Practical Compression
+Tools},
 year = {2026},
-url = {https://
+url = {https://zenodo.org/records/19826343},
 }
 
 @misc{marin2026tafagent,
index.html
@@ -77,7 +77,7 @@
 <p data-i18n="help.modes.ask"><strong>💬 Ask plain English</strong>: free-form question, in-browser LLM picks the recipe. Best for casual exploration.</p>
 <p data-i18n="help.modes.recipe"><strong>📋 Recipe + form</strong>: manual selection, full parameter control. Best when you want exact control.</p>
 
-<h3 data-i18n="help.recipes.title">The
+<h3 data-i18n="help.recipes.title">The 8 recipes available</h3>
 
 <p data-i18n="help.recipe.x1.title"><strong>X-1 Custom training vs API</strong> — compares cost of training your own model vs paying for API access.</p>
 <div class="help-example" data-i18n="help.recipe.x1.example">
@@ -110,6 +110,33 @@
 Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
 </div>
 
+<h3 style="margin-top: 1.5em;">— v0.4 (session 29 findings) —</h3>
+
+<p data-i18n="help.section.v04"><strong>What's new in v0.4</strong> (session 29 findings, 2026-04-28): three diagnostic recipes derived from cross-model panel analysis (n=22 LLMs).</p>
+
+<p data-i18n="help.recipe.x21.title"><strong>X-21 Imprint Purity Diagnostic</strong> — predicts γ on RANDOM tokens via ν=−1/(2π); how clean is the model's RoPE prediction?</p>
+<div class="help-example" data-i18n="help.recipe.x21.example">
+Try: <em>"How clean is the RoPE prediction on Llama-3-8B?"</em><br>
+Answer: predicted γ_random + purity diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).
+</div>
+<p data-i18n="help.v04.imprint" style="font-size: 0.9em; opacity: 0.85;"><strong>Learned-imprint slope ν = −1/(2π)</strong>: RoPE's 2π rotation period drives a positional bias on the weights, proportional to log(N_params). Even random tokens show this scaling. ν is DERIVED — not fitted (empirical err 0.3%).</p>
+
+<p data-i18n="help.recipe.x22.title"><strong>X-22 Compute-Context Invariant</strong> — does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects scaling/training anomalies.</p>
+<div class="help-example" data-i18n="help.recipe.x22.example">
+Try: <em>"Does Mistral-7B fit the compute-context invariant?"</em><br>
+Answer: K = γ·log(N²·D), z-score, IN-BAND or OUTLIER.
+</div>
+<p data-i18n="help.v04.invariant" style="font-size: 0.9em; opacity: 0.85;"><strong>Chinchilla-attention invariant K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connects compute scaling and the attention exponent into a single dimensionless number.</p>
+
+<p data-i18n="help.recipe.x23.title"><strong>X-23 IH-Phase Detector</strong> — pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).</p>
+<div class="help-example" data-i18n="help.recipe.x23.example">
+Try: <em>"Is Qwen2.5-7B post-induction-head?"</em><br>
+Answer: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (with size-vs-Δγ consistency check).
+</div>
+<p data-i18n="help.v04.ih_probe" style="font-size: 0.9em; opacity: 0.85;"><strong>Δγ as IH probe</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Cheaper than running an in-context-learning benchmark.</p>
+
+<p data-i18n="help.v04.constants" style="font-size: 0.9em; opacity: 0.85;"><strong>γ-cluster on famous constants</strong> (intriguing, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (golden conjugate, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat: could be coincidence.</p>
+
 <h3 data-i18n="help.add_models.title">Adding new models (3 ways)</h3>
 <ul>
 <li data-i18n="help.add_models.preset"><strong>Preset list</strong>: 11 popular models curated. Just select from dropdown.</li>
js/i18n.js
@@ -170,7 +170,7 @@ export const TRANSLATIONS = {
 "help.modes.compare": "<strong>🆚 Compare</strong>: 2-3 models side-by-side on same recipe. Best when choosing between candidates.",
 "help.modes.ask": "<strong>💬 Ask plain English</strong>: free-form question, in-browser LLM picks the recipe. Best for casual exploration.",
 "help.modes.recipe": "<strong>📋 Recipe + form</strong>: manual selection, full parameter control. Best when you want exact control.",
-"help.recipes.title": "The
+"help.recipes.title": "The 8 recipes available",
 "help.recipe.x1.title": "<strong>X-1 Custom training vs API</strong> — compares cost of training your own model vs paying for API access.",
 "help.recipe.x1.example": "Try: <em>\"Should I train an 8B custom model or use GPT-4o for 50M tokens/month?\"</em><br>Answer types: YES (custom) / NO (API) with break-even months.",
 "help.recipe.x2.title": "<strong>X-2 Long Context Viability</strong> — predicts if a model serves a target context length reliably.",
@@ -180,7 +180,18 @@ export const TRANSLATIONS = {
 "help.recipe.x5.title": "<strong>X-5 Hardware selection</strong> — which GPU should I use to serve at target throughput?",
 "help.recipe.x5.example": "Try: <em>\"Cheapest hardware to serve Llama-3-8B at 10M tokens/day\"</em><br>Answer: best GPU + $/Mtok + capacity vs target.",
 "help.recipe.x19.title": "<strong>X-19 KV Compression decision</strong> — should I use soft decay, hard cutoff, or literature methods?",
+"help.recipe.x21.title": "<strong>X-21 Imprint Purity Diagnostic</strong> — predicts γ on RANDOM tokens via ν=−1/(2π); how clean is the model's RoPE prediction?",
+"help.recipe.x22.title": "<strong>X-22 Compute-Context Invariant</strong> — does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects scaling/training anomalies.",
+"help.recipe.x23.title": "<strong>X-23 IH-Phase Detector</strong> — pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).",
 "help.recipe.x19.example": "Try: <em>\"How to compress KV cache for Qwen2.5-7B at 32K?\"</em><br>Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
+"help.recipe.x21.example": "Try: <em>\"How clean is the RoPE prediction on Llama-3-8B?\"</em><br>Answer: predicted γ_random + purity diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).",
+"help.recipe.x22.example": "Try: <em>\"Does Mistral-7B fit the compute-context invariant?\"</em><br>Answer: K = γ·log(N²·D), z-score, IN-BAND or OUTLIER.",
+"help.recipe.x23.example": "Try: <em>\"Is Qwen2.5-7B post-induction-head?\"</em><br>Answer: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (with size-vs-Δγ consistency check).",
+"help.section.v04": "<strong>What's new in v0.4</strong> (session 29 findings, 2026-04-28): three diagnostic recipes derived from cross-model panel analysis (n=22 LLMs).",
+"help.v04.imprint": "<strong>Learned-imprint slope ν = −1/(2π)</strong>: RoPE's 2π rotation period drives a positional bias on the weights, proportional to log(N_params). Even random tokens show this scaling. ν is DERIVED — not fitted (empirical err 0.3%).",
+"help.v04.invariant": "<strong>Chinchilla-attention invariant K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connects compute scaling and the attention exponent into a single dimensionless number.",
+"help.v04.ih_probe": "<strong>Δγ as IH probe</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Cheaper than running an in-context-learning benchmark.",
+"help.v04.constants": "<strong>γ-cluster on famous constants</strong> (intriguing, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (golden conjugate, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat: could be coincidence.",
 "help.param.theta": "<strong>θ (rope_theta)</strong>: RoPE base frequency. Higher = more long-range capacity. Typical: 10000 (early), 500000 (Llama-3), 1000000 (Qwen2.5).",
 "help.param.T_train": "<strong>T_train</strong>: max context the model was trained on. From <code>max_position_embeddings</code>.",
 "help.param.T_eval": "<strong>T_eval</strong>: <em>your target</em> inference context length. The key knob.",
@@ -368,7 +379,7 @@ export const TRANSLATIONS = {
 "help.modes.compare": "<strong>🆚 Comparar</strong>: 2-3 modelos lado a lado en la misma receta. Mejor al elegir entre candidatos.",
 "help.modes.ask": "<strong>💬 Pregunta libre</strong>: pregunta en lenguaje natural, el LLM del navegador elige la receta. Mejor para exploración casual.",
 "help.modes.recipe": "<strong>📋 Receta + formulario</strong>: selección manual, control total de parámetros. Mejor cuando quieres control exacto.",
-"help.recipes.title": "Las
+"help.recipes.title": "Las 8 recetas disponibles",
 "help.recipe.x1.title": "<strong>X-1 Entrenamiento custom vs API</strong> — compara coste de entrenar tu propio modelo vs pagar API.",
 "help.recipe.x1.example": "Prueba: <em>\"¿Entrenar 8B custom o usar GPT-4o para 50M tokens/mes?\"</em><br>Respuestas: SÍ (custom) / NO (API) con meses para break-even.",
 "help.recipe.x2.title": "<strong>X-2 Viabilidad contexto largo</strong> — predice si un modelo sirve longitud objetivo de manera fiable.",
@@ -378,6 +389,17 @@ export const TRANSLATIONS = {
 "help.recipe.x5.title": "<strong>X-5 Selección hardware</strong> — ¿qué GPU usar para servir al throughput objetivo?",
 "help.recipe.x5.example": "Prueba: <em>\"Hardware más barato para servir Llama-3-8B a 10M tokens/día\"</em><br>Respuesta: mejor GPU + $/Mtok + capacidad vs objetivo.",
 "help.recipe.x19.title": "<strong>X-19 Decisión compresión KV</strong> — ¿usar soft decay, hard cutoff, o métodos de literatura?",
+"help.recipe.x21.title": "<strong>X-21 Diagnóstico Pureza Imprint</strong> — predice γ sobre tokens RANDOM via ν=−1/(2π); ¿cuán limpia es la predicción RoPE del modelo?",
+"help.recipe.x22.title": "<strong>X-22 Invariante Compute-Context</strong> — ¿γ × log(N²·D) está en banda 51.2 ± 16.8? Detecta anomalías de scaling/training.",
+"help.recipe.x23.title": "<strong>X-23 Detector Fase IH</strong> — ¿pre- o post-induction-head? Probe barato via sign(γ_text − γ_random).",
+"help.recipe.x21.example": "Prueba: <em>«¿Cuán limpia es la predicción RoPE en Llama-3-8B?»</em><br>Respuesta: γ_random predicho + diagnóstico (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).",
+"help.recipe.x22.example": "Prueba: <em>«¿Mistral-7B entra en el invariante compute-context?»</em><br>Respuesta: K = γ·log(N²·D), z-score, IN-BAND u OUTLIER.",
+"help.recipe.x23.example": "Prueba: <em>«¿Qwen2.5-7B es post-induction-head?»</em><br>Respuesta: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (chequeo consistencia tamaño vs Δγ).",
+"help.section.v04": "<strong>Novedades v0.4</strong> (hallazgos sesión 29 del 2026-04-28): tres recetas diagnósticas derivadas del análisis panel cross-model (n=22 LLMs).",
+"help.v04.imprint": "<strong>Slope imprint aprendido ν = −1/(2π)</strong>: el periodo de rotación RoPE 2π provoca un sesgo posicional en los pesos, proporcional a log(N_params). Incluso tokens random muestran este scaling. ν es DERIVADO — no ajustado (err empírico 0.3%).",
+"help.v04.invariant": "<strong>Invariante Chinchilla-atención K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Conecta compute scaling y exponente de atención en un solo número adimensional.",
+"help.v04.ih_probe": "<strong>Δγ como probe IH</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Más barato que correr un benchmark in-context-learning.",
+"help.v04.constants": "<strong>γ-cluster en constantes famosas</strong> (intrigante, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (conjugado áureo, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat: podría ser coincidencia.",
 "help.recipe.x19.example": "Prueba: <em>\"¿Cómo comprimir caché KV para Qwen2.5-7B a 32K?\"</em><br>Respuesta: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
 "help.param.theta": "<strong>θ (rope_theta)</strong>: frecuencia base RoPE. Mayor = más capacidad de largo alcance. Típico: 10000 (modelos antiguos), 500000 (Llama-3), 1000000 (Qwen2.5).",
 "help.param.T_train": "<strong>T_train</strong>: contexto máximo que vio el modelo durante entrenamiento. De <code>max_position_embeddings</code>.",
@@ -565,7 +587,7 @@ export const TRANSLATIONS = {
 "help.modes.compare": "<strong>🆚 Comparer</strong>: 2-3 modèles côte à côte sur la même recette. Mieux pour choisir entre candidats.",
 "help.modes.ask": "<strong>💬 Question libre</strong>: question en langage naturel, le LLM du navigateur choisit la recette. Mieux pour exploration casuelle.",
 "help.modes.recipe": "<strong>📋 Recette + formulaire</strong>: sélection manuelle, contrôle total des paramètres. Mieux quand vous voulez un contrôle exact.",
-"help.recipes.title": "Les
+"help.recipes.title": "Les 8 recettes disponibles",
 "help.recipe.x1.title": "<strong>X-1 Entraînement custom vs API</strong> — compare le coût d'entraîner votre propre modèle vs payer l'accès API.",
 "help.recipe.x1.example": "Essayez: <em>« Dois-je entraîner un 8B custom ou utiliser GPT-4o pour 50M tokens/mois ? »</em><br>Réponses: OUI (custom) / NON (API) avec mois pour break-even.",
 "help.recipe.x2.title": "<strong>X-2 Viabilité contexte long</strong> — prédit si un modèle sert une longueur cible de manière fiable.",
@@ -575,7 +597,18 @@ export const TRANSLATIONS = {
 "help.recipe.x5.title": "<strong>X-5 Sélection hardware</strong> — quel GPU utiliser pour servir au throughput cible ?",
 "help.recipe.x5.example": "Essayez: <em>« Hardware le moins cher pour servir Llama-3-8B à 10M tokens/jour »</em><br>Réponse: meilleur GPU + $/Mtok + capacité vs cible.",
 "help.recipe.x19.title": "<strong>X-19 Décision compression KV</strong> — utiliser soft decay, hard cutoff, ou méthodes de littérature ?",
+"help.recipe.x21.title": "<strong>X-21 Diagnostic Pureté Imprint</strong> — prédit γ sur tokens RANDOM via ν=−1/(2π); à quel point la prédiction RoPE du modèle est-elle propre ?",
+"help.recipe.x22.title": "<strong>X-22 Invariant Compute-Context</strong> — γ × log(N²·D) est-il dans la bande 51.2 ± 16.8 ? Détecte anomalies de scaling/training.",
+"help.recipe.x23.title": "<strong>X-23 Détecteur Phase IH</strong> — pré- ou post-induction-head ? Probe peu coûteux via sign(γ_text − γ_random).",
 "help.recipe.x19.example": "Essayez: <em>« Comment compresser le cache KV pour Qwen2.5-7B à 32K ? »</em><br>Réponse: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
+"help.recipe.x21.example": "Essayez: <em>« Quelle est la pureté de la prédiction RoPE sur Llama-3-8B ? »</em><br>Réponse: γ_random prédit + diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).",
+"help.recipe.x22.example": "Essayez: <em>« Mistral-7B entre-t-il dans l'invariant compute-context ? »</em><br>Réponse: K = γ·log(N²·D), z-score, IN-BAND ou OUTLIER.",
+"help.recipe.x23.example": "Essayez: <em>« Qwen2.5-7B est-il post-induction-head ? »</em><br>Réponse: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY.",
+"help.section.v04": "<strong>Nouveautés v0.4</strong> (résultats session 29, 2026-04-28) : trois recettes de diagnostic dérivées de l'analyse panel cross-model (n=22 LLMs).",
+"help.v04.imprint": "<strong>Pente d'imprint apprise ν = −1/(2π)</strong> : la période de rotation RoPE 2π entraîne un biais positionnel dans les poids, proportionnel à log(N_params). Même les tokens aléatoires montrent ce scaling. ν est DÉRIVÉ — non ajusté (erreur empirique 0,3 %).",
+"help.v04.invariant": "<strong>Invariant Chinchilla-attention K</strong> : γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connecte le scaling de compute et l'exposant d'attention en un seul nombre sans dimension.",
+"help.v04.ih_probe": "<strong>Δγ comme probe IH</strong> : sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Moins coûteux que de lancer un benchmark in-context-learning.",
+"help.v04.constants": "<strong>γ-cluster sur constantes célèbres</strong> (intriguant, n=4) : CodeLlama-13b γ=0.382 ≈ 1−1/φ (conjugué doré, err 0,0003) ; pythia-1.4b γ=0.705 ≈ 1/√2 ; Llama-2-7b γ=0.287 ≈ 1−1/√2 ; Mistral-Nemo γ=0.428 ≈ log_10(e). Caveat : peut être coïncidence.",
 "help.param.theta": "<strong>θ (rope_theta)</strong>: fréquence de base RoPE. Plus haut = plus de capacité longue portée. Typique: 10000 (anciens), 500000 (Llama-3), 1000000 (Qwen2.5).",
 "help.param.T_train": "<strong>T_train</strong>: contexte max vu par le modèle pendant l'entraînement. De <code>max_position_embeddings</code>.",
 "help.param.T_eval": "<strong>T_eval</strong>: <em>votre</em> longueur de contexte cible en inférence. Le bouton clé.",
@@ -762,7 +795,7 @@ export const TRANSLATIONS = {
 "help.modes.compare": "<strong>🆚 比较</strong>: 2-3 个模型在同一配方上并排。最适合在候选者之间选择。",
 "help.modes.ask": "<strong>💬 自由提问</strong>: 自然语言问题,浏览器 LLM 选择配方。最适合随意探索。",
 "help.modes.recipe": "<strong>📋 配方 + 表单</strong>: 手动选择,完全控制参数。最适合需要精确控制时。",
-"help.recipes.title": "可用的
+"help.recipes.title": "可用的 8 个配方",
 "help.recipe.x1.title": "<strong>X-1 自定义训练 vs API</strong> — 比较训练自己模型的成本与付费使用 API 的成本。",
 "help.recipe.x1.example": "尝试: <em>\"我应该训练 8B 自定义模型还是使用 GPT-4o 处理每月 50M tokens?\"</em><br>答案: 是 (自定义) / 否 (API),含损益平衡月数。",
 "help.recipe.x2.title": "<strong>X-2 长上下文可行性</strong> — 预测模型是否能可靠地服务目标上下文长度。",
@@ -772,7 +805,18 @@ export const TRANSLATIONS = {
 "help.recipe.x5.title": "<strong>X-5 硬件选择</strong> — 应该使用哪个 GPU 以达到目标吞吐量?",
 "help.recipe.x5.example": "尝试: <em>\"以每天 1000 万 tokens 提供 Llama-3-8B 的最便宜硬件\"</em><br>答案: 最佳 GPU + $/Mtok + 容量 vs 目标。",
 "help.recipe.x19.title": "<strong>X-19 KV 压缩决策</strong> — 应该使用 soft decay、hard cutoff 还是文献方法?",
+"help.recipe.x21.title": "<strong>X-21 Imprint 纯度诊断</strong> — 通过 ν=−1/(2π) 预测 RANDOM token 上的 γ;模型的 RoPE 预测有多干净?",
+"help.recipe.x22.title": "<strong>X-22 Compute-Context 不变量</strong> — γ × log(N²·D) 是否落在 51.2 ± 16.8 区间内?检测 scaling/training 异常。",
+"help.recipe.x23.title": "<strong>X-23 IH-Phase 检测器</strong> — 前- 还是后-induction-head?通过 sign(γ_text − γ_random) 进行廉价探测。",
 "help.recipe.x19.example": "尝试: <em>\"如何为 Qwen2.5-7B 在 32K 压缩 KV 缓存?\"</em><br>答案: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.",
+"help.recipe.x21.example": "尝试: <em>\"Llama-3-8B 上的 RoPE 预测有多干净?\"</em><br>答案: 预测的 γ_random + 诊断 (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED)。",
+"help.recipe.x22.example": "尝试: <em>\"Mistral-7B 是否符合 compute-context 不变量?\"</em><br>答案: K = γ·log(N²·D)、z-score、IN-BAND 或 OUTLIER。",
+"help.recipe.x23.example": "尝试: <em>\"Qwen2.5-7B 是后-induction-head 吗?\"</em><br>答案: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY。",
+"help.section.v04": "<strong>v0.4 新增</strong> (第 29 次研究会话, 2026-04-28): 来自 cross-model panel 分析 (n=22 LLMs) 的三个诊断 recipes。",
+"help.v04.imprint": "<strong>学习印记斜率 ν = −1/(2π)</strong>: RoPE 旋转周期 2π 在权重上引发位置偏置, 与 log(N_params) 成正比。即使 random token 也显示此 scaling。ν 是 DERIVED — 非拟合 (经验误差 0.3%)。",
+"help.v04.invariant": "<strong>Chinchilla-attention 不变量 K</strong>: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329)。将 compute scaling 和 attention 指数连接为单一无量纲数。",
+"help.v04.ih_probe": "<strong>Δγ 作为 IH 探测</strong>: sign(γ_text − γ_random) > 0 ⟺ post-induction-head。比运行 in-context-learning 基准更便宜。",
+"help.v04.constants": "<strong>γ 簇落在著名常数上</strong> (有趣, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (黄金共轭, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log_10(e)。Caveat: 可能是巧合。",
 "help.param.theta": "<strong>θ (rope_theta)</strong>: RoPE 基础频率。越高 = 长程能力越强。典型: 10000 (早期),500000 (Llama-3),1000000 (Qwen2.5)。",
 "help.param.T_train": "<strong>T_train</strong>: 模型训练时的最大上下文。来自 <code>max_position_embeddings</code>。",
 "help.param.T_eval": "<strong>T_eval</strong>: <em>您的</em> 目标推理上下文长度。关键旋钮。",
python/taf_browser.py
@@ -99,6 +99,170 @@ def kv_soft_decay_regime(theta: float, gamma: float, T_train: int) -> str:
     return "use-hard-cutoff"
 
 
+# ════════════════════════════════════════════════════════════════════════════
+# §28 — Session 29 (2026-04-28): learned-imprint, F2 Chinchilla, Δγ-IH probe
+# ════════════════════════════════════════════════════════════════════════════
+NU_IMPRINT = -1.0 / (2 * math.pi)  # §28 — learned-imprint slope (DERIVED, n=22, err 0.3%)
+P_0_IMPRINT_M = 14.0               # baseline pythia-14m (smallest panel reference)
+
+
+def gamma_random_predict(theta: float, T_eval: int, n_params_M: float) -> float:
+    """§28.1 — Predicted γ on RANDOM-token input.
+
+    γ_random = γ_pade(θ,T) + ν · log_10(P / P_0), ν = -1/(2π) ≈ -0.1592.
+    Empirical: n=22 LLMs (session 29). Random-input γ scales with model size
+    despite RoPE-Padé predicting only (θ,T) dependence — the weights imprint
+    a learned positional bias proportional to log(N_params).
+
+    Predicted CI ≈ ±0.18 (95%).
+    """
+    g_pade = gamma_pade(theta, T_eval)
+    return g_pade + NU_IMPRINT * math.log10(max(n_params_M, 1e-3) / P_0_IMPRINT_M)
+
+
+def imprint_purity(gamma_random_obs: float, theta: float, T_eval: int,
+                   n_params_M: float) -> dict:
+    """§28.2 — Diagnostic: how clean is the model's RoPE-Padé prediction?
+
+    Compares observed γ_random to predicted (γ_pade + ν·log_10(P/P_0)).
+    Negative residual ⇒ extra-strong training imprint (less clean).
+    Positive ⇒ weaker-than-expected imprint (cleaner / less trained).
+    """
+    g_pred = gamma_random_predict(theta, T_eval, n_params_M)
+    g_pade_only = gamma_pade(theta, T_eval)
+    residual = gamma_random_obs - g_pred
+    return {
+        "gamma_random_obs": gamma_random_obs,
+        "gamma_random_pred": g_pred,
+        "gamma_pade_only": g_pade_only,
+        "imprint_predicted": g_pred - g_pade_only,
+        "imprint_residual": residual,
+        "purity": "clean (within CI)" if abs(residual) < 0.18 else
+                  ("over-imprinted" if residual < 0 else "under-imprinted"),
+        "ci_95_half_width": 0.18,
+    }
+
+
+def compute_invariant_K(gamma: float, n_params_M: float,
+                        D_tokens: float = None) -> dict:
+    """§29 — F2 Chinchilla compute-context invariant.
+
+    K = γ × log(N²·D), D = 20·N (Chinchilla compute-optimal) if not given.
+    Empirical: K ≈ 51.2 ± 16.8 (CV=0.329, n=22). In-distribution if K∈[34, 68].
+    """
+    N = n_params_M * 1e6
+    if D_tokens is None:
+        D_tokens = 20 * N
+    K = gamma * math.log(N * N * D_tokens)
+    panel_mean, panel_std = 51.2, 16.8
+    z = (K - panel_mean) / panel_std
+    return {
+        "K": K,
+        "panel_mean": panel_mean,
+        "panel_std": panel_std,
+        "z_score": z,
+        "in_distribution": abs(z) <= 1.0,
+        "interpretation": "in-band" if abs(z) <= 1.0 else
+                          ("high-K outlier" if z > 0 else "low-K outlier"),
+    }
+
+
+def ih_phase_check(gamma_text: float, gamma_random: float,
+                   n_params_M: float = None) -> dict:
+    """§30 — IH-formation phase discriminator.
+
+    sign(γ_text − γ_random) > 0 ⟺ post-IH (text concentrates more than random).
+    Pre-IH (P<400M, n=7):   ⟨Δγ⟩ = -0.19 ± 0.26
+    Post-IH (P≥400M, n=15): ⟨Δγ⟩ = +0.03 ± 0.26
+    """
+    delta = gamma_text - gamma_random
+    phase_observed = "post-IH" if delta > 0 else ("pre-IH" if delta < 0 else "ambiguous")
+    phase_expected = None
+    if n_params_M is not None:
+        phase_expected = "post-IH" if n_params_M * 1e6 >= 4e8 else "pre-IH"
+    consistent = (phase_expected is None) or (phase_observed == phase_expected)
+    return {
+        "delta_gamma": delta,
+        "phase_observed": phase_observed,
+        "phase_expected_by_size": phase_expected,
+        "consistent": consistent,
+        "panel_pre_IH_mean": -0.19,
+        "panel_post_IH_mean": +0.03,
+        "panel_std": 0.26,
+    }
+
+
+def gamma_decompose_v2(gamma_pade_val: float, n_params_M: float,
+                       has_GQA: bool = False, has_SWA: bool = False,
+                       corpus: str = "text", is_instruct: bool = False) -> dict:
+    """§28.3 — 6-axis decomposition (session 29 update with imprint axis).
+
+    γ_obs = γ_pade
+          + ν·log_10(P/P_0)·𝟙[corpus=random]   ← NEW imprint axis (DERIVED)
+          + Δ_corpus(text-rand)
+          + δ_arch(GQA, SWA)
+          + δ_circuit(IH phase)
+          + δ_train(steps, RLHF, instruct)
+          + ε
+    The imprint axis activates only on RANDOM input; TEXT input is dominated
+    by the corpus term.
+    """
+    delta_imprint = NU_IMPRINT * math.log10(max(n_params_M, 1e-3) / P_0_IMPRINT_M) \
+        if corpus == "random" else 0.0
+    delta_GQA = +0.11 if has_GQA else 0.0
+    delta_SWA = -0.21 if has_SWA else 0.0
+    delta_post_IH = -0.15 if n_params_M >= 400 else 0.0
+    delta_instruct = -0.10 if is_instruct else 0.0  # F9 tentative (n=3, p=0.06)
+    return {
+        "pade_centroid": gamma_pade_val,
+        "delta_imprint": delta_imprint,
+        "delta_GQA": delta_GQA,
+        "delta_SWA": delta_SWA,
+        "delta_post_IH": delta_post_IH,
+        "delta_instruct": delta_instruct,
+        "gamma_corrected": gamma_pade_val + delta_imprint + delta_GQA
+                           + delta_SWA + delta_post_IH + delta_instruct,
+        "corpus": corpus,
+        "axes": ["pade", "imprint", "GQA", "SWA", "IH", "instruct"],
+    }
+
+
+def famous_constant_proximity(gamma: float, tolerance: float = 0.01) -> dict:
+    """§31 — Detect proximity to famous constants in the γ-cluster (session 29).
+
+    Empirical hits (n=4 in panel):
+        CodeLlama-13b γ=0.3823 ≈ 1−1/φ     = 0.3820 (golden conjugate)
+        pythia-1.4b   γ=0.7051 ≈ 1/√2      = 0.7071
+        Llama-2-7b    γ=0.2871 ≈ 1−1/√2    = 0.2929
+        Mistral-Nemo  γ=0.4284 ≈ log_10(e) = 0.4343
+    Returns the nearest constants within tolerance (the hit list may be empty).
+    """
+    phi = (1 + math.sqrt(5)) / 2
+    constants = {
+        "1−1/φ (golden conjugate)": 1 - 1/phi,
+        "1/√2": 1 / math.sqrt(2),
+        "1−1/√2": 1 - 1/math.sqrt(2),
+        "log_10(e)": math.log10(math.e),
+        "1/π": 1 / math.pi,
+        "2/π": 2 / math.pi,
+        "1/φ": 1 / phi,
+        "ln(2)": math.log(2),
+        "z*_Cayley = (√17−3)/2": (math.sqrt(17) - 3) / 2,
+    }
+    hits = []
+    for name, val in constants.items():
+        err = abs(gamma - val)
+        if err <= tolerance:
+            hits.append({"constant": name, "value": val, "error": err})
+    hits.sort(key=lambda h: h["error"])
+    return {
+        "gamma": gamma,
+        "tolerance": tolerance,
+        "n_hits": len(hits),
+        "hits": hits[:3],
+        "caveat": "n=4 hits in panel; could be coincidence (continuous distribution)",
+    }
+
+
 # ════════════════════════════════════════════════════════════════════════════
 # §17 — Pre-training viability formulas
 # ════════════════════════════════════════════════════════════════════════════
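A quick self-consistency check on the §28 helpers above — a sketch, assuming the module's own `gamma_pade` and the functions just added are importable; the θ/T/P values are illustrative:

```python
# Hypothetical session: the imprint axis reported by gamma_decompose_v2 on
# corpus="random" must equal the shift that gamma_random_predict adds to γ_Padé.
g_pade = gamma_pade(500_000, 8192)                   # Llama-3-style θ, T
dec = gamma_decompose_v2(g_pade, n_params_M=8030, corpus="random")
pred = gamma_random_predict(500_000, 8192, 8030)
assert abs((g_pade + dec["delta_imprint"]) - pred) < 1e-9
```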
@@ -584,6 +748,172 @@ def run_recipe_x19(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
     return _wrap("X-19", "KV compression decision", locals(), chain, verdict, reason, mit)
 
 
+# ─────────────────────────────────────────────────────────────────────
+# X-21 — Imprint Purity Diagnostic (session 29 — uses §28 ν=−1/(2π))
+# ─────────────────────────────────────────────────────────────────────
+def run_recipe_x21(theta, T_train, n_attention_heads, n_kv_heads,
+                   d_head, n_layers, n_params, T_eval=None,
+                   gamma_random_obs=None, **_unused):
+    """X-21: how clean is the model's RoPE-Padé prediction?
+
+    Predicts γ on RANDOM-token input via the learned-imprint formula:
+        γ_random = γ_pade(θ,T) + ν·log_10(P/14M),  ν = −1/(2π) ≈ −0.1592
+    If the user provides an observed γ_random, returns a purity diagnostic.
+    """
+    chain = []
+    if T_eval is None:
+        T_eval = T_train
+
+    # Step 1: γ_Padé baseline
+    g_pade = gamma_pade(theta, T_eval)
+    chain.append(_step(1, "§26.1", "γ_Padé", "(2θ-T√2)/(2θ+T√2)",
+                       {"theta": theta, "T_eval": T_eval}, g_pade,
+                       _phase_label(g_pade)))
+
+    # Step 2: predicted imprint shift
+    n_params_M = n_params / 1e6
+    imprint_shift = NU_IMPRINT * math.log10(max(n_params_M, 1e-3) / P_0_IMPRINT_M)
+    chain.append(_step(2, "§28.1", "Imprint shift", "ν·log_10(P/P_0), ν=−1/(2π)",
+                       {"P_M": n_params_M, "P_0_M": P_0_IMPRINT_M, "nu": NU_IMPRINT},
+                       imprint_shift,
+                       "Bigger model → stronger imprint (more negative shift)."))
+
+    # Step 3: predicted γ_random
+    g_pred = g_pade + imprint_shift
+    chain.append(_step(3, "§28.1", "γ_random predicted", "γ_pade + ν·log_10(P/P_0)",
+                       {"gamma_pade": g_pade, "imprint": imprint_shift}, g_pred,
+                       f"Predicted γ_random = {g_pred:.4f} ± 0.18 (95% CI)"))
+
+    # Step 4: purity diagnostic if an observed value is provided
+    if gamma_random_obs is not None:
+        purity = imprint_purity(gamma_random_obs, theta, T_eval, n_params_M)
+        chain.append(_step(4, "§28.2", "Imprint purity",
+                           "obs − pred (purity = within ±0.18)",
+                           {"gamma_random_obs": gamma_random_obs,
+                            "gamma_random_pred": g_pred},
+                           purity["imprint_residual"], purity["purity"]))
+        verdict = "CLEAN" if abs(purity["imprint_residual"]) < 0.18 else \
+            ("OVER-IMPRINTED" if purity["imprint_residual"] < 0 else "UNDER-IMPRINTED")
+        reason = (f"Residual γ_random_obs − γ_pred = {purity['imprint_residual']:+.4f}. "
+                  f"95% CI is ±0.18.")
+        mit = ("Models far from prediction may have anomalous training (e.g. heavy "
+               "fine-tuning, format conversion). Compare to the native checkpoint.")
+    else:
+        verdict = "PREDICTION ONLY"
+        reason = (f"Predicted γ_random = {g_pred:.4f}. Provide gamma_random_obs to "
+                  f"check purity (measure on RANDOM token sequences, e.g. via the E4 protocol).")
+        mit = ("To measure: run a 150-prompt forward pass on RANDOM-token sequences "
+               "across distances d=10..1000 and fit a power law. "
+               "(See https://github.com/karlesmarin/tafagent for the E4 protocol.)")
+
+    return _wrap("X-21", "Imprint Purity Diagnostic", locals(), chain,
+                 verdict, reason, mit)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# X-22 — Compute-Context Invariant Check (session 29 — F2 Chinchilla)
+# ─────────────────────────────────────────────────────────────────────
+def run_recipe_x22(theta, T_train, n_params, gamma_obs, D_tokens=None,
+                   T_eval=None, **_unused):
+    """X-22: does the model lie in the empirical Chinchilla invariant band?
+
+    K = γ × log(N²·D), D = 20·N if not given.
+    Empirical: K ≈ 51.2 ± 16.8 (CV=0.329, n=22 panel).
+    """
+    chain = []
+    if T_eval is None:
+        T_eval = T_train
+
+    n_params_M = n_params / 1e6
+    if D_tokens is None:
+        D_tokens = 20 * n_params  # Chinchilla compute-optimal
+
+    # Step 1: K computation
+    inv = compute_invariant_K(gamma_obs, n_params_M, D_tokens)
+    chain.append(_step(1, "§29", "K = γ·log(N²·D)", "γ × ln(N²·D)",
+                       {"gamma": gamma_obs, "N": n_params, "D": D_tokens},
+                       inv["K"],
+                       f"K = {inv['K']:.2f} (panel mean {inv['panel_mean']:.1f} ± "
+                       f"{inv['panel_std']:.1f})"))
+
+    # Step 2: z-score interpretation
+    chain.append(_step(2, "§29", "z-score vs panel", "(K − μ)/σ",
+                       {"K": inv["K"], "mean": inv["panel_mean"],
+                        "std": inv["panel_std"]},
+                       inv["z_score"],
+                       inv["interpretation"]))
+
+    # Step 3: γ_pade comparison (anomaly test)
+    g_pade = gamma_pade(theta, T_eval)
+    pade_diff = gamma_obs - g_pade
+    chain.append(_step(3, "§26.1", "γ deviation from Padé", "γ_obs − γ_pade",
+                       {"gamma_obs": gamma_obs, "gamma_pade": g_pade}, pade_diff,
+                       "negative = anomaly (sub-Padé); positive = supra-Padé"))
+
+    if inv["in_distribution"]:
+        verdict = "IN-BAND"
+        reason = f"K = {inv['K']:.2f} within ±1σ of panel mean {inv['panel_mean']:.1f}."
+        mit = "Model conforms to the compute-context invariant. No action needed."
+    else:
+        verdict = "OUTLIER"
+        reason = (f"K = {inv['K']:.2f} ({inv['interpretation']}). "
+                  f"|z| = {abs(inv['z_score']):.2f} > 1.")
+        mit = ("High-K (over-concentrating attention for the given compute) or low-K "
+               "(under-using compute for attention concentration). Check tokenizer, "
+               "training recipe, fine-tuning history.")
+
+    return _wrap("X-22", "Compute-Context Invariant", locals(), chain,
+                 verdict, reason, mit)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# X-23 — IH-Phase Detector (session 29 — F4 Δγ probe)
+# ─────────────────────────────────────────────────────────────────────
+def run_recipe_x23(n_params, gamma_text=None, gamma_random=None, **_unused):
+    """X-23: is this checkpoint pre- or post-induction-head formation?
+
+    Discriminator: sign(γ_text − γ_random) > 0 ⟺ post-IH.
+    Cheaper than an ICL benchmark for monitoring training trajectories.
+    """
+    chain = []
+    n_params_M = n_params / 1e6
+
+    # Step 1: size-based prediction
+    expected = "post-IH" if n_params >= 4e8 else "pre-IH"
+    chain.append(_step(1, "§30", "Size-based phase prediction",
+                       "P ≥ 400M ⇒ post-IH",
+                       {"n_params_M": n_params_M, "threshold_M": 400}, expected))
+
+    # Step 2: γ-based discrimination if both gammas are given
+    if gamma_text is not None and gamma_random is not None:
+        check = ih_phase_check(gamma_text, gamma_random, n_params_M)
+        chain.append(_step(2, "§30", "Δγ discriminator", "sign(γ_text − γ_random)",
+                           {"gamma_text": gamma_text, "gamma_random": gamma_random},
+                           check["delta_gamma"],
+                           f"observed phase: {check['phase_observed']}"))
+
+        if check["consistent"]:
+            verdict = f"CONFIRMED {check['phase_observed'].upper()}"
+            reason = (f"Δγ = {check['delta_gamma']:+.3f} sign matches the size prediction "
+                      f"({expected}).")
+            mit = "Phase confirmed. Use this checkpoint for downstream tasks accordingly."
+        else:
+            verdict = "ANOMALY"
+            reason = (f"Δγ = {check['delta_gamma']:+.3f} suggests {check['phase_observed']}, "
+                      f"but size predicts {expected}. Investigate.")
+            mit = ("Possible causes: incomplete training, anomalous fine-tuning, "
+                   "format conversion, tokenizer corruption (cf. F5 OLMo Δγ=0.30).")
+    else:
+        verdict = f"PREDICTED {expected.upper()}"
+        reason = (f"Only size given: P = {n_params_M:.0f}M. "
+                  f"Provide gamma_text + gamma_random to verify via the Δγ probe.")
+        mit = ("Run the E4 protocol with corpus=mongo and corpus=random; "
+               "compare γ values.")
+
+    return _wrap("X-23", "IH-Phase Detector", locals(), chain,
+                 verdict, reason, mit)
+
+
 # ════════════════════════════════════════════════════════════════════════════
 # Helpers
 # ════════════════════════════════════════════════════════════════════════════
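A hypothetical invocation of the new X-23 recipe; the γ values are made up, and the `verdict` key is assumed from the `_wrap` output shape used by the other recipes:

```python
result = run_recipe_x23(n_params=7_240_000_000,   # e.g. a 7B checkpoint
                        gamma_text=0.62,          # measured on natural text
                        gamma_random=0.54)        # measured on random tokens
print(result["verdict"])  # CONFIRMED POST-IH — Δγ=+0.08 agrees with P ≥ 400M
```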
@@ -669,6 +999,31 @@ RECIPES = {
     "category": "kv-compression",
     "uses_sections": ["§26", "§19"],
     },
+    "X-21": {
+        "name": "Imprint Purity Diagnostic",
+        "description": "How clean is the model's RoPE-Padé prediction? Predicts γ on RANDOM-token input via ν=−1/(2π).",
+        "fn": run_recipe_x21,
+        "params": ["theta", "T_train", "n_attention_heads", "n_kv_heads",
+                   "d_head", "n_layers", "n_params", "T_eval", "gamma_random_obs"],
+        "category": "diagnostic",
+        "uses_sections": ["§26", "§28"],
+    },
+    "X-22": {
+        "name": "Compute-Context Invariant",
+        "description": "Does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects training/scaling anomalies.",
+        "fn": run_recipe_x22,
+        "params": ["theta", "T_train", "n_params", "gamma_obs", "D_tokens", "T_eval"],
+        "category": "diagnostic",
+        "uses_sections": ["§26", "§29"],
+    },
+    "X-23": {
+        "name": "IH-Phase Detector",
+        "description": "Is this model pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).",
+        "fn": run_recipe_x23,
+        "params": ["n_params", "gamma_text", "gamma_random"],
+        "category": "diagnostic",
+        "uses_sections": ["§30"],
+    },
 }
 
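Dispatch works the same as for the existing recipes; a sketch, again assuming `_wrap` exposes a `verdict` key:

```python
recipe = RECIPES["X-22"]
out = recipe["fn"](theta=500_000, T_train=8192, n_params=8.0e9, gamma_obs=1.045)
print(out["verdict"])  # OUTLIER — K≈74.6, z≈1.39, matching the README example
```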