karlexmarin Claude Opus 4.7 (1M context) committed
Commit 6ab0441 · 0 parents

feat: TAF Agent v0.1 — client-side transformer diagnostic


A Pyodide+WebLLM browser app that predicts transformer LLM viability
(long-context, training budget, hardware fit, KV compression, custom-vs-API)
using the TAF (Thermodynamic Attention Framework) formula chains from
Marin 2026.

Phase 1: Pyodide loads taf_browser.py (10 formulas, 11 model presets,
an 11-GPU catalog; deterministic Python, no server)
Phase 2: WebLLM loads Llama-3.2-1B in browser → plain-English synthesis
Phase 3: Free-form question router (LLM picks recipe + extracts params)

Recipes (5):
X-1 Custom training vs API
X-2 Long-context viability
X-3 Budget pre-flight
X-5 Hardware selection for serving
X-19 KV compression decision

UI: 2 modes (Ask plain-English / Recipe + form), HF Hub config fetch
for any public model, audit-trail expandable steps, mobile-responsive.

Hosting: GitHub Pages (static); compute: user's browser; cost: $0/mo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (8)
  1. .github/workflows/deploy.yml +29 -0
  2. .gitignore +23 -0
  3. LICENSE +17 -0
  4. README.md +101 -0
  5. index.html +108 -0
  6. js/main.js +540 -0
  7. python/taf_browser.py +793 -0
  8. style.css +173 -0
.github/workflows/deploy.yml ADDED
@@ -0,0 +1,29 @@
+ name: Deploy to GitHub Pages
+ on:
+   push:
+     branches: [main]
+   workflow_dispatch:
+
+ permissions:
+   contents: read
+   pages: write
+   id-token: write
+
+ concurrency:
+   group: pages
+   cancel-in-progress: false
+
+ jobs:
+   deploy:
+     environment:
+       name: github-pages
+       url: ${{ steps.deployment.outputs.page_url }}
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/configure-pages@v4
+       - uses: actions/upload-pages-artifact@v3
+         with:
+           path: '.'
+       - id: deployment
+         uses: actions/deploy-pages@v4
.gitignore ADDED
@@ -0,0 +1,23 @@
+ # Python
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.egg-info/
+ .venv/
+ venv/
+
+ # Editors
+ .vscode/
+ .idea/
+ *.swp
+ .DS_Store
+
+ # Build artefacts
+ dist/
+ build/
+ node_modules/
+ *.log
+
+ # Local sandbox
+ local/
+ .cache/
LICENSE ADDED
@@ -0,0 +1,17 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ Copyright 2026 Carles Marin
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,101 @@
+ # 🔬 TAF Agent
+
+ > **Transformer LLM diagnostic in your browser.** Free. Unlimited. Auditable.
+
+ Drop in a model config (or paste any HuggingFace model id) and get a falsifiable answer to *"will it work?"* — backed by the Thermodynamic Attention Framework (TAF) formulas.
+
+ **🌐 Live demo**: https://transformerkmarin.github.io/tafagent *(once GitHub Pages is enabled)*
+
+ ---
+
+ ## What it does
+
+ Answers practical viability questions for transformer LLMs, with **zero servers**:
+
+ - *Will Llama-3-8B serve 32K context with NIAH retrieval?* → **X-2**
+ - *Should I train a custom 7B model or use GPT-4 API?* → **X-1**
+ - *I have $5K — what model can I afford to train?* → **X-3**
+ - *Cheapest GPU to serve Llama-70B at 100M tokens/day?* → **X-5**
+ - *Should I use soft KV decay or hard cutoff for compression?* → **X-19**
+
+ …each answered as a chain of TAF formulas (paper §17, §19, §20, §24, §26) rendered with a full audit trail.
+
+ ## Two modes
+
+ - **💬 Ask in plain English** → in-browser LLM picks the right recipe and runs it
+ - **📋 Recipe + form** → manual selection, full control over every parameter
+
+ ## How it's free + unlimited
+
+ - Static HTML/JS hosted on **GitHub Pages** (truly unlimited bandwidth)
+ - Python TAF computation runs in your browser via **Pyodide** (no server)
+ - Plain-English synthesis runs **Llama-3.2-1B-Instruct** in your browser via **WebLLM** (your GPU)
+ - Model weights cached in IndexedDB after first load (~700MB, one-time)
+ - **Your data never leaves your browser**
+
+ ## Architecture
+
+ ```
+ GitHub Pages (HTML/JS)
+     ↓ (one-time download)
+ Your browser:
+   ├─ Pyodide → Python TAF formulas (CPU, instant)
+   └─ WebLLM → Llama-3.2-1B (GPU/CPU, deterministic-ish)
+ ```
+
+ ## How to add new models
+
+ 1. **Preset list** — 11 curated popular models, instant autofill
+ 2. **HF Hub fetch** — paste any model id (`Qwen/Qwen2.5-32B`, `meta-llama/Llama-3.3-70B-Instruct`, ...) → browser fetches `config.json` → autofills the form
+ 3. **Manual** — fill the form fields directly
+
+ Works for any public RoPE / GQA / MHA / SWA / ALiBi / AbsPE model. Gated models (Llama family) require accepting the licence on HF first.
+
+ ## Status
+
+ - ✅ **Phase 1**: Pyodide + TAF formulas
+ - ✅ **Phase 2**: WebLLM synthesis (plain-English answer)
+ - ✅ **Phase 3**: Free-form question router (NLU → recipe selection)
+ - ✅ **5 recipes**: X-1, X-2, X-3, X-5, X-19
+ - 🚧 **Phase 4**: 15 more recipes (X-4, X-6...X-20) + advanced UI
+
+ ## Local development
+
+ ```bash
+ git clone https://github.com/karlesmarin/tafagent
+ cd tafagent
+ python -m http.server 8000
+ # open http://localhost:8000
+ ```
+
+ ## Browser requirements
+
+ - Chrome / Edge / Firefox 113+ for WebGPU acceleration (recommended)
+ - Older browsers fall back to CPU inference (slower, but works)
+ - ~2 GB free RAM for Llama-3.2-1B
+ - ~700 MB disk for model cache (one-time)
+
+ ## Citation
+
+ If you use this tool, please cite the underlying paper:
+
+ ```bibtex
+ @article{marin2026transformer_thermodynamics,
+   author = {Marin, Carles},
+   title  = {Transformer Thermodynamics: A Closed-Form Theory of Attention Decay,
+             Phase Transitions, and Context-Length Limits in RoPE Language Models},
+   year   = {2026},
+ }
+ ```
+
+ ## License
+
+ Apache-2.0 (this code). Llama-3.2-1B is distributed under the [Meta Llama 3.2 license](https://www.llama.com/llama3_2/license/).
+
+ ---
+
+ **Acknowledgements**: this tool would not exist without the open-weights commons
+ (Meta, Mistral, Qwen, EleutherAI, AI2 and many more), the Pyodide + WebLLM
+ projects, GitHub Pages free hosting, and the wider ML community keeping all
+ the tooling honest and accessible. Full list in the
+ [paper Acknowledgements](https://github.com/karlesmarin/NeurIPS).
index.html ADDED
@@ -0,0 +1,108 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8" />
+   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+   <title>TAF Agent — Transformer Diagnostic in your Browser</title>
+   <meta name="description" content="Predict transformer LLM behaviour from config alone. Free, unlimited, runs entirely in your browser." />
+   <link rel="stylesheet" href="style.css" />
+   <script src="https://cdn.jsdelivr.net/pyodide/v0.26.4/full/pyodide.js"></script>
+ </head>
+ <body>
+   <header>
+     <h1>🔬 TAF Agent</h1>
+     <p class="tagline">
+       Transformer diagnostic in your browser. <strong>Free. Unlimited. Auditable.</strong>
+     </p>
+     <p class="subtle">
+       All computation happens locally — your data never leaves this page.
+     </p>
+   </header>
+
+   <main>
+     <!-- Status -->
+     <section id="status-bar"><div id="status">⏳ Loading Python runtime...</div></section>
+
+     <!-- Mode toggle -->
+     <section id="mode-section">
+       <h2>🎯 Mode</h2>
+       <div class="mode-tabs">
+         <button class="mode-btn active" data-mode="ask">💬 Ask in plain English</button>
+         <button class="mode-btn" data-mode="recipe">📋 Pick recipe + fill form</button>
+       </div>
+       <p id="mode-desc" class="recipe-desc">
+         Type a free-form question (e.g. "Will Llama-3-8B work at 32K context?"). The
+         in-browser LLM picks the right recipe and runs it.
+       </p>
+     </section>
+
+     <!-- Free-form question (mode=ask) -->
+     <section id="ask-section">
+       <h2>❓ Your question</h2>
+       <textarea id="question" rows="3" placeholder="e.g. Will Mistral-7B handle 16K NIAH retrieval? Or: I have $5,000, what model can I train? Or: Cheapest GPU to serve Llama-70B at 100M tokens/day?"></textarea>
+       <div style="display:flex; gap:0.5rem; margin-top:0.5rem; flex-wrap:wrap;">
+         <button id="ask-btn" disabled>🚀 Analyze</button>
+         <button id="example-btn" type="button" class="secondary">💡 Try an example</button>
+       </div>
+     </section>
+
+     <!-- Recipe selector (mode=recipe) -->
+     <section id="recipe-section" style="display:none;">
+       <h2>📋 Recipe</h2>
+       <select id="recipe-select" disabled>
+         <option value="">— select a recipe —</option>
+       </select>
+       <p id="recipe-desc-display" class="recipe-desc"></p>
+     </section>
+
+     <!-- Form (mode=recipe) -->
+     <section id="form-section" style="display:none;">
+       <h2>🎯 Inputs</h2>
+
+       <div class="form-row">
+         <label for="preset">Preset model:</label>
+         <select id="preset" disabled>
+           <option value="">— select to autofill —</option>
+         </select>
+       </div>
+
+       <div class="form-row">
+         <label for="hf-id">Or any HF model:</label>
+         <input type="text" id="hf-id" placeholder="e.g. Qwen/Qwen2.5-32B-Instruct" style="flex:1;" />
+         <button id="hf-fetch-btn" type="button" class="secondary">📥 Fetch</button>
+       </div>
+       <div id="hf-status" class="subtle" style="margin: -0.5rem 0 1rem; min-height:1.2em;"></div>
+
+       <!-- Dynamic form fields based on recipe -->
+       <div id="dynamic-form" class="form-grid"></div>
+
+       <button id="run-btn" disabled>🚀 Analyze</button>
+     </section>
+
+     <!-- Output -->
+     <section id="output-section" style="display:none;">
+       <h2>📊 Verdict</h2>
+       <div id="verdict-box"></div>
+
+       <h2>🔍 Computation Chain</h2>
+       <p class="subtle">Every number below is deterministic Python. Click a step to expand.</p>
+       <div id="chain-box"></div>
+
+       <h2 id="answer-header" style="display:none;">💬 Plain-English Answer</h2>
+       <div id="answer-box" style="display:none;"></div>
+     </section>
+   </main>
+
+   <footer>
+     <p>
+       © 2026 Carles Marin · Apache-2.0 ·
+       <a href="https://github.com/karlesmarin/tafagent" target="_blank">Source on GitHub</a>
+     </p>
+     <p class="subtle">
+       Computation: Pyodide (Python in browser) · Synthesis: WebLLM (Llama-3.2-1B local) · Hosting: GitHub Pages
+     </p>
+   </footer>
+
+   <script type="module" src="js/main.js"></script>
+ </body>
+ </html>
js/main.js ADDED
@@ -0,0 +1,540 @@
+ // TAF Agent — main orchestration (Phases 1-3 complete)
+ //
+ // Phases:
+ //   1. Pyodide loads + TAF formulas → deterministic computation
+ //   2. WebLLM loads on demand → plain-English synthesis
+ //   3. Router (LLM) → free-form question → recipe + params
+
+ const TAF_BROWSER_URL = "python/taf_browser.py";
+ const ENABLE_WEBLLM = true;
+ const WEBLLM_MODEL = "Llama-3.2-1B-Instruct-q4f32_1-MLC";
+
+ const $ = (id) => document.getElementById(id);
+
+ const state = {
+   pyodide: null,
+   webllm: null,
+   presets: [],
+   recipes: [],
+   recipesById: {},
+   currentMode: "ask",
+   currentRecipe: null,
+ };
+
+ const EXAMPLES = [
+   "Will Meta-Llama-3-8B handle 32000-token NIAH retrieval reliably?",
+   "I have $5000 to spend on training. What model can I afford?",
+   "Should I use Mistral-7B-v0.1 at 16K context or extend it first?",
+   "Compare cheapest GPU to serve Llama-3-8B at 10 million tokens per day.",
+   "Should I use soft KV decay or hard cutoff for Qwen2.5-7B at 32K?",
+   "Is it cheaper to train an 8B custom model or use GPT-4o for 50M tokens/month?",
+ ];
+
+ // ════════════════════════════════════════════════════════════════════
+ // Bootstrap
+ // ════════════════════════════════════════════════════════════════════
+ async function loadPyodideAndTaf() {
+   setStatus("⏳ Loading Pyodide (Python runtime ~10MB)...");
+   state.pyodide = await loadPyodide({
+     indexURL: "https://cdn.jsdelivr.net/pyodide/v0.26.4/full/",
+   });
+   setStatus("⏳ Loading TAF formulas + recipes...");
+   const tafCode = await fetch(TAF_BROWSER_URL).then(r => r.text());
+   await state.pyodide.runPythonAsync(tafCode);
+
+   state.presets = JSON.parse(state.pyodide.runPython("list_presets()"));
+   state.recipes = JSON.parse(state.pyodide.runPython("list_recipes()"));
+   state.recipesById = Object.fromEntries(state.recipes.map(r => [r.id, r]));
+
+   populatePresets();
+   populateRecipes();
+   enableUI();
+   setStatus("✅ Ready. Ask a question or pick a recipe.");
+ }
+
+ function populatePresets() {
+   const sel = $("preset");
+   sel.innerHTML = '<option value="">— select to autofill —</option>';
+   state.presets.forEach(p => {
+     const opt = document.createElement("option");
+     opt.value = p.id;
+     opt.textContent = `${p.label} (θ=${p.theta.toLocaleString()}, T_train=${p.T_train})`;
+     sel.appendChild(opt);
+   });
+ }
+
+ function populateRecipes() {
+   const sel = $("recipe-select");
+   sel.innerHTML = '<option value="">— select a recipe —</option>';
+   state.recipes.forEach(r => {
+     const opt = document.createElement("option");
+     opt.value = r.id;
+     opt.textContent = `${r.id} — ${r.name}`;
+     sel.appendChild(opt);
+   });
+ }
+
+ function enableUI() {
+   $("ask-btn").disabled = false;
+   $("recipe-select").disabled = false;
+   $("preset").disabled = false;
+ }
+
+ function setStatus(msg) { $("status").textContent = msg; }
+
+ // ════════════════════════════════════════════════════════════════════
+ // Mode toggle
+ // ════════════════════════════════════════════════════════════════════
+ document.querySelectorAll(".mode-btn").forEach(btn => {
+   btn.addEventListener("click", () => {
+     document.querySelectorAll(".mode-btn").forEach(b => b.classList.remove("active"));
+     btn.classList.add("active");
+     const mode = btn.dataset.mode;
+     state.currentMode = mode;
+     if (mode === "ask") {
+       $("ask-section").style.display = "";
+       $("recipe-section").style.display = "none";
+       $("form-section").style.display = "none";
+       $("mode-desc").textContent =
+         "Type a free-form question. The in-browser LLM picks the right recipe and runs it.";
+     } else {
+       $("ask-section").style.display = "none";
+       $("recipe-section").style.display = "";
+       $("mode-desc").textContent =
+         "Pick a recipe directly and fill the form. Same result as Ask mode but fully manual.";
+     }
+   });
+ });
+
+ // ════════════════════════════════════════════════════════════════════
+ // Recipe selector
+ // ════════════════════════════════════════════════════════════════════
+ $("recipe-select").addEventListener("change", (e) => {
+   const rid = e.target.value;
+   if (!rid) {
+     $("form-section").style.display = "none";
+     return;
+   }
+   const r = state.recipesById[rid];
+   state.currentRecipe = r;
+   $("recipe-desc-display").textContent = r.description;
+   $("form-section").style.display = "";
+   buildDynamicForm(r);
+ });
+
+ function buildDynamicForm(recipe) {
+   const container = $("dynamic-form");
+   container.innerHTML = "";
+   const defaults = getRecipeDefaults(recipe.id);
+   recipe.params.forEach(name => {
+     const div = document.createElement("div");
+     div.className = "form-field";
+     const label = document.createElement("label");
+     label.textContent = paramLabel(name);
+     label.htmlFor = `param_${name}`;
+     const input = document.createElement("input");
+     input.type = "text";
+     input.id = `param_${name}`;
+     input.dataset.param = name;
+     input.value = defaults[name] !== undefined ? String(defaults[name]) : "";
+     div.appendChild(label);
+     div.appendChild(input);
+     container.appendChild(div);
+   });
+   $("run-btn").disabled = false;
+ }
+
+ function paramLabel(name) {
+   const labels = {
+     theta: "θ (rope_theta)", T_train: "T_train", T_eval: "T_eval (target context)",
+     n_attention_heads: "num_attention_heads", n_kv_heads: "num_key_value_heads",
+     d_head: "head_dim", n_layers: "num_hidden_layers", n_params: "n_params (e.g. 8e9)",
+     has_SWA: "Has SWA? (true/false)",
+     N_params: "N_params (e.g. 8e9)", D_tokens: "D_tokens (or empty for Chinchilla)",
+     gpu: "GPU", n_gpus: "n_gpus", mfu: "MFU (default 0.45)",
+     api_model: "API model to compare", monthly_tokens_M: "Monthly tokens (M)",
+     USD_budget: "USD budget", bytes_per_weight: "Bytes per weight (BF16=2)",
+     target_tokens_per_day: "Target tokens/day", concurrent_users: "Concurrent users",
+   };
+   return labels[name] || name;
+ }
+
+ function getRecipeDefaults(recipeId) {
+   const D = {
+     "X-1": { N_params: "8e9", D_tokens: "", gpu: "H100 SXM", n_gpus: 8, mfu: 0.45,
+              api_model: "GPT-4o", monthly_tokens_M: 10.0 },
+     "X-2": { theta: 500000, T_train: 8192, T_eval: 32000,
+              n_attention_heads: 32, n_kv_heads: 8, d_head: 128,
+              n_layers: 32, n_params: "8e9", has_SWA: false },
+     "X-3": { USD_budget: 5000, gpu: "H100 SXM", mfu: 0.45, n_gpus: 1 },
+     "X-5": { N_params: "8e9", T_eval: 4096, n_layers: 32, n_kv_heads: 8, d_head: 128,
+              bytes_per_weight: 2.0, target_tokens_per_day: 10000000, concurrent_users: 1 },
+     "X-19": { theta: 500000, T_train: 8192, T_eval: 8192,
+               n_attention_heads: 32, n_kv_heads: 8, d_head: 128,
+               n_layers: 32, n_params: "8e9", has_SWA: false },
+   };
+   return D[recipeId] || {};
+ }
+
+ // ════════════════════════════════════════════════════════════════════
+ // Preset autofill (works in recipe mode)
+ // ════════════════════════════════════════════════════════════════════
+ $("preset").addEventListener("change", (e) => {
+   if (!e.target.value) return;
+   const proxy = state.pyodide.runPython(`get_preset(${JSON.stringify(e.target.value)})`);
+   const preset = proxy.toJs ? proxy.toJs({ dict_converter: Object.fromEntries }) : proxy;
+   if (!preset || Object.keys(preset).length === 0) return;
+   fillRecipeForm(preset);
+ });
+
+ function fillRecipeForm(p) {
+   // Fill any matching field in dynamic form
+   Object.entries(p).forEach(([k, v]) => {
+     const map = {
+       theta: "theta", T_train: "T_train",
+       n_attention_heads: "n_attention_heads", n_kv_heads: "n_kv_heads",
+       d_head: "d_head", n_layers: "n_layers", n_params: "n_params",
+       has_SWA: "has_SWA",
+     };
+     const formId = "param_" + (map[k] || k);
+     const el = $(formId);
+     if (el) el.value = (typeof v === "number" && (k === "n_params" || v > 1e6))
+       ? v.toExponential(2) : String(v);
+     // Also fill N_params for cost recipes
+     if (k === "n_params") {
+       const np = $("param_N_params");
+       if (np) np.value = (typeof v === "number" ? v.toExponential(2) : String(v));
+     }
+   });
+ }
+
+ // ════════════════════════════════════════════════════════════════════
+ // HF Hub fetch (any model)
+ // ════════════════════════════════════════════════════════════════════
+ $("hf-fetch-btn").addEventListener("click", async () => {
+   const modelId = $("hf-id").value.trim();
+   if (!modelId) {
+     $("hf-status").textContent = "⚠ Enter a model id like 'Qwen/Qwen2.5-32B-Instruct'";
+     return;
+   }
+   $("hf-status").textContent = `⏳ Fetching config.json from HF Hub for ${modelId}...`;
+   $("hf-fetch-btn").disabled = true;
+   try {
+     const url = `https://huggingface.co/${modelId}/raw/main/config.json`;
+     const resp = await fetch(url);
+     if (!resp.ok) {
+       if (resp.status === 401 || resp.status === 403) {
+         throw new Error(`Model is gated (${resp.status}). Accept license on HF Hub first, or fill manually.`);
+       }
+       throw new Error(`HTTP ${resp.status} — config.json not found`);
+     }
+     const cfg = await resp.json();
+     const preset = configToPreset(cfg, modelId);
+     fillRecipeForm(preset);
+     $("hf-status").innerHTML = `✅ Config loaded for <strong>${modelId}</strong> (family: ${preset._family}). Verify values, click Analyze.`;
+   } catch (err) {
+     $("hf-status").textContent = `❌ ${err.message}`;
+   } finally {
+     $("hf-fetch-btn").disabled = false;
+   }
+ });
+
+ function configToPreset(cfg, modelId) {
+   const n_attn = cfg.num_attention_heads || cfg.n_head || 0;
+   const n_kv = cfg.num_key_value_heads || cfg.num_attention_heads || cfg.n_head || 0;
+   const hidden = cfg.hidden_size || cfg.d_model || cfg.n_embd || 0;
+   const d_head = cfg.head_dim || (n_attn > 0 ? Math.floor(hidden / n_attn) : 0);
+   const theta = cfg.rope_theta || cfg.rotary_emb_base ||
+     (cfg.alibi ? null : (cfg.position_embedding_type === "absolute" ? null : 10000));
+   const T_train = cfg.max_position_embeddings || cfg.max_sequence_length ||
+     cfg.n_positions || cfg.n_ctx || 0;
+   const n_layers = cfg.num_hidden_layers || cfg.n_layer || 0;
+   const has_SWA = !!(cfg.sliding_window || cfg.use_sliding_window);
+
+   let family = "rope-mha";
+   if (cfg.alibi) family = "alibi";
+   else if (cfg.model_type === "mamba" || cfg.model_type === "mamba2") family = "ssm";
+   else if (theta == null) family = "abspe";
+   else if (n_kv < n_attn) family = "rope-gqa";
+
+   const n_params_est = estimateParams(cfg);
+   return {
+     theta: theta || 10000, T_train: T_train || 2048,
+     n_attention_heads: n_attn, n_kv_heads: n_kv, d_head: d_head,
+     n_layers: n_layers, n_params: n_params_est, has_SWA: has_SWA,
+     _family: family, _model_id: modelId,
+   };
+ }
+
+ function estimateParams(cfg) {
+   const h = cfg.hidden_size || cfg.d_model || 0;
+   const L = cfg.num_hidden_layers || cfg.n_layer || 0;
+   const V = cfg.vocab_size || 32000;
+   return Math.round(12 * h * h * L + 2 * V * h);
+ }
+
+ // ════════════════════════════════════════════════════════════════════
+ // Run recipe (manual mode)
+ // ════════════════════════════════════════════════════════════════════
+ $("run-btn").addEventListener("click", async () => {
+   if (!state.currentRecipe) {
+     alert("Select a recipe first.");
+     return;
+   }
+   const rid = state.currentRecipe.id;
+   const params = collectParams(state.currentRecipe.params);
+   await runAndDisplay(rid, params);
+ });
+
+ function collectParams(paramNames) {
+   const p = {};
+   paramNames.forEach(name => {
+     const el = $("param_" + name);
+     if (!el || el.value === "") return;
+     let v = el.value;
+     if (v === "true" || v === "false") {
+       p[name] = (v === "true");
+     } else if (!isNaN(parseFloat(v)) && isFinite(v)) {
+       p[name] = parseFloat(v);
+     } else {
+       p[name] = v;
+     }
+   });
+   return p;
+ }
+
+ // ════════════════════════════════════════════════════════════════════
+ // Ask mode (free-form question via router)
+ // ════════════════════════════════════════════════════════════════════
+ $("ask-btn").addEventListener("click", async () => {
+   const q = $("question").value.trim();
+   if (!q) {
+     alert("Please type a question.");
+     return;
+   }
+   $("ask-btn").disabled = true;
+   setStatus("🤔 Asking the in-browser LLM to pick a recipe...");
+
+   try {
+     const route = await routeQuestion(q);
+     setStatus(`📋 Selected recipe ${route.recipe_id}. Running...`);
+     await runAndDisplay(route.recipe_id, route.params, q);
+   } catch (err) {
+     setStatus(`❌ Routing failed: ${err.message}`);
+     $("output-section").style.display = "block";
+     $("verdict-box").className = "verdict-no";
+     $("verdict-box").innerHTML = `<strong>Could not route question.</strong><br>${escapeHtml(err.message)}<br><br>Try the Recipe mode for full manual control.`;
+   } finally {
+     $("ask-btn").disabled = false;
+   }
+ });
+
+ $("example-btn").addEventListener("click", () => {
+   const ex = EXAMPLES[Math.floor(Math.random() * EXAMPLES.length)];
+   $("question").value = ex;
+ });
+
+ async function routeQuestion(question) {
+   const engine = await loadWebLLM();
+   const recipesDesc = state.recipes.map(r =>
+     `  ${r.id}: ${r.name} — ${r.description}\n  params: ${r.params.join(", ")}`
+   ).join("\n");
+   const systemPrompt = `You are a routing function. Given a user's free-form question
+ about transformer LLM viability, you MUST output a single JSON object with two fields:
+ - recipe_id: one of [${state.recipes.map(r => r.id).join(", ")}]
+ - params: an object with parameter values inferred from the question
+
+ Available recipes:
+ ${recipesDesc}
+
+ Common model facts you may use:
+ Meta-Llama-3-8B: theta=500000, T_train=8192, n_attention_heads=32, n_kv_heads=8, d_head=128, n_layers=32, n_params=8e9
+ Mistral-7B-v0.1: theta=10000, T_train=8192, n_attention_heads=32, n_kv_heads=8, d_head=128, n_layers=32, n_params=7e9, has_SWA=true
+ Qwen2.5-7B: theta=1000000, T_train=32768, n_attention_heads=28, n_kv_heads=4, d_head=128, n_layers=28, n_params=7.6e9
+ Llama-3.3-70B-Instruct: theta=500000, T_train=131072, n_attention_heads=64, n_kv_heads=8, d_head=128, n_layers=80, n_params=70e9
+
+ Respond with ONLY the JSON object. No prose, no markdown fences, no explanation.`;
+
+   const reply = await engine.chat.completions.create({
+     messages: [
+       { role: "system", content: systemPrompt },
+       { role: "user", content: question },
+     ],
+     max_tokens: 400,
+     temperature: 0.0,
+     response_format: { type: "json_object" },
+   });
+   const raw = reply.choices[0].message.content.trim();
+   let parsed;
+   try {
+     parsed = JSON.parse(raw);
+   } catch (e) {
+     // Try extracting JSON from markdown fences
+     const m = raw.match(/\{[\s\S]*\}/);
+     if (!m) throw new Error(`LLM returned non-JSON: ${raw.slice(0, 200)}`);
+     parsed = JSON.parse(m[0]);
+   }
+   if (!parsed.recipe_id || !state.recipesById[parsed.recipe_id]) {
+     throw new Error(`Unknown recipe: ${parsed.recipe_id}`);
+   }
+   return parsed;
+ }
+
+ // ════════════════════════════════════════════════════════════════════
+ // Run + display + synthesize
+ // ════════════════════════════════════════════════════════════════════
+ async function runAndDisplay(recipeId, params, originalQuestion = null) {
+   setStatus("🧮 Computing TAF chain...");
+   state.pyodide.globals.set("__rid", recipeId);
+   state.pyodide.globals.set("__params", state.pyodide.toPy(params));
+   const resultJSON = state.pyodide.runPython(`
+ import json
+ result = run_recipe(__rid, **__params)
+ json.dumps(result)
+ `);
+   const result = JSON.parse(resultJSON);
+   result._original_question = originalQuestion;
+   renderResult(result);
+   $("output-section").style.display = "block";
+   setStatus("✅ Done. Numbers below.");
+   if (ENABLE_WEBLLM) {
+     await synthesizeAnswer(result);
+   }
+ }
+
+ function renderResult(r) {
+   if (r.error) {
+     $("verdict-box").className = "verdict-no";
+     $("verdict-box").innerHTML = `<strong>Error</strong>: ${escapeHtml(r.error)}`;
+     $("chain-box").innerHTML = "";
+     return;
+   }
+   const vBox = $("verdict-box");
+   let vClass = "";
+   if (r.verdict.startsWith("YES") || r.verdict === "GO") vClass = "verdict-yes";
+   else if (r.verdict.startsWith("NO")) vClass = "verdict-no";
+   else vClass = "verdict-degraded";
+   vBox.className = vClass;
+   vBox.innerHTML = `
+     <div style="display:flex; justify-content:space-between; align-items:center; margin-bottom:0.5rem;">
+       <div style="font-size:1.3rem; font-weight:700;">${escapeHtml(r.verdict)}</div>
+       <div class="recipe-tag">${r.recipe_id} — ${escapeHtml(r.recipe_name)}</div>
+     </div>
+     <div><strong>Reason:</strong> ${escapeHtml(r.reason)}</div>
+     ${r.mitigation && r.mitigation !== "None required." && r.mitigation !== "None — proceed with Chinchilla-optimal recipe."
+       ? `<div style="margin-top:0.5rem;"><strong>Action:</strong> ${escapeHtml(r.mitigation)}</div>`
+       : ""}
+   `;
+
+   const cBox = $("chain-box");
+   cBox.innerHTML = "";
+   r.chain.forEach(step => {
+     const div = document.createElement("details");
+     div.className = "chain-step";
+     div.innerHTML = `
+       <summary>
+         <span><strong>Step ${step.step}</strong> — ${escapeHtml(step.name)}</span>
+         <span class="step-section">${escapeHtml(step.section)}</span>
+       </summary>
+       <div class="step-formula">${escapeHtml(step.formula)}</div>
+       <div><strong>Inputs:</strong> ${escapeHtml(JSON.stringify(step.inputs))}</div>
+       <div class="step-result"><strong>Result:</strong> ${formatResult(step.result)}</div>
+       ${step.interpretation ? `<div class="step-interp">${escapeHtml(step.interpretation)}</div>` : ""}
+     `;
+     cBox.appendChild(div);
+   });
+ }
+
+ function formatResult(r) {
+   if (r === null || r === undefined) return "n/a (not applicable)";
+   if (typeof r === "number") return r.toLocaleString(undefined, { maximumFractionDigits: 4 });
+   if (typeof r === "object") return `<pre>${escapeHtml(JSON.stringify(r, null, 2))}</pre>`;
+   return String(r);
+ }
+
+ function escapeHtml(s) {
+   return String(s)
+     .replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
+     .replace(/"/g, "&quot;").replace(/'/g, "&#39;");
+ }
+
+ // ════════════════════════════════════════════════════════════════════
+ // WebLLM (synthesis + router)
+ // ════════════════════════════════════════════════════════════════════
+ async function loadWebLLM() {
+   if (state.webllm) return state.webllm;
+   setStatus("⏳ Loading WebLLM library + Llama-3.2-1B (~700MB first time, cached after)...");
+   const { CreateMLCEngine } = await import("https://esm.run/@mlc-ai/web-llm");
+   state.webllm = await CreateMLCEngine(WEBLLM_MODEL, {
+     initProgressCallback: (info) => setStatus(`⏳ ${info.text || "Loading model..."}`),
+   });
+   return state.webllm;
+ }
+
+ async function synthesizeAnswer(result) {
+   $("answer-header").style.display = "block";
+   $("answer-box").style.display = "block";
+   $("answer-box").innerHTML = '<em style="color:var(--fg-dim);">Generating plain-English summary...</em>';
+
+   let engine;
+   try {
+     engine = await loadWebLLM();
+   } catch (err) {
+     $("answer-box").innerHTML = `<em style="color:var(--warning);">⚠ WebLLM failed: ${escapeHtml(String(err))}<br>Numbers above are still correct.</em>`;
+     return;
+   }
+   const prompt = buildSynthesisPrompt(result);
+   let answer = "";
+   try {
+     const reply = await engine.chat.completions.create({
+       messages: [
+         { role: "system", content: "You are a precise transformer LLM diagnostic assistant. Summarise pre-computed TAF results in 4-6 sentences. Cite section numbers. Always recommend an action. Never invent numbers." },
+         { role: "user", content: prompt },
+       ],
+       max_tokens: 400,
+       temperature: 0.2,
+     });
+     answer = reply.choices[0].message.content;
+   } catch (err) {
+     $("answer-box").innerHTML = `<em style="color:var(--warning);">⚠ Synthesis failed: ${escapeHtml(String(err))}</em>`;
500
+ return;
501
+ }
502
+ $("answer-box").innerHTML = `
503
+ <div style="white-space:pre-wrap; line-height:1.7;">${escapeHtml(answer)}</div>
504
+ <div style="margin-top:0.75rem; font-size:0.85rem; color:var(--fg-dim);">
505
+ ↑ Synthesised by Llama-3.2-1B in your browser. Numbers are deterministic Python.
506
+ </div>
507
+ `;
508
+ setStatus("✅ Done.");
509
+ }
510
+
511
+ function buildSynthesisPrompt(r) {
512
+ const numbersBlock = r.chain.map(s =>
513
+ `Step ${s.step} (${s.section}) ${s.name}: ${formatResultPlain(s.result)} — ${s.interpretation || ""}`
514
+ ).join("\n");
515
+ return `Recipe: ${r.recipe_id} — ${r.recipe_name}
516
+ ${r._original_question ? `User question: "${r._original_question}"\n` : ""}
517
+ Computed chain:
518
+ ${numbersBlock}
519
+
520
+ Verdict: ${r.verdict}
521
+ Reason: ${r.reason}
522
+ Action: ${r.mitigation}
523
+
524
+ Summarize for a non-technical user in 4-6 sentences. Cite section numbers (§X.Y). Mention the verdict and the most important action.`;
525
+ }
526
+
527
+ function formatResultPlain(r) {
528
+ if (r === null || r === undefined) return "n/a";
529
+ if (typeof r === "number") return r.toLocaleString(undefined, { maximumFractionDigits: 4 });
530
+ if (typeof r === "object") return JSON.stringify(r);
531
+ return String(r);
532
+ }
533
+
534
+ // ════════════════════════════════════════════════════════════════════
535
+ // Bootstrap
536
+ // ════════════════════════════════════════════════════════════════════
537
+ loadPyodideAndTaf().catch(err => {
538
+ setStatus(`❌ Failed to initialise: ${err.message || err}`);
539
+ console.error(err);
540
+ });
python/taf_browser.py ADDED
@@ -0,0 +1,793 @@
1
+ """
2
+ TAF Browser — Pyodide-compatible TAF formulas + recipes.
3
+
4
+ Pure-Python deterministic computations of TAF (Thermodynamic Attention Framework)
5
+ formulas, plus 5 cross-section recipes for the most common viability questions.
6
+
7
+ Author: Carles Marin <transformerkmarin@gmail.com>
8
+ License: Apache-2.0
9
+ """
10
+ from __future__ import annotations
11
+ import math
12
+ import json
13
+
14
+
15
+ # ════════════════════════════════════════════════════════════════════════════
16
+ # §26 — γ-Thermodynamics (OUR contribution)
17
+ # ════════════════════════════════════════════════════════════════════════════
18
+ def gamma_pade(theta: float, T_eval: int) -> float:
19
+ """§26.1 — γ = (2θ - T√2)/(2θ + T√2)"""
20
+ z_sqrt2 = T_eval * math.sqrt(2)
21
+ return (2 * theta - z_sqrt2) / (2 * theta + z_sqrt2)
22
+
23
+
24
+ def gamma_decompose(gamma_pade_val, has_GQA=False, has_SWA=False, n_params=0.0) -> dict:
25
+ """§26.10 — 5-axis decomposition (n=23 OLS, paper session 28)."""
26
+ delta_GQA = +0.11 if has_GQA else 0.0
27
+ delta_SWA = -0.21 if has_SWA else 0.0
28
+ delta_post_IH = -0.15 if n_params >= 4e8 else 0.0
29
+ return {
30
+ "pade_centroid": gamma_pade_val,
31
+ "delta_GQA": delta_GQA,
32
+ "delta_SWA": delta_SWA,
33
+ "delta_post_IH": delta_post_IH,
34
+ "gamma_corrected": gamma_pade_val + delta_GQA + delta_SWA + delta_post_IH,
35
+ }
36
+
37
+
38
+ def d_horizon(theta: float, gamma: float):
39
+ """§26.2 — d_h = θ(1-γ)√2/(1+γ). None if γ outside (0,1)."""
40
+ if gamma <= 0 or gamma >= 1:
41
+ return None
42
+ return theta * (1 - gamma) * math.sqrt(2) / (1 + gamma)
43
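A quick sanity check on the two formulas above (a hedged sketch with illustrative values, not numbers from the paper): substituting the uncorrected Padé γ of §26.1 into §26.2 collapses the horizon back to exactly T_eval, so it is the §26.10 corrections that actually move d_horizon.

```python
import math

# Hypothetical model: RoPE base theta = 10,000 evaluated at T_eval = 8,192
theta, T_eval = 10_000.0, 8_192

z = T_eval * math.sqrt(2)
gamma = (2 * theta - z) / (2 * theta + z)                # §26.1 gamma_Pade
d_h = theta * (1 - gamma) * math.sqrt(2) / (1 + gamma)   # §26.2 d_horizon

# With the uncorrected Pade gamma, d_h lands exactly at T_eval
print(f"gamma={gamma:.4f}  d_horizon={d_h:.1f}")
```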
+
44
+
45
+ def l_niah_c(d_horizon_val):
46
+ """§26.5 — L_NIAH^c = 2·d_horizon."""
47
+ return None if d_horizon_val is None else 2 * d_horizon_val
48
+
49
+
50
+ def chi_susceptibility(gamma: float) -> float:
51
+ """§26.16 — χ = 1/|γ-1|."""
52
+ return float('inf') if gamma == 1.0 else 1.0 / abs(gamma - 1.0)
53
+
54
+
55
+ def p_hallucinate(L: int, theta: float, gamma: float):
56
+ """§26.9 — Horizon-overshoot probability."""
57
+ dh = d_horizon(theta, gamma)
58
+ if dh is None or L <= 0:
59
+ return None
60
+ chi = chi_susceptibility(gamma)
61
+ if chi == float('inf'):
62
+ return None
63
+ geom = max(0.0, 1.0 - (dh / L) ** (1 - gamma))
64
+ return geom * (math.sqrt(chi) / (1 + math.sqrt(chi)))
65
+
66
+
67
+ def theta_design(gamma_target: float, T_eval: int) -> float:
68
+ """§26.3 — θ to land at γ_target at T_eval (Padé inverse)."""
69
+ if gamma_target >= 1 or gamma_target <= -1:
70
+ raise ValueError("gamma_target must be in (-1, 1)")
71
+ return T_eval * math.sqrt(2) * (1 + gamma_target) / (2 * (1 - gamma_target))
72
+
73
+
74
+ def alpha_opt(gamma_target: float, T_eval: int, theta_nominal: float) -> float:
75
+ """§26.4 — α = θ_design / θ_nominal."""
76
+ return theta_design(gamma_target, T_eval) / theta_nominal
77
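A worked extension target for §26.3/§26.4 (illustrative numbers, not from the paper): pushing a θ=10,000 model to a 131,072-token horizon at γ=0.85 requires a large θ scaling, so the α > 8 fine-tuning rule of thumb used later in X-2 kicks in.

```python
import math

def theta_design(gamma_target, T_eval):
    # §26.3 Pade inverse (mirrors the module function above)
    return T_eval * math.sqrt(2) * (1 + gamma_target) / (2 * (1 - gamma_target))

theta_nominal, T_target = 10_000.0, 131_072   # hypothetical extension target
theta_needed = theta_design(0.85, T_target)
alpha = theta_needed / theta_nominal          # §26.4 alpha_opt
print(f"theta_needed≈{theta_needed:,.0f}  alpha≈{alpha:.1f}")
# alpha is far above 8, so zero-shot scaling is unlikely to suffice
```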
+
78
+
79
+ def df_window(gamma: float, N: int, f: float = 0.90):
80
+ """§26.7 — KV compression window. None outside [0.65, 0.85] zone."""
81
+ if not (0.65 <= gamma <= 0.85):
82
+ return None
83
+ inner = (1 - f) + f * N ** (1 - gamma)
86
+ return int(math.ceil(inner ** (1 / (1 - gamma))))
87
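Inside the validity zone the window is a real compression: for illustrative values γ=0.75 over N=8,192 cached positions with f=90% of the attention mass retained, D_f comes out well below N.

```python
import math

# Hedged example values: gamma inside [0.65, 0.85], 8K cache, f = 0.90
gamma, N, f = 0.75, 8_192, 0.90
inner = (1 - f) + f * N ** (1 - gamma)
D_f = math.ceil(inner ** (1 / (1 - gamma)))   # §26.7 window size
print(D_f, f"({D_f / N:.0%} of full cache)")
```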
+
88
+
89
+ def kv_soft_decay_regime(theta: float, gamma: float, T_train: int) -> str:
90
+ """§26.8 — Soft decay regime bound: applies when d_h ≳ T_train/2."""
91
+ dh = d_horizon(theta, gamma)
92
+ if dh is None:
93
+ return "use-hard-cutoff"
94
+ ratio = dh / max(1, T_train / 2)
95
+ if ratio >= 1.2:
96
+ return "applies"
97
+ if ratio >= 0.8:
98
+ return "borderline"
99
+ return "use-hard-cutoff"
100
+
101
+
102
+ # ════════════════════════════════════════════════════════════════════════════
103
+ # §17 — Pre-training viability formulas
104
+ # ════════════════════════════════════════════════════════════════════════════
105
+ def chinchilla_optimal_tokens(N_params: float, ratio: float = 20.0) -> float:
106
+ """§17.30 — Chinchilla 20:1 token budget. D = ratio · N."""
107
+ return ratio * N_params
108
+
109
+
110
+ def chinchilla_optimal_N(D_tokens: float, ratio: float = 20.0) -> float:
111
+ """§17.30 inverse — given D tokens, optimal N = D/20."""
112
+ return D_tokens / ratio
113
+
114
+
115
+ def training_flops(N_params: float, D_tokens: float) -> float:
116
+ """§17.10 — C ≈ 6·N·D total training FLOPs."""
117
+ return 6 * N_params * D_tokens
118
+
119
+
120
+ def training_memory_16N(N_params: float) -> dict:
121
+ """§17.20 — total memory ≈ 16·N bytes (model + grads + Adam moments)."""
122
+ bytes_total = 16 * N_params
123
+ return {
124
+ "bytes": bytes_total,
125
+ "GB": bytes_total / 1e9,
126
+ }
127
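The three §17 formulas above chain together; a sketch for a hypothetical 1B-parameter run:

```python
# Chinchilla sizing for an illustrative 1B-parameter model (§17.30, §17.10, §17.20)
N = 1e9
D = 20 * N              # §17.30 token budget
C = 6 * N * D           # §17.10 training FLOPs
mem_GB = 16 * N / 1e9   # §17.20 weights + grads + Adam moments
print(f"D={D:.1e} tokens  C={C:.1e} FLOPs  mem={mem_GB:.0f} GB")
```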
+
128
+
129
+ def emergent_threshold(N_params: float) -> str:
130
+ """§17.60 — capability threshold heuristic (Wei 2022)."""
131
+ if N_params >= 1e11:
132
+ return "above 100B — strong reasoning capabilities expected"
133
+ if N_params >= 1e10:
134
+ return "above 10B — most emergent capabilities present"
135
+ if N_params >= 1e9:
136
+ return "above 1B — basic instruction-following, not strong reasoning"
137
+ if N_params >= 1e8:
138
+ return "above 100M — useful for narrow tasks, no emergence"
139
+ return "below 100M — domain-specific tasks only"
140
+
141
+
142
+ # ════════════════════════════════════════════════════════════════════════════
143
+ # §19 — Inference economics
144
+ # ════════════════════════════════════════════════════════════════════════════
145
+ def kv_cache_memory(n_layers, n_kv_heads, d_head, seq_len, bytes_per_element=2.0) -> dict:
146
+ """§19.1 — bytes = 2·L·n_kv·d_h·seq·B."""
147
+ bytes_total = 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_element
148
+ return {"bytes": bytes_total, "MB": bytes_total / 1e6, "GB": bytes_total / 1e9}
149
+
150
+
151
+ def model_weights_memory(N_params, bytes_per_element=2.0) -> dict:
152
+ """Inference memory for model weights only (BF16=2, INT8=1, INT4=0.5)."""
153
+ return {"GB": N_params * bytes_per_element / 1e9}
154
+
155
+
156
+ def inference_decode_throughput(N_params, hbm_GB_per_s, bytes_per_element=2.0) -> float:
157
+ """§19.7 — memory-bound decode: tokens/sec = HBM_BW / model_size."""
158
+ model_GB = N_params * bytes_per_element / 1e9
159
+ return hbm_GB_per_s / model_GB
160
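Putting §19.1 and §19.7 together for a hypothetical 8B BF16 model with n_layers=32, n_kv_heads=8, d_head=128 at an 8K context, using the H100 SXM bandwidth figure from the catalog below (all values illustrative):

```python
# Memory-bound serving sketch (§19.1 KV cache, §19.7 decode throughput)
n_layers, n_kv, d_head, seq, B = 32, 8, 128, 8_192, 2.0
kv_bytes = 2 * n_layers * n_kv * d_head * seq * B   # §19.1
model_GB = 8e9 * 2 / 1e9                            # 8B params in BF16
tok_per_s = 3350 / model_GB                         # §19.7 at 3350 GB/s HBM
print(f"KV={kv_bytes/1e9:.2f} GB/request  decode≈{tok_per_s:.0f} tok/s")
```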
+
161
+
162
+ # ════════════════════════════════════════════════════════════════════════════
163
+ # §20 — Hardware catalog (curated from vendor docs 2026)
164
+ # ════════════════════════════════════════════════════════════════════════════
165
+ GPU_CATALOG = {
166
+ # name: {bf16_TFLOPs, hbm_GB, hbm_GB_s, cloud_USD_per_h_spot, tdp_W}
167
+ "H100 SXM": {"flops": 989, "vram_GB": 80, "bw_GB_s": 3350, "usd_h": 2.5, "tdp": 700},
168
+ "H100 PCIe": {"flops": 756, "vram_GB": 80, "bw_GB_s": 2000, "usd_h": 2.0, "tdp": 350},
169
+ "H200": {"flops": 989, "vram_GB": 141, "bw_GB_s": 4800, "usd_h": 3.5, "tdp": 700},
170
+ "B200": {"flops": 2250, "vram_GB": 192, "bw_GB_s": 8000, "usd_h": 5.0, "tdp": 1000},
171
+ "A100 80GB": {"flops": 312, "vram_GB": 80, "bw_GB_s": 2000, "usd_h": 1.2, "tdp": 400},
172
+ "A100 40GB": {"flops": 312, "vram_GB": 40, "bw_GB_s": 1555, "usd_h": 1.0, "tdp": 400},
173
+ "L40S": {"flops": 362, "vram_GB": 48, "bw_GB_s": 864, "usd_h": 0.7, "tdp": 350},
174
+ "MI300X": {"flops": 1307, "vram_GB": 192, "bw_GB_s": 5300, "usd_h": 2.1, "tdp": 750},
175
+ "RTX 4090": {"flops": 165, "vram_GB": 24, "bw_GB_s": 1008, "usd_h": 0.4, "tdp": 450},
176
+ "RTX 5090": {"flops": 419, "vram_GB": 32, "bw_GB_s": 1792, "usd_h": 0.7, "tdp": 575},
177
+ "RTX 5060Ti":{"flops": 36, "vram_GB": 16, "bw_GB_s": 448, "usd_h": 0.0, "tdp": 180}, # local
178
+ }
179
+
180
+
181
+ def cost_per_training_run(N_params: float, D_tokens: float, gpu: str = "H100 SXM",
182
+ n_gpus: int = 8, mfu: float = 0.45) -> dict:
183
+ """§20.11 — cost = (flops_total / (peak·MFU·n_gpus)) · USD/h."""
184
+ info = GPU_CATALOG.get(gpu)
185
+ if info is None:
186
+ return {"error": f"unknown gpu '{gpu}'", "available": list(GPU_CATALOG.keys())}
187
+ total_flops = training_flops(N_params, D_tokens) # absolute FLOPs
188
+ effective_flops_per_sec = info["flops"] * 1e12 * mfu * n_gpus
189
+ seconds = total_flops / effective_flops_per_sec
190
+ hours = seconds / 3600
191
+ usd = hours * info["usd_h"] * n_gpus
192
+ return {
193
+ "total_FLOPs": total_flops,
194
+ "hours": hours,
195
+ "days": hours / 24,
196
+ "USD": usd,
197
+ "gpu": gpu, "n_gpus": n_gpus, "mfu": mfu,
198
+ }
199
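A rough pass through the same arithmetic for the illustrative 1B Chinchilla run on 8× H100 SXM at 45% MFU and the catalog's $2.50/GPU-hour spot price:

```python
# §17.10 + §20.11 back-of-envelope (all inputs illustrative)
total_flops = 6 * 1e9 * 2e10        # C = 6·N·D for N=1e9, D=2e10
eff = 989e12 * 0.45 * 8             # effective cluster FLOP/s
hours = total_flops / eff / 3600
usd = hours * 2.5 * 8               # hours · $/h · n_gpus
print(f"{hours:.1f} h  ≈ ${usd:,.0f}")
```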
+
200
+
201
+ def cost_per_inference_token(model_GB: float, gpu: str, batch: int = 1) -> dict:
202
+ """§19.9 / §20.12 — derived $/Mtok from memory-bound decode."""
203
+ info = GPU_CATALOG.get(gpu)
204
+ if info is None:
205
+ return {"error": f"unknown gpu '{gpu}'"}
206
+ tok_per_sec = info["bw_GB_s"] / model_GB * batch
207
+ sec_per_Mtok = 1e6 / tok_per_sec
208
+ h_per_Mtok = sec_per_Mtok / 3600
209
+ usd_per_Mtok = h_per_Mtok * info["usd_h"]
210
+ return {
211
+ "tok_per_sec": tok_per_sec,
212
+ "USD_per_Mtok": usd_per_Mtok,
213
+ "gpu": gpu, "batch": batch,
214
+ }
215
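And the serving side of the same sketch, a memory-bound $/Mtok estimate for the 16 GB (8B BF16) model on H100 SXM at batch 1, using catalog numbers:

```python
# §19.9 / §20.12 derived $/Mtok (illustrative inputs)
model_GB, bw, usd_h = 16.0, 3350.0, 2.5
tok_per_sec = bw / model_GB
usd_per_Mtok = (1e6 / tok_per_sec) / 3600 * usd_h
print(f"{tok_per_sec:.0f} tok/s  ≈ ${usd_per_Mtok:.2f}/Mtok")
```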
+
216
+
217
+ # ════════════════════════════════════════════════════════════════════════════
218
+ # §24 — Cost / ROI
219
+ # ════════════════════════════════════════════════════════════════════════════
220
+ API_PRICING = {
221
+ # USD per million tokens (input/output blended typical)
222
+ "GPT-4o": {"input": 2.5, "output": 10.0},
223
+ "GPT-4o-mini": {"input": 0.15, "output": 0.60},
224
+ "Claude-Opus-4": {"input": 15.0, "output": 75.0},
225
+ "Claude-Sonnet-4":{"input": 3.0, "output": 15.0},
226
+ "Claude-Haiku-4": {"input": 0.80, "output": 4.0},
227
+ "Gemini-1.5-Pro": {"input": 1.25, "output": 5.0},
228
+ "DeepSeek-V3": {"input": 0.27, "output": 1.10},
229
+ "Llama-3.3-70B (Together)": {"input": 0.88, "output": 0.88},
230
+ }
231
+
232
+
233
+ def break_even_volume(training_cost: float, self_inference_per_Mtok: float,
234
+ api_per_Mtok: float, blend_input_output: float = 0.5) -> dict:
235
+ """§24.3 — monthly tokens at which custom training pays off."""
236
+ savings_per_Mtok = api_per_Mtok - self_inference_per_Mtok
237
+ if savings_per_Mtok <= 0:
238
+ return {"error": "self-host more expensive than API per token; never breaks even"}
239
+ Mtok_breakeven = training_cost / savings_per_Mtok
240
+ return {
241
+ "savings_per_Mtok": savings_per_Mtok,
242
+ "Mtok_breakeven": Mtok_breakeven,
243
+ "tokens_breakeven": Mtok_breakeven * 1e6,
244
+ }
245
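A break-even sketch with illustrative figures consistent with the examples above: a $187k training run, ~$3.30/Mtok self-hosted, against a $6.25/Mtok blended API price.

```python
# §24.3 break-even volume (all numbers illustrative)
training_usd, self_mtok, api_mtok = 187_000.0, 3.30, 6.25
savings = api_mtok - self_mtok
breakeven_Mtok = training_usd / savings
print(f"break-even at {breakeven_Mtok:,.0f} Mtok")
# At 10,000 Mtok/month this pays off in ~6.3 months; at 100 Mtok/month
# it would take decades — exactly the trade-off X-1's verdict ladder encodes.
```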
+
246
+
247
+ # ════════════════════════════════════════════════════════════════════════════
248
+ # RECIPES
249
+ # ════════════════════════════════════════════════════════════════════════════
250
+
251
+ # ─────────────────────────────────────────────────────────────────────
252
+ # X-2 — Long Context Viability
253
+ # ─────────────────────────────────────────────────────────────────────
254
+ def run_recipe_x2(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
255
+ d_head, n_layers, n_params, has_SWA=False,
256
+ bytes_per_element=2.0, **_unused):
257
+ """X-2: will model M serve length L doing NIAH retrieval?"""
258
+ chain = []
259
+ g_pade = gamma_pade(theta, T_eval)
260
+ chain.append(_step(1, "§26.1", "γ_Padé", "γ = (2θ - T√2)/(2θ + T√2)",
261
+ {"theta": theta, "T_eval": T_eval}, g_pade,
262
+ _phase_label(g_pade)))
263
+
264
+ has_GQA = (n_kv_heads < n_attention_heads)
265
+ decomp = gamma_decompose(g_pade, has_GQA=has_GQA, has_SWA=has_SWA, n_params=n_params)
266
+ g_corr = decomp["gamma_corrected"]
267
+ chain.append(_step(2, "§26.10", "γ-decomposition", "γ + δ_GQA + δ_SWA + δ_post_IH",
268
+ {"has_GQA": has_GQA, "has_SWA": has_SWA, "n_params": n_params},
269
+ g_corr, breakdown=decomp))
270
+
271
+ dh = d_horizon(theta, g_corr)
272
+ chain.append(_step(3, "§26.2", "d_horizon", "d_h = θ(1-γ)√2/(1+γ)",
273
+ {"theta": theta, "gamma": g_corr}, dh,
274
+ "n/a — γ outside (0,1)" if dh is None else f"horizon at d={dh:.0f}"))
275
+
276
+ l_niah = l_niah_c(dh)
277
+ chain.append(_step(4, "§26.5", "L_NIAH^c", "L_NIAH^c = 2·d_horizon",
278
+ {"d_horizon": dh}, l_niah,
279
+ "n/a" if l_niah is None else f"NIAH 50% at L={l_niah:.0f}"))
280
+
281
+ p_hallu = p_hallucinate(T_eval, theta, g_corr)
282
+ chain.append(_step(5, "§26.9", "P_hallucinate", "max(0,1-(d_h/L)^(1-γ))·√χ/(1+√χ)",
283
+ {"L": T_eval, "theta": theta, "gamma": g_corr}, p_hallu,
284
+ "n/a (Phase B)" if p_hallu is None else f"{p_hallu*100:.1f}% predicted"))
285
+
286
+ kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval, bytes_per_element)
287
+ chain.append(_step(6, "§19.1", "KV cache memory", "2·L·n_kv·d_h·seq·B",
288
+ {"n_layers": n_layers, "n_kv_heads": n_kv_heads, "d_head": d_head,
289
+ "seq_len": T_eval, "bytes_per_element": bytes_per_element},
290
+ kv, f"{kv['GB']:.2f} GB per request"))
291
+
292
+ if g_corr <= 0 or g_corr >= 1:
293
+ verdict, reason = "NO", "Phase B / geometric collapse (γ_corrected outside (0,1))"
294
+ mit = (f"Apply NTK-aware extension. Required θ for γ=0.85: "
295
+ f"{theta_design(0.85, T_eval):,.0f}. α_opt = {alpha_opt(0.85, T_eval, theta):.2f} "
296
+ f"({'fine-tuning required' if alpha_opt(0.85, T_eval, theta) > 8 else 'zero-shot may work'}).")
297
+ elif dh is not None and T_eval < dh:
298
+ margin = (1 - T_eval / dh) * 100
299
+ verdict, reason = "YES", f"L={T_eval} inside d_horizon={dh:.0f} ({margin:.0f}% margin)."
300
+ mit = "None required."
301
+ elif dh is not None and T_eval < l_niah:
302
+ verdict, reason = "DEGRADED", f"L between d_horizon ({dh:.0f}) and L_NIAH^c ({l_niah:.0f})."
303
+ mit = "Consider context contraction OR NTK extension."
304
+ else:
305
+ verdict, reason = "NO", f"L={T_eval} exceeds NIAH ceiling {l_niah:.0f}."
306
+ mit = f"Apply NTK extension; need θ ≈ {theta_design(0.85, T_eval):,.0f} for γ=0.85."
307
+
308
+ return _wrap("X-2", "Long Context Viability", locals(), chain, verdict, reason, mit)
309
+
310
+
311
+ # ─────────────────────────────────────────────────────────────────────
312
+ # X-1 — Custom training vs API for a domain task
313
+ # ─────────────────────────────────────────────────────────────────────
314
+ def run_recipe_x1(N_params, D_tokens=None, gpu="H100 SXM", n_gpus=8, mfu=0.45,
315
+ api_model="GPT-4o", monthly_tokens_M=10.0, **_unused):
316
+ """X-1: custom training (Chinchilla optimal) vs API."""
317
+ chain = []
318
+
319
+ # Step 1: Chinchilla optimal D
320
+ if D_tokens is None:
321
+ D_tokens = chinchilla_optimal_tokens(N_params)
322
+ chain.append(_step(1, "§17.30", "Chinchilla optimal D", "D = 20·N",
323
+ {"N_params": N_params}, D_tokens,
324
+ f"recommended D = {D_tokens:.2e} tokens"))
325
+
326
+ # Step 2: training FLOPs
327
+ flops = training_flops(N_params, D_tokens)
328
+ chain.append(_step(2, "§17.10", "Training FLOPs", "C = 6·N·D",
329
+ {"N": N_params, "D": D_tokens}, flops,
330
+ f"{flops:.2e} FLOPs total"))
331
+
332
+ # Step 3: training cost
333
+ cost = cost_per_training_run(N_params, D_tokens, gpu=gpu, n_gpus=n_gpus, mfu=mfu)
334
+ chain.append(_step(3, "§20.11", "Training cost",
335
+ "hours·USD/h·n_gpus = total $",
336
+ {"gpu": gpu, "n_gpus": n_gpus, "mfu": mfu}, cost,
337
+ f"${cost['USD']:,.0f} over {cost['days']:.1f} days"))
338
+
339
+ # Step 4: model_GB and decode throughput
340
+ model_GB = N_params * 2 / 1e9 # BF16
341
+ inf = cost_per_inference_token(model_GB, gpu, batch=1)
342
+ chain.append(_step(4, "§19.9 / §20.12", "Self-inference $/Mtok",
343
+ "BW / model_GB → tok/s → $/Mtok",
344
+ {"model_GB": model_GB, "gpu": gpu}, inf,
345
+ f"${inf['USD_per_Mtok']:.2f} per million tokens (single user)"))
346
+
347
+ # Step 5: API blended price
348
+ api = API_PRICING.get(api_model, {"input": 2.0, "output": 8.0})
349
+ api_blend = (api["input"] + api["output"]) / 2
350
+ chain.append(_step(5, "§24.X", f"{api_model} blended price",
351
+ "(input + output) / 2 USD/Mtok",
352
+ {"api_model": api_model}, api_blend,
353
+ f"${api_blend:.2f}/Mtok blended"))
354
+
355
+ # Step 6: break-even
356
+ be = break_even_volume(cost["USD"], inf["USD_per_Mtok"], api_blend)
357
+ chain.append(_step(6, "§24.3", "Break-even tokens", "training$ / (api - self) = Mtok",
358
+ {"training_cost": cost["USD"]}, be,
359
+ _be_interp(be, monthly_tokens_M)))
360
+
361
+ # Verdict
362
+ if "error" in be:
363
+ verdict, reason = "NO", be["error"]
364
+ mit = f"Stick with {api_model} API."
365
+ elif monthly_tokens_M >= be["Mtok_breakeven"]:
366
+ verdict = "YES (custom)"
367
+ months_to_payoff = be["Mtok_breakeven"] / monthly_tokens_M
368
+ reason = (f"At {monthly_tokens_M} M tokens/month, break-even in "
369
+ f"{months_to_payoff:.1f} months. Long-term custom is cheaper.")
370
+ mit = f"Train at {gpu}×{n_gpus}; serve self-hosted."
371
+ else:
372
+ months = be["Mtok_breakeven"] / monthly_tokens_M
373
+ verdict = "NO (API)"
374
+ reason = (f"At {monthly_tokens_M} M tokens/month, break-even in "
375
+ f"{months:.1f} months — too slow.")
376
+ mit = f"Use {api_model} API (cheaper for your volume)."
377
+
378
+ return _wrap("X-1", "Custom training vs API", locals(), chain, verdict, reason, mit)
379
+
380
+
381
+ def _be_interp(be, monthly):
382
+ if "error" in be:
383
+ return be["error"]
384
+ months = be["Mtok_breakeven"] / max(monthly, 0.001)
385
+ return f"break-even at {be['Mtok_breakeven']:.0f} Mtok ({months:.1f} months at {monthly} M/mo)"
386
+
387
+
388
+ # ─────────────────────────────────────────────────────────────────────
389
+ # X-3 — Pre-flight check on $5K training budget
390
+ # ─────────────────────────────────────────────────────────────────────
391
+ def run_recipe_x3(USD_budget=5000.0, gpu="H100 SXM", mfu=0.45, n_gpus=1, **_unused):
392
+ """X-3: given $ budget, what model can I train?"""
393
+ chain = []
394
+ info = GPU_CATALOG.get(gpu)
+ if info is None:
+ return {"error": f"unknown gpu '{gpu}'", "available": list(GPU_CATALOG.keys())}
395
+
396
+ # Step 1: GPU-hours we can afford
397
+ hours = USD_budget / (info["usd_h"] * n_gpus)
398
+ chain.append(_step(1, "§20.11", "Affordable GPU-hours", "USD / ($/h·n_gpus)",
399
+ {"USD": USD_budget, "gpu": gpu, "n_gpus": n_gpus}, hours,
400
+ f"{hours:.0f} GPU-hours total ({hours/24:.1f} days at full use)"))
401
+
402
+ # Step 2: max FLOPs
403
+ max_flops = info["flops"] * 1e12 * mfu * n_gpus * hours * 3600
404
+ chain.append(_step(2, "§17.10", "Max training FLOPs",
405
+ "peak·MFU·n_gpus·seconds",
406
+ {"peak_TFLOPs": info["flops"], "MFU": mfu}, max_flops,
407
+ f"{max_flops:.2e} effective FLOPs"))
408
+
409
+ # Step 3: Chinchilla-optimal N (with D=20N)
410
+ # 6·N·D = max_flops, D=20N → 120·N² = max_flops → N = sqrt(max_flops/120)
411
+ N_chinchilla = math.sqrt(max_flops / 120)
412
+ D_chinchilla = 20 * N_chinchilla
413
+ chain.append(_step(3, "§17.30", "Chinchilla-optimal N",
414
+ "N = √(C/120) at D=20N", {"max_FLOPs": max_flops},
415
+ N_chinchilla,
416
+ f"N ≈ {N_chinchilla:.2e} params with D = {D_chinchilla:.2e} tokens"))
417
+
418
+ # Step 4: emergence check
419
+ emerg = emergent_threshold(N_chinchilla)
420
+ chain.append(_step(4, "§17.60", "Emergence threshold", "Wei 2022 capability",
421
+ {"N": N_chinchilla}, emerg, emerg))
422
+
423
+ # Step 5: memory budget check
424
+ mem = training_memory_16N(N_chinchilla)
425
+ fits = mem["GB"] <= info["vram_GB"]
426
+ chain.append(_step(5, "§17.20", "16N training memory",
427
+ "model + grads + AdamW",
428
+ {"N": N_chinchilla}, mem,
429
+ f"{mem['GB']:.1f} GB needed; "
430
+ f"{'fits in ' if fits else 'EXCEEDS '}{info['vram_GB']} GB VRAM"))
431
+
432
+ # Verdict
433
+ if N_chinchilla < 1e8:
434
+ verdict, reason = "TINY-MODEL", f"Budget supports only ~{N_chinchilla:.0e} params"
435
+ mit = "Use LoRA fine-tuning of larger pretrained model instead."
436
+ elif not fits:
437
+ verdict, reason = "MEMORY-LIMITED", f"Chinchilla N ({N_chinchilla:.1e}) doesn't fit one {gpu}"
438
+ mit = f"Use ZeRO-3 across multiple GPUs (need ≥{math.ceil(mem['GB']/info['vram_GB'])}× {gpu}) OR train smaller N undertrained."
439
+ else:
440
+ verdict = "GO"
441
+ reason = (f"At ${USD_budget}, train {N_chinchilla:.1e}-param model on "
442
+ f"{D_chinchilla:.1e} tokens in ~{hours/24:.1f} days. "
443
+ f"Capability tier: {emerg.split('—')[0].strip()}.")
444
+ mit = "None — proceed with Chinchilla-optimal recipe."
445
+
446
+ return _wrap("X-3", "Budget pre-flight", locals(), chain, verdict, reason, mit)
447
+
448
+
449
+ # ─────────────────────────────────────────────────────────────────────
450
+ # X-5 — Hardware selection for serving
451
+ # ─────────────────────────────────────────────────────────────────────
452
+ def run_recipe_x5(N_params, T_eval=4096, n_layers=32, n_kv_heads=8, d_head=128,
453
+ bytes_per_weight=2.0, target_tokens_per_day=10_000_000.0,
454
+ concurrent_users=1, **_unused):
455
+ """X-5: which GPU should I use to serve N-param model at L context?"""
456
+ chain = []
457
+
458
+ # Step 1: weights memory
459
+ w_mem = model_weights_memory(N_params, bytes_per_weight)
460
+ chain.append(_step(1, "§19.X", "Model weights memory",
461
+ "N · bytes_per_weight",
462
+ {"N": N_params, "bytes": bytes_per_weight}, w_mem,
463
+ f"{w_mem['GB']:.1f} GB for weights"))
464
+
465
+ # Step 2: KV cache per request
466
+ kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval, bytes_per_weight)
467
+ chain.append(_step(2, "§19.1", "KV cache (per request)",
468
+ "2·L·n_kv·d_h·seq·B",
469
+ {"n_layers": n_layers, "n_kv": n_kv_heads,
470
+ "d_head": d_head, "seq": T_eval}, kv,
471
+ f"{kv['GB']:.2f} GB per concurrent request"))
472
+
473
+ # Step 3: total memory needed
474
+ total_GB = w_mem["GB"] + kv["GB"] * concurrent_users
475
+ chain.append(_step(3, "§20.3", "Total GPU memory",
476
+ "weights + KV·n_concurrent", {}, {"GB": total_GB},
477
+ f"{total_GB:.1f} GB for {concurrent_users} concurrent users"))
478
+
479
+ # Step 4: scan GPU catalog
480
+ candidates = []
481
+ for name, info in GPU_CATALOG.items():
482
+ if info["vram_GB"] < total_GB:
483
+ continue
484
+ # Decode throughput estimate (memory-bound)
485
+ tok_per_s = info["bw_GB_s"] / w_mem["GB"]
486
+ tok_per_day = tok_per_s * 86400
487
+ capacity_users = tok_per_day / target_tokens_per_day
488
+ usd_per_day = info["usd_h"] * 24
489
+ usd_per_Mtok = (usd_per_day / (tok_per_day / 1e6)) if tok_per_day > 0 else float('inf')
490
+ candidates.append({
491
+ "gpu": name, "vram_GB": info["vram_GB"], "bw_GB_s": info["bw_GB_s"],
492
+ "tok_per_sec": tok_per_s, "tok_per_day": tok_per_day,
493
+ "USD_per_day": usd_per_day, "USD_per_Mtok": usd_per_Mtok,
494
+ "users_supported": capacity_users,
495
+ })
496
+ candidates.sort(key=lambda c: c["USD_per_Mtok"])
497
+ chain.append(_step(4, "§20", f"Eligible GPUs (≥{total_GB:.0f}GB)",
498
+ "filter + rank by $/Mtok",
499
+ {"min_VRAM": total_GB}, candidates[:5],
500
+ f"{len(candidates)} GPUs fit; cheapest: {candidates[0]['gpu'] if candidates else 'NONE'}"))
501
+
502
+ # Verdict
503
+ if not candidates:
504
+ verdict, reason = "NO", f"No single GPU has ≥{total_GB:.0f} GB VRAM."
505
+ mit = (f"Use tensor parallelism across multiple GPUs "
506
+ f"(e.g. 2× H100 = 160GB), or quantize to INT8 (halves memory).")
507
+ else:
508
+ best = candidates[0]
509
+ verdict = "YES"
510
+ reason = (f"Best GPU: {best['gpu']} at ${best['USD_per_Mtok']:.2f}/Mtok. "
511
+ f"Supports {best['users_supported']:.1f}× your daily target.")
512
+ mit = f"Provision {best['gpu']}, expected {best['tok_per_sec']:.0f} tok/s decode."
513
+
514
+ return _wrap("X-5", "Hardware selection for serving", locals(), chain, verdict, reason, mit)
515
+
516
+
517
+ # ─────────────────────────────────────────────────────────────────────
518
+ # X-19 — KV compression decision (ours vs literature)
519
+ # ─────────────────────────────────────────────────────────────────────
520
+ def run_recipe_x19(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
521
+ d_head, n_layers, n_params, has_SWA=False, **_unused):
522
+ """X-19: should I use γ-soft KV decay, hard D_f, or literature methods?"""
523
+ chain = []
524
+
525
+ # Step 1: γ_Padé
526
+ g_pade = gamma_pade(theta, T_eval)
527
+ chain.append(_step(1, "§26.1", "γ_Padé", "(2θ-T√2)/(2θ+T√2)",
528
+ {"theta": theta, "T_eval": T_eval}, g_pade, _phase_label(g_pade)))
529
+
530
+ # Step 2: γ-decomposition
531
+ has_GQA = n_kv_heads < n_attention_heads
532
+ decomp = gamma_decompose(g_pade, has_GQA, has_SWA, n_params)
533
+ g_corr = decomp["gamma_corrected"]
534
+ chain.append(_step(2, "§26.10", "γ-decomposition", "5-axis adjustment",
535
+ {"has_GQA": has_GQA, "has_SWA": has_SWA, "n_params": n_params},
536
+ g_corr))
537
+
538
+ # Step 3: §26.7 D_f window applicability
539
+ df = df_window(g_corr, T_eval, f=0.90)
540
+ df_zone_ok = df is not None
541
+ chain.append(_step(3, "§26.7", "D_f window (γ in [0.65, 0.85])",
542
+ "[(1-f)+fN^(1-γ)]^(1/(1-γ))",
543
+ {"gamma": g_corr, "N": T_eval, "f": 0.9}, df,
544
+ f"D_f = {df}" if df_zone_ok
545
+ else f"NOT applicable (γ={g_corr:.3f} outside [0.65, 0.85])"))
546
+
547
+ # Step 4: §26.8 soft decay regime
548
+ regime = kv_soft_decay_regime(theta, g_corr, T_train)
549
+ dh = d_horizon(theta, g_corr)
550
+ dh_str = f"{dh:.0f}" if dh is not None else "n/a"
551
+ chain.append(_step(4, "§26.8", "Soft decay regime", "d_h ≳ T_train/2",
552
+ {"theta": theta, "gamma": g_corr, "T_train": T_train}, regime,
553
+ f"d_horizon={dh_str}; regime: {regime}"))
554
+
555
+ # Step 5: KV cache memory baseline
556
+ kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval)
557
+ chain.append(_step(5, "§19.1", "Baseline KV memory", "2·L·n_kv·d_h·seq·B",
558
+ {"L": n_layers, "n_kv": n_kv_heads, "d_h": d_head, "seq": T_eval},
559
+ kv, f"{kv['GB']:.2f} GB without compression"))
560
+
561
+ # Verdict
562
+ if regime == "applies" and df_zone_ok:
563
+ verdict = "USE SOFT DECAY"
564
+ reason = (f"d_horizon ≳ T_train/2 AND γ in compression zone. "
565
+ f"Soft decay (1-d/d_h)^γ best (-21% PPL vs hard cutoff per F17).")
566
+ mit = "Implement as 4D attention_mask additive bias with eager attention."
567
+ elif df_zone_ok:
568
+ verdict = "USE D_f HARD CUTOFF"
569
+ reason = f"γ in [0.65, 0.85] zone but d_h < T_train/2. Hard truncation at D_f={df} works."
570
+ mit = "Set cache_max_len = D_f."
571
+ elif regime == "applies":
572
+ verdict = "USE SOFT DECAY (caveat)"
573
+ reason = "Regime applies but γ is outside the D_f validity zone. Soft decay only."
574
+ mit = "Soft decay; do not use D_f window."
575
+ elif g_corr >= 1 or g_corr <= 0:
576
+ verdict = "USE LITERATURE METHODS"
577
+ reason = f"γ={g_corr:.3f} outside Phase A. Our formulas don't apply."
578
+ mit = "Use SnapKV / PyramidKV / FastGen (literature heuristics)."
579
+ else:
580
+ verdict = "USE HARD T_train CUTOFF"
581
+ reason = "Regime not met and γ outside zone. Cap context at T_train."
582
+ mit = f"Set seq_len ≤ {T_train}, no extension."
583
+
584
+ return _wrap("X-19", "KV compression decision", locals(), chain, verdict, reason, mit)
585
+
+
+ # ════════════════════════════════════════════════════════════════════════════
+ # Helpers
+ # ════════════════════════════════════════════════════════════════════════════
+ def _step(n, sec, name, formula, inputs, result, interpretation=None, breakdown=None):
+     s = {"step": n, "section": sec, "name": name, "formula": formula,
+          "inputs": inputs, "result": result}
+     if interpretation:
+         s["interpretation"] = interpretation
+     if breakdown:
+         s["breakdown"] = breakdown
+     return s
+
+
+ def _wrap(rid, rname, locals_dict, chain, verdict, reason, mitigation):
+     # Clean inputs (drop chain/internal vars)
+     inputs = {k: v for k, v in locals_dict.items()
+               if not k.startswith("_") and k not in
+               ("chain", "verdict", "reason", "mit", "info", "be", "kv", "g_pade", "g_corr",
+                "decomp", "dh", "l_niah", "p_hallu", "cost", "model_GB", "inf", "api",
+                "api_blend", "fits", "mem", "emerg", "max_flops", "hours",
+                "N_chinchilla", "D_chinchilla", "candidates", "best", "tok_per_s",
+                "tok_per_day", "capacity_users", "usd_per_day", "usd_per_Mtok",
+                "total_GB", "w_mem", "df", "df_zone_ok", "regime", "has_GQA",
+                "margin", "months", "months_to_payoff", "name")}
+     return {"recipe_id": rid, "recipe_name": rname, "inputs": inputs,
+             "chain": chain, "verdict": verdict, "reason": reason,
+             "mitigation": mitigation}
+
+
+ def _phase_label(g):
+     if 0 < g < 1:
+         return "Phase A (long-range OK)"
+     if g >= 1:
+         return "Phase B / Hagedorn"
+     return "Phase B / catastrophic (negative γ — T too large for θ)"
+
+
+ # ════════════════════════════════════════════════════════════════════════════
+ # Recipe registry
+ # ════════════════════════════════════════════════════════════════════════════
+ RECIPES = {
+     "X-1": {
+         "name": "Custom Training vs API",
+         "description": "Should I train a custom model or use a frontier API for my domain task?",
+         "fn": run_recipe_x1,
+         "params": ["N_params", "D_tokens", "gpu", "n_gpus", "mfu",
+                    "api_model", "monthly_tokens_M"],
+         "category": "build-vs-buy",
+         "uses_sections": ["§17", "§19", "§20", "§24"],
+     },
+     "X-2": {
+         "name": "Long Context Viability",
+         "description": "Will model M serve length L doing Needle-in-a-Haystack retrieval?",
+         "fn": run_recipe_x2,
+         "params": ["theta", "T_train", "T_eval", "n_attention_heads", "n_kv_heads",
+                    "d_head", "n_layers", "n_params", "has_SWA"],
+         "category": "long-context",
+         "uses_sections": ["§26", "§19"],
+     },
+     "X-3": {
+         "name": "Budget Pre-flight",
+         "description": "Given a $ budget, what model is feasible to train?",
+         "fn": run_recipe_x3,
+         "params": ["USD_budget", "gpu", "mfu", "n_gpus"],
+         "category": "training-budget",
+         "uses_sections": ["§17", "§20"],
+     },
+     "X-5": {
+         "name": "Hardware Selection",
+         "description": "Which GPU should I use to serve my model at target throughput?",
+         "fn": run_recipe_x5,
+         "params": ["N_params", "T_eval", "n_layers", "n_kv_heads", "d_head",
+                    "bytes_per_weight", "target_tokens_per_day", "concurrent_users"],
+         "category": "serving",
+         "uses_sections": ["§19", "§20"],
+     },
+     "X-19": {
+         "name": "KV Compression Decision",
+         "description": "Should I use soft decay, a D_f cutoff, or literature methods to compress KV?",
+         "fn": run_recipe_x19,
+         "params": ["theta", "T_train", "T_eval", "n_attention_heads", "n_kv_heads",
+                    "d_head", "n_layers", "n_params", "has_SWA"],
+         "category": "kv-compression",
+         "uses_sections": ["§26", "§19"],
+     },
+ }
+
+
+ def list_recipes() -> str:
+     """Return JSON of all recipes for the UI dropdown."""
+     return json.dumps([
+         {"id": rid, "name": r["name"], "description": r["description"],
+          "category": r["category"], "params": r["params"],
+          "uses_sections": r["uses_sections"]}
+         for rid, r in RECIPES.items()
+     ])
+
+
+ def run_recipe(recipe_id: str, **params) -> dict:
+     """Dispatcher — execute a recipe by id with the given params."""
+     r = RECIPES.get(recipe_id)
+     if r is None:
+         return {"error": f"unknown recipe '{recipe_id}'",
+                 "available": list(RECIPES.keys())}
+     return r["fn"](**params)
+
+
+ # ════════════════════════════════════════════════════════════════════════════
+ # Known model presets
+ # ════════════════════════════════════════════════════════════════════════════
+ PRESETS = {
+     "EleutherAI/pythia-2.8b": {
+         "theta": 10000, "T_train": 2048,
+         "n_attention_heads": 32, "n_kv_heads": 32,
+         "d_head": 80, "n_layers": 32, "n_params": 2.8e9, "has_SWA": False,
+     },
+     "EleutherAI/pythia-1b": {
+         "theta": 10000, "T_train": 2048,
+         "n_attention_heads": 8, "n_kv_heads": 8,
+         "d_head": 256, "n_layers": 16, "n_params": 1e9, "has_SWA": False,
+     },
+     "EleutherAI/pythia-1.4b": {
+         "theta": 10000, "T_train": 2048,
+         "n_attention_heads": 16, "n_kv_heads": 16,
+         "d_head": 128, "n_layers": 24, "n_params": 1.4e9, "has_SWA": False,
+     },
+     "meta-llama/Meta-Llama-3-8B": {
+         "theta": 500000, "T_train": 8192,
+         "n_attention_heads": 32, "n_kv_heads": 8,
+         "d_head": 128, "n_layers": 32, "n_params": 8e9, "has_SWA": False,
+     },
+     "meta-llama/Llama-3.2-1B": {
+         "theta": 500000, "T_train": 131072,
+         "n_attention_heads": 32, "n_kv_heads": 8,
+         "d_head": 64, "n_layers": 16, "n_params": 1.2e9, "has_SWA": False,
+     },
+     "meta-llama/Llama-3.3-70B-Instruct": {
+         "theta": 500000, "T_train": 131072,
+         "n_attention_heads": 64, "n_kv_heads": 8,
+         "d_head": 128, "n_layers": 80, "n_params": 70e9, "has_SWA": False,
+     },
+     "mistralai/Mistral-7B-v0.1": {
+         "theta": 10000, "T_train": 8192,
+         "n_attention_heads": 32, "n_kv_heads": 8,
+         "d_head": 128, "n_layers": 32, "n_params": 7e9, "has_SWA": True,
+     },
+     "Qwen/Qwen2.5-7B": {
+         "theta": 1000000, "T_train": 32768,
+         "n_attention_heads": 28, "n_kv_heads": 4,
+         "d_head": 128, "n_layers": 28, "n_params": 7.6e9, "has_SWA": False,
+     },
+     "Qwen/Qwen2.5-1.5B": {
+         "theta": 1000000, "T_train": 32768,
+         "n_attention_heads": 12, "n_kv_heads": 2,
+         "d_head": 128, "n_layers": 28, "n_params": 1.5e9, "has_SWA": False,
+     },
+     "google/gemma-2-9b-it": {
+         "theta": 10000, "T_train": 8192,
+         "n_attention_heads": 16, "n_kv_heads": 8,
+         "d_head": 256, "n_layers": 42, "n_params": 9e9, "has_SWA": True,
+     },
+     "microsoft/Phi-3-mini-4k-instruct": {
+         "theta": 10000, "T_train": 4096,
+         "n_attention_heads": 32, "n_kv_heads": 32,
+         "d_head": 96, "n_layers": 32, "n_params": 3.8e9, "has_SWA": True,
+     },
+ }
+
+
+ def list_presets() -> str:
+     return json.dumps([
+         {"id": k, "label": k.split("/")[-1],
+          "theta": v["theta"], "T_train": v["T_train"]}
+         for k, v in PRESETS.items()
+     ])
+
+
+ def get_preset(model_id: str) -> dict:
+     return PRESETS.get(model_id, {})
+
+
+ # Smoke test
+ if __name__ == "__main__":
+     print("─── X-2 Llama-3-8B @ 32K ───")
+     r = run_recipe("X-2", theta=500_000, T_train=8192, T_eval=32_000,
+                    n_attention_heads=32, n_kv_heads=8, d_head=128,
+                    n_layers=32, n_params=8e9, has_SWA=False)
+     print(f"Verdict: {r['verdict']} — {r['reason']}\n")
+
+     print("─── X-1 Llama-3-8B vs GPT-4o (10M tok/mo) ───")
+     r = run_recipe("X-1", N_params=8e9, monthly_tokens_M=10.0, api_model="GPT-4o")
+     print(f"Verdict: {r['verdict']} — {r['reason']}\n")
+
+     print("─── X-3 budget $5K ───")
+     r = run_recipe("X-3", USD_budget=5000.0, gpu="H100 SXM", n_gpus=1)
+     print(f"Verdict: {r['verdict']} — {r['reason']}\n")
+
+     print("─── X-5 serve Llama-3-8B at 4K ───")
+     r = run_recipe("X-5", N_params=8e9, T_eval=4096, n_layers=32, n_kv_heads=8, d_head=128,
+                    target_tokens_per_day=10e6, concurrent_users=1)
+     print(f"Verdict: {r['verdict']} — {r['reason']}\n")
+
+     print("─── X-19 KV compression for Llama-3-8B ───")
+     r = run_recipe("X-19", theta=500_000, T_train=8192, T_eval=8192,
+                    n_attention_heads=32, n_kv_heads=8, d_head=128,
+                    n_layers=32, n_params=8e9)
+     print(f"Verdict: {r['verdict']} — {r['reason']}\n")
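The smoke test above passes architecture parameters by hand; in the UI path the same call is typically assembled from a preset plus user overrides. A minimal self-contained sketch of that merge pattern (the stubbed `PRESETS`, `get_preset`, and `run_recipe` below stand in for the real definitions in taf_browser.py; the stub dispatcher just echoes its inputs):

```python
# Hypothetical driver: preset supplies the architecture, caller
# overrides only the eval length, dispatcher receives the merged dict.
PRESETS = {
    "meta-llama/Meta-Llama-3-8B": {
        "theta": 500_000, "T_train": 8192,
        "n_attention_heads": 32, "n_kv_heads": 8,
        "d_head": 128, "n_layers": 32, "n_params": 8e9, "has_SWA": False,
    },
}

def get_preset(model_id: str) -> dict:
    # Mirrors the real helper: unknown ids fall back to an empty dict.
    return PRESETS.get(model_id, {})

def run_recipe(recipe_id: str, **params) -> dict:
    # Stub: the real dispatcher looks up RECIPES[recipe_id]["fn"]
    # and executes the formula chain; here we only echo the call.
    return {"recipe_id": recipe_id, "inputs": params}

# Later keys win in a dict merge, so user overrides beat preset values.
params = {**get_preset("meta-llama/Meta-Llama-3-8B"), "T_eval": 32_000}
result = run_recipe("X-2", **params)
```

Because `**` expansion rejects unexpected keys only at the real recipe function, a production driver would also filter `params` down to `RECIPES[recipe_id]["params"]` before dispatching.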
style.css ADDED
@@ -0,0 +1,173 @@
+ /* TAF Agent — minimal clean styling */
+ :root {
+   --bg: #0a0e14;
+   --bg-card: #12181f;
+   --bg-input: #1a2028;
+   --fg: #c9d1d9;
+   --fg-dim: #8b949e;
+   --accent: #58a6ff;
+   --accent-dim: #1f6feb;
+   --success: #3fb950;
+   --warning: #d29922;
+   --danger: #f85149;
+   --border: #30363d;
+ }
+
+ * { box-sizing: border-box; }
+
+ body {
+   font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen,
+     Ubuntu, sans-serif;
+   background: var(--bg);
+   color: var(--fg);
+   margin: 0;
+   padding: 0;
+   line-height: 1.6;
+ }
+
+ header {
+   text-align: center;
+   padding: 2rem 1rem 1rem;
+   border-bottom: 1px solid var(--border);
+ }
+ header h1 { margin: 0 0 0.5rem 0; font-size: 2rem; }
+ .tagline { font-size: 1.1rem; margin: 0 0 0.5rem; }
+ .subtle { color: var(--fg-dim); font-size: 0.9rem; }
+
+ main {
+   max-width: 980px;
+   margin: 0 auto;
+   padding: 1.5rem;
+ }
+
+ section {
+   background: var(--bg-card);
+   border: 1px solid var(--border);
+   border-radius: 8px;
+   padding: 1.25rem 1.5rem;
+   margin-bottom: 1.25rem;
+ }
+
+ h2 { margin-top: 0; font-size: 1.2rem; color: var(--accent); }
+
+ #status-bar { padding: 0.75rem 1.25rem; }
+ #status { font-family: monospace; }
+
+ .recipe-desc { color: var(--fg-dim); margin: 0.5rem 0 0 0; }
+
+ .form-row { display: flex; gap: 1rem; margin-bottom: 1rem; align-items: center; }
+ .form-row label { min-width: 120px; }
+
+ .form-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
+   gap: 0.75rem;
+   margin-bottom: 1rem;
+ }
+ .form-field { display: flex; flex-direction: column; }
+ .form-field label { font-size: 0.85rem; color: var(--fg-dim); margin-bottom: 0.25rem; }
+
+ input, select {
+   background: var(--bg-input);
+   color: var(--fg);
+   border: 1px solid var(--border);
+   border-radius: 4px;
+   padding: 0.4rem 0.6rem;
+   font-family: monospace;
+   font-size: 0.95rem;
+ }
+ input:focus, select:focus { outline: 1px solid var(--accent); border-color: var(--accent); }
+
+ button {
+   background: var(--accent-dim);
+   color: white;
+   border: none;
+   padding: 0.6rem 1.2rem;
+   font-size: 1rem;
+   font-weight: 600;
+   border-radius: 6px;
+   cursor: pointer;
+   transition: background 0.2s;
+ }
+ button:hover:not(:disabled) { background: var(--accent); }
+ button:disabled { background: #444; cursor: not-allowed; }
+
+ #verdict-box {
+   font-size: 1.05rem;
+   padding: 1rem;
+   border-radius: 6px;
+   border-left: 4px solid;
+ }
+ .verdict-yes { border-color: var(--success); background: rgba(63, 185, 80, 0.08); }
+ .verdict-no { border-color: var(--danger); background: rgba(248, 81, 73, 0.08); }
+ .verdict-degraded { border-color: var(--warning); background: rgba(210, 153, 34, 0.08); }
+
+ .chain-step {
+   background: var(--bg-input);
+   border: 1px solid var(--border);
+   border-radius: 6px;
+   padding: 0.75rem 1rem;
+   margin-bottom: 0.5rem;
+ }
+ .chain-step summary {
+   display: flex;
+   justify-content: space-between;
+   font-weight: 600;
+   cursor: pointer;
+   list-style: none;
+ }
+ .chain-step summary::before { content: "▸ "; color: var(--accent); }
+ .chain-step[open] summary::before { content: "▾ "; }
+ .step-section { color: var(--accent); font-family: monospace; font-size: 0.9rem; }
+ .step-formula { color: var(--fg-dim); font-family: monospace; font-size: 0.85rem; margin: 0.5rem 0; }
+ .step-result { color: var(--success); font-family: monospace; font-weight: 600; margin-top: 0.25rem; }
+ .step-interp { color: var(--fg-dim); font-size: 0.9rem; margin-top: 0.25rem; }
+ .step-result pre { background: var(--bg); padding: 0.5rem; border-radius: 4px; overflow-x: auto; }
+
+ .recipe-tag {
+   background: var(--bg-input);
+   color: var(--accent);
+   font-family: monospace;
+   font-size: 0.85rem;
+   padding: 0.2rem 0.5rem;
+   border-radius: 4px;
+ }
+
+ .mode-tabs { display: flex; gap: 0.5rem; margin-bottom: 0.75rem; flex-wrap: wrap; }
+ .mode-btn {
+   background: var(--bg-input); color: var(--fg-dim);
+   border: 1px solid var(--border); border-radius: 6px;
+   padding: 0.5rem 1rem; cursor: pointer; font-size: 0.95rem;
+ }
+ .mode-btn.active { background: var(--accent-dim); color: white; border-color: var(--accent); }
+ button.secondary {
+   background: var(--bg-input); color: var(--fg);
+   border: 1px solid var(--border); padding: 0.4rem 0.8rem;
+ }
+ button.secondary:hover:not(:disabled) { border-color: var(--accent); }
+
+ textarea {
+   width: 100%; min-height: 60px;
+   background: var(--bg-input); color: var(--fg);
+   border: 1px solid var(--border); border-radius: 4px;
+   padding: 0.5rem; font-family: inherit; font-size: 0.95rem; resize: vertical;
+ }
+ textarea:focus { outline: 1px solid var(--accent); border-color: var(--accent); }
+
+ @media (max-width: 600px) {
+   .form-grid { grid-template-columns: 1fr; }
+   main { padding: 0.75rem; }
+   .form-row { flex-direction: column; align-items: stretch; }
+   .form-row label { min-width: auto; }
+ }
+
+ footer {
+   text-align: center;
+   padding: 1.5rem;
+   color: var(--fg-dim);
+   font-size: 0.85rem;
+   border-top: 1px solid var(--border);
+   margin-top: 2rem;
+ }
+ footer a { color: var(--accent); text-decoration: none; }
+ footer a:hover { text-decoration: underline; }