feat: TAF Agent v0.1 — client-side transformer diagnostic
A Pyodide+WebLLM browser app that predicts transformer LLM viability
(long-context, training budget, hardware fit, KV compression, custom-vs-API)
using the TAF (Thermodynamic Attention Framework) formula chains from
Marin 2026.
Phase 1: Pyodide loads taf_browser.py (10 formulas, 11 model presets,
an 11-GPU catalog; deterministic Python, no server)
Phase 2: WebLLM loads Llama-3.2-1B in browser → plain-English synthesis
Phase 3: Free-form question router (LLM picks recipe + extracts params)
Recipes (5):
X-1 Custom training vs API
X-2 Long Context Viability
X-3 Budget Pre-flight
X-5 Hardware Selection for serving
X-19 KV Compression decision
UI: 2 modes (Ask plain-English / Recipe + form), HF Hub config fetch
for any public model, audit-trail expandable steps, mobile-responsive.
Hosting: GitHub Pages (static); compute: user's browser; cost: $0/mo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- .github/workflows/deploy.yml +29 -0
- .gitignore +23 -0
- LICENSE +17 -0
- README.md +101 -0
- index.html +108 -0
- js/main.js +540 -0
- python/taf_browser.py +793 -0
- style.css +173 -0
.github/workflows/deploy.yml
@@ -0,0 +1,29 @@
+name: Deploy to GitHub Pages
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/configure-pages@v4
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: '.'
+      - id: deployment
+        uses: actions/deploy-pages@v4
.gitignore
@@ -0,0 +1,23 @@
+# Python
+__pycache__/
+*.pyc
+*.pyo
+*.egg-info/
+.venv/
+venv/
+
+# Editors
+.vscode/
+.idea/
+*.swp
+.DS_Store
+
+# Build artefacts
+dist/
+build/
+node_modules/
+*.log
+
+# Local sandbox
+local/
+.cache/
LICENSE
@@ -0,0 +1,17 @@
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+Copyright 2026 Carles Marin
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
README.md
@@ -0,0 +1,101 @@
+# 🔬 TAF Agent
+
+> **Transformer LLM diagnostic in your browser.** Free. Unlimited. Auditable.
+
+Drop in a model config (or paste any HuggingFace model id), get a falsifiable answer to *"will it work?"* — backed by the Thermodynamic Attention Framework (TAF) formulas.
+
+**🌐 Live demo**: https://transformerkmarin.github.io/tafagent *(once GitHub Pages is enabled)*
+
+---
+
+## What it does
+
+Answers practical viability questions for transformer LLMs, with **zero servers**:
+
+- *Will Llama-3-8B serve 32K context with NIAH retrieval?* → **X-2**
+- *Should I train a custom 7B model or use GPT-4 API?* → **X-1**
+- *I have $5K — what model can I afford to train?* → **X-3**
+- *Cheapest GPU to serve Llama-70B at 100M tokens/day?* → **X-5**
+- *Should I use soft KV decay or hard cutoff for compression?* → **X-19**
+
+…each as a chain of TAF formulas (paper §17, §19, §20, §24, §26) rendered with a full audit trail.
+
+## Two modes
+
+- **💬 Ask in plain English** → in-browser LLM picks the right recipe and runs it
+- **📋 Recipe + form** → manual selection, full control over every parameter
+
+## How it's free + unlimited
+
+- Static HTML/JS hosted on **GitHub Pages** (free bandwidth)
+- Python TAF computation runs in your browser via **Pyodide** (no server)
+- Plain-English synthesis runs **Llama-3.2-1B-Instruct** in your browser via **WebLLM** (your GPU)
+- Model weights cached in IndexedDB after first load (~700MB, one-time)
+- **Your data never leaves your browser**
+
+## Architecture
+
+```
+GitHub Pages (HTML/JS)
+        ↓  (one-time download)
+Your browser:
+  ├─ Pyodide → Python TAF formulas (CPU, instant)
+  └─ WebLLM → Llama-3.2-1B (GPU/CPU, deterministic-ish)
+```
+
+## How to add new models
+
+1. **Preset list** — 11 curated popular models, instant autofill
+2. **HF Hub fetch** — paste any model id (`Qwen/Qwen2.5-32B`, `meta-llama/Llama-3.3-70B-Instruct`, ...) → browser fetches `config.json` → autofill form
+3. **Manual** — fill the form fields directly
+
+Works for any public RoPE / GQA / MHA / SWA / ALiBi / AbsPE model. Gated models (Llama family) require accepting the licence on HF first.
+
+## Status
+
+- ✅ **Phase 1**: Pyodide + TAF formulas
+- ✅ **Phase 2**: WebLLM synthesis (plain-English answer)
+- ✅ **Phase 3**: Free-form question router (NLU → recipe selection)
+- ✅ **5 recipes**: X-1, X-2, X-3, X-5, X-19
+- 🚧 **Phase 4**: 15 more recipes (X-4, X-6...X-20) + advanced UI
+
+## Local development
+
+```bash
+git clone https://github.com/karlesmarin/tafagent
+cd tafagent
+python -m http.server 8000
+# open http://localhost:8000
+```
+
+## Browser requirements
+
+- Chrome / Edge / Firefox 113+ for WebGPU acceleration (recommended)
+- Older browsers fall back to CPU inference (slower but works)
+- ~2 GB free RAM for Llama-3.2-1B
+- ~700 MB disk for model cache (one-time)
+
+## Citation
+
+If you use this tool, please cite the underlying paper:
+
+```bibtex
+@article{marin2026transformer_thermodynamics,
+  author = {Marin, Carles},
+  title = {Transformer Thermodynamics: A Closed-Form Theory of Attention Decay,
+           Phase Transitions, and Context-Length Limits in RoPE Language Models},
+  year = {2026},
+}
+```
+
+## License
+
+Apache-2.0 (this code). Llama-3.2-1B distributed under the [Meta Llama 3.2 license](https://www.llama.com/llama3_2/license/).
+
+---
+
+**Acknowledgements**: this tool would not exist without the open-weights commons
+(Meta, Mistral, Qwen, EleutherAI, AI2 and many more), the Pyodide + WebLLM
+projects, GitHub Pages free hosting, and the wider ML community keeping all
+the tooling honest and accessible. Full list in the
+[paper Acknowledgements](https://github.com/karlesmarin/NeurIPS).
index.html
@@ -0,0 +1,108 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>TAF Agent — Transformer Diagnostic in your Browser</title>
+  <meta name="description" content="Predict transformer LLM behaviour from config alone. Free, unlimited, runs entirely in your browser." />
+  <link rel="stylesheet" href="style.css" />
+  <script src="https://cdn.jsdelivr.net/pyodide/v0.26.4/full/pyodide.js"></script>
+</head>
+<body>
+  <header>
+    <h1>🔬 TAF Agent</h1>
+    <p class="tagline">
+      Transformer diagnostic in your browser. <strong>Free. Unlimited. Auditable.</strong>
+    </p>
+    <p class="subtle">
+      All computation happens locally — your data never leaves this page.
+    </p>
+  </header>
+
+  <main>
+    <!-- Status -->
+    <section id="status-bar"><div id="status">⏳ Loading Python runtime...</div></section>
+
+    <!-- Mode toggle -->
+    <section id="mode-section">
+      <h2>🎯 Mode</h2>
+      <div class="mode-tabs">
+        <button class="mode-btn active" data-mode="ask">💬 Ask in plain English</button>
+        <button class="mode-btn" data-mode="recipe">📋 Pick recipe + fill form</button>
+      </div>
+      <p id="mode-desc" class="recipe-desc">
+        Type a free-form question (e.g. "Will Llama-3-8B work at 32K context?"). The
+        in-browser LLM picks the right recipe and runs it.
+      </p>
+    </section>
+
+    <!-- Free-form question (mode=ask) -->
+    <section id="ask-section">
+      <h2>❓ Your question</h2>
+      <textarea id="question" rows="3" placeholder="e.g. Will Mistral-7B handle 16K NIAH retrieval? Or: I have $5,000, what model can I train? Or: Cheapest GPU to serve Llama-70B at 100M tokens/day?"></textarea>
+      <div style="display:flex; gap:0.5rem; margin-top:0.5rem; flex-wrap:wrap;">
+        <button id="ask-btn" disabled>🚀 Analyze</button>
+        <button id="example-btn" type="button" class="secondary">💡 Try an example</button>
+      </div>
+    </section>
+
+    <!-- Recipe selector (mode=recipe) -->
+    <section id="recipe-section" style="display:none;">
+      <h2>📋 Recipe</h2>
+      <select id="recipe-select" disabled>
+        <option value="">— select a recipe —</option>
+      </select>
+      <p id="recipe-desc-display" class="recipe-desc"></p>
+    </section>
+
+    <!-- Form (mode=recipe) -->
+    <section id="form-section" style="display:none;">
+      <h2>🎯 Inputs</h2>
+
+      <div class="form-row">
+        <label for="preset">Preset model:</label>
+        <select id="preset" disabled>
+          <option value="">— select to autofill —</option>
+        </select>
+      </div>
+
+      <div class="form-row">
+        <label for="hf-id">Or any HF model:</label>
+        <input type="text" id="hf-id" placeholder="e.g. Qwen/Qwen2.5-32B-Instruct" style="flex:1;" />
+        <button id="hf-fetch-btn" type="button" class="secondary">📥 Fetch</button>
+      </div>
+      <div id="hf-status" class="subtle" style="margin: -0.5rem 0 1rem; min-height:1.2em;"></div>
+
+      <!-- Dynamic form fields based on recipe -->
+      <div id="dynamic-form" class="form-grid"></div>
+
+      <button id="run-btn" disabled>🚀 Analyze</button>
+    </section>
+
+    <!-- Output -->
+    <section id="output-section" style="display:none;">
+      <h2>📊 Verdict</h2>
+      <div id="verdict-box"></div>
+
+      <h2>🔍 Computation Chain</h2>
+      <p class="subtle">Every number below is deterministic Python. Click a step to expand.</p>
+      <div id="chain-box"></div>
+
+      <h2 id="answer-header" style="display:none;">💬 Plain-English Answer</h2>
+      <div id="answer-box" style="display:none;"></div>
+    </section>
+  </main>
+
+  <footer>
+    <p>
+      © 2026 Carles Marin · Apache-2.0 ·
+      <a href="https://github.com/karlesmarin/tafagent" target="_blank">Source on GitHub</a>
+    </p>
+    <p class="subtle">
+      Computation: Pyodide (Python in browser) · Synthesis: WebLLM (Llama-3.2-1B local) · Hosting: GitHub Pages
+    </p>
+  </footer>
+
+  <script type="module" src="js/main.js"></script>
+</body>
+</html>
js/main.js
@@ -0,0 +1,540 @@
+// TAF Agent — main orchestration (Phases 1-3 complete)
+//
+// Phases:
+//  1. Pyodide loads + TAF formulas → deterministic computation
+//  2. WebLLM loads on demand → plain-English synthesis
+//  3. Router (LLM) → free-form question → recipe + params
+
+const TAF_BROWSER_URL = "python/taf_browser.py";
+const ENABLE_WEBLLM = true;
+const WEBLLM_MODEL = "Llama-3.2-1B-Instruct-q4f32_1-MLC";
+
+const $ = (id) => document.getElementById(id);
+
+const state = {
+  pyodide: null,
+  webllm: null,
+  presets: [],
+  recipes: [],
+  recipesById: {},
+  currentMode: "ask",
+  currentRecipe: null,
+};
+
+const EXAMPLES = [
+  "Will Meta-Llama-3-8B handle 32000-token NIAH retrieval reliably?",
+  "I have $5000 to spend on training. What model can I afford?",
+  "Should I use Mistral-7B-v0.1 at 16K context or extend it first?",
+  "Compare cheapest GPU to serve Llama-3-8B at 10 million tokens per day.",
+  "Should I use soft KV decay or hard cutoff for Qwen2.5-7B at 32K?",
+  "Is it cheaper to train an 8B custom model or use GPT-4o for 50M tokens/month?",
+];
+
+// ════════════════════════════════════════════════════════════════════
+// Bootstrap
+// ════════════════════════════════════════════════════════════════════
+async function loadPyodideAndTaf() {
+  setStatus("⏳ Loading Pyodide (Python runtime ~10MB)...");
+  state.pyodide = await loadPyodide({
+    indexURL: "https://cdn.jsdelivr.net/pyodide/v0.26.4/full/",
+  });
+  setStatus("⏳ Loading TAF formulas + recipes...");
+  const tafCode = await fetch(TAF_BROWSER_URL).then(r => r.text());
+  await state.pyodide.runPythonAsync(tafCode);
+
+  state.presets = JSON.parse(state.pyodide.runPython("list_presets()"));
+  state.recipes = JSON.parse(state.pyodide.runPython("list_recipes()"));
+  state.recipesById = Object.fromEntries(state.recipes.map(r => [r.id, r]));
+
+  populatePresets();
+  populateRecipes();
+  enableUI();
+  setStatus("✅ Ready. Ask a question or pick a recipe.");
+}
+
+function populatePresets() {
+  const sel = $("preset");
+  sel.innerHTML = '<option value="">— select to autofill —</option>';
+  state.presets.forEach(p => {
+    const opt = document.createElement("option");
+    opt.value = p.id;
+    opt.textContent = `${p.label} (θ=${p.theta.toLocaleString()}, T_train=${p.T_train})`;
+    sel.appendChild(opt);
+  });
+}
+
+function populateRecipes() {
+  const sel = $("recipe-select");
+  sel.innerHTML = '<option value="">— select a recipe —</option>';
+  state.recipes.forEach(r => {
+    const opt = document.createElement("option");
+    opt.value = r.id;
+    opt.textContent = `${r.id} — ${r.name}`;
+    sel.appendChild(opt);
+  });
+}
+
+function enableUI() {
+  $("ask-btn").disabled = false;
+  $("recipe-select").disabled = false;
+  $("preset").disabled = false;
+}
+
+function setStatus(msg) { $("status").textContent = msg; }
+
+// ════════════════════════════════════════════════════════════════════
+// Mode toggle
+// ════════════════════════════════════════════════════════════════════
+document.querySelectorAll(".mode-btn").forEach(btn => {
+  btn.addEventListener("click", () => {
+    document.querySelectorAll(".mode-btn").forEach(b => b.classList.remove("active"));
+    btn.classList.add("active");
+    const mode = btn.dataset.mode;
+    state.currentMode = mode;
+    if (mode === "ask") {
+      $("ask-section").style.display = "";
+      $("recipe-section").style.display = "none";
+      $("form-section").style.display = "none";
+      $("mode-desc").textContent =
+        "Type a free-form question. The in-browser LLM picks the right recipe and runs it.";
+    } else {
+      $("ask-section").style.display = "none";
+      $("recipe-section").style.display = "";
+      $("mode-desc").textContent =
+        "Pick a recipe directly and fill the form. Same result as Ask mode but fully manual.";
+    }
+  });
+});
+
+// ════════════════════════════════════════════════════════════════════
+// Recipe selector
+// ════════════════════════════════════════════════════════════════════
+$("recipe-select").addEventListener("change", (e) => {
+  const rid = e.target.value;
+  if (!rid) {
+    $("form-section").style.display = "none";
+    return;
+  }
+  const r = state.recipesById[rid];
+  state.currentRecipe = r;
+  $("recipe-desc-display").textContent = r.description;
+  $("form-section").style.display = "";
+  buildDynamicForm(r);
+});
+
+function buildDynamicForm(recipe) {
+  const container = $("dynamic-form");
+  container.innerHTML = "";
+  const defaults = getRecipeDefaults(recipe.id);
+  recipe.params.forEach(name => {
+    const div = document.createElement("div");
+    div.className = "form-field";
+    const label = document.createElement("label");
+    label.textContent = paramLabel(name);
+    label.htmlFor = `param_${name}`;
+    const input = document.createElement("input");
+    input.type = "text";
+    input.id = `param_${name}`;
+    input.dataset.param = name;
+    input.value = defaults[name] !== undefined ? String(defaults[name]) : "";
+    div.appendChild(label);
+    div.appendChild(input);
+    container.appendChild(div);
+  });
+  $("run-btn").disabled = false;
+}
+
+function paramLabel(name) {
+  const labels = {
+    theta: "θ (rope_theta)", T_train: "T_train", T_eval: "T_eval (target context)",
+    n_attention_heads: "num_attention_heads", n_kv_heads: "num_key_value_heads",
+    d_head: "head_dim", n_layers: "num_hidden_layers", n_params: "n_params (e.g. 8e9)",
+    has_SWA: "Has SWA? (true/false)",
+    N_params: "N_params (e.g. 8e9)", D_tokens: "D_tokens (or empty for Chinchilla)",
+    gpu: "GPU", n_gpus: "n_gpus", mfu: "MFU (default 0.45)",
+    api_model: "API model to compare", monthly_tokens_M: "Monthly tokens (M)",
+    USD_budget: "USD budget", bytes_per_weight: "Bytes per weight (BF16=2)",
+    target_tokens_per_day: "Target tokens/day", concurrent_users: "Concurrent users",
+  };
+  return labels[name] || name;
+}
+
+function getRecipeDefaults(recipeId) {
+  const D = {
+    "X-1": { N_params: "8e9", D_tokens: "", gpu: "H100 SXM", n_gpus: 8, mfu: 0.45,
+             api_model: "GPT-4o", monthly_tokens_M: 10.0 },
+    "X-2": { theta: 500000, T_train: 8192, T_eval: 32000,
+             n_attention_heads: 32, n_kv_heads: 8, d_head: 128,
+             n_layers: 32, n_params: "8e9", has_SWA: false },
+    "X-3": { USD_budget: 5000, gpu: "H100 SXM", mfu: 0.45, n_gpus: 1 },
+    "X-5": { N_params: "8e9", T_eval: 4096, n_layers: 32, n_kv_heads: 8, d_head: 128,
+             bytes_per_weight: 2.0, target_tokens_per_day: 10000000, concurrent_users: 1 },
+    "X-19": { theta: 500000, T_train: 8192, T_eval: 8192,
+              n_attention_heads: 32, n_kv_heads: 8, d_head: 128,
+              n_layers: 32, n_params: "8e9", has_SWA: false },
+  };
+  return D[recipeId] || {};
+}
+
+// ════════════════════════════════════════════════════════════════════
+// Preset autofill (works in recipe mode)
+// ════════════════════════════════════════════════════════════════════
+$("preset").addEventListener("change", (e) => {
+  if (!e.target.value) return;
+  const proxy = state.pyodide.runPython(`get_preset(${JSON.stringify(e.target.value)})`);
+  const preset = proxy.toJs ? proxy.toJs({ dict_converter: Object.fromEntries }) : proxy;
+  if (!preset || Object.keys(preset).length === 0) return;
+  fillRecipeForm(preset);
+});
+
+function fillRecipeForm(p) {
+  // Fill any matching field in dynamic form
+  Object.entries(p).forEach(([k, v]) => {
+    const map = {
+      theta: "theta", T_train: "T_train",
+      n_attention_heads: "n_attention_heads", n_kv_heads: "n_kv_heads",
+      d_head: "d_head", n_layers: "n_layers", n_params: "n_params",
+      has_SWA: "has_SWA",
+    };
+    const formId = "param_" + (map[k] || k);
+    const el = $(formId);
+    if (el) el.value = (typeof v === "number" && (k === "n_params" || v > 1e6))
+      ? v.toExponential(2) : String(v);
+    // Also fill N_params for cost recipes
+    if (k === "n_params") {
+      const np = $("param_N_params");
+      if (np) np.value = (typeof v === "number" ? v.toExponential(2) : String(v));
+    }
+  });
+}
+
+// ════════════════════════════════════════════════════════════════════
+// HF Hub fetch (any model)
+// ════════════════════════════════════════════════════════════════════
+$("hf-fetch-btn").addEventListener("click", async () => {
+  const modelId = $("hf-id").value.trim();
+  if (!modelId) {
+    $("hf-status").textContent = "⚠ Enter a model id like 'Qwen/Qwen2.5-32B-Instruct'";
+    return;
+  }
+  $("hf-status").textContent = `⏳ Fetching config.json from HF Hub for ${modelId}...`;
+  $("hf-fetch-btn").disabled = true;
+  try {
+    const url = `https://huggingface.co/${modelId}/raw/main/config.json`;
+    const resp = await fetch(url);
+    if (!resp.ok) {
+      if (resp.status === 401 || resp.status === 403) {
+        throw new Error(`Model is gated (${resp.status}). Accept license on HF Hub first, or fill manually.`);
+      }
+      throw new Error(`HTTP ${resp.status} — config.json not found`);
+    }
+    const cfg = await resp.json();
+    const preset = configToPreset(cfg, modelId);
+    fillRecipeForm(preset);
+    $("hf-status").innerHTML = `✅ Config loaded for <strong>${modelId}</strong> (family: ${preset._family}). Verify values, click Analyze.`;
+  } catch (err) {
+    $("hf-status").textContent = `❌ ${err.message}`;
+  } finally {
+    $("hf-fetch-btn").disabled = false;
+  }
+});
+
+function configToPreset(cfg, modelId) {
+  const n_attn = cfg.num_attention_heads || cfg.n_head || 0;
+  const n_kv = cfg.num_key_value_heads || cfg.num_attention_heads || cfg.n_head || 0;
+  const hidden = cfg.hidden_size || cfg.d_model || cfg.n_embd || 0;
+  const d_head = cfg.head_dim || (n_attn > 0 ? Math.floor(hidden / n_attn) : 0);
+  const theta = cfg.rope_theta || cfg.rotary_emb_base ||
+    (cfg.alibi ? null : (cfg.position_embedding_type === "absolute" ? null : 10000));
+  const T_train = cfg.max_position_embeddings || cfg.max_sequence_length ||
+    cfg.n_positions || cfg.n_ctx || 0;
+  const n_layers = cfg.num_hidden_layers || cfg.n_layer || 0;
+  const has_SWA = !!(cfg.sliding_window || cfg.use_sliding_window);
+
+  let family = "rope-mha";
+  if (cfg.alibi) family = "alibi";
+  else if (cfg.model_type === "mamba" || cfg.model_type === "mamba2") family = "ssm";
+  else if (theta == null) family = "abspe";
+  else if (n_kv < n_attn) family = "rope-gqa";
+
+  const n_params_est = estimateParams(cfg);
+  return {
+    theta: theta || 10000, T_train: T_train || 2048,
+    n_attention_heads: n_attn, n_kv_heads: n_kv, d_head: d_head,
+    n_layers: n_layers, n_params: n_params_est, has_SWA: has_SWA,
+    _family: family, _model_id: modelId,
+  };
+}
+
+function estimateParams(cfg) {
+  const h = cfg.hidden_size || cfg.d_model || 0;
+  const L = cfg.num_hidden_layers || cfg.n_layer || 0;
+  const V = cfg.vocab_size || 32000;
+  return Math.round(12 * h * h * L + 2 * V * h);
+}
+
+// ════════════════════════════════════════════════════════════════════
+// Run recipe (manual mode)
+// ════════════════════════════════════════════════════════════════════
+$("run-btn").addEventListener("click", async () => {
+  if (!state.currentRecipe) {
+    alert("Select a recipe first.");
+    return;
+  }
+  const rid = state.currentRecipe.id;
+  const params = collectParams(state.currentRecipe.params);
+  await runAndDisplay(rid, params);
+});
+
+function collectParams(paramNames) {
+  const p = {};
+  paramNames.forEach(name => {
+    const el = $("param_" + name);
+    if (!el || el.value === "") return;
+    let v = el.value;
+    if (v === "true" || v === "false") {
+      p[name] = (v === "true");
+    } else if (!isNaN(parseFloat(v)) && isFinite(v)) {
+      p[name] = parseFloat(v);
+    } else {
+      p[name] = v;
+    }
+  });
+  return p;
+}
+
| 306 |
+
// ════════════════════════════════════════════════════════════════════
|
| 307 |
+
// Ask mode (free-form question via router)
|
| 308 |
+
// ════════════════════════════════════════════════════════════════════
|
| 309 |
+
$("ask-btn").addEventListener("click", async () => {
  const q = $("question").value.trim();
  if (!q) {
    alert("Please type a question.");
    return;
  }
  $("ask-btn").disabled = true;
  setStatus("🤔 Asking the in-browser LLM to pick a recipe...");

  try {
    const route = await routeQuestion(q);
    setStatus(`📋 Selected recipe ${route.recipe_id}. Running...`);
    await runAndDisplay(route.recipe_id, route.params, q);
  } catch (err) {
    setStatus(`❌ Routing failed: ${err.message}`);
    $("output-section").style.display = "block";
    $("verdict-box").className = "verdict-no";
    $("verdict-box").innerHTML = `<strong>Could not route question.</strong><br>${escapeHtml(err.message)}<br><br>Try the Recipe mode for full manual control.`;
  } finally {
    $("ask-btn").disabled = false;
  }
});

$("example-btn").addEventListener("click", () => {
  const ex = EXAMPLES[Math.floor(Math.random() * EXAMPLES.length)];
  $("question").value = ex;
});

async function routeQuestion(question) {
  const engine = await loadWebLLM();
  const recipesDesc = state.recipes.map(r =>
    ` ${r.id}: ${r.name} — ${r.description}\n params: ${r.params.join(", ")}`
  ).join("\n");
  const systemPrompt = `You are a routing function. Given a user's free-form question
about transformer LLM viability, you MUST output a single JSON object with two fields:
- recipe_id: one of [${state.recipes.map(r => r.id).join(", ")}]
- params: an object with parameter values inferred from the question

Available recipes:
${recipesDesc}

Common model facts you may use:
Meta-Llama-3-8B: theta=500000, T_train=8192, n_attention_heads=32, n_kv_heads=8, d_head=128, n_layers=32, n_params=8e9
Mistral-7B-v0.1: theta=10000, T_train=8192, n_attention_heads=32, n_kv_heads=8, d_head=128, n_layers=32, n_params=7e9, has_SWA=true
Qwen2.5-7B: theta=1000000, T_train=32768, n_attention_heads=28, n_kv_heads=4, d_head=128, n_layers=28, n_params=7.6e9
Llama-3.3-70B-Instruct: theta=500000, T_train=131072, n_attention_heads=64, n_kv_heads=8, d_head=128, n_layers=80, n_params=70e9

Respond with ONLY the JSON object. No prose, no markdown fences, no explanation.`;

  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: question },
    ],
    max_tokens: 400,
    temperature: 0.0,
    response_format: { type: "json_object" },
  });
  const raw = reply.choices[0].message.content.trim();
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // Try extracting JSON from markdown fences
    const m = raw.match(/\{[\s\S]*\}/);
    if (!m) throw new Error(`LLM returned non-JSON: ${raw.slice(0, 200)}`);
    parsed = JSON.parse(m[0]);
  }
  if (!parsed.recipe_id || !state.recipesById[parsed.recipe_id]) {
    throw new Error(`Unknown recipe: ${parsed.recipe_id}`);
  }
  return parsed;
}
// ════════════════════════════════════════════════════════════════════
// Run + display + synthesize
// ════════════════════════════════════════════════════════════════════
async function runAndDisplay(recipeId, params, originalQuestion = null) {
  setStatus("🧮 Computing TAF chain...");
  state.pyodide.globals.set("__rid", recipeId);
  state.pyodide.globals.set("__params", state.pyodide.toPy(params));
  const resultJSON = state.pyodide.runPython(`
import json
result = run_recipe(__rid, **__params)
json.dumps(result)
`);
  const result = JSON.parse(resultJSON);
  result._original_question = originalQuestion;
  renderResult(result);
  $("output-section").style.display = "block";
  setStatus("✅ Done. Numbers below.");
  if (ENABLE_WEBLLM) {
    await synthesizeAnswer(result);
  }
}

function renderResult(r) {
  if (r.error) {
    $("verdict-box").className = "verdict-no";
    $("verdict-box").innerHTML = `<strong>Error</strong>: ${escapeHtml(r.error)}`;
    $("chain-box").innerHTML = "";
    return;
  }
  const vBox = $("verdict-box");
  let vClass = "";
  if (r.verdict.startsWith("YES") || r.verdict === "GO") vClass = "verdict-yes";
  else if (r.verdict.startsWith("NO")) vClass = "verdict-no";
  else vClass = "verdict-degraded";
  vBox.className = vClass;
  vBox.innerHTML = `
    <div style="display:flex; justify-content:space-between; align-items:center; margin-bottom:0.5rem;">
      <div style="font-size:1.3rem; font-weight:700;">${escapeHtml(r.verdict)}</div>
      <div class="recipe-tag">${r.recipe_id} — ${escapeHtml(r.recipe_name)}</div>
    </div>
    <div><strong>Reason:</strong> ${escapeHtml(r.reason)}</div>
    ${r.mitigation && r.mitigation !== "None required." && r.mitigation !== "None — proceed with Chinchilla-optimal recipe."
      ? `<div style="margin-top:0.5rem;"><strong>Action:</strong> ${escapeHtml(r.mitigation)}</div>`
      : ""}
  `;

  const cBox = $("chain-box");
  cBox.innerHTML = "";
  r.chain.forEach(step => {
    const div = document.createElement("details");
    div.className = "chain-step";
    div.innerHTML = `
      <summary>
        <span><strong>Step ${step.step}</strong> — ${escapeHtml(step.name)}</span>
        <span class="step-section">${escapeHtml(step.section)}</span>
      </summary>
      <div class="step-formula">${escapeHtml(step.formula)}</div>
      <div><strong>Inputs:</strong> ${escapeHtml(JSON.stringify(step.inputs))}</div>
      <div class="step-result"><strong>Result:</strong> ${formatResult(step.result)}</div>
      ${step.interpretation ? `<div class="step-interp">${escapeHtml(step.interpretation)}</div>` : ""}
    `;
    cBox.appendChild(div);
  });
}

function formatResult(r) {
  if (r === null || r === undefined) return "n/a (not applicable)";
  if (typeof r === "number") return r.toLocaleString(undefined, { maximumFractionDigits: 4 });
  if (typeof r === "object") return `<pre>${escapeHtml(JSON.stringify(r, null, 2))}</pre>`;
  return String(r);
}

function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

// ════════════════════════════════════════════════════════════════════
// WebLLM (synthesis + router)
// ════════════════════════════════════════════════════════════════════
async function loadWebLLM() {
  if (state.webllm) return state.webllm;
  setStatus("⏳ Loading WebLLM library + Llama-3.2-1B (~700MB first time, cached after)...");
  const { CreateMLCEngine } = await import("https://esm.run/@mlc-ai/web-llm");
  state.webllm = await CreateMLCEngine(WEBLLM_MODEL, {
    initProgressCallback: (info) => setStatus(`⏳ ${info.text || "Loading model..."}`),
  });
  return state.webllm;
}

async function synthesizeAnswer(result) {
  $("answer-header").style.display = "block";
  $("answer-box").style.display = "block";
  $("answer-box").innerHTML = '<em style="color:var(--fg-dim);">Generating plain-English summary...</em>';

  let engine;
  try {
    engine = await loadWebLLM();
  } catch (err) {
    $("answer-box").innerHTML = `<em style="color:var(--warning);">⚠ WebLLM failed: ${escapeHtml(String(err))}<br>Numbers above are still correct.</em>`;
    return;
  }
  const prompt = buildSynthesisPrompt(result);
  let answer = "";
  try {
    const reply = await engine.chat.completions.create({
      messages: [
        { role: "system", content: "You are a precise transformer LLM diagnostic assistant. Summarise pre-computed TAF results in 4-6 sentences. Cite section numbers. Always recommend an action. Never invent numbers." },
        { role: "user", content: prompt },
      ],
      max_tokens: 400,
      temperature: 0.2,
    });
    answer = reply.choices[0].message.content;
  } catch (err) {
    $("answer-box").innerHTML = `<em style="color:var(--warning);">⚠ Synthesis failed: ${escapeHtml(String(err))}</em>`;
    return;
  }
  $("answer-box").innerHTML = `
    <div style="white-space:pre-wrap; line-height:1.7;">${escapeHtml(answer)}</div>
    <div style="margin-top:0.75rem; font-size:0.85rem; color:var(--fg-dim);">
      ↑ Synthesised by Llama-3.2-1B in your browser. Numbers are deterministic Python.
    </div>
  `;
  setStatus("✅ Done.");
}

function buildSynthesisPrompt(r) {
  const numbersBlock = r.chain.map(s =>
    `Step ${s.step} (${s.section}) ${s.name}: ${formatResultPlain(s.result)} — ${s.interpretation || ""}`
  ).join("\n");
  return `Recipe: ${r.recipe_id} — ${r.recipe_name}
${r._original_question ? `User question: "${r._original_question}"\n` : ""}
Computed chain:
${numbersBlock}

Verdict: ${r.verdict}
Reason: ${r.reason}
Action: ${r.mitigation}

Summarize for non-technical user in 4-6 sentences. Cite section numbers (§X.Y). Mention verdict and most important action.`;
}

function formatResultPlain(r) {
  if (r === null || r === undefined) return "n/a";
  if (typeof r === "number") return r.toLocaleString(undefined, { maximumFractionDigits: 4 });
  if (typeof r === "object") return JSON.stringify(r);
  return String(r);
}

// ════════════════════════════════════════════════════════════════════
// Bootstrap
// ════════════════════════════════════════════════════════════════════
loadPyodideAndTaf().catch(err => {
  setStatus(`❌ Failed to initialise: ${err.message || err}`);
  console.error(err);
});
@@ -0,0 +1,793 @@
"""
TAF Browser — Pyodide-compatible TAF formulas + recipes.

Pure-Python deterministic computations of TAF (Thermodynamic Attention Framework)
formulas, plus 5 cross-section recipes for the most common viability questions.

Author: Carles Marin <transformerkmarin@gmail.com>
License: Apache-2.0
"""
from __future__ import annotations
import math
import json


# ════════════════════════════════════════════════════════════════════════════
# §26 — γ-Thermodynamics (OUR contribution)
# ════════════════════════════════════════════════════════════════════════════
def gamma_pade(theta: float, T_eval: int) -> float:
    """§26.1 — γ = (2θ - T√2)/(2θ + T√2)"""
    z_sqrt2 = T_eval * math.sqrt(2)
    return (2 * theta - z_sqrt2) / (2 * theta + z_sqrt2)

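As a quick sanity check of §26.1, the formula can be exercised standalone (this sketch restates `gamma_pade` so it runs on its own; the θ=500000, T=8192 figures are the Meta-Llama-3-8B facts listed in the router prompt in main.js):

```python
import math

def gamma_pade(theta: float, T_eval: int) -> float:
    # §26.1 — γ = (2θ - T√2)/(2θ + T√2)
    z = T_eval * math.sqrt(2)
    return (2 * theta - z) / (2 * theta + z)

# Meta-Llama-3-8B preset: theta=500000, evaluated at its training length 8192
g = gamma_pade(500000, 8192)
print(round(g, 4))  # 0.9771 — well inside the γ < 1 regime at native length
```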
def gamma_decompose(gamma_pade_val, has_GQA=False, has_SWA=False, n_params=0.0) -> dict:
    """§26.10 — 5-axis decomposition (n=23 OLS, paper session 28)."""
    delta_GQA = +0.11 if has_GQA else 0.0
    delta_SWA = -0.21 if has_SWA else 0.0
    delta_post_IH = -0.15 if n_params >= 4e8 else 0.0
    return {
        "pade_centroid": gamma_pade_val,
        "delta_GQA": delta_GQA,
        "delta_SWA": delta_SWA,
        "delta_post_IH": delta_post_IH,
        "gamma_corrected": gamma_pade_val + delta_GQA + delta_SWA + delta_post_IH,
    }


def d_horizon(theta: float, gamma: float):
    """§26.2 — d_h = θ(1-γ)√2/(1+γ). None if γ outside (0,1)."""
    if gamma <= 0 or gamma >= 1:
        return None
    return theta * (1 - gamma) * math.sqrt(2) / (1 + gamma)

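A worked example of the horizon formula (a standalone restatement of `d_horizon`; the θ=10000, γ=0.5 inputs are illustrative, not a real model preset):

```python
import math

def d_horizon(theta: float, gamma: float):
    # §26.2 — d_h = θ(1-γ)√2/(1+γ); None when γ is outside (0,1)
    if gamma <= 0 or gamma >= 1:
        return None
    return theta * (1 - gamma) * math.sqrt(2) / (1 + gamma)

print(d_horizon(10000, 0.5))  # ≈ 4714 tokens
print(d_horizon(10000, 1.2))  # None — no finite horizon outside (0,1)
```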
def l_niah_c(d_horizon_val):
    """§26.5 — L_NIAH^c = 2·d_horizon."""
    return None if d_horizon_val is None else 2 * d_horizon_val


def chi_susceptibility(gamma: float) -> float:
    """§26.16 — χ = 1/|γ-1|."""
    return float('inf') if gamma == 1.0 else 1.0 / abs(gamma - 1.0)


def p_hallucinate(L: int, theta: float, gamma: float):
    """§26.9 — Horizon-overshoot probability."""
    dh = d_horizon(theta, gamma)
    if dh is None or L <= 0:
        return None
    chi = chi_susceptibility(gamma)
    if chi == float('inf'):
        return None
    geom = max(0.0, 1.0 - (dh / L) ** (1 - gamma))
    return geom * (math.sqrt(chi) / (1 + math.sqrt(chi)))

def theta_design(gamma_target: float, T_eval: int) -> float:
    """§26.3 — θ to land at γ_target at T_eval (Padé inverse)."""
    if gamma_target >= 1 or gamma_target <= -1:
        raise ValueError("gamma_target must be in (-1, 1)")
    return T_eval * math.sqrt(2) * (1 + gamma_target) / (2 * (1 - gamma_target))


def alpha_opt(gamma_target: float, T_eval: int, theta_nominal: float) -> float:
    """§26.4 — α = θ_design / θ_nominal."""
    return theta_design(gamma_target, T_eval) / theta_nominal

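`theta_design` is the exact algebraic inverse of `gamma_pade`, so a round trip recovers γ_target; a standalone sketch restating both formulas (γ=0.85 at T_eval=131072 matches the mitigation messages in recipe X-2 below, but any values in range work):

```python
import math

def gamma_pade(theta, T_eval):
    # §26.1 — γ = (2θ - T√2)/(2θ + T√2)
    z = T_eval * math.sqrt(2)
    return (2 * theta - z) / (2 * theta + z)

def theta_design(gamma_target, T_eval):
    # §26.3 — Padé inverse: θ = T√2(1+γ) / (2(1-γ))
    return T_eval * math.sqrt(2) * (1 + gamma_target) / (2 * (1 - gamma_target))

# Design θ for γ=0.85 at a 131072-token eval length, then plug it back in
theta = theta_design(0.85, 131072)
assert abs(gamma_pade(theta, 131072) - 0.85) < 1e-9
```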
def df_window(gamma: float, N: int, f: float = 0.90):
    """§26.7 — KV compression window. None outside the [0.65, 0.85] zone."""
    if not (0.65 <= gamma <= 0.85):
        return None
    if gamma >= 1:
        return int(f * N)
    inner = (1 - f) + f * N ** (1 - gamma)
    return int(math.ceil(inner ** (1 / (1 - gamma))))


def kv_soft_decay_regime(theta: float, gamma: float, T_train: int) -> str:
    """§26.8 — Soft-decay regime bound. d_h ≳ T_train/2 ⇒ applies."""
    dh = d_horizon(theta, gamma)
    if dh is None:
        return "use-hard-cutoff"
    ratio = dh / max(1, T_train / 2)
    if ratio >= 1.2:
        return "applies"
    if ratio >= 0.8:
        return "borderline"
    return "use-hard-cutoff"


# ════════════════════════════════════════════════════════════════════════════
# §17 — Pre-training viability formulas
# ════════════════════════════════════════════════════════════════════════════
def chinchilla_optimal_tokens(N_params: float, ratio: float = 20.0) -> float:
    """§17.30 — Chinchilla 20:1 token budget. D = ratio · N."""
    return ratio * N_params


def chinchilla_optimal_N(D_tokens: float, ratio: float = 20.0) -> float:
    """§17.30 inverse — given D tokens, optimal N = D/20."""
    return D_tokens / ratio


def training_flops(N_params: float, D_tokens: float) -> float:
    """§17.10 — C ≈ 6·N·D total training FLOPs."""
    return 6 * N_params * D_tokens

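The two §17 formulas compose directly; for example, for a hypothetical 7B model (a standalone restatement of both helpers):

```python
def chinchilla_optimal_tokens(N_params, ratio=20.0):
    return ratio * N_params          # §17.30 — D = 20·N

def training_flops(N_params, D_tokens):
    return 6 * N_params * D_tokens   # §17.10 — C ≈ 6·N·D

N = 7e9                              # a 7B model
D = chinchilla_optimal_tokens(N)     # 1.4e11 tokens (140B)
C = training_flops(N, D)             # 5.88e21 FLOPs
print(f"{D:.2e} tokens, {C:.2e} FLOPs")
```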
def training_memory_16N(N_params: float) -> dict:
    """§17.20 — total memory ≈ 16·N bytes (model + grads + Adam moments)."""
    bytes_total = 16 * N_params
    return {
        "bytes": bytes_total,
        "GB": bytes_total / 1e9,
    }


def emergent_threshold(N_params: float) -> str:
    """§17.60 — capability threshold heuristic (Wei 2022)."""
    if N_params >= 1e11:
        return "above 100B — strong reasoning capabilities expected"
    if N_params >= 1e10:
        return "above 10B — most emergent capabilities present"
    if N_params >= 1e9:
        return "above 1B — basic instruction-following, not strong reasoning"
    if N_params >= 1e8:
        return "above 100M — useful for narrow tasks, no emergence"
    return "below 100M — domain-specific tasks only"


# ════════════════════════════════════════════════════════════════════════════
# §19 — Inference economics
# ════════════════════════════════════════════════════════════════════════════
def kv_cache_memory(n_layers, n_kv_heads, d_head, seq_len, bytes_per_element=2.0) -> dict:
    """§19.1 — bytes = 2·L·n_kv·d_h·seq·B."""
    bytes_total = 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_element
    return {"bytes": bytes_total, "MB": bytes_total / 1e6, "GB": bytes_total / 1e9}

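Plugging the Meta-Llama-3-8B geometry from the router prompt in main.js (32 layers, 8 KV heads, d_head=128) into §19.1 at a 131072-token context (a standalone restatement of the helper):

```python
def kv_cache_memory(n_layers, n_kv_heads, d_head, seq_len, bytes_per_element=2.0):
    # §19.1 — bytes = 2 · L · n_kv · d_h · seq · B  (K and V, all layers)
    b = 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_element
    return {"bytes": b, "MB": b / 1e6, "GB": b / 1e9}

# Meta-Llama-3-8B geometry at a 131072-token context, BF16 cache
kv = kv_cache_memory(32, 8, 128, 131072)
print(f"{kv['GB']:.2f} GB per request")  # 17.18 GB per request
```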
def model_weights_memory(N_params, bytes_per_element=2.0) -> dict:
    """Inference memory for model weights only (BF16=2, INT8=1, INT4=0.5)."""
    return {"GB": N_params * bytes_per_element / 1e9}


def inference_decode_throughput(N_params, hbm_GB_per_s, bytes_per_element=2.0) -> float:
    """§19.7 — memory-bound decode: tokens/sec = HBM_BW / model_size."""
    model_GB = N_params * bytes_per_element / 1e9
    return hbm_GB_per_s / model_GB


# ════════════════════════════════════════════════════════════════════════════
# §20 — Hardware catalog (curated from vendor docs 2026)
# ════════════════════════════════════════════════════════════════════════════
GPU_CATALOG = {
    # name: {bf16_TFLOPs, hbm_GB, hbm_GB_s, cloud_USD_per_h_spot, tdp_W}
    "H100 SXM":  {"flops": 989,  "vram_GB": 80,  "bw_GB_s": 3350, "usd_h": 2.5, "tdp": 700},
    "H100 PCIe": {"flops": 756,  "vram_GB": 80,  "bw_GB_s": 2000, "usd_h": 2.0, "tdp": 350},
    "H200":      {"flops": 989,  "vram_GB": 141, "bw_GB_s": 4800, "usd_h": 3.5, "tdp": 700},
    "B200":      {"flops": 2250, "vram_GB": 192, "bw_GB_s": 8000, "usd_h": 5.0, "tdp": 1000},
    "A100 80GB": {"flops": 312,  "vram_GB": 80,  "bw_GB_s": 2000, "usd_h": 1.2, "tdp": 400},
    "A100 40GB": {"flops": 312,  "vram_GB": 40,  "bw_GB_s": 1555, "usd_h": 1.0, "tdp": 400},
    "L40S":      {"flops": 362,  "vram_GB": 48,  "bw_GB_s": 864,  "usd_h": 0.7, "tdp": 350},
    "MI300X":    {"flops": 1307, "vram_GB": 192, "bw_GB_s": 5300, "usd_h": 2.1, "tdp": 750},
    "RTX 4090":  {"flops": 165,  "vram_GB": 24,  "bw_GB_s": 1008, "usd_h": 0.4, "tdp": 450},
    "RTX 5090":  {"flops": 419,  "vram_GB": 32,  "bw_GB_s": 1792, "usd_h": 0.7, "tdp": 575},
    "RTX 5060Ti":{"flops": 36,   "vram_GB": 16,  "bw_GB_s": 448,  "usd_h": 0.0, "tdp": 180},  # local
}

def cost_per_training_run(N_params: float, D_tokens: float, gpu: str = "H100 SXM",
                          n_gpus: int = 8, mfu: float = 0.45) -> dict:
    """§20.11 — cost = (flops_total / (peak·MFU·n_gpus)) · USD/h."""
    info = GPU_CATALOG.get(gpu)
    if info is None:
        return {"error": f"unknown gpu '{gpu}'", "available": list(GPU_CATALOG.keys())}
    total_flops = training_flops(N_params, D_tokens)  # absolute FLOPs
    effective_flops_per_sec = info["flops"] * 1e12 * mfu * n_gpus
    seconds = total_flops / effective_flops_per_sec
    hours = seconds / 3600
    usd = hours * info["usd_h"] * n_gpus
    return {
        "total_FLOPs": total_flops,
        "hours": hours,
        "days": hours / 24,
        "USD": usd,
        "gpu": gpu, "n_gpus": n_gpus, "mfu": mfu,
    }

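A standalone sketch of the §20.11 arithmetic with the H100 SXM catalog numbers inlined as explicit arguments (the `peak_tflops` and `usd_h` parameter names are invented for this sketch; the catalog function above takes a GPU name instead):

```python
def training_cost_sketch(N_params, D_tokens, peak_tflops, usd_h, n_gpus=8, mfu=0.45):
    # §20.11 — wall-clock from C = 6·N·D against sustained (peak · MFU · n_gpus) FLOP/s
    total_flops = 6 * N_params * D_tokens
    seconds = total_flops / (peak_tflops * 1e12 * mfu * n_gpus)
    hours = seconds / 3600
    return {"hours": hours, "days": hours / 24, "USD": hours * usd_h * n_gpus}

# 7B model at Chinchilla D=20·N, on 8× H100 SXM (989 BF16 TFLOPs, $2.5/h spot)
c = training_cost_sketch(7e9, 1.4e11, peak_tflops=989, usd_h=2.5)
print(f"${c['USD']:,.0f} over {c['days']:.1f} days")  # $9,175 over 19.1 days
```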
def cost_per_inference_token(model_GB: float, gpu: str, batch: int = 1) -> dict:
    """§19.9 / §20.12 — derived $/Mtok from memory-bound decode."""
    info = GPU_CATALOG.get(gpu)
    if info is None:
        return {"error": f"unknown gpu '{gpu}'"}
    tok_per_sec = info["bw_GB_s"] / model_GB * batch
    sec_per_Mtok = 1e6 / tok_per_sec
    h_per_Mtok = sec_per_Mtok / 3600
    usd_per_Mtok = h_per_Mtok * info["usd_h"]
    return {
        "tok_per_sec": tok_per_sec,
        "USD_per_Mtok": usd_per_Mtok,
        "gpu": gpu, "batch": batch,
    }


# ════════════════════════════════════════════════════════════════════════════
# §24 — Cost / ROI
# ════════════════════════════════════════════════════════════════════════════
API_PRICING = {
    # USD per million tokens (input/output blended typical)
    "GPT-4o":          {"input": 2.5,  "output": 10.0},
    "GPT-4o-mini":     {"input": 0.15, "output": 0.60},
    "Claude-Opus-4":   {"input": 15.0, "output": 75.0},
    "Claude-Sonnet-4": {"input": 3.0,  "output": 15.0},
    "Claude-Haiku-4":  {"input": 0.80, "output": 4.0},
    "Gemini-1.5-Pro":  {"input": 1.25, "output": 5.0},
    "DeepSeek-V3":     {"input": 0.27, "output": 1.10},
    "Llama-3.3-70B (Together)": {"input": 0.88, "output": 0.88},
}

def break_even_volume(training_cost: float, self_inference_per_Mtok: float,
                      api_per_Mtok: float, blend_input_output: float = 0.5) -> dict:
    """§24.3 — monthly tokens at which custom training pays off."""
    savings_per_Mtok = api_per_Mtok - self_inference_per_Mtok
    if savings_per_Mtok <= 0:
        return {"error": "self-host more expensive than API per token; never breaks even"}
    Mtok_breakeven = training_cost / savings_per_Mtok
    return {
        "savings_per_Mtok": savings_per_Mtok,
        "Mtok_breakeven": Mtok_breakeven,
        "tokens_breakeven": Mtok_breakeven * 1e6,
    }

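A standalone sketch of the §24.3 break-even idea; the $9,175 training cost, the $0.50/Mtok self-serve figure, and the single-number API price are illustrative assumptions, and this simplified helper returns a bare number rather than the dict above:

```python
def break_even_sketch(training_cost, self_per_Mtok, api_per_Mtok):
    # §24.3 — token volume at which the one-off training cost is recouped
    savings = api_per_Mtok - self_per_Mtok
    if savings <= 0:
        return None  # self-hosting never breaks even on per-token price alone
    return training_cost / savings

# Hypothetical: $9,175 training run; $0.50/Mtok self-serve vs $0.60/Mtok API
mtok = break_even_sketch(9175, 0.50, 0.60)
print(f"breaks even after {mtok:,.0f} Mtok")  # breaks even after 91,750 Mtok
```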
# ════════════════════════════════════════════════════════════════════════════
# RECIPES
# ════════════════════════════════════════════════════════════════════════════

# ─────────────────────────────────────────────────────────────────────
# X-2 — Long Context Viability
# ─────────────────────────────────────────────────────────────────────
def run_recipe_x2(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
                  d_head, n_layers, n_params, has_SWA=False,
                  bytes_per_element=2.0, **_unused):
    """X-2: will model M serve length L doing NIAH retrieval?"""
    chain = []
    g_pade = gamma_pade(theta, T_eval)
    chain.append(_step(1, "§26.1", "γ_Padé", "γ = (2θ - T√2)/(2θ + T√2)",
                       {"theta": theta, "T_eval": T_eval}, g_pade,
                       _phase_label(g_pade)))

    has_GQA = (n_kv_heads < n_attention_heads)
    decomp = gamma_decompose(g_pade, has_GQA=has_GQA, has_SWA=has_SWA, n_params=n_params)
    g_corr = decomp["gamma_corrected"]
    chain.append(_step(2, "§26.10", "γ-decomposition", "γ + δ_GQA + δ_SWA + δ_post_IH",
                       {"has_GQA": has_GQA, "has_SWA": has_SWA, "n_params": n_params},
                       g_corr, breakdown=decomp))

    dh = d_horizon(theta, g_corr)
    chain.append(_step(3, "§26.2", "d_horizon", "d_h = θ(1-γ)√2/(1+γ)",
                       {"theta": theta, "gamma": g_corr}, dh,
                       "n/a — γ outside (0,1)" if dh is None else f"horizon at d={dh:.0f}"))

    l_niah = l_niah_c(dh)
    chain.append(_step(4, "§26.5", "L_NIAH^c", "L_NIAH^c = 2·d_horizon",
                       {"d_horizon": dh}, l_niah,
                       "n/a" if l_niah is None else f"NIAH 50% at L={l_niah:.0f}"))

    p_hallu = p_hallucinate(T_eval, theta, g_corr)
    chain.append(_step(5, "§26.9", "P_hallucinate", "max(0,1-(d_h/L)^(1-γ))·√χ/(1+√χ)",
                       {"L": T_eval, "theta": theta, "gamma": g_corr}, p_hallu,
                       "n/a (Phase B)" if p_hallu is None else f"{p_hallu*100:.1f}% predicted"))

    kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval, bytes_per_element)
    chain.append(_step(6, "§19.1", "KV cache memory", "2·L·n_kv·d_h·seq·B",
                       {"n_layers": n_layers, "n_kv_heads": n_kv_heads, "d_head": d_head,
                        "seq_len": T_eval, "bytes_per_element": bytes_per_element},
                       kv, f"{kv['GB']:.2f} GB per request"))

    if g_corr <= 0 or g_corr >= 1:
        verdict, reason = "NO", "Phase B / geometric collapse (γ_corrected outside (0,1))"
        mit = (f"Apply NTK-aware extension. Required θ for γ=0.85: "
               f"{theta_design(0.85, T_eval):,.0f}. α_opt = {alpha_opt(0.85, T_eval, theta):.2f} "
               f"({'fine-tuning required' if alpha_opt(0.85, T_eval, theta) > 8 else 'zero-shot may work'}).")
    elif dh is not None and T_eval < dh:
        margin = (1 - T_eval / dh) * 100
        verdict, reason = "YES", f"L={T_eval} inside d_horizon={dh:.0f} ({margin:.0f}% margin)."
        mit = "None required."
    elif dh is not None and T_eval < l_niah:
        verdict, reason = "DEGRADED", f"L between d_horizon ({dh:.0f}) and L_NIAH^c ({l_niah:.0f})."
        mit = "Consider context contraction OR NTK extension."
    else:
        verdict, reason = "NO", f"L={T_eval} exceeds NIAH ceiling {l_niah:.0f}."
        mit = f"Apply NTK extension; need θ ≈ {theta_design(0.85, T_eval):,.0f} for γ=0.85."

    return _wrap("X-2", "Long Context Viability", locals(), chain, verdict, reason, mit)

# ─────────────────────────────────────────────────────────────────────
# X-1 — Custom training vs API for a domain task
# ─────────────────────────────────────────────────────────────────────
def run_recipe_x1(N_params, D_tokens=None, gpu="H100 SXM", n_gpus=8, mfu=0.45,
                  api_model="GPT-4o", monthly_tokens_M=10.0, **_unused):
    """X-1: custom training (Chinchilla optimal) vs API."""
    chain = []

    # Step 1: Chinchilla optimal D
    if D_tokens is None:
        D_tokens = chinchilla_optimal_tokens(N_params)
    chain.append(_step(1, "§17.30", "Chinchilla optimal D", "D = 20·N",
                       {"N_params": N_params}, D_tokens,
                       f"recommended D = {D_tokens:.2e} tokens"))

    # Step 2: training FLOPs
    flops = training_flops(N_params, D_tokens)
    chain.append(_step(2, "§17.10", "Training FLOPs", "C = 6·N·D",
                       {"N": N_params, "D": D_tokens}, flops,
                       f"{flops:.2e} FLOPs total"))

    # Step 3: training cost
    cost = cost_per_training_run(N_params, D_tokens, gpu=gpu, n_gpus=n_gpus, mfu=mfu)
    chain.append(_step(3, "§20.11", "Training cost",
                       "hours·USD/h·n_gpus = total $",
                       {"gpu": gpu, "n_gpus": n_gpus, "mfu": mfu}, cost,
                       f"${cost['USD']:,.0f} over {cost['days']:.1f} days"))

    # Step 4: model_GB and decode throughput
    model_GB = N_params * 2 / 1e9  # BF16
    inf = cost_per_inference_token(model_GB, gpu, batch=1)
    chain.append(_step(4, "§19.9 / §20.12", "Self-inference $/Mtok",
                       "BW / model_GB → tok/s → $/Mtok",
                       {"model_GB": model_GB, "gpu": gpu}, inf,
                       f"${inf['USD_per_Mtok']:.2f} per million tokens (single user)"))

    # Step 5: API blended price
    api = API_PRICING.get(api_model, {"input": 2.0, "output": 8.0})
    api_blend = (api["input"] + api["output"]) / 2
chain.append(_step(5, "§24.X", f"{api_model} blended price",
|
| 351 |
+
"(input + output) / 2 USD/Mtok",
|
| 352 |
+
{"api_model": api_model}, api_blend,
|
| 353 |
+
f"${api_blend:.2f}/Mtok blended"))
|
| 354 |
+
|
| 355 |
+
# Step 6: break-even
|
| 356 |
+
be = break_even_volume(cost["USD"], inf["USD_per_Mtok"], api_blend)
|
| 357 |
+
chain.append(_step(6, "§24.3", "Break-even tokens", "training$ / (api - self) = Mtok",
|
| 358 |
+
{"training_cost": cost["USD"]}, be,
|
| 359 |
+
_be_interp(be, monthly_tokens_M)))
|
| 360 |
+
|
| 361 |
+
# Verdict
|
| 362 |
+
if "error" in be:
|
| 363 |
+
verdict, reason = "NO", be["error"]
|
| 364 |
+
mit = f"Stick with {api_model} API."
|
| 365 |
+
elif monthly_tokens_M >= be["Mtok_breakeven"]:
|
| 366 |
+
verdict = "YES (custom)"
|
| 367 |
+
months_to_payoff = be["Mtok_breakeven"] / monthly_tokens_M
|
| 368 |
+
reason = (f"At {monthly_tokens_M} M tokens/month, break-even in "
|
| 369 |
+
f"{months_to_payoff:.1f} months. Long-term custom is cheaper.")
|
| 370 |
+
mit = f"Train at {gpu}×{n_gpus}; serve self-hosted."
|
| 371 |
+
else:
|
| 372 |
+
months = be["Mtok_breakeven"] / monthly_tokens_M
|
| 373 |
+
verdict = "NO (API)"
|
| 374 |
+
reason = (f"At {monthly_tokens_M} M tokens/month, break-even in "
|
| 375 |
+
f"{months:.1f} months — too slow.")
|
| 376 |
+
mit = f"Use {api_model} API (cheaper for your volume)."
|
| 377 |
+
|
| 378 |
+
return _wrap("X-1", "Custom training vs API", locals(), chain, verdict, reason, mit)
|
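Step 6's break-even arithmetic (`training$ / (api - self) = Mtok`) can be sketched in isolation. A minimal sketch with illustrative names; the real `break_even_volume` is defined elsewhere in this file and may return additional fields.

```python
def break_even_mtok(training_usd, self_usd_per_mtok, api_usd_per_mtok):
    """Mtok volume at which the training outlay is recouped by the per-token saving."""
    saving = api_usd_per_mtok - self_usd_per_mtok
    if saving <= 0:
        # Self-hosting never pays off if it is not cheaper per token.
        return {"error": "self-hosting is not cheaper per token"}
    return {"Mtok_breakeven": training_usd / saving}

# $100K training run, $0.50/Mtok self-serve vs $2.50/Mtok API:
be = break_even_mtok(100_000, 0.50, 2.50)
```

At 10 M tokens/month, 50,000 Mtok of break-even volume means a ~417-year payoff, which is why the verdict logic compares `monthly_tokens_M` against `Mtok_breakeven` directly.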
| 379 |
+
|
| 380 |
+
|
| 381 |
+
def _be_interp(be, monthly):
|
| 382 |
+
if "error" in be:
|
| 383 |
+
return be["error"]
|
| 384 |
+
months = be["Mtok_breakeven"] / max(monthly, 0.001)
|
| 385 |
+
return f"break-even at {be['Mtok_breakeven']:.0f} Mtok ({months:.1f} months at {monthly} M/mo)"
|
| 386 |
+
|
| 387 |
+
|
| 388 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 389 |
+
# X-3 — Pre-flight check on $5K training budget
|
| 390 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 391 |
+
def run_recipe_x3(USD_budget=5000.0, gpu="H100 SXM", mfu=0.45, n_gpus=1, **_unused):
|
| 392 |
+
"""X-3: given $ budget, what model can I train?"""
|
| 393 |
+
chain = []
|
| 394 |
+
info = GPU_CATALOG[gpu]
|
| 395 |
+
|
| 396 |
+
# Step 1: GPU-hours we can afford
|
| 397 |
+
hours = USD_budget / (info["usd_h"] * n_gpus)
|
| 398 |
+
chain.append(_step(1, "§20.11", "Affordable GPU-hours", "USD / ($/h·n_gpus)",
|
| 399 |
+
{"USD": USD_budget, "gpu": gpu, "n_gpus": n_gpus}, hours,
|
| 400 |
+
f"{hours:.0f} GPU-hours total ({hours/24:.1f} days at full use)"))
|
| 401 |
+
|
| 402 |
+
# Step 2: max FLOPs
|
| 403 |
+
max_flops = info["flops"] * 1e12 * mfu * n_gpus * hours * 3600
|
| 404 |
+
chain.append(_step(2, "§17.10", "Max training FLOPs",
|
| 405 |
+
"peak·MFU·n_gpus·seconds",
|
| 406 |
+
{"peak_TFLOPs": info["flops"], "MFU": mfu}, max_flops,
|
| 407 |
+
f"{max_flops:.2e} effective FLOPs"))
|
| 408 |
+
|
| 409 |
+
# Step 3: Chinchilla-optimal N (with D=20N)
|
| 410 |
+
# 6·N·D = max_flops, D=20N → 120·N² = max_flops → N = sqrt(max_flops/120)
|
| 411 |
+
N_chinchilla = math.sqrt(max_flops / 120)
|
| 412 |
+
D_chinchilla = 20 * N_chinchilla
|
| 413 |
+
chain.append(_step(3, "§17.30", "Chinchilla-optimal N",
|
| 414 |
+
"N = √(C/120) at D=20N", {"max_FLOPs": max_flops},
|
| 415 |
+
N_chinchilla,
|
| 416 |
+
f"N ≈ {N_chinchilla:.2e} params with D = {D_chinchilla:.2e} tokens"))
|
| 417 |
+
|
| 418 |
+
# Step 4: emergence check
|
| 419 |
+
emerg = emergent_threshold(N_chinchilla)
|
| 420 |
+
chain.append(_step(4, "§17.60", "Emergence threshold", "Wei 2022 capability",
|
| 421 |
+
{"N": N_chinchilla}, emerg, emerg))
|
| 422 |
+
|
| 423 |
+
# Step 5: memory budget check
|
| 424 |
+
mem = training_memory_16N(N_chinchilla)
|
| 425 |
+
fits = mem["GB"] <= info["vram_GB"]
|
| 426 |
+
chain.append(_step(5, "§17.20", "16N training memory",
|
| 427 |
+
"model + grads + AdamW",
|
| 428 |
+
{"N": N_chinchilla}, mem,
|
| 429 |
+
f"{mem['GB']:.1f} GB needed; "
|
| 430 |
+
f"{'fits in ' if fits else 'EXCEEDS '}{info['vram_GB']} GB VRAM"))
|
| 431 |
+
|
| 432 |
+
# Verdict
|
| 433 |
+
if N_chinchilla < 1e8:
|
| 434 |
+
verdict, reason = "TINY-MODEL", f"Budget supports only ~{N_chinchilla:.0e} params"
|
| 435 |
+
mit = "Use LoRA fine-tuning of larger pretrained model instead."
|
| 436 |
+
elif not fits:
|
| 437 |
+
verdict, reason = "MEMORY-LIMITED", f"Chinchilla N ({N_chinchilla:.1e}) doesn't fit one {gpu}"
|
| 438 |
+
mit = f"Use ZeRO-3 across multiple GPUs (need ≥{math.ceil(mem['GB']/info['vram_GB'])}× {gpu}) OR train smaller N undertrained."
|
| 439 |
+
else:
|
| 440 |
+
verdict = "GO"
|
| 441 |
+
reason = (f"At ${USD_budget}, train {N_chinchilla:.1e}-param model on "
|
| 442 |
+
f"{D_chinchilla:.1e} tokens in ~{hours/24:.1f} days. "
|
| 443 |
+
f"Capability tier: {emerg.split('—')[0].strip()}.")
|
| 444 |
+
mit = "None — proceed with Chinchilla-optimal recipe."
|
| 445 |
+
|
| 446 |
+
return _wrap("X-3", "Budget pre-flight", locals(), chain, verdict, reason, mit)
|
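Steps 1–3 of X-3 chain together as budget → GPU-hours → effective FLOPs → Chinchilla-optimal N (from `6·N·D = C` with `D = 20N`, so `C = 120·N²`). A standalone sketch of that algebra, with illustrative parameter names:

```python
import math

def chinchilla_n_from_budget(usd, usd_per_gpu_hour, peak_tflops, mfu=0.45, n_gpus=1):
    """Budget → affordable GPU-hours → effective FLOPs → N = √(C/120), D = 20N."""
    hours = usd / (usd_per_gpu_hour * n_gpus)
    flops = peak_tflops * 1e12 * mfu * n_gpus * hours * 3600
    n = math.sqrt(flops / 120)  # 6·N·D with D = 20N  ⇒  C = 120·N²
    return {"hours": hours, "flops": flops, "N": n, "D": 20 * n}

r = chinchilla_n_from_budget(usd=1200, usd_per_gpu_hour=1.2, peak_tflops=100, mfu=0.5)
```

The square-root relationship is the key intuition: a 4× larger budget only buys a 2× larger Chinchilla-optimal model.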
| 447 |
+
|
| 448 |
+
|
| 449 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 450 |
+
# X-5 — Hardware selection for serving
|
| 451 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 452 |
+
def run_recipe_x5(N_params, T_eval=4096, n_layers=32, n_kv_heads=8, d_head=128,
|
| 453 |
+
bytes_per_weight=2.0, target_tokens_per_day=10_000_000.0,
|
| 454 |
+
concurrent_users=1, **_unused):
|
| 455 |
+
"""X-5: which GPU should I use to serve N-param model at L context?"""
|
| 456 |
+
chain = []
|
| 457 |
+
|
| 458 |
+
# Step 1: weights memory
|
| 459 |
+
w_mem = model_weights_memory(N_params, bytes_per_weight)
|
| 460 |
+
chain.append(_step(1, "§19.X", "Model weights memory",
|
| 461 |
+
"N · bytes_per_weight",
|
| 462 |
+
{"N": N_params, "bytes": bytes_per_weight}, w_mem,
|
| 463 |
+
f"{w_mem['GB']:.1f} GB for weights"))
|
| 464 |
+
|
| 465 |
+
# Step 2: KV cache per request
|
| 466 |
+
kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval, bytes_per_weight)
|
| 467 |
+
chain.append(_step(2, "§19.1", "KV cache (per request)",
|
| 468 |
+
"2·L·n_kv·d_h·seq·B",
|
| 469 |
+
{"n_layers": n_layers, "n_kv": n_kv_heads,
|
| 470 |
+
"d_head": d_head, "seq": T_eval}, kv,
|
| 471 |
+
f"{kv['GB']:.2f} GB per concurrent request"))
|
| 472 |
+
|
| 473 |
+
# Step 3: total memory needed
|
| 474 |
+
total_GB = w_mem["GB"] + kv["GB"] * concurrent_users
|
| 475 |
+
chain.append(_step(3, "§20.3", "Total GPU memory",
|
| 476 |
+
"weights + KV·n_concurrent", {}, {"GB": total_GB},
|
| 477 |
+
f"{total_GB:.1f} GB for {concurrent_users} concurrent users"))
|
| 478 |
+
|
| 479 |
+
# Step 4: scan GPU catalog
|
| 480 |
+
candidates = []
|
| 481 |
+
for name, info in GPU_CATALOG.items():
|
| 482 |
+
if info["vram_GB"] < total_GB:
|
| 483 |
+
continue
|
| 484 |
+
# Decode throughput estimate (memory-bound)
|
| 485 |
+
tok_per_s = info["bw_GB_s"] / w_mem["GB"]
|
| 486 |
+
tok_per_day = tok_per_s * 86400
|
| 487 |
+
capacity_users = tok_per_day / target_tokens_per_day
|
| 488 |
+
usd_per_day = info["usd_h"] * 24
|
| 489 |
+
usd_per_Mtok = (usd_per_day / (tok_per_day / 1e6)) if tok_per_day > 0 else float('inf')
|
| 490 |
+
candidates.append({
|
| 491 |
+
"gpu": name, "vram_GB": info["vram_GB"], "bw_GB_s": info["bw_GB_s"],
|
| 492 |
+
"tok_per_sec": tok_per_s, "tok_per_day": tok_per_day,
|
| 493 |
+
"USD_per_day": usd_per_day, "USD_per_Mtok": usd_per_Mtok,
|
| 494 |
+
"users_supported": capacity_users,
|
| 495 |
+
})
|
| 496 |
+
candidates.sort(key=lambda c: c["USD_per_Mtok"])
|
| 497 |
+
chain.append(_step(4, "§20", f"Eligible GPUs (≥{total_GB:.0f}GB)",
|
| 498 |
+
"filter + rank by $/Mtok",
|
| 499 |
+
{"min_VRAM": total_GB}, candidates[:5],
|
| 500 |
+
f"{len(candidates)} GPUs fit; cheapest: {candidates[0]['gpu'] if candidates else 'NONE'}"))
|
| 501 |
+
|
| 502 |
+
# Verdict
|
| 503 |
+
if not candidates:
|
| 504 |
+
verdict, reason = "NO", f"No single GPU has ≥{total_GB:.0f} GB VRAM."
|
| 505 |
+
mit = (f"Use tensor parallelism across multiple GPUs "
|
| 506 |
+
f"(e.g. 2× H100 = 160GB), or quantize to INT8 (halves memory).")
|
| 507 |
+
else:
|
| 508 |
+
best = candidates[0]
|
| 509 |
+
verdict = "YES"
|
| 510 |
+
reason = (f"Best GPU: {best['gpu']} at ${best['USD_per_Mtok']:.2f}/Mtok. "
|
| 511 |
+
f"Supports {best['users_supported']:.1f}× your daily target.")
|
| 512 |
+
mit = f"Provision {best['gpu']}, expected {best['tok_per_sec']:.0f} tok/s decode."
|
| 513 |
+
|
| 514 |
+
return _wrap("X-5", "Hardware selection for serving", locals(), chain, verdict, reason, mit)
|
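The memory-bound decode estimate in step 4 (`tok/s = BW / model_GB`) assumes every generated token re-reads all weights once from HBM. A standalone sketch; the GPU numbers below are H100-class and purely illustrative, not taken from this file's `GPU_CATALOG`:

```python
def decode_cost(model_gb, bw_gb_s, usd_per_hour):
    """Memory-bound decode throughput and the resulting $/Mtok at 24h utilization."""
    tok_per_s = bw_gb_s / model_gb          # each token streams the full weights once
    tok_per_day = tok_per_s * 86_400
    usd_per_mtok = (usd_per_hour * 24) / (tok_per_day / 1e6)
    return {"tok_per_s": tok_per_s, "USD_per_Mtok": usd_per_mtok}

# 16 GB of BF16 weights on a ~3.35 TB/s HBM part at $2.50/h (illustrative numbers):
d = decode_cost(model_gb=16, bw_gb_s=3350, usd_per_hour=2.5)
```

This is why the candidate ranking sorts by `USD_per_Mtok`: VRAM only gates eligibility, while bandwidth per dollar decides the winner.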
| 515 |
+
|
| 516 |
+
|
| 517 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 518 |
+
# X-19 — KV compression decision (ours vs literature)
|
| 519 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 520 |
+
def run_recipe_x19(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
|
| 521 |
+
d_head, n_layers, n_params, has_SWA=False, **_unused):
|
| 522 |
+
"""X-19: should I use γ-soft KV decay, hard D_f, or literature methods?"""
|
| 523 |
+
chain = []
|
| 524 |
+
|
| 525 |
+
# Step 1: γ_Padé
|
| 526 |
+
g_pade = gamma_pade(theta, T_eval)
|
| 527 |
+
chain.append(_step(1, "§26.1", "γ_Padé", "(2θ-T√2)/(2θ+T√2)",
|
| 528 |
+
{"theta": theta, "T_eval": T_eval}, g_pade, _phase_label(g_pade)))
|
| 529 |
+
|
| 530 |
+
# Step 2: γ-decomposition
|
| 531 |
+
has_GQA = n_kv_heads < n_attention_heads
|
| 532 |
+
decomp = gamma_decompose(g_pade, has_GQA, has_SWA, n_params)
|
| 533 |
+
g_corr = decomp["gamma_corrected"]
|
| 534 |
+
chain.append(_step(2, "§26.10", "γ-decomposition", "5-axis adjustment",
|
| 535 |
+
{"has_GQA": has_GQA, "has_SWA": has_SWA, "n_params": n_params},
|
| 536 |
+
g_corr))
|
| 537 |
+
|
| 538 |
+
# Step 3: §26.7 D_f window applicability
|
| 539 |
+
df = df_window(g_corr, T_eval, f=0.90)
|
| 540 |
+
df_zone_ok = df is not None
|
| 541 |
+
chain.append(_step(3, "§26.7", "D_f window (γ in [0.65, 0.85])",
|
| 542 |
+
"[(1-f)+fN^(1-γ)]^(1/(1-γ))",
|
| 543 |
+
{"gamma": g_corr, "N": T_eval, "f": 0.9}, df,
|
| 544 |
+
f"D_f = {df}" if df_zone_ok
|
| 545 |
+
else f"NOT applicable (γ={g_corr:.3f} outside [0.65, 0.85])"))
|
| 546 |
+
|
| 547 |
+
# Step 4: §26.8 soft decay regime
|
| 548 |
+
regime = kv_soft_decay_regime(theta, g_corr, T_train)
|
| 549 |
+
dh = d_horizon(theta, g_corr)
|
| 550 |
+
dh_str = f"{dh:.0f}" if dh is not None else "n/a"
|
| 551 |
+
    chain.append(_step(4, "§26.8", "Soft decay regime", "d_h ≳ T_train/2",
|
| 552 |
+
{"theta": theta, "gamma": g_corr, "T_train": T_train}, regime,
|
| 553 |
+
f"d_horizon={dh_str}; regime: {regime}"))
|
| 554 |
+
|
| 555 |
+
# Step 5: KV cache memory baseline
|
| 556 |
+
kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval)
|
| 557 |
+
chain.append(_step(5, "§19.1", "Baseline KV memory", "2·L·n_kv·d_h·seq·B",
|
| 558 |
+
{"L": n_layers, "n_kv": n_kv_heads, "d_h": d_head, "seq": T_eval},
|
| 559 |
+
kv, f"{kv['GB']:.2f} GB without compression"))
|
| 560 |
+
|
| 561 |
+
# Verdict
|
| 562 |
+
if regime == "applies" and df_zone_ok:
|
| 563 |
+
verdict = "USE SOFT DECAY"
|
| 564 |
+
reason = (f"d_horizon ≳ T_train/2 AND γ in compression zone. "
|
| 565 |
+
f"Soft decay (1-d/d_h)^γ best (-21% PPL vs hard cutoff per F17).")
|
| 566 |
+
mit = "Implement as 4D attention_mask additive bias with eager attention."
|
| 567 |
+
elif df_zone_ok:
|
| 568 |
+
verdict = "USE D_f HARD CUTOFF"
|
| 569 |
+
reason = f"γ in [0.65, 0.85] zone but d_h < T_train/2. Hard truncation at D_f={df} works."
|
| 570 |
+
mit = "Set cache_max_len = D_f."
|
| 571 |
+
elif regime == "applies":
|
| 572 |
+
verdict = "USE SOFT DECAY (caveat)"
|
| 573 |
+
        reason = "Regime applies but γ outside D_f validity zone. Soft decay only."
|
| 574 |
+
mit = "Soft decay; do not use D_f window."
|
| 575 |
+
elif g_corr >= 1 or g_corr <= 0:
|
| 576 |
+
verdict = "USE LITERATURE METHODS"
|
| 577 |
+
reason = f"γ={g_corr:.3f} outside Phase A. Our formulas don't apply."
|
| 578 |
+
mit = "Use SnapKV / PyramidKV / FastGen (literature heuristics)."
|
| 579 |
+
else:
|
| 580 |
+
verdict = "USE HARD T_train CUTOFF"
|
| 581 |
+
        reason = "Regime not met AND γ outside zone. Cap context at T_train."
|
| 582 |
+
mit = f"Set seq_len ≤ {T_train}, no extension."
|
| 583 |
+
|
| 584 |
+
return _wrap("X-19", "KV compression decision", locals(), chain, verdict, reason, mit)
|
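Step 1's γ_Padé formula, `(2θ - T√2)/(2θ + T√2)`, can be evaluated standalone. A sketch mirroring the formula string in the chain step; the module's own `gamma_pade` is defined earlier in this file:

```python
import math

def gamma_pade_sketch(theta, T):
    """γ_Padé = (2θ - T√2) / (2θ + T√2); γ in (0, 1) is Phase A (long-range OK)."""
    return (2 * theta - T * math.sqrt(2)) / (2 * theta + T * math.sqrt(2))

g = gamma_pade_sketch(500_000, 8192)  # Llama-3-8B at its native 8K context
```

Note the sign behavior the phase labels rely on: γ goes negative once `T > 2θ/√2 = θ√2`, i.e. when the evaluation context outruns what the RoPE base θ was designed for.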
| 585 |
+
|
| 586 |
+
|
| 587 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 588 |
+
# Helpers
|
| 589 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 590 |
+
def _step(n, sec, name, formula, inputs, result, interpretation=None, breakdown=None):
|
| 591 |
+
s = {"step": n, "section": sec, "name": name, "formula": formula,
|
| 592 |
+
"inputs": inputs, "result": result}
|
| 593 |
+
if interpretation:
|
| 594 |
+
s["interpretation"] = interpretation
|
| 595 |
+
if breakdown:
|
| 596 |
+
s["breakdown"] = breakdown
|
| 597 |
+
return s
|
| 598 |
+
|
| 599 |
+
|
| 600 |
+
def _wrap(rid, rname, locals_dict, chain, verdict, reason, mitigation):
|
| 601 |
+
# Clean inputs (drop chain/internal vars)
|
| 602 |
+
inputs = {k: v for k, v in locals_dict.items()
|
| 603 |
+
if not k.startswith("_") and k not in
|
| 604 |
+
("chain", "verdict", "reason", "mit", "info", "be", "kv", "g_pade", "g_corr",
|
| 605 |
+
"decomp", "dh", "l_niah", "p_hallu", "cost", "model_GB", "inf", "api",
|
| 606 |
+
"api_blend", "fits", "mem", "emerg", "max_flops", "hours",
|
| 607 |
+
"N_chinchilla", "D_chinchilla", "candidates", "best", "tok_per_s",
|
| 608 |
+
"tok_per_day", "capacity_users", "usd_per_day", "usd_per_Mtok",
|
| 609 |
+
"total_GB", "w_mem", "df", "df_zone_ok", "regime", "has_GQA",
|
| 610 |
+
"margin", "months", "months_to_payoff", "name")}
|
| 611 |
+
return {"recipe_id": rid, "recipe_name": rname, "inputs": inputs,
|
| 612 |
+
"chain": chain, "verdict": verdict, "reason": reason,
|
| 613 |
+
"mitigation": mitigation}
|
| 614 |
+
|
| 615 |
+
|
| 616 |
+
def _phase_label(g):
|
| 617 |
+
if 0 < g < 1:
|
| 618 |
+
return "Phase A (long-range OK)"
|
| 619 |
+
if g >= 1:
|
| 620 |
+
return "Phase B / Hagedorn"
|
| 621 |
+
return "Phase B / catastrophic (negative γ — T too large for θ)"
|
| 622 |
+
|
| 623 |
+
|
| 624 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 625 |
+
# Recipe registry
|
| 626 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 627 |
+
RECIPES = {
|
| 628 |
+
"X-1": {
|
| 629 |
+
"name": "Custom Training vs API",
|
| 630 |
+
"description": "Should I train a custom model or use a frontier API for my domain task?",
|
| 631 |
+
"fn": run_recipe_x1,
|
| 632 |
+
"params": ["N_params", "D_tokens", "gpu", "n_gpus", "mfu",
|
| 633 |
+
"api_model", "monthly_tokens_M"],
|
| 634 |
+
"category": "build-vs-buy",
|
| 635 |
+
"uses_sections": ["§17", "§19", "§20", "§24"],
|
| 636 |
+
},
|
| 637 |
+
"X-2": {
|
| 638 |
+
"name": "Long Context Viability",
|
| 639 |
+
"description": "Will model M serve length L doing Needle-in-a-Haystack retrieval?",
|
| 640 |
+
"fn": run_recipe_x2,
|
| 641 |
+
"params": ["theta", "T_train", "T_eval", "n_attention_heads", "n_kv_heads",
|
| 642 |
+
"d_head", "n_layers", "n_params", "has_SWA"],
|
| 643 |
+
"category": "long-context",
|
| 644 |
+
"uses_sections": ["§26", "§19"],
|
| 645 |
+
},
|
| 646 |
+
"X-3": {
|
| 647 |
+
"name": "Budget Pre-flight",
|
| 648 |
+
"description": "Given $ budget, what model is feasible to train?",
|
| 649 |
+
"fn": run_recipe_x3,
|
| 650 |
+
"params": ["USD_budget", "gpu", "mfu", "n_gpus"],
|
| 651 |
+
"category": "training-budget",
|
| 652 |
+
"uses_sections": ["§17", "§20"],
|
| 653 |
+
},
|
| 654 |
+
"X-5": {
|
| 655 |
+
"name": "Hardware Selection",
|
| 656 |
+
"description": "Which GPU should I use to serve my model at target throughput?",
|
| 657 |
+
"fn": run_recipe_x5,
|
| 658 |
+
"params": ["N_params", "T_eval", "n_layers", "n_kv_heads", "d_head",
|
| 659 |
+
"bytes_per_weight", "target_tokens_per_day", "concurrent_users"],
|
| 660 |
+
"category": "serving",
|
| 661 |
+
"uses_sections": ["§19", "§20"],
|
| 662 |
+
},
|
| 663 |
+
"X-19": {
|
| 664 |
+
"name": "KV Compression Decision",
|
| 665 |
+
"description": "Should I use soft decay, D_f cutoff, or literature methods to compress KV?",
|
| 666 |
+
"fn": run_recipe_x19,
|
| 667 |
+
"params": ["theta", "T_train", "T_eval", "n_attention_heads", "n_kv_heads",
|
| 668 |
+
"d_head", "n_layers", "n_params", "has_SWA"],
|
| 669 |
+
"category": "kv-compression",
|
| 670 |
+
"uses_sections": ["§26", "§19"],
|
| 671 |
+
},
|
| 672 |
+
}
|
| 673 |
+
|
| 674 |
+
|
| 675 |
+
def list_recipes() -> str:
|
| 676 |
+
"""Return JSON of all recipes for UI dropdown."""
|
| 677 |
+
return json.dumps([
|
| 678 |
+
{"id": rid, "name": r["name"], "description": r["description"],
|
| 679 |
+
"category": r["category"], "params": r["params"],
|
| 680 |
+
"uses_sections": r["uses_sections"]}
|
| 681 |
+
for rid, r in RECIPES.items()
|
| 682 |
+
])
|
| 683 |
+
|
| 684 |
+
|
| 685 |
+
def run_recipe(recipe_id: str, **params) -> dict:
|
| 686 |
+
"""Dispatcher — execute recipe by id with given params."""
|
| 687 |
+
r = RECIPES.get(recipe_id)
|
| 688 |
+
if r is None:
|
| 689 |
+
return {"error": f"unknown recipe '{recipe_id}'",
|
| 690 |
+
"available": list(RECIPES.keys())}
|
| 691 |
+
return r["fn"](**params)
|
| 692 |
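The registry-dispatch pattern used by `run_recipe` (structured error for unknown ids, `**params` forwarded to the recipe function) can be shown in miniature. Names below are illustrative, not the module's:

```python
def make_dispatcher(registry):
    """Dispatch by id; unknown ids return a structured error instead of raising."""
    def run(recipe_id, **params):
        entry = registry.get(recipe_id)
        if entry is None:
            return {"error": f"unknown recipe '{recipe_id}'",
                    "available": sorted(registry)}
        return entry["fn"](**params)
    return run

run = make_dispatcher({"X-0": {"fn": lambda x=1, **_: {"ok": x}}})
```

Returning an error dict (rather than raising) keeps the JS side of the Pyodide bridge simple: every call yields JSON-serializable output.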
+
|
| 693 |
+
|
| 694 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 695 |
+
# Known model presets
|
| 696 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 697 |
+
PRESETS = {
|
| 698 |
+
"EleutherAI/pythia-2.8b": {
|
| 699 |
+
"theta": 10000, "T_train": 2048,
|
| 700 |
+
"n_attention_heads": 32, "n_kv_heads": 32,
|
| 701 |
+
"d_head": 80, "n_layers": 32, "n_params": 2.8e9, "has_SWA": False,
|
| 702 |
+
},
|
| 703 |
+
"EleutherAI/pythia-1b": {
|
| 704 |
+
"theta": 10000, "T_train": 2048,
|
| 705 |
+
"n_attention_heads": 8, "n_kv_heads": 8,
|
| 706 |
+
"d_head": 256, "n_layers": 16, "n_params": 1e9, "has_SWA": False,
|
| 707 |
+
},
|
| 708 |
+
"EleutherAI/pythia-1.4b": {
|
| 709 |
+
"theta": 10000, "T_train": 2048,
|
| 710 |
+
"n_attention_heads": 16, "n_kv_heads": 16,
|
| 711 |
+
"d_head": 128, "n_layers": 24, "n_params": 1.4e9, "has_SWA": False,
|
| 712 |
+
},
|
| 713 |
+
"meta-llama/Meta-Llama-3-8B": {
|
| 714 |
+
"theta": 500000, "T_train": 8192,
|
| 715 |
+
"n_attention_heads": 32, "n_kv_heads": 8,
|
| 716 |
+
"d_head": 128, "n_layers": 32, "n_params": 8e9, "has_SWA": False,
|
| 717 |
+
},
|
| 718 |
+
"meta-llama/Llama-3.2-1B": {
|
| 719 |
+
"theta": 500000, "T_train": 131072,
|
| 720 |
+
"n_attention_heads": 32, "n_kv_heads": 8,
|
| 721 |
+
"d_head": 64, "n_layers": 16, "n_params": 1.2e9, "has_SWA": False,
|
| 722 |
+
},
|
| 723 |
+
"meta-llama/Llama-3.3-70B-Instruct": {
|
| 724 |
+
"theta": 500000, "T_train": 131072,
|
| 725 |
+
"n_attention_heads": 64, "n_kv_heads": 8,
|
| 726 |
+
"d_head": 128, "n_layers": 80, "n_params": 70e9, "has_SWA": False,
|
| 727 |
+
},
|
| 728 |
+
"mistralai/Mistral-7B-v0.1": {
|
| 729 |
+
"theta": 10000, "T_train": 8192,
|
| 730 |
+
"n_attention_heads": 32, "n_kv_heads": 8,
|
| 731 |
+
"d_head": 128, "n_layers": 32, "n_params": 7e9, "has_SWA": True,
|
| 732 |
+
},
|
| 733 |
+
"Qwen/Qwen2.5-7B": {
|
| 734 |
+
"theta": 1000000, "T_train": 32768,
|
| 735 |
+
"n_attention_heads": 28, "n_kv_heads": 4,
|
| 736 |
+
"d_head": 128, "n_layers": 28, "n_params": 7.6e9, "has_SWA": False,
|
| 737 |
+
},
|
| 738 |
+
"Qwen/Qwen2.5-1.5B": {
|
| 739 |
+
"theta": 1000000, "T_train": 32768,
|
| 740 |
+
"n_attention_heads": 12, "n_kv_heads": 2,
|
| 741 |
+
"d_head": 128, "n_layers": 28, "n_params": 1.5e9, "has_SWA": False,
|
| 742 |
+
},
|
| 743 |
+
"google/gemma-2-9b-it": {
|
| 744 |
+
"theta": 10000, "T_train": 8192,
|
| 745 |
+
"n_attention_heads": 16, "n_kv_heads": 8,
|
| 746 |
+
"d_head": 256, "n_layers": 42, "n_params": 9e9, "has_SWA": True,
|
| 747 |
+
},
|
| 748 |
+
"microsoft/phi-3-mini-4k-instruct": {
|
| 749 |
+
"theta": 10000, "T_train": 4096,
|
| 750 |
+
"n_attention_heads": 32, "n_kv_heads": 32,
|
| 751 |
+
"d_head": 96, "n_layers": 32, "n_params": 3.8e9, "has_SWA": True,
|
| 752 |
+
},
|
| 753 |
+
}
|
| 754 |
+
|
| 755 |
+
|
| 756 |
+
def list_presets() -> str:
|
| 757 |
+
return json.dumps([
|
| 758 |
+
{"id": k, "label": k.split("/")[-1],
|
| 759 |
+
"theta": v["theta"], "T_train": v["T_train"]}
|
| 760 |
+
for k, v in PRESETS.items()
|
| 761 |
+
])
|
| 762 |
+
|
| 763 |
+
|
| 764 |
+
def get_preset(model_id: str) -> dict:
|
| 765 |
+
return PRESETS.get(model_id, {})
|
| 766 |
+
|
| 767 |
+
|
| 768 |
+
# Smoke test
|
| 769 |
+
if __name__ == "__main__":
|
| 770 |
+
print("─── X-2 Llama-3-8B @ 32K ───")
|
| 771 |
+
r = run_recipe("X-2", theta=500_000, T_train=8192, T_eval=32_000,
|
| 772 |
+
n_attention_heads=32, n_kv_heads=8, d_head=128,
|
| 773 |
+
n_layers=32, n_params=8e9, has_SWA=False)
|
| 774 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 775 |
+
|
| 776 |
+
print("─── X-1 Llama-3-8B vs GPT-4o (10M tok/mo) ───")
|
| 777 |
+
r = run_recipe("X-1", N_params=8e9, monthly_tokens_M=10.0, api_model="GPT-4o")
|
| 778 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 779 |
+
|
| 780 |
+
print("─── X-3 budget $5K ───")
|
| 781 |
+
r = run_recipe("X-3", USD_budget=5000.0, gpu="H100 SXM", n_gpus=1)
|
| 782 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 783 |
+
|
| 784 |
+
print("─── X-5 serve Llama-3-8B at 4K ───")
|
| 785 |
+
r = run_recipe("X-5", N_params=8e9, T_eval=4096, n_layers=32, n_kv_heads=8, d_head=128,
|
| 786 |
+
target_tokens_per_day=10e6, concurrent_users=1)
|
| 787 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 788 |
+
|
| 789 |
+
print("─── X-19 KV compression for Llama-3-8B ───")
|
| 790 |
+
r = run_recipe("X-19", theta=500_000, T_train=8192, T_eval=8192,
|
| 791 |
+
n_attention_heads=32, n_kv_heads=8, d_head=128,
|
| 792 |
+
n_layers=32, n_params=8e9)
|
| 793 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
|
@@ -0,0 +1,173 @@
| 1 |
+
/* TAF Agent — minimal clean styling */
|
| 2 |
+
:root {
|
| 3 |
+
--bg: #0a0e14;
|
| 4 |
+
--bg-card: #12181f;
|
| 5 |
+
--bg-input: #1a2028;
|
| 6 |
+
--fg: #c9d1d9;
|
| 7 |
+
--fg-dim: #8b949e;
|
| 8 |
+
--accent: #58a6ff;
|
| 9 |
+
--accent-dim: #1f6feb;
|
| 10 |
+
--success: #3fb950;
|
| 11 |
+
--warning: #d29922;
|
| 12 |
+
--danger: #f85149;
|
| 13 |
+
--border: #30363d;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
* { box-sizing: border-box; }
|
| 17 |
+
|
| 18 |
+
body {
|
| 19 |
+
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen,
|
| 20 |
+
Ubuntu, sans-serif;
|
| 21 |
+
background: var(--bg);
|
| 22 |
+
color: var(--fg);
|
| 23 |
+
margin: 0;
|
| 24 |
+
padding: 0;
|
| 25 |
+
line-height: 1.6;
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
header {
|
| 29 |
+
text-align: center;
|
| 30 |
+
padding: 2rem 1rem 1rem;
|
| 31 |
+
border-bottom: 1px solid var(--border);
|
| 32 |
+
}
|
| 33 |
+
header h1 { margin: 0 0 0.5rem 0; font-size: 2rem; }
|
| 34 |
+
.tagline { font-size: 1.1rem; margin: 0 0 0.5rem; }
|
| 35 |
+
.subtle { color: var(--fg-dim); font-size: 0.9rem; }
|
| 36 |
+
|
| 37 |
+
main {
|
| 38 |
+
max-width: 980px;
|
| 39 |
+
margin: 0 auto;
|
| 40 |
+
padding: 1.5rem;
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
section {
|
| 44 |
+
background: var(--bg-card);
|
| 45 |
+
border: 1px solid var(--border);
|
| 46 |
+
border-radius: 8px;
|
| 47 |
+
padding: 1.25rem 1.5rem;
|
| 48 |
+
margin-bottom: 1.25rem;
|
| 49 |
+
}
|
| 50 |
+
|
| 51 |
+
h2 { margin-top: 0; font-size: 1.2rem; color: var(--accent); }
|
| 52 |
+
|
| 53 |
+
#status-bar { padding: 0.75rem 1.25rem; }
|
| 54 |
+
#status { font-family: monospace; }
|
| 55 |
+
|
| 56 |
+
.recipe-desc { color: var(--fg-dim); margin: 0.5rem 0 0 0; }
|
| 57 |
+
|
| 58 |
+
.form-row { display: flex; gap: 1rem; margin-bottom: 1rem; align-items: center; }
|
| 59 |
+
.form-row label { min-width: 120px; }
|
| 60 |
+
|
| 61 |
+
.form-grid {
|
| 62 |
+
display: grid;
|
| 63 |
+
grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
|
| 64 |
+
gap: 0.75rem;
|
| 65 |
+
margin-bottom: 1rem;
|
| 66 |
+
}
|
| 67 |
+
.form-field { display: flex; flex-direction: column; }
|
| 68 |
+
.form-field label { font-size: 0.85rem; color: var(--fg-dim); margin-bottom: 0.25rem; }
|
| 69 |
+
|
| 70 |
+
input, select {
|
| 71 |
+
background: var(--bg-input);
|
| 72 |
+
color: var(--fg);
|
| 73 |
+
border: 1px solid var(--border);
|
| 74 |
+
border-radius: 4px;
|
| 75 |
+
padding: 0.4rem 0.6rem;
|
| 76 |
+
font-family: monospace;
|
| 77 |
+
font-size: 0.95rem;
|
| 78 |
+
}
|
| 79 |
+
input:focus, select:focus { outline: 1px solid var(--accent); border-color: var(--accent); }
|
| 80 |
+
|
| 81 |
+
button {
|
| 82 |
+
background: var(--accent-dim);
|
| 83 |
+
color: white;
|
| 84 |
+
border: none;
|
| 85 |
+
padding: 0.6rem 1.2rem;
|
| 86 |
+
font-size: 1rem;
|
| 87 |
+
font-weight: 600;
|
| 88 |
+
border-radius: 6px;
|
| 89 |
+
cursor: pointer;
|
| 90 |
+
transition: background 0.2s;
|
| 91 |
+
}
|
| 92 |
+
button:hover:not(:disabled) { background: var(--accent); }
|
| 93 |
+
button:disabled { background: #444; cursor: not-allowed; }
|
| 94 |
+
|
| 95 |
+
#verdict-box {
|
| 96 |
+
font-size: 1.05rem;
|
| 97 |
+
padding: 1rem;
|
| 98 |
+
border-radius: 6px;
|
| 99 |
+
border-left: 4px solid;
|
| 100 |
+
}
|
| 101 |
+
.verdict-yes { border-color: var(--success); background: rgba(63, 185, 80, 0.08); }
|
| 102 |
+
.verdict-no { border-color: var(--danger); background: rgba(248, 81, 73, 0.08); }
|
| 103 |
+
.verdict-degraded { border-color: var(--warning); background: rgba(210, 153, 34, 0.08); }
|
| 104 |
+
|
| 105 |
+
.chain-step {
|
| 106 |
+
background: var(--bg-input);
|
| 107 |
+
border: 1px solid var(--border);
|
| 108 |
+
border-radius: 6px;
|
| 109 |
+
padding: 0.75rem 1rem;
|
| 110 |
+
margin-bottom: 0.5rem;
|
| 111 |
+
}
|
| 112 |
+
.chain-step summary {
|
| 113 |
+
display: flex;
|
| 114 |
+
justify-content: space-between;
|
| 115 |
+
font-weight: 600;
|
| 116 |
+
cursor: pointer;
|
| 117 |
+
list-style: none;
|
| 118 |
+
}
|
| 119 |
+
.chain-step summary::before { content: "▸ "; color: var(--accent); }
|
| 120 |
+
.chain-step[open] summary::before { content: "▾ "; }
|
| 121 |
+
.step-section { color: var(--accent); font-family: monospace; font-size: 0.9rem; }
|
| 122 |
+
.step-formula { color: var(--fg-dim); font-family: monospace; font-size: 0.85rem; margin: 0.5rem 0; }
|
| 123 |
+
.step-result { color: var(--success); font-family: monospace; font-weight: 600; margin-top: 0.25rem; }
|
| 124 |
+
.step-interp { color: var(--fg-dim); font-size: 0.9rem; margin-top: 0.25rem; }
|
| 125 |
+
.step-result pre { background: var(--bg); padding: 0.5rem; border-radius: 4px; overflow-x: auto; }
|
| 126 |
+
|
| 127 |
+
.recipe-tag {
|
| 128 |
+
background: var(--bg-input);
|
| 129 |
+
color: var(--accent);
|
| 130 |
+
font-family: monospace;
|
| 131 |
+
font-size: 0.85rem;
|
| 132 |
+
padding: 0.2rem 0.5rem;
|
| 133 |
+
border-radius: 4px;
|
| 134 |
+
}
|
| 135 |
+
|
| 136 |
+
.mode-tabs { display: flex; gap: 0.5rem; margin-bottom: 0.75rem; flex-wrap: wrap; }
|
| 137 |
+
.mode-btn {
|
| 138 |
+
background: var(--bg-input); color: var(--fg-dim);
|
| 139 |
+
border: 1px solid var(--border); border-radius: 6px;
|
| 140 |
+
padding: 0.5rem 1rem; cursor: pointer; font-size: 0.95rem;
|
| 141 |
+
}
|
| 142 |
+
.mode-btn.active { background: var(--accent-dim); color: white; border-color: var(--accent); }
|
| 143 |
+
button.secondary {
|
| 144 |
+
background: var(--bg-input); color: var(--fg);
|
| 145 |
+
border: 1px solid var(--border); padding: 0.4rem 0.8rem;
|
| 146 |
+
}
|
| 147 |
+
button.secondary:hover:not(:disabled) { border-color: var(--accent); }
|
| 148 |
+
|
| 149 |
+
textarea {
|
| 150 |
+
width: 100%; min-height: 60px;
|
| 151 |
+
background: var(--bg-input); color: var(--fg);
|
| 152 |
+
border: 1px solid var(--border); border-radius: 4px;
|
| 153 |
+
padding: 0.5rem; font-family: inherit; font-size: 0.95rem; resize: vertical;
|
| 154 |
+
}
|
| 155 |
+
textarea:focus { outline: 1px solid var(--accent); border-color: var(--accent); }
|
| 156 |
+
|
| 157 |
+
@media (max-width: 600px) {
|
| 158 |
+
.form-grid { grid-template-columns: 1fr; }
|
| 159 |
+
main { padding: 0.75rem; }
|
| 160 |
+
.form-row { flex-direction: column; align-items: stretch; }
|
| 161 |
+
.form-row label { min-width: auto; }
|
| 162 |
+
}
|
| 163 |
+
|
| 164 |
+
footer {
|
| 165 |
+
text-align: center;
|
| 166 |
+
padding: 1.5rem;
|
| 167 |
+
color: var(--fg-dim);
|
| 168 |
+
font-size: 0.85rem;
|
| 169 |
+
border-top: 1px solid var(--border);
|
| 170 |
+
margin-top: 2rem;
|
| 171 |
+
}
|
| 172 |
+
footer a { color: var(--accent); text-decoration: none; }
|
| 173 |
+
footer a:hover { text-decoration: underline; }
|