feat: TAF Agent v0.1 — client-side transformer diagnostic
A Pyodide+WebLLM browser app that predicts transformer LLM viability
(long-context, training budget, hardware fit, KV compression, custom-vs-API)
using the TAF (Thermodynamic Attention Framework) formula chains from
Marin 2026.
Phase 1: Pyodide loads taf_browser.py (10 formulas, 11 model presets,
an 11-GPU catalog; deterministic Python, no server)
Phase 2: WebLLM loads Llama-3.2-1B in browser → plain-English synthesis
Phase 3: Free-form question router (LLM picks recipe + extracts params)
Recipes (5):
X-1 Custom training vs API
X-2 Long Context Viability
X-3 Budget Pre-flight
X-5 Hardware Selection for serving
X-19 KV Compression decision
UI: 2 modes (Ask plain-English / Recipe + form), HF Hub config fetch
for any public model, audit-trail expandable steps, mobile-responsive.
Hosting: GitHub Pages (static); compute: user's browser; cost: $0/mo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- .github/workflows/deploy.yml +29 -0
- .gitignore +23 -0
- LICENSE +17 -0
- README.md +101 -0
- index.html +108 -0
- js/main.js +540 -0
- python/taf_browser.py +793 -0
- style.css +173 -0
.github/workflows/deploy.yml
@@ -0,0 +1,29 @@
+name: Deploy to GitHub Pages
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/configure-pages@v4
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: '.'
+      - id: deployment
+        uses: actions/deploy-pages@v4
.gitignore
@@ -0,0 +1,23 @@
+# Python
+__pycache__/
+*.pyc
+*.pyo
+*.egg-info/
+.venv/
+venv/
+
+# Editors
+.vscode/
+.idea/
+*.swp
+.DS_Store
+
+# Build artefacts
+dist/
+build/
+node_modules/
+*.log
+
+# Local sandbox
+local/
+.cache/
LICENSE
@@ -0,0 +1,17 @@
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+Copyright 2026 Carles Marin
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
README.md
@@ -0,0 +1,101 @@
+# 🔬 TAF Agent
+
+> **Transformer LLM diagnostic in your browser.** Free. Unlimited. Auditable.
+
+Drop in a model config (or paste any HuggingFace model id), get a falsifiable answer to *"will it work?"* — backed by the Thermodynamic Attention Framework (TAF) formulas.
+
+**🌐 Live demo**: https://transformerkmarin.github.io/tafagent *(once GitHub Pages is enabled)*
+
+---
+
+## What it does
+
+Answers practical viability questions for transformer LLMs, with **zero servers**:
+
+- *Will Llama-3-8B serve 32K context with NIAH retrieval?* → **X-2**
+- *Should I train a custom 7B model or use GPT-4 API?* → **X-1**
+- *I have $5K — what model can I afford to train?* → **X-3**
+- *Cheapest GPU to serve Llama-70B at 100M tokens/day?* → **X-5**
+- *Should I use soft KV decay or hard cutoff for compression?* → **X-19**
+
+…each as a chain of TAF formulas (paper §17, §19, §20, §24, §26) rendered with a full audit trail.
+
+## Two modes
+
+- **💬 Ask in plain English** → in-browser LLM picks the right recipe and runs it
+- **📋 Recipe + form** → manual selection, full control over every parameter
+
+## How it's free + unlimited
+
+- Static HTML/JS hosted on **GitHub Pages** (free bandwidth)
+- Python TAF computation runs in your browser via **Pyodide** (no server)
+- Plain-English synthesis runs **Llama-3.2-1B-Instruct** in your browser via **WebLLM** (your GPU)
+- Model weights cached in IndexedDB after first load (~700MB, one-time)
+- **Your data never leaves your browser**
+
+## Architecture
+
+```
+GitHub Pages (HTML/JS)
+        ↓  (one-time download)
+Your browser:
+  ├─ Pyodide → Python TAF formulas (CPU, instant)
+  └─ WebLLM → Llama-3.2-1B (GPU/CPU, deterministic-ish)
+```
+
+## How to add new models
+
+1. **Preset list** — 11 curated popular models, instant autofill
+2. **HF Hub fetch** — paste any model id (`Qwen/Qwen2.5-32B`, `meta-llama/Llama-3.3-70B-Instruct`, ...) → browser fetches `config.json` → autofill form
+3. **Manual** — fill the form fields directly
+
+Works for any public RoPE / GQA / MHA / SWA / ALiBi / AbsPE model. Gated models (Llama family) require accepting the licence on HF first.
+
+## Status
+
+- ✅ **Phase 1**: Pyodide + TAF formulas
+- ✅ **Phase 2**: WebLLM synthesis (plain-English answer)
+- ✅ **Phase 3**: Free-form question router (NLU → recipe selection)
+- ✅ **5 recipes**: X-1, X-2, X-3, X-5, X-19
+- 🚧 **Phase 4**: 15 more recipes (X-4, X-6...X-20) + advanced UI
+
+## Local development
+
+```bash
+git clone https://github.com/karlesmarin/tafagent
+cd tafagent
+python -m http.server 8000
+# open http://localhost:8000
+```
+
+## Browser requirements
+
+- Chrome / Edge / Firefox 113+ for WebGPU acceleration (recommended)
+- Older browsers fall back to CPU inference (slower but works)
+- ~2 GB free RAM for Llama-3.2-1B
+- ~700 MB disk for model cache (one-time)
+
+## Citation
+
+If you use this tool, please cite the underlying paper:
+
+```bibtex
+@article{marin2026transformer_thermodynamics,
+  author = {Marin, Carles},
+  title = {Transformer Thermodynamics: A Closed-Form Theory of Attention Decay,
+           Phase Transitions, and Context-Length Limits in RoPE Language Models},
+  year = {2026},
+}
+```
+
+## License
+
+Apache-2.0 (this code). Llama-3.2-1B distributed under the [Meta Llama 3.2 license](https://www.llama.com/llama3_2/license/).
+
+---
+
+**Acknowledgements**: this tool would not exist without the open-weights commons
+(Meta, Mistral, Qwen, EleutherAI, AI2 and many more), the Pyodide + WebLLM
+projects, GitHub Pages free hosting, and the wider ML community keeping all
+the tooling honest and accessible. Full list in the
+[paper Acknowledgements](https://github.com/karlesmarin/NeurIPS).
index.html
@@ -0,0 +1,108 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>TAF Agent — Transformer Diagnostic in your Browser</title>
+  <meta name="description" content="Predict transformer LLM behaviour from config alone. Free, unlimited, runs entirely in your browser." />
+  <link rel="stylesheet" href="style.css" />
+  <script src="https://cdn.jsdelivr.net/pyodide/v0.26.4/full/pyodide.js"></script>
+</head>
+<body>
+  <header>
+    <h1>🔬 TAF Agent</h1>
+    <p class="tagline">
+      Transformer diagnostic in your browser. <strong>Free. Unlimited. Auditable.</strong>
+    </p>
+    <p class="subtle">
+      All computation happens locally — your data never leaves this page.
+    </p>
+  </header>
+
+  <main>
+    <!-- Status -->
+    <section id="status-bar"><div id="status">⏳ Loading Python runtime...</div></section>
+
+    <!-- Mode toggle -->
+    <section id="mode-section">
+      <h2>🎯 Mode</h2>
+      <div class="mode-tabs">
+        <button class="mode-btn active" data-mode="ask">💬 Ask in plain English</button>
+        <button class="mode-btn" data-mode="recipe">📋 Pick recipe + fill form</button>
+      </div>
+      <p id="mode-desc" class="recipe-desc">
+        Type a free-form question (e.g. "Will Llama-3-8B work at 32K context?"). The
+        in-browser LLM picks the right recipe and runs it.
+      </p>
+    </section>
+
+    <!-- Free-form question (mode=ask) -->
+    <section id="ask-section">
+      <h2>❓ Your question</h2>
+      <textarea id="question" rows="3" placeholder="e.g. Will Mistral-7B handle 16K NIAH retrieval? Or: I have $5,000, what model can I train? Or: Cheapest GPU to serve Llama-70B at 100M tokens/day?"></textarea>
+      <div style="display:flex; gap:0.5rem; margin-top:0.5rem; flex-wrap:wrap;">
+        <button id="ask-btn" disabled>🚀 Analyze</button>
+        <button id="example-btn" type="button" class="secondary">💡 Try an example</button>
+      </div>
+    </section>
+
+    <!-- Recipe selector (mode=recipe) -->
+    <section id="recipe-section" style="display:none;">
+      <h2>📋 Recipe</h2>
+      <select id="recipe-select" disabled>
+        <option value="">— select a recipe —</option>
+      </select>
+      <p id="recipe-desc-display" class="recipe-desc"></p>
+    </section>
+
+    <!-- Form (mode=recipe) -->
+    <section id="form-section" style="display:none;">
+      <h2>🎯 Inputs</h2>
+
+      <div class="form-row">
+        <label for="preset">Preset model:</label>
+        <select id="preset" disabled>
+          <option value="">— select to autofill —</option>
+        </select>
+      </div>
+
+      <div class="form-row">
+        <label for="hf-id">Or any HF model:</label>
+        <input type="text" id="hf-id" placeholder="e.g. Qwen/Qwen2.5-32B-Instruct" style="flex:1;" />
+        <button id="hf-fetch-btn" type="button" class="secondary">📥 Fetch</button>
+      </div>
+      <div id="hf-status" class="subtle" style="margin: -0.5rem 0 1rem; min-height:1.2em;"></div>
+
+      <!-- Dynamic form fields based on recipe -->
+      <div id="dynamic-form" class="form-grid"></div>
+
+      <button id="run-btn" disabled>🚀 Analyze</button>
+    </section>
+
+    <!-- Output -->
+    <section id="output-section" style="display:none;">
+      <h2>📊 Verdict</h2>
+      <div id="verdict-box"></div>
+
+      <h2>🔍 Computation Chain</h2>
+      <p class="subtle">Every number below is deterministic Python. Click a step to expand.</p>
+      <div id="chain-box"></div>
+
+      <h2 id="answer-header" style="display:none;">💬 Plain-English Answer</h2>
+      <div id="answer-box" style="display:none;"></div>
+    </section>
+  </main>
+
+  <footer>
+    <p>
+      © 2026 Carles Marin · Apache-2.0 ·
+      <a href="https://github.com/karlesmarin/tafagent" target="_blank">Source on GitHub</a>
+    </p>
+    <p class="subtle">
+      Computation: Pyodide (Python in browser) · Synthesis: WebLLM (Llama-3.2-1B local) · Hosting: GitHub Pages
+    </p>
+  </footer>
+
+  <script type="module" src="js/main.js"></script>
+</body>
+</html>
js/main.js
@@ -0,0 +1,540 @@
+// TAF Agent — main orchestration (Phases 1-3 complete)
+//
+// Phases:
+//  1. Pyodide loads + TAF formulas → deterministic computation
+//  2. WebLLM loads on demand → plain-English synthesis
+//  3. Router (LLM) → free-form question → recipe + params
+
+const TAF_BROWSER_URL = "python/taf_browser.py";
+const ENABLE_WEBLLM = true;
+const WEBLLM_MODEL = "Llama-3.2-1B-Instruct-q4f32_1-MLC";
+
+const $ = (id) => document.getElementById(id);
+
+const state = {
+  pyodide: null,
+  webllm: null,
+  presets: [],
+  recipes: [],
+  recipesById: {},
+  currentMode: "ask",
+  currentRecipe: null,
+};
+
+const EXAMPLES = [
+  "Will Meta-Llama-3-8B handle 32000-token NIAH retrieval reliably?",
+  "I have $5000 to spend on training. What model can I afford?",
+  "Should I use Mistral-7B-v0.1 at 16K context or extend it first?",
+  "Compare cheapest GPU to serve Llama-3-8B at 10 million tokens per day.",
+  "Should I use soft KV decay or hard cutoff for Qwen2.5-7B at 32K?",
+  "Is it cheaper to train an 8B custom model or use GPT-4o for 50M tokens/month?",
+];
+
+// ════════════════════════════════════════════════════════════════════
+// Bootstrap
+// ════════════════════════════════════════════════════════════════════
+async function loadPyodideAndTaf() {
+  setStatus("⏳ Loading Pyodide (Python runtime ~10MB)...");
+  state.pyodide = await loadPyodide({
+    indexURL: "https://cdn.jsdelivr.net/pyodide/v0.26.4/full/",
+  });
+  setStatus("⏳ Loading TAF formulas + recipes...");
+  const tafCode = await fetch(TAF_BROWSER_URL).then(r => r.text());
+  await state.pyodide.runPythonAsync(tafCode);
+
+  state.presets = JSON.parse(state.pyodide.runPython("list_presets()"));
+  state.recipes = JSON.parse(state.pyodide.runPython("list_recipes()"));
+  state.recipesById = Object.fromEntries(state.recipes.map(r => [r.id, r]));
+
+  populatePresets();
+  populateRecipes();
+  enableUI();
+  setStatus("✅ Ready. Ask a question or pick a recipe.");
+}
+
+function populatePresets() {
+  const sel = $("preset");
+  sel.innerHTML = '<option value="">— select to autofill —</option>';
+  state.presets.forEach(p => {
+    const opt = document.createElement("option");
+    opt.value = p.id;
+    opt.textContent = `${p.label} (θ=${p.theta.toLocaleString()}, T_train=${p.T_train})`;
+    sel.appendChild(opt);
+  });
+}
+
+function populateRecipes() {
+  const sel = $("recipe-select");
+  sel.innerHTML = '<option value="">— select a recipe —</option>';
+  state.recipes.forEach(r => {
+    const opt = document.createElement("option");
+    opt.value = r.id;
+    opt.textContent = `${r.id} — ${r.name}`;
+    sel.appendChild(opt);
+  });
+}
+
+function enableUI() {
+  $("ask-btn").disabled = false;
+  $("recipe-select").disabled = false;
+  $("preset").disabled = false;
+}
+
+function setStatus(msg) { $("status").textContent = msg; }
+
+// ════════════════════════════════════════════════════════════════════
+// Mode toggle
+// ════════════════════════════════════════════════════════════════════
+document.querySelectorAll(".mode-btn").forEach(btn => {
+  btn.addEventListener("click", () => {
+    document.querySelectorAll(".mode-btn").forEach(b => b.classList.remove("active"));
+    btn.classList.add("active");
+    const mode = btn.dataset.mode;
+    state.currentMode = mode;
+    if (mode === "ask") {
+      $("ask-section").style.display = "";
+      $("recipe-section").style.display = "none";
+      $("form-section").style.display = "none";
+      $("mode-desc").textContent =
+        "Type a free-form question. The in-browser LLM picks the right recipe and runs it.";
+    } else {
+      $("ask-section").style.display = "none";
+      $("recipe-section").style.display = "";
+      $("mode-desc").textContent =
+        "Pick a recipe directly and fill the form. Same result as Ask mode but fully manual.";
+    }
+  });
+});
+
+// ════════════════════════════════════════════════════════════════════
+// Recipe selector
+// ════════════════════════════════════════════════════════════════════
+$("recipe-select").addEventListener("change", (e) => {
+  const rid = e.target.value;
+  if (!rid) {
+    $("form-section").style.display = "none";
+    return;
+  }
+  const r = state.recipesById[rid];
+  state.currentRecipe = r;
+  $("recipe-desc-display").textContent = r.description;
+  $("form-section").style.display = "";
+  buildDynamicForm(r);
+});
+
+function buildDynamicForm(recipe) {
+  const container = $("dynamic-form");
+  container.innerHTML = "";
+  const defaults = getRecipeDefaults(recipe.id);
+  recipe.params.forEach(name => {
+    const div = document.createElement("div");
+    div.className = "form-field";
+    const label = document.createElement("label");
+    label.textContent = paramLabel(name);
+    label.htmlFor = `param_${name}`;
+    const input = document.createElement("input");
+    input.type = "text";
+    input.id = `param_${name}`;
+    input.dataset.param = name;
+    input.value = defaults[name] !== undefined ? String(defaults[name]) : "";
+    div.appendChild(label);
+    div.appendChild(input);
+    container.appendChild(div);
+  });
+  $("run-btn").disabled = false;
+}
+
+function paramLabel(name) {
+  const labels = {
+    theta: "θ (rope_theta)", T_train: "T_train", T_eval: "T_eval (target context)",
+    n_attention_heads: "num_attention_heads", n_kv_heads: "num_key_value_heads",
+    d_head: "head_dim", n_layers: "num_hidden_layers", n_params: "n_params (e.g. 8e9)",
+    has_SWA: "Has SWA? (true/false)",
+    N_params: "N_params (e.g. 8e9)", D_tokens: "D_tokens (or empty for Chinchilla)",
+    gpu: "GPU", n_gpus: "n_gpus", mfu: "MFU (default 0.45)",
+    api_model: "API model to compare", monthly_tokens_M: "Monthly tokens (M)",
+    USD_budget: "USD budget", bytes_per_weight: "Bytes per weight (BF16=2)",
+    target_tokens_per_day: "Target tokens/day", concurrent_users: "Concurrent users",
+  };
+  return labels[name] || name;
+}
+
+function getRecipeDefaults(recipeId) {
+  const D = {
+    "X-1": { N_params: "8e9", D_tokens: "", gpu: "H100 SXM", n_gpus: 8, mfu: 0.45,
+             api_model: "GPT-4o", monthly_tokens_M: 10.0 },
+    "X-2": { theta: 500000, T_train: 8192, T_eval: 32000,
+             n_attention_heads: 32, n_kv_heads: 8, d_head: 128,
+             n_layers: 32, n_params: "8e9", has_SWA: false },
+    "X-3": { USD_budget: 5000, gpu: "H100 SXM", mfu: 0.45, n_gpus: 1 },
+    "X-5": { N_params: "8e9", T_eval: 4096, n_layers: 32, n_kv_heads: 8, d_head: 128,
+             bytes_per_weight: 2.0, target_tokens_per_day: 10000000, concurrent_users: 1 },
+    "X-19": { theta: 500000, T_train: 8192, T_eval: 8192,
+              n_attention_heads: 32, n_kv_heads: 8, d_head: 128,
+              n_layers: 32, n_params: "8e9", has_SWA: false },
+  };
+  return D[recipeId] || {};
+}
+
+// ════════════════════════════════════════════════════════════════════
+// Preset autofill (works in recipe mode)
+// ════════════════════════════════════════════════════════════════════
+$("preset").addEventListener("change", (e) => {
+  if (!e.target.value) return;
+  const proxy = state.pyodide.runPython(`get_preset(${JSON.stringify(e.target.value)})`);
+  const preset = proxy.toJs ? proxy.toJs({ dict_converter: Object.fromEntries }) : proxy;
+  if (!preset || Object.keys(preset).length === 0) return;
+  fillRecipeForm(preset);
+});
+
+function fillRecipeForm(p) {
+  // Fill any matching field in dynamic form
+  Object.entries(p).forEach(([k, v]) => {
+    const map = {
+      theta: "theta", T_train: "T_train",
+      n_attention_heads: "n_attention_heads", n_kv_heads: "n_kv_heads",
+      d_head: "d_head", n_layers: "n_layers", n_params: "n_params",
+      has_SWA: "has_SWA",
+    };
+    const formId = "param_" + (map[k] || k);
+    const el = $(formId);
+    if (el) el.value = (typeof v === "number" && (k === "n_params" || v > 1e6))
+      ? v.toExponential(2) : String(v);
+    // Also fill N_params for cost recipes
+    if (k === "n_params") {
+      const np = $("param_N_params");
+      if (np) np.value = (typeof v === "number" ? v.toExponential(2) : String(v));
+    }
+  });
+}
+
+// ════════════════════════════════════════════════════════════════════
+// HF Hub fetch (any model)
+// ════════════════════════════════════════════════════════════════════
+$("hf-fetch-btn").addEventListener("click", async () => {
+  const modelId = $("hf-id").value.trim();
+  if (!modelId) {
+    $("hf-status").textContent = "⚠ Enter a model id like 'Qwen/Qwen2.5-32B-Instruct'";
+    return;
+  }
+  $("hf-status").textContent = `⏳ Fetching config.json from HF Hub for ${modelId}...`;
+  $("hf-fetch-btn").disabled = true;
+  try {
+    const url = `https://huggingface.co/${modelId}/raw/main/config.json`;
+    const resp = await fetch(url);
+    if (!resp.ok) {
+      if (resp.status === 401 || resp.status === 403) {
+        throw new Error(`Model is gated (${resp.status}). Accept license on HF Hub first, or fill manually.`);
+      }
+      throw new Error(`HTTP ${resp.status} — config.json not found`);
+    }
+    const cfg = await resp.json();
+    const preset = configToPreset(cfg, modelId);
+    fillRecipeForm(preset);
+    $("hf-status").innerHTML = `✅ Config loaded for <strong>${modelId}</strong> (family: ${preset._family}). Verify values, click Analyze.`;
+  } catch (err) {
+    $("hf-status").textContent = `❌ ${err.message}`;
+  } finally {
+    $("hf-fetch-btn").disabled = false;
+  }
+});
+
+function configToPreset(cfg, modelId) {
+  const n_attn = cfg.num_attention_heads || cfg.n_head || 0;
+  const n_kv = cfg.num_key_value_heads || cfg.num_attention_heads || cfg.n_head || 0;
+  const hidden = cfg.hidden_size || cfg.d_model || cfg.n_embd || 0;
+  const d_head = cfg.head_dim || (n_attn > 0 ? Math.floor(hidden / n_attn) : 0);
+  const theta = cfg.rope_theta || cfg.rotary_emb_base ||
+    (cfg.alibi ? null : (cfg.position_embedding_type === "absolute" ? null : 10000));
+  const T_train = cfg.max_position_embeddings || cfg.max_sequence_length ||
+    cfg.n_positions || cfg.n_ctx || 0;
+  const n_layers = cfg.num_hidden_layers || cfg.n_layer || 0;
+  const has_SWA = !!(cfg.sliding_window || cfg.use_sliding_window);
+
+  let family = "rope-mha";
+  if (cfg.alibi) family = "alibi";
+  else if (cfg.model_type === "mamba" || cfg.model_type === "mamba2") family = "ssm";
+  else if (theta == null) family = "abspe";
+  else if (n_kv < n_attn) family = "rope-gqa";
+
+  const n_params_est = estimateParams(cfg);
+  return {
+    theta: theta || 10000, T_train: T_train || 2048,
+    n_attention_heads: n_attn, n_kv_heads: n_kv, d_head: d_head,
+    n_layers: n_layers, n_params: n_params_est, has_SWA: has_SWA,
+    _family: family, _model_id: modelId,
+  };
+}
+
+function estimateParams(cfg) {
+  const h = cfg.hidden_size || cfg.d_model || 0;
+  const L = cfg.num_hidden_layers || cfg.n_layer || 0;
+  const V = cfg.vocab_size || 32000;
+  return Math.round(12 * h * h * L + 2 * V * h);
+}
+
+// ════════════════════════════════════════════════════════════════════
+// Run recipe (manual mode)
+// ════════════════════════════════════════════════════════════════════
+$("run-btn").addEventListener("click", async () => {
+  if (!state.currentRecipe) {
+    alert("Select a recipe first.");
+    return;
+  }
+  const rid = state.currentRecipe.id;
+  const params = collectParams(state.currentRecipe.params);
+  await runAndDisplay(rid, params);
+});
+
+function collectParams(paramNames) {
+  const p = {};
+  paramNames.forEach(name => {
+    const el = $("param_" + name);
+    if (!el || el.value === "") return;
+    let v = el.value;
+    if (v === "true" || v === "false") {
+      p[name] = (v === "true");
+    } else if (!isNaN(parseFloat(v)) && isFinite(v)) {
+      p[name] = parseFloat(v);
+    } else {
+      p[name] = v;
+    }
+  });
+  return p;
+}
+
| 306 |
+
// ════════════════════════════════════════════════════════════════════
|
| 307 |
+
// Ask mode (free-form question via router)
|
| 308 |
+
// ════════════════════════════════════════════════════════════════════
|
| 309 |
+
$("ask-btn").addEventListener("click", async () => {
  const q = $("question").value.trim();
  if (!q) {
    alert("Please type a question.");
    return;
  }
  $("ask-btn").disabled = true;
  setStatus("🤔 Asking the in-browser LLM to pick a recipe...");

  try {
    const route = await routeQuestion(q);
    setStatus(`📋 Selected recipe ${route.recipe_id}. Running...`);
    await runAndDisplay(route.recipe_id, route.params, q);
  } catch (err) {
    setStatus(`❌ Routing failed: ${err.message}`);
    $("output-section").style.display = "block";
    $("verdict-box").className = "verdict-no";
    $("verdict-box").innerHTML = `<strong>Could not route question.</strong><br>${escapeHtml(err.message)}<br><br>Try the Recipe mode for full manual control.`;
  } finally {
    $("ask-btn").disabled = false;
  }
});

$("example-btn").addEventListener("click", () => {
  const ex = EXAMPLES[Math.floor(Math.random() * EXAMPLES.length)];
  $("question").value = ex;
});

async function routeQuestion(question) {
  const engine = await loadWebLLM();
  const recipesDesc = state.recipes.map(r =>
    ` ${r.id}: ${r.name} — ${r.description}\n params: ${r.params.join(", ")}`
  ).join("\n");
  const systemPrompt = `You are a routing function. Given a user's free-form question
about transformer LLM viability, you MUST output a single JSON object with two fields:
- recipe_id: one of [${state.recipes.map(r => r.id).join(", ")}]
- params: an object with parameter values inferred from the question

Available recipes:
${recipesDesc}

Common model facts you may use:
Meta-Llama-3-8B: theta=500000, T_train=8192, n_attention_heads=32, n_kv_heads=8, d_head=128, n_layers=32, n_params=8e9
Mistral-7B-v0.1: theta=10000, T_train=8192, n_attention_heads=32, n_kv_heads=8, d_head=128, n_layers=32, n_params=7e9, has_SWA=true
Qwen2.5-7B: theta=1000000, T_train=32768, n_attention_heads=28, n_kv_heads=4, d_head=128, n_layers=28, n_params=7.6e9
Llama-3.3-70B-Instruct: theta=500000, T_train=131072, n_attention_heads=64, n_kv_heads=8, d_head=128, n_layers=80, n_params=70e9

Respond with ONLY the JSON object. No prose, no markdown fences, no explanation.`;

  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: question },
    ],
    max_tokens: 400,
    temperature: 0.0,
    response_format: { type: "json_object" },
  });
  const raw = reply.choices[0].message.content.trim();
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // Try extracting JSON from markdown fences
    const m = raw.match(/\{[\s\S]*\}/);
    if (!m) throw new Error(`LLM returned non-JSON: ${raw.slice(0, 200)}`);
    parsed = JSON.parse(m[0]);
  }
  if (!parsed.recipe_id || !state.recipesById[parsed.recipe_id]) {
    throw new Error(`Unknown recipe: ${parsed.recipe_id}`);
  }
  return parsed;
}
// ════════════════════════════════════════════════════════════════════
// Run + display + synthesize
// ════════════════════════════════════════════════════════════════════
async function runAndDisplay(recipeId, params, originalQuestion = null) {
  setStatus("🧮 Computing TAF chain...");
  state.pyodide.globals.set("__rid", recipeId);
  state.pyodide.globals.set("__params", state.pyodide.toPy(params));
  const resultJSON = state.pyodide.runPython(`
import json
result = run_recipe(__rid, **__params)
json.dumps(result)
`);
  const result = JSON.parse(resultJSON);
  result._original_question = originalQuestion;
  renderResult(result);
  $("output-section").style.display = "block";
  setStatus("✅ Done. Numbers below.");
  if (ENABLE_WEBLLM) {
    await synthesizeAnswer(result);
  }
}

function renderResult(r) {
  if (r.error) {
    $("verdict-box").className = "verdict-no";
    $("verdict-box").innerHTML = `<strong>Error</strong>: ${escapeHtml(r.error)}`;
    $("chain-box").innerHTML = "";
    return;
  }
  const vBox = $("verdict-box");
  let vClass = "";
  if (r.verdict.startsWith("YES") || r.verdict === "GO") vClass = "verdict-yes";
  else if (r.verdict.startsWith("NO")) vClass = "verdict-no";
  else vClass = "verdict-degraded";
  vBox.className = vClass;
  vBox.innerHTML = `
    <div style="display:flex; justify-content:space-between; align-items:center; margin-bottom:0.5rem;">
      <div style="font-size:1.3rem; font-weight:700;">${escapeHtml(r.verdict)}</div>
      <div class="recipe-tag">${r.recipe_id} — ${escapeHtml(r.recipe_name)}</div>
    </div>
    <div><strong>Reason:</strong> ${escapeHtml(r.reason)}</div>
    ${r.mitigation && r.mitigation !== "None required." && r.mitigation !== "None — proceed with Chinchilla-optimal recipe."
      ? `<div style="margin-top:0.5rem;"><strong>Action:</strong> ${escapeHtml(r.mitigation)}</div>`
      : ""}
  `;

  const cBox = $("chain-box");
  cBox.innerHTML = "";
  r.chain.forEach(step => {
    const div = document.createElement("details");
    div.className = "chain-step";
    div.innerHTML = `
      <summary>
        <span><strong>Step ${step.step}</strong> — ${escapeHtml(step.name)}</span>
        <span class="step-section">${escapeHtml(step.section)}</span>
      </summary>
      <div class="step-formula">${escapeHtml(step.formula)}</div>
      <div><strong>Inputs:</strong> ${escapeHtml(JSON.stringify(step.inputs))}</div>
      <div class="step-result"><strong>Result:</strong> ${formatResult(step.result)}</div>
      ${step.interpretation ? `<div class="step-interp">${escapeHtml(step.interpretation)}</div>` : ""}
    `;
    cBox.appendChild(div);
  });
}

function formatResult(r) {
  if (r === null || r === undefined) return "n/a (not applicable)";
  if (typeof r === "number") return r.toLocaleString(undefined, { maximumFractionDigits: 4 });
  if (typeof r === "object") return `<pre>${escapeHtml(JSON.stringify(r, null, 2))}</pre>`;
  return String(r);
}

function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

// ════════════════════════════════════════════════════════════════════
// WebLLM (synthesis + router)
// ════════════════════════════════════════════════════════════════════
async function loadWebLLM() {
  if (state.webllm) return state.webllm;
  setStatus("⏳ Loading WebLLM library + Llama-3.2-1B (~700MB first time, cached after)...");
  const { CreateMLCEngine } = await import("https://esm.run/@mlc-ai/web-llm");
  state.webllm = await CreateMLCEngine(WEBLLM_MODEL, {
    initProgressCallback: (info) => setStatus(`⏳ ${info.text || "Loading model..."}`),
  });
  return state.webllm;
}

async function synthesizeAnswer(result) {
  $("answer-header").style.display = "block";
  $("answer-box").style.display = "block";
  $("answer-box").innerHTML = '<em style="color:var(--fg-dim);">Generating plain-English summary...</em>';

  let engine;
  try {
    engine = await loadWebLLM();
  } catch (err) {
    $("answer-box").innerHTML = `<em style="color:var(--warning);">⚠ WebLLM failed: ${escapeHtml(String(err))}<br>Numbers above are still correct.</em>`;
    return;
  }
  const prompt = buildSynthesisPrompt(result);
  let answer = "";
  try {
    const reply = await engine.chat.completions.create({
      messages: [
        { role: "system", content: "You are a precise transformer LLM diagnostic assistant. Summarise pre-computed TAF results in 4-6 sentences. Cite section numbers. Always recommend an action. Never invent numbers." },
        { role: "user", content: prompt },
      ],
      max_tokens: 400,
      temperature: 0.2,
    });
    answer = reply.choices[0].message.content;
  } catch (err) {
    $("answer-box").innerHTML = `<em style="color:var(--warning);">⚠ Synthesis failed: ${escapeHtml(String(err))}</em>`;
    return;
  }
  $("answer-box").innerHTML = `
    <div style="white-space:pre-wrap; line-height:1.7;">${escapeHtml(answer)}</div>
    <div style="margin-top:0.75rem; font-size:0.85rem; color:var(--fg-dim);">
      ↑ Synthesised by Llama-3.2-1B in your browser. Numbers are deterministic Python.
    </div>
  `;
  setStatus("✅ Done.");
}

function buildSynthesisPrompt(r) {
  const numbersBlock = r.chain.map(s =>
    `Step ${s.step} (${s.section}) ${s.name}: ${formatResultPlain(s.result)} — ${s.interpretation || ""}`
  ).join("\n");
  return `Recipe: ${r.recipe_id} — ${r.recipe_name}
${r._original_question ? `User question: "${r._original_question}"\n` : ""}
Computed chain:
${numbersBlock}

Verdict: ${r.verdict}
Reason: ${r.reason}
Action: ${r.mitigation}

Summarize for non-technical user in 4-6 sentences. Cite section numbers (§X.Y). Mention verdict and most important action.`;
}

function formatResultPlain(r) {
  if (r === null || r === undefined) return "n/a";
  if (typeof r === "number") return r.toLocaleString(undefined, { maximumFractionDigits: 4 });
  if (typeof r === "object") return JSON.stringify(r);
  return String(r);
}

// ════════════════════════════════════════════════════════════════════
// Bootstrap
// ════════════════════════════════════════════════════════════════════
loadPyodideAndTaf().catch(err => {
  setStatus(`❌ Failed to initialise: ${err.message || err}`);
  console.error(err);
});
@@ -0,0 +1,793 @@
"""
TAF Browser — Pyodide-compatible TAF formulas + recipes.

Pure-Python deterministic computations of TAF (Thermodynamic Attention Framework)
formulas, plus 5 cross-section recipes for the most common viability questions.

Author: Carles Marin <transformerkmarin@gmail.com>
License: Apache-2.0
"""
from __future__ import annotations
import math
import json


# ════════════════════════════════════════════════════════════════════════════
# §26 — γ-Thermodynamics (OUR contribution)
# ════════════════════════════════════════════════════════════════════════════
def gamma_pade(theta: float, T_eval: int) -> float:
    """§26.1 — γ = (2θ - T√2)/(2θ + T√2)"""
    z_sqrt2 = T_eval * math.sqrt(2)
    return (2 * theta - z_sqrt2) / (2 * theta + z_sqrt2)

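As a quick sanity check of §26.1, the formula can be exercised standalone (this sketch restates `gamma_pade` so it runs on its own; the θ=500000, T=8192 figures are the Meta-Llama-3-8B facts listed in the router prompt in main.js):

```python
import math

def gamma_pade(theta: float, T_eval: int) -> float:
    # §26.1 — γ = (2θ - T√2)/(2θ + T√2)
    z = T_eval * math.sqrt(2)
    return (2 * theta - z) / (2 * theta + z)

# Meta-Llama-3-8B preset: theta=500000, evaluated at its training length 8192
g = gamma_pade(500000, 8192)
print(round(g, 4))  # 0.9771 — well inside the γ < 1 regime at native length
```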
def gamma_decompose(gamma_pade_val, has_GQA=False, has_SWA=False, n_params=0.0) -> dict:
    """§26.10 — 5-axis decomposition (n=23 OLS, paper session 28)."""
    delta_GQA = +0.11 if has_GQA else 0.0
    delta_SWA = -0.21 if has_SWA else 0.0
    delta_post_IH = -0.15 if n_params >= 4e8 else 0.0
    return {
        "pade_centroid": gamma_pade_val,
        "delta_GQA": delta_GQA,
        "delta_SWA": delta_SWA,
        "delta_post_IH": delta_post_IH,
        "gamma_corrected": gamma_pade_val + delta_GQA + delta_SWA + delta_post_IH,
    }


def d_horizon(theta: float, gamma: float):
    """§26.2 — d_h = θ(1-γ)√2/(1+γ). None if γ outside (0,1)."""
    if gamma <= 0 or gamma >= 1:
        return None
    return theta * (1 - gamma) * math.sqrt(2) / (1 + gamma)

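A worked example of the horizon formula (a standalone restatement of `d_horizon`; the θ=10000, γ=0.5 inputs are illustrative, not a real model preset):

```python
import math

def d_horizon(theta: float, gamma: float):
    # §26.2 — d_h = θ(1-γ)√2/(1+γ); None when γ is outside (0,1)
    if gamma <= 0 or gamma >= 1:
        return None
    return theta * (1 - gamma) * math.sqrt(2) / (1 + gamma)

print(d_horizon(10000, 0.5))  # ≈ 4714 tokens
print(d_horizon(10000, 1.2))  # None — no finite horizon outside (0,1)
```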
def l_niah_c(d_horizon_val):
    """§26.5 — L_NIAH^c = 2·d_horizon."""
    return None if d_horizon_val is None else 2 * d_horizon_val


def chi_susceptibility(gamma: float) -> float:
    """§26.16 — χ = 1/|γ-1|."""
    return float('inf') if gamma == 1.0 else 1.0 / abs(gamma - 1.0)


def p_hallucinate(L: int, theta: float, gamma: float):
    """§26.9 — Horizon-overshoot probability."""
    dh = d_horizon(theta, gamma)
    if dh is None or L <= 0:
        return None
    chi = chi_susceptibility(gamma)
    if chi == float('inf'):
        return None
    geom = max(0.0, 1.0 - (dh / L) ** (1 - gamma))
    return geom * (math.sqrt(chi) / (1 + math.sqrt(chi)))

def theta_design(gamma_target: float, T_eval: int) -> float:
    """§26.3 — θ to land at γ_target at T_eval (Padé inverse)."""
    if gamma_target >= 1 or gamma_target <= -1:
        raise ValueError("gamma_target must be in (-1, 1)")
    return T_eval * math.sqrt(2) * (1 + gamma_target) / (2 * (1 - gamma_target))


def alpha_opt(gamma_target: float, T_eval: int, theta_nominal: float) -> float:
    """§26.4 — α = θ_design / θ_nominal."""
    return theta_design(gamma_target, T_eval) / theta_nominal

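`theta_design` is the exact algebraic inverse of `gamma_pade`, so a round trip recovers γ_target; a standalone sketch restating both formulas (γ=0.85 at T_eval=131072 matches the mitigation messages in recipe X-2 below, but any values in range work):

```python
import math

def gamma_pade(theta, T_eval):
    # §26.1 — γ = (2θ - T√2)/(2θ + T√2)
    z = T_eval * math.sqrt(2)
    return (2 * theta - z) / (2 * theta + z)

def theta_design(gamma_target, T_eval):
    # §26.3 — Padé inverse: θ = T√2(1+γ) / (2(1-γ))
    return T_eval * math.sqrt(2) * (1 + gamma_target) / (2 * (1 - gamma_target))

# Design θ for γ=0.85 at a 131072-token eval length, then plug it back in
theta = theta_design(0.85, 131072)
assert abs(gamma_pade(theta, 131072) - 0.85) < 1e-9
```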
def df_window(gamma: float, N: int, f: float = 0.90):
    """§26.7 — KV compression window. None outside the [0.65, 0.85] zone."""
    if not (0.65 <= gamma <= 0.85):
        return None
    if gamma >= 1:
        return int(f * N)
    inner = (1 - f) + f * N ** (1 - gamma)
    return int(math.ceil(inner ** (1 / (1 - gamma))))


def kv_soft_decay_regime(theta: float, gamma: float, T_train: int) -> str:
    """§26.8 — Soft-decay regime bound. d_h ≳ T_train/2 ⇒ applies."""
    dh = d_horizon(theta, gamma)
    if dh is None:
        return "use-hard-cutoff"
    ratio = dh / max(1, T_train / 2)
    if ratio >= 1.2:
        return "applies"
    if ratio >= 0.8:
        return "borderline"
    return "use-hard-cutoff"


# ════════════════════════════════════════════════════════════════════════════
# §17 — Pre-training viability formulas
# ════════════════════════════════════════════════════════════════════════════
def chinchilla_optimal_tokens(N_params: float, ratio: float = 20.0) -> float:
    """§17.30 — Chinchilla 20:1 token budget. D = ratio · N."""
    return ratio * N_params


def chinchilla_optimal_N(D_tokens: float, ratio: float = 20.0) -> float:
    """§17.30 inverse — given D tokens, optimal N = D/20."""
    return D_tokens / ratio


def training_flops(N_params: float, D_tokens: float) -> float:
    """§17.10 — C ≈ 6·N·D total training FLOPs."""
    return 6 * N_params * D_tokens

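The two §17 formulas compose directly; for example, for a hypothetical 7B model (a standalone restatement of both helpers):

```python
def chinchilla_optimal_tokens(N_params, ratio=20.0):
    return ratio * N_params          # §17.30 — D = 20·N

def training_flops(N_params, D_tokens):
    return 6 * N_params * D_tokens   # §17.10 — C ≈ 6·N·D

N = 7e9                              # a 7B model
D = chinchilla_optimal_tokens(N)     # 1.4e11 tokens (140B)
C = training_flops(N, D)             # 5.88e21 FLOPs
print(f"{D:.2e} tokens, {C:.2e} FLOPs")
```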
def training_memory_16N(N_params: float) -> dict:
    """§17.20 — total memory ≈ 16·N bytes (model + grads + Adam moments)."""
    bytes_total = 16 * N_params
    return {
        "bytes": bytes_total,
        "GB": bytes_total / 1e9,
    }


def emergent_threshold(N_params: float) -> str:
    """§17.60 — capability threshold heuristic (Wei 2022)."""
    if N_params >= 1e11:
        return "above 100B — strong reasoning capabilities expected"
    if N_params >= 1e10:
        return "above 10B — most emergent capabilities present"
    if N_params >= 1e9:
        return "above 1B — basic instruction-following, not strong reasoning"
    if N_params >= 1e8:
        return "above 100M — useful for narrow tasks, no emergence"
    return "below 100M — domain-specific tasks only"


# ════════════════════════════════════════════════════════════════════════════
# §19 — Inference economics
# ════════════════════════════════════════════════════════════════════════════
def kv_cache_memory(n_layers, n_kv_heads, d_head, seq_len, bytes_per_element=2.0) -> dict:
    """§19.1 — bytes = 2·L·n_kv·d_h·seq·B."""
    bytes_total = 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_element
    return {"bytes": bytes_total, "MB": bytes_total / 1e6, "GB": bytes_total / 1e9}

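Plugging the Meta-Llama-3-8B geometry from the router prompt in main.js (32 layers, 8 KV heads, d_head=128) into §19.1 at a 131072-token context (a standalone restatement of the helper):

```python
def kv_cache_memory(n_layers, n_kv_heads, d_head, seq_len, bytes_per_element=2.0):
    # §19.1 — bytes = 2 · L · n_kv · d_h · seq · B  (K and V, all layers)
    b = 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_element
    return {"bytes": b, "MB": b / 1e6, "GB": b / 1e9}

# Meta-Llama-3-8B geometry at a 131072-token context, BF16 cache
kv = kv_cache_memory(32, 8, 128, 131072)
print(f"{kv['GB']:.2f} GB per request")  # 17.18 GB per request
```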
def model_weights_memory(N_params, bytes_per_element=2.0) -> dict:
    """Inference memory for model weights only (BF16=2, INT8=1, INT4=0.5)."""
    return {"GB": N_params * bytes_per_element / 1e9}


def inference_decode_throughput(N_params, hbm_GB_per_s, bytes_per_element=2.0) -> float:
    """§19.7 — memory-bound decode: tokens/sec = HBM_BW / model_size."""
    model_GB = N_params * bytes_per_element / 1e9
    return hbm_GB_per_s / model_GB


# ════════════════════════════════════════════════════════════════════════════
# §20 — Hardware catalog (curated from vendor docs 2026)
# ════════════════════════════════════════════════════════════════════════════
GPU_CATALOG = {
    # name: {bf16_TFLOPs, hbm_GB, hbm_GB_s, cloud_USD_per_h_spot, tdp_W}
    "H100 SXM":  {"flops": 989,  "vram_GB": 80,  "bw_GB_s": 3350, "usd_h": 2.5, "tdp": 700},
    "H100 PCIe": {"flops": 756,  "vram_GB": 80,  "bw_GB_s": 2000, "usd_h": 2.0, "tdp": 350},
    "H200":      {"flops": 989,  "vram_GB": 141, "bw_GB_s": 4800, "usd_h": 3.5, "tdp": 700},
    "B200":      {"flops": 2250, "vram_GB": 192, "bw_GB_s": 8000, "usd_h": 5.0, "tdp": 1000},
    "A100 80GB": {"flops": 312,  "vram_GB": 80,  "bw_GB_s": 2000, "usd_h": 1.2, "tdp": 400},
    "A100 40GB": {"flops": 312,  "vram_GB": 40,  "bw_GB_s": 1555, "usd_h": 1.0, "tdp": 400},
    "L40S":      {"flops": 362,  "vram_GB": 48,  "bw_GB_s": 864,  "usd_h": 0.7, "tdp": 350},
    "MI300X":    {"flops": 1307, "vram_GB": 192, "bw_GB_s": 5300, "usd_h": 2.1, "tdp": 750},
    "RTX 4090":  {"flops": 165,  "vram_GB": 24,  "bw_GB_s": 1008, "usd_h": 0.4, "tdp": 450},
    "RTX 5090":  {"flops": 419,  "vram_GB": 32,  "bw_GB_s": 1792, "usd_h": 0.7, "tdp": 575},
    "RTX 5060Ti":{"flops": 36,   "vram_GB": 16,  "bw_GB_s": 448,  "usd_h": 0.0, "tdp": 180},  # local
}

def cost_per_training_run(N_params: float, D_tokens: float, gpu: str = "H100 SXM",
                          n_gpus: int = 8, mfu: float = 0.45) -> dict:
    """§20.11 — cost = (flops_total / (peak·MFU·n_gpus)) · USD/h."""
    info = GPU_CATALOG.get(gpu)
    if info is None:
        return {"error": f"unknown gpu '{gpu}'", "available": list(GPU_CATALOG.keys())}
    total_flops = training_flops(N_params, D_tokens)  # absolute FLOPs
    effective_flops_per_sec = info["flops"] * 1e12 * mfu * n_gpus
    seconds = total_flops / effective_flops_per_sec
    hours = seconds / 3600
    usd = hours * info["usd_h"] * n_gpus
    return {
        "total_FLOPs": total_flops,
        "hours": hours,
        "days": hours / 24,
        "USD": usd,
        "gpu": gpu, "n_gpus": n_gpus, "mfu": mfu,
    }

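A standalone sketch of the §20.11 arithmetic with the H100 SXM catalog numbers inlined as explicit arguments (the `peak_tflops` and `usd_h` parameter names are invented for this sketch; the catalog function above takes a GPU name instead):

```python
def training_cost_sketch(N_params, D_tokens, peak_tflops, usd_h, n_gpus=8, mfu=0.45):
    # §20.11 — wall-clock from C = 6·N·D against sustained (peak · MFU · n_gpus) FLOP/s
    total_flops = 6 * N_params * D_tokens
    seconds = total_flops / (peak_tflops * 1e12 * mfu * n_gpus)
    hours = seconds / 3600
    return {"hours": hours, "days": hours / 24, "USD": hours * usd_h * n_gpus}

# 7B model at Chinchilla D=20·N, on 8× H100 SXM (989 BF16 TFLOPs, $2.5/h spot)
c = training_cost_sketch(7e9, 1.4e11, peak_tflops=989, usd_h=2.5)
print(f"${c['USD']:,.0f} over {c['days']:.1f} days")  # $9,175 over 19.1 days
```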
def cost_per_inference_token(model_GB: float, gpu: str, batch: int = 1) -> dict:
    """§19.9 / §20.12 — derived $/Mtok from memory-bound decode."""
    info = GPU_CATALOG.get(gpu)
    if info is None:
        return {"error": f"unknown gpu '{gpu}'"}
    tok_per_sec = info["bw_GB_s"] / model_GB * batch
    sec_per_Mtok = 1e6 / tok_per_sec
    h_per_Mtok = sec_per_Mtok / 3600
    usd_per_Mtok = h_per_Mtok * info["usd_h"]
    return {
        "tok_per_sec": tok_per_sec,
        "USD_per_Mtok": usd_per_Mtok,
        "gpu": gpu, "batch": batch,
    }


# ════════════════════════════════════════════════════════════════════════════
# §24 — Cost / ROI
# ════════════════════════════════════════════════════════════════════════════
API_PRICING = {
    # USD per million tokens (input/output blended typical)
    "GPT-4o":          {"input": 2.5,  "output": 10.0},
    "GPT-4o-mini":     {"input": 0.15, "output": 0.60},
    "Claude-Opus-4":   {"input": 15.0, "output": 75.0},
    "Claude-Sonnet-4": {"input": 3.0,  "output": 15.0},
    "Claude-Haiku-4":  {"input": 0.80, "output": 4.0},
    "Gemini-1.5-Pro":  {"input": 1.25, "output": 5.0},
    "DeepSeek-V3":     {"input": 0.27, "output": 1.10},
    "Llama-3.3-70B (Together)": {"input": 0.88, "output": 0.88},
}

def break_even_volume(training_cost: float, self_inference_per_Mtok: float,
                      api_per_Mtok: float, blend_input_output: float = 0.5) -> dict:
    """§24.3 — monthly tokens at which custom training pays off."""
    savings_per_Mtok = api_per_Mtok - self_inference_per_Mtok
    if savings_per_Mtok <= 0:
        return {"error": "self-host more expensive than API per token; never breaks even"}
    Mtok_breakeven = training_cost / savings_per_Mtok
    return {
        "savings_per_Mtok": savings_per_Mtok,
        "Mtok_breakeven": Mtok_breakeven,
        "tokens_breakeven": Mtok_breakeven * 1e6,
    }

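A standalone sketch of the §24.3 break-even idea; the $9,175 training cost, the $0.50/Mtok self-serve figure, and the single-number API price are illustrative assumptions, and this simplified helper returns a bare number rather than the dict above:

```python
def break_even_sketch(training_cost, self_per_Mtok, api_per_Mtok):
    # §24.3 — token volume at which the one-off training cost is recouped
    savings = api_per_Mtok - self_per_Mtok
    if savings <= 0:
        return None  # self-hosting never breaks even on per-token price alone
    return training_cost / savings

# Hypothetical: $9,175 training run; $0.50/Mtok self-serve vs $0.60/Mtok API
mtok = break_even_sketch(9175, 0.50, 0.60)
print(f"breaks even after {mtok:,.0f} Mtok")  # breaks even after 91,750 Mtok
```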
# ════════════════════════════════════════════════════════════════════════════
# RECIPES
# ════════════════════════════════════════════════════════════════════════════

# ─────────────────────────────────────────────────────────────────────
# X-2 — Long Context Viability
# ─────────────────────────────────────────────────────────────────────
def run_recipe_x2(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
                  d_head, n_layers, n_params, has_SWA=False,
                  bytes_per_element=2.0, **_unused):
    """X-2: will model M serve length L doing NIAH retrieval?"""
    chain = []
    g_pade = gamma_pade(theta, T_eval)
    chain.append(_step(1, "§26.1", "γ_Padé", "γ = (2θ - T√2)/(2θ + T√2)",
                       {"theta": theta, "T_eval": T_eval}, g_pade,
                       _phase_label(g_pade)))

    has_GQA = (n_kv_heads < n_attention_heads)
    decomp = gamma_decompose(g_pade, has_GQA=has_GQA, has_SWA=has_SWA, n_params=n_params)
    g_corr = decomp["gamma_corrected"]
    chain.append(_step(2, "§26.10", "γ-decomposition", "γ + δ_GQA + δ_SWA + δ_post_IH",
                       {"has_GQA": has_GQA, "has_SWA": has_SWA, "n_params": n_params},
                       g_corr, breakdown=decomp))

    dh = d_horizon(theta, g_corr)
    chain.append(_step(3, "§26.2", "d_horizon", "d_h = θ(1-γ)√2/(1+γ)",
                       {"theta": theta, "gamma": g_corr}, dh,
                       "n/a — γ outside (0,1)" if dh is None else f"horizon at d={dh:.0f}"))

    l_niah = l_niah_c(dh)
    chain.append(_step(4, "§26.5", "L_NIAH^c", "L_NIAH^c = 2·d_horizon",
                       {"d_horizon": dh}, l_niah,
                       "n/a" if l_niah is None else f"NIAH 50% at L={l_niah:.0f}"))

    p_hallu = p_hallucinate(T_eval, theta, g_corr)
    chain.append(_step(5, "§26.9", "P_hallucinate", "max(0,1-(d_h/L)^(1-γ))·√χ/(1+√χ)",
                       {"L": T_eval, "theta": theta, "gamma": g_corr}, p_hallu,
                       "n/a (Phase B)" if p_hallu is None else f"{p_hallu*100:.1f}% predicted"))

    kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval, bytes_per_element)
    chain.append(_step(6, "§19.1", "KV cache memory", "2·L·n_kv·d_h·seq·B",
                       {"n_layers": n_layers, "n_kv_heads": n_kv_heads, "d_head": d_head,
                        "seq_len": T_eval, "bytes_per_element": bytes_per_element},
                       kv, f"{kv['GB']:.2f} GB per request"))

    if g_corr <= 0 or g_corr >= 1:
        verdict, reason = "NO", "Phase B / geometric collapse (γ_corrected outside (0,1))"
        mit = (f"Apply NTK-aware extension. Required θ for γ=0.85: "
               f"{theta_design(0.85, T_eval):,.0f}. α_opt = {alpha_opt(0.85, T_eval, theta):.2f} "
               f"({'fine-tuning required' if alpha_opt(0.85, T_eval, theta) > 8 else 'zero-shot may work'}).")
    elif dh is not None and T_eval < dh:
        margin = (1 - T_eval / dh) * 100
        verdict, reason = "YES", f"L={T_eval} inside d_horizon={dh:.0f} ({margin:.0f}% margin)."
        mit = "None required."
    elif dh is not None and T_eval < l_niah:
        verdict, reason = "DEGRADED", f"L between d_horizon ({dh:.0f}) and L_NIAH^c ({l_niah:.0f})."
        mit = "Consider context contraction OR NTK extension."
    else:
        verdict, reason = "NO", f"L={T_eval} exceeds NIAH ceiling {l_niah:.0f}."
        mit = f"Apply NTK extension; need θ ≈ {theta_design(0.85, T_eval):,.0f} for γ=0.85."

    return _wrap("X-2", "Long Context Viability", locals(), chain, verdict, reason, mit)

# ─────────────────────────────────────────────────────────────────────
# X-1 — Custom training vs API for a domain task
# ─────────────────────────────────────────────────────────────────────
def run_recipe_x1(N_params, D_tokens=None, gpu="H100 SXM", n_gpus=8, mfu=0.45,
                  api_model="GPT-4o", monthly_tokens_M=10.0, **_unused):
    """X-1: custom training (Chinchilla optimal) vs API."""
    chain = []

    # Step 1: Chinchilla optimal D
    if D_tokens is None:
        D_tokens = chinchilla_optimal_tokens(N_params)
    chain.append(_step(1, "§17.30", "Chinchilla optimal D", "D = 20·N",
                       {"N_params": N_params}, D_tokens,
                       f"recommended D = {D_tokens:.2e} tokens"))

    # Step 2: training FLOPs
    flops = training_flops(N_params, D_tokens)
    chain.append(_step(2, "§17.10", "Training FLOPs", "C = 6·N·D",
                       {"N": N_params, "D": D_tokens}, flops,
                       f"{flops:.2e} FLOPs total"))

    # Step 3: training cost
    cost = cost_per_training_run(N_params, D_tokens, gpu=gpu, n_gpus=n_gpus, mfu=mfu)
    chain.append(_step(3, "§20.11", "Training cost",
                       "hours·USD/h·n_gpus = total $",
                       {"gpu": gpu, "n_gpus": n_gpus, "mfu": mfu}, cost,
                       f"${cost['USD']:,.0f} over {cost['days']:.1f} days"))

    # Step 4: model_GB and decode throughput
    model_GB = N_params * 2 / 1e9  # BF16
    inf = cost_per_inference_token(model_GB, gpu, batch=1)
    chain.append(_step(4, "§19.9 / §20.12", "Self-inference $/Mtok",
                       "BW / model_GB → tok/s → $/Mtok",
                       {"model_GB": model_GB, "gpu": gpu}, inf,
                       f"${inf['USD_per_Mtok']:.2f} per million tokens (single user)"))

    # Step 5: API blended price
    api = API_PRICING.get(api_model, {"input": 2.0, "output": 8.0})
    api_blend = (api["input"] + api["output"]) / 2
chain.append(_step(5, "§24.X", f"{api_model} blended price",
|
| 351 |
+
"(input + output) / 2 USD/Mtok",
|
| 352 |
+
{"api_model": api_model}, api_blend,
|
| 353 |
+
f"${api_blend:.2f}/Mtok blended"))
|
| 354 |
+
|
| 355 |
+
# Step 6: break-even
|
| 356 |
+
be = break_even_volume(cost["USD"], inf["USD_per_Mtok"], api_blend)
|
| 357 |
+
chain.append(_step(6, "§24.3", "Break-even tokens", "training$ / (api - self) = Mtok",
|
| 358 |
+
{"training_cost": cost["USD"]}, be,
|
| 359 |
+
_be_interp(be, monthly_tokens_M)))
|
| 360 |
+
|
| 361 |
+
# Verdict
|
| 362 |
+
if "error" in be:
|
| 363 |
+
verdict, reason = "NO", be["error"]
|
| 364 |
+
mit = f"Stick with {api_model} API."
|
| 365 |
+
elif monthly_tokens_M >= be["Mtok_breakeven"]:
|
| 366 |
+
verdict = "YES (custom)"
|
| 367 |
+
months_to_payoff = be["Mtok_breakeven"] / monthly_tokens_M
|
| 368 |
+
reason = (f"At {monthly_tokens_M} M tokens/month, break-even in "
|
| 369 |
+
f"{months_to_payoff:.1f} months. Long-term custom is cheaper.")
|
| 370 |
+
mit = f"Train at {gpu}×{n_gpus}; serve self-hosted."
|
| 371 |
+
else:
|
| 372 |
+
months = be["Mtok_breakeven"] / monthly_tokens_M
|
| 373 |
+
verdict = "NO (API)"
|
| 374 |
+
reason = (f"At {monthly_tokens_M} M tokens/month, break-even in "
|
| 375 |
+
f"{months:.1f} months — too slow.")
|
| 376 |
+
mit = f"Use {api_model} API (cheaper for your volume)."
|
| 377 |
+
|
| 378 |
+
return _wrap("X-1", "Custom training vs API", locals(), chain, verdict, reason, mit)
|
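Step 6's break-even arithmetic (`training$ / (api - self) = Mtok`) can be sketched in isolation. A minimal sketch with illustrative names; the real `break_even_volume` is defined elsewhere in this file and may return additional fields.

```python
def break_even_mtok(training_usd, self_usd_per_mtok, api_usd_per_mtok):
    """Mtok volume at which the training outlay is recouped by the per-token saving."""
    saving = api_usd_per_mtok - self_usd_per_mtok
    if saving <= 0:
        # Self-hosting never pays off if it is not cheaper per token.
        return {"error": "self-hosting is not cheaper per token"}
    return {"Mtok_breakeven": training_usd / saving}

# $100K training run, $0.50/Mtok self-serve vs $2.50/Mtok API:
be = break_even_mtok(100_000, 0.50, 2.50)
```

At 10 M tokens/month, 50,000 Mtok of break-even volume means a ~417-year payoff, which is why the verdict logic compares `monthly_tokens_M` against `Mtok_breakeven` directly.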
| 379 |
+
|
| 380 |
+
|
| 381 |
+
def _be_interp(be, monthly):
|
| 382 |
+
if "error" in be:
|
| 383 |
+
return be["error"]
|
| 384 |
+
months = be["Mtok_breakeven"] / max(monthly, 0.001)
|
| 385 |
+
return f"break-even at {be['Mtok_breakeven']:.0f} Mtok ({months:.1f} months at {monthly} M/mo)"
|
| 386 |
+
|
| 387 |
+
|
| 388 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 389 |
+
# X-3 — Pre-flight check on $5K training budget
|
| 390 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 391 |
+
def run_recipe_x3(USD_budget=5000.0, gpu="H100 SXM", mfu=0.45, n_gpus=1, **_unused):
|
| 392 |
+
"""X-3: given $ budget, what model can I train?"""
|
| 393 |
+
chain = []
|
| 394 |
+
info = GPU_CATALOG[gpu]
|
| 395 |
+
|
| 396 |
+
# Step 1: GPU-hours we can afford
|
| 397 |
+
hours = USD_budget / (info["usd_h"] * n_gpus)
|
| 398 |
+
chain.append(_step(1, "§20.11", "Affordable GPU-hours", "USD / ($/h·n_gpus)",
|
| 399 |
+
{"USD": USD_budget, "gpu": gpu, "n_gpus": n_gpus}, hours,
|
| 400 |
+
f"{hours:.0f} GPU-hours total ({hours/24:.1f} days at full use)"))
|
| 401 |
+
|
| 402 |
+
# Step 2: max FLOPs
|
| 403 |
+
max_flops = info["flops"] * 1e12 * mfu * n_gpus * hours * 3600
|
| 404 |
+
chain.append(_step(2, "§17.10", "Max training FLOPs",
|
| 405 |
+
"peak·MFU·n_gpus·seconds",
|
| 406 |
+
{"peak_TFLOPs": info["flops"], "MFU": mfu}, max_flops,
|
| 407 |
+
f"{max_flops:.2e} effective FLOPs"))
|
| 408 |
+
|
| 409 |
+
# Step 3: Chinchilla-optimal N (with D=20N)
|
| 410 |
+
# 6·N·D = max_flops, D=20N → 120·N² = max_flops → N = sqrt(max_flops/120)
|
| 411 |
+
N_chinchilla = math.sqrt(max_flops / 120)
|
| 412 |
+
D_chinchilla = 20 * N_chinchilla
|
| 413 |
+
chain.append(_step(3, "§17.30", "Chinchilla-optimal N",
|
| 414 |
+
"N = √(C/120) at D=20N", {"max_FLOPs": max_flops},
|
| 415 |
+
N_chinchilla,
|
| 416 |
+
f"N ≈ {N_chinchilla:.2e} params with D = {D_chinchilla:.2e} tokens"))
|
| 417 |
+
|
| 418 |
+
# Step 4: emergence check
|
| 419 |
+
emerg = emergent_threshold(N_chinchilla)
|
| 420 |
+
chain.append(_step(4, "§17.60", "Emergence threshold", "Wei 2022 capability",
|
| 421 |
+
{"N": N_chinchilla}, emerg, emerg))
|
| 422 |
+
|
| 423 |
+
# Step 5: memory budget check
|
| 424 |
+
mem = training_memory_16N(N_chinchilla)
|
| 425 |
+
fits = mem["GB"] <= info["vram_GB"]
|
| 426 |
+
chain.append(_step(5, "§17.20", "16N training memory",
|
| 427 |
+
"model + grads + AdamW",
|
| 428 |
+
{"N": N_chinchilla}, mem,
|
| 429 |
+
f"{mem['GB']:.1f} GB needed; "
|
| 430 |
+
f"{'fits in ' if fits else 'EXCEEDS '}{info['vram_GB']} GB VRAM"))
|
| 431 |
+
|
| 432 |
+
# Verdict
|
| 433 |
+
if N_chinchilla < 1e8:
|
| 434 |
+
verdict, reason = "TINY-MODEL", f"Budget supports only ~{N_chinchilla:.0e} params"
|
| 435 |
+
mit = "Use LoRA fine-tuning of larger pretrained model instead."
|
| 436 |
+
elif not fits:
|
| 437 |
+
verdict, reason = "MEMORY-LIMITED", f"Chinchilla N ({N_chinchilla:.1e}) doesn't fit one {gpu}"
|
| 438 |
+
mit = f"Use ZeRO-3 across multiple GPUs (need ≥{math.ceil(mem['GB']/info['vram_GB'])}× {gpu}) OR train smaller N undertrained."
|
| 439 |
+
else:
|
| 440 |
+
verdict = "GO"
|
| 441 |
+
reason = (f"At ${USD_budget}, train {N_chinchilla:.1e}-param model on "
|
| 442 |
+
f"{D_chinchilla:.1e} tokens in ~{hours/24:.1f} days. "
|
| 443 |
+
f"Capability tier: {emerg.split('—')[0].strip()}.")
|
| 444 |
+
mit = "None — proceed with Chinchilla-optimal recipe."
|
| 445 |
+
|
| 446 |
+
return _wrap("X-3", "Budget pre-flight", locals(), chain, verdict, reason, mit)
|
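Steps 1–3 of X-3 chain together as budget → GPU-hours → effective FLOPs → Chinchilla-optimal N (from `6·N·D = C` with `D = 20N`, so `C = 120·N²`). A standalone sketch of that algebra, with illustrative parameter names:

```python
import math

def chinchilla_n_from_budget(usd, usd_per_gpu_hour, peak_tflops, mfu=0.45, n_gpus=1):
    """Budget → affordable GPU-hours → effective FLOPs → N = √(C/120), D = 20N."""
    hours = usd / (usd_per_gpu_hour * n_gpus)
    flops = peak_tflops * 1e12 * mfu * n_gpus * hours * 3600
    n = math.sqrt(flops / 120)  # 6·N·D with D = 20N  ⇒  C = 120·N²
    return {"hours": hours, "flops": flops, "N": n, "D": 20 * n}

r = chinchilla_n_from_budget(usd=1200, usd_per_gpu_hour=1.2, peak_tflops=100, mfu=0.5)
```

The square-root relationship is the key intuition: a 4× larger budget only buys a 2× larger Chinchilla-optimal model.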
| 447 |
+
|
| 448 |
+
|
| 449 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 450 |
+
# X-5 — Hardware selection for serving
|
| 451 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 452 |
+
def run_recipe_x5(N_params, T_eval=4096, n_layers=32, n_kv_heads=8, d_head=128,
|
| 453 |
+
bytes_per_weight=2.0, target_tokens_per_day=10_000_000.0,
|
| 454 |
+
concurrent_users=1, **_unused):
|
| 455 |
+
"""X-5: which GPU should I use to serve N-param model at L context?"""
|
| 456 |
+
chain = []
|
| 457 |
+
|
| 458 |
+
# Step 1: weights memory
|
| 459 |
+
w_mem = model_weights_memory(N_params, bytes_per_weight)
|
| 460 |
+
chain.append(_step(1, "§19.X", "Model weights memory",
|
| 461 |
+
"N · bytes_per_weight",
|
| 462 |
+
{"N": N_params, "bytes": bytes_per_weight}, w_mem,
|
| 463 |
+
f"{w_mem['GB']:.1f} GB for weights"))
|
| 464 |
+
|
| 465 |
+
# Step 2: KV cache per request
|
| 466 |
+
kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval, bytes_per_weight)
|
| 467 |
+
chain.append(_step(2, "§19.1", "KV cache (per request)",
|
| 468 |
+
"2·L·n_kv·d_h·seq·B",
|
| 469 |
+
{"n_layers": n_layers, "n_kv": n_kv_heads,
|
| 470 |
+
"d_head": d_head, "seq": T_eval}, kv,
|
| 471 |
+
f"{kv['GB']:.2f} GB per concurrent request"))
|
| 472 |
+
|
| 473 |
+
# Step 3: total memory needed
|
| 474 |
+
total_GB = w_mem["GB"] + kv["GB"] * concurrent_users
|
| 475 |
+
chain.append(_step(3, "§20.3", "Total GPU memory",
|
| 476 |
+
"weights + KV·n_concurrent", {}, {"GB": total_GB},
|
| 477 |
+
f"{total_GB:.1f} GB for {concurrent_users} concurrent users"))
|
| 478 |
+
|
| 479 |
+
# Step 4: scan GPU catalog
|
| 480 |
+
candidates = []
|
| 481 |
+
for name, info in GPU_CATALOG.items():
|
| 482 |
+
if info["vram_GB"] < total_GB:
|
| 483 |
+
continue
|
| 484 |
+
# Decode throughput estimate (memory-bound)
|
| 485 |
+
tok_per_s = info["bw_GB_s"] / w_mem["GB"]
|
| 486 |
+
tok_per_day = tok_per_s * 86400
|
| 487 |
+
capacity_users = tok_per_day / target_tokens_per_day
|
| 488 |
+
usd_per_day = info["usd_h"] * 24
|
| 489 |
+
usd_per_Mtok = (usd_per_day / (tok_per_day / 1e6)) if tok_per_day > 0 else float('inf')
|
| 490 |
+
candidates.append({
|
| 491 |
+
"gpu": name, "vram_GB": info["vram_GB"], "bw_GB_s": info["bw_GB_s"],
|
| 492 |
+
"tok_per_sec": tok_per_s, "tok_per_day": tok_per_day,
|
| 493 |
+
"USD_per_day": usd_per_day, "USD_per_Mtok": usd_per_Mtok,
|
| 494 |
+
"users_supported": capacity_users,
|
| 495 |
+
})
|
| 496 |
+
candidates.sort(key=lambda c: c["USD_per_Mtok"])
|
| 497 |
+
chain.append(_step(4, "§20", f"Eligible GPUs (≥{total_GB:.0f}GB)",
|
| 498 |
+
"filter + rank by $/Mtok",
|
| 499 |
+
{"min_VRAM": total_GB}, candidates[:5],
|
| 500 |
+
f"{len(candidates)} GPUs fit; cheapest: {candidates[0]['gpu'] if candidates else 'NONE'}"))
|
| 501 |
+
|
| 502 |
+
# Verdict
|
| 503 |
+
if not candidates:
|
| 504 |
+
verdict, reason = "NO", f"No single GPU has ≥{total_GB:.0f} GB VRAM."
|
| 505 |
+
mit = (f"Use tensor parallelism across multiple GPUs "
|
| 506 |
+
f"(e.g. 2× H100 = 160GB), or quantize to INT8 (halves memory).")
|
| 507 |
+
else:
|
| 508 |
+
best = candidates[0]
|
| 509 |
+
verdict = "YES"
|
| 510 |
+
reason = (f"Best GPU: {best['gpu']} at ${best['USD_per_Mtok']:.2f}/Mtok. "
|
| 511 |
+
f"Supports {best['users_supported']:.1f}× your daily target.")
|
| 512 |
+
mit = f"Provision {best['gpu']}, expected {best['tok_per_sec']:.0f} tok/s decode."
|
| 513 |
+
|
| 514 |
+
return _wrap("X-5", "Hardware selection for serving", locals(), chain, verdict, reason, mit)
|
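The memory-bound decode estimate in step 4 (`tok/s = BW / model_GB`) assumes every generated token re-reads all weights once from HBM. A standalone sketch; the GPU numbers below are H100-class and purely illustrative, not taken from this file's `GPU_CATALOG`:

```python
def decode_cost(model_gb, bw_gb_s, usd_per_hour):
    """Memory-bound decode throughput and the resulting $/Mtok at 24h utilization."""
    tok_per_s = bw_gb_s / model_gb          # each token streams the full weights once
    tok_per_day = tok_per_s * 86_400
    usd_per_mtok = (usd_per_hour * 24) / (tok_per_day / 1e6)
    return {"tok_per_s": tok_per_s, "USD_per_Mtok": usd_per_mtok}

# 16 GB of BF16 weights on a ~3.35 TB/s HBM part at $2.50/h (illustrative numbers):
d = decode_cost(model_gb=16, bw_gb_s=3350, usd_per_hour=2.5)
```

This is why the candidate ranking sorts by `USD_per_Mtok`: VRAM only gates eligibility, while bandwidth per dollar decides the winner.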
| 515 |
+
|
| 516 |
+
|
| 517 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 518 |
+
# X-19 — KV compression decision (ours vs literature)
|
| 519 |
+
# ─────────────────────────────────────────────────────────────────────
|
| 520 |
+
def run_recipe_x19(theta, T_train, T_eval, n_attention_heads, n_kv_heads,
|
| 521 |
+
d_head, n_layers, n_params, has_SWA=False, **_unused):
|
| 522 |
+
"""X-19: should I use γ-soft KV decay, hard D_f, or literature methods?"""
|
| 523 |
+
chain = []
|
| 524 |
+
|
| 525 |
+
# Step 1: γ_Padé
|
| 526 |
+
g_pade = gamma_pade(theta, T_eval)
|
| 527 |
+
chain.append(_step(1, "§26.1", "γ_Padé", "(2θ-T√2)/(2θ+T√2)",
|
| 528 |
+
{"theta": theta, "T_eval": T_eval}, g_pade, _phase_label(g_pade)))
|
| 529 |
+
|
| 530 |
+
# Step 2: γ-decomposition
|
| 531 |
+
has_GQA = n_kv_heads < n_attention_heads
|
| 532 |
+
decomp = gamma_decompose(g_pade, has_GQA, has_SWA, n_params)
|
| 533 |
+
g_corr = decomp["gamma_corrected"]
|
| 534 |
+
chain.append(_step(2, "§26.10", "γ-decomposition", "5-axis adjustment",
|
| 535 |
+
{"has_GQA": has_GQA, "has_SWA": has_SWA, "n_params": n_params},
|
| 536 |
+
g_corr))
|
| 537 |
+
|
| 538 |
+
# Step 3: §26.7 D_f window applicability
|
| 539 |
+
df = df_window(g_corr, T_eval, f=0.90)
|
| 540 |
+
df_zone_ok = df is not None
|
| 541 |
+
chain.append(_step(3, "§26.7", "D_f window (γ in [0.65, 0.85])",
|
| 542 |
+
"[(1-f)+fN^(1-γ)]^(1/(1-γ))",
|
| 543 |
+
{"gamma": g_corr, "N": T_eval, "f": 0.9}, df,
|
| 544 |
+
f"D_f = {df}" if df_zone_ok
|
| 545 |
+
else f"NOT applicable (γ={g_corr:.3f} outside [0.65, 0.85])"))
|
| 546 |
+
|
| 547 |
+
# Step 4: §26.8 soft decay regime
|
| 548 |
+
regime = kv_soft_decay_regime(theta, g_corr, T_train)
|
| 549 |
+
dh = d_horizon(theta, g_corr)
|
| 550 |
+
dh_str = f"{dh:.0f}" if dh is not None else "n/a"
|
| 551 |
+
    chain.append(_step(4, "§26.8", "Soft decay regime", "d_h ≳ T_train/2",
|
| 552 |
+
{"theta": theta, "gamma": g_corr, "T_train": T_train}, regime,
|
| 553 |
+
f"d_horizon={dh_str}; regime: {regime}"))
|
| 554 |
+
|
| 555 |
+
# Step 5: KV cache memory baseline
|
| 556 |
+
kv = kv_cache_memory(n_layers, n_kv_heads, d_head, T_eval)
|
| 557 |
+
chain.append(_step(5, "§19.1", "Baseline KV memory", "2·L·n_kv·d_h·seq·B",
|
| 558 |
+
{"L": n_layers, "n_kv": n_kv_heads, "d_h": d_head, "seq": T_eval},
|
| 559 |
+
kv, f"{kv['GB']:.2f} GB without compression"))
|
| 560 |
+
|
| 561 |
+
# Verdict
|
| 562 |
+
if regime == "applies" and df_zone_ok:
|
| 563 |
+
verdict = "USE SOFT DECAY"
|
| 564 |
+
reason = (f"d_horizon ≳ T_train/2 AND γ in compression zone. "
|
| 565 |
+
f"Soft decay (1-d/d_h)^γ best (-21% PPL vs hard cutoff per F17).")
|
| 566 |
+
mit = "Implement as 4D attention_mask additive bias with eager attention."
|
| 567 |
+
elif df_zone_ok:
|
| 568 |
+
verdict = "USE D_f HARD CUTOFF"
|
| 569 |
+
reason = f"γ in [0.65, 0.85] zone but d_h < T_train/2. Hard truncation at D_f={df} works."
|
| 570 |
+
mit = "Set cache_max_len = D_f."
|
| 571 |
+
elif regime == "applies":
|
| 572 |
+
verdict = "USE SOFT DECAY (caveat)"
|
| 573 |
+
        reason = "Regime applies but γ outside D_f validity zone. Soft decay only."
|
| 574 |
+
mit = "Soft decay; do not use D_f window."
|
| 575 |
+
elif g_corr >= 1 or g_corr <= 0:
|
| 576 |
+
verdict = "USE LITERATURE METHODS"
|
| 577 |
+
reason = f"γ={g_corr:.3f} outside Phase A. Our formulas don't apply."
|
| 578 |
+
mit = "Use SnapKV / PyramidKV / FastGen (literature heuristics)."
|
| 579 |
+
else:
|
| 580 |
+
verdict = "USE HARD T_train CUTOFF"
|
| 581 |
+
        reason = "Regime not met AND γ outside zone. Cap context at T_train."
|
| 582 |
+
mit = f"Set seq_len ≤ {T_train}, no extension."
|
| 583 |
+
|
| 584 |
+
return _wrap("X-19", "KV compression decision", locals(), chain, verdict, reason, mit)
|
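Step 1's γ_Padé formula, `(2θ - T√2)/(2θ + T√2)`, can be evaluated standalone. A sketch mirroring the formula string in the chain step; the module's own `gamma_pade` is defined earlier in this file:

```python
import math

def gamma_pade_sketch(theta, T):
    """γ_Padé = (2θ - T√2) / (2θ + T√2); γ in (0, 1) is Phase A (long-range OK)."""
    return (2 * theta - T * math.sqrt(2)) / (2 * theta + T * math.sqrt(2))

g = gamma_pade_sketch(500_000, 8192)  # Llama-3-8B at its native 8K context
```

Note the sign behavior the phase labels rely on: γ goes negative once `T > 2θ/√2 = θ√2`, i.e. when the evaluation context outruns what the RoPE base θ was designed for.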
| 585 |
+
|
| 586 |
+
|
| 587 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 588 |
+
# Helpers
|
| 589 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 590 |
+
def _step(n, sec, name, formula, inputs, result, interpretation=None, breakdown=None):
|
| 591 |
+
s = {"step": n, "section": sec, "name": name, "formula": formula,
|
| 592 |
+
"inputs": inputs, "result": result}
|
| 593 |
+
if interpretation:
|
| 594 |
+
s["interpretation"] = interpretation
|
| 595 |
+
if breakdown:
|
| 596 |
+
s["breakdown"] = breakdown
|
| 597 |
+
return s
|
| 598 |
+
|
| 599 |
+
|
| 600 |
+
def _wrap(rid, rname, locals_dict, chain, verdict, reason, mitigation):
|
| 601 |
+
# Clean inputs (drop chain/internal vars)
|
| 602 |
+
inputs = {k: v for k, v in locals_dict.items()
|
| 603 |
+
if not k.startswith("_") and k not in
|
| 604 |
+
("chain", "verdict", "reason", "mit", "info", "be", "kv", "g_pade", "g_corr",
|
| 605 |
+
"decomp", "dh", "l_niah", "p_hallu", "cost", "model_GB", "inf", "api",
|
| 606 |
+
"api_blend", "fits", "mem", "emerg", "max_flops", "hours",
|
| 607 |
+
"N_chinchilla", "D_chinchilla", "candidates", "best", "tok_per_s",
|
| 608 |
+
"tok_per_day", "capacity_users", "usd_per_day", "usd_per_Mtok",
|
| 609 |
+
"total_GB", "w_mem", "df", "df_zone_ok", "regime", "has_GQA",
|
| 610 |
+
"margin", "months", "months_to_payoff", "name")}
|
| 611 |
+
return {"recipe_id": rid, "recipe_name": rname, "inputs": inputs,
|
| 612 |
+
"chain": chain, "verdict": verdict, "reason": reason,
|
| 613 |
+
"mitigation": mitigation}
|
| 614 |
+
|
| 615 |
+
|
| 616 |
+
def _phase_label(g):
|
| 617 |
+
if 0 < g < 1:
|
| 618 |
+
return "Phase A (long-range OK)"
|
| 619 |
+
if g >= 1:
|
| 620 |
+
return "Phase B / Hagedorn"
|
| 621 |
+
return "Phase B / catastrophic (negative γ — T too large for θ)"
|
| 622 |
+
|
| 623 |
+
|
| 624 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 625 |
+
# Recipe registry
|
| 626 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 627 |
+
RECIPES = {
|
| 628 |
+
"X-1": {
|
| 629 |
+
"name": "Custom Training vs API",
|
| 630 |
+
"description": "Should I train a custom model or use a frontier API for my domain task?",
|
| 631 |
+
"fn": run_recipe_x1,
|
| 632 |
+
"params": ["N_params", "D_tokens", "gpu", "n_gpus", "mfu",
|
| 633 |
+
"api_model", "monthly_tokens_M"],
|
| 634 |
+
"category": "build-vs-buy",
|
| 635 |
+
"uses_sections": ["§17", "§19", "§20", "§24"],
|
| 636 |
+
},
|
| 637 |
+
"X-2": {
|
| 638 |
+
"name": "Long Context Viability",
|
| 639 |
+
"description": "Will model M serve length L doing Needle-in-a-Haystack retrieval?",
|
| 640 |
+
"fn": run_recipe_x2,
|
| 641 |
+
"params": ["theta", "T_train", "T_eval", "n_attention_heads", "n_kv_heads",
|
| 642 |
+
"d_head", "n_layers", "n_params", "has_SWA"],
|
| 643 |
+
"category": "long-context",
|
| 644 |
+
"uses_sections": ["§26", "§19"],
|
| 645 |
+
},
|
| 646 |
+
"X-3": {
|
| 647 |
+
"name": "Budget Pre-flight",
|
| 648 |
+
"description": "Given $ budget, what model is feasible to train?",
|
| 649 |
+
"fn": run_recipe_x3,
|
| 650 |
+
"params": ["USD_budget", "gpu", "mfu", "n_gpus"],
|
| 651 |
+
"category": "training-budget",
|
| 652 |
+
"uses_sections": ["§17", "§20"],
|
| 653 |
+
},
|
| 654 |
+
"X-5": {
|
| 655 |
+
"name": "Hardware Selection",
|
| 656 |
+
"description": "Which GPU should I use to serve my model at target throughput?",
|
| 657 |
+
"fn": run_recipe_x5,
|
| 658 |
+
"params": ["N_params", "T_eval", "n_layers", "n_kv_heads", "d_head",
|
| 659 |
+
"bytes_per_weight", "target_tokens_per_day", "concurrent_users"],
|
| 660 |
+
"category": "serving",
|
| 661 |
+
"uses_sections": ["§19", "§20"],
|
| 662 |
+
},
|
| 663 |
+
"X-19": {
|
| 664 |
+
"name": "KV Compression Decision",
|
| 665 |
+
"description": "Should I use soft decay, D_f cutoff, or literature methods to compress KV?",
|
| 666 |
+
"fn": run_recipe_x19,
|
| 667 |
+
"params": ["theta", "T_train", "T_eval", "n_attention_heads", "n_kv_heads",
|
| 668 |
+
"d_head", "n_layers", "n_params", "has_SWA"],
|
| 669 |
+
"category": "kv-compression",
|
| 670 |
+
"uses_sections": ["§26", "§19"],
|
| 671 |
+
},
|
| 672 |
+
}
|
| 673 |
+
|
| 674 |
+
|
| 675 |
+
def list_recipes() -> str:
|
| 676 |
+
"""Return JSON of all recipes for UI dropdown."""
|
| 677 |
+
return json.dumps([
|
| 678 |
+
{"id": rid, "name": r["name"], "description": r["description"],
|
| 679 |
+
"category": r["category"], "params": r["params"],
|
| 680 |
+
"uses_sections": r["uses_sections"]}
|
| 681 |
+
for rid, r in RECIPES.items()
|
| 682 |
+
])
|
| 683 |
+
|
| 684 |
+
|
| 685 |
+
def run_recipe(recipe_id: str, **params) -> dict:
|
| 686 |
+
"""Dispatcher — execute recipe by id with given params."""
|
| 687 |
+
r = RECIPES.get(recipe_id)
|
| 688 |
+
if r is None:
|
| 689 |
+
return {"error": f"unknown recipe '{recipe_id}'",
|
| 690 |
+
"available": list(RECIPES.keys())}
|
| 691 |
+
return r["fn"](**params)
|
| 692 |
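The registry-dispatch pattern used by `run_recipe` (structured error for unknown ids, `**params` forwarded to the recipe function) can be shown in miniature. Names below are illustrative, not the module's:

```python
def make_dispatcher(registry):
    """Dispatch by id; unknown ids return a structured error instead of raising."""
    def run(recipe_id, **params):
        entry = registry.get(recipe_id)
        if entry is None:
            return {"error": f"unknown recipe '{recipe_id}'",
                    "available": sorted(registry)}
        return entry["fn"](**params)
    return run

run = make_dispatcher({"X-0": {"fn": lambda x=1, **_: {"ok": x}}})
```

Returning an error dict (rather than raising) keeps the JS side of the Pyodide bridge simple: every call yields JSON-serializable output.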
+
|
| 693 |
+
|
| 694 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 695 |
+
# Known model presets
|
| 696 |
+
# ════════════════════════════════════════════════════════════════════════════
|
| 697 |
+
PRESETS = {
|
| 698 |
+
"EleutherAI/pythia-2.8b": {
|
| 699 |
+
"theta": 10000, "T_train": 2048,
|
| 700 |
+
"n_attention_heads": 32, "n_kv_heads": 32,
|
| 701 |
+
"d_head": 80, "n_layers": 32, "n_params": 2.8e9, "has_SWA": False,
|
| 702 |
+
},
|
| 703 |
+
"EleutherAI/pythia-1b": {
|
| 704 |
+
"theta": 10000, "T_train": 2048,
|
| 705 |
+
"n_attention_heads": 8, "n_kv_heads": 8,
|
| 706 |
+
"d_head": 256, "n_layers": 16, "n_params": 1e9, "has_SWA": False,
|
| 707 |
+
},
|
| 708 |
+
"EleutherAI/pythia-1.4b": {
|
| 709 |
+
"theta": 10000, "T_train": 2048,
|
| 710 |
+
"n_attention_heads": 16, "n_kv_heads": 16,
|
| 711 |
+
"d_head": 128, "n_layers": 24, "n_params": 1.4e9, "has_SWA": False,
|
| 712 |
+
},
|
| 713 |
+
"meta-llama/Meta-Llama-3-8B": {
|
| 714 |
+
"theta": 500000, "T_train": 8192,
|
| 715 |
+
"n_attention_heads": 32, "n_kv_heads": 8,
|
| 716 |
+
"d_head": 128, "n_layers": 32, "n_params": 8e9, "has_SWA": False,
|
| 717 |
+
},
|
| 718 |
+
"meta-llama/Llama-3.2-1B": {
|
| 719 |
+
"theta": 500000, "T_train": 131072,
|
| 720 |
+
"n_attention_heads": 32, "n_kv_heads": 8,
|
| 721 |
+
"d_head": 64, "n_layers": 16, "n_params": 1.2e9, "has_SWA": False,
|
| 722 |
+
},
|
| 723 |
+
"meta-llama/Llama-3.3-70B-Instruct": {
|
| 724 |
+
"theta": 500000, "T_train": 131072,
|
| 725 |
+
"n_attention_heads": 64, "n_kv_heads": 8,
|
| 726 |
+
"d_head": 128, "n_layers": 80, "n_params": 70e9, "has_SWA": False,
|
| 727 |
+
},
|
| 728 |
+
"mistralai/Mistral-7B-v0.1": {
|
| 729 |
+
"theta": 10000, "T_train": 8192,
|
| 730 |
+
"n_attention_heads": 32, "n_kv_heads": 8,
|
| 731 |
+
"d_head": 128, "n_layers": 32, "n_params": 7e9, "has_SWA": True,
|
| 732 |
+
},
|
| 733 |
+
"Qwen/Qwen2.5-7B": {
|
| 734 |
+
"theta": 1000000, "T_train": 32768,
|
| 735 |
+
"n_attention_heads": 28, "n_kv_heads": 4,
|
| 736 |
+
"d_head": 128, "n_layers": 28, "n_params": 7.6e9, "has_SWA": False,
|
| 737 |
+
},
|
| 738 |
+
"Qwen/Qwen2.5-1.5B": {
|
| 739 |
+
"theta": 1000000, "T_train": 32768,
|
| 740 |
+
"n_attention_heads": 12, "n_kv_heads": 2,
|
| 741 |
+
"d_head": 128, "n_layers": 28, "n_params": 1.5e9, "has_SWA": False,
|
| 742 |
+
},
|
| 743 |
+
"google/gemma-2-9b-it": {
|
| 744 |
+
"theta": 10000, "T_train": 8192,
|
| 745 |
+
"n_attention_heads": 16, "n_kv_heads": 8,
|
| 746 |
+
"d_head": 256, "n_layers": 42, "n_params": 9e9, "has_SWA": True,
|
| 747 |
+
},
|
| 748 |
+
"microsoft/phi-3-mini-4k-instruct": {
|
| 749 |
+
"theta": 10000, "T_train": 4096,
|
| 750 |
+
"n_attention_heads": 32, "n_kv_heads": 32,
|
| 751 |
+
"d_head": 96, "n_layers": 32, "n_params": 3.8e9, "has_SWA": True,
|
| 752 |
+
},
|
| 753 |
+
}
|
| 754 |
+
|
| 755 |
+
|
| 756 |
+
def list_presets() -> str:
|
| 757 |
+
return json.dumps([
|
| 758 |
+
{"id": k, "label": k.split("/")[-1],
|
| 759 |
+
"theta": v["theta"], "T_train": v["T_train"]}
|
| 760 |
+
for k, v in PRESETS.items()
|
| 761 |
+
])
|
| 762 |
+
|
| 763 |
+
|
| 764 |
+
def get_preset(model_id: str) -> dict:
|
| 765 |
+
return PRESETS.get(model_id, {})
|
| 766 |
+
|
| 767 |
+
|
| 768 |
+
# Smoke test
|
| 769 |
+
if __name__ == "__main__":
|
| 770 |
+
print("─── X-2 Llama-3-8B @ 32K ───")
|
| 771 |
+
r = run_recipe("X-2", theta=500_000, T_train=8192, T_eval=32_000,
|
| 772 |
+
n_attention_heads=32, n_kv_heads=8, d_head=128,
|
| 773 |
+
n_layers=32, n_params=8e9, has_SWA=False)
|
| 774 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 775 |
+
|
| 776 |
+
print("─── X-1 Llama-3-8B vs GPT-4o (10M tok/mo) ───")
|
| 777 |
+
r = run_recipe("X-1", N_params=8e9, monthly_tokens_M=10.0, api_model="GPT-4o")
|
| 778 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 779 |
+
|
| 780 |
+
print("─── X-3 budget $5K ───")
|
| 781 |
+
r = run_recipe("X-3", USD_budget=5000.0, gpu="H100 SXM", n_gpus=1)
|
| 782 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 783 |
+
|
| 784 |
+
print("─── X-5 serve Llama-3-8B at 4K ───")
|
| 785 |
+
r = run_recipe("X-5", N_params=8e9, T_eval=4096, n_layers=32, n_kv_heads=8, d_head=128,
|
| 786 |
+
target_tokens_per_day=10e6, concurrent_users=1)
|
| 787 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
| 788 |
+
|
| 789 |
+
print("─── X-19 KV compression for Llama-3-8B ───")
|
| 790 |
+
r = run_recipe("X-19", theta=500_000, T_train=8192, T_eval=8192,
|
| 791 |
+
n_attention_heads=32, n_kv_heads=8, d_head=128,
|
| 792 |
+
n_layers=32, n_params=8e9)
|
| 793 |
+
print(f"Verdict: {r['verdict']} — {r['reason']}\n")
|
|
@@ -0,0 +1,173 @@
| 1 |
+
/* TAF Agent — minimal clean styling */
|
| 2 |
+
:root {
|
| 3 |
+
--bg: #0a0e14;
|
| 4 |
+
--bg-card: #12181f;
|
| 5 |
+
--bg-input: #1a2028;
|
| 6 |
+
--fg: #c9d1d9;
|
| 7 |
+
--fg-dim: #8b949e;
|
| 8 |
+
--accent: #58a6ff;
|
| 9 |
+
--accent-dim: #1f6feb;
|
| 10 |
+
--success: #3fb950;
|
| 11 |
+
--warning: #d29922;
|
| 12 |
+
--danger: #f85149;
|
| 13 |
+
--border: #30363d;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
* { box-sizing: border-box; }
|
| 17 |
+
|
| 18 |
+
body {
|
| 19 |
+
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen,
|
| 20 |
+
Ubuntu, sans-serif;
|
| 21 |
+
background: var(--bg);
|
| 22 |
+
color: var(--fg);
|
| 23 |
+
margin: 0;
|
| 24 |
+
padding: 0;
|
| 25 |
+
line-height: 1.6;
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
header {
|
| 29 |
+
text-align: center;
|
| 30 |
+
padding: 2rem 1rem 1rem;
|
| 31 |
+
border-bottom: 1px solid var(--border);
|
| 32 |
+
}
|
| 33 |
+
header h1 { margin: 0 0 0.5rem 0; font-size: 2rem; }
|
| 34 |
+
.tagline { font-size: 1.1rem; margin: 0 0 0.5rem; }
|
| 35 |
+
.subtle { color: var(--fg-dim); font-size: 0.9rem; }
|
| 36 |
+
|
| 37 |
+
main {
|
| 38 |
+
max-width: 980px;
|
| 39 |
+
margin: 0 auto;
|
| 40 |
+
padding: 1.5rem;
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
section {
|
| 44 |
+
background: var(--bg-card);
|
| 45 |
+
border: 1px solid var(--border);
|
| 46 |
+
border-radius: 8px;
|
| 47 |
+
padding: 1.25rem 1.5rem;
|
| 48 |
+
margin-bottom: 1.25rem;
|
| 49 |
+
}
|
| 50 |
+
|
| 51 |
+
h2 { margin-top: 0; font-size: 1.2rem; color: var(--accent); }
|
| 52 |
+
|
| 53 |
+
#status-bar { padding: 0.75rem 1.25rem; }
|
| 54 |
+
#status { font-family: monospace; }
|
| 55 |
+
|
| 56 |
+
.recipe-desc { color: var(--fg-dim); margin: 0.5rem 0 0 0; }
|
| 57 |
+
|
| 58 |
+
.form-row { display: flex; gap: 1rem; margin-bottom: 1rem; align-items: center; }
|
| 59 |
+
.form-row label { min-width: 120px; }
|
| 60 |
+
|
| 61 |
+
.form-grid {
|
| 62 |
+
display: grid;
|
| 63 |
+
grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
|
| 64 |
+
gap: 0.75rem;
|
| 65 |
+
margin-bottom: 1rem;
|
| 66 |
+
}
|
| 67 |
+
.form-field { display: flex; flex-direction: column; }
|
| 68 |
+
.form-field label { font-size: 0.85rem; color: var(--fg-dim); margin-bottom: 0.25rem; }
|
| 69 |
+
|
| 70 |
+
input, select {
|
| 71 |
+
background: var(--bg-input);
|
| 72 |
+
color: var(--fg);
|
| 73 |
+
border: 1px solid var(--border);
|
| 74 |
+
border-radius: 4px;
|
| 75 |
+
padding: 0.4rem 0.6rem;
|
| 76 |
+
font-family: monospace;
|
| 77 |
+
font-size: 0.95rem;
|
| 78 |
+
}
|
| 79 |
+
input:focus, select:focus { outline: 1px solid var(--accent); border-color: var(--accent); }
|
| 80 |
+
|
| 81 |
+
button {
|
| 82 |
+
background: var(--accent-dim);
|
| 83 |
+
color: white;
|
| 84 |
+
border: none;
|
| 85 |
+
padding: 0.6rem 1.2rem;
|
| 86 |
+
font-size: 1rem;
|
| 87 |
+
font-weight: 600;
|
| 88 |
+
border-radius: 6px;
|
| 89 |
+
cursor: pointer;
|
| 90 |
+
transition: background 0.2s;
|
| 91 |
+
}
|
| 92 |
+
button:hover:not(:disabled) { background: var(--accent); }
|
| 93 |
+
button:disabled { background: #444; cursor: not-allowed; }
|
| 94 |
+
|
| 95 |
+
#verdict-box {
|
| 96 |
+
font-size: 1.05rem;
|
| 97 |
+
padding: 1rem;
|
| 98 |
+
border-radius: 6px;
|
| 99 |
+
border-left: 4px solid;
|
| 100 |
+
}
|
| 101 |
+
.verdict-yes { border-color: var(--success); background: rgba(63, 185, 80, 0.08); }
|
| 102 |
+
.verdict-no { border-color: var(--danger); background: rgba(248, 81, 73, 0.08); }
|
| 103 |
+
.verdict-degraded { border-color: var(--warning); background: rgba(210, 153, 34, 0.08); }
|
| 104 |
+
|
| 105 |
+
.chain-step {
|
| 106 |
+
background: var(--bg-input);
|
| 107 |
+
border: 1px solid var(--border);
|
| 108 |
+
border-radius: 6px;
|
| 109 |
+
padding: 0.75rem 1rem;
|
| 110 |
+
margin-bottom: 0.5rem;
|
| 111 |
+
}
|
| 112 |
+
.chain-step summary {
|
| 113 |
+
display: flex;
|
| 114 |
+
justify-content: space-between;
|
| 115 |
+
font-weight: 600;
|
| 116 |
+
cursor: pointer;
|
| 117 |
+
list-style: none;
|
| 118 |
+
}
|
| 119 |
+
.chain-step summary::before { content: "▸ "; color: var(--accent); }
|
| 120 |
+
.chain-step[open] summary::before { content: "▾ "; }
|
| 121 |
+
.step-section { color: var(--accent); font-family: monospace; font-size: 0.9rem; }
|
| 122 |
+
.step-formula { color: var(--fg-dim); font-family: monospace; font-size: 0.85rem; margin: 0.5rem 0; }
|
| 123 |
+
.step-result { color: var(--success); font-family: monospace; font-weight: 600; margin-top: 0.25rem; }
|
| 124 |
+
.step-interp { color: var(--fg-dim); font-size: 0.9rem; margin-top: 0.25rem; }
|
| 125 |
+
.step-result pre { background: var(--bg); padding: 0.5rem; border-radius: 4px; overflow-x: auto; }
|
| 126 |
+
|
| 127 |
+
.recipe-tag {
|
| 128 |
+
background: var(--bg-input);
|
| 129 |
+
color: var(--accent);
|
| 130 |
+
font-family: monospace;
|
| 131 |
+
font-size: 0.85rem;
|
| 132 |
+
padding: 0.2rem 0.5rem;
|
| 133 |
+
border-radius: 4px;
|
| 134 |
+
}
|
| 135 |
+
|
| 136 |
+
.mode-tabs { display: flex; gap: 0.5rem; margin-bottom: 0.75rem; flex-wrap: wrap; }
|
| 137 |
+
.mode-btn {
|
| 138 |
+
background: var(--bg-input); color: var(--fg-dim);
|
| 139 |
+
border: 1px solid var(--border); border-radius: 6px;
|
| 140 |
+
padding: 0.5rem 1rem; cursor: pointer; font-size: 0.95rem;
|
| 141 |
+
}
|
| 142 |
+
.mode-btn.active { background: var(--accent-dim); color: white; border-color: var(--accent); }
|
| 143 |
+
button.secondary {
|
| 144 |
+
background: var(--bg-input); color: var(--fg);
|
| 145 |
+
border: 1px solid var(--border); padding: 0.4rem 0.8rem;
|
| 146 |
+
}
|
| 147 |
+
button.secondary:hover:not(:disabled) { border-color: var(--accent); }
|
| 148 |
+
|
| 149 |
+
textarea {
|
| 150 |
+
width: 100%; min-height: 60px;
|
| 151 |
+
background: var(--bg-input); color: var(--fg);
|
| 152 |
+
border: 1px solid var(--border); border-radius: 4px;
|
| 153 |
+
padding: 0.5rem; font-family: inherit; font-size: 0.95rem; resize: vertical;
|
| 154 |
+
}
|
| 155 |
+
textarea:focus { outline: 1px solid var(--accent); border-color: var(--accent); }
|
| 156 |
+
|
| 157 |
+
@media (max-width: 600px) {
|
| 158 |
+
.form-grid { grid-template-columns: 1fr; }
|
| 159 |
+
main { padding: 0.75rem; }
|
| 160 |
+
.form-row { flex-direction: column; align-items: stretch; }
|
| 161 |
+
.form-row label { min-width: auto; }
|
| 162 |
+
}
|
| 163 |
+
|
| 164 |
+
footer {
|
| 165 |
+
text-align: center;
|
| 166 |
+
padding: 1.5rem;
|
| 167 |
+
color: var(--fg-dim);
|
| 168 |
+
font-size: 0.85rem;
|
| 169 |
+
border-top: 1px solid var(--border);
|
| 170 |
+
margin-top: 2rem;
|
| 171 |
+
}
|
| 172 |
+
footer a { color: var(--accent); text-decoration: none; }
|
| 173 |
+
footer a:hover { text-decoration: underline; }
|