karlexmarin and Claude Opus 4.7 (1M context) committed
Commit 7bc5d7c · 1 parent: 3dbfebb

v0.7.3: Quant-regime classifier (anti-bullshit pack #5)

NEW MODE: ⚖️ Quant — predicts γ-shift and ΔPPL for any (model × quant scheme) combination, architecture-aware.

Solves: the HF community widely reports unpredictable quantization cliffs. NF4 might lose 2 PPL on Phi-3 but be fine on Llama-3-8B. Generic claims like "AWQ ~95% retention" are too vague — TAF gives an architecture-specific verdict using d_head, GQA ratio, the SWA flag, and model size.

NEW
- js/quant_regime.js: pure logic. QUANT_SCHEMES table covering 10 schemes (FP8 / int8 / Q8_0 / Q5_K_M / AWQ / GPTQ / Q4_K_M / NF4 / Q3_K_M / Q2_K). predictQuantShift() returns the γ-shift (base penalty × architecture multiplier), a ΔPPL band, a regime band (safe / mild / significant / cliff), and a concrete recommendation code.
- predictAllSchemes() ranks all 10 schemes for a given architecture so user sees the full trade-off table.
- HF Hub auto-fetch + paste-config fallback. Two output modes: single (one scheme + breakdown) and compare-all (sorted table).
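The shape of this classifier can be sketched roughly as follows. Scheme names mirror the commit, but the base penalties, multiplier weights, and band thresholds below are illustrative placeholders, not the values shipped in js/quant_regime.js:

```javascript
// Rough sketch of the quant-regime logic — NOT the shipped table.
// Base penalties, weights, and thresholds here are illustrative guesses.
const QUANT_SCHEMES = {
  "Q8_0":   { bits: 8, basePenalty: 0.008, calibrated: false },
  "AWQ":    { bits: 4, basePenalty: 0.020, calibrated: true  },
  "NF4":    { bits: 4, basePenalty: 0.060, calibrated: false },
  "Q3_K_M": { bits: 3, basePenalty: 0.120, calibrated: false },
};

// Small d_head, aggressive GQA, and tiny models amplify the base penalty.
function archMultiplier({ dHead, gqaRatio, paramsB }) {
  let m = 1.0;
  if (dHead < 96)    m *= 1.3;  // fewer dims per head → less quantization slack
  if (gqaRatio >= 4) m *= 1.15; // shared KV heads → less redundancy
  if (paramsB < 1)   m *= 1.5;  // small models sit closer to the cliff
  return m;
}

function predictQuantShift(arch, schemeName) {
  const s = QUANT_SCHEMES[schemeName];
  const gammaShift = s.basePenalty * archMultiplier(arch);
  const regime =
    gammaShift < 0.012 ? "safe" :
    gammaShift < 0.040 ? "mild" :
    gammaShift < 0.080 ? "significant" : "cliff";
  return { scheme: schemeName, gammaShift, regime };
}

// Rank every scheme for one architecture, safest first.
function predictAllSchemes(arch) {
  return Object.keys(QUANT_SCHEMES)
    .map((name) => predictQuantShift(arch, name))
    .sort((a, b) => a.gammaShift - b.gammaShift);
}
```

With an 8B-class config (dHead 128, gqaRatio 4) these placeholder numbers happen to put AWQ in the mild band and Q3_K_M in the cliff band, qualitatively matching the simulation below; the real table carries ten schemes and the ΔPPL bands as well.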

VIRTUAL SIMULATION
- Llama-3-8B + AWQ → mild (γ=0.022); + NF4 → significant (γ=0.076); + Q8_0 → safe (γ=0.009).
- Phi-3-mini (small d_head) + NF4 → cliff (γ=0.085) with reco "switch to AWQ".
- Pythia-160m + Q3_K_M → cliff (γ=0.185) with reco "switch to Q4_K_M".
- Mistral-7B trade-off table: FP8/Q8_0/int8 → safe; Q5_K_M/AWQ/GPTQ → mild; Q4_K_M/NF4 → significant; Q3_K_M/Q2_K → cliff.

DOCUMENTATION
- Help modal: new v0.7 quant section (4 langs) with problem/solution/use case.
- Inventory modal v0.7 card: new "⚖️ Quant" entry (4 langs).
- modes.tip: now lists 12 modes in 4 langs.
- 583 i18n keys × 4 langs · 0 missing / 0 extra (50 new quant.* keys per lang).
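The 0-missing / 0-extra invariant is cheap to enforce mechanically. A check along these lines would do it, assuming (hypothetically — the actual export shape in js/i18n.js may differ) that TRANSLATIONS is keyed by language code:

```javascript
// Verify every language table exposes exactly the same i18n key set.
// Assumed shape: { en: {...}, es: {...}, fr: {...}, zh: {...} }.
function checkKeyParity(translations, reference = "en") {
  const refKeys = new Set(Object.keys(translations[reference]));
  const report = {};
  for (const [lang, table] of Object.entries(translations)) {
    if (lang === reference) continue;
    const keys = new Set(Object.keys(table));
    report[lang] = {
      missing: [...refKeys].filter((k) => !keys.has(k)),  // in en, absent here
      extra:   [...keys].filter((k) => !refKeys.has(k)),  // here, absent in en
    };
  }
  return report;
}
```

Running this against the real TRANSLATIONS object and asserting every `missing`/`extra` array is empty reproduces the "0 missing / 0 extra" claim as a smoke test.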

38/38 smoke tests passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (4)
  1. index.html +36 -0
  2. js/i18n.js +212 -4
  3. js/main.js +176 -1
  4. js/quant_regime.js +147 -0
index.html CHANGED
@@ -204,6 +204,9 @@
   <p><strong data-i18n="help.v07.contam.title">🧪 Contamination Prior</strong></p>
   <p data-i18n="help.v07.contam.body">Bayesian-ish prior on whether a benchmark score is contaminated. Enter your model's training cutoff date → tool rates 20+ popular benchmarks (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) by P(contamination) based on time gap, corpus inclusion, and known leak history. Open LLM Leaderboard v1 was killed in 2024 after MMLU/HellaSwag scores became contaminated. <em>Use case</em>: decide which scores to trust when comparing two models.</p>

+  <p><strong data-i18n="help.v07.quant.title">⚖️ Quant-regime Classifier</strong></p>
+  <p data-i18n="help.v07.quant.body">Predicts γ-shift and ΔPPL for any (model × quant scheme: NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8, …). Architecture-aware: small d_head + aggressive GQA → more sensitive; calibrated schemes (AWQ) absorb shift better than uncalibrated (NF4). Recommends safer alternatives if a cliff is detected. <em>Use case</em>: before quantizing, predict whether your specific architecture × scheme combo will keep PPL acceptable, with a concrete switch-to suggestion otherwise.</p>
+
   <h3 data-i18n="help.audit.title">The audit chain</h3>
   <p data-i18n="help.audit.body">Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
   output, and interpretation. Click any step to expand. Cite section numbers (§26.1, §19.1, etc.) refer
@@ -309,6 +312,7 @@
   <li data-i18n="inv.v07.template"><strong>📜 Chat-template</strong> — exact CLI flag so lm-eval doesn't silently halve your accuracy</li>
   <li data-i18n="inv.v07.arena"><strong>🎯 Arena CI</strong> — recover the confidence intervals Chatbot Arena hides</li>
   <li data-i18n="inv.v07.contam"><strong>🧪 Contamination</strong> — rate 20+ benchmarks for contamination probability</li>
+  <li data-i18n="inv.v07.quant"><strong>⚖️ Quant</strong> — predict γ shift + ΔPPL for any (model × quant scheme) combo</li>
   </ul>
   </details>
   </div>
@@ -362,6 +366,7 @@
   <button class="mode-btn" data-mode="template" role="tab" aria-selected="false" data-i18n="modes.template">📜 Chat-template</button>
   <button class="mode-btn" data-mode="arena" role="tab" aria-selected="false" data-i18n="modes.arena">🎯 Arena CI</button>
   <button class="mode-btn" data-mode="contam" role="tab" aria-selected="false" data-i18n="modes.contam">🧪 Contamination</button>
+  <button class="mode-btn" data-mode="quant" role="tab" aria-selected="false" data-i18n="modes.quant">⚖️ Quant</button>
   </div>
   <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
   <strong>Quickest start</strong>: paste any HuggingFace model id (e.g. <code>meta-llama/Meta-Llama-3-8B</code>),
@@ -752,6 +757,37 @@
   <div id="contam-output" style="margin-top: 1em;"></div>
   </section>

+  <!-- Quant-regime classifier (v0.7.3 anti-bullshit pack #5) -->
+  <section id="quant-section" style="display:none;">
+  <h2><span data-i18n="quant.title">⚖️ Quant-regime Classifier</span>
+  <span class="info"><span class="tooltip" data-i18n="quant.tip">
+  Predicts γ-shift (and downstream ΔPPL) for a given (model × quant scheme).
+  Generic claims like "AWQ ~95% retention" are too vague — TAF uses
+  d_head, GQA ratio, SWA flag, and model size to give an architecture-specific
+  verdict. Solves: HF community widely reports unpredictable quant cliffs
+  (NF4 -2 PPL on Phi-3 but fine on Llama-3-8B).
+  </span></span>
+  </h2>
+  <p class="recipe-desc" data-i18n="quant.desc">
+  <strong>Will quantizing your model break it?</strong> Paste an HF model id, pick a quant scheme — get predicted γ-shift, expected ΔPPL band, and a recommended alternative if it's a cliff. Browser-only, no GPU, no calibration set required.
+  </p>
+  <div class="form-row">
+  <label for="quant-id" data-i18n="quant.id_label">HF model id:</label>
+  <input type="text" id="quant-id" placeholder="e.g. meta-llama/Llama-3.2-1B" />
+  <button type="button" id="quant-fetch-btn" data-i18n="quant.fetch_btn">📥 Fetch config</button>
+  </div>
+  <div class="form-row">
+  <label for="quant-scheme" data-i18n="quant.scheme_label">Quant scheme:</label>
+  <select id="quant-scheme">
+  <option value="">— select scheme —</option>
+  </select>
+  <button type="button" id="quant-run-btn" data-i18n="quant.run_btn">⚖️ Predict</button>
+  <button type="button" id="quant-all-btn" class="secondary" data-i18n="quant.all_btn">📊 Compare all schemes</button>
+  </div>
+  <p id="quant-status" class="recipe-desc" style="font-size:0.92em;"></p>
+  <div id="quant-output" style="margin-top: 1em;"></div>
+  </section>
+
   <!-- Recipe selector (mode=recipe) -->
   <section id="recipe-section" style="display:none;">
   <h2 data-i18n="recipe.title">📋 Recipe</h2>
js/i18n.js CHANGED
@@ -302,6 +302,8 @@ export const TRANSLATIONS = {
   "help.v07.arena.body": "Chatbot Arena strips confidence intervals from its public leaderboard — a 5-Elo gap can be statistically meaningless. Paste raw pairwise vote data (model_a, model_b, winner) → Bradley-Terry MLE + 200-iteration bootstrap → ranked Elos with 95% CIs and a \"statistical ties\" panel listing pairs whose CIs overlap. Try the Load sample button. <em>Use case</em>: before declaring \"model A beats model B\", verify their CIs don't overlap.",
   "help.v07.contam.title": "🧪 Contamination Prior",
   "help.v07.contam.body": "Bayesian-ish prior on whether a benchmark score is contaminated. Enter your model's training cutoff date → tool rates 20+ popular benchmarks (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) by P(contamination) based on time gap, corpus inclusion, and known leak history. Open LLM Leaderboard v1 was killed in 2024 after MMLU/HellaSwag scores became contaminated. <em>Use case</em>: decide which scores to trust when comparing two models.",
+ "help.v07.quant.title": "⚖️ Quant-regime Classifier",
+ "help.v07.quant.body": "Predicts γ-shift and ΔPPL for any (model × quant scheme: NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8, …). Architecture-aware: small d_head + aggressive GQA → more sensitive; calibrated schemes (AWQ) absorb shift better than uncalibrated (NF4). Recommends safer alternatives if a cliff is detected. <em>Use case</em>: before quantizing, predict whether your specific architecture × scheme combo will keep PPL acceptable, with a concrete switch-to suggestion otherwise.",

   // v0.7 — Inventory modal 5th card
   "inv.v07.title": "🆕 v0.7 anti-bullshit pack",
@@ -309,6 +311,56 @@ export const TRANSLATIONS = {
   "inv.v07.template": "<strong>📜 Chat-template</strong> — exact CLI flag so lm-eval doesn't silently halve your accuracy",
   "inv.v07.arena": "<strong>🎯 Arena CI</strong> — recover the confidence intervals Chatbot Arena hides",
   "inv.v07.contam": "<strong>🧪 Contamination</strong> — rate 20+ benchmarks for contamination probability",
+ "inv.v07.quant": "<strong>⚖️ Quant</strong> — predict γ shift + ΔPPL for any (model × quant scheme) combo",
+
+ // v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
+ "modes.quant": "⚖️ Quant",
+ "mode_desc.quant": "Predicts γ-shift and ΔPPL for any (model × quant scheme). Architecture-aware: small d_head + GQA → more sensitive. Recommends safer alternatives if a cliff is detected.",
+ "quant.title": "⚖️ Quant-regime Classifier",
+ "quant.tip": "Predicts γ-shift (and downstream ΔPPL) for a given (model × quant scheme). Generic claims like 'AWQ ~95% retention' are too vague — TAF uses d_head, GQA ratio, SWA flag, and model size to give an architecture-specific verdict. Solves: HF community widely reports unpredictable quant cliffs (NF4 -2 PPL on Phi-3 but fine on Llama-3-8B).",
+ "quant.desc": "<strong>Will quantizing your model break it?</strong> Paste an HF model id, pick a quant scheme — get predicted γ-shift, expected ΔPPL band, and a recommended alternative if it's a cliff. Browser-only, no GPU, no calibration set required.",
+ "quant.id_label": "HF model id:",
+ "quant.fetch_btn": "📥 Fetch config",
+ "quant.scheme_label": "Quant scheme:",
+ "quant.run_btn": "⚖️ Predict",
+ "quant.all_btn": "📊 Compare all schemes",
+ "quant.regime.safe": "✅ SAFE",
+ "quant.regime.mild": "✅ MILD COMPRESSION",
+ "quant.regime.significant": "⚠ SIGNIFICANT DEGRADATION",
+ "quant.regime.cliff": "❌ HEAVY CLIFF",
+ "quant.label.gamma_shift": "γ shift",
+ "quant.label.delta_ppl": "ΔPPL (est.)",
+ "quant.label.arch_mult": "Arch multiplier",
+ "quant.section.breakdown": "Breakdown",
+ "quant.section.reco": "Recommendation",
+ "quant.section.compare": "All schemes (sorted by safety)",
+ "quant.field.scheme": "Scheme",
+ "quant.field.calibrated": "calibrated",
+ "quant.field.uncalibrated": "uncalibrated",
+ "quant.field.base_penalty": "Base penalty",
+ "quant.field.arch_mult_full": "Architecture multiplier",
+ "quant.field.gamma_shift": "Predicted γ shift",
+ "quant.field.ppl_band": "ΔPPL band (est.)",
+ "quant.field.params": "Parameters",
+ "quant.col.scheme": "Scheme",
+ "quant.col.bits": "Bits",
+ "quant.col.gamma_shift": "γ shift",
+ "quant.col.ppl_band": "ΔPPL band",
+ "quant.col.regime": "Regime",
+ "quant.reco.switch_to_awq": "<strong>Switch to {scheme}</strong> — calibrated 4-bit handles small d_head + GQA much better than NF4. Expected ΔPPL drops ~2-3×.",
+ "quant.reco.switch_to_q5_km": "<strong>Switch to {scheme}</strong> — Q5 keeps more head dimensions intact at low cost (only ~25% bigger file).",
+ "quant.reco.switch_to_q4_km": "<strong>Switch to {scheme}</strong> — Q3/Q2 are too aggressive for this architecture.",
+ "quant.reco.consider_awq": "<strong>Consider {scheme}</strong> — calibration meaningfully reduces γ-shift on this architecture.",
+ "quant.reco.use_higher_bits": "<strong>Use higher-bit alternative</strong> — this architecture cannot absorb 4-bit cleanly. Try 5- or 8-bit.",
+ "quant.reco.verify_with_eval": "<strong>Verify with a real eval</strong> — predicted shift is borderline. Run NIAH at your target context before deploying.",
+ "quant.reco.no_action": "No action needed — quantization is safe for this architecture.",
+ "quant.summary.headline_all": "All schemes for <code>{modelId}</code>",
+ "quant.status.empty_id": "⚠ Enter a model id (e.g. meta-llama/Llama-3.2-1B).",
+ "quant.status.fetching": "⏳ Fetching config.json for {modelId}...",
+ "quant.status.fetched": "✅ Config fetched for {modelId}. Pick a scheme and click Predict (or Compare all schemes).",
+ "quant.status.no_scheme": "⚠ Pick a quant scheme from the dropdown.",
+ "quant.status.done": "✅ Predicted regime: {regime}",
+ "quant.status.done_all": "✅ Compared {n} schemes — sorted by safety.",
   "share.import_desc": "Got a JSON file from someone else's TAF analysis? Load it here to see the verdict + chain locally. Same view as if you'd run it yourself.",
   "share.import_btn": "📂 Load shared JSON",
   "synthesis.system": "You are a precise transformer LLM diagnostic assistant. Given pre-computed TAF formula results, write a clear plain-English summary in 4-6 sentences. Cite the section number (§X.Y) for each number you mention. Always give a concrete recommendation. Do NOT invent numbers.",
@@ -401,7 +453,7 @@ export const TRANSLATIONS = {
   "common.no": "No",

   // Mode tooltips
- "modes.tip": "<strong>Eleven ways to use the tool</strong>.<br><strong>📇 Profile</strong>: paste a model id → 5-recipe TAF Card.<br><strong>🆚 Compare</strong>: 2-3 models side-by-side on one recipe.<br><strong>🔍 Inspect config</strong>: paste raw config.json → full Profile.<br><strong>💬 Ask</strong>: free-form question, browser LLM picks the recipe.<br><strong>📋 Recipe</strong>: manual selection with full form control.<br><strong>🩺 Diagnose CLI</strong>: generate Python command for local γ measurement.<br><strong>📊 Phase diagram</strong>: 23-model panel on (log θ, γ) plane.<br><strong>🪟 Unmask</strong>: detect misleading max_position_embeddings (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: detect family + give exact CLI flag for lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruct confidence intervals from raw pairwise vote data; detect statistical ties Arena hides.<br><strong>🧪 Contamination</strong>: rate 20+ benchmarks for contamination probability based on training cutoff vs release date.",
+ "modes.tip": "<strong>Twelve ways to use the tool</strong>.<br><strong>📇 Profile</strong>: paste a model id → 5-recipe TAF Card.<br><strong>🆚 Compare</strong>: 2-3 models side-by-side on one recipe.<br><strong>🔍 Inspect config</strong>: paste raw config.json → full Profile.<br><strong>💬 Ask</strong>: free-form question, browser LLM picks the recipe.<br><strong>📋 Recipe</strong>: manual selection with full form control.<br><strong>🩺 Diagnose CLI</strong>: generate Python command for local γ measurement.<br><strong>📊 Phase diagram</strong>: 23-model panel on (log θ, γ) plane.<br><strong>🪟 Unmask</strong>: detect misleading max_position_embeddings (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: detect family + give exact CLI flag for lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruct confidence intervals from raw pairwise vote data; detect statistical ties Arena hides.<br><strong>🧪 Contamination</strong>: rate 20+ benchmarks for contamination probability based on training cutoff vs release date.<br><strong>⚖️ Quant</strong>: predict γ-shift and ΔPPL for any (model × quant scheme); recommend safer alternative on cliff.",
   "profile.tip": "<strong>One-click full diagnosis</strong>. Paste any HF model id (or pick preset). Tool runs all 5 recipes (long-context, KV-compression, custom-vs-API, budget, hardware) and produces a single <strong>TAF Card</strong> with verdict per dimension + key numbers + architecture classification.<br><br><strong>Use case</strong>: \"I'm evaluating Qwen2.5-32B for production — what's its full viability profile?\" → paste id → Profile → done.",
   "compare.tip": "<strong>Same recipe, multiple models</strong>. Pick 2-3 candidate models and one recipe. See verdicts in a single comparison table.<br><br><strong>Use case</strong>: \"I need long-context retrieval at 16K — which is best: Llama-3-8B, Mistral-7B, or Qwen-7B?\" → pick 3 + X-2 + 16K → see winner.",
@@ -1035,6 +1087,8 @@ export const TRANSLATIONS = {
   "help.v07.arena.body": "Chatbot Arena oculta los intervalos de confianza en su leaderboard público — una diferencia de 5 Elo puede ser estadísticamente irrelevante. Pega datos crudos de votos pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap de 200 iteraciones → Elos ranked con CIs 95% y un panel de \"empates estadísticos\" listando pares cuyos CIs se solapan. Prueba el botón Cargar sample. <em>Caso de uso</em>: antes de afirmar \"modelo A vence a modelo B\", verifica que sus CIs no se solapen.",
   "help.v07.contam.title": "🧪 Prior de Contaminación",
   "help.v07.contam.body": "Prior bayesiano-ish sobre si un score de benchmark está contaminado. Introduce la fecha cutoff de entrenamiento de tu modelo → la herramienta puntúa 20+ benchmarks populares (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) por P(contaminación) según gap temporal, inclusión en corpus y historial de leaks conocidos. Open LLM Leaderboard v1 fue cancelado en 2024 tras la contaminación de MMLU/HellaSwag. <em>Caso de uso</em>: decide qué scores te puedes creer al comparar dos modelos.",
+ "help.v07.quant.title": "⚖️ Clasificador de régimen de cuantización",

   // v0.7 — Inventory modal 5ª card
   "inv.v07.title": "🆕 Pack anti-bullshit v0.7",
@@ -1042,6 +1096,56 @@ export const TRANSLATIONS = {
   "inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exacto para que lm-eval no divida tu accuracy entre 2 silenciosamente",
   "inv.v07.arena": "<strong>🎯 Arena CI</strong> — recupera los intervalos de confianza que Chatbot Arena oculta",
   "inv.v07.contam": "<strong>🧪 Contaminación</strong> — puntúa 20+ benchmarks por probabilidad de contaminación",
   "share.import_desc": "¿Tienes un fichero JSON del análisis TAF de alguien? Cárgalo aquí para ver el veredicto + cadena localmente. La misma vista que si lo hubieras ejecutado tú.",
   "share.import_btn": "📂 Cargar JSON compartido",
   "synthesis.system": "Eres un asistente de diagnóstico preciso para LLMs transformer. Dados resultados de fórmulas TAF pre-calculados, escribe un resumen claro en español de 4-6 frases. Cita el número de sección (§X.Y) para cada número que menciones. Da siempre una recomendación concreta. NO inventes números.",
@@ -1134,7 +1238,7 @@ export const TRANSLATIONS = {
   "common.no": "No",

   // Tooltips de modos
- "modes.tip": "<strong>Once formas de usar la herramienta</strong>.<br><strong>📇 Perfil</strong>: pega un id → TAF Card de 5 recetas.<br><strong>🆚 Comparar</strong>: 2-3 modelos lado a lado en una receta.<br><strong>🔍 Inspeccionar config</strong>: pega config.json crudo → Perfil completo.<br><strong>💬 Pregunta</strong>: pregunta libre, el LLM del navegador elige la receta.<br><strong>📋 Receta</strong>: selección manual con control total del formulario.<br><strong>🩺 Diagnóstico CLI</strong>: genera comando Python para medir γ localmente.<br><strong>📊 Diagrama de fase</strong>: panel de 23 modelos en plano (log θ, γ).<br><strong>🪟 Desenmascarar</strong>: detecta max_position_embeddings engañoso (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: detecta familia + da el flag CLI exacto para lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruye intervalos de confianza desde votos pairwise crudos; detecta empates estadísticos que Arena oculta.<br><strong>🧪 Contaminación</strong>: puntúa 20+ benchmarks por probabilidad de contaminación según cutoff de entrenamiento vs fecha de release.",
   "profile.tip": "<strong>Diagnóstico completo en un click</strong>. Pega cualquier id de modelo HF (o elige preset). La herramienta ejecuta las 5 recetas (contexto largo, compresión KV, custom vs API, presupuesto, hardware) y produce una única <strong>TAF Card</strong> con veredicto por dimensión + números clave + clasificación arquitectónica.<br><br><strong>Caso de uso</strong>: \"Estoy evaluando Qwen2.5-32B para producción — ¿cuál es su perfil completo de viabilidad?\" → pega id → Perfilar → listo.",
   "compare.tip": "<strong>Misma receta, múltiples modelos</strong>. Elige 2-3 modelos candidatos y una receta. Ve los veredictos en una única tabla comparativa.<br><br><strong>Caso de uso</strong>: \"Necesito recuperación de contexto largo a 16K — ¿cuál es mejor: Llama-3-8B, Mistral-7B o Qwen-7B?\" → elige 3 + X-2 + 16K → ve el ganador.",
@@ -1632,6 +1736,8 @@ export const TRANSLATIONS = {
   "help.v07.arena.body": "Chatbot Arena masque les intervalles de confiance de son leaderboard public — un écart de 5 Elo peut être statistiquement insignifiant. Collez des données brutes de votes pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap 200 itérations → Elos classés avec CIs 95% et un panneau \"égalités statistiques\" listant les paires dont les CIs se chevauchent. Essayez le bouton Charger échantillon. <em>Cas d'usage</em> : avant de déclarer \"modèle A bat modèle B\", vérifiez que leurs CIs ne se chevauchent pas.",
   "help.v07.contam.title": "🧪 Prior de Contamination",
   "help.v07.contam.body": "Prior bayésien-ish sur la contamination d'un score de benchmark. Saisissez la date de cutoff d'entraînement de votre modèle → l'outil note 20+ benchmarks populaires (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) par P(contamination) selon l'écart temporel, l'inclusion dans corpus et l'historique de leaks connus. Open LLM Leaderboard v1 a été tué en 2024 après la contamination de MMLU/HellaSwag. <em>Cas d'usage</em> : décidez quels scores croire en comparant deux modèles.",

   // v0.7 — Inventory modal 5ème card
   "inv.v07.title": "🆕 Pack anti-bullshit v0.7",
@@ -1639,6 +1745,56 @@ export const TRANSLATIONS = {
   "inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exact pour que lm-eval ne divise pas votre accuracy par 2 en silence",
   "inv.v07.arena": "<strong>🎯 Arena CI</strong> — récupère les intervalles de confiance que Chatbot Arena cache",
   "inv.v07.contam": "<strong>🧪 Contamination</strong> — note 20+ benchmarks par probabilité de contamination",
   "share.import_desc": "Vous avez un fichier JSON de l'analyse TAF de quelqu'un ? Chargez-le ici pour voir le verdict + la chaîne localement. La même vue que si vous l'aviez exécuté vous-même.",
   "share.import_btn": "📂 Charger JSON partagé",
   "synthesis.system": "Vous êtes un assistant de diagnostic précis pour LLMs transformer. Étant donné des résultats de formules TAF pré-calculés, écrivez un résumé clair en français de 4-6 phrases. Citez le numéro de section (§X.Y) pour chaque nombre mentionné. Donnez toujours une recommandation concrète. N'INVENTEZ PAS de nombres.",
@@ -1731,7 +1887,7 @@ export const TRANSLATIONS = {
   "common.no": "Non",

   // Tooltips des modes
- "modes.tip": "<strong>Onze façons d'utiliser l'outil</strong>.<br><strong>📇 Profil</strong>: collez un id → TAF Card avec 5 recettes.<br><strong>🆚 Comparer</strong>: 2-3 modèles côte à côte sur une recette.<br><strong>🔍 Inspecter config</strong>: collez config.json brut → Profil complet.<br><strong>💬 Question</strong>: question libre, le LLM du navigateur choisit la recette.<br><strong>📋 Recette</strong>: sélection manuelle avec contrôle total du formulaire.<br><strong>🩺 Diagnostic CLI</strong>: génère commande Python pour mesurer γ localement.<br><strong>📊 Diagramme de phase</strong>: panel de 23 modèles dans le plan (log θ, γ).<br><strong>🪟 Démasquer</strong>: détecte un max_position_embeddings trompeur (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: détecte la famille + donne le flag CLI exact pour lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruit les intervalles de confiance depuis les votes pairwise bruts ; détecte les égalités statistiques qu'Arena cache.<br><strong>🧪 Contamination</strong>: note 20+ benchmarks pour leur probabilité de contamination selon le cutoff d'entraînement vs la date de sortie.",
   "profile.tip": "<strong>Diagnostic complet en un clic</strong>. Collez n'importe quel id de modèle HF (ou choisissez préréglage). L'outil exécute les 5 recettes (contexte long, compression KV, custom vs API, budget, hardware) et produit une <strong>TAF Card</strong> unique avec verdict par dimension + nombres clés + classification architecturale.<br><br><strong>Cas d'usage</strong>: « J'évalue Qwen2.5-32B pour la production — quel est son profil complet de viabilité ? » → collez id → Profiler → fait.",
   "compare.tip": "<strong>Même recette, plusieurs modèles</strong>. Choisissez 2-3 modèles candidats et une recette. Voyez les verdicts dans un seul tableau comparatif.<br><br><strong>Cas d'usage</strong>: « J'ai besoin de récupération longue contexte à 16K — quel est le meilleur : Llama-3-8B, Mistral-7B ou Qwen-7B ? » → choisissez 3 + X-2 + 16K → voyez le gagnant.",
@@ -2229,6 +2385,8 @@ export const TRANSLATIONS = {
   "help.v07.arena.body": "Chatbot Arena 在公开排行榜中删除了置信区间 — 5 Elo 的差距在统计上可能毫无意义。粘贴原始 pairwise 投票数据(model_a, model_b, winner)→ Bradley-Terry MLE + 200 次 bootstrap → 排序 Elo + 95% CI + \"统计并列\" 面板,列出 CI 重叠的配对。尝试加载样本按钮。<em>用例</em>:宣称 \"模型 A 胜过模型 B\" 之前,验证它们的 CI 不重叠。",
   "help.v07.contam.title": "🧪 污染先验",
   "help.v07.contam.body": "对 benchmark 分数是否被污染做贝叶斯式的先验估计。输入模型训练 cutoff 日期 → 工具按 P(污染) 评估 20+ 主流 benchmark(MMLU、HellaSwag、GSM4K、HumanEval、IFEval、MMLU-Pro、GPQA、AIME、MATH-500、BBH、MUSR…),基于时间差距、语料库纳入和已知泄漏历史。Open LLM Leaderboard v1 在 2024 年因 MMLU/HellaSwag 分数被污染而停用。<em>用例</em>:比较两个模型时决定相信哪些分数。",

   // v0.7 — Inventory 模态第 5 卡
   "inv.v07.title": "🆕 v0.7 anti-bullshit 套件",
@@ -2236,6 +2394,56 @@ export const TRANSLATIONS = {
   "inv.v07.template": "<strong>📜 Chat-template</strong> — 精确 CLI flag,让 lm-eval 不会静默对半你的 accuracy",
   "inv.v07.arena": "<strong>🎯 Arena CI</strong> — 恢复 Chatbot Arena 隐藏的置信区间",
   "inv.v07.contam": "<strong>🧪 污染</strong> — 按污染概率对 20+ benchmark 评级",
   "share.import_desc": "有他人 TAF 分析的 JSON 文件? 在这里加载以本地查看判定 + 链。与您自己运行的视图相同。",
   "share.import_btn": "📂 加载共享的 JSON",
   "synthesis.system": "您是 transformer LLM 的精确诊断助手。给定预先计算的 TAF 公式结果,用 4-6 句中文写出清晰的摘要。为每个提到的数字引用章节号 (§X.Y)。始终给出具体建议。不要编造数字。",
@@ -2328,7 +2536,7 @@ export const TRANSLATIONS = {
   "common.no": "否",

   // 模式提示
- "modes.tip": "<strong>十种使用方式</strong>。<br><strong>📇 画像</strong>: 粘贴模型 id → 5 个配方的 TAF 卡。<br><strong>🆚 比较</strong>: 2-3 个模型在一个配方上并排比较。<br><strong>🔍 检查 config</strong>: 粘贴原始 config.json → 完整画像。<br><strong>💬 提问</strong>: 自由形式问题,浏览器 LLM 选择配方。<br><strong>📋 配方</strong>: 手动选择,完全控制表单。<br><strong>🩺 CLI 诊断</strong>: 生成 Python 命令在本地测量 γ。<br><strong>📊 相图</strong>: 23 个面板模型在 (log θ, γ) 平面上。<br><strong>🪟 揭示</strong>: 检测误导的 max_position_embeddings(SWA / YaRN / RoPE 缩放)。<br><strong>📜 Chat-template</strong>: 检测系列 + 给出 lm-eval / vLLM / transformers 的精确 CLI flag。<br><strong>🎯 Arena CI</strong>: 从原始 pairwise 投票数据重建置信区间;检测 Arena 隐藏的统计并列。<br><strong>🧪 污染</strong>: 根据训练 cutoff 与发布日期,对 20+ benchmark 进行污染概率评估。",
   "profile.tip": "<strong>一键完整诊断</strong>。粘贴任意 HF 模型 id (或选择预设)。工具运行所有 5 个配方 (长上下文、KV 压缩、自定义 vs API、预算、硬件),生成单个 <strong>TAF 卡</strong>,显示每个维度的判定 + 关键数字 + 架构分类。<br><br><strong>用例</strong>: "我正在为生产评估 Qwen2.5-32B — 它的完整可行性概况是什么?" → 粘贴 id → 画像 → 完成。",
   "compare.tip": "<strong>同一配方,多个模型</strong>。选择 2-3 个候选模型和一个配方。在单个比较表中查看判定。<br><br><strong>用例</strong>: "我需要在 16K 进行长上下文检索 — 哪个最好: Llama-3-8B、Mistral-7B 或 Qwen-7B?" → 选择 3 个 + X-2 + 16K → 看赢家。",
1091
+ "help.v07.quant.body": "Predice γ-shift y ΔPPL para cualquier (modelo × esquema de cuantización: NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8…). Arch-aware: d_head pequeño + GQA agresivo → más sensible; los esquemas calibrados (AWQ) absorben mejor el shift que los no calibrados (NF4). Recomienda alternativas más seguras si detecta cliff. <em>Caso de uso</em>: antes de cuantizar, predice si tu combo arquitectura × esquema mantendrá la PPL aceptable, con sugerencia concreta de switch si no.",
1092
 
1093
  // v0.7 — Inventory modal 5ª card
1094
  "inv.v07.title": "🆕 Pack anti-bullshit v0.7",
 
1096
  "inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exacto para que lm-eval no divida tu accuracy entre 2 silenciosamente",
1097
  "inv.v07.arena": "<strong>🎯 Arena CI</strong> — recupera los intervalos de confianza que Chatbot Arena oculta",
1098
  "inv.v07.contam": "<strong>🧪 Contaminación</strong> — puntúa 20+ benchmarks por probabilidad de contaminación",
1099
+ "inv.v07.quant": "<strong>⚖️ Quant</strong> — predice γ-shift + ΔPPL para cualquier combo (modelo × esquema de cuantización)",
1100
+
1101
+ // v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
1102
+ "modes.quant": "⚖️ Quant",
1103
+ "mode_desc.quant": "Predice γ-shift y ΔPPL para cualquier (modelo × esquema de cuantización). Arch-aware: d_head pequeño + GQA → más sensible. Recomienda alternativas más seguras si detecta cliff.",
1104
+ "quant.title": "⚖️ Clasificador de régimen de cuantización",
1105
+ "quant.tip": "Predice γ-shift (y la ΔPPL resultante) para un par (modelo × esquema). Claims genéricos como 'AWQ ~95% retención' son demasiado vagos — TAF usa d_head, ratio GQA, flag SWA y tamaño del modelo para dar veredicto arquitectura-específico. Resuelve: la comunidad HF reporta cliffs de cuantización impredecibles (NF4 -2 PPL en Phi-3 pero bien en Llama-3-8B).",
1106
+ "quant.desc": "<strong>¿Cuantizar romperá tu modelo?</strong> Pega un id HF, elige esquema de cuantización — obtén γ-shift predicho, banda ΔPPL esperada y alternativa recomendada si es un cliff. Solo navegador, sin GPU, sin set de calibración.",
1107
+ "quant.id_label": "ID modelo HF:",
1108
+ "quant.fetch_btn": "📥 Fetch config",
1109
+ "quant.scheme_label": "Esquema cuant:",
1110
+ "quant.run_btn": "⚖️ Predecir",
1111
+ "quant.all_btn": "📊 Comparar todos los esquemas",
1112
+ "quant.regime.safe": "✅ SEGURO",
1113
+ "quant.regime.mild": "✅ COMPRESIÓN LEVE",
1114
+ "quant.regime.significant": "⚠ DEGRADACIÓN SIGNIFICATIVA",
1115
+ "quant.regime.cliff": "❌ CLIFF FUERTE",
1116
+ "quant.label.gamma_shift": "γ shift",
1117
+ "quant.label.delta_ppl": "ΔPPL (est.)",
1118
+ "quant.label.arch_mult": "Multiplicador arch",
1119
+ "quant.section.breakdown": "Desglose",
1120
+ "quant.section.reco": "Recomendación",
1121
+ "quant.section.compare": "Todos los esquemas (ordenados por seguridad)",
1122
+ "quant.field.scheme": "Esquema",
1123
+ "quant.field.calibrated": "calibrado",
1124
+ "quant.field.uncalibrated": "no calibrado",
1125
+ "quant.field.base_penalty": "Penalización base",
1126
+ "quant.field.arch_mult_full": "Multiplicador arquitectónico",
1127
+ "quant.field.gamma_shift": "γ shift predicho",
1128
+ "quant.field.ppl_band": "Banda ΔPPL (est.)",
1129
+ "quant.field.params": "Parámetros",
1130
+ "quant.col.scheme": "Esquema",
1131
+ "quant.col.bits": "Bits",
1132
+ "quant.col.gamma_shift": "γ shift",
1133
+ "quant.col.ppl_band": "Banda ΔPPL",
1134
+ "quant.col.regime": "Régimen",
1135
+ "quant.reco.switch_to_awq": "<strong>Cambia a {scheme}</strong> — el 4-bit calibrado maneja d_head pequeño + GQA mucho mejor que NF4. ΔPPL esperada cae ~2-3×.",
1136
+ "quant.reco.switch_to_q5_km": "<strong>Cambia a {scheme}</strong> — Q5 mantiene más dimensiones de head intactas a bajo coste (solo ~25% más grande).",
1137
+ "quant.reco.switch_to_q4_km": "<strong>Cambia a {scheme}</strong> — Q3/Q2 son demasiado agresivos para esta arquitectura.",
1138
+ "quant.reco.consider_awq": "<strong>Considera {scheme}</strong> — la calibración reduce γ-shift significativamente en esta arquitectura.",
1139
+ "quant.reco.use_higher_bits": "<strong>Usa alternativa de mayor bit</strong> — esta arquitectura no absorbe 4-bit limpiamente. Prueba 5 u 8-bit.",
1140
+ "quant.reco.verify_with_eval": "<strong>Verifica con eval real</strong> — el shift predicho está en el límite. Corre NIAH a tu contexto objetivo antes de desplegar.",
1141
+ "quant.reco.no_action": "No requiere acción — la cuantización es segura para esta arquitectura.",
1142
+ "quant.summary.headline_all": "Todos los esquemas para <code>{modelId}</code>",
1143
+ "quant.status.empty_id": "⚠ Introduce un model id (ej. meta-llama/Llama-3.2-1B).",
1144
+ "quant.status.fetching": "⏳ Obteniendo config.json para {modelId}...",
1145
+ "quant.status.fetched": "✅ Config obtenido para {modelId}. Elige un esquema y click Predecir (o Comparar todos).",
1146
+ "quant.status.no_scheme": "⚠ Elige un esquema de cuantización del dropdown.",
1147
+ "quant.status.done": "✅ Régimen predicho: {regime}",
1148
+ "quant.status.done_all": "✅ Comparados {n} esquemas — ordenados por seguridad.",
1149
  "share.import_desc": "¿Tienes un fichero JSON del análisis TAF de alguien? Cárgalo aquí para ver el veredicto + cadena localmente. La misma vista que si lo hubieras ejecutado tú.",
1150
  "share.import_btn": "📂 Cargar JSON compartido",
1151
  "synthesis.system": "Eres un asistente de diagnóstico preciso para LLMs transformer. Dados resultados de fórmulas TAF pre-calculados, escribe un resumen claro en español de 4-6 frases. Cita el número de sección (§X.Y) para cada número que menciones. Da siempre una recomendación concreta. NO inventes números.",
 
1238
  "common.no": "No",
1239
 
1240
  // Tooltips de modos
1241
+ "modes.tip": "<strong>Doce formas de usar la herramienta</strong>.<br><strong>📇 Perfil</strong>: pega un id → TAF Card de 5 recetas.<br><strong>🆚 Comparar</strong>: 2-3 modelos lado a lado en una receta.<br><strong>🔍 Inspeccionar config</strong>: pega config.json crudo → Perfil completo.<br><strong>💬 Pregunta</strong>: pregunta libre, el LLM del navegador elige la receta.<br><strong>📋 Receta</strong>: selección manual con control total del formulario.<br><strong>🩺 Diagnóstico CLI</strong>: genera comando Python para medir γ localmente.<br><strong>📊 Diagrama de fase</strong>: panel de 23 modelos en plano (log θ, γ).<br><strong>🪟 Desenmascarar</strong>: detecta max_position_embeddings engañoso (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: detecta familia + da el flag CLI exacto para lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruye intervalos de confianza desde votos pairwise crudos; detecta empates estadísticos que Arena oculta.<br><strong>🧪 Contaminación</strong>: puntúa 20+ benchmarks por probabilidad de contaminación según cutoff de entrenamiento vs fecha de release.<br><strong>⚖️ Quant</strong>: predice γ-shift y ΔPPL para cualquier (modelo × esquema de cuantización); recomienda alternativa segura si hay cliff.",
1242
  "profile.tip": "<strong>Diagnóstico completo en un click</strong>. Pega cualquier id de modelo HF (o elige preset). La herramienta ejecuta las 5 recetas (contexto largo, compresión KV, custom vs API, presupuesto, hardware) y produce una única <strong>TAF Card</strong> con veredicto por dimensión + números clave + clasificación arquitectónica.<br><br><strong>Caso de uso</strong>: \"Estoy evaluando Qwen2.5-32B para producción — ¿cuál es su perfil completo de viabilidad?\" → pega id → Perfilar → listo.",
1243
  "compare.tip": "<strong>Misma receta, múltiples modelos</strong>. Elige 2-3 modelos candidatos y una receta. Ve los veredictos en una única tabla comparativa.<br><br><strong>Caso de uso</strong>: \"Necesito recuperación de contexto largo a 16K — ¿cuál es mejor: Llama-3-8B, Mistral-7B o Qwen-7B?\" → elige 3 + X-2 + 16K → ve el ganador.",
1244
 
 
1736
  "help.v07.arena.body": "Chatbot Arena masque les intervalles de confiance de son leaderboard public — un écart de 5 Elo peut être statistiquement insignifiant. Collez des données brutes de votes pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap 200 itérations → Elos classés avec CIs 95% et un panneau \"égalités statistiques\" listant les paires dont les CIs se chevauchent. Essayez le bouton Charger échantillon. <em>Cas d'usage</em> : avant de déclarer \"modèle A bat modèle B\", vérifiez que leurs CIs ne se chevauchent pas.",
1737
  "help.v07.contam.title": "🧪 Prior de Contamination",
1738
  "help.v07.contam.body": "Prior bayésien-ish sur la contamination d'un score de benchmark. Saisissez la date de cutoff d'entraînement de votre modèle → l'outil note 20+ benchmarks populaires (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) par P(contamination) selon l'écart temporel, l'inclusion dans corpus et l'historique de leaks connus. Open LLM Leaderboard v1 a été tué en 2024 après la contamination de MMLU/HellaSwag. <em>Cas d'usage</em> : décidez quels scores croire en comparant deux modèles.",
1739
+ "help.v07.quant.title": "⚖️ Classificateur de régime de quantification",
1740
+ "help.v07.quant.body": "Prédit le γ-shift et ΔPPL pour tout (modèle × schéma de quantification : NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8…). Arch-aware : petit d_head + GQA agressif → plus sensible ; les schémas calibrés (AWQ) absorbent mieux le shift que les non calibrés (NF4). Recommande des alternatives plus sûres si un cliff est détecté. <em>Cas d'usage</em> : avant de quantifier, prédisez si votre combo architecture × schéma maintiendra la PPL acceptable, avec une suggestion concrète de switch sinon.",
1741
 
1742
  // v0.7 — Inventory modal 5ème card
1743
  "inv.v07.title": "🆕 Pack anti-bullshit v0.7",
 
1745
  "inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exact pour que lm-eval ne divise pas votre accuracy par 2 en silence",
1746
  "inv.v07.arena": "<strong>🎯 Arena CI</strong> — récupère les intervalles de confiance que Chatbot Arena cache",
1747
  "inv.v07.contam": "<strong>🧪 Contamination</strong> — note 20+ benchmarks par probabilité de contamination",
1748
+ "inv.v07.quant": "<strong>⚖️ Quant</strong> — prédit le γ-shift + ΔPPL pour tout combo (modèle × schéma de quantification)",
1749
+
1750
+ // v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
1751
+ "modes.quant": "⚖️ Quant",
1752
+ "mode_desc.quant": "Prédit le γ-shift et ΔPPL pour tout (modèle × schéma de quantification). Arch-aware : petit d_head + GQA → plus sensible. Recommande des alternatives plus sûres si un cliff est détecté.",
1753
+ "quant.title": "⚖️ Classificateur de régime de quantification",
1754
+ "quant.tip": "Prédit le γ-shift (et la ΔPPL résultante) pour une paire (modèle × schéma). Les claims génériques comme 'AWQ ~95% retention' sont trop vagues — TAF utilise d_head, ratio GQA, flag SWA et taille du modèle pour donner un verdict arch-spécifique. Résout : la communauté HF rapporte des cliffs de quantification imprédictibles (NF4 -2 PPL sur Phi-3 mais OK sur Llama-3-8B).",
1755
+ "quant.desc": "<strong>La quantification cassera-t-elle votre modèle ?</strong> Collez un id HF, choisissez un schéma — obtenez le γ-shift prédit, la bande ΔPPL attendue et une alternative recommandée si c'est un cliff. Navigateur uniquement, sans GPU, sans set de calibration.",
1756
+ "quant.id_label": "ID modèle HF :",
1757
+ "quant.fetch_btn": "📥 Récupérer config",
1758
+ "quant.scheme_label": "Schéma quant :",
1759
+ "quant.run_btn": "⚖️ Prédire",
1760
+ "quant.all_btn": "📊 Comparer tous les schémas",
1761
+ "quant.regime.safe": "✅ SÛR",
1762
+ "quant.regime.mild": "✅ COMPRESSION LÉGÈRE",
1763
+ "quant.regime.significant": "⚠ DÉGRADATION SIGNIFICATIVE",
1764
+ "quant.regime.cliff": "❌ CLIFF SÉVÈRE",
1765
+ "quant.label.gamma_shift": "γ shift",
1766
+ "quant.label.delta_ppl": "ΔPPL (est.)",
1767
+ "quant.label.arch_mult": "Multiplicateur arch",
1768
+ "quant.section.breakdown": "Détail",
1769
+ "quant.section.reco": "Recommandation",
1770
+ "quant.section.compare": "Tous les schémas (triés par sécurité)",
1771
+ "quant.field.scheme": "Schéma",
1772
+ "quant.field.calibrated": "calibré",
1773
+ "quant.field.uncalibrated": "non calibré",
1774
+ "quant.field.base_penalty": "Pénalité de base",
1775
+ "quant.field.arch_mult_full": "Multiplicateur architectural",
1776
+ "quant.field.gamma_shift": "γ shift prédit",
1777
+ "quant.field.ppl_band": "Bande ΔPPL (est.)",
1778
+ "quant.field.params": "Paramètres",
1779
+ "quant.col.scheme": "Schéma",
1780
+ "quant.col.bits": "Bits",
1781
+ "quant.col.gamma_shift": "γ shift",
1782
+ "quant.col.ppl_band": "Bande ΔPPL",
1783
+ "quant.col.regime": "Régime",
1784
+ "quant.reco.switch_to_awq": "<strong>Passez à {scheme}</strong> — le 4-bit calibré gère bien mieux les petits d_head + GQA que NF4. ΔPPL attendue chute ~2-3×.",
1785
+ "quant.reco.switch_to_q5_km": "<strong>Passez à {scheme}</strong> — Q5 garde plus de dimensions de head intactes à faible coût (~25% plus grand seulement).",
1786
+ "quant.reco.switch_to_q4_km": "<strong>Passez à {scheme}</strong> — Q3/Q2 sont trop agressifs pour cette architecture.",
1787
+ "quant.reco.consider_awq": "<strong>Considérez {scheme}</strong> — la calibration réduit significativement le γ-shift sur cette architecture.",
1788
+ "quant.reco.use_higher_bits": "<strong>Utilisez une alternative à plus de bits</strong> — cette architecture n'absorbe pas le 4-bit proprement. Essayez 5 ou 8-bit.",
1789
+ "quant.reco.verify_with_eval": "<strong>Vérifiez avec une vraie éval</strong> — le shift prédit est borderline. Lancez NIAH à votre contexte cible avant de déployer.",
1790
+ "quant.reco.no_action": "Pas d'action requise — la quantification est sûre pour cette architecture.",
1791
+ "quant.summary.headline_all": "Tous les schémas pour <code>{modelId}</code>",
1792
+ "quant.status.empty_id": "⚠ Saisissez un model id (ex. meta-llama/Llama-3.2-1B).",
1793
+ "quant.status.fetching": "⏳ Récupération config.json pour {modelId}...",
1794
+ "quant.status.fetched": "✅ Config récupéré pour {modelId}. Choisissez un schéma et cliquez Prédire (ou Comparer tous).",
1795
+ "quant.status.no_scheme": "⚠ Choisissez un schéma de quantification dans le dropdown.",
1796
+ "quant.status.done": "✅ Régime prédit : {regime}",
1797
+ "quant.status.done_all": "✅ Comparé {n} schémas — triés par sécurité.",
1798
  "share.import_desc": "Vous avez un fichier JSON de l'analyse TAF de quelqu'un ? Chargez-le ici pour voir le verdict + la chaîne localement. La même vue que si vous l'aviez exécuté vous-même.",
1799
  "share.import_btn": "📂 Charger JSON partagé",
1800
  "synthesis.system": "Vous êtes un assistant de diagnostic précis pour LLMs transformer. Étant donné des résultats de formules TAF pré-calculés, écrivez un résumé clair en français de 4-6 phrases. Citez le numéro de section (§X.Y) pour chaque nombre mentionné. Donnez toujours une recommandation concrète. N'INVENTEZ PAS de nombres.",
 
1887
  "common.no": "Non",
1888
 
1889
  // Tooltips des modes
1890
+ "modes.tip": "<strong>Douze façons d'utiliser l'outil</strong>.<br><strong>📇 Profil</strong>: collez un id → TAF Card avec 5 recettes.<br><strong>🆚 Comparer</strong>: 2-3 modèles côte à côte sur une recette.<br><strong>🔍 Inspecter config</strong>: collez config.json brut → Profil complet.<br><strong>💬 Question</strong>: question libre, le LLM du navigateur choisit la recette.<br><strong>📋 Recette</strong>: sélection manuelle avec contrôle total du formulaire.<br><strong>🩺 Diagnostic CLI</strong>: génère commande Python pour mesurer γ localement.<br><strong>📊 Diagramme de phase</strong>: panel de 23 modèles dans le plan (log θ, γ).<br><strong>🪟 Démasquer</strong>: détecte un max_position_embeddings trompeur (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: détecte la famille + donne le flag CLI exact pour lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruit les intervalles de confiance depuis les votes pairwise bruts ; détecte les égalités statistiques qu'Arena cache.<br><strong>🧪 Contamination</strong>: note 20+ benchmarks pour leur probabilité de contamination selon le cutoff d'entraînement vs la date de sortie.<br><strong>⚖️ Quant</strong>: prédit γ-shift et ΔPPL pour tout (modèle × schéma de quantification) ; recommande une alternative sûre en cas de cliff.",
1891
  "profile.tip": "<strong>Diagnostic complet en un clic</strong>. Collez n'importe quel id de modèle HF (ou choisissez préréglage). L'outil exécute les 5 recettes (contexte long, compression KV, custom vs API, budget, hardware) et produit une <strong>TAF Card</strong> unique avec verdict par dimension + nombres clés + classification architecturale.<br><br><strong>Cas d'usage</strong>: « J'évalue Qwen2.5-32B pour la production — quel est son profil complet de viabilité ? » → collez id → Profiler → fait.",
1892
  "compare.tip": "<strong>Même recette, plusieurs modèles</strong>. Choisissez 2-3 modèles candidats et une recette. Voyez les verdicts dans un seul tableau comparatif.<br><br><strong>Cas d'usage</strong>: « J'ai besoin de récupération longue contexte à 16K — quel est le meilleur : Llama-3-8B, Mistral-7B ou Qwen-7B ? » → choisissez 3 + X-2 + 16K → voyez le gagnant.",
1893
 
 
2385
  "help.v07.arena.body": "Chatbot Arena 在公开排行榜中删除了置信区间 — 5 Elo 的差距在统计上可能毫无意义。粘贴原始 pairwise 投票数据(model_a, model_b, winner)→ Bradley-Terry MLE + 200 次 bootstrap → 排序 Elo + 95% CI + \"统计并列\" 面板,列出 CI 重叠的配对。尝试加载样本按钮。<em>用例</em>:宣称 \"模型 A 胜过模型 B\" 之前,验证它们的 CI 不重叠。",
2386
  "help.v07.contam.title": "🧪 污染先验",
2387
  "help.v07.contam.body": "对 benchmark 分数是否被污染做贝叶斯式的先验估计。输入模型训练 cutoff 日期 → 工具按 P(污染) 评估 20+ 主流 benchmark(MMLU、HellaSwag、GSM8K、HumanEval、IFEval、MMLU-Pro、GPQA、AIME、MATH-500、BBH、MUSR…),基于时间差距、语料库纳入和已知泄漏历史。Open LLM Leaderboard v1 在 2024 年因 MMLU/HellaSwag 分数被污染而停用。<em>用例</em>:比较两个模型时决定相信哪些分数。",
2388
+ "help.v07.quant.title": "⚖️ 量化机制分类器",
2389
+ "help.v07.quant.body": "预测任意(模型 × 量化方案:NF4、AWQ、GPTQ、GGUF Q4_K_M / Q5_K_M / Q8_0、int8、FP8…)的 γ-shift 与 ΔPPL。架构感知:小 d_head + 激进 GQA → 更敏感;校准方案(AWQ)比未校准方案(NF4)更好地吸收偏移。检测到 cliff 时推荐更安全的替代方案。<em>用例</em>:量化之前,预测你的特定架构 × 方案组合是否能保持 PPL 可接受,否则给出具体的切换建议。",
2390
 
2391
  // v0.7 — Inventory 模态第 5 卡
2392
  "inv.v07.title": "🆕 v0.7 anti-bullshit 套件",
 
2394
  "inv.v07.template": "<strong>📜 Chat-template</strong> — 精确 CLI flag,让 lm-eval 不会静默对半你的 accuracy",
2395
  "inv.v07.arena": "<strong>🎯 Arena CI</strong> — 恢复 Chatbot Arena 隐藏的置信区间",
2396
  "inv.v07.contam": "<strong>🧪 污染</strong> — 按污染概率对 20+ benchmark 评级",
2397
+ "inv.v07.quant": "<strong>⚖️ Quant</strong> — 预测任意(模型 × 量化方案)组合的 γ-shift + ΔPPL",
2398
+
2399
+ // v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
2400
+ "modes.quant": "⚖️ Quant",
2401
+ "mode_desc.quant": "预测任意(模型 × 量化方案)的 γ-shift 与 ΔPPL。架构感知:小 d_head + GQA → 更敏感。检测到 cliff 时推荐更安全的替代方案。",
2402
+ "quant.title": "⚖️ 量化机制分类器",
2403
+ "quant.tip": "预测给定(模型 × ���化方案)的 γ-shift(及由此产生的 ΔPPL)。\"AWQ 保留 ~95%\" 这类通用说法太模糊 — TAF 利用 d_head、GQA 比、SWA 标志和模型大小给出特定于架构的判定。解决:HF 社区普遍报告不可预测的量化 cliff(NF4 在 Phi-3 上 -2 PPL,但在 Llama-3-8B 上没问题)。",
2404
+ "quant.desc": "<strong>量化会破坏你的模型吗?</strong>粘贴 HF 模型 id,选择量化方案 — 获取预测的 γ-shift、预期 ΔPPL 区间,以及在 cliff 情况下的推荐替代方案。仅浏览器,无 GPU,无需校准集。",
2405
+ "quant.id_label": "HF 模型 id:",
2406
+ "quant.fetch_btn": "📥 获取 config",
2407
+ "quant.scheme_label": "量化方案:",
2408
+ "quant.run_btn": "⚖️ 预测",
2409
+ "quant.all_btn": "📊 比较所有方案",
2410
+ "quant.regime.safe": "✅ 安全",
2411
+ "quant.regime.mild": "✅ 轻度压缩",
2412
+ "quant.regime.significant": "⚠ 显著退化",
2413
+ "quant.regime.cliff": "❌ 重大 CLIFF",
2414
+ "quant.label.gamma_shift": "γ 偏移",
2415
+ "quant.label.delta_ppl": "ΔPPL(估)",
2416
+ "quant.label.arch_mult": "架构乘数",
2417
+ "quant.section.breakdown": "细节分解",
2418
+ "quant.section.reco": "建议",
2419
+ "quant.section.compare": "所有方案(按安全性排序)",
2420
+ "quant.field.scheme": "方案",
2421
+ "quant.field.calibrated": "已校准",
2422
+ "quant.field.uncalibrated": "未校准",
2423
+ "quant.field.base_penalty": "基础惩罚",
2424
+ "quant.field.arch_mult_full": "架构乘数",
2425
+ "quant.field.gamma_shift": "预测 γ 偏移",
2426
+ "quant.field.ppl_band": "ΔPPL 区间(估)",
2427
+ "quant.field.params": "参数量",
2428
+ "quant.col.scheme": "方案",
2429
+ "quant.col.bits": "比特",
2430
+ "quant.col.gamma_shift": "γ 偏移",
2431
+ "quant.col.ppl_band": "ΔPPL 区间",
2432
+ "quant.col.regime": "机制",
2433
+ "quant.reco.switch_to_awq": "<strong>切换到 {scheme}</strong> — 校准的 4-bit 处理小 d_head + GQA 比 NF4 好得多。预期 ΔPPL 下降 ~2-3 倍。",
2434
+ "quant.reco.switch_to_q5_km": "<strong>切换到 {scheme}</strong> — Q5 以低成本保留更多 head 维度(仅大约 25% 文件更大)。",
2435
+ "quant.reco.switch_to_q4_km": "<strong>切换到 {scheme}</strong> — Q3/Q2 对此架构过于激进。",
2436
+ "quant.reco.consider_awq": "<strong>考虑 {scheme}</strong> — 在此架构上校准能显著降低 γ-shift。",
2437
+ "quant.reco.use_higher_bits": "<strong>使用更高比特的替代</strong> — 此架构无法干净吸收 4-bit。尝试 5 或 8-bit。",
2438
+ "quant.reco.verify_with_eval": "<strong>用真实 eval 验证</strong> — 预测偏移在边缘。部署前在目标上下文运行 NIAH。",
2439
+ "quant.reco.no_action": "无需操作 — 此架构下量化是安全的。",
2440
+ "quant.summary.headline_all": "<code>{modelId}</code> 的所有方案",
2441
+ "quant.status.empty_id": "⚠ 输入 model id(例如 meta-llama/Llama-3.2-1B)。",
2442
+ "quant.status.fetching": "⏳ 正在获取 {modelId} 的 config.json...",
2443
+ "quant.status.fetched": "✅ 已获取 {modelId} 的 config。选择方案并点击预测(或比较所有)。",
2444
+ "quant.status.no_scheme": "⚠ 从下拉中选择一个量化方案。",
2445
+ "quant.status.done": "✅ 预测机制:{regime}",
2446
+ "quant.status.done_all": "✅ 已比较 {n} 个方案 — 按安全性排序。",
2447
  "share.import_desc": "有他人 TAF 分析的 JSON 文件? 在这里加载以本地查看判定 + 链。与您自己运行的视图相同。",
2448
  "share.import_btn": "📂 加载共享的 JSON",
2449
  "synthesis.system": "您是 transformer LLM 的精确诊断助手。给定预先计算的 TAF 公式结果,用 4-6 句中文写出清晰的摘要。为每个提到的数字引用章节号 (§X.Y)。始终给出具体建议。不要编造数字。",
 
2536
  "common.no": "否",
2537
 
2538
  // 模式提示
2539
+ "modes.tip": "<strong>十种使用方式</strong>。<br><strong>📇 画像</strong>: 粘贴模型 id → 5 个配方的 TAF 卡。<br><strong>🆚 比较</strong>: 2-3 个模型在一个配方上并排比较。<br><strong>🔍 检查 config</strong>: 粘贴原始 config.json → 完整画像。<br><strong>💬 提问</strong>: 自由形式问题,浏览器 LLM 选择配方。<br><strong>📋 配方</strong>: 手动选择,完全控制表单。<br><strong>🩺 CLI 诊断</strong>: 生成 Python 命令在本地测量 γ。<br><strong>📊 相图</strong>: 23 个面板模型在 (log θ, γ) 平面上。<br><strong>🪟 揭示</strong>: 检测误导的 max_position_embeddings(SWA / YaRN / RoPE 缩放)。<br><strong>📜 Chat-template</strong>: 检测系列 + 给出 lm-eval / vLLM / transformers 的精确 CLI flag。<br><strong>🎯 Arena CI</strong>: 从原始 pairwise 投票数据重建置信区间;检测 Arena 隐藏的统计并列。<br><strong>🧪 污染</strong>: 根据训练 cutoff 与发布日期,对 20+ benchmark 进行污染概率评估。<br><strong>⚖️ Quant</strong>: 预测任意(模型 × 量化方案)的 γ-shift 与 ΔPPL;cliff 时推荐更安全替代方案。",
2540
  "profile.tip": "<strong>一键完整诊断</strong>。粘贴任意 HF 模型 id (或选择预设)。工具运行所有 5 个配方 (长上下文、KV 压缩、自定义 vs API、预算、硬件),生成单个 <strong>TAF 卡</strong>,显示每个维度的判定 + 关键数字 + 架构分类。<br><br><strong>用例</strong>: \"我正在为生产评估 Qwen2.5-32B — 它的完整可行性概况是什么?\" → 粘贴 id → 画像 → 完成。",
2541
  "compare.tip": "<strong>同一配方,多个模型</strong>。选择 2-3 个候选模型和一个配方。在单个比较表中查看判定。<br><br><strong>用例</strong>: \"我需要在 16K 进行长上下文检索 — 哪个最好: Llama-3-8B、Mistral-7B 或 Qwen-7B?\" → 选择 3 个 + X-2 + 16K → 看赢家。",
2542
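The i18n keys above surface fields (`base_penalty`, `arch_mult`, `gamma_shift`, the safe/mild/significant/cliff regime bands) computed by the new js/quant_regime.js, whose body is truncated out of this diff. A rough sketch of how such an architecture-aware predictor could be shaped — the penalties, multipliers, and thresholds here are illustrative guesses calibrated against the commit's simulated numbers (Llama-3-8B + AWQ → mild, γ=0.022), not the shipped implementation:

```javascript
// Illustrative-only sketch: predicted γ-shift = scheme base penalty × an
// architecture multiplier driven by d_head, GQA ratio, and the SWA flag.
// All constants are assumptions, not values from js/quant_regime.js.
function predictQuantShift(cfg, scheme) {
  const nHeads = cfg.num_attention_heads;
  const nKv = cfg.num_key_value_heads || nHeads;
  const dHead = cfg.hidden_size / nHeads;

  // Small head dims and aggressive GQA leave less redundancy to absorb
  // quantization noise; sliding-window attention adds a little more risk.
  let archMult = 1.0;
  if (dHead < 96) archMult *= 1.3;
  if (nHeads / nKv >= 6) archMult *= 1.2;
  if (cfg.sliding_window) archMult *= 1.1;

  const gammaShift = scheme.basePenalty * archMult;
  const regime =
    gammaShift <= 0.015 ? "safe" :
    gammaShift <= 0.05  ? "mild" :
    gammaShift <= 0.08  ? "significant" : "cliff";

  return { gamma_shift: gammaShift, arch_multiplier: archMult, regime };
}

// Llama-3-8B-like config (d_head 128, GQA 4:1) with an AWQ-like base penalty:
const r = predictQuantShift(
  { hidden_size: 4096, num_attention_heads: 32, num_key_value_heads: 8 },
  { basePenalty: 0.022 }
);
// r.regime is "mild" here, matching the commit's simulated Llama-3-8B + AWQ case.
```

The multiplicative decomposition is what makes the "compare all schemes" table cheap: the architecture multiplier is computed once per config, then applied across every scheme's base penalty.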
 
js/main.js CHANGED
@@ -15,6 +15,7 @@ import { unmaskConfig } from "./swa_unmasker.js";
15
  import { sniffChatTemplate } from "./chat_template_sniffer.js";
16
  import { parseVotesCSV, computeArenaCI, SAMPLE_VOTES_CSV } from "./arena_ci.js";
17
  import { rateAllBenchmarks, BENCHMARK_DB } from "./contamination_prior.js";
 
18
 
19
  const TAF_BROWSER_URL = "python/taf_browser.py";
20
  const ENABLE_WEBLLM = true;
@@ -190,7 +191,8 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
190
  ["ask-section", "recipe-section", "form-section",
191
  "profile-section", "compare-section", "inspector-section",
192
  "diagnose-section", "phase-section", "unmask-section",
193
- "template-section", "arena-section", "contam-section"].forEach(id => {
 
194
  const el = $(id);
195
  if (el) el.style.display = "none";
196
  });
@@ -200,6 +202,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
200
  compare: "compare-section", inspector: "inspector-section",
201
  diagnose: "diagnose-section", phase: "phase-section", unmask: "unmask-section",
202
  template: "template-section", arena: "arena-section", contam: "contam-section",
 
203
  };
204
  const sectionId = sectionMap[mode];
205
  if (sectionId) $(sectionId).style.display = "";
@@ -980,6 +983,178 @@ $("contam-cutoff")?.addEventListener("keydown", (e) => {
980
  if (e.key === "Enter") { e.preventDefault(); runContamCompute(); }
981
  });
  function configToPreset(cfg, modelId) {
984
  const n_attn = cfg.num_attention_heads || cfg.n_head || 0;
985
  const n_kv = cfg.num_key_value_heads || cfg.num_attention_heads || cfg.n_head || 0;
 
15
  import { sniffChatTemplate } from "./chat_template_sniffer.js";
16
  import { parseVotesCSV, computeArenaCI, SAMPLE_VOTES_CSV } from "./arena_ci.js";
17
  import { rateAllBenchmarks, BENCHMARK_DB } from "./contamination_prior.js";
18
+ import { predictQuantShift, predictAllSchemes, QUANT_SCHEMES } from "./quant_regime.js";
19
 
20
  const TAF_BROWSER_URL = "python/taf_browser.py";
21
  const ENABLE_WEBLLM = true;
 
191
  ["ask-section", "recipe-section", "form-section",
192
  "profile-section", "compare-section", "inspector-section",
193
  "diagnose-section", "phase-section", "unmask-section",
194
+ "template-section", "arena-section", "contam-section",
195
+ "quant-section"].forEach(id => {
196
  const el = $(id);
197
  if (el) el.style.display = "none";
198
  });
 
202
  compare: "compare-section", inspector: "inspector-section",
203
  diagnose: "diagnose-section", phase: "phase-section", unmask: "unmask-section",
204
  template: "template-section", arena: "arena-section", contam: "contam-section",
205
+ quant: "quant-section",
206
  };
207
  const sectionId = sectionMap[mode];
208
  if (sectionId) $(sectionId).style.display = "";
 
983
  if (e.key === "Enter") { e.preventDefault(); runContamCompute(); }
984
  });
985
 
986
+ // ════════════════════════════════════════════════════════════════════
987
+ // ⚖️ Quant-regime classifier (v0.7.3 anti-bullshit pack #5)
988
+ // ════════════════════════════════════════════════════════════════════
989
+
990
+ const QUANT_REGIME_COLOR = {
991
+ safe: "#3fb950",
992
+ mild: "#3fb950",
993
+ significant: "#f1c40f",
994
+ cliff: "#f85149",
995
+ };
996
+
997
+ // Populate scheme dropdown from QUANT_SCHEMES on first render. Idempotent.
998
+ function populateQuantSchemes() {
999
+ const sel = $("quant-scheme");
1000
+ if (!sel || sel.options.length > 1) return;
1001
+ for (const s of QUANT_SCHEMES) {
1002
+ const opt = document.createElement("option");
1003
+ opt.value = s.id;
1004
+ opt.textContent = s.label;
1005
+ sel.appendChild(opt);
1006
+ }
1007
+ }
1008
+
1009
+ // Cache config across "Fetch" + "Predict" / "Compare" actions on the same id.
1010
+ let __quantLastConfig = null;
1011
+ let __quantLastModelId = null;
1012
+
1013
+ async function quantFetchConfig() {
1014
+ const modelId = ($("quant-id").value || "").trim();
1015
+ if (!modelId) {
1016
+ $("quant-status").textContent = t("quant.status.empty_id") || "⚠ Enter a model id.";
1017
+ return null;
1018
+ }
1019
+ $("quant-status").textContent = tFmt("quant.status.fetching", { modelId });
1020
+ $("quant-fetch-btn").disabled = true;
1021
+ try {
1022
+ const cfg = await fetchHfConfig(modelId);
1023
+ __quantLastConfig = cfg;
1024
+ __quantLastModelId = modelId;
1025
+ $("quant-status").textContent = tFmt("quant.status.fetched", { modelId });
1026
+ return cfg;
1027
+ } catch (err) {
1028
+ $("quant-status").textContent = `❌ ${err.message}`;
1029
+ return null;
1030
+ } finally {
1031
+ $("quant-fetch-btn").disabled = false;
1032
+ }
1033
+ }
1034
+
1035
+ function renderQuantSingle(result, modelId) {
+   const escapeHtml = (s) => String(s).replace(/[&<>"']/g, c =>
+     ({"&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#39;"}[c]));
+   const fmtN = (x) => x === null || x === undefined ? "—" : Number(x).toLocaleString();
+   const color = QUANT_REGIME_COLOR[result.regime] || "#8b949e";
+   const regimeLabel = t(`quant.regime.${result.regime}`) || result.regime;
+
+   let recoHtml = "";
+   if (result.recommend_code) {
+     const recoText = result.recommend_scheme
+       ? tFmt("quant.reco." + result.recommend_code, {
+           scheme: QUANT_SCHEMES.find(s => s.id === result.recommend_scheme)?.label || result.recommend_scheme,
+         })
+       : (t("quant.reco." + result.recommend_code) || result.recommend_code);
+     recoHtml = `<p class="unmask-reco">${recoText}</p>`;
+   } else {
+     recoHtml = `<p class="unmask-reco">${t("quant.reco.no_action") || "No action needed — quantization is safe for this architecture."}</p>`;
+   }
+
+   return `
+     <div class="unmask-result">
+       <div class="unmask-hero" style="border-color: ${color};">
+         <div class="unmask-verdict" style="color: ${color};">${regimeLabel}</div>
+         <div class="unmask-model"><code>${escapeHtml(modelId)}</code> + <code>${escapeHtml(result.scheme_label)}</code></div>
+         <div class="unmask-numbers">
+           <div><span class="unmask-num-label">${t("quant.label.gamma_shift") || "γ shift"}</span><span class="unmask-num-val">+${result.gamma_shift.toFixed(3)}</span></div>
+           <div><span class="unmask-num-label">${t("quant.label.delta_ppl") || "ΔPPL (est.)"}</span><span class="unmask-num-val">+${result.delta_ppl.mid.toFixed(2)}</span></div>
+           <div><span class="unmask-num-label">${t("quant.label.arch_mult") || "Arch multiplier"}</span><span class="unmask-num-val">×${result.arch_multiplier}</span></div>
+         </div>
+       </div>
+       <div class="unmask-details">
+         <details class="unmask-panel" open>
+           <summary class="unmask-panel-title">${t("quant.section.breakdown") || "Breakdown"}</summary>
+           <ul>
+             <li><strong>${t("quant.field.scheme") || "Scheme"}:</strong> ${escapeHtml(result.scheme_label)} (${result.scheme_bits}-bit, ${result.scheme_calibrated ? (t("quant.field.calibrated") || "calibrated") : (t("quant.field.uncalibrated") || "uncalibrated")})</li>
+             <li><strong>${t("quant.field.base_penalty") || "Base penalty"}:</strong> ${result.base_penalty.toFixed(3)}</li>
+             <li><strong>${t("quant.field.arch_mult_full") || "Architecture multiplier"}:</strong> ×${result.arch_multiplier} (d_head, GQA, SWA, params)</li>
+             <li><strong>${t("quant.field.gamma_shift") || "Predicted γ shift"}:</strong> +${result.gamma_shift.toFixed(3)}</li>
+             <li><strong>${t("quant.field.ppl_band") || "ΔPPL band (est.)"}:</strong> ${result.delta_ppl.low.toFixed(2)} – ${result.delta_ppl.high.toFixed(2)}</li>
+             <li><strong>${t("quant.field.params") || "Parameters"}:</strong> ${fmtN(result.n_params)}</li>
+           </ul>
+         </details>
+         <details class="unmask-panel" open>
+           <summary class="unmask-panel-title">${t("quant.section.reco") || "Recommendation"}</summary>
+           ${recoHtml}
+         </details>
+       </div>
+     </div>
+   `;
+ }
+
+ function renderQuantAll(rows, modelId) {
+   const escapeHtml = (s) => String(s).replace(/[&<>"']/g, c =>
+     ({"&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#39;"}[c]));
+   let body = "";
+   for (const r of rows) {
+     const color = QUANT_REGIME_COLOR[r.regime] || "#8b949e";
+     const regimeLabel = t(`quant.regime.${r.regime}`) || r.regime;
+     body += `<tr>
+       <td><strong>${escapeHtml(r.scheme_label)}</strong></td>
+       <td class="arena-spread">${r.scheme_bits}-bit ${r.scheme_calibrated ? "✓" : ""}</td>
+       <td class="arena-elo">+${r.gamma_shift.toFixed(3)}</td>
+       <td class="arena-spread">${r.delta_ppl.low.toFixed(2)}–${r.delta_ppl.high.toFixed(2)}</td>
+       <td style="color: ${color};"><strong>${regimeLabel}</strong></td>
+     </tr>`;
+   }
+   return `
+     <div class="arena-result">
+       <div class="unmask-hero" style="border-color: #58a6ff;">
+         <div class="unmask-verdict" style="color: #58a6ff;">${tFmt("quant.summary.headline_all", { modelId: escapeHtml(modelId) })}</div>
+       </div>
+       <div class="unmask-details">
+         <details class="unmask-panel" open>
+           <summary class="unmask-panel-title">${t("quant.section.compare") || "All schemes (sorted by safety)"}</summary>
+           <table class="arena-table">
+             <thead><tr>
+               <th>${t("quant.col.scheme") || "Scheme"}</th>
+               <th>${t("quant.col.bits") || "Bits"}</th>
+               <th>${t("quant.col.gamma_shift") || "γ shift"}</th>
+               <th>${t("quant.col.ppl_band") || "ΔPPL band"}</th>
+               <th>${t("quant.col.regime") || "Regime"}</th>
+             </tr></thead>
+             <tbody>${body}</tbody>
+           </table>
+         </details>
+       </div>
+     </div>
+   `;
+ }
+
+ async function runQuantPredict() {
+   const cfg = __quantLastConfig || await quantFetchConfig();
+   if (!cfg) return;
+   const schemeId = $("quant-scheme").value;
+   if (!schemeId) {
+     $("quant-status").textContent = t("quant.status.no_scheme") || "⚠ Pick a quant scheme.";
+     return;
+   }
+   const result = predictQuantShift(cfg, schemeId);
+   if (!result) {
+     $("quant-status").textContent = "❌ Unknown scheme.";
+     return;
+   }
+   $("quant-output").innerHTML = renderQuantSingle(result, __quantLastModelId);
+   $("quant-status").textContent = tFmt("quant.status.done", { regime: t(`quant.regime.${result.regime}`) || result.regime });
+ }
+
+ async function runQuantAll() {
+   const cfg = __quantLastConfig || await quantFetchConfig();
+   if (!cfg) return;
+   const rows = predictAllSchemes(cfg);
+   $("quant-output").innerHTML = renderQuantAll(rows, __quantLastModelId);
+   $("quant-status").textContent = tFmt("quant.status.done_all", { n: rows.length });
+ }
+
+ populateQuantSchemes();
+ $("quant-fetch-btn")?.addEventListener("click", quantFetchConfig);
+ $("quant-run-btn")?.addEventListener("click", runQuantPredict);
+ $("quant-all-btn")?.addEventListener("click", runQuantAll);
+ $("quant-id")?.addEventListener("keydown", (e) => {
+   if (e.key === "Enter") { e.preventDefault(); quantFetchConfig(); }
+ });
+
  function configToPreset(cfg, modelId) {
    const n_attn = cfg.num_attention_heads || cfg.n_head || 0;
    const n_kv = cfg.num_key_value_heads || cfg.num_attention_heads || cfg.n_head || 0;
js/quant_regime.js ADDED
@@ -0,0 +1,147 @@
+ // Quant-regime classifier (v0.7.3 anti-bullshit pack #5)
+ // Predicts γ shift under quantization given (architecture × quant scheme).
+ // Pure logic — no human strings. Solves: the HF community widely reports that
+ // quantization "cliffs" are unpredictable per model. Generic "AWQ ~95% retention"
+ // claims are too vague — TAF gives an architecture-specific verdict.
+ //
+ // Calibration sources: Maarten Grootendorst's quant comparison newsletter,
+ // llama.cpp PPL benchmarks, GPTQ/AWQ papers.
+
+ export const QUANT_SCHEMES = [
+   { id: "fp8",        label: "FP8 (Hopper)",                     bits: 8, base_penalty: 0.007, calibrated: false, hardware: "h100+" },
+   { id: "int8",       label: "int8 (LLM.int8())",                bits: 8, base_penalty: 0.010, calibrated: false, hardware: "any" },
+   { id: "gguf_q8_0",  label: "GGUF Q8_0",                        bits: 8, base_penalty: 0.008, calibrated: false, hardware: "cpu/any" },
+   { id: "gguf_q5_km", label: "GGUF Q5_K_M",                      bits: 5, base_penalty: 0.020, calibrated: false, hardware: "cpu/any" },
+   { id: "awq",        label: "AWQ (4-bit, calibrated)",          bits: 4, base_penalty: 0.020, calibrated: true,  hardware: "any" },
+   { id: "gptq",       label: "GPTQ (4-bit, calibrated)",         bits: 4, base_penalty: 0.035, calibrated: true,  hardware: "any" },
+   { id: "gguf_q4_km", label: "GGUF Q4_K_M",                      bits: 4, base_penalty: 0.050, calibrated: false, hardware: "cpu/any" },
+   { id: "nf4",        label: "NF4 (bitsandbytes, uncalibrated)", bits: 4, base_penalty: 0.070, calibrated: false, hardware: "any" },
+   { id: "gguf_q3_km", label: "GGUF Q3_K_M (aggressive)",         bits: 3, base_penalty: 0.110, calibrated: false, hardware: "cpu/any" },
+   { id: "gguf_q2_k",  label: "GGUF Q2_K (extreme)",              bits: 2, base_penalty: 0.180, calibrated: false, hardware: "cpu/any" },
+ ];
+
+ const REGIME_BANDS = [
+   { id: "safe",        max_gamma_shift: 0.015, label_code: "safe" },
+   { id: "mild",        max_gamma_shift: 0.04,  label_code: "mild" },
+   { id: "significant", max_gamma_shift: 0.08,  label_code: "significant" },
+   { id: "cliff",       max_gamma_shift: 1.0,   label_code: "cliff" },
+ ];
+
+ function bandFor(gammaShift) {
+   for (const b of REGIME_BANDS) if (gammaShift <= b.max_gamma_shift) return b.id;
+   return "cliff";
+ }
+
+ // Architecture-specific multiplier on the base quant penalty.
+ // More sensitive: small d_head, aggressive GQA ratio, very small models (pre-IH).
+ // Less sensitive: large d_head, post-IH, MHA (no GQA pressure).
+ function archMultiplier(config) {
+   let mult = 1.0;
+   const n_attn = config.num_attention_heads ?? null;
+   const n_kv = config.num_key_value_heads ?? n_attn;
+   const hidden = config.hidden_size ?? null;
+   const d_head = config.head_dim ?? (n_attn && hidden ? hidden / n_attn : null);
+   const n_params = inferNParams(config);
+   const hasSWA = typeof config.sliding_window === "number" && config.sliding_window > 0;
+   const hasGQA = n_attn && n_kv && n_kv < n_attn;
+   const gqaRatio = hasGQA ? n_attn / n_kv : 1;
+
+   // d_head sensitivity (small head = more compression damage)
+   if (d_head !== null) {
+     if (d_head < 64) mult *= 1.5;
+     else if (d_head < 96) mult *= 1.2;
+     else if (d_head < 128) mult *= 1.05;
+     // d_head >= 128: no penalty
+   }
+   // GQA pressure (heavily-shared kv heads = more interference under quant)
+   if (gqaRatio >= 8) mult *= 1.3;
+   else if (gqaRatio >= 4) mult *= 1.15;
+   // SWA: localized attention is somewhat more robust to head-level noise
+   if (hasSWA) mult *= 0.92;
+   // Post-IH (large) models more robust; pre-IH (small) less robust
+   if (n_params !== null) {
+     if (n_params < 1.5e9) mult *= 1.4;        // <1.5B = pre-IH
+     else if (n_params < 4e9) mult *= 1.15;    // borderline
+     else if (n_params >= 30e9) mult *= 0.85;  // very large = robust
+     else if (n_params >= 7e9) mult *= 0.95;
+   }
+   return mult;
+ }
+
+ function inferNParams(config) {
+   if (typeof config.num_parameters === "number") return config.num_parameters;
+   if (typeof config.n_params === "number") return config.n_params;
+   // Estimate 12·L·h² for the transformer blocks plus input/output embeddings
+   // (classic rule-of-thumb; assumes untied embeddings).
+   const h = config.hidden_size ?? null;
+   const L = config.num_hidden_layers ?? null;
+   const v = config.vocab_size ?? null;
+   if (h && L) {
+     const transformer = 12 * L * h * h;
+     const embed = v ? v * h : 0;
+     return transformer + 2 * embed;
+   }
+   return null;
+ }
+
+ // Predict ΔPPL band from γ shift, scaled by model size.
+ // Empirical fit (rough): ΔPPL ≈ 8 × γ_shift² × (1 + log10(N/1e9)/4).
+ // Returns a {low, mid, high} band (≈±50% uncertainty).
+ function predictDeltaPPL(gammaShift, nParams) {
+   if (gammaShift <= 0) return { low: 0, mid: 0, high: 0 };
+   const sizeBoost = nParams ? 1 + Math.log10(nParams / 1e9) / 4 : 1;
+   const mid = 8 * gammaShift * gammaShift * sizeBoost;
+   return {
+     low: Math.max(0, Math.round((mid * 0.6) * 100) / 100),
+     mid: Math.round(mid * 100) / 100,
+     high: Math.round((mid * 1.5) * 100) / 100,
+   };
+ }
+
+ export function predictQuantShift(config, schemeId) {
+   const scheme = QUANT_SCHEMES.find(s => s.id === schemeId);
+   if (!scheme) return null;
+
+   const mult = archMultiplier(config);
+   const gammaShift = scheme.base_penalty * mult;
+   const regime = bandFor(gammaShift);
+   const nParams = inferNParams(config);
+   const deltaPPL = predictDeltaPPL(gammaShift, nParams);
+
+   // Recommendation logic (which scheme to switch to if the regime is bad).
+   let recommendCode = null;
+   let recommendScheme = null;
+   if (regime === "cliff") {
+     // Step up to the next-safer scheme: nf4 → awq, q4_km → q5_km, q3/q2 → q4_km, gptq → awq.
+     if (scheme.id === "nf4") { recommendCode = "switch_to_awq"; recommendScheme = "awq"; }
+     else if (scheme.id === "gguf_q4_km") { recommendCode = "switch_to_q5_km"; recommendScheme = "gguf_q5_km"; }
+     else if (scheme.id === "gguf_q3_km") { recommendCode = "switch_to_q4_km"; recommendScheme = "gguf_q4_km"; }
+     else if (scheme.id === "gguf_q2_k") { recommendCode = "switch_to_q4_km"; recommendScheme = "gguf_q4_km"; }
+     else if (scheme.id === "gptq") { recommendCode = "switch_to_awq"; recommendScheme = "awq"; }
+     else recommendCode = "use_higher_bits";
+   } else if (regime === "significant") {
+     if (scheme.id === "nf4") { recommendCode = "consider_awq"; recommendScheme = "awq"; }
+     else recommendCode = "verify_with_eval";
+   }
+
+   return {
+     scheme: scheme.id,
+     scheme_label: scheme.label,
+     scheme_bits: scheme.bits,
+     scheme_calibrated: scheme.calibrated,
+     arch_multiplier: Math.round(mult * 100) / 100,
+     base_penalty: scheme.base_penalty,
+     gamma_shift: Math.round(gammaShift * 1000) / 1000,
+     regime,
+     delta_ppl: deltaPPL,
+     n_params: nParams,
+     recommend_code: recommendCode,
+     recommend_scheme: recommendScheme,
+   };
+ }
+
+ // Batch: predict all schemes for one config. Useful for "show me the trade-offs".
+ export function predictAllSchemes(config) {
+   return QUANT_SCHEMES.map(s => predictQuantShift(config, s.id))
+     .filter(Boolean)
+     .sort((a, b) => a.gamma_shift - b.gamma_shift);
+ }
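As a sanity check, the γ-shift arithmetic above can be reproduced standalone. The config below is a hypothetical Llama-3-8B-style `config.json` (field values assumed for illustration, not fetched from the Hub); the sketch inlines the same thresholds and `base_penalty` constants as `quant_regime.js`, and the resulting values match the virtual-simulation numbers in the commit message (Q8_0 safe, AWQ mild, NF4 significant).

```javascript
// Standalone sketch of the γ-shift computation, inlining the same constants
// as quant_regime.js. The config is a hypothetical Llama-3-8B-style config —
// assumed values for illustration only.
const cfg = {
  hidden_size: 4096,
  num_attention_heads: 32,
  num_key_value_heads: 8,
  num_hidden_layers: 32,
  vocab_size: 128256,
};

function gammaShift(config, basePenalty) {
  let mult = 1.0;
  const nAttn = config.num_attention_heads;
  const nKv = config.num_key_value_heads ?? nAttn;
  const dHead = config.hidden_size / nAttn;            // 128 → no d_head penalty
  const gqaRatio = nKv < nAttn ? nAttn / nKv : 1;      // 4 → ×1.15
  const nParams = 12 * config.num_hidden_layers * config.hidden_size ** 2
    + 2 * config.vocab_size * config.hidden_size;      // ≈7.5e9 → ×0.95
  if (dHead < 64) mult *= 1.5;
  else if (dHead < 96) mult *= 1.2;
  else if (dHead < 128) mult *= 1.05;
  if (gqaRatio >= 8) mult *= 1.3;
  else if (gqaRatio >= 4) mult *= 1.15;
  if (nParams < 1.5e9) mult *= 1.4;
  else if (nParams < 4e9) mult *= 1.15;
  else if (nParams >= 30e9) mult *= 0.85;
  else if (nParams >= 7e9) mult *= 0.95;
  return Math.round(basePenalty * mult * 1000) / 1000; // same rounding as predictQuantShift
}

// base_penalty values from QUANT_SCHEMES: Q8_0 = 0.008, AWQ = 0.020, NF4 = 0.070
console.log(gammaShift(cfg, 0.008)); // → 0.009 (safe)
console.log(gammaShift(cfg, 0.020)); // → 0.022 (mild)
console.log(gammaShift(cfg, 0.070)); // → 0.076 (significant)
```

The net multiplier here is 1.15 × 0.95 = 1.0925: the GQA penalty for a 4:1 head-sharing ratio is almost cancelled by the ≥7B robustness discount, which is why 4-bit calibrated schemes stay in the mild band for this class of model.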