v0.7.3: Quant-regime classifier (anti-bullshit pack #5)
NEW MODE: ⚖️ Quant — predicts γ-shift and ΔPPL for any (model × quant scheme) combination, architecture-aware.
Solves: the HF community widely reports unpredictable quantization cliffs — NF4 might lose 2 PPL on Phi-3 but be fine on Llama-3-8B. Generic claims like "AWQ ~95% retention" are too vague; TAF gives an architecture-specific verdict using d_head, GQA ratio, SWA flag, and model size.
NEW
- js/quant_regime.js: pure logic. QUANT_SCHEMES table covering 10 schemes (FP8 / int8 / Q8_0 / Q5_K_M / AWQ / GPTQ / Q4_K_M / NF4 / Q3_K_M / Q2_K). predictQuantShift() returns the predicted γ-shift (base penalty × architecture multiplier), a ΔPPL band, a regime band (safe / mild / significant / cliff), and a concrete recommendation code.
- predictAllSchemes() ranks all 10 schemes for a given architecture so the user sees the full trade-off table.
- HF Hub auto-fetch + paste-config fallback. Two output modes: single (one scheme + breakdown) and compare-all (sorted table).
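A minimal sketch of the module surface this implies — the exported names mirror the commit, but the scheme entries, multiplier heuristics, and regime thresholds below are invented for illustration, not the values shipped in js/quant_regime.js:

// js/quant_regime.js — hypothetical sketch, not the shipped implementation.
export const QUANT_SCHEMES = {
  Q8_0: { bits: 8, calibrated: false, basePenalty: 0.008 },
  AWQ:  { bits: 4, calibrated: true,  basePenalty: 0.020 },
  NF4:  { bits: 4, calibrated: false, basePenalty: 0.055 },
  // … the real table covers all 10 schemes.
};

// cfg: parsed config.json (hidden_size, num_hidden_layers, num_attention_heads,
// num_key_value_heads, head_dim, sliding_window, …).
export function predictQuantShift(cfg, schemeName) {
  const scheme = QUANT_SCHEMES[schemeName];
  const dHead = cfg.head_dim ?? cfg.hidden_size / cfg.num_attention_heads;
  const gqaRatio = cfg.num_attention_heads / (cfg.num_key_value_heads ?? cfg.num_attention_heads);
  const paramsB = (12 * cfg.num_hidden_layers * cfg.hidden_size ** 2) / 1e9; // crude size proxy

  let mult = 1.0;
  if (dHead < 96) mult *= 1.4;          // small d_head → more sensitive
  if (gqaRatio >= 4) mult *= 1.2;       // aggressive GQA
  if (cfg.sliding_window) mult *= 1.1;  // SWA flag
  if (paramsB < 1) mult *= 1.5;         // tiny models degrade faster
  if (scheme.calibrated) mult *= 0.7;   // calibration absorbs part of the shift

  const gammaShift = scheme.basePenalty * mult;
  const regime = gammaShift < 0.015 ? "safe"
               : gammaShift < 0.04  ? "mild"
               : gammaShift < 0.08  ? "significant" : "cliff";
  // The shipped function also returns a ΔPPL band and a recommendation code.
  return { scheme: schemeName, gammaShift, archMultiplier: mult, regime };
}

export function predictAllSchemes(cfg) {
  return Object.keys(QUANT_SCHEMES)
    .map((name) => predictQuantShift(cfg, name))
    .sort((a, b) => a.gammaShift - b.gammaShift); // safest first
}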
VIRTUAL SIMULATION
- Llama-3-8B + AWQ → mild (γ=0.022); + NF4 → significant (γ=0.076); + Q8_0 → safe (γ=0.009).
- Phi-3-mini (small d_head) + NF4 → cliff (γ=0.085) with reco "switch to AWQ".
- Pythia-160m + Q3_K_M → cliff (γ=0.185) with reco "switch to Q4_K_M".
- Mistral-7B trade-off table: FP8/Q8_0/int8 → safe; Q5_K_M/AWQ/GPTQ → mild; Q4_K_M/NF4 → significant; Q3_K_M/Q2_K → cliff.
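These rows can be spot-checked with a few lines against the sketch above; the config fields are the public Llama-3-8B / Mistral-7B values, but the resulting numbers are only as real as the illustrative table:

// Hand-typed config summaries; a real run would fetch config.json from the Hub.
import { predictQuantShift, predictAllSchemes } from "./quant_regime.js";

const llama3_8b = { hidden_size: 4096, num_hidden_layers: 32,
                    num_attention_heads: 32, num_key_value_heads: 8 };
const mistral7b = { hidden_size: 4096, num_hidden_layers: 32,
                    num_attention_heads: 32, num_key_value_heads: 8, sliding_window: 4096 };

// Single-scheme checks, as in the Llama-3-8B line above.
for (const scheme of ["AWQ", "NF4", "Q8_0"]) {
  const r = predictQuantShift(llama3_8b, scheme);
  console.log(`Llama-3-8B + ${scheme}: γ-shift ${r.gammaShift.toFixed(3)} → ${r.regime}`);
}

// Full trade-off table, as in the Mistral-7B line.
console.table(predictAllSchemes(mistral7b));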
DOCUMENTATION
- Help modal: new v0.7 quant section (4 langs) with problem/solution/use case.
- Inventory modal v0.7 card: new "⚖️ Quant" entry (4 langs).
- modes.tip: now lists 12 modes in 4 langs.
- 583 i18n keys × 4 langs · 0 missing / 0 extra (50 new quant.* keys per lang).
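The "0 missing / 0 extra" figure is the kind of invariant a tiny parity check keeps true; a sketch, assuming TRANSLATIONS maps language codes to flat key → string tables:

// Compare every language's key set against English; any drift breaks the 0/0 claim.
import { TRANSLATIONS } from "./i18n.js";

const reference = new Set(Object.keys(TRANSLATIONS.en)); // "en" as reference is an assumption
for (const [lang, table] of Object.entries(TRANSLATIONS)) {
  const keys = new Set(Object.keys(table));
  const missing = [...reference].filter((k) => !keys.has(k)).length;
  const extra = [...keys].filter((k) => !reference.has(k)).length;
  console.log(`${lang}: ${keys.size} keys · ${missing} missing / ${extra} extra`);
}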
38/38 smoke tests passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Files changed:
- index.html +36 -0
- js/i18n.js +212 -4
- js/main.js +176 -1
- js/quant_regime.js +147 -0
index.html

@@ -204,6 +204,9 @@
 <p><strong data-i18n="help.v07.contam.title">🧪 Contamination Prior</strong></p>
 <p data-i18n="help.v07.contam.body">Bayesian-ish prior on whether a benchmark score is contaminated. Enter your model's training cutoff date → tool rates 20+ popular benchmarks (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) by P(contamination) based on time gap, corpus inclusion, and known leak history. Open LLM Leaderboard v1 was killed in 2024 after MMLU/HellaSwag scores became contaminated. <em>Use case</em>: decide which scores to trust when comparing two models.</p>

+<p><strong data-i18n="help.v07.quant.title">⚖️ Quant-regime Classifier</strong></p>
+<p data-i18n="help.v07.quant.body">Predicts γ-shift and ΔPPL for any (model × quant scheme: NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8, …). Architecture-aware: small d_head + aggressive GQA → more sensitive; calibrated schemes (AWQ) absorb shift better than uncalibrated (NF4). Recommends safer alternatives if a cliff is detected. <em>Use case</em>: before quantizing, predict whether your specific architecture × scheme combo will keep PPL acceptable, with a concrete switch-to suggestion otherwise.</p>
+
 <h3 data-i18n="help.audit.title">The audit chain</h3>
 <p data-i18n="help.audit.body">Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
 output, and interpretation. Click any step to expand. Cite section numbers (§26.1, §19.1, etc.) refer

@@ -309,6 +312,7 @@
 <li data-i18n="inv.v07.template"><strong>📜 Chat-template</strong> — exact CLI flag so lm-eval doesn't silently halve your accuracy</li>
 <li data-i18n="inv.v07.arena"><strong>🎯 Arena CI</strong> — recover the confidence intervals Chatbot Arena hides</li>
 <li data-i18n="inv.v07.contam"><strong>🧪 Contamination</strong> — rate 20+ benchmarks for contamination probability</li>
+<li data-i18n="inv.v07.quant"><strong>⚖️ Quant</strong> — predict γ shift + ΔPPL for any (model × quant scheme) combo</li>
 </ul>
 </details>
 </div>

@@ -362,6 +366,7 @@
 <button class="mode-btn" data-mode="template" role="tab" aria-selected="false" data-i18n="modes.template">📜 Chat-template</button>
 <button class="mode-btn" data-mode="arena" role="tab" aria-selected="false" data-i18n="modes.arena">🎯 Arena CI</button>
 <button class="mode-btn" data-mode="contam" role="tab" aria-selected="false" data-i18n="modes.contam">🧪 Contamination</button>
+<button class="mode-btn" data-mode="quant" role="tab" aria-selected="false" data-i18n="modes.quant">⚖️ Quant</button>
 </div>
 <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
 <strong>Quickest start</strong>: paste any HuggingFace model id (e.g. <code>meta-llama/Meta-Llama-3-8B</code>),

@@ -752,6 +757,37 @@
 <div id="contam-output" style="margin-top: 1em;"></div>
 </section>

+<!-- Quant-regime classifier (v0.7.3 anti-bullshit pack #5) -->
+<section id="quant-section" style="display:none;">
+<h2><span data-i18n="quant.title">⚖️ Quant-regime Classifier</span>
+<span class="info"><span class="tooltip" data-i18n="quant.tip">
+Predicts γ-shift (and downstream ΔPPL) for a given (model × quant scheme).
+Generic claims like "AWQ ~95% retention" are too vague — TAF uses
+d_head, GQA ratio, SWA flag, and model size to give an architecture-specific
+verdict. Solves: HF community widely reports unpredictable quant cliffs
+(NF4 -2 PPL on Phi-3 but fine on Llama-3-8B).
+</span></span>
+</h2>
+<p class="recipe-desc" data-i18n="quant.desc">
+<strong>Will quantizing your model break it?</strong> Paste an HF model id, pick a quant scheme — get predicted γ-shift, expected ΔPPL band, and a recommended alternative if it's a cliff. Browser-only, no GPU, no calibration set required.
+</p>
+<div class="form-row">
+<label for="quant-id" data-i18n="quant.id_label">HF model id:</label>
+<input type="text" id="quant-id" placeholder="e.g. meta-llama/Llama-3.2-1B" />
+<button type="button" id="quant-fetch-btn" data-i18n="quant.fetch_btn">📥 Fetch config</button>
+</div>
+<div class="form-row">
+<label for="quant-scheme" data-i18n="quant.scheme_label">Quant scheme:</label>
+<select id="quant-scheme">
+<option value="">— select scheme —</option>
+</select>
+<button type="button" id="quant-run-btn" data-i18n="quant.run_btn">⚖️ Predict</button>
+<button type="button" id="quant-all-btn" class="secondary" data-i18n="quant.all_btn">📊 Compare all schemes</button>
+</div>
+<p id="quant-status" class="recipe-desc" style="font-size:0.92em;"></p>
+<div id="quant-output" style="margin-top: 1em;"></div>
+</section>
+
 <!-- Recipe selector (mode=recipe) -->
 <section id="recipe-section" style="display:none;">
 <h2 data-i18n="recipe.title">📋 Recipe</h2>
@@ -302,6 +302,8 @@ export const TRANSLATIONS = {
|
|
| 302 |
"help.v07.arena.body": "Chatbot Arena strips confidence intervals from its public leaderboard — a 5-Elo gap can be statistically meaningless. Paste raw pairwise vote data (model_a, model_b, winner) → Bradley-Terry MLE + 200-iteration bootstrap → ranked Elos with 95% CIs and a \"statistical ties\" panel listing pairs whose CIs overlap. Try the Load sample button. <em>Use case</em>: before declaring \"model A beats model B\", verify their CIs don't overlap.",
|
| 303 |
"help.v07.contam.title": "🧪 Contamination Prior",
|
| 304 |
"help.v07.contam.body": "Bayesian-ish prior on whether a benchmark score is contaminated. Enter your model's training cutoff date → tool rates 20+ popular benchmarks (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) by P(contamination) based on time gap, corpus inclusion, and known leak history. Open LLM Leaderboard v1 was killed in 2024 after MMLU/HellaSwag scores became contaminated. <em>Use case</em>: decide which scores to trust when comparing two models.",
|
|
|
|
|
|
|
| 305 |
|
| 306 |
// v0.7 — Inventory modal 5th card
|
| 307 |
"inv.v07.title": "🆕 v0.7 anti-bullshit pack",
|
|
@@ -309,6 +311,56 @@ export const TRANSLATIONS = {
|
|
| 309 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — exact CLI flag so lm-eval doesn't silently halve your accuracy",
|
| 310 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — recover the confidence intervals Chatbot Arena hides",
|
| 311 |
"inv.v07.contam": "<strong>🧪 Contamination</strong> — rate 20+ benchmarks for contamination probability",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 312 |
"share.import_desc": "Got a JSON file from someone else's TAF analysis? Load it here to see the verdict + chain locally. Same view as if you'd run it yourself.",
|
| 313 |
"share.import_btn": "📂 Load shared JSON",
|
| 314 |
"synthesis.system": "You are a precise transformer LLM diagnostic assistant. Given pre-computed TAF formula results, write a clear plain-English summary in 4-6 sentences. Cite the section number (§X.Y) for each number you mention. Always give a concrete recommendation. Do NOT invent numbers.",
|
|
@@ -401,7 +453,7 @@ export const TRANSLATIONS = {
|
|
| 401 |
"common.no": "No",
|
| 402 |
|
| 403 |
// Mode tooltips
|
| 404 |
-
"modes.tip": "<strong>
|
| 405 |
"profile.tip": "<strong>One-click full diagnosis</strong>. Paste any HF model id (or pick preset). Tool runs all 5 recipes (long-context, KV-compression, custom-vs-API, budget, hardware) and produces a single <strong>TAF Card</strong> with verdict per dimension + key numbers + architecture classification.<br><br><strong>Use case</strong>: \"I'm evaluating Qwen2.5-32B for production — what's its full viability profile?\" → paste id → Profile → done.",
|
| 406 |
"compare.tip": "<strong>Same recipe, multiple models</strong>. Pick 2-3 candidate models and one recipe. See verdicts in a single comparison table.<br><br><strong>Use case</strong>: \"I need long-context retrieval at 16K — which is best: Llama-3-8B, Mistral-7B, or Qwen-7B?\" → pick 3 + X-2 + 16K → see winner.",
|
| 407 |
|
|
@@ -1035,6 +1087,8 @@ export const TRANSLATIONS = {
|
|
| 1035 |
"help.v07.arena.body": "Chatbot Arena oculta los intervalos de confianza en su leaderboard público — una diferencia de 5 Elo puede ser estadísticamente irrelevante. Pega datos crudos de votos pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap de 200 iteraciones → Elos ranked con CIs 95% y un panel de \"empates estadísticos\" listando pares cuyos CIs se solapan. Prueba el botón Cargar sample. <em>Caso de uso</em>: antes de afirmar \"modelo A vence a modelo B\", verifica que sus CIs no se solapen.",
|
| 1036 |
"help.v07.contam.title": "🧪 Prior de Contaminación",
|
| 1037 |
"help.v07.contam.body": "Prior bayesiano-ish sobre si un score de benchmark está contaminado. Introduce la fecha cutoff de entrenamiento de tu modelo → la herramienta puntúa 20+ benchmarks populares (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) por P(contaminación) según gap temporal, inclusión en corpus y historial de leaks conocidos. Open LLM Leaderboard v1 fue cancelado en 2024 tras la contaminación de MMLU/HellaSwag. <em>Caso de uso</em>: decide qué scores te puedes creer al comparar dos modelos.",
|
|
|
|
|
|
|
| 1038 |
|
| 1039 |
// v0.7 — Inventory modal 5ª card
|
| 1040 |
"inv.v07.title": "🆕 Pack anti-bullshit v0.7",
|
|
@@ -1042,6 +1096,56 @@ export const TRANSLATIONS = {
|
|
| 1042 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exacto para que lm-eval no divida tu accuracy entre 2 silenciosamente",
|
| 1043 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — recupera los intervalos de confianza que Chatbot Arena oculta",
|
| 1044 |
"inv.v07.contam": "<strong>🧪 Contaminación</strong> — puntúa 20+ benchmarks por probabilidad de contaminación",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1045 |
"share.import_desc": "¿Tienes un fichero JSON del análisis TAF de alguien? Cárgalo aquí para ver el veredicto + cadena localmente. La misma vista que si lo hubieras ejecutado tú.",
|
| 1046 |
"share.import_btn": "📂 Cargar JSON compartido",
|
| 1047 |
"synthesis.system": "Eres un asistente de diagnóstico preciso para LLMs transformer. Dados resultados de fórmulas TAF pre-calculados, escribe un resumen claro en español de 4-6 frases. Cita el número de sección (§X.Y) para cada número que menciones. Da siempre una recomendación concreta. NO inventes números.",
|
|
@@ -1134,7 +1238,7 @@ export const TRANSLATIONS = {
|
|
| 1134 |
"common.no": "No",
|
| 1135 |
|
| 1136 |
// Tooltips de modos
|
| 1137 |
-
"modes.tip": "<strong>
|
| 1138 |
"profile.tip": "<strong>Diagnóstico completo en un click</strong>. Pega cualquier id de modelo HF (o elige preset). La herramienta ejecuta las 5 recetas (contexto largo, compresión KV, custom vs API, presupuesto, hardware) y produce una única <strong>TAF Card</strong> con veredicto por dimensión + números clave + clasificación arquitectónica.<br><br><strong>Caso de uso</strong>: \"Estoy evaluando Qwen2.5-32B para producción — ¿cuál es su perfil completo de viabilidad?\" → pega id → Perfilar → listo.",
|
| 1139 |
"compare.tip": "<strong>Misma receta, múltiples modelos</strong>. Elige 2-3 modelos candidatos y una receta. Ve los veredictos en una única tabla comparativa.<br><br><strong>Caso de uso</strong>: \"Necesito recuperación de contexto largo a 16K — ¿cuál es mejor: Llama-3-8B, Mistral-7B o Qwen-7B?\" → elige 3 + X-2 + 16K → ve el ganador.",
|
| 1140 |
|
|
@@ -1632,6 +1736,8 @@ export const TRANSLATIONS = {
|
|
| 1632 |
"help.v07.arena.body": "Chatbot Arena masque les intervalles de confiance de son leaderboard public — un écart de 5 Elo peut être statistiquement insignifiant. Collez des données brutes de votes pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap 200 itérations → Elos classés avec CIs 95% et un panneau \"égalités statistiques\" listant les paires dont les CIs se chevauchent. Essayez le bouton Charger échantillon. <em>Cas d'usage</em> : avant de déclarer \"modèle A bat modèle B\", vérifiez que leurs CIs ne se chevauchent pas.",
|
| 1633 |
"help.v07.contam.title": "🧪 Prior de Contamination",
|
| 1634 |
"help.v07.contam.body": "Prior bayésien-ish sur la contamination d'un score de benchmark. Saisissez la date de cutoff d'entraînement de votre modèle → l'outil note 20+ benchmarks populaires (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) par P(contamination) selon l'écart temporel, l'inclusion dans corpus et l'historique de leaks connus. Open LLM Leaderboard v1 a été tué en 2024 après la contamination de MMLU/HellaSwag. <em>Cas d'usage</em> : décidez quels scores croire en comparant deux modèles.",
|
|
|
|
|
|
|
| 1635 |
|
| 1636 |
// v0.7 — Inventory modal 5ème card
|
| 1637 |
"inv.v07.title": "🆕 Pack anti-bullshit v0.7",
|
|
@@ -1639,6 +1745,56 @@ export const TRANSLATIONS = {
|
|
| 1639 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exact pour que lm-eval ne divise pas votre accuracy par 2 en silence",
|
| 1640 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — récupère les intervalles de confiance que Chatbot Arena cache",
|
| 1641 |
"inv.v07.contam": "<strong>🧪 Contamination</strong> — note 20+ benchmarks par probabilité de contamination",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1642 |
"share.import_desc": "Vous avez un fichier JSON de l'analyse TAF de quelqu'un ? Chargez-le ici pour voir le verdict + la chaîne localement. La même vue que si vous l'aviez exécuté vous-même.",
|
| 1643 |
"share.import_btn": "📂 Charger JSON partagé",
|
| 1644 |
"synthesis.system": "Vous êtes un assistant de diagnostic précis pour LLMs transformer. Étant donné des résultats de formules TAF pré-calculés, écrivez un résumé clair en français de 4-6 phrases. Citez le numéro de section (§X.Y) pour chaque nombre mentionné. Donnez toujours une recommandation concrète. N'INVENTEZ PAS de nombres.",
|
|
@@ -1731,7 +1887,7 @@ export const TRANSLATIONS = {
|
|
| 1731 |
"common.no": "Non",
|
| 1732 |
|
| 1733 |
// Tooltips des modes
|
| 1734 |
-
"modes.tip": "<strong>
|
| 1735 |
"profile.tip": "<strong>Diagnostic complet en un clic</strong>. Collez n'importe quel id de modèle HF (ou choisissez préréglage). L'outil exécute les 5 recettes (contexte long, compression KV, custom vs API, budget, hardware) et produit une <strong>TAF Card</strong> unique avec verdict par dimension + nombres clés + classification architecturale.<br><br><strong>Cas d'usage</strong>: « J'évalue Qwen2.5-32B pour la production — quel est son profil complet de viabilité ? » → collez id → Profiler → fait.",
|
| 1736 |
"compare.tip": "<strong>Même recette, plusieurs modèles</strong>. Choisissez 2-3 modèles candidats et une recette. Voyez les verdicts dans un seul tableau comparatif.<br><br><strong>Cas d'usage</strong>: « J'ai besoin de récupération longue contexte à 16K — quel est le meilleur : Llama-3-8B, Mistral-7B ou Qwen-7B ? » → choisissez 3 + X-2 + 16K → voyez le gagnant.",
|
| 1737 |
|
|
@@ -2229,6 +2385,8 @@ export const TRANSLATIONS = {
|
|
| 2229 |
"help.v07.arena.body": "Chatbot Arena 在公开排行榜中删除了置信区间 — 5 Elo 的差距在统计上可能毫无意义。粘贴原始 pairwise 投票数据(model_a, model_b, winner)→ Bradley-Terry MLE + 200 次 bootstrap → 排序 Elo + 95% CI + \"统计并列\" 面板,列出 CI 重叠的配对。尝试加载样本按钮。<em>用例</em>:宣称 \"模型 A 胜过模型 B\" 之前,验证它们的 CI 不重叠。",
|
| 2230 |
"help.v07.contam.title": "🧪 污染先验",
|
| 2231 |
"help.v07.contam.body": "对 benchmark 分数是否被污染做贝叶斯式的先验估计。输入模型训练 cutoff 日期 → 工具按 P(污染) 评估 20+ 主流 benchmark(MMLU、HellaSwag、GSM8K、HumanEval、IFEval、MMLU-Pro、GPQA、AIME、MATH-500、BBH、MUSR…),基于时间差距、语料库纳入和已知泄漏历史。Open LLM Leaderboard v1 在 2024 年因 MMLU/HellaSwag 分数被污染而停用。<em>用例</em>:比较两个模型时决定相信哪些分数。",
|
|
|
|
|
|
|
| 2232 |
|
| 2233 |
// v0.7 — Inventory 模态第 5 卡
|
| 2234 |
"inv.v07.title": "🆕 v0.7 anti-bullshit 套件",
|
|
@@ -2236,6 +2394,56 @@ export const TRANSLATIONS = {
|
|
| 2236 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — 精确 CLI flag,让 lm-eval 不会静默对半你的 accuracy",
|
| 2237 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — 恢复 Chatbot Arena 隐藏的置信区间",
|
| 2238 |
"inv.v07.contam": "<strong>🧪 污染</strong> — 按污染概率对 20+ benchmark 评级",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2239 |
"share.import_desc": "有他人 TAF 分析的 JSON 文件? 在这里加载以本地查看判定 + 链。与您自己运行的视图相同。",
|
| 2240 |
"share.import_btn": "📂 加载共享的 JSON",
|
| 2241 |
"synthesis.system": "您是 transformer LLM 的精确诊断助手。给定预先计算的 TAF 公式结果,用 4-6 句中文写出清晰的摘要。为每个提到的数字引用章节号 (§X.Y)。始终给出具体建议。不要编造数字。",
|
|
@@ -2328,7 +2536,7 @@ export const TRANSLATIONS = {
|
|
| 2328 |
"common.no": "否",
|
| 2329 |
|
| 2330 |
// 模式提示
|
| 2331 |
-
"modes.tip": "<strong>十
|
| 2332 |
"profile.tip": "<strong>一键完整诊断</strong>。粘贴任意 HF 模型 id (或选择预设)。工具运行所有 5 个配方 (长上下文、KV 压缩、自定义 vs API、预算、硬件),生成单个 <strong>TAF 卡</strong>,显示每个维度的判定 + 关键数字 + 架构分类。<br><br><strong>用例</strong>: \"我正在为生产评估 Qwen2.5-32B — 它的完整可行性概况是什么?\" → 粘贴 id → 画像 → 完成。",
|
| 2333 |
"compare.tip": "<strong>同一配方,多个模型</strong>。选择 2-3 个候选模型和一个配方。在单个比较表中查看判定。<br><br><strong>用例</strong>: \"我需要在 16K 进行长上下文检索 — 哪个最好: Llama-3-8B、Mistral-7B 或 Qwen-7B?\" → 选择 3 个 + X-2 + 16K → 看赢家。",
|
| 2334 |
|
|
|
|
| 302 |
"help.v07.arena.body": "Chatbot Arena strips confidence intervals from its public leaderboard — a 5-Elo gap can be statistically meaningless. Paste raw pairwise vote data (model_a, model_b, winner) → Bradley-Terry MLE + 200-iteration bootstrap → ranked Elos with 95% CIs and a \"statistical ties\" panel listing pairs whose CIs overlap. Try the Load sample button. <em>Use case</em>: before declaring \"model A beats model B\", verify their CIs don't overlap.",
|
| 303 |
"help.v07.contam.title": "🧪 Contamination Prior",
|
| 304 |
"help.v07.contam.body": "Bayesian-ish prior on whether a benchmark score is contaminated. Enter your model's training cutoff date → tool rates 20+ popular benchmarks (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) by P(contamination) based on time gap, corpus inclusion, and known leak history. Open LLM Leaderboard v1 was killed in 2024 after MMLU/HellaSwag scores became contaminated. <em>Use case</em>: decide which scores to trust when comparing two models.",
|
| 305 |
+
"help.v07.quant.title": "⚖️ Quant-regime Classifier",
|
| 306 |
+
"help.v07.quant.body": "Predicts γ-shift and ΔPPL for any (model × quant scheme: NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8, …). Architecture-aware: small d_head + aggressive GQA → more sensitive; calibrated schemes (AWQ) absorb shift better than uncalibrated (NF4). Recommends safer alternatives if a cliff is detected. <em>Use case</em>: before quantizing, predict whether your specific architecture × scheme combo will keep PPL acceptable, with a concrete switch-to suggestion otherwise.",
|
| 307 |
|
| 308 |
// v0.7 — Inventory modal 5th card
|
| 309 |
"inv.v07.title": "🆕 v0.7 anti-bullshit pack",
|
|
|
|
| 311 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — exact CLI flag so lm-eval doesn't silently halve your accuracy",
|
| 312 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — recover the confidence intervals Chatbot Arena hides",
|
| 313 |
"inv.v07.contam": "<strong>🧪 Contamination</strong> — rate 20+ benchmarks for contamination probability",
|
| 314 |
+
"inv.v07.quant": "<strong>⚖️ Quant</strong> — predict γ shift + ΔPPL for any (model × quant scheme) combo",
|
| 315 |
+
|
| 316 |
+
// v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
|
| 317 |
+
"modes.quant": "⚖️ Quant",
|
| 318 |
+
"mode_desc.quant": "Predicts γ-shift and ΔPPL for any (model × quant scheme). Architecture-aware: small d_head + GQA → more sensitive. Recommends safer alternatives if a cliff is detected.",
|
| 319 |
+
"quant.title": "⚖️ Quant-regime Classifier",
|
| 320 |
+
"quant.tip": "Predicts γ-shift (and downstream ΔPPL) for a given (model × quant scheme). Generic claims like 'AWQ ~95% retention' are too vague — TAF uses d_head, GQA ratio, SWA flag, and model size to give an architecture-specific verdict. Solves: HF community widely reports unpredictable quant cliffs (NF4 -2 PPL on Phi-3 but fine on Llama-3-8B).",
|
| 321 |
+
"quant.desc": "<strong>Will quantizing your model break it?</strong> Paste an HF model id, pick a quant scheme — get predicted γ-shift, expected ΔPPL band, and a recommended alternative if it's a cliff. Browser-only, no GPU, no calibration set required.",
|
| 322 |
+
"quant.id_label": "HF model id:",
|
| 323 |
+
"quant.fetch_btn": "📥 Fetch config",
|
| 324 |
+
"quant.scheme_label": "Quant scheme:",
|
| 325 |
+
"quant.run_btn": "⚖️ Predict",
|
| 326 |
+
"quant.all_btn": "📊 Compare all schemes",
|
| 327 |
+
"quant.regime.safe": "✅ SAFE",
|
| 328 |
+
"quant.regime.mild": "✅ MILD COMPRESSION",
|
| 329 |
+
"quant.regime.significant": "⚠ SIGNIFICANT DEGRADATION",
|
| 330 |
+
"quant.regime.cliff": "❌ HEAVY CLIFF",
|
| 331 |
+
"quant.label.gamma_shift": "γ shift",
|
| 332 |
+
"quant.label.delta_ppl": "ΔPPL (est.)",
|
| 333 |
+
"quant.label.arch_mult": "Arch multiplier",
|
| 334 |
+
"quant.section.breakdown": "Breakdown",
|
| 335 |
+
"quant.section.reco": "Recommendation",
|
| 336 |
+
"quant.section.compare": "All schemes (sorted by safety)",
|
| 337 |
+
"quant.field.scheme": "Scheme",
|
| 338 |
+
"quant.field.calibrated": "calibrated",
|
| 339 |
+
"quant.field.uncalibrated": "uncalibrated",
|
| 340 |
+
"quant.field.base_penalty": "Base penalty",
|
| 341 |
+
"quant.field.arch_mult_full": "Architecture multiplier",
|
| 342 |
+
"quant.field.gamma_shift": "Predicted γ shift",
|
| 343 |
+
"quant.field.ppl_band": "ΔPPL band (est.)",
|
| 344 |
+
"quant.field.params": "Parameters",
|
| 345 |
+
"quant.col.scheme": "Scheme",
|
| 346 |
+
"quant.col.bits": "Bits",
|
| 347 |
+
"quant.col.gamma_shift": "γ shift",
|
| 348 |
+
"quant.col.ppl_band": "ΔPPL band",
|
| 349 |
+
"quant.col.regime": "Regime",
|
| 350 |
+
"quant.reco.switch_to_awq": "<strong>Switch to {scheme}</strong> — calibrated 4-bit handles small d_head + GQA much better than NF4. Expected ΔPPL drops ~2-3×.",
|
| 351 |
+
"quant.reco.switch_to_q5_km": "<strong>Switch to {scheme}</strong> — Q5 keeps more head dimensions intact at low cost (only ~25% bigger file).",
|
| 352 |
+
"quant.reco.switch_to_q4_km": "<strong>Switch to {scheme}</strong> — Q3/Q2 are too aggressive for this architecture.",
|
| 353 |
+
"quant.reco.consider_awq": "<strong>Consider {scheme}</strong> — calibration meaningfully reduces γ-shift on this architecture.",
|
| 354 |
+
"quant.reco.use_higher_bits": "<strong>Use higher-bit alternative</strong> — this architecture cannot absorb 4-bit cleanly. Try 5- or 8-bit.",
|
| 355 |
+
"quant.reco.verify_with_eval": "<strong>Verify with a real eval</strong> — predicted shift is borderline. Run NIAH at your target context before deploying.",
|
| 356 |
+
"quant.reco.no_action": "No action needed — quantization is safe for this architecture.",
|
| 357 |
+
"quant.summary.headline_all": "All schemes for <code>{modelId}</code>",
|
| 358 |
+
"quant.status.empty_id": "⚠ Enter a model id (e.g. meta-llama/Llama-3.2-1B).",
|
| 359 |
+
"quant.status.fetching": "⏳ Fetching config.json for {modelId}...",
|
| 360 |
+
"quant.status.fetched": "✅ Config fetched for {modelId}. Pick a scheme and click Predict (or Compare all schemes).",
|
| 361 |
+
"quant.status.no_scheme": "⚠ Pick a quant scheme from the dropdown.",
|
| 362 |
+
"quant.status.done": "✅ Predicted regime: {regime}",
|
| 363 |
+
"quant.status.done_all": "✅ Compared {n} schemes — sorted by safety.",
|
| 364 |
"share.import_desc": "Got a JSON file from someone else's TAF analysis? Load it here to see the verdict + chain locally. Same view as if you'd run it yourself.",
|
| 365 |
"share.import_btn": "📂 Load shared JSON",
|
| 366 |
"synthesis.system": "You are a precise transformer LLM diagnostic assistant. Given pre-computed TAF formula results, write a clear plain-English summary in 4-6 sentences. Cite the section number (§X.Y) for each number you mention. Always give a concrete recommendation. Do NOT invent numbers.",
|
|
|
|
| 453 |
"common.no": "No",
|
| 454 |
|
| 455 |
// Mode tooltips
|
| 456 |
+
"modes.tip": "<strong>Twelve ways to use the tool</strong>.<br><strong>📇 Profile</strong>: paste a model id → 5-recipe TAF Card.<br><strong>🆚 Compare</strong>: 2-3 models side-by-side on one recipe.<br><strong>🔍 Inspect config</strong>: paste raw config.json → full Profile.<br><strong>💬 Ask</strong>: free-form question, browser LLM picks the recipe.<br><strong>📋 Recipe</strong>: manual selection with full form control.<br><strong>🩺 Diagnose CLI</strong>: generate Python command for local γ measurement.<br><strong>📊 Phase diagram</strong>: 23-model panel on (log θ, γ) plane.<br><strong>🪟 Unmask</strong>: detect misleading max_position_embeddings (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: detect family + give exact CLI flag for lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruct confidence intervals from raw pairwise vote data; detect statistical ties Arena hides.<br><strong>🧪 Contamination</strong>: rate 20+ benchmarks for contamination probability based on training cutoff vs release date.<br><strong>⚖️ Quant</strong>: predict γ-shift and ΔPPL for any (model × quant scheme); recommend safer alternative on cliff.",
|
| 457 |
"profile.tip": "<strong>One-click full diagnosis</strong>. Paste any HF model id (or pick preset). Tool runs all 5 recipes (long-context, KV-compression, custom-vs-API, budget, hardware) and produces a single <strong>TAF Card</strong> with verdict per dimension + key numbers + architecture classification.<br><br><strong>Use case</strong>: \"I'm evaluating Qwen2.5-32B for production — what's its full viability profile?\" → paste id → Profile → done.",
|
| 458 |
"compare.tip": "<strong>Same recipe, multiple models</strong>. Pick 2-3 candidate models and one recipe. See verdicts in a single comparison table.<br><br><strong>Use case</strong>: \"I need long-context retrieval at 16K — which is best: Llama-3-8B, Mistral-7B, or Qwen-7B?\" → pick 3 + X-2 + 16K → see winner.",
|
| 459 |
|
|
|
|
| 1087 |
"help.v07.arena.body": "Chatbot Arena oculta los intervalos de confianza en su leaderboard público — una diferencia de 5 Elo puede ser estadísticamente irrelevante. Pega datos crudos de votos pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap de 200 iteraciones → Elos ranked con CIs 95% y un panel de \"empates estadísticos\" listando pares cuyos CIs se solapan. Prueba el botón Cargar sample. <em>Caso de uso</em>: antes de afirmar \"modelo A vence a modelo B\", verifica que sus CIs no se solapen.",
|
| 1088 |
"help.v07.contam.title": "🧪 Prior de Contaminación",
|
| 1089 |
"help.v07.contam.body": "Prior bayesiano-ish sobre si un score de benchmark está contaminado. Introduce la fecha cutoff de entrenamiento de tu modelo → la herramienta puntúa 20+ benchmarks populares (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) por P(contaminación) según gap temporal, inclusión en corpus y historial de leaks conocidos. Open LLM Leaderboard v1 fue cancelado en 2024 tras la contaminación de MMLU/HellaSwag. <em>Caso de uso</em>: decide qué scores te puedes creer al comparar dos modelos.",
|
| 1090 |
+
"help.v07.quant.title": "⚖️ Clasificador de régimen de cuantización",
|
| 1091 |
+
"help.v07.quant.body": "Predice γ-shift y ΔPPL para cualquier (modelo × esquema de cuantización: NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8…). Arch-aware: d_head pequeño + GQA agresivo → más sensible; los esquemas calibrados (AWQ) absorben mejor el shift que los no calibrados (NF4). Recomienda alternativas más seguras si detecta cliff. <em>Caso de uso</em>: antes de cuantizar, predice si tu combo arquitectura × esquema mantendrá la PPL aceptable, con sugerencia concreta de switch si no.",
|
| 1092 |
|
| 1093 |
// v0.7 — Inventory modal 5ª card
|
| 1094 |
"inv.v07.title": "🆕 Pack anti-bullshit v0.7",
|
|
|
|
| 1096 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exacto para que lm-eval no divida tu accuracy entre 2 silenciosamente",
|
| 1097 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — recupera los intervalos de confianza que Chatbot Arena oculta",
|
| 1098 |
"inv.v07.contam": "<strong>🧪 Contaminación</strong> — puntúa 20+ benchmarks por probabilidad de contaminación",
|
| 1099 |
+
"inv.v07.quant": "<strong>⚖️ Quant</strong> — predice γ-shift + ΔPPL para cualquier combo (modelo × esquema de cuantización)",
|
| 1100 |
+
|
| 1101 |
+
// v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
|
| 1102 |
+
"modes.quant": "⚖️ Quant",
|
| 1103 |
+
"mode_desc.quant": "Predice γ-shift y ΔPPL para cualquier (modelo × esquema de cuantización). Arch-aware: d_head pequeño + GQA → más sensible. Recomienda alternativas más seguras si detecta cliff.",
|
| 1104 |
+
"quant.title": "⚖️ Clasificador de régimen de cuantización",
|
| 1105 |
+
"quant.tip": "Predice γ-shift (y la ΔPPL resultante) para un par (modelo × esquema). Claims genéricos como 'AWQ ~95% retención' son demasiado vagos — TAF usa d_head, ratio GQA, flag SWA y tamaño del modelo para dar veredicto arquitectura-específico. Resuelve: la comunidad HF reporta cliffs de cuantización impredecibles (NF4 -2 PPL en Phi-3 pero bien en Llama-3-8B).",
|
| 1106 |
+
"quant.desc": "<strong>¿Cuantizar romperá tu modelo?</strong> Pega un id HF, elige esquema de cuantización — obtén γ-shift predicho, banda ΔPPL esperada y alternativa recomendada si es un cliff. Solo navegador, sin GPU, sin set de calibración.",
|
| 1107 |
+
"quant.id_label": "ID modelo HF:",
|
| 1108 |
+
"quant.fetch_btn": "📥 Fetch config",
|
| 1109 |
+
"quant.scheme_label": "Esquema cuant:",
|
| 1110 |
+
"quant.run_btn": "⚖️ Predecir",
|
| 1111 |
+
"quant.all_btn": "📊 Comparar todos los esquemas",
|
| 1112 |
+
"quant.regime.safe": "✅ SEGURO",
|
| 1113 |
+
"quant.regime.mild": "✅ COMPRESIÓN LEVE",
|
| 1114 |
+
"quant.regime.significant": "⚠ DEGRADACIÓN SIGNIFICATIVA",
|
| 1115 |
+
"quant.regime.cliff": "❌ CLIFF FUERTE",
|
| 1116 |
+
"quant.label.gamma_shift": "γ shift",
|
| 1117 |
+
"quant.label.delta_ppl": "ΔPPL (est.)",
|
| 1118 |
+
"quant.label.arch_mult": "Multiplicador arch",
|
| 1119 |
+
"quant.section.breakdown": "Desglose",
|
| 1120 |
+
"quant.section.reco": "Recomendación",
|
| 1121 |
+
"quant.section.compare": "Todos los esquemas (ordenados por seguridad)",
|
| 1122 |
+
"quant.field.scheme": "Esquema",
|
| 1123 |
+
"quant.field.calibrated": "calibrado",
|
| 1124 |
+
"quant.field.uncalibrated": "no calibrado",
|
| 1125 |
+
"quant.field.base_penalty": "Penalización base",
|
| 1126 |
+
"quant.field.arch_mult_full": "Multiplicador arquitectónico",
|
| 1127 |
+
"quant.field.gamma_shift": "γ shift predicho",
|
| 1128 |
+
"quant.field.ppl_band": "Banda ΔPPL (est.)",
|
| 1129 |
+
"quant.field.params": "Parámetros",
|
| 1130 |
+
"quant.col.scheme": "Esquema",
|
| 1131 |
+
"quant.col.bits": "Bits",
|
| 1132 |
+
"quant.col.gamma_shift": "γ shift",
|
| 1133 |
+
"quant.col.ppl_band": "Banda ΔPPL",
|
| 1134 |
+
"quant.col.regime": "Régimen",
|
| 1135 |
+
"quant.reco.switch_to_awq": "<strong>Cambia a {scheme}</strong> — el 4-bit calibrado maneja d_head pequeño + GQA mucho mejor que NF4. ΔPPL esperada cae ~2-3×.",
|
| 1136 |
+
"quant.reco.switch_to_q5_km": "<strong>Cambia a {scheme}</strong> — Q5 mantiene más dimensiones de head intactas a bajo coste (solo ~25% más grande).",
|
| 1137 |
+
"quant.reco.switch_to_q4_km": "<strong>Cambia a {scheme}</strong> — Q3/Q2 son demasiado agresivos para esta arquitectura.",
|
| 1138 |
+
"quant.reco.consider_awq": "<strong>Considera {scheme}</strong> — la calibración reduce γ-shift significativamente en esta arquitectura.",
|
| 1139 |
+
"quant.reco.use_higher_bits": "<strong>Usa alternativa de mayor bit</strong> — esta arquitectura no absorbe 4-bit limpiamente. Prueba 5 u 8-bit.",
|
| 1140 |
+
"quant.reco.verify_with_eval": "<strong>Verifica con eval real</strong> — el shift predicho está en el límite. Corre NIAH a tu contexto objetivo antes de desplegar.",
|
| 1141 |
+
"quant.reco.no_action": "No requiere acción — la cuantización es segura para esta arquitectura.",
|
| 1142 |
+
"quant.summary.headline_all": "Todos los esquemas para <code>{modelId}</code>",
|
| 1143 |
+
"quant.status.empty_id": "⚠ Introduce un model id (ej. meta-llama/Llama-3.2-1B).",
|
| 1144 |
+
"quant.status.fetching": "⏳ Obteniendo config.json para {modelId}...",
|
| 1145 |
+
"quant.status.fetched": "✅ Config obtenido para {modelId}. Elige un esquema y click Predecir (o Comparar todos).",
|
| 1146 |
+
"quant.status.no_scheme": "⚠ Elige un esquema de cuantización del dropdown.",
|
| 1147 |
+
"quant.status.done": "✅ Régimen predicho: {regime}",
|
| 1148 |
+
"quant.status.done_all": "✅ Comparados {n} esquemas — ordenados por seguridad.",
|
| 1149 |
"share.import_desc": "¿Tienes un fichero JSON del análisis TAF de alguien? Cárgalo aquí para ver el veredicto + cadena localmente. La misma vista que si lo hubieras ejecutado tú.",
|
| 1150 |
"share.import_btn": "📂 Cargar JSON compartido",
|
| 1151 |
"synthesis.system": "Eres un asistente de diagnóstico preciso para LLMs transformer. Dados resultados de fórmulas TAF pre-calculados, escribe un resumen claro en español de 4-6 frases. Cita el número de sección (§X.Y) para cada número que menciones. Da siempre una recomendación concreta. NO inventes números.",
|
|
|
|
| 1238 |
"common.no": "No",
|
| 1239 |
|
| 1240 |
// Tooltips de modos
|
| 1241 |
+
"modes.tip": "<strong>Doce formas de usar la herramienta</strong>.<br><strong>📇 Perfil</strong>: pega un id → TAF Card de 5 recetas.<br><strong>🆚 Comparar</strong>: 2-3 modelos lado a lado en una receta.<br><strong>🔍 Inspeccionar config</strong>: pega config.json crudo → Perfil completo.<br><strong>💬 Pregunta</strong>: pregunta libre, el LLM del navegador elige la receta.<br><strong>📋 Receta</strong>: selección manual con control total del formulario.<br><strong>🩺 Diagnóstico CLI</strong>: genera comando Python para medir γ localmente.<br><strong>📊 Diagrama de fase</strong>: panel de 23 modelos en plano (log θ, γ).<br><strong>🪟 Desenmascarar</strong>: detecta max_position_embeddings engañoso (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: detecta familia + da el flag CLI exacto para lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruye intervalos de confianza desde votos pairwise crudos; detecta empates estadísticos que Arena oculta.<br><strong>🧪 Contaminación</strong>: puntúa 20+ benchmarks por probabilidad de contaminación según cutoff de entrenamiento vs fecha de release.<br><strong>⚖️ Quant</strong>: predice γ-shift y ΔPPL para cualquier (modelo × esquema de cuantización); recomienda alternativa segura si hay cliff.",
|
| 1242 |
"profile.tip": "<strong>Diagnóstico completo en un click</strong>. Pega cualquier id de modelo HF (o elige preset). La herramienta ejecuta las 5 recetas (contexto largo, compresión KV, custom vs API, presupuesto, hardware) y produce una única <strong>TAF Card</strong> con veredicto por dimensión + números clave + clasificación arquitectónica.<br><br><strong>Caso de uso</strong>: \"Estoy evaluando Qwen2.5-32B para producción — ¿cuál es su perfil completo de viabilidad?\" → pega id → Perfilar → listo.",
|
| 1243 |
"compare.tip": "<strong>Misma receta, múltiples modelos</strong>. Elige 2-3 modelos candidatos y una receta. Ve los veredictos en una única tabla comparativa.<br><br><strong>Caso de uso</strong>: \"Necesito recuperación de contexto largo a 16K — ¿cuál es mejor: Llama-3-8B, Mistral-7B o Qwen-7B?\" → elige 3 + X-2 + 16K → ve el ganador.",
|
| 1244 |
|
|
|
|
| 1736 |
"help.v07.arena.body": "Chatbot Arena masque les intervalles de confiance de son leaderboard public — un écart de 5 Elo peut être statistiquement insignifiant. Collez des données brutes de votes pairwise (model_a, model_b, winner) → MLE Bradley-Terry + bootstrap 200 itérations → Elos classés avec CIs 95% et un panneau \"égalités statistiques\" listant les paires dont les CIs se chevauchent. Essayez le bouton Charger échantillon. <em>Cas d'usage</em> : avant de déclarer \"modèle A bat modèle B\", vérifiez que leurs CIs ne se chevauchent pas.",
|
| 1737 |
"help.v07.contam.title": "🧪 Prior de Contamination",
|
| 1738 |
"help.v07.contam.body": "Prior bayésien-ish sur la contamination d'un score de benchmark. Saisissez la date de cutoff d'entraînement de votre modèle → l'outil note 20+ benchmarks populaires (MMLU, HellaSwag, GSM8K, HumanEval, IFEval, MMLU-Pro, GPQA, AIME, MATH-500, BBH, MUSR…) par P(contamination) selon l'écart temporel, l'inclusion dans corpus et l'historique de leaks connus. Open LLM Leaderboard v1 a été tué en 2024 après la contamination de MMLU/HellaSwag. <em>Cas d'usage</em> : décidez quels scores croire en comparant deux modèles.",
|
| 1739 |
+
"help.v07.quant.title": "⚖️ Classificateur de régime de quantification",
|
| 1740 |
+
"help.v07.quant.body": "Prédit le γ-shift et ΔPPL pour tout (modèle × schéma de quantification : NF4, AWQ, GPTQ, GGUF Q4_K_M / Q5_K_M / Q8_0, int8, FP8…). Arch-aware : petit d_head + GQA agressif → plus sensible ; les schémas calibrés (AWQ) absorbent mieux le shift que les non calibrés (NF4). Recommande des alternatives plus sûres si un cliff est détecté. <em>Cas d'usage</em> : avant de quantifier, prédisez si votre combo architecture × schéma maintiendra la PPL acceptable, avec une suggestion concrète de switch sinon.",
|
| 1741 |
|
| 1742 |
// v0.7 — Inventory modal 5ème card
|
| 1743 |
"inv.v07.title": "🆕 Pack anti-bullshit v0.7",
|
|
|
|
| 1745 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — flag CLI exact pour que lm-eval ne divise pas votre accuracy par 2 en silence",
|
| 1746 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — récupère les intervalles de confiance que Chatbot Arena cache",
|
| 1747 |
"inv.v07.contam": "<strong>🧪 Contamination</strong> — note 20+ benchmarks par probabilité de contamination",
|
| 1748 |
+
"inv.v07.quant": "<strong>⚖️ Quant</strong> — prédit le γ-shift + ΔPPL pour tout combo (modèle × schéma de quantification)",
|
| 1749 |
+
|
| 1750 |
+
// v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
|
| 1751 |
+
"modes.quant": "⚖️ Quant",
|
| 1752 |
+
"mode_desc.quant": "Prédit le γ-shift et ΔPPL pour tout (modèle × schéma de quantification). Arch-aware : petit d_head + GQA → plus sensible. Recommande des alternatives plus sûres si un cliff est détecté.",
|
| 1753 |
+
"quant.title": "⚖️ Classificateur de régime de quantification",
|
| 1754 |
+
"quant.tip": "Prédit le γ-shift (et la ΔPPL résultante) pour une paire (modèle × schéma). Les claims génériques comme 'AWQ ~95% retention' sont trop vagues — TAF utilise d_head, ratio GQA, flag SWA et taille du modèle pour donner un verdict arch-spécifique. Résout : la communauté HF rapporte des cliffs de quantification imprédictibles (NF4 -2 PPL sur Phi-3 mais OK sur Llama-3-8B).",
|
| 1755 |
+
"quant.desc": "<strong>La quantification cassera-t-elle votre modèle ?</strong> Collez un id HF, choisissez un schéma — obtenez le γ-shift prédit, la bande ΔPPL attendue et une alternative recommandée si c'est un cliff. Navigateur uniquement, sans GPU, sans set de calibration.",
|
| 1756 |
+
"quant.id_label": "ID modèle HF :",
|
| 1757 |
+
"quant.fetch_btn": "📥 Récupérer config",
|
| 1758 |
+
"quant.scheme_label": "Schéma quant :",
|
| 1759 |
+
"quant.run_btn": "⚖️ Prédire",
|
| 1760 |
+
"quant.all_btn": "📊 Comparer tous les schémas",
|
| 1761 |
+
"quant.regime.safe": "✅ SÛR",
|
| 1762 |
+
"quant.regime.mild": "✅ COMPRESSION LÉGÈRE",
|
| 1763 |
+
"quant.regime.significant": "⚠ DÉGRADATION SIGNIFICATIVE",
|
| 1764 |
+
"quant.regime.cliff": "❌ CLIFF SÉVÈRE",
|
| 1765 |
+
"quant.label.gamma_shift": "γ shift",
|
| 1766 |
+
"quant.label.delta_ppl": "ΔPPL (est.)",
|
| 1767 |
+
"quant.label.arch_mult": "Multiplicateur arch",
|
| 1768 |
+
"quant.section.breakdown": "Détail",
|
| 1769 |
+
"quant.section.reco": "Recommandation",
|
| 1770 |
+
"quant.section.compare": "Tous les schémas (triés par sécurité)",
|
| 1771 |
+
"quant.field.scheme": "Schéma",
|
| 1772 |
+
"quant.field.calibrated": "calibré",
|
| 1773 |
+
"quant.field.uncalibrated": "non calibré",
|
| 1774 |
+
"quant.field.base_penalty": "Pénalité de base",
|
| 1775 |
+
"quant.field.arch_mult_full": "Multiplicateur architectural",
|
| 1776 |
+
"quant.field.gamma_shift": "γ shift prédit",
|
| 1777 |
+
"quant.field.ppl_band": "Bande ΔPPL (est.)",
|
| 1778 |
+
"quant.field.params": "Paramètres",
|
| 1779 |
+
"quant.col.scheme": "Schéma",
|
| 1780 |
+
"quant.col.bits": "Bits",
|
| 1781 |
+
"quant.col.gamma_shift": "γ shift",
|
| 1782 |
+
"quant.col.ppl_band": "Bande ΔPPL",
|
| 1783 |
+
"quant.col.regime": "Régime",
|
| 1784 |
+
"quant.reco.switch_to_awq": "<strong>Passez à {scheme}</strong> — le 4-bit calibré gère bien mieux les petits d_head + GQA que NF4. ΔPPL attendue chute ~2-3×.",
|
| 1785 |
+
"quant.reco.switch_to_q5_km": "<strong>Passez à {scheme}</strong> — Q5 garde plus de dimensions de head intactes à faible coût (~25% plus grand seulement).",
|
| 1786 |
+
"quant.reco.switch_to_q4_km": "<strong>Passez à {scheme}</strong> — Q3/Q2 sont trop agressifs pour cette architecture.",
|
| 1787 |
+
"quant.reco.consider_awq": "<strong>Considérez {scheme}</strong> — la calibration réduit significativement le γ-shift sur cette architecture.",
|
| 1788 |
+
"quant.reco.use_higher_bits": "<strong>Utilisez une alternative à plus de bits</strong> — cette architecture n'absorbe pas le 4-bit proprement. Essayez 5 ou 8-bit.",
|
| 1789 |
+
"quant.reco.verify_with_eval": "<strong>Vérifiez avec une vraie éval</strong> — le shift prédit est borderline. Lancez NIAH à votre contexte cible avant de déployer.",
|
| 1790 |
+
"quant.reco.no_action": "Pas d'action requise — la quantification est sûre pour cette architecture.",
|
| 1791 |
+
"quant.summary.headline_all": "Tous les schémas pour <code>{modelId}</code>",
|
| 1792 |
+
"quant.status.empty_id": "⚠ Saisissez un model id (ex. meta-llama/Llama-3.2-1B).",
|
| 1793 |
+
"quant.status.fetching": "⏳ Récupération config.json pour {modelId}...",
|
| 1794 |
+
"quant.status.fetched": "✅ Config récupéré pour {modelId}. Choisissez un schéma et cliquez Prédire (ou Comparer tous).",
|
| 1795 |
+
"quant.status.no_scheme": "⚠ Choisissez un schéma de quantification dans le dropdown.",
|
| 1796 |
+
"quant.status.done": "✅ Régime prédit : {regime}",
|
| 1797 |
+
"quant.status.done_all": "✅ Comparé {n} schémas — triés par sécurité.",
|
| 1798 |
"share.import_desc": "Vous avez un fichier JSON de l'analyse TAF de quelqu'un ? Chargez-le ici pour voir le verdict + la chaîne localement. La même vue que si vous l'aviez exécuté vous-même.",
|
| 1799 |
"share.import_btn": "📂 Charger JSON partagé",
|
| 1800 |
"synthesis.system": "Vous êtes un assistant de diagnostic précis pour LLMs transformer. Étant donné des résultats de formules TAF pré-calculés, écrivez un résumé clair en français de 4-6 phrases. Citez le numéro de section (§X.Y) pour chaque nombre mentionné. Donnez toujours une recommandation concrète. N'INVENTEZ PAS de nombres.",
|
|
|
|
| 1887 |
"common.no": "Non",
|
| 1888 |
|
| 1889 |
// Tooltips des modes
|
| 1890 |
+
"modes.tip": "<strong>Douze façons d'utiliser l'outil</strong>.<br><strong>📇 Profil</strong>: collez un id → TAF Card avec 5 recettes.<br><strong>🆚 Comparer</strong>: 2-3 modèles côte à côte sur une recette.<br><strong>🔍 Inspecter config</strong>: collez config.json brut → Profil complet.<br><strong>💬 Question</strong>: question libre, le LLM du navigateur choisit la recette.<br><strong>📋 Recette</strong>: sélection manuelle avec contrôle total du formulaire.<br><strong>🩺 Diagnostic CLI</strong>: génère commande Python pour mesurer γ localement.<br><strong>📊 Diagramme de phase</strong>: panel de 23 modèles dans le plan (log θ, γ).<br><strong>🪟 Démasquer</strong>: détecte un max_position_embeddings trompeur (SWA / YaRN / RoPE-scaling).<br><strong>📜 Chat-template</strong>: détecte la famille + donne le flag CLI exact pour lm-eval / vLLM / transformers.<br><strong>🎯 Arena CI</strong>: reconstruit les intervalles de confiance depuis les votes pairwise bruts ; détecte les égalités statistiques qu'Arena cache.<br><strong>🧪 Contamination</strong>: note 20+ benchmarks pour leur probabilité de contamination selon le cutoff d'entraînement vs la date de sortie.<br><strong>⚖️ Quant</strong>: prédit γ-shift et ΔPPL pour tout (modèle × schéma de quantification) ; recommande une alternative sûre en cas de cliff.",
|
| 1891 |
"profile.tip": "<strong>Diagnostic complet en un clic</strong>. Collez n'importe quel id de modèle HF (ou choisissez préréglage). L'outil exécute les 5 recettes (contexte long, compression KV, custom vs API, budget, hardware) et produit une <strong>TAF Card</strong> unique avec verdict par dimension + nombres clés + classification architecturale.<br><br><strong>Cas d'usage</strong>: « J'évalue Qwen2.5-32B pour la production — quel est son profil complet de viabilité ? » → collez id → Profiler → fait.",
|
| 1892 |
"compare.tip": "<strong>Même recette, plusieurs modèles</strong>. Choisissez 2-3 modèles candidats et une recette. Voyez les verdicts dans un seul tableau comparatif.<br><br><strong>Cas d'usage</strong>: « J'ai besoin de récupération longue contexte à 16K — quel est le meilleur : Llama-3-8B, Mistral-7B ou Qwen-7B ? » → choisissez 3 + X-2 + 16K → voyez le gagnant.",
|
| 1893 |
|
|
|
|
| 2385 |
"help.v07.arena.body": "Chatbot Arena 在公开排行榜中删除了置信区间 — 5 Elo 的差距在统计上可能毫无意义。粘贴原始 pairwise 投票数据(model_a, model_b, winner)→ Bradley-Terry MLE + 200 次 bootstrap → 排序 Elo + 95% CI + \"统计并列\" 面板,列出 CI 重叠的配对。尝试加载样本按钮。<em>用例</em>:宣称 \"模型 A 胜过模型 B\" 之前,验证它们的 CI 不重叠。",
|
| 2386 |
"help.v07.contam.title": "🧪 污染先验",
|
| 2387 |
"help.v07.contam.body": "对 benchmark 分数是否被污染做贝叶斯式的先验估计。输入模型训练 cutoff 日期 → 工具按 P(污染) 评估 20+ 主流 benchmark(MMLU、HellaSwag、GSM8K、HumanEval、IFEval、MMLU-Pro、GPQA、AIME、MATH-500、BBH、MUSR…),基于时间差距、语料库纳入和已知泄漏历史。Open LLM Leaderboard v1 在 2024 年因 MMLU/HellaSwag 分数被污染而停用。<em>用例</em>:比较两个模型时决定相信哪些分数。",
|
| 2388 |
+
"help.v07.quant.title": "⚖️ 量化机制分类器",
|
| 2389 |
+
"help.v07.quant.body": "预测任意(模型 × 量化方案:NF4、AWQ、GPTQ、GGUF Q4_K_M / Q5_K_M / Q8_0、int8、FP8…)的 γ-shift 与 ΔPPL。架构感知:小 d_head + 激进 GQA → 更敏感;校准方案(AWQ)比未校准方案(NF4)更好地吸收偏移。检测到 cliff 时推荐更安全的替代方案。<em>用例</em>:量化之前,预测你的特定架构 × 方案组合是否能保持 PPL 可接受,否则给出具体的切换建议。",
|
| 2390 |
|
| 2391 |
// v0.7 — Inventory 模态第 5 卡
|
| 2392 |
"inv.v07.title": "🆕 v0.7 anti-bullshit 套件",
|
|
|
|
| 2394 |
"inv.v07.template": "<strong>📜 Chat-template</strong> — 精确 CLI flag,让 lm-eval 不会静默对半你的 accuracy",
|
| 2395 |
"inv.v07.arena": "<strong>🎯 Arena CI</strong> — 恢复 Chatbot Arena 隐藏的置信区间",
|
| 2396 |
"inv.v07.contam": "<strong>🧪 污染</strong> — 按污染概率对 20+ benchmark 评级",
|
| 2397 |
+
"inv.v07.quant": "<strong>⚖️ Quant</strong> — 预测任意(模型 × 量化方案)组合的 γ-shift + ΔPPL",
|
| 2398 |
+
|
| 2399 |
+
// v0.7.3 — anti-bullshit pack #5: Quant-regime classifier
|
| 2400 |
+
"modes.quant": "⚖️ Quant",
|
| 2401 |
+
"mode_desc.quant": "预测任意(模型 × 量化方案)的 γ-shift 与 ΔPPL。架构感知:小 d_head + GQA → 更敏感。检测到 cliff 时推荐更安全的替代方案。",
|
| 2402 |
+
"quant.title": "⚖️ 量化机制分类器",
|
| 2403 |
+
"quant.tip": "预测给定(模型 × ���化方案)的 γ-shift(及由此产生的 ΔPPL)。\"AWQ 保留 ~95%\" 这类通用说法太模糊 — TAF 利用 d_head、GQA 比、SWA 标志和模型大小给出特定于架构的判定。解决:HF 社区普遍报告不可预测的量化 cliff(NF4 在 Phi-3 上 -2 PPL,但在 Llama-3-8B 上没问题)。",
|
| 2404 |
+
"quant.desc": "<strong>量化会破坏你的模型吗?</strong>粘贴 HF 模型 id,选择量化方案 — 获取预测的 γ-shift、预期 ΔPPL 区间,以及在 cliff 情况下的推荐替代方案。仅浏览器,无 GPU,无需校准集。",
|
| 2405 |
+
"quant.id_label": "HF 模型 id:",
|
| 2406 |
+
"quant.fetch_btn": "📥 获取 config",
|
| 2407 |
+
"quant.scheme_label": "量化方案:",
|
| 2408 |
+
"quant.run_btn": "⚖️ 预测",
|
| 2409 |
+
"quant.all_btn": "📊 比较所有方案",
|
| 2410 |
+
"quant.regime.safe": "✅ 安全",
|
| 2411 |
+
"quant.regime.mild": "✅ 轻度压缩",
|
| 2412 |
+
"quant.regime.significant": "⚠ 显著退化",
|
| 2413 |
+
"quant.regime.cliff": "❌ 重大 CLIFF",
|
| 2414 |
+
"quant.label.gamma_shift": "γ 偏移",
|
| 2415 |
+
"quant.label.delta_ppl": "ΔPPL(估)",
|
| 2416 |
+
"quant.label.arch_mult": "架构乘数",
|
| 2417 |
+
"quant.section.breakdown": "细节分解",
|
| 2418 |
+
"quant.section.reco": "建议",
|
| 2419 |
+
"quant.section.compare": "所有方案(按安全性排序)",
|
| 2420 |
+
"quant.field.scheme": "方案",
|
| 2421 |
+
"quant.field.calibrated": "已校准",
|
| 2422 |
+
"quant.field.uncalibrated": "未校准",
|
| 2423 |
+
"quant.field.base_penalty": "基础惩罚",
|
| 2424 |
+
"quant.field.arch_mult_full": "架构乘数",
|
| 2425 |
+
"quant.field.gamma_shift": "预测 γ 偏移",
|
| 2426 |
+
"quant.field.ppl_band": "ΔPPL 区间(估)",
|
| 2427 |
+
"quant.field.params": "参数量",
|
| 2428 |
+
"quant.col.scheme": "方案",
|
| 2429 |
+
"quant.col.bits": "比特",
|
| 2430 |
+
"quant.col.gamma_shift": "γ 偏移",
|
| 2431 |
+
"quant.col.ppl_band": "ΔPPL 区间",
|
| 2432 |
+
"quant.col.regime": "机制",
|
| 2433 |
+
"quant.reco.switch_to_awq": "<strong>切换到 {scheme}</strong> — 校准的 4-bit 处理小 d_head + GQA 比 NF4 好得多。预期 ΔPPL 下降 ~2-3 倍。",
|
| 2434 |
+
"quant.reco.switch_to_q5_km": "<strong>切换到 {scheme}</strong> — Q5 以低成本保留更多 head 维度(仅大约 25% 文件更大)。",
|
| 2435 |
+
"quant.reco.switch_to_q4_km": "<strong>切换到 {scheme}</strong> — Q3/Q2 对此架构过于激进。",
|
| 2436 |
+
"quant.reco.consider_awq": "<strong>考虑 {scheme}</strong> — 在此架构上校准能显著降低 γ-shift。",
|
| 2437 |
+
"quant.reco.use_higher_bits": "<strong>使用更高比特的替代</strong> — 此架构无法干净吸收 4-bit。尝试 5 或 8-bit。",
|
| 2438 |
+
"quant.reco.verify_with_eval": "<strong>用真实 eval 验证</strong> — 预测偏移在边缘。部署前在目标上下文运行 NIAH。",
|
| 2439 |
+
"quant.reco.no_action": "无需操作 — 此架构下量化是安全的。",
|
| 2440 |
+
"quant.summary.headline_all": "<code>{modelId}</code> 的所有方案",
|
| 2441 |
+
"quant.status.empty_id": "⚠ 输入 model id(例如 meta-llama/Llama-3.2-1B)。",
|
| 2442 |
+
"quant.status.fetching": "⏳ 正在获取 {modelId} 的 config.json...",
|
| 2443 |
+
"quant.status.fetched": "✅ 已获取 {modelId} 的 config。选择方案并点击预测(或比较所有)。",
|
| 2444 |
+
"quant.status.no_scheme": "⚠ 从下拉中选择一个量化方案。",
|
| 2445 |
+
"quant.status.done": "✅ 预测机制:{regime}",
|
| 2446 |
+
"quant.status.done_all": "✅ 已比较 {n} 个方案 — 按安全性排序。",
|
| 2447 |
"share.import_desc": "有他人 TAF 分析的 JSON 文件? 在这里加载以本地查看判定 + 链。与您自己运行的视图相同。",
|
| 2448 |
"share.import_btn": "📂 加载共享的 JSON",
|
| 2449 |
"synthesis.system": "您是 transformer LLM 的精确诊断助手。给定预先计算的 TAF 公式结果,用 4-6 句中文写出清晰的摘要。为每个提到的数字引用章节号 (§X.Y)。始终给出具体建议。不要编造数字。",
|
|
|
|
| 2536 |
"common.no": "否",
|
| 2537 |
|
| 2538 |
// 模式提示
|
| 2539 |
+
"modes.tip": "<strong>十二种使用方式</strong>。<br><strong>📇 画像</strong>: 粘贴模型 id → 5 个配方的 TAF 卡。<br><strong>🆚 比较</strong>: 2-3 个模型在一个配方上并排比较。<br><strong>🔍 检查 config</strong>: 粘贴原始 config.json → 完整画像。<br><strong>💬 提问</strong>: 自由形式问题,浏览器 LLM 选择配方。<br><strong>📋 配方</strong>: 手动选择,完全控制表单。<br><strong>🩺 CLI 诊断</strong>: 生成 Python 命令在本地测量 γ。<br><strong>📊 相图</strong>: 23 个面板模型在 (log θ, γ) 平面上。<br><strong>🪟 揭示</strong>: 检测误导的 max_position_embeddings(SWA / YaRN / RoPE 缩放)。<br><strong>📜 Chat-template</strong>: 检测系列 + 给出 lm-eval / vLLM / transformers 的精确 CLI flag。<br><strong>🎯 Arena CI</strong>: 从原始 pairwise 投票数据重建置信区间;检测 Arena 隐藏的统计并列。<br><strong>🧪 污染</strong>: 根据训练 cutoff 与发布日期,对 20+ benchmark 进行污染概率评估。<br><strong>⚖️ Quant</strong>: 预测任意(模型 × 量化方案)的 γ-shift 与 ΔPPL;cliff 时推荐更安全替代方案。",
|
| 2540 |
"profile.tip": "<strong>一键完整诊断</strong>。粘贴任意 HF 模型 id (或选择预设)。工具运行所有 5 个配方 (长上下文、KV 压缩、自定义 vs API、预算、硬件),生成单个 <strong>TAF 卡</strong>,显示每个维度的判定 + 关键数字 + 架构分类。<br><br><strong>用例</strong>: \"我正在为生产评估 Qwen2.5-32B — 它的完整可行性概况是什么?\" → 粘贴 id → 画像 → 完成。",
|
| 2541 |
"compare.tip": "<strong>同一配方,多个模型</strong>。选择 2-3 个候选模型和一个配方。在单个比较表中查看判定。<br><br><strong>用例</strong>: \"我需要在 16K 进行长上下文检索 — 哪个最好: Llama-3-8B、Mistral-7B 或 Qwen-7B?\" → 选择 3 个 + X-2 + 16K → 看赢家。",
|
| 2542 |
|
|
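Several of the quant.* status and recommendation strings above carry {placeholder} tokens ({modelId}, {scheme}, {regime}, {n}) that js/main.js fills via tFmt(key, params). The real helper lives in the i18n layer and is not part of this diff; the snippet below is only a sketch of the assumed substitution behaviour, with t(key) standing in for the current-language lookup.

// Sketch only: assumed behaviour of tFmt, not the shipped implementation.
function tFmtSketch(key, params = {}) {
  let s = t(key) || key;                            // t() = current-language lookup (assumed)
  for (const [name, value] of Object.entries(params)) {
    s = s.replaceAll(`{${name}}`, String(value));   // e.g. "{modelId}" -> "meta-llama/Llama-3.2-1B"
  }
  return s;
}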
@@ -15,6 +15,7 @@ import { unmaskConfig } from "./swa_unmasker.js";
 import { sniffChatTemplate } from "./chat_template_sniffer.js";
 import { parseVotesCSV, computeArenaCI, SAMPLE_VOTES_CSV } from "./arena_ci.js";
 import { rateAllBenchmarks, BENCHMARK_DB } from "./contamination_prior.js";
+import { predictQuantShift, predictAllSchemes, QUANT_SCHEMES } from "./quant_regime.js";
 
 const TAF_BROWSER_URL = "python/taf_browser.py";
 const ENABLE_WEBLLM = true;
@@ -190,7 +191,8 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
   ["ask-section", "recipe-section", "form-section",
    "profile-section", "compare-section", "inspector-section",
    "diagnose-section", "phase-section", "unmask-section",
-   "template-section", "arena-section", "contam-section"].forEach(id => {
+   "template-section", "arena-section", "contam-section",
+   "quant-section"].forEach(id => {
     const el = $(id);
     if (el) el.style.display = "none";
   });
@@ -200,6 +202,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
     compare: "compare-section", inspector: "inspector-section",
     diagnose: "diagnose-section", phase: "phase-section", unmask: "unmask-section",
     template: "template-section", arena: "arena-section", contam: "contam-section",
+    quant: "quant-section",
   };
   const sectionId = sectionMap[mode];
   if (sectionId) $(sectionId).style.display = "";
@@ -980,6 +983,178 @@ $("contam-cutoff")?.addEventListener("keydown", (e) => {
   if (e.key === "Enter") { e.preventDefault(); runContamCompute(); }
 });
 
+// ════════════════════════════════════════════════════════════════════
+// ⚖️ Quant-regime classifier (v0.7.3 anti-bullshit pack #5)
+// ════════════════════════════════════════════════════════════════════
+
+const QUANT_REGIME_COLOR = {
+  safe: "#3fb950",
+  mild: "#3fb950",
+  significant: "#f1c40f",
+  cliff: "#f85149",
+};
+
+// Populate scheme dropdown from QUANT_SCHEMES on first render. Idempotent.
+function populateQuantSchemes() {
+  const sel = $("quant-scheme");
+  if (!sel || sel.options.length > 1) return;
+  for (const s of QUANT_SCHEMES) {
+    const opt = document.createElement("option");
+    opt.value = s.id;
+    opt.textContent = s.label;
+    sel.appendChild(opt);
+  }
+}
+
+// Cache config across "Fetch" + "Predict" / "Compare" actions on the same id.
+let __quantLastConfig = null;
+let __quantLastModelId = null;
+
+async function quantFetchConfig() {
+  const modelId = ($("quant-id").value || "").trim();
+  if (!modelId) {
+    $("quant-status").textContent = t("quant.status.empty_id") || "⚠ Enter a model id.";
+    return null;
+  }
+  $("quant-status").textContent = tFmt("quant.status.fetching", { modelId });
+  $("quant-fetch-btn").disabled = true;
+  try {
+    const cfg = await fetchHfConfig(modelId);
+    __quantLastConfig = cfg;
+    __quantLastModelId = modelId;
+    $("quant-status").textContent = tFmt("quant.status.fetched", { modelId });
+    return cfg;
+  } catch (err) {
+    $("quant-status").textContent = `❌ ${err.message}`;
+    return null;
+  } finally {
+    $("quant-fetch-btn").disabled = false;
+  }
+}
+
+function renderQuantSingle(result, modelId) {
+  const escapeHtml = (s) => String(s).replace(/[&<>"']/g, c =>
+    ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]));
+  const fmtN = (x) => x === null || x === undefined ? "—" : Number(x).toLocaleString();
+  const color = QUANT_REGIME_COLOR[result.regime] || "#8b949e";
+  const regimeLabel = t(`quant.regime.${result.regime}`) || result.regime;
+
+  let recoHtml = "";
+  if (result.recommend_code) {
+    const recoText = result.recommend_scheme
+      ? tFmt("quant.reco." + result.recommend_code, {
+          scheme: QUANT_SCHEMES.find(s => s.id === result.recommend_scheme)?.label || result.recommend_scheme,
+        })
+      : (t("quant.reco." + result.recommend_code) || result.recommend_code);
+    recoHtml = `<p class="unmask-reco">${recoText}</p>`;
+  } else {
+    recoHtml = `<p class="unmask-reco">${t("quant.reco.no_action") || "No action needed — quantization is safe for this architecture."}</p>`;
+  }
+
+  return `
+    <div class="unmask-result">
+      <div class="unmask-hero" style="border-color: ${color};">
+        <div class="unmask-verdict" style="color: ${color};">${regimeLabel}</div>
+        <div class="unmask-model"><code>${escapeHtml(modelId)}</code> + <code>${escapeHtml(result.scheme_label)}</code></div>
+        <div class="unmask-numbers">
+          <div><span class="unmask-num-label">${t("quant.label.gamma_shift") || "γ shift"}</span><span class="unmask-num-val">+${result.gamma_shift.toFixed(3)}</span></div>
+          <div><span class="unmask-num-label">${t("quant.label.delta_ppl") || "ΔPPL (est.)"}</span><span class="unmask-num-val">+${result.delta_ppl.mid.toFixed(2)}</span></div>
+          <div><span class="unmask-num-label">${t("quant.label.arch_mult") || "Arch multiplier"}</span><span class="unmask-num-val">×${result.arch_multiplier}</span></div>
+        </div>
+      </div>
+      <div class="unmask-details">
+        <details class="unmask-panel" open>
+          <summary class="unmask-panel-title">${t("quant.section.breakdown") || "Breakdown"}</summary>
+          <ul>
+            <li><strong>${t("quant.field.scheme") || "Scheme"}:</strong> ${escapeHtml(result.scheme_label)} (${result.scheme_bits}-bit, ${result.scheme_calibrated ? (t("quant.field.calibrated") || "calibrated") : (t("quant.field.uncalibrated") || "uncalibrated")})</li>
+            <li><strong>${t("quant.field.base_penalty") || "Base penalty"}:</strong> ${result.base_penalty.toFixed(3)}</li>
+            <li><strong>${t("quant.field.arch_mult_full") || "Architecture multiplier"}:</strong> ×${result.arch_multiplier} (d_head, GQA, SWA, params)</li>
+            <li><strong>${t("quant.field.gamma_shift") || "Predicted γ shift"}:</strong> +${result.gamma_shift.toFixed(3)}</li>
+            <li><strong>${t("quant.field.ppl_band") || "ΔPPL band (est.)"}:</strong> ${result.delta_ppl.low.toFixed(2)} – ${result.delta_ppl.high.toFixed(2)}</li>
+            <li><strong>${t("quant.field.params") || "Parameters"}:</strong> ${fmtN(result.n_params)}</li>
+          </ul>
+        </details>
+        <details class="unmask-panel" open>
+          <summary class="unmask-panel-title">${t("quant.section.reco") || "Recommendation"}</summary>
+          ${recoHtml}
+        </details>
+      </div>
+    </div>
+  `;
+}
+
+function renderQuantAll(rows, modelId) {
+  const escapeHtml = (s) => String(s).replace(/[&<>"']/g, c =>
+    ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]));
+  let body = "";
+  for (const r of rows) {
+    const color = QUANT_REGIME_COLOR[r.regime] || "#8b949e";
+    const regimeLabel = t(`quant.regime.${r.regime}`) || r.regime;
+    body += `<tr>
+      <td><strong>${escapeHtml(r.scheme_label)}</strong></td>
+      <td class="arena-spread">${r.scheme_bits}-bit ${r.scheme_calibrated ? "✓" : ""}</td>
+      <td class="arena-elo">+${r.gamma_shift.toFixed(3)}</td>
+      <td class="arena-spread">${r.delta_ppl.low.toFixed(2)}–${r.delta_ppl.high.toFixed(2)}</td>
+      <td style="color: ${color};"><strong>${regimeLabel}</strong></td>
+    </tr>`;
+  }
+  return `
+    <div class="arena-result">
+      <div class="unmask-hero" style="border-color: #58a6ff;">
+        <div class="unmask-verdict" style="color: #58a6ff;">${tFmt("quant.summary.headline_all", { modelId })}</div>
+      </div>
+      <div class="unmask-details">
+        <details class="unmask-panel" open>
+          <summary class="unmask-panel-title">${t("quant.section.compare") || "All schemes (sorted by safety)"}</summary>
+          <table class="arena-table">
+            <thead><tr>
+              <th>${t("quant.col.scheme") || "Scheme"}</th>
+              <th>${t("quant.col.bits") || "Bits"}</th>
+              <th>${t("quant.col.gamma_shift") || "γ shift"}</th>
+              <th>${t("quant.col.ppl_band") || "ΔPPL band"}</th>
+              <th>${t("quant.col.regime") || "Regime"}</th>
+            </tr></thead>
+            <tbody>${body}</tbody>
+          </table>
+        </details>
+      </div>
+    </div>
+  `;
+}
+
+async function runQuantPredict() {
+  const cfg = __quantLastConfig || await quantFetchConfig();
+  if (!cfg) return;
+  const schemeId = $("quant-scheme").value;
+  if (!schemeId) {
+    $("quant-status").textContent = t("quant.status.no_scheme") || "⚠ Pick a quant scheme.";
+    return;
+  }
+  const result = predictQuantShift(cfg, schemeId);
+  if (!result) {
+    $("quant-status").textContent = "❌ Unknown scheme.";
+    return;
+  }
+  $("quant-output").innerHTML = renderQuantSingle(result, __quantLastModelId);
+  $("quant-status").textContent = tFmt("quant.status.done", { regime: t(`quant.regime.${result.regime}`) || result.regime });
+}
+
+async function runQuantAll() {
+  const cfg = __quantLastConfig || await quantFetchConfig();
+  if (!cfg) return;
+  const rows = predictAllSchemes(cfg);
+  $("quant-output").innerHTML = renderQuantAll(rows, __quantLastModelId);
+  $("quant-status").textContent = tFmt("quant.status.done_all", { n: rows.length });
+}
+
+populateQuantSchemes();
+$("quant-fetch-btn")?.addEventListener("click", quantFetchConfig);
+$("quant-run-btn")?.addEventListener("click", runQuantPredict);
+$("quant-all-btn")?.addEventListener("click", runQuantAll);
+$("quant-id")?.addEventListener("keydown", (e) => {
+  if (e.key === "Enter") { e.preventDefault(); quantFetchConfig(); }
+});
+
 function configToPreset(cfg, modelId) {
   const n_attn = cfg.num_attention_heads || cfg.n_head || 0;
   const n_kv = cfg.num_key_value_heads || cfg.num_attention_heads || cfg.n_head || 0;
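The handlers above reference seven element ids that index.html gains in this commit (that hunk is not shown here): quant-id, quant-fetch-btn, quant-scheme, quant-run-btn, quant-all-btn, quant-status, quant-output. A purely illustrative console check that the markup and the wiring agree; the id list is taken from the handlers, the check itself is not part of the commit.

// Illustrative only: every id used by the quant handlers should resolve to a DOM node.
["quant-id", "quant-fetch-btn", "quant-scheme", "quant-run-btn",
 "quant-all-btn", "quant-status", "quant-output"]
  .forEach(id => console.assert(document.getElementById(id), `missing #${id} in index.html`));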
@@ -0,0 +1,147 @@
+// Quant-regime classifier (v0.7.3 anti-bullshit pack #5)
+// Predicts γ shift under quantization given (architecture × quant scheme).
+// Pure logic — no human strings. Solves: HF community widely complains that
+// quantization "cliffs" are unpredictable per model. Generic "AWQ ~95% retention"
+// claims are too vague — TAF gives architecture-specific verdict.
+//
+// Calibration sources: Maarten Grootendorst's quant comparison newsletter,
+// llama.cpp PPL benchmarks, GPTQ/AWQ papers.
+
+export const QUANT_SCHEMES = [
+  { id: "fp8", label: "FP8 (Hopper)", bits: 8, base_penalty: 0.007, calibrated: false, hardware: "h100+" },
+  { id: "int8", label: "int8 (LLM.int8())", bits: 8, base_penalty: 0.010, calibrated: false, hardware: "any" },
+  { id: "gguf_q8_0", label: "GGUF Q8_0", bits: 8, base_penalty: 0.008, calibrated: false, hardware: "cpu/any" },
+  { id: "gguf_q5_km", label: "GGUF Q5_K_M", bits: 5, base_penalty: 0.020, calibrated: false, hardware: "cpu/any" },
+  { id: "awq", label: "AWQ (4-bit, calibrated)", bits: 4, base_penalty: 0.020, calibrated: true, hardware: "any" },
+  { id: "gptq", label: "GPTQ (4-bit, calibrated)", bits: 4, base_penalty: 0.035, calibrated: true, hardware: "any" },
+  { id: "gguf_q4_km", label: "GGUF Q4_K_M", bits: 4, base_penalty: 0.050, calibrated: false, hardware: "cpu/any" },
+  { id: "nf4", label: "NF4 (bitsandbytes, uncalibrated)", bits: 4, base_penalty: 0.070, calibrated: false, hardware: "any" },
+  { id: "gguf_q3_km", label: "GGUF Q3_K_M (aggressive)", bits: 3, base_penalty: 0.110, calibrated: false, hardware: "cpu/any" },
+  { id: "gguf_q2_k", label: "GGUF Q2_K (extreme)", bits: 2, base_penalty: 0.180, calibrated: false, hardware: "cpu/any" },
+];
+
+const REGIME_BANDS = [
+  { id: "safe", max_gamma_shift: 0.015, label_code: "safe" },
+  { id: "mild", max_gamma_shift: 0.04, label_code: "mild" },
+  { id: "significant", max_gamma_shift: 0.08, label_code: "significant" },
+  { id: "cliff", max_gamma_shift: 1.0, label_code: "cliff" },
+];
+
+function bandFor(gammaShift) {
+  for (const b of REGIME_BANDS) if (gammaShift <= b.max_gamma_shift) return b.id;
+  return "cliff";
+}
+
+// Architecture-specific multiplier on the base quant penalty.
+// More sensitive: small d_head, aggressive GQA ratio, very small models (pre-IH).
+// Less sensitive: large d_head, post-IH, MHA (no GQA pressure).
+function archMultiplier(config) {
+  let mult = 1.0;
+  const n_attn = config.num_attention_heads ?? null;
+  const n_kv = config.num_key_value_heads ?? n_attn;
+  const hidden = config.hidden_size ?? null;
+  const d_head = config.head_dim ?? (n_attn && hidden ? hidden / n_attn : null);
+  const n_params = inferNParams(config);
+  const hasSWA = typeof config.sliding_window === "number" && config.sliding_window > 0;
+  const hasGQA = n_attn && n_kv && n_kv < n_attn;
+  const gqaRatio = hasGQA ? n_attn / n_kv : 1;
+
+  // d_head sensitivity (small head = more compression damage)
+  if (d_head !== null) {
+    if (d_head < 64) mult *= 1.5;
+    else if (d_head < 96) mult *= 1.2;
+    else if (d_head < 128) mult *= 1.05;
+    // d_head >= 128: no penalty
+  }
+  // GQA pressure (heavily-shared kv heads = more interference under quant)
+  if (gqaRatio >= 8) mult *= 1.3;
+  else if (gqaRatio >= 4) mult *= 1.15;
+  // SWA: localized attention is somewhat more robust to head-level noise
+  if (hasSWA) mult *= 0.92;
+  // Post-IH (large) models more robust; pre-IH (small) less robust
+  if (n_params !== null) {
+    if (n_params < 1.5e9) mult *= 1.4;        // <1.5B = pre-IH
+    else if (n_params < 4e9) mult *= 1.15;    // borderline
+    else if (n_params >= 30e9) mult *= 0.85;  // very large = robust
+    else if (n_params >= 7e9) mult *= 0.95;
+  }
+  return mult;
+}
+
+function inferNParams(config) {
+  if (typeof config.num_parameters === "number") return config.num_parameters;
+  if (typeof config.n_params === "number") return config.n_params;
+  // Estimate from h × layers × ~12h (transformer rule-of-thumb)
+  const h = config.hidden_size ?? null;
+  const L = config.num_hidden_layers ?? null;
+  const v = config.vocab_size ?? null;
+  if (h && L) {
+    const transformer = 12 * L * h * h;
+    const embed = v ? v * h : 0;
+    return transformer + 2 * embed;
+  }
+  return null;
+}
+
+// Predict ΔPPL band from γ shift, scaled by model size.
+// Empirical fit (rough): ΔPPL ≈ 8 × γ_shift² × (1 + log10(N/1e9)/4), N = parameter count.
+// Returns a {low, mid, high} band (roughly −40% / +50% around mid).
+function predictDeltaPPL(gammaShift, nParams) {
+  if (gammaShift <= 0) return { low: 0, mid: 0, high: 0 };
+  const sizeBoost = nParams ? 1 + Math.log10(nParams / 1e9) / 4 : 1;
+  const mid = 8 * gammaShift * gammaShift * sizeBoost;
+  return {
+    low: Math.max(0, Math.round((mid * 0.6) * 100) / 100),
+    mid: Math.round(mid * 100) / 100,
+    high: Math.round((mid * 1.5) * 100) / 100,
+  };
+}
+
+export function predictQuantShift(config, schemeId) {
+  const scheme = QUANT_SCHEMES.find(s => s.id === schemeId);
+  if (!scheme) return null;
+
+  const mult = archMultiplier(config);
+  const gammaShift = scheme.base_penalty * mult;
+  const regime = bandFor(gammaShift);
+  const nParams = inferNParams(config);
+  const deltaPPL = predictDeltaPPL(gammaShift, nParams);
+
+  // Recommendation logic (which scheme to switch to if regime is bad).
+  let recommendCode = null;
+  let recommendScheme = null;
+  if (regime === "cliff") {
+    // Suggest stepping up to next-better: q4_km → q5_km, nf4 → awq, q3 → q4, q2 → q4
+    if (scheme.id === "nf4") { recommendCode = "switch_to_awq"; recommendScheme = "awq"; }
+    else if (scheme.id === "gguf_q4_km") { recommendCode = "switch_to_q5_km"; recommendScheme = "gguf_q5_km"; }
+    else if (scheme.id === "gguf_q3_km") { recommendCode = "switch_to_q4_km"; recommendScheme = "gguf_q4_km"; }
+    else if (scheme.id === "gguf_q2_k") { recommendCode = "switch_to_q4_km"; recommendScheme = "gguf_q4_km"; }
+    else if (scheme.id === "gptq") { recommendCode = "switch_to_awq"; recommendScheme = "awq"; }
+    else recommendCode = "use_higher_bits";
+  } else if (regime === "significant") {
+    if (scheme.id === "nf4") { recommendCode = "consider_awq"; recommendScheme = "awq"; }
+    else recommendCode = "verify_with_eval";
+  }
+
+  return {
+    scheme: scheme.id,
+    scheme_label: scheme.label,
+    scheme_bits: scheme.bits,
+    scheme_calibrated: scheme.calibrated,
+    arch_multiplier: Math.round(mult * 100) / 100,
+    base_penalty: scheme.base_penalty,
+    gamma_shift: Math.round(gammaShift * 1000) / 1000,
+    regime,
+    delta_ppl: deltaPPL,
+    n_params: nParams,
+    recommend_code: recommendCode,
+    recommend_scheme: recommendScheme,
+  };
+}
+
+// Batch: predict all schemes for one config. Useful for "show me the trade-offs".
+export function predictAllSchemes(config) {
+  return QUANT_SCHEMES.map(s => predictQuantShift(config, s.id))
+    .filter(Boolean)
+    .sort((a, b) => a.gamma_shift - b.gamma_shift);
+}
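A minimal usage sketch of the two exports, on a hand-written Llama-3-8B-style config (same field shape as HF config.json; the literal and the numbers in the comments are illustrative and follow only from the code above, not from any measurement):

import { predictQuantShift, predictAllSchemes } from "./quant_regime.js";

// Illustrative config in config.json shape; main.js normally fetches this from the Hub.
const llama3_8b_like = {
  num_attention_heads: 32,
  num_key_value_heads: 8,    // GQA ratio 4 -> ×1.15
  hidden_size: 4096,         // d_head = 4096 / 32 = 128 -> no d_head penalty
  num_hidden_layers: 32,
  vocab_size: 128256,        // inferNParams ≈ 7.5e9 -> ×0.95
};

const nf4 = predictQuantShift(llama3_8b_like, "nf4");
// -> arch_multiplier ≈ 1.09, gamma_shift ≈ 0.076, regime "significant", recommend_code "consider_awq"

const ranked = predictAllSchemes(llama3_8b_like);   // all 10 schemes, safest first
console.log(ranked.map(r => `${r.scheme_label}: +${r.gamma_shift} (${r.regime})`).join("\n"));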