Commit 6378efa · Parent(s): 7700f2f
v0.7.7-fix: ⓘ tooltips on each task tile, 4 langs (EN/ES/FR/ZH)
Each of the 5 task tiles now has an info icon (ⓘ) next to the title that opens a detailed tooltip listing the modes inside, what each one does, and concrete example use cases. Matches the existing tooltip pattern (modes.tip, etc.).
5 new keys × 4 langs (698 total, 0 missing / 0 extra parity).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- index.html +20 -5
- js/i18n.js +20 -0
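The "0 missing / 0 extra" parity claim above implies a key-parity check across the four language tables. Here is a minimal sketch of such a check, assuming `TRANSLATIONS` maps a language code to a flat key→string object (as the `export const TRANSLATIONS` hunk headers suggest); the `checkParity` helper is illustrative, not the repo's actual tooling:

```javascript
// Sketch of a "0 missing / 0 extra" key-parity check across languages.
// Assumed shape: { en: { key: string, ... }, es: { ... }, ... }
const TRANSLATIONS = {
  en: { "tile.manual.title": "📋 Manual / free-form", "tile.manual.tip": "…" },
  es: { "tile.manual.title": "📋 Manual / libre", "tile.manual.tip": "…" },
};

function checkParity(translations, reference = "en") {
  const refKeys = new Set(Object.keys(translations[reference]));
  const report = {};
  for (const [lang, table] of Object.entries(translations)) {
    if (lang === reference) continue;
    const keys = new Set(Object.keys(table));
    report[lang] = {
      // keys present in the reference language but absent here
      missing: [...refKeys].filter((k) => !keys.has(k)),
      // keys present here but absent in the reference language
      extra: [...keys].filter((k) => !refKeys.has(k)),
    };
  }
  return report;
}

console.log(checkParity(TRANSLATIONS)); // → { es: { missing: [], extra: [] } }
```

Running this over the real 698-key tables would reproduce the parity figure quoted in the commit message.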
index.html
CHANGED
@@ -355,7 +355,10 @@
       <p class="recipe-desc" data-i18n="tiles.subtitle">Pick a task. Each one opens the right tool below. Or scroll down for the full list of 14 modes.</p>
       <div class="tiles-grid">
         <div class="task-tile">
-          <h3
+          <h3>
+            <span data-i18n="tile.diagnose.title">🔬 Diagnose a model</span>
+            <span class="info"><span class="tooltip" data-i18n="tile.diagnose.tip">Start here when you have a specific model id and want a full diagnostic: <strong>Profile</strong> runs all 5 recipes at once. <strong>Unmask</strong> checks if max_position_embeddings is honest. <strong>NIAH→Reason</strong> predicts retrieval-vs-reasoning gap. <strong>Quant</strong> predicts whether quantizing will break it. <strong>Inspect</strong> lets you paste raw config.json for private/in-dev models.</span></span>
+          </h3>
           <p class="tile-desc" data-i18n="tile.diagnose.desc">Will this specific model work for my use case?</p>
           <div class="tile-modes">
             <button data-mode-link="profile" data-i18n="modes.profile">📇 Profile a model</button>
@@ -366,7 +369,10 @@
           </div>
         </div>
         <div class="task-tile">
-          <h3
+          <h3>
+            <span data-i18n="tile.trust.title">✓ Trust a benchmark score</span>
+            <span class="info"><span class="tooltip" data-i18n="tile.trust.tip">When you see a score and want to know if it's real. <strong>Contamination</strong> rates 20+ benchmarks for likelihood the model saw them during training. <strong>Drift</strong> tells you if a gap between two evals is numerical noise or a real bug (chat-template mismatch, KV-cache layout, etc.). <strong>Arena CI</strong> reconstructs the confidence intervals Chatbot Arena hides — many top-Elo "wins" are statistically tied.</span></span>
+          </h3>
           <p class="tile-desc" data-i18n="tile.trust.desc">Should I believe this number? Bug or noise?</p>
           <div class="tile-modes">
             <button data-mode-link="contam" data-i18n="modes.contam">🧪 Contamination</button>
@@ -375,7 +381,10 @@
           </div>
         </div>
         <div class="task-tile">
-          <h3
+          <h3>
+            <span data-i18n="tile.eval.title">⚙️ Set up an eval correctly</span>
+            <span class="info"><span class="tooltip" data-i18n="tile.eval.tip">Before you run lm-eval-harness or vLLM serve, get the right CLI flag. <strong>Chat-template Sniffer</strong> detects the template family (Llama-3 / ChatML / Mistral / Phi-3 / DeepSeek / Alpaca / custom / none) and emits the exact <code>--apply_chat_template</code> / <code>--chat-template</code> invocation. Solves issue #1841 in lm-eval-harness (silent ÷2 accuracy). <strong>Diagnose CLI</strong> generates the Python command to measure γ_obs on your local GPU.</span></span>
+          </h3>
           <p class="tile-desc" data-i18n="tile.eval.desc">Get the exact CLI flag for lm-eval / vLLM / transformers.</p>
           <div class="tile-modes">
             <button data-mode-link="template" data-i18n="modes.template">📜 Chat-template</button>
@@ -383,7 +392,10 @@
           </div>
         </div>
         <div class="task-tile">
-          <h3
+          <h3>
+            <span data-i18n="tile.compare.title">🆚 Compare models</span>
+            <span class="info"><span class="tooltip" data-i18n="tile.compare.tip"><strong>Compare</strong>: pick 2-3 candidate models + one recipe, see verdicts in a side-by-side table (e.g. Llama-3-8B vs Mistral-7B at 32k context). <strong>Phase diagram</strong>: scatter of 23 empirical models on the (log θ, γ) plane, with the Padé curve overlaid. Hover dots for details, click to load that model into the Recipe form.</span></span>
+          </h3>
           <p class="tile-desc" data-i18n="tile.compare.desc">Side-by-side, or browse the empirical model landscape.</p>
           <div class="tile-modes">
             <button data-mode-link="compare" data-i18n="modes.compare">🆚 Compare models</button>
@@ -391,7 +403,10 @@
           </div>
         </div>
         <div class="task-tile">
-          <h3
+          <h3>
+            <span data-i18n="tile.manual.title">📋 Manual / free-form</span>
+            <span class="info"><span class="tooltip" data-i18n="tile.manual.tip"><strong>Recipe</strong>: pick a specific X-N recipe (X-1 custom-vs-API, X-2 long context, X-3 budget, X-5 hardware, X-19 KV compression, X-21 imprint, X-22 compute-context invariant, X-23 IH-phase) and fill the form by hand for full control. <strong>Ask</strong>: type a free-form question; an in-browser 0.5B LLM (Qwen2.5) picks the right recipe and runs it. Best for "what would happen if..." exploration.</span></span>
+          </h3>
           <p class="tile-desc" data-i18n="tile.manual.desc">Pick a specific recipe by hand, or ask in plain English.</p>
           <div class="tile-modes">
             <button data-mode-link="recipe" data-i18n="modes.recipe">📋 Pick recipe</button>
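The tooltip markup above only works if something resolves the `data-i18n` keys against the active language table at runtime. A minimal sketch of that lookup, assuming a flat key→string table; the `applyI18n` name is hypothetical (the Space's real wiring lives elsewhere in js/i18n.js). Since the tip strings carry markup (`<strong>`, `<code>`), they must land in `innerHTML`, not `textContent`:

```javascript
// Hypothetical resolver for the data-i18n attributes used in the tiles.
// `table` is one language's flat object, e.g. { "tile.diagnose.tip": "...", ... }.
function applyI18n(root, table) {
  for (const el of root.querySelectorAll("[data-i18n]")) {
    const value = table[el.dataset.i18n];
    // Tooltip strings contain <strong>/<code> markup, so innerHTML is required.
    if (value !== undefined) el.innerHTML = value;
  }
}
```

Elements whose key is absent from the table keep the English fallback text already present in the markup.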
js/i18n.js
CHANGED
@@ -479,6 +479,11 @@ export const TRANSLATIONS = {
     "tile.compare.desc": "Side-by-side, or browse the empirical model landscape.",
     "tile.manual.title": "📋 Manual / free-form",
     "tile.manual.desc": "Pick a specific recipe by hand, or ask in plain English.",
+    "tile.diagnose.tip": "Start here when you have a specific model id and want a full diagnostic: <strong>Profile</strong> runs all 5 recipes at once. <strong>Unmask</strong> checks if max_position_embeddings is honest. <strong>NIAH→Reason</strong> predicts retrieval-vs-reasoning gap. <strong>Quant</strong> predicts whether quantizing will break it. <strong>Inspect</strong> lets you paste raw config.json for private/in-dev models.",
+    "tile.trust.tip": "When you see a score and want to know if it's real. <strong>Contamination</strong> rates 20+ benchmarks for likelihood the model saw them during training. <strong>Drift</strong> tells you if a gap between two evals is numerical noise or a real bug (chat-template mismatch, KV-cache layout, etc.). <strong>Arena CI</strong> reconstructs the confidence intervals Chatbot Arena hides — many top-Elo \"wins\" are statistically tied.",
+    "tile.eval.tip": "Before you run lm-eval-harness or vLLM serve, get the right CLI flag. <strong>Chat-template Sniffer</strong> detects the template family (Llama-3 / ChatML / Mistral / Phi-3 / DeepSeek / Alpaca / custom / none) and emits the exact <code>--apply_chat_template</code> / <code>--chat-template</code> invocation. Solves issue #1841 in lm-eval-harness (silent ÷2 accuracy). <strong>Diagnose CLI</strong> generates the Python command to measure γ_obs on your local GPU.",
+    "tile.compare.tip": "<strong>Compare</strong>: pick 2-3 candidate models + one recipe, see verdicts in a side-by-side table (e.g. Llama-3-8B vs Mistral-7B at 32k context). <strong>Phase diagram</strong>: scatter of 23 empirical models on the (log θ, γ) plane, with the Padé curve overlaid. Hover dots for details, click to load that model into the Recipe form.",
+    "tile.manual.tip": "<strong>Recipe</strong>: pick a specific X-N recipe (X-1 custom-vs-API, X-2 long context, X-3 budget, X-5 hardware, X-19 KV compression, X-21 imprint, X-22 compute-context invariant, X-23 IH-phase) and fill the form by hand for full control. <strong>Ask</strong>: type a free-form question; an in-browser 0.5B LLM (Qwen2.5) picks the right recipe and runs it. Best for \"what would happen if...\" exploration.",
     "share.import_desc": "Got a JSON file from someone else's TAF analysis? Load it here to see the verdict + chain locally. Same view as if you'd run it yourself.",
     "share.import_btn": "📂 Load shared JSON",
     "synthesis.system": "You are a precise transformer LLM diagnostic assistant. Given pre-computed TAF formula results, write a clear plain-English summary in 4-6 sentences. Cite the section number (§X.Y) for each number you mention. Always give a concrete recommendation. Do NOT invent numbers.",
@@ -1382,6 +1387,11 @@ export const TRANSLATIONS = {
     "tile.compare.desc": "Lado a lado, o explora el panel empírico de modelos.",
     "tile.manual.title": "📋 Manual / libre",
     "tile.manual.desc": "Elige una receta concreta a mano, o pregunta en inglés llano.",
+    "tile.diagnose.tip": "Empieza aquí cuando tengas un id de modelo concreto y quieras diagnóstico completo: <strong>Profile</strong> corre las 5 recetas a la vez. <strong>Unmask</strong> comprueba si max_position_embeddings es honesto. <strong>NIAH→Reason</strong> predice el gap retrieval-vs-reasoning. <strong>Quant</strong> predice si cuantizar lo romperá. <strong>Inspect</strong> permite pegar config.json crudo para modelos privados / en desarrollo.",
+    "tile.trust.tip": "Cuando ves un score y quieres saber si es real. <strong>Contamination</strong> puntúa 20+ benchmarks por probabilidad de que el modelo los viera en entrenamiento. <strong>Drift</strong> te dice si el gap entre dos evals es ruido numérico o bug real (chat-template mismatch, layout KV-cache, etc.). <strong>Arena CI</strong> reconstruye los intervalos de confianza que Chatbot Arena oculta — muchas \"victorias\" top-Elo están estadísticamente empatadas.",
+    "tile.eval.tip": "Antes de correr lm-eval-harness o vLLM serve, obtén el flag CLI correcto. <strong>Chat-template Sniffer</strong> detecta la familia de template (Llama-3 / ChatML / Mistral / Phi-3 / DeepSeek / Alpaca / custom / none) y emite la invocación exacta <code>--apply_chat_template</code> / <code>--chat-template</code>. Resuelve el issue #1841 de lm-eval-harness (÷2 accuracy silencioso). <strong>Diagnose CLI</strong> genera el comando Python para medir γ_obs en tu GPU local.",
+    "tile.compare.tip": "<strong>Compare</strong>: elige 2-3 modelos candidatos + una receta, ve veredictos en tabla lado a lado (ej. Llama-3-8B vs Mistral-7B a 32k). <strong>Phase diagram</strong>: scatter de 23 modelos empíricos en el plano (log θ, γ), con la curva Padé superpuesta. Hover puntos para detalles, click para cargar ese modelo en la Recipe form.",
+    "tile.manual.tip": "<strong>Recipe</strong>: elige una receta X-N específica (X-1 custom-vs-API, X-2 long context, X-3 budget, X-5 hardware, X-19 compresión KV, X-21 imprint, X-22 compute-context invariant, X-23 IH-phase) y rellena la form a mano para control total. <strong>Ask</strong>: escribe una pregunta libre; un LLM 0.5B (Qwen2.5) en tu navegador elige la receta correcta y la ejecuta. Ideal para exploración \"qué pasaría si...\".",
     "share.import_desc": "¿Tienes un fichero JSON del análisis TAF de alguien? Cárgalo aquí para ver el veredicto + cadena localmente. La misma vista que si lo hubieras ejecutado tú.",
     "share.import_btn": "📂 Cargar JSON compartido",
     "synthesis.system": "Eres un asistente de diagnóstico preciso para LLMs transformer. Dados resultados de fórmulas TAF pre-calculados, escribe un resumen claro en español de 4-6 frases. Cita el número de sección (§X.Y) para cada número que menciones. Da siempre una recomendación concreta. NO inventes números.",
@@ -2149,6 +2159,11 @@ export const TRANSLATIONS = {
     "tile.compare.desc": "Côte à côte, ou explorez le panel empirique de modèles.",
     "tile.manual.title": "📋 Manuel / libre",
     "tile.manual.desc": "Choisissez une recette à la main, ou demandez en langage naturel.",
+    "tile.diagnose.tip": "Commencez ici quand vous avez un id de modèle spécifique et voulez un diagnostic complet : <strong>Profile</strong> lance les 5 recettes d'un coup. <strong>Unmask</strong> vérifie si max_position_embeddings est honnête. <strong>NIAH→Reason</strong> prédit le gap retrieval-vs-reasoning. <strong>Quant</strong> prédit si quantifier va le casser. <strong>Inspect</strong> permet de coller un config.json brut pour modèles privés / en dev.",
+    "tile.trust.tip": "Quand vous voyez un score et voulez savoir s'il est réel. <strong>Contamination</strong> note 20+ benchmarks selon la probabilité que le modèle les ait vus en entraînement. <strong>Drift</strong> vous dit si l'écart entre deux évals est du bruit numérique ou un vrai bug (chat-template mismatch, layout KV-cache, etc.). <strong>Arena CI</strong> reconstruit les intervalles de confiance que Chatbot Arena cache — beaucoup de \"victoires\" top-Elo sont statistiquement à égalité.",
+    "tile.eval.tip": "Avant de lancer lm-eval-harness ou vLLM serve, obtenez le bon flag CLI. <strong>Chat-template Sniffer</strong> détecte la famille de template (Llama-3 / ChatML / Mistral / Phi-3 / DeepSeek / Alpaca / custom / none) et émet l'invocation exacte <code>--apply_chat_template</code> / <code>--chat-template</code>. Résout l'issue #1841 de lm-eval-harness (÷2 accuracy silencieux). <strong>Diagnose CLI</strong> génère la commande Python pour mesurer γ_obs sur votre GPU local.",
+    "tile.compare.tip": "<strong>Compare</strong> : choisissez 2-3 modèles candidats + une recette, voyez les verdicts dans un tableau côte à côte (ex. Llama-3-8B vs Mistral-7B à 32k). <strong>Phase diagram</strong> : nuage de 23 modèles empiriques dans le plan (log θ, γ), avec la courbe Padé superposée. Survolez les points pour détails, cliquez pour charger ce modèle dans le formulaire Recipe.",
+    "tile.manual.tip": "<strong>Recipe</strong> : choisissez une recette X-N spécifique (X-1 custom-vs-API, X-2 long context, X-3 budget, X-5 hardware, X-19 compression KV, X-21 imprint, X-22 compute-context invariant, X-23 IH-phase) et remplissez le formulaire à la main pour contrôle total. <strong>Ask</strong> : tapez une question libre ; un LLM 0.5B (Qwen2.5) dans votre navigateur choisit la bonne recette et la lance. Idéal pour explorer \"que se passerait-il si...\".",
     "share.import_desc": "Vous avez un fichier JSON de l'analyse TAF de quelqu'un ? Chargez-le ici pour voir le verdict + la chaîne localement. La même vue que si vous l'aviez exécuté vous-même.",
     "share.import_btn": "📂 Charger JSON partagé",
     "synthesis.system": "Vous êtes un assistant de diagnostic précis pour LLMs transformer. Étant donné des résultats de formules TAF pré-calculés, écrivez un résumé clair en français de 4-6 phrases. Citez le numéro de section (§X.Y) pour chaque nombre mentionné. Donnez toujours une recommandation concrète. N'INVENTEZ PAS de nombres.",
@@ -2916,6 +2931,11 @@ export const TRANSLATIONS = {
     "tile.compare.desc": "并排,或浏览经验模型面板。",
     "tile.manual.title": "📋 手动 / 自由",
     "tile.manual.desc": "手动挑一个具体 recipe,或用自然语言提问。",
+    "tile.diagnose.tip": "当你有具体的 model id 并想要完整诊断时从这里开始:<strong>Profile</strong> 一次运行所有 5 个 recipe。<strong>Unmask</strong> 检查 max_position_embeddings 是否诚实。<strong>NIAH→Reason</strong> 预测 retrieval-vs-reasoning 的 gap。<strong>Quant</strong> 预测量化是否会破坏它。<strong>Inspect</strong> 允许粘贴原始 config.json,适用于私有 / 在研模型。",
+    "tile.trust.tip": "当你看到一个分数想知道它是否可靠。<strong>Contamination</strong> 按模型在训练时看到 benchmark 的可能性给 20+ 个 benchmark 评级。<strong>Drift</strong> 告诉你两个 eval 之间的 gap 是数值噪声还是真实 bug(chat-template 不匹配、KV-cache 布局等)。<strong>Arena CI</strong> 重建 Chatbot Arena 隐藏的置信区间——很多 top-Elo 的 \"胜利\" 在统计上是并列。",
+    "tile.eval.tip": "在运行 lm-eval-harness 或 vLLM serve 之前,获取正确的 CLI flag。<strong>Chat-template Sniffer</strong> 检测 template 系列(Llama-3 / ChatML / Mistral / Phi-3 / DeepSeek / Alpaca / custom / none)并输出精确的 <code>--apply_chat_template</code> / <code>--chat-template</code> 调用。解决 lm-eval-harness 的 issue #1841(accuracy 静默对半)。<strong>Diagnose CLI</strong> 生成 Python 命令在你的本地 GPU 上测量 γ_obs。",
+    "tile.compare.tip": "<strong>Compare</strong>:选择 2-3 个候选模型 + 一个 recipe,在并排表格中看判定(例如 Llama-3-8B vs Mistral-7B 在 32k 上下文)。<strong>Phase diagram</strong>:23 个经验模型在 (log θ, γ) 平面上的散点图,叠加 Padé 曲线。悬停点查看详情,点击将该模型加载到 Recipe 表单。",
+    "tile.manual.tip": "<strong>Recipe</strong>:挑选具体的 X-N recipe(X-1 自训 vs API、X-2 长上下文、X-3 预算、X-5 硬件、X-19 KV 压缩、X-21 imprint、X-22 compute-context 不变量、X-23 IH 相位)并手动填表,完全控制。<strong>Ask</strong>:输入自由问题;浏览器内的 0.5B LLM(Qwen2.5)选择合适的 recipe 并运行。最适合 \"如果……会怎样\" 的探索。",
     "share.import_desc": "有他人 TAF 分析的 JSON 文件? 在这里加载以本地查看判定 + 链。与您自己运行的视图相同。",
     "share.import_btn": "📂 加载共享的 JSON",
     "synthesis.system": "您是 transformer LLM 的精确诊断助手。给定预先计算的 TAF 公式结果,用 4-6 句中文写出清晰的摘要。为每个提到的数字引用章节号 (§X.Y)。始终给出具体建议。不要编造数字。",