karlexmarin Claude Opus 4.7 (1M context) committed on
Commit 3d389cc · 1 Parent(s): 819758d

v0.8.4 Prompt-Cache Diff Predictor — anti-bullshit pack #10


Provider prompt caches each have different rules:
- Anthropic `cache_control` breaks at first token diff in marked prefix
- OpenAI auto-caches prefixes ≥1024 tokens; invalidates on any change
- Gemini context cache requires ≥32K tokens

A misplaced edit silently 10x's the bill — the API never warns, and the
cost only shows up on the next invoice. No public tool predicts this.
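
The three provider rules above can be sketched as a predictor. This is a hypothetical sketch — function names and the chars/4 heuristic are illustrative, not the actual `js/prompt_cache_diff.js` API; the thresholds come from the provider docs cited below:

```javascript
// Longest common prefix in characters, then a chars-per-token estimate.
function commonPrefixChars(oldP, newP) {
  const n = Math.min(oldP.length, newP.length);
  let i = 0;
  while (i < n && oldP[i] === newP[i]) i++;
  return i;
}

function predictCache(oldP, newP, charsPerToken = 4) {
  const common = Math.floor(commonPrefixChars(oldP, newP) / charsPerToken);
  const total = Math.ceil(newP.length / charsPerToken);
  return {
    commonTokens: common,
    totalTokens: total,
    hitRatio: total ? common / total : 0,
    anthropic: common > 0,   // also needs a cache_control marker on the prefix
    openai: common >= 1024,  // auto-cache only kicks in for prefixes ≥1024 tokens
    gemini: common >= 32768, // context cache minimum ≥32K tokens
  };
}
```

Note the ratio is what survives estimator error: a wrong chars-per-token divisor scales `commonTokens` and `totalTokens` together, so `hitRatio` barely moves.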

🔁 Cache Diff (18th mode):
- Two textareas: paste old + new prompt
- Tokenizer profile selector (English / code / CJK), since shipping
  a real BPE in the browser would mean 5-10MB of WASM. The char-per-token
  heuristic is robust to estimator drift because cache savings are
  a RATIO, not absolute counts.
- Output: per-provider table (Claude Opus 4.7 / Sonnet 4.6 / Haiku
4.5 / GPT-5 / GPT-5 mini / Gemini 2.5 Pro) with hit ratio,
base→cached cost, savings $ + %, TTL note, marker requirement.
- Anthropic 25% write surcharge surfaced as separate row so users
see the amortization picture, not just the steady-state savings.
- Diff visualization: green common prefix + red divergent suffix
side-by-side with first-difference line number.
- Three examples: 99% hit (small Q&A edit) / cache busted (system
prompt edit) / below OpenAI min (short prompt).
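
The three examples map onto the verdict codes the `cache.verdict.*` i18n keys suggest. A hypothetical classifier — the real module's codes and thresholds may differ; 1024 stands in for the smallest provider minimum:

```javascript
// Classify a (commonTokens, totalTokens) pair into one of the five
// verified verdict classes. Illustrative sketch, not the shipped logic.
function verdict(commonTokens, totalTokens) {
  if (totalTokens === 0) return "empty_input";          // nothing pasted
  if (commonTokens === totalTokens) return "identical"; // full cache hit
  if (commonTokens === 0) return "fully_divergent";     // cache busted
  if (commonTokens < 1024) return "divergent_below_min"; // under every provider min
  return "divergent_can_cache";                          // partial hit, varies by provider
}
```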

Pure logic in `js/prompt_cache_diff.js` (codes + params, no human
strings); main.js renders with i18n. 41 i18n keys × 4 langs (EN/ES/FR/
ZH) = 164 keys, parity clean. Help modal v0.8.4 entry + Inventory
anti-bullshit-pack list + "Set up an eval correctly" task tile.

Pricing snapshot 2026-01 baked in with explicit "verify against current
docs" disclaimer in the attribution footer.

Source citations:
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
- https://platform.openai.com/docs/guides/prompt-caching
- https://ai.google.dev/gemini-api/docs/caching

Verified: 5/5 logic cases (identical / small edit / front edit /
below-min / empty) + cost-arithmetic sanity (Anthropic 42% savings on
2K-tok prefix, OpenAI 30%, Gemini correctly rejects below-32K) +
164/164 i18n parity + headless e2e (tab/section/3 examples, providers
visible, below-min note rendered). 19 mode tabs total.
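
The cost-arithmetic sanity check can be reproduced by hand. A sketch assuming Anthropic's published multipliers (cached reads billed at 0.1× the base input price, cache writes at 1.25×); the $3/Mtok price is a placeholder, not the baked-in 2026-01 snapshot:

```javascript
// Steady-state vs first-call cost for a cached prefix of prefixTok tokens
// followed by suffixTok uncached tokens. Multipliers per Anthropic's
// documented pricing model; price is a placeholder.
function anthropicCost(prefixTok, suffixTok, pricePerMtok = 3.0) {
  const p = pricePerMtok / 1e6;
  const base = (prefixTok + suffixTok) * p;             // no caching
  const steady = (0.1 * prefixTok + suffixTok) * p;     // repeated cache hits
  const firstCall = (1.25 * prefixTok + suffixTok) * p; // 25% write surcharge
  return { base, steady, firstCall, savingsPct: 100 * (1 - steady / base) };
}
```

The surcharge row exists precisely because `firstCall > base`: caching only pays off once the prefix is re-read, which is the amortization picture the separate row shows.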

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (4)
  1. index.html +46 -0
  2. js/i18n.js +172 -0
  3. js/main.js +199 -1
  4. js/prompt_cache_diff.js +308 -0
index.html CHANGED
@@ -222,6 +222,9 @@
  <p><strong data-i18n="help.v083.peft.title">🔧 PEFT Anti-Pattern Checker</strong></p>
  <p data-i18n="help.v083.peft.body">PEFT's <code>get_peft_model(base, config)</code> creates a FRESH adapter — it does not load saved weights from a path. Users who paste tutorial code and try to resume from a checkpoint silently throw away their training. peft #2115 has the canonical bug report. This linter scans your training script for the pattern + 3 related issues (QLoRA ordering, target_modules/arch mismatch, lora_alpha ratio) and reports findings with line numbers and suggested fixes. <em>Use case</em>: before you launch a 10-hour LoRA fine-tune, paste your script — catch the silent bugs in 200ms.</p>

+ <p><strong data-i18n="help.v084.cache.title">🔁 Prompt-Cache Diff Predictor</strong></p>
+ <p data-i18n="help.v084.cache.body">Provider prompt caches each have different rules: Anthropic's <code>cache_control</code> breaks at the first token diff in the marked prefix; OpenAI auto-caches prefixes ≥1024 tokens; Gemini context caches require ≥32K tokens. A misplaced edit silently 10x's your bill — the API never warns you, and the cost only shows up on the next invoice. Paste old + new prompt, the predictor finds the longest common prefix, estimates tokens with three tokenizer profiles (English / code / CJK), and shows per-provider hit ratio + $ delta vs no-cache for Claude Opus/Sonnet/Haiku, GPT-5/mini, and Gemini 2.5 Pro. <em>Use case</em>: 'I tweaked the system prompt and the bill jumped — what broke?' → paste both prompts, see exactly which provider stopped caching.</p>
+
  <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
  <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>

@@ -336,6 +339,7 @@
  <li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
  <li data-i18n="inv.v082.cot"><strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.</li>
  <li data-i18n="inv.v083.peft"><strong>🔧 PEFT Lint</strong> — catches the silent <code>get_peft_model</code> base-load (peft #2115) + QLoRA order + target_modules / arch mismatch.</li>
+ <li data-i18n="inv.v084.cache"><strong>🔁 Cache Diff</strong> — predicts whether a prompt edit invalidated the provider's prompt cache. Per-provider hit ratio + $ delta.</li>
  <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
  </ul>
  </details>
@@ -409,6 +413,7 @@
  <button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
  <button data-mode-link="cot" data-i18n="modes.cot">📋 JSON CoT</button>
  <button data-mode-link="peft" data-i18n="modes.peft">🔧 PEFT Lint</button>
+ <button data-mode-link="cache" data-i18n="modes.cache">🔁 Cache Diff</button>
  </div>
  </div>
  <div class="task-tile">
@@ -467,6 +472,7 @@
  <button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
  <button class="mode-btn" data-mode="cot" role="tab" aria-selected="false" data-i18n="modes.cot">📋 JSON CoT</button>
  <button class="mode-btn" data-mode="peft" role="tab" aria-selected="false" data-i18n="modes.peft">🔧 PEFT Lint</button>
+ <button class="mode-btn" data-mode="cache" role="tab" aria-selected="false" data-i18n="modes.cache">🔁 Cache Diff</button>
  <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
  </div>
  <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
@@ -1061,6 +1067,46 @@
  <div id="peft-output" style="margin-top: 1em;"></div>
  </section>

+ <!-- Prompt-Cache Diff Predictor (mode=cache, v0.8.4 anti-bullshit pack #10) -->
+ <section id="cache-section" style="display:none;">
+ <h2><span data-i18n="cache.title">🔁 Prompt-Cache Diff Predictor</span>
+ <span class="info"><span class="tooltip" data-i18n="cache.tip">
+ <strong>Why this matters</strong>: Anthropic's `cache_control` cache breaks at the first token diff in the marked prefix. OpenAI auto-caches prefixes ≥1024 tokens but invalidates on any change. Gemini context cache requires ≥32K tokens. A misplaced edit silently 10x's your bill — and the API never warns you. Paste old + new prompt, see per-provider hit ratio + cost delta.
+ </span></span>
+ </h2>
+ <p class="recipe-desc" data-i18n="cache.desc">
+ <strong>Don't 10x your bill on a one-character edit.</strong> Paste your previous and current prompt — the predictor finds the longest common prefix, estimates tokens, and shows per-provider cache hit ratio + $ delta vs no-cache.
+ </p>
+ <div class="form-row" style="display:flex; gap:1em; flex-wrap:wrap;">
+ <div style="flex:1; min-width:300px;">
+ <label for="cache-old" data-i18n="cache.old_label">Old prompt:</label>
+ <textarea id="cache-old" rows="10" style="width:100%;font-family:monospace;font-size:0.85em;" data-i18n-placeholder="cache.old.placeholder" placeholder="You are a helpful assistant. …"></textarea>
+ </div>
+ <div style="flex:1; min-width:300px;">
+ <label for="cache-new" data-i18n="cache.new_label">New prompt:</label>
+ <textarea id="cache-new" rows="10" style="width:100%;font-family:monospace;font-size:0.85em;" data-i18n-placeholder="cache.new.placeholder" placeholder="You are a helpful assistant. …"></textarea>
+ </div>
+ </div>
+ <div class="form-row">
+ <label for="cache-profile" data-i18n="cache.profile_label">Tokenizer profile:</label>
+ <select id="cache-profile">
+ <option value="english" data-i18n="cache.profile.english">English (chars/4)</option>
+ <option value="code" data-i18n="cache.profile.code">Code (chars/3.5)</option>
+ <option value="mixed" data-i18n="cache.profile.mixed">CJK / Cyrillic (chars/2)</option>
+ </select>
+ <label for="cache-output-tokens" data-i18n="cache.output_label">Estimated output tokens:</label>
+ <input type="number" id="cache-output-tokens" value="500" min="0" max="100000" style="width:8em;" />
+ </div>
+ <div class="form-row">
+ <button type="button" id="cache-diff-btn" data-i18n="cache.diff_btn">🔍 Predict</button>
+ <button type="button" id="cache-example-good-btn" class="secondary" data-i18n="cache.example_good_btn">↳ Example: 99% hit</button>
+ <button type="button" id="cache-example-broken-btn" class="secondary" data-i18n="cache.example_broken_btn">↳ Example: cache busted</button>
+ <button type="button" id="cache-example-belowmin-btn" class="secondary" data-i18n="cache.example_belowmin_btn">↳ Example: below OpenAI min</button>
+ </div>
+ <p id="cache-status" class="recipe-desc" style="font-size:0.92em;"></p>
+ <div id="cache-output" style="margin-top: 1em;"></div>
+ </section>
+
  <section id="hub-section" style="display:none;">
  <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
  <span class="info"><span class="tooltip" data-i18n="hub.tip">
js/i18n.js CHANGED
@@ -594,6 +594,49 @@ export const TRANSLATIONS = {
594
  "help.v083.peft.title": "🔧 PEFT Anti-Pattern Checker",
595
  "help.v083.peft.body": "PEFT's <code>get_peft_model(base, config)</code> creates a FRESH adapter — it does not load saved weights from a path. Users who paste tutorial code and try to resume from a checkpoint silently throw away their training. peft #2115 has the canonical bug report. This linter scans your training script for the pattern + 3 related issues (QLoRA ordering, target_modules/arch mismatch, lora_alpha ratio) and reports findings with line numbers and suggested fixes. <em>Use case</em>: before you launch a 10-hour LoRA fine-tune, paste your script — catch the silent bugs in 200ms.",
596
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
597
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
598
  "help.v081.hub.title": "🧭 Solutions Hub",
599
  "help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
@@ -1647,6 +1690,49 @@ export const TRANSLATIONS = {
1647
  "help.v083.peft.title": "🔧 Verificador de anti-patrones PEFT",
1648
  "help.v083.peft.body": "El <code>get_peft_model(base, config)</code> de PEFT crea un adapter NUEVO — no carga pesos guardados desde una ruta. Quien pega código de tutorial e intenta reanudar desde un checkpoint tira silenciosamente su entrenamiento. peft #2115 tiene el bug report canónico. Este linter escanea tu script buscando el patrón + 3 issues relacionados (orden QLoRA, mismatch target_modules/arch, ratio lora_alpha) y reporta hallazgos con números de línea y sugerencias. <em>Caso de uso</em>: antes de lanzar un fine-tune LoRA de 10 horas, pega tu script — atrapa los bugs silenciosos en 200ms.",
1649
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1650
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
1651
  "help.v081.hub.title": "🧭 Solutions Hub",
1652
  "help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
@@ -2564,6 +2650,49 @@ export const TRANSLATIONS = {
2564
  "help.v083.peft.title": "🔧 Vérificateur d'anti-patterns PEFT",
2565
  "help.v083.peft.body": "Le <code>get_peft_model(base, config)</code> de PEFT crée un NOUVEL adaptateur — il ne charge pas les poids sauvegardés depuis un chemin. Quiconque colle du code de tuto et essaie de reprendre depuis un checkpoint jette silencieusement son entraînement. peft #2115 contient le bug report canonique. Ce linter scanne votre script à la recherche du pattern + 3 problèmes liés (ordre QLoRA, mismatch target_modules/arch, ratio lora_alpha) et rapporte les découvertes avec numéros de ligne et corrections suggérées. <em>Cas d'usage</em> : avant de lancer un fine-tune LoRA de 10 heures, collez votre script — attrapez les bugs silencieux en 200ms.",
2566
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2567
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
2568
  "help.v081.hub.title": "🧭 Solutions Hub",
2569
  "help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
@@ -3481,6 +3610,49 @@ export const TRANSLATIONS = {
3481
  "help.v083.peft.title": "🔧 PEFT 反模式检查器",
3482
  "help.v083.peft.body": "PEFT 的 <code>get_peft_model(base, config)</code> 创建一个新的 adapter——它不从路径加载已保存的权重。粘贴教程代码并尝试从 checkpoint 恢复的人会静默地丢掉训练。peft #2115 是规范的 bug 报告。这个 linter 扫描你的脚本查找该模式 + 3 个相关问题(QLoRA 顺序、target_modules/架构不匹配、lora_alpha 比率),并报告带行号和建议修复的发现。<em>用例</em>:在启动 10 小时的 LoRA fine-tune 之前,粘贴你的脚本——在 200ms 内捕获静默 bug。",
3483
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3484
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
3485
  "help.v081.hub.title": "🧭 Solutions Hub",
3486
  "help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
 
594
  "help.v083.peft.title": "🔧 PEFT Anti-Pattern Checker",
595
  "help.v083.peft.body": "PEFT's <code>get_peft_model(base, config)</code> creates a FRESH adapter — it does not load saved weights from a path. Users who paste tutorial code and try to resume from a checkpoint silently throw away their training. peft #2115 has the canonical bug report. This linter scans your training script for the pattern + 3 related issues (QLoRA ordering, target_modules/arch mismatch, lora_alpha ratio) and reports findings with line numbers and suggested fixes. <em>Use case</em>: before you launch a 10-hour LoRA fine-tune, paste your script — catch the silent bugs in 200ms.",
596
 
597
+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
598
+ "modes.cache": "🔁 Cache Diff",
599
+ "mode_desc.cache": "Predicts whether a prompt edit kept the provider's prompt cache alive or invalidated it. Per-provider hit ratio + $ delta vs no-cache.",
600
+ "cache.title": "🔁 Prompt-Cache Diff Predictor",
601
+ "cache.tip": "Anthropic's <code>cache_control</code> cache breaks at the first token diff in the marked prefix. OpenAI auto-caches prefixes ≥1024 tokens but invalidates on any change. Gemini context cache requires ≥32K tokens. A misplaced edit silently 10x's your bill — and the API never warns you. Paste old + new prompt, see per-provider hit ratio + cost delta.",
602
+ "cache.desc": "<strong>Don't 10x your bill on a one-character edit.</strong> Paste your previous and current prompt — the predictor finds the longest common prefix, estimates tokens, and shows per-provider cache hit ratio + $ delta vs no-cache.",
603
+ "cache.old_label": "Old prompt:",
604
+ "cache.new_label": "New prompt:",
605
+ "cache.old.placeholder": "You are a helpful assistant. …",
606
+ "cache.new.placeholder": "You are a helpful assistant. …",
607
+ "cache.profile_label": "Tokenizer profile:",
608
+ "cache.profile.english": "English (chars/4)",
609
+ "cache.profile.code": "Code (chars/3.5)",
610
+ "cache.profile.mixed": "CJK / Cyrillic (chars/2)",
611
+ "cache.output_label": "Estimated output tokens:",
612
+ "cache.diff_btn": "🔍 Predict",
613
+ "cache.example_good_btn": "↳ Example: 99% hit",
614
+ "cache.example_broken_btn": "↳ Example: cache busted",
615
+ "cache.example_belowmin_btn": "↳ Example: below OpenAI min",
616
+ "cache.status.done": "✅ {verdict} — {hit}% theoretical hit",
617
+ "cache.verdict.identical": "✅ Identical — full cache hit",
618
+ "cache.verdict.divergent_can_cache":"⚠ Partial cache hit — providers vary",
619
+ "cache.verdict.divergent_below_min":"❌ Below all provider minimums — no caching possible",
620
+ "cache.verdict.fully_divergent": "❌ Fully divergent — cache invalidated",
621
+ "cache.verdict.empty_input": "ℹ Empty input",
622
+ "cache.summary.tokens": "Common prefix {common} / {total} tokens ({pct}% theoretical hit ratio).",
623
+ "cache.summary.diff_at": "First difference at line {line}.",
624
+ "cache.col.provider": "Provider",
625
+ "cache.col.hit": "Hit",
626
+ "cache.col.cost": "Base → cached",
627
+ "cache.col.savings": "Savings",
628
+ "cache.note.requires_marker": "(requires cache_control marker)",
629
+ "cache.note.below_min": "(prefix < {min} tokens — provider min)",
630
+ "cache.write_surcharge": "+ {cost} cache-write surcharge first time (Anthropic)",
631
+ "cache.diff.title": "Where the cache breaks",
632
+ "cache.diff.legend": "Green = shared prefix (cacheable). Red = first edit (everything from here is re-billed).",
633
+ "cache.hint.empty": "Paste two prompts, then Predict.",
634
+ "cache.attribution": "Refs:",
635
+ "cache.attribution.snapshot": "Prices snapshot 2026-01; verify against current provider docs before acting on $.",
636
+ "inv.v084.cache": "<strong>🔁 Cache Diff</strong> — predicts whether a prompt edit invalidated the provider's prompt cache. Per-provider hit ratio + $ delta.",
637
+ "help.v084.cache.title": "🔁 Prompt-Cache Diff Predictor",
638
+ "help.v084.cache.body": "Provider prompt caches each have different rules: Anthropic's <code>cache_control</code> breaks at the first token diff in the marked prefix; OpenAI auto-caches prefixes ≥1024 tokens; Gemini context caches require ≥32K tokens. A misplaced edit silently 10x's your bill — the API never warns you, and the cost only shows up on the next invoice. Paste old + new prompt, the predictor finds the longest common prefix, estimates tokens with three tokenizer profiles (English / code / CJK), and shows per-provider hit ratio + $ delta vs no-cache for Claude Opus/Sonnet/Haiku, GPT-5/mini, and Gemini 2.5 Pro. <em>Use case</em>: 'I tweaked the system prompt and the bill jumped — what broke?' → paste both prompts, see exactly which provider stopped caching.",
639
+
640
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
641
  "help.v081.hub.title": "🧭 Solutions Hub",
642
  "help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
 
1690
  "help.v083.peft.title": "🔧 Verificador de anti-patrones PEFT",
1691
  "help.v083.peft.body": "El <code>get_peft_model(base, config)</code> de PEFT crea un adapter NUEVO — no carga pesos guardados desde una ruta. Quien pega código de tutorial e intenta reanudar desde un checkpoint tira silenciosamente su entrenamiento. peft #2115 tiene el bug report canónico. Este linter escanea tu script buscando el patrón + 3 issues relacionados (orden QLoRA, mismatch target_modules/arch, ratio lora_alpha) y reporta hallazgos con números de línea y sugerencias. <em>Caso de uso</em>: antes de lanzar un fine-tune LoRA de 10 horas, pega tu script — atrapa los bugs silenciosos en 200ms.",
1692
 
1693
+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
1694
+ "modes.cache": "🔁 Cache Diff",
1695
+ "mode_desc.cache": "Predice si una edición del prompt mantuvo viva la prompt cache del proveedor o la invalidó. Hit ratio por proveedor + delta $ vs sin caché.",
1696
+ "cache.title": "🔁 Predictor de Diff de Prompt-Cache",
1697
+ "cache.tip": "El <code>cache_control</code> de Anthropic se rompe al primer token diferente del prefijo marcado. OpenAI auto-cachea prefijos ≥1024 tokens pero invalida ante cualquier cambio. La context cache de Gemini requiere ≥32K tokens. Una edición mal puesta silenciosamente 10x tu factura — y la API nunca avisa. Pega prompt viejo + nuevo, ve el hit ratio por proveedor + delta de coste.",
1698
+ "cache.desc": "<strong>No 10x tu factura por un edit de un carácter.</strong> Pega tu prompt anterior y el actual — el predictor halla el prefijo común más largo, estima tokens, y muestra hit ratio por proveedor + delta $ vs sin caché.",
1699
+ "cache.old_label": "Prompt viejo:",
1700
+ "cache.new_label": "Prompt nuevo:",
1701
+ "cache.old.placeholder": "Eres un asistente útil. …",
1702
+ "cache.new.placeholder": "Eres un asistente útil. …",
1703
+ "cache.profile_label": "Perfil de tokenizer:",
1704
+ "cache.profile.english": "Inglés (chars/4)",
1705
+ "cache.profile.code": "Código (chars/3.5)",
1706
+ "cache.profile.mixed": "CJK / Cirílico (chars/2)",
1707
+ "cache.output_label": "Tokens de salida estimados:",
1708
+ "cache.diff_btn": "🔍 Predecir",
1709
+ "cache.example_good_btn": "↳ Ejemplo: hit 99%",
1710
+ "cache.example_broken_btn": "↳ Ejemplo: caché rota",
1711
+ "cache.example_belowmin_btn": "↳ Ejemplo: bajo mínimo OpenAI",
1712
+ "cache.status.done": "✅ {verdict} — {hit}% hit teórico",
1713
+ "cache.verdict.identical": "✅ Idénticos — hit completo",
1714
+ "cache.verdict.divergent_can_cache":"⚠ Hit parcial — varía por proveedor",
1715
+ "cache.verdict.divergent_below_min":"❌ Por debajo de mínimos — no hay caché posible",
1716
+ "cache.verdict.fully_divergent": "❌ Totalmente divergentes — caché invalidada",
1717
+ "cache.verdict.empty_input": "ℹ Entrada vacía",
1718
+ "cache.summary.tokens": "Prefijo común {common} / {total} tokens ({pct}% hit ratio teórico).",
1719
+ "cache.summary.diff_at": "Primera diferencia en la línea {line}.",
1720
+ "cache.col.provider": "Proveedor",
1721
+ "cache.col.hit": "Hit",
1722
+ "cache.col.cost": "Base → cached",
1723
+ "cache.col.savings": "Ahorro",
1724
+ "cache.note.requires_marker": "(requiere marcador cache_control)",
1725
+ "cache.note.below_min": "(prefijo < {min} tokens — mínimo del proveedor)",
1726
+ "cache.write_surcharge": "+ {cost} sobrecargo de cache-write la primera vez (Anthropic)",
1727
+ "cache.diff.title": "Dónde se rompe la caché",
1728
+ "cache.diff.legend": "Verde = prefijo compartido (cacheable). Rojo = primera edición (todo desde aquí se re-factura).",
1729
+ "cache.hint.empty": "Pega dos prompts, luego Predecir.",
1730
+ "cache.attribution": "Referencias:",
1731
+ "cache.attribution.snapshot": "Precios snapshot 2026-01; verifica con la doc actual del proveedor antes de actuar sobre $.",
1732
+ "inv.v084.cache": "<strong>🔁 Cache Diff</strong> — predice si un edit del prompt invalidó la prompt cache del proveedor. Hit ratio por proveedor + delta $.",
1733
+ "help.v084.cache.title": "🔁 Predictor de Diff de Prompt-Cache",
1734
+ "help.v084.cache.body": "Las prompt caches de cada proveedor tienen reglas distintas: el <code>cache_control</code> de Anthropic se rompe al primer token diferente del prefijo marcado; OpenAI auto-cachea prefijos ≥1024 tokens; las context caches de Gemini requieren ≥32K tokens. Una edición mal puesta silenciosamente 10x tu factura — la API no avisa, y el coste solo aparece en la siguiente factura. Pega prompt viejo + nuevo, el predictor halla el prefijo común más largo, estima tokens con tres perfiles de tokenizer (inglés / código / CJK), y muestra hit ratio por proveedor + delta $ vs sin caché para Claude Opus/Sonnet/Haiku, GPT-5/mini, y Gemini 2.5 Pro. <em>Caso de uso</em>: 'Tweaké el system prompt y la factura saltó — ¿qué se rompió?' → pega ambos prompts, ve exactamente qué proveedor dejó de cachear.",
1735
+
1736
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
1737
  "help.v081.hub.title": "🧭 Solutions Hub",
1738
  "help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
 
  "help.v083.peft.title": "🔧 Vérificateur d'anti-patterns PEFT",
  "help.v083.peft.body": "Le <code>get_peft_model(base, config)</code> de PEFT crée un NOUVEL adaptateur — il ne charge pas les poids sauvegardés depuis un chemin. Quiconque colle du code de tuto et essaie de reprendre depuis un checkpoint jette silencieusement son entraînement. peft #2115 contient le bug report canonique. Ce linter scanne votre script à la recherche du pattern + 3 problèmes liés (ordre QLoRA, mismatch target_modules/arch, ratio lora_alpha) et rapporte les découvertes avec numéros de ligne et corrections suggérées. <em>Cas d'usage</em> : avant de lancer un fine-tune LoRA de 10 heures, collez votre script — attrapez les bugs silencieux en 200ms.",

+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
+ "modes.cache": "🔁 Cache Diff",
+ "mode_desc.cache": "Prédit si une édition du prompt a gardé le cache prompt du fournisseur vivant ou l'a invalidé. Taux de hit par fournisseur + delta $ vs sans cache.",
+ "cache.title": "🔁 Prédicteur de Diff Prompt-Cache",
+ "cache.tip": "Le <code>cache_control</code> d'Anthropic casse au premier token différent du préfixe marqué. OpenAI auto-cache les préfixes ≥1024 tokens mais invalide à tout changement. Le context cache Gemini requiert ≥32K tokens. Une édition mal placée 10x silencieusement votre facture — et l'API ne prévient jamais. Collez ancien + nouveau prompt, voyez le taux de hit par fournisseur + delta de coût.",
+ "cache.desc": "<strong>Ne 10x pas votre facture sur une édition d'un caractère.</strong> Collez votre prompt précédent et actuel — le prédicteur trouve le plus long préfixe commun, estime les tokens, et montre le taux de hit par fournisseur + delta $ vs sans cache.",
+ "cache.old_label": "Ancien prompt :",
+ "cache.new_label": "Nouveau prompt :",
+ "cache.old.placeholder": "Vous êtes un assistant utile. …",
+ "cache.new.placeholder": "Vous êtes un assistant utile. …",
+ "cache.profile_label": "Profil de tokenizer :",
+ "cache.profile.english": "Anglais (chars/4)",
+ "cache.profile.code": "Code (chars/3.5)",
+ "cache.profile.mixed": "CJK / Cyrillique (chars/2)",
+ "cache.output_label": "Tokens de sortie estimés :",
+ "cache.diff_btn": "🔍 Prédire",
+ "cache.example_good_btn": "↳ Exemple : 99% hit",
+ "cache.example_broken_btn": "↳ Exemple : cache cassé",
+ "cache.example_belowmin_btn": "↳ Exemple : sous le minimum OpenAI",
+ "cache.status.done": "✅ {verdict} — {hit}% hit théorique",
+ "cache.verdict.identical": "✅ Identiques — hit complet",
+ "cache.verdict.divergent_can_cache":"⚠ Hit partiel — varie selon fournisseur",
+ "cache.verdict.divergent_below_min":"❌ En dessous des minimums — pas de cache possible",
+ "cache.verdict.fully_divergent": "❌ Totalement divergents — cache invalidé",
+ "cache.verdict.empty_input": "ℹ Entrée vide",
+ "cache.summary.tokens": "Préfixe commun {common} / {total} tokens (taux de hit théorique {pct}%).",
+ "cache.summary.diff_at": "Première différence à la ligne {line}.",
+ "cache.col.provider": "Fournisseur",
+ "cache.col.hit": "Hit",
+ "cache.col.cost": "Base → cached",
+ "cache.col.savings": "Économies",
+ "cache.note.requires_marker": "(nécessite le marqueur cache_control)",
+ "cache.note.below_min": "(préfixe < {min} tokens — min du fournisseur)",
+ "cache.write_surcharge": "+ {cost} surcharge cache-write la première fois (Anthropic)",
+ "cache.diff.title": "Où le cache casse",
+ "cache.diff.legend": "Vert = préfixe partagé (cacheable). Rouge = première édition (tout à partir d'ici est re-facturé).",
+ "cache.hint.empty": "Collez deux prompts, puis Prédire.",
+ "cache.attribution": "Réfs :",
+ "cache.attribution.snapshot": "Prix snapshot 2026-01 ; vérifiez avec la doc actuelle du fournisseur avant d'agir sur $.",
+ "inv.v084.cache": "<strong>🔁 Cache Diff</strong> — prédit si une édition du prompt a invalidé le cache prompt du fournisseur. Taux de hit par fournisseur + delta $.",
+ "help.v084.cache.title": "🔁 Prédicteur de Diff Prompt-Cache",
+ "help.v084.cache.body": "Les caches prompt de chaque fournisseur ont des règles différentes : le <code>cache_control</code> d'Anthropic casse au premier token différent du préfixe marqué ; OpenAI auto-cache les préfixes ≥1024 tokens ; les context caches Gemini requièrent ≥32K tokens. Une édition mal placée 10x silencieusement votre facture — l'API ne prévient pas, et le coût n'apparaît qu'à la facture suivante. Collez ancien + nouveau prompt, le prédicteur trouve le plus long préfixe commun, estime les tokens avec trois profils de tokenizer (anglais / code / CJK), et montre le taux de hit par fournisseur + delta $ vs sans cache pour Claude Opus/Sonnet/Haiku, GPT-5/mini, et Gemini 2.5 Pro. <em>Cas d'usage</em> : 'J'ai modifié le system prompt et la facture a sauté — qu'est-ce qui a cassé ?' → collez les deux prompts, voyez exactement quel fournisseur a arrêté de cacher.",
+
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
  "help.v081.hub.title": "🧭 Solutions Hub",
  "help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
 
  "help.v083.peft.title": "🔧 PEFT 反模式检查器",
  "help.v083.peft.body": "PEFT 的 <code>get_peft_model(base, config)</code> 创建一个新的 adapter——它不从路径加载已保存的权重。粘贴教程代码并尝试从 checkpoint 恢复的人会静默地丢掉训练。peft #2115 是规范的 bug 报告。这个 linter 扫描你的脚本查找该模式 + 3 个相关问题(QLoRA 顺序、target_modules/架构不匹配、lora_alpha 比率),并报告带行号和建议修复的发现。<em>用例</em>:在启动 10 小时的 LoRA fine-tune 之前,粘贴你的脚本——在 200ms 内捕获静默 bug。",

+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
+ "modes.cache": "🔁 缓存差异",
+ "mode_desc.cache": "预测 prompt 编辑是否保留了提供商的 prompt cache 还是使其失效。每个提供商的命中率 + 与无缓存的 $ 差额。",
+ "cache.title": "🔁 Prompt-Cache 差异预测器",
+ "cache.tip": "Anthropic 的 <code>cache_control</code> 缓存在标记前缀的第一个 token 差异处中断。OpenAI 自动缓存 ≥1024 token 的前缀,但任何更改都会使其失效。Gemini context cache 需要 ≥32K token。位置不当的编辑会悄悄使你的账单 10 倍——API 永远不会警告。粘贴新旧 prompt,查看每个提供商的命中率 + 成本差额。",
+ "cache.desc": "<strong>不要因一个字符的编辑使账单 10 倍。</strong> 粘贴你之前和当前的 prompt——预测器找到最长公共前缀,估算 token,并显示每个提供商的命中率 + 与无缓存的 $ 差额。",
+ "cache.old_label": "旧 prompt:",
+ "cache.new_label": "新 prompt:",
+ "cache.old.placeholder": "你是一个有帮助的助手。…",
+ "cache.new.placeholder": "你是一个有帮助的助手。…",
+ "cache.profile_label": "Tokenizer 配置:",
+ "cache.profile.english": "英语(chars/4)",
+ "cache.profile.code": "代码(chars/3.5)",
+ "cache.profile.mixed": "中日韩 / 西里尔(chars/2)",
+ "cache.output_label": "估计输出 token:",
+ "cache.diff_btn": "🔍 预测",
+ "cache.example_good_btn": "↳ 示例:99% 命中",
+ "cache.example_broken_btn": "↳ 示例:缓存失效",
+ "cache.example_belowmin_btn": "↳ 示例:低于 OpenAI 最小值",
+ "cache.status.done": "✅ {verdict} — {hit}% 理论命中",
+ "cache.verdict.identical": "✅ 完全相同——完整命中",
+ "cache.verdict.divergent_can_cache":"⚠ 部分命中——按提供商不同",
+ "cache.verdict.divergent_below_min":"❌ 低于所有提供商最小值——无法缓存",
+ "cache.verdict.fully_divergent": "❌ 完全不同——缓存失效",
+ "cache.verdict.empty_input": "ℹ 空输入",
+ "cache.summary.tokens": "公共前缀 {common} / {total} token({pct}% 理论命中率)。",
+ "cache.summary.diff_at": "第一个差异在第 {line} 行。",
+ "cache.col.provider": "提供商",
+ "cache.col.hit": "命中",
+ "cache.col.cost": "基础 → 缓存",
+ "cache.col.savings": "节省",
+ "cache.note.requires_marker": "(需要 cache_control 标记)",
+ "cache.note.below_min": "(前缀 < {min} token——提供商最小值)",
+ "cache.write_surcharge": "+ {cost} 首次缓存写入附加费(Anthropic)",
+ "cache.diff.title": "缓存在哪里中断",
+ "cache.diff.legend": "绿色 = 共享前缀(可缓存)。红色 = 首次编辑(从这里开始全部重新计费)。",
+ "cache.hint.empty": "粘贴两个 prompt,然后预测。",
+ "cache.attribution": "参考:",
+ "cache.attribution.snapshot": "价格快照 2026-01;在按 $ 行动前请用提供商当前文档验证。",
+ "inv.v084.cache": "<strong>🔁 缓存差异</strong> — 预测 prompt 编辑是否使提供商的 prompt cache 失效。每个提供商的命中率 + $ 差额。",
+ "help.v084.cache.title": "🔁 Prompt-Cache 差异预测器",
+ "help.v084.cache.body": "每个提供商的 prompt cache 有不同规则:Anthropic 的 <code>cache_control</code> 在标记前缀的第一个 token 差异处中断;OpenAI 自动缓存 ≥1024 token 的前缀;Gemini context cache 需要 ≥32K token。位置不当的编辑会悄悄使你的账单 10 倍——API 不会警告,成本只在下张账单上出现。粘贴新旧 prompt,预测器找到最长公共前缀,用三种 tokenizer 配置(英语/代码/CJK)估算 token,并显示每个提供商的命中率 + 与无缓存的 $ 差额,包括 Claude Opus/Sonnet/Haiku、GPT-5/mini 和 Gemini 2.5 Pro。<em>用例</em>:『我调整了 system prompt 后账单暴涨——什么坏了?』→ 粘贴两个 prompt,看到底哪个提供商停止缓存。",
+
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
  "help.v081.hub.title": "🧭 Solutions Hub",
  "help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
js/main.js CHANGED
@@ -29,6 +29,7 @@ import {
  } from "./solutions_hub.js";
  import { lintJsonCot, reorderJsonText, classifyFieldName } from "./json_cot_linter.js";
  import { lintPeftCode, ARCH_TARGET_MODULES } from "./peft_anti_pattern.js";
+ import { diffPromptCache, PROVIDERS as CACHE_PROVIDERS } from "./prompt_cache_diff.js";

  // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
  // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -220,6 +221,7 @@ document.addEventListener("click", (e) => {
    saturation: "saturation-section",
    cot: "cot-section",
    peft: "peft-section",
+   cache: "cache-section",
    hub: "hub-section",
  }[targetMode];
  if (sectionId) {
@@ -245,7 +247,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
   "diagnose-section", "phase-section", "unmask-section",
   "template-section", "arena-section", "contam-section",
   "quant-section", "drift-section", "niah-section",
-  "saturation-section", "cot-section", "peft-section", "hub-section"].forEach(id => {
+  "saturation-section", "cot-section", "peft-section", "cache-section", "hub-section"].forEach(id => {
    const el = $(id);
    if (el) el.style.display = "none";
  });
@@ -259,6 +261,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
    saturation: "saturation-section",
    cot: "cot-section",
    peft: "peft-section",
+   cache: "cache-section",
    hub: "hub-section",
  };
  const sectionId = sectionMap[mode];
@@ -268,6 +271,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
  if (mode === "saturation") initSaturation();
  if (mode === "cot") initCot();
  if (mode === "peft") initPeft();
+ if (mode === "cache") initCacheDiff();
  if (mode === "hub") initHub();
  });
});
@@ -3712,6 +3716,200 @@ $("peft-example-clean-btn")?.addEventListener("click", () => {
    runPeftLint();
  });

+ // ════════════════════════════════════════════════════════════════════
+ // 🔁 Prompt-Cache Diff Predictor (v0.8.4 anti-bullshit pack #10)
+ // ════════════════════════════════════════════════════════════════════
+ const CACHE_VERDICT_BG = {
+   identical: "#3fb950",
+   divergent_can_cache: "#d29922",
+   divergent_below_min: "#f0883e",
+   fully_divergent: "#f85149",
+   empty_input: "#8b949e",
+ };
+
+ let __cacheInited = false;
+
+ function initCacheDiff() {
+   if (__cacheInited) return;
+   __cacheInited = true;
+   // No-op (no async data); placeholder kept for symmetry.
+ }
+
+ function fmtUsd(n) {
+   if (n == null || isNaN(n)) return "—";
+   if (n === 0) return "$0";
+   if (n < 0.01) return `$${n.toFixed(6)}`;
+   if (n < 1) return `$${n.toFixed(4)}`;
+   return `$${n.toFixed(2)}`;
+ }
+
+ function fmtPct(n) {
+   if (n == null || isNaN(n)) return "—";
+   return `${Math.round(n * 100)}%`;
+ }
+
+ function renderCacheProvider(p) {
+   const bgRow = p.reason === "below_min" ? "#21262d" : "#161b22";
+   const noteHtml = [];
+   if (p.requires_explicit && p.reason !== "below_min") {
+     noteHtml.push(`<span class="subtle" style="font-size:0.8em;">${t("cache.note.requires_marker") || "(requires cache_control marker)"}</span>`);
+   }
+   if (p.reason === "below_min") {
+     noteHtml.push(`<span class="subtle" style="font-size:0.8em;color:#f0883e;">${tFmt("cache.note.below_min", { min: p.min_cache_tokens.toLocaleString() }) || `(prefix < ${p.min_cache_tokens.toLocaleString()} tokens — provider min)`}</span>`);
+   }
+   const noteCell = noteHtml.length ? `<br>${noteHtml.join(" ")}` : "";
+
+   const ttlMin = p.cache_ttl_seconds >= 3600
+     ? `${Math.round(p.cache_ttl_seconds / 3600)}h`
+     : `${Math.round(p.cache_ttl_seconds / 60)}min`;
+
+   const savingsColor = p.savings_usd > 0 ? "#3fb950" : (p.reason ? "#8b949e" : "#d29922");
+   const writeRow = p.cache_write_surcharge_usd && p.cache_write_surcharge_usd > 0
+     ? `<tr style="background:${bgRow};"><td colspan="4" class="subtle" style="font-size:0.8em;padding-left:1em;">${tFmt("cache.write_surcharge", { cost: fmtUsd(p.cache_write_surcharge_usd) }) || `+ ${fmtUsd(p.cache_write_surcharge_usd)} cache-write surcharge first time (Anthropic)`}</td></tr>`
+     : "";
+
+   return `
+   <tr style="background:${bgRow};">
+     <td><strong>${escapeHtml(p.provider_name)}</strong>${noteCell}<br><span class="subtle" style="font-size:0.78em;">TTL ${ttlMin}</span></td>
+     <td style="text-align:right;">${fmtPct(p.hit_ratio)}</td>
+     <td style="text-align:right;">${fmtUsd(p.base_cost_usd)} → ${fmtUsd(p.cached_cost_usd)}</td>
+     <td style="text-align:right;color:${savingsColor};"><strong>${fmtUsd(p.savings_usd)}</strong> (${fmtPct(p.savings_pct ?? 0)})</td>
+   </tr>
+   ${writeRow}
+   `;
+ }
+
+ function renderCacheDiffVisualization(oldText, newText, lcpChars) {
+   // Truncate context — show last 200 chars of common prefix, and the
+   // first 200 chars of each diverging suffix. Keeps UI tight.
+   const ctxBefore = 200;
+   const startCommon = Math.max(0, lcpChars - ctxBefore);
+   const commonTail = oldText.slice(startCommon, lcpChars);
+   const oldDiv = oldText.slice(lcpChars);
+   const newDiv = newText.slice(lcpChars);
+   const commonLeader = startCommon > 0 ? "…" : "";
+
+   return `
+   <details style="margin-top:1em;">
+     <summary style="cursor:pointer;"><strong>${t("cache.diff.title") || "Where the cache breaks"}</strong></summary>
+     <div style="background:#0d1117;padding:0.75em;border-radius:4px;font-family:monospace;font-size:0.85em;line-height:1.4;overflow-x:auto;white-space:pre-wrap;">
+       <span style="color:#3fb950;">${escapeHtml(commonLeader + commonTail)}</span><span style="color:#f85149;text-decoration:underline;">${escapeHtml(oldDiv.slice(0, 200))}</span><span class="subtle"> ← old</span>
+       <span style="color:#3fb950;">${escapeHtml(commonLeader + commonTail)}</span><span style="color:#3fb950;text-decoration:underline;">${escapeHtml(newDiv.slice(0, 200))}</span><span class="subtle"> ← new</span>
+     </div>
+     <p class="subtle" style="font-size:0.82em;">${t("cache.diff.legend") || "Green = shared prefix (cacheable). Red = first edit (everything from here is re-billed)."}</p>
+   </details>
+   `;
+ }
+
+ function renderCacheResult(result, oldText, newText) {
+   const verdict = t(`cache.verdict.${result.code}`) || result.code;
+   const verdictBg = CACHE_VERDICT_BG[result.code] || "#8b949e";
+   const verdictBadge = `<span class="badge" style="background:${verdictBg};">${verdict}</span>`;
+
+   if (result.code === "empty_input") {
+     return `<div class="arena-result">
+       <p style="font-size:1.1em;">${verdictBadge}</p>
+       <p class="recipe-desc">${t("cache.hint.empty") || "Paste two prompts, then Predict."}</p>
+     </div>`;
+   }
+
+   const p = result.params;
+   const summary = `
+   <p class="recipe-desc">
+     ${tFmt("cache.summary.tokens", { common: p.tokens_common.toLocaleString(), total: p.tokens_total.toLocaleString(), pct: Math.round(p.hit_ratio * 100) })
+       || `Common prefix ${p.tokens_common.toLocaleString()} / ${p.tokens_total.toLocaleString()} tokens (${Math.round(p.hit_ratio * 100)}% theoretical hit ratio).`}
+   </p>
+   <p class="recipe-desc subtle">
+     ${tFmt("cache.summary.diff_at", { line: p.diff_point.line }) || `First difference at line ${p.diff_point.line}.`}
+   </p>
+   `;
+
+   const rows = (result.providers || []).map(renderCacheProvider).join("");
+   const table = rows ? `
+   <table class="lean-table" style="margin-top:1em;width:100%;">
+     <thead><tr>
+       <th style="text-align:left;">${t("cache.col.provider") || "Provider"}</th>
+       <th style="text-align:right;">${t("cache.col.hit") || "Hit"}</th>
+       <th style="text-align:right;">${t("cache.col.cost") || "Base → cached"}</th>
+       <th style="text-align:right;">${t("cache.col.savings") || "Savings"}</th>
+     </tr></thead>
+     <tbody>${rows}</tbody>
+   </table>
+   ` : "";
+
+   const diffViz = result.code !== "identical"
+     ? renderCacheDiffVisualization(oldText, newText, p.lcp_chars)
+     : "";
+
+   const attribution = `
+   <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;">
+     ${t("cache.attribution") || "Refs:"}
+     <a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" target="_blank" rel="noopener noreferrer">Anthropic prompt caching</a> ·
+     <a href="https://platform.openai.com/docs/guides/prompt-caching" target="_blank" rel="noopener noreferrer">OpenAI prompt caching</a> ·
+     <a href="https://ai.google.dev/gemini-api/docs/caching" target="_blank" rel="noopener noreferrer">Gemini context caching</a>
+     <br><em>${t("cache.attribution.snapshot") || "Prices snapshot 2026-01; verify against current provider docs before acting on $."}</em>
+   </p>
+   `;
+
+   return `<div class="arena-result">
+     <p style="font-size:1.1em;">${verdictBadge}</p>
+     ${summary}
+     ${table}
+     ${diffViz}
+     ${attribution}
+   </div>`;
+ }
+
+ function runCacheDiff() {
+   const oldText = $("cache-old")?.value || "";
+   const newText = $("cache-new")?.value || "";
+   const profile = $("cache-profile")?.value || "english";
+   const outputTokens = parseInt($("cache-output-tokens")?.value || "500", 10);
+
+   const result = diffPromptCache(oldText, newText, {
+     profile,
+     outputTokensEstimate: outputTokens,
+   });
+   $("cache-output").innerHTML = renderCacheResult(result, oldText, newText);
+   $("cache-status").textContent = tFmt("cache.status.done", {
+     verdict: t(`cache.verdict.${result.code}`) || result.code,
+     hit: Math.round((result.params?.hit_ratio || 0) * 100),
+   });
+ }
+
+ const CACHE_LONG_SYS = "You are a helpful, harmless, and honest assistant. " +
+   "Always cite your sources. ".repeat(40) +
+   "Always show your reasoning step by step. ".repeat(40) +
+   "Be concise. Format code with backticks. ".repeat(40) +
+   "\n\nUser tools available:\n- search\n- calculator\n- code_runner\n";
+
+ const CACHE_EXAMPLE_GOOD_OLD = CACHE_LONG_SYS + "\nUser: What is 2 + 2?";
+ const CACHE_EXAMPLE_GOOD_NEW = CACHE_LONG_SYS + "\nUser: What is 2 + 3?";
+
+ const CACHE_EXAMPLE_BROKEN_OLD = CACHE_LONG_SYS.replace("helpful, harmless, and honest", "helpful AND honest")
+   + "\nUser: What is 2 + 2?";
+ const CACHE_EXAMPLE_BROKEN_NEW = CACHE_LONG_SYS + "\nUser: What is 2 + 2?";
+
+ const CACHE_EXAMPLE_BELOWMIN_OLD = "Q: name 3 colors";
+ const CACHE_EXAMPLE_BELOWMIN_NEW = "Q: name 4 colors";
+
+ $("cache-diff-btn")?.addEventListener("click", runCacheDiff);
+ $("cache-example-good-btn")?.addEventListener("click", () => {
+   $("cache-old").value = CACHE_EXAMPLE_GOOD_OLD;
+   $("cache-new").value = CACHE_EXAMPLE_GOOD_NEW;
+   runCacheDiff();
+ });
+ $("cache-example-broken-btn")?.addEventListener("click", () => {
+   $("cache-old").value = CACHE_EXAMPLE_BROKEN_OLD;
+   $("cache-new").value = CACHE_EXAMPLE_BROKEN_NEW;
+   runCacheDiff();
+ });
+ $("cache-example-belowmin-btn")?.addEventListener("click", () => {
+   $("cache-old").value = CACHE_EXAMPLE_BELOWMIN_OLD;
+   $("cache-new").value = CACHE_EXAMPLE_BELOWMIN_NEW;
+   runCacheDiff();
+ });
+
  // ════════════════════════════════════════════════════════════════════
  // Bootstrap
  // ════════════════════════════════════════════════════════════════════
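The `fmtUsd` helper in the hunk above picks decimal precision by magnitude so that sub-cent cache savings do not collapse to "$0.00" in the table. A standalone copy (same body as the hunk) with a few illustrative calls:

```javascript
// Tiered USD formatting: 6 decimals below a cent, 4 below a dollar,
// 2 above. Mirrors the fmtUsd helper registered in main.js.
function fmtUsd(n) {
  if (n == null || isNaN(n)) return "—";
  if (n === 0) return "$0";
  if (n < 0.01) return `$${n.toFixed(6)}`;
  if (n < 1) return `$${n.toFixed(4)}`;
  return `$${n.toFixed(2)}`;
}

console.log(fmtUsd(0.000042)); // "$0.000042"
console.log(fmtUsd(0.1234));   // "$0.1234"
console.log(fmtUsd(12.5));     // "$12.50"
```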
js/prompt_cache_diff.js ADDED
@@ -0,0 +1,308 @@
+ // Prompt-Cache Diff Predictor (v0.8.4 anti-bullshit pack #10)
+ //
+ // Pain: small prompt edits silently invalidate provider prompt caches,
+ // turning a 50% discount into a 0% discount and 10x'ing the bill.
+ // Users debug this blind because:
+ //   - Anthropic's `cache_control` cache breaks at the first token diff
+ //     in the marked prefix (TTL 5 min default, 1 hour beta).
+ //   - OpenAI auto-caches prefixes ≥1024 tokens but invalidates on any
+ //     prefix change; the 50% read discount only applies on hit.
+ //   - Gemini's context cache requires explicit creation, ≥32K tokens,
+ //     and any prefix edit forces a new cache.
+ //
+ // Tool: paste old + new prompt → compute longest common prefix in
+ // tokens → predict per-provider cache hit ratio + $ delta vs no-cache.
+ //
+ // Pure logic — no human strings; main.js does i18n. Returns
+ // {code, params, providers: [{provider_id, ...}]}.
+
+ // =============================================================================
+ // Token estimation — heuristic, browser-only
+ // =============================================================================
+ //
+ // Real tokenizers vary by ±15% between Llama / GPT / Claude / Qwen and
+ // running them in-browser would mean shipping a 5-10 MB WASM blob. For a
+ // cache-diff predictor the absolute count doesn't matter — what matters
+ // is the RATIO of common-prefix to divergent-suffix tokens, which is
+ // robust to estimator choice. The three profiles below cover 95% of
+ // real prompts; users with extreme cases can paste pre-tokenized counts.
+ const TOKEN_PROFILES = {
+   english: { chars_per_token: 4.0, label_key: "cache.profile.english" },
+   code:    { chars_per_token: 3.5, label_key: "cache.profile.code" },
+   mixed:   { chars_per_token: 2.0, label_key: "cache.profile.mixed" }, // CJK / Cyrillic
+ };
+
+ export function estimateTokens(text, profile = "english") {
+   if (typeof text !== "string" || !text) return 0;
+   const cpt = TOKEN_PROFILES[profile]?.chars_per_token ?? 4.0;
+   return Math.ceil(text.length / cpt);
+ }
+
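The robustness claim above (the hit ratio barely moves when the chars-per-token divisor changes) can be checked numerically. A minimal sketch, with the estimator re-stated inline and illustrative prompt strings:

```javascript
// Why a cheap chars/token heuristic is enough: the HIT RATIO is nearly
// invariant under the divisor, even though absolute counts are not.
// `est` mirrors estimateTokens; the strings are made up for the demo.
const est = (text, cpt) => Math.ceil(text.length / cpt);

const common = "You are a helpful assistant. ".repeat(100); // shared prefix
const diverge = "User: what is 2 + 3?";                     // edited tail

for (const cpt of [4.0, 3.5, 2.0]) {
  const hit = est(common, cpt) / (est(common, cpt) + est(diverge, cpt));
  console.log(`cpt=${cpt} hit=${hit.toFixed(3)}`); // stays above 0.99 for every profile
}
```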
+ // =============================================================================
+ // Provider rules — pricing + cache mechanics
+ // =============================================================================
+ //
+ // Prices are USD per million tokens, snapshot 2026-01 (knowledge cutoff).
+ // `cache_read_multiplier` is the fraction of input price billed on a
+ // cache hit (Anthropic 0.10 = 10%; OpenAI 0.50 = 50%; Gemini 0.25 = 25%).
+ // `cache_write_multiplier` accounts for Anthropic's 25% write surcharge
+ // the first time a prefix is seen.
+ //
+ // `min_cache_tokens` is the floor below which the provider cannot cache
+ // (OpenAI auto-cache requires ≥1024; Gemini context cache ≥32K).
+ // Anthropic has no min token floor but requires explicit cache_control
+ // marker — we treat that as min=0 with a `requires_explicit` flag for UI.
+ export const PROVIDERS = {
+   anthropic_opus: {
+     name: "Claude Opus 4.7",
+     min_cache_tokens: 0,
+     requires_explicit: true,
+     cache_ttl_seconds: 300, // 5 min default
+     input_per_mt: 15.00,
+     output_per_mt: 75.00,
+     cache_write_multiplier: 1.25,
+     cache_read_multiplier: 0.10, // 10% of input
+   },
+   anthropic_sonnet: {
+     name: "Claude Sonnet 4.6",
+     min_cache_tokens: 0,
+     requires_explicit: true,
+     cache_ttl_seconds: 300,
+     input_per_mt: 3.00,
+     output_per_mt: 15.00,
+     cache_write_multiplier: 1.25,
+     cache_read_multiplier: 0.10,
+   },
+   anthropic_haiku: {
+     name: "Claude Haiku 4.5",
+     min_cache_tokens: 0,
+     requires_explicit: true,
+     cache_ttl_seconds: 300,
+     input_per_mt: 1.00,
+     output_per_mt: 5.00,
+     cache_write_multiplier: 1.25,
+     cache_read_multiplier: 0.10,
+   },
+   openai_gpt5: {
+     name: "OpenAI GPT-5",
+     min_cache_tokens: 1024,
+     requires_explicit: false,
+     cache_ttl_seconds: 600, // ~5-10 min observed
+     input_per_mt: 5.00,
+     output_per_mt: 15.00,
+     cache_write_multiplier: 1.00,
+     cache_read_multiplier: 0.50, // 50% of input
+   },
+   openai_gpt5_mini: {
+     name: "OpenAI GPT-5 mini",
+     min_cache_tokens: 1024,
+     requires_explicit: false,
+     cache_ttl_seconds: 600,
+     input_per_mt: 0.30,
+     output_per_mt: 1.20,
+     cache_write_multiplier: 1.00,
+     cache_read_multiplier: 0.50,
+   },
+   gemini_25_pro: {
+     name: "Gemini 2.5 Pro",
+     min_cache_tokens: 32768,
+     requires_explicit: true,
+     cache_ttl_seconds: 3600, // 1 hour default for context cache
+     input_per_mt: 1.25,
+     output_per_mt: 10.00,
+     cache_write_multiplier: 1.00,
+     cache_read_multiplier: 0.25, // 25% of input
+   },
+ };
+
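A worked instance of the billing model this table encodes, using the "Claude Sonnet 4.6" snapshot numbers from above ($3/MTok in, $15/MTok out, 10% cache-read); the token counts are illustrative, and the same 2026-01 caveat applies: verify prices against current provider docs before trusting the dollars.

```javascript
// 10K-token cached prefix + 50 fresh tokens + 500 output tokens,
// priced with the Sonnet snapshot values from the PROVIDERS table.
const inPer  = 3.00 / 1e6;   // input $/token
const outPer = 15.00 / 1e6;  // output $/token
const common = 10_000;       // tokens served from cache on a hit
const fresh  = 50;           // tokens after the first diff
const output = 500;          // estimated completion tokens

const base   = (common + fresh) * inPer + output * outPer;
const cached = common * inPer * 0.10 + fresh * inPer + output * outPer;
console.log(base.toFixed(5), cached.toFixed(5)); // 0.03765 0.01065, roughly 72% off
```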
+ // =============================================================================
+ // Longest common prefix — character-level
+ // =============================================================================
+
+ export function longestCommonPrefix(a, b) {
+   if (typeof a !== "string" || typeof b !== "string") return 0;
+   const n = Math.min(a.length, b.length);
+   let i = 0;
+   while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) i++;
+   return i;
+ }
+
+ // First differing line — useful for the UI "your edit landed here" hint.
+ function firstDifferingLine(a, b, prefixLen) {
+   // Walk back to the start of the line containing the diff
+   let i = prefixLen;
+   while (i > 0 && a[i - 1] !== "\n" && b[i - 1] !== "\n") i--;
+   // Count line number (1-indexed)
+   let line = 1;
+   for (let j = 0; j < i; j++) {
+     if (a[j] === "\n") line++;
+   }
+   return { offset: i, line };
+ }
+
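The prefix scan is the whole reason edit placement matters: one edited character at the very front zeroes the cacheable prefix, while the same one-character edit at the tail keeps nearly all of it. A sketch with the scan re-stated inline and illustrative strings:

```javascript
// Same loop as longestCommonPrefix above, restated for a standalone demo.
function lcp(a, b) {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

const sys = "You are a helpful assistant.\n";
console.log(lcp(sys + "Q: 2+2?", sys + "Q: 2+3?")); // tail edit: nearly the whole prompt is shared
console.log(lcp("A" + sys, "B" + sys));             // front edit: shared prefix is 0
```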
+ // =============================================================================
+ // Per-provider cache analysis
+ // =============================================================================
+
+ function analyseProvider(
+   providerId,
+   totalTokensNew,
+   commonTokens,
+   divergeTokens,
+   outputTokens,
+ ) {
+   const p = PROVIDERS[providerId];
+   if (!p) return null;
+
+   const inputPrice = p.input_per_mt / 1_000_000;
+   const outputPrice = p.output_per_mt / 1_000_000;
+   const baseCost =
+     totalTokensNew * inputPrice + outputTokens * outputPrice;
+
+   // Can the provider cache anything? Two failure modes:
+   //   (a) common prefix below provider's minimum cacheable size
+   //   (b) provider requires an explicit marker AND the user almost
+   //       certainly didn't include one in the paste — we still report
+   //       the best-case savings but tag the result as `requires_marker`.
+   let canCache = true;
+   let reason = null;
+   if (commonTokens < p.min_cache_tokens) {
+     canCache = false;
+     reason = "below_min";
+   }
+
+   if (!canCache) {
+     return {
+       provider_id: providerId,
+       provider_name: p.name,
+       base_cost_usd: baseCost,
+       cached_cost_usd: baseCost,
+       savings_usd: 0,
+       hit_ratio: 0,
+       tokens_cached: 0,
+       tokens_billed_input: totalTokensNew,
+       reason,
+       min_cache_tokens: p.min_cache_tokens,
+       requires_explicit: p.requires_explicit,
+       cache_ttl_seconds: p.cache_ttl_seconds,
+     };
+   }
+
+   // Cost on cache HIT for the prefix:
+   //   cache-read: commonTokens × inputPrice × cache_read_multiplier
+   //   fresh:      divergeTokens × inputPrice
+   //   output:     outputTokens × outputPrice
+   const cachedInputCost =
+     commonTokens * inputPrice * p.cache_read_multiplier +
+     divergeTokens * inputPrice;
+   const cachedCost = cachedInputCost + outputTokens * outputPrice;
+
+   // Cache write surcharge (Anthropic). Surfaced as `cache_write_surcharge_usd`
+   // separately so users see the amortization picture.
+   const cacheWriteSurcharge =
+     commonTokens * inputPrice * (p.cache_write_multiplier - 1.0);
+
+   const savings = baseCost - cachedCost;
+   const hitRatio = totalTokensNew === 0 ? 0 : commonTokens / totalTokensNew;
+
+   return {
+     provider_id: providerId,
+     provider_name: p.name,
+     base_cost_usd: baseCost,
+     cached_cost_usd: cachedCost,
+     cache_write_surcharge_usd: cacheWriteSurcharge,
+     savings_usd: savings,
+     savings_pct: baseCost === 0 ? 0 : savings / baseCost,
+     hit_ratio: hitRatio,
+     tokens_cached: commonTokens,
+     tokens_billed_input: divergeTokens,
+     reason: null,
+     min_cache_tokens: p.min_cache_tokens,
+     requires_explicit: p.requires_explicit,
+     cache_ttl_seconds: p.cache_ttl_seconds,
+   };
+ }
+
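The surcharge term above is what makes the amortization question concrete: with a 25% write premium and a 10% read price, caching already wins on the second request. A sketch in units of "one un-cached prefix" (prices cancel out, so the multipliers are all that matter):

```javascript
// Break-even for Anthropic-style write surcharge, using the same
// multipliers as the PROVIDERS table (1.25x write, 0.10x read).
const writeCost = 1.25;            // first request writes the cache
const readCost  = 0.10;            // every later request hits it
const noCache   = (n) => n;        // n requests, prefix billed at 1.0x each
const withCache = (n) => writeCost + (n - 1) * readCost;

// Smallest request count where caching is strictly cheaper.
let n = 1;
while (withCache(n) >= noCache(n)) n++;
console.log(n); // 2: the surcharge pays for itself on the very first hit
```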
226
+ // =============================================================================
227
+ // Public entry point
228
+ // =============================================================================
229
+
230
+ export function diffPromptCache(
231
+ oldPrompt,
232
+ newPrompt,
233
+ {
234
+ profile = "english",
235
+ outputTokensEstimate = 500,
236
+ providers = null,
237
+ } = {},
238
+ ) {
239
+ if (typeof oldPrompt !== "string" || typeof newPrompt !== "string") {
240
+ return { code: "empty_input", params: {} };
241
+ }
242
+   // Keep the raw strings (no trim): provider caches match exact prefixes,
+   // so normalising whitespace here would misstate the hit ratio.
+   const oldRaw = oldPrompt;
+   const newRaw = newPrompt;
+   if (!oldRaw && !newRaw) {
+     return { code: "empty_input", params: {} };
+   }
+
+   const lcpChars = longestCommonPrefix(oldRaw, newRaw);
+   const isIdentical = oldRaw === newRaw;
+   const totalCharsNew = newRaw.length;
+   const divergeChars = totalCharsNew - lcpChars;
+
+   const tokensCommon = estimateTokens(oldRaw.slice(0, lcpChars), profile);
+   const tokensDiverge = estimateTokens(newRaw.slice(lcpChars), profile);
+   const tokensTotal = tokensCommon + tokensDiverge;
+
+   const providerIds = providers ?? Object.keys(PROVIDERS);
+   const providerResults = providerIds
+     .map(id => analyseProvider(id, tokensTotal, tokensCommon, tokensDiverge, outputTokensEstimate))
+     .filter(r => r !== null);
+
+   const diffPoint = isIdentical
+     ? { offset: oldRaw.length, line: oldRaw.split("\n").length }
+     : firstDifferingLine(oldRaw, newRaw, lcpChars);
+
+   let code;
+   if (isIdentical) {
+     code = "identical";
+   } else if (lcpChars === 0) {
+     code = "fully_divergent";
+   } else if (
+     // Guard against the empty-array case: [].every(...) is true, which
+     // would mislabel a run with no analysable providers as below_min.
+     providerResults.length > 0 &&
+     providerResults.every(r => r.reason === "below_min")
+   ) {
+     code = "divergent_below_min";
+   } else {
+     code = "divergent_can_cache";
+   }
+
+   return {
+     code,
+     params: {
+       profile,
+       lcp_chars: lcpChars,
+       diverge_chars: divergeChars,
+       tokens_common: tokensCommon,
+       tokens_diverge: tokensDiverge,
+       tokens_total: tokensTotal,
+       hit_ratio: tokensTotal === 0 ? 0 : tokensCommon / tokensTotal,
+       diff_point: diffPoint,
+       output_tokens: outputTokensEstimate,
+     },
+     providers: providerResults,
+   };
+ }
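The commit message argues the char-per-token heuristic is robust because cache savings are a ratio, not an absolute count. That claim is easy to demonstrate: the `CHARS_PER_TOKEN` divisors below are hypothetical stand-ins, not the values the real `estimateTokens` in this file uses, but the point holds for any divisor.

```javascript
// Hypothetical sketch of a chars-per-token estimator (divisors made up).
const CHARS_PER_TOKEN = { english: 4, code: 3, cjk: 1.5 };

function estimateTokensSketch(text, profile) {
  return Math.ceil(text.length / (CHARS_PER_TOKEN[profile] ?? 4));
}

// The hit ratio divides two estimates made with the SAME divisor, so an
// inaccurate divisor shifts numerator and denominator together and the
// ratio barely moves from one profile to the next.
function hitRatioSketch(commonChars, totalChars, profile) {
  const common = estimateTokensSketch("x".repeat(commonChars), profile);
  const total = estimateTokensSketch("x".repeat(totalChars), profile);
  return total === 0 ? 0 : common / total;
}

for (const profile of Object.keys(CHARS_PER_TOKEN)) {
  console.log(profile, hitRatioSketch(8000, 10_000, profile));
}
```

For an 8,000-of-10,000-character common prefix, all three profiles land within a fraction of a percent of an 0.80 hit ratio; only the `Math.ceil` rounding differs. The absolute dollar figures do move with the divisor, which is why the per-provider table is best read for its ratios and relative savings.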
+
+ // Helper used by the UI: a short per-provider summary object, suitable for
+ // rendering as a table row (i18n-substituted in main.js).
+ export function summariseProvider(result) {
+   if (!result) return null;
+   return {
+     name: result.provider_name,
+     hit_pct: Math.round(result.hit_ratio * 100),
+     base: result.base_cost_usd,
+     cached: result.cached_cost_usd,
+     savings: result.savings_usd,
+     savings_pct: result.savings_pct ?? 0,
+     requires_explicit: result.requires_explicit,
+     reason: result.reason,
+   };
+ }
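A quick shape check for the table-row mapping, with `summariseProvider` restated inline so the sketch runs standalone; the sample result object uses made-up numbers shaped like `analyseProvider`'s return value.

```javascript
// Inline copy of summariseProvider so this sketch is self-contained.
function summariseProvider(result) {
  if (!result) return null;
  return {
    name: result.provider_name,
    hit_pct: Math.round(result.hit_ratio * 100),
    base: result.base_cost_usd,
    cached: result.cached_cost_usd,
    savings: result.savings_usd,
    savings_pct: result.savings_pct ?? 0,
    requires_explicit: result.requires_explicit,
    reason: result.reason,
  };
}

// Made-up provider result; "Example Provider" is not one of the real rows.
const row = summariseProvider({
  provider_name: "Example Provider",
  hit_ratio: 0.874,
  base_cost_usd: 0.0381,
  cached_cost_usd: 0.0111,
  savings_usd: 0.027,
  savings_pct: 0.7087,
  requires_explicit: true,
  reason: null,
});

console.log(row);
```

Note the two deliberate conversions: `hit_ratio` 0.874 is rounded to an integer `hit_pct` of 87 for display, and a missing `savings_pct` falls back to 0 via `??` rather than leaking `undefined` into the i18n substitution.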