karlexmarin Claude Opus 4.7 (1M context) committed on
Commit 3d389cc · 1 Parent(s): 819758d

v0.8.4 Prompt-Cache Diff Predictor — anti-bullshit pack #10


Provider prompt caches each have different rules:
- Anthropic `cache_control` breaks at first token diff in marked prefix
- OpenAI auto-caches prefixes ≥1024 tokens; invalidates on any change
- Gemini context cache requires ≥32K tokens

A misplaced edit silently 10x's the bill — the API never warns, and the
cost only shows up on the next invoice. No public tool predicts this.
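
The three provider rules above can be sketched as a predictor. This is a hypothetical sketch — function names and the chars/4 heuristic are illustrative, not the actual `js/prompt_cache_diff.js` API; the thresholds come from the provider docs cited below:

```javascript
// Longest common prefix in characters, then a chars-per-token estimate.
function commonPrefixChars(oldP, newP) {
  const n = Math.min(oldP.length, newP.length);
  let i = 0;
  while (i < n && oldP[i] === newP[i]) i++;
  return i;
}

function predictCache(oldP, newP, charsPerToken = 4) {
  const common = Math.floor(commonPrefixChars(oldP, newP) / charsPerToken);
  const total = Math.ceil(newP.length / charsPerToken);
  return {
    commonTokens: common,
    totalTokens: total,
    hitRatio: total ? common / total : 0,
    anthropic: common > 0,   // also needs a cache_control marker on the prefix
    openai: common >= 1024,  // auto-cache only kicks in for prefixes ≥1024 tokens
    gemini: common >= 32768, // context cache minimum ≥32K tokens
  };
}
```

Note the ratio is what survives estimator error: a wrong chars-per-token divisor scales `commonTokens` and `totalTokens` together, so `hitRatio` barely moves.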

🔁 Cache Diff (18th mode):
- Two textareas: paste old + new prompt
- Tokenizer profile selector (English / code / CJK), since shipping
  a real BPE in the browser would mean 5-10MB of WASM. The char-per-token
  heuristic is robust to estimator drift because cache savings are
  a RATIO, not absolute counts.
- Output: per-provider table (Claude Opus 4.7 / Sonnet 4.6 / Haiku
4.5 / GPT-5 / GPT-5 mini / Gemini 2.5 Pro) with hit ratio,
base→cached cost, savings $ + %, TTL note, marker requirement.
- Anthropic 25% write surcharge surfaced as separate row so users
see the amortization picture, not just the steady-state savings.
- Diff visualization: green common prefix + red divergent suffix
side-by-side with first-difference line number.
- Three examples: 99% hit (small Q&A edit) / cache busted (system
prompt edit) / below OpenAI min (short prompt).
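
The three examples map onto the verdict codes the `cache.verdict.*` i18n keys suggest. A hypothetical classifier — the real module's codes and thresholds may differ; 1024 stands in for the smallest provider minimum:

```javascript
// Classify a (commonTokens, totalTokens) pair into one of the five
// verified verdict classes. Illustrative sketch, not the shipped logic.
function verdict(commonTokens, totalTokens) {
  if (totalTokens === 0) return "empty_input";          // nothing pasted
  if (commonTokens === totalTokens) return "identical"; // full cache hit
  if (commonTokens === 0) return "fully_divergent";     // cache busted
  if (commonTokens < 1024) return "divergent_below_min"; // under every provider min
  return "divergent_can_cache";                          // partial hit, varies by provider
}
```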

Pure logic in `js/prompt_cache_diff.js` (codes + params, no human
strings); main.js renders with i18n. 41 i18n keys × 4 langs (EN/ES/FR/
ZH) = 164 keys, parity clean. Help modal v0.8.4 entry + Inventory
anti-bullshit-pack list + "Set up an eval correctly" task tile.

Pricing snapshot 2026-01 baked in with explicit "verify against current
docs" disclaimer in the attribution footer.

Source citations:
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
- https://platform.openai.com/docs/guides/prompt-caching
- https://ai.google.dev/gemini-api/docs/caching

Verified: 5/5 logic cases (identical / small edit / front edit /
below-min / empty) + cost-arithmetic sanity (Anthropic 42% savings on
2K-tok prefix, OpenAI 30%, Gemini correctly rejects below-32K) +
164/164 i18n parity + headless e2e (tab/section/3 examples, providers
visible, below-min note rendered). 19 mode tabs total.
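
The cost-arithmetic sanity check can be reproduced by hand. A sketch assuming Anthropic's published multipliers (cached reads billed at 0.1× the base input price, cache writes at 1.25×); the $3/Mtok price is a placeholder, not the baked-in 2026-01 snapshot:

```javascript
// Steady-state vs first-call cost for a cached prefix of prefixTok tokens
// followed by suffixTok uncached tokens. Multipliers per Anthropic's
// documented pricing model; price is a placeholder.
function anthropicCost(prefixTok, suffixTok, pricePerMtok = 3.0) {
  const p = pricePerMtok / 1e6;
  const base = (prefixTok + suffixTok) * p;             // no caching
  const steady = (0.1 * prefixTok + suffixTok) * p;     // repeated cache hits
  const firstCall = (1.25 * prefixTok + suffixTok) * p; // 25% write surcharge
  return { base, steady, firstCall, savingsPct: 100 * (1 - steady / base) };
}
```

The surcharge row exists precisely because `firstCall > base`: caching only pays off once the prefix is re-read, which is the amortization picture the separate row shows.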

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (4)
  1. index.html +46 -0
  2. js/i18n.js +172 -0
  3. js/main.js +199 -1
  4. js/prompt_cache_diff.js +308 -0
index.html CHANGED
@@ -222,6 +222,9 @@
  <p><strong data-i18n="help.v083.peft.title">🔧 PEFT Anti-Pattern Checker</strong></p>
  <p data-i18n="help.v083.peft.body">PEFT's <code>get_peft_model(base, config)</code> creates a FRESH adapter — it does not load saved weights from a path. Users who paste tutorial code and try to resume from a checkpoint silently throw away their training. peft #2115 has the canonical bug report. This linter scans your training script for the pattern + 3 related issues (QLoRA ordering, target_modules/arch mismatch, lora_alpha ratio) and reports findings with line numbers and suggested fixes. <em>Use case</em>: before you launch a 10-hour LoRA fine-tune, paste your script — catch the silent bugs in 200ms.</p>

+ <p><strong data-i18n="help.v084.cache.title">🔁 Prompt-Cache Diff Predictor</strong></p>
+ <p data-i18n="help.v084.cache.body">Provider prompt caches each have different rules: Anthropic's <code>cache_control</code> breaks at the first token diff in the marked prefix; OpenAI auto-caches prefixes ≥1024 tokens; Gemini context caches require ≥32K tokens. A misplaced edit silently 10x's your bill — the API never warns you, and the cost only shows up on the next invoice. Paste old + new prompt, the predictor finds the longest common prefix, estimates tokens with three tokenizer profiles (English / code / CJK), and shows per-provider hit ratio + $ delta vs no-cache for Claude Opus/Sonnet/Haiku, GPT-5/mini, and Gemini 2.5 Pro. <em>Use case</em>: 'I tweaked the system prompt and the bill jumped — what broke?' → paste both prompts, see exactly which provider stopped caching.</p>
+
  <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
  <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>

@@ -336,6 +339,7 @@
  <li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
  <li data-i18n="inv.v082.cot"><strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.</li>
  <li data-i18n="inv.v083.peft"><strong>🔧 PEFT Lint</strong> — catches the silent <code>get_peft_model</code> base-load (peft #2115) + QLoRA order + target_modules / arch mismatch.</li>
+ <li data-i18n="inv.v084.cache"><strong>🔁 Cache Diff</strong> — predicts whether a prompt edit invalidated the provider's prompt cache. Per-provider hit ratio + $ delta.</li>
  <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
  </ul>
  </details>
@@ -409,6 +413,7 @@
  <button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
  <button data-mode-link="cot" data-i18n="modes.cot">📋 JSON CoT</button>
  <button data-mode-link="peft" data-i18n="modes.peft">🔧 PEFT Lint</button>
+ <button data-mode-link="cache" data-i18n="modes.cache">🔁 Cache Diff</button>
  </div>
  </div>
  <div class="task-tile">
@@ -467,6 +472,7 @@
  <button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
  <button class="mode-btn" data-mode="cot" role="tab" aria-selected="false" data-i18n="modes.cot">📋 JSON CoT</button>
  <button class="mode-btn" data-mode="peft" role="tab" aria-selected="false" data-i18n="modes.peft">🔧 PEFT Lint</button>
+ <button class="mode-btn" data-mode="cache" role="tab" aria-selected="false" data-i18n="modes.cache">🔁 Cache Diff</button>
  <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
  </div>
  <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
@@ -1061,6 +1067,46 @@
  <div id="peft-output" style="margin-top: 1em;"></div>
  </section>

+ <!-- Prompt-Cache Diff Predictor (mode=cache, v0.8.4 anti-bullshit pack #10) -->
+ <section id="cache-section" style="display:none;">
+ <h2><span data-i18n="cache.title">🔁 Prompt-Cache Diff Predictor</span>
+ <span class="info"><span class="tooltip" data-i18n="cache.tip">
+ <strong>Why this matters</strong>: Anthropic's `cache_control` cache breaks at the first token diff in the marked prefix. OpenAI auto-caches prefixes ≥1024 tokens but invalidates on any change. Gemini context cache requires ≥32K tokens. A misplaced edit silently 10x's your bill — and the API never warns you. Paste old + new prompt, see per-provider hit ratio + cost delta.
+ </span></span>
+ </h2>
+ <p class="recipe-desc" data-i18n="cache.desc">
+ <strong>Don't 10x your bill on a one-character edit.</strong> Paste your previous and current prompt — the predictor finds the longest common prefix, estimates tokens, and shows per-provider cache hit ratio + $ delta vs no-cache.
+ </p>
+ <div class="form-row" style="display:flex; gap:1em; flex-wrap:wrap;">
+ <div style="flex:1; min-width:300px;">
+ <label for="cache-old" data-i18n="cache.old_label">Old prompt:</label>
+ <textarea id="cache-old" rows="10" style="width:100%;font-family:monospace;font-size:0.85em;" data-i18n-placeholder="cache.old.placeholder" placeholder="You are a helpful assistant. …"></textarea>
+ </div>
+ <div style="flex:1; min-width:300px;">
+ <label for="cache-new" data-i18n="cache.new_label">New prompt:</label>
+ <textarea id="cache-new" rows="10" style="width:100%;font-family:monospace;font-size:0.85em;" data-i18n-placeholder="cache.new.placeholder" placeholder="You are a helpful assistant. …"></textarea>
+ </div>
+ </div>
+ <div class="form-row">
+ <label for="cache-profile" data-i18n="cache.profile_label">Tokenizer profile:</label>
+ <select id="cache-profile">
+ <option value="english" data-i18n="cache.profile.english">English (chars/4)</option>
+ <option value="code" data-i18n="cache.profile.code">Code (chars/3.5)</option>
+ <option value="mixed" data-i18n="cache.profile.mixed">CJK / Cyrillic (chars/2)</option>
+ </select>
+ <label for="cache-output-tokens" data-i18n="cache.output_label">Estimated output tokens:</label>
+ <input type="number" id="cache-output-tokens" value="500" min="0" max="100000" style="width:8em;" />
+ </div>
+ <div class="form-row">
+ <button type="button" id="cache-diff-btn" data-i18n="cache.diff_btn">🔍 Predict</button>
+ <button type="button" id="cache-example-good-btn" class="secondary" data-i18n="cache.example_good_btn">↳ Example: 99% hit</button>
+ <button type="button" id="cache-example-broken-btn" class="secondary" data-i18n="cache.example_broken_btn">↳ Example: cache busted</button>
+ <button type="button" id="cache-example-belowmin-btn" class="secondary" data-i18n="cache.example_belowmin_btn">↳ Example: below OpenAI min</button>
+ </div>
+ <p id="cache-status" class="recipe-desc" style="font-size:0.92em;"></p>
+ <div id="cache-output" style="margin-top: 1em;"></div>
+ </section>
+
  <section id="hub-section" style="display:none;">
  <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
  <span class="info"><span class="tooltip" data-i18n="hub.tip">
js/i18n.js CHANGED
@@ -594,6 +594,49 @@ export const TRANSLATIONS = {
594
  "help.v083.peft.title": "🔧 PEFT Anti-Pattern Checker",
595
  "help.v083.peft.body": "PEFT's <code>get_peft_model(base, config)</code> creates a FRESH adapter — it does not load saved weights from a path. Users who paste tutorial code and try to resume from a checkpoint silently throw away their training. peft #2115 has the canonical bug report. This linter scans your training script for the pattern + 3 related issues (QLoRA ordering, target_modules/arch mismatch, lora_alpha ratio) and reports findings with line numbers and suggested fixes. <em>Use case</em>: before you launch a 10-hour LoRA fine-tune, paste your script — catch the silent bugs in 200ms.",
596
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
597
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
598
  "help.v081.hub.title": "🧭 Solutions Hub",
599
  "help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
@@ -1647,6 +1690,49 @@ export const TRANSLATIONS = {
1647
  "help.v083.peft.title": "🔧 Verificador de anti-patrones PEFT",
1648
  "help.v083.peft.body": "El <code>get_peft_model(base, config)</code> de PEFT crea un adapter NUEVO — no carga pesos guardados desde una ruta. Quien pega código de tutorial e intenta reanudar desde un checkpoint tira silenciosamente su entrenamiento. peft #2115 tiene el bug report canónico. Este linter escanea tu script buscando el patrón + 3 issues relacionados (orden QLoRA, mismatch target_modules/arch, ratio lora_alpha) y reporta hallazgos con números de línea y sugerencias. <em>Caso de uso</em>: antes de lanzar un fine-tune LoRA de 10 horas, pega tu script — atrapa los bugs silenciosos en 200ms.",
1649
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1650
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
1651
  "help.v081.hub.title": "🧭 Solutions Hub",
1652
  "help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
@@ -2564,6 +2650,49 @@ export const TRANSLATIONS = {
2564
  "help.v083.peft.title": "🔧 Vérificateur d'anti-patterns PEFT",
2565
  "help.v083.peft.body": "Le <code>get_peft_model(base, config)</code> de PEFT crée un NOUVEL adaptateur — il ne charge pas les poids sauvegardés depuis un chemin. Quiconque colle du code de tuto et essaie de reprendre depuis un checkpoint jette silencieusement son entraînement. peft #2115 contient le bug report canonique. Ce linter scanne votre script à la recherche du pattern + 3 problèmes liés (ordre QLoRA, mismatch target_modules/arch, ratio lora_alpha) et rapporte les découvertes avec numéros de ligne et corrections suggérées. <em>Cas d'usage</em> : avant de lancer un fine-tune LoRA de 10 heures, collez votre script — attrapez les bugs silencieux en 200ms.",
2566
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2567
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
2568
  "help.v081.hub.title": "🧭 Solutions Hub",
2569
  "help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
@@ -3481,6 +3610,49 @@ export const TRANSLATIONS = {
3481
  "help.v083.peft.title": "🔧 PEFT 反模式检查器",
3482
  "help.v083.peft.body": "PEFT 的 <code>get_peft_model(base, config)</code> 创建一个新的 adapter——它不从路径加载已保存的权重。粘贴教程代码并尝试从 checkpoint 恢复的人会静默地丢掉训练。peft #2115 是规范的 bug 报告。这个 linter 扫描你的脚本查找该模式 + 3 个相关问题(QLoRA 顺序、target_modules/架构不匹配、lora_alpha 比率),并报告带行号和建议修复的发现。<em>用例</em>:在启动 10 小时的 LoRA fine-tune 之前,粘贴你的脚本——在 200ms 内捕获静默 bug。",
3483
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3484
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
3485
  "help.v081.hub.title": "🧭 Solutions Hub",
3486
  "help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
 
594
  "help.v083.peft.title": "🔧 PEFT Anti-Pattern Checker",
595
  "help.v083.peft.body": "PEFT's <code>get_peft_model(base, config)</code> creates a FRESH adapter — it does not load saved weights from a path. Users who paste tutorial code and try to resume from a checkpoint silently throw away their training. peft #2115 has the canonical bug report. This linter scans your training script for the pattern + 3 related issues (QLoRA ordering, target_modules/arch mismatch, lora_alpha ratio) and reports findings with line numbers and suggested fixes. <em>Use case</em>: before you launch a 10-hour LoRA fine-tune, paste your script — catch the silent bugs in 200ms.",
596
 
597
+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
598
+ "modes.cache": "🔁 Cache Diff",
599
+ "mode_desc.cache": "Predicts whether a prompt edit kept the provider's prompt cache alive or invalidated it. Per-provider hit ratio + $ delta vs no-cache.",
600
+ "cache.title": "🔁 Prompt-Cache Diff Predictor",
601
+ "cache.tip": "Anthropic's <code>cache_control</code> cache breaks at the first token diff in the marked prefix. OpenAI auto-caches prefixes ≥1024 tokens but invalidates on any change. Gemini context cache requires ≥32K tokens. A misplaced edit silently 10x's your bill — and the API never warns you. Paste old + new prompt, see per-provider hit ratio + cost delta.",
602
+ "cache.desc": "<strong>Don't 10x your bill on a one-character edit.</strong> Paste your previous and current prompt — the predictor finds the longest common prefix, estimates tokens, and shows per-provider cache hit ratio + $ delta vs no-cache.",
603
+ "cache.old_label": "Old prompt:",
604
+ "cache.new_label": "New prompt:",
605
+ "cache.old.placeholder": "You are a helpful assistant. …",
606
+ "cache.new.placeholder": "You are a helpful assistant. …",
607
+ "cache.profile_label": "Tokenizer profile:",
608
+ "cache.profile.english": "English (chars/4)",
609
+ "cache.profile.code": "Code (chars/3.5)",
610
+ "cache.profile.mixed": "CJK / Cyrillic (chars/2)",
611
+ "cache.output_label": "Estimated output tokens:",
612
+ "cache.diff_btn": "🔍 Predict",
613
+ "cache.example_good_btn": "↳ Example: 99% hit",
614
+ "cache.example_broken_btn": "↳ Example: cache busted",
615
+ "cache.example_belowmin_btn": "↳ Example: below OpenAI min",
616
+ "cache.status.done": "✅ {verdict} — {hit}% theoretical hit",
617
+ "cache.verdict.identical": "✅ Identical — full cache hit",
618
+ "cache.verdict.divergent_can_cache":"⚠ Partial cache hit — providers vary",
619
+ "cache.verdict.divergent_below_min":"❌ Below all provider minimums — no caching possible",
620
+ "cache.verdict.fully_divergent": "❌ Fully divergent — cache invalidated",
621
+ "cache.verdict.empty_input": "ℹ Empty input",
622
+ "cache.summary.tokens": "Common prefix {common} / {total} tokens ({pct}% theoretical hit ratio).",
623
+ "cache.summary.diff_at": "First difference at line {line}.",
624
+ "cache.col.provider": "Provider",
625
+ "cache.col.hit": "Hit",
626
+ "cache.col.cost": "Base → cached",
627
+ "cache.col.savings": "Savings",
628
+ "cache.note.requires_marker": "(requires cache_control marker)",
629
+ "cache.note.below_min": "(prefix < {min} tokens — provider min)",
630
+ "cache.write_surcharge": "+ {cost} cache-write surcharge first time (Anthropic)",
631
+ "cache.diff.title": "Where the cache breaks",
632
+ "cache.diff.legend": "Green = shared prefix (cacheable). Red = first edit (everything from here is re-billed).",
633
+ "cache.hint.empty": "Paste two prompts, then Predict.",
634
+ "cache.attribution": "Refs:",
635
+ "cache.attribution.snapshot": "Prices snapshot 2026-01; verify against current provider docs before acting on $.",
636
+ "inv.v084.cache": "<strong>🔁 Cache Diff</strong> — predicts whether a prompt edit invalidated the provider's prompt cache. Per-provider hit ratio + $ delta.",
637
+ "help.v084.cache.title": "🔁 Prompt-Cache Diff Predictor",
638
+ "help.v084.cache.body": "Provider prompt caches each have different rules: Anthropic's <code>cache_control</code> breaks at the first token diff in the marked prefix; OpenAI auto-caches prefixes ≥1024 tokens; Gemini context caches require ≥32K tokens. A misplaced edit silently 10x's your bill — the API never warns you, and the cost only shows up on the next invoice. Paste old + new prompt, the predictor finds the longest common prefix, estimates tokens with three tokenizer profiles (English / code / CJK), and shows per-provider hit ratio + $ delta vs no-cache for Claude Opus/Sonnet/Haiku, GPT-5/mini, and Gemini 2.5 Pro. <em>Use case</em>: 'I tweaked the system prompt and the bill jumped — what broke?' → paste both prompts, see exactly which provider stopped caching.",
639
+
640
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
641
  "help.v081.hub.title": "🧭 Solutions Hub",
642
  "help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
 
1690
  "help.v083.peft.title": "🔧 Verificador de anti-patrones PEFT",
1691
  "help.v083.peft.body": "El <code>get_peft_model(base, config)</code> de PEFT crea un adapter NUEVO — no carga pesos guardados desde una ruta. Quien pega código de tutorial e intenta reanudar desde un checkpoint tira silenciosamente su entrenamiento. peft #2115 tiene el bug report canónico. Este linter escanea tu script buscando el patrón + 3 issues relacionados (orden QLoRA, mismatch target_modules/arch, ratio lora_alpha) y reporta hallazgos con números de línea y sugerencias. <em>Caso de uso</em>: antes de lanzar un fine-tune LoRA de 10 horas, pega tu script — atrapa los bugs silenciosos en 200ms.",
1692
 
1693
+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
1694
+ "modes.cache": "🔁 Cache Diff",
1695
+ "mode_desc.cache": "Predice si una edición del prompt mantuvo viva la prompt cache del proveedor o la invalidó. Hit ratio por proveedor + delta $ vs sin caché.",
1696
+ "cache.title": "🔁 Predictor de Diff de Prompt-Cache",
1697
+ "cache.tip": "El <code>cache_control</code> de Anthropic se rompe al primer token diferente del prefijo marcado. OpenAI auto-cachea prefijos ≥1024 tokens pero invalida ante cualquier cambio. La context cache de Gemini requiere ≥32K tokens. Una edición mal puesta silenciosamente 10x tu factura — y la API nunca avisa. Pega prompt viejo + nuevo, ve el hit ratio por proveedor + delta de coste.",
1698
+ "cache.desc": "<strong>No 10x tu factura por un edit de un carácter.</strong> Pega tu prompt anterior y el actual — el predictor halla el prefijo común más largo, estima tokens, y muestra hit ratio por proveedor + delta $ vs sin caché.",
1699
+ "cache.old_label": "Prompt viejo:",
1700
+ "cache.new_label": "Prompt nuevo:",
1701
+ "cache.old.placeholder": "Eres un asistente útil. …",
1702
+ "cache.new.placeholder": "Eres un asistente útil. …",
1703
+ "cache.profile_label": "Perfil de tokenizer:",
1704
+ "cache.profile.english": "Inglés (chars/4)",
1705
+ "cache.profile.code": "Código (chars/3.5)",
1706
+ "cache.profile.mixed": "CJK / Cirílico (chars/2)",
1707
+ "cache.output_label": "Tokens de salida estimados:",
1708
+ "cache.diff_btn": "🔍 Predecir",
1709
+ "cache.example_good_btn": "↳ Ejemplo: hit 99%",
1710
+ "cache.example_broken_btn": "↳ Ejemplo: caché rota",
1711
+ "cache.example_belowmin_btn": "↳ Ejemplo: bajo mínimo OpenAI",
1712
+ "cache.status.done": "✅ {verdict} — {hit}% hit teórico",
1713
+ "cache.verdict.identical": "✅ Idénticos — hit completo",
1714
+ "cache.verdict.divergent_can_cache":"⚠ Hit parcial — varía por proveedor",
1715
+ "cache.verdict.divergent_below_min":"❌ Por debajo de mínimos — no hay caché posible",
1716
+ "cache.verdict.fully_divergent": "❌ Totalmente divergentes — caché invalidada",
1717
+ "cache.verdict.empty_input": "ℹ Entrada vacía",
1718
+ "cache.summary.tokens": "Prefijo común {common} / {total} tokens ({pct}% hit ratio teórico).",
1719
+ "cache.summary.diff_at": "Primera diferencia en la línea {line}.",
1720
+ "cache.col.provider": "Proveedor",
1721
+ "cache.col.hit": "Hit",
1722
+ "cache.col.cost": "Base → cached",
1723
+ "cache.col.savings": "Ahorro",
1724
+ "cache.note.requires_marker": "(requiere marcador cache_control)",
1725
+ "cache.note.below_min": "(prefijo < {min} tokens — mínimo del proveedor)",
1726
+ "cache.write_surcharge": "+ {cost} sobrecargo de cache-write la primera vez (Anthropic)",
1727
+ "cache.diff.title": "Dónde se rompe la caché",
1728
+ "cache.diff.legend": "Verde = prefijo compartido (cacheable). Rojo = primera edición (todo desde aquí se re-factura).",
1729
+ "cache.hint.empty": "Pega dos prompts, luego Predecir.",
1730
+ "cache.attribution": "Referencias:",
1731
+ "cache.attribution.snapshot": "Precios snapshot 2026-01; verifica con la doc actual del proveedor antes de actuar sobre $.",
1732
+ "inv.v084.cache": "<strong>🔁 Cache Diff</strong> — predice si un edit del prompt invalidó la prompt cache del proveedor. Hit ratio por proveedor + delta $.",
1733
+ "help.v084.cache.title": "🔁 Predictor de Diff de Prompt-Cache",
1734
+ "help.v084.cache.body": "Las prompt caches de cada proveedor tienen reglas distintas: el <code>cache_control</code> de Anthropic se rompe al primer token diferente del prefijo marcado; OpenAI auto-cachea prefijos ≥1024 tokens; las context caches de Gemini requieren ≥32K tokens. Una edición mal puesta silenciosamente 10x tu factura — la API no avisa, y el coste solo aparece en la siguiente factura. Pega prompt viejo + nuevo, el predictor halla el prefijo común más largo, estima tokens con tres perfiles de tokenizer (inglés / código / CJK), y muestra hit ratio por proveedor + delta $ vs sin caché para Claude Opus/Sonnet/Haiku, GPT-5/mini, y Gemini 2.5 Pro. <em>Caso de uso</em>: 'Tweaké el system prompt y la factura saltó — ¿qué se rompió?' → pega ambos prompts, ve exactamente qué proveedor dejó de cachear.",
1735
+
1736
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
1737
  "help.v081.hub.title": "🧭 Solutions Hub",
1738
  "help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
 
  "help.v083.peft.title": "🔧 Vérificateur d'anti-patterns PEFT",
  "help.v083.peft.body": "Le <code>get_peft_model(base, config)</code> de PEFT crée un NOUVEL adaptateur — il ne charge pas les poids sauvegardés depuis un chemin. Quiconque colle du code de tuto et essaie de reprendre depuis un checkpoint jette silencieusement son entraînement. peft #2115 contient le bug report canonique. Ce linter scanne votre script à la recherche du pattern + 3 problèmes liés (ordre QLoRA, mismatch target_modules/arch, ratio lora_alpha) et rapporte les découvertes avec numéros de ligne et corrections suggérées. <em>Cas d'usage</em> : avant de lancer un fine-tune LoRA de 10 heures, collez votre script — attrapez les bugs silencieux en 200ms.",

+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
+ "modes.cache": "🔁 Cache Diff",
+ "mode_desc.cache": "Prédit si une édition du prompt a gardé le cache prompt du fournisseur vivant ou l'a invalidé. Taux de hit par fournisseur + delta $ vs sans cache.",
+ "cache.title": "🔁 Prédicteur de Diff Prompt-Cache",
+ "cache.tip": "Le <code>cache_control</code> d'Anthropic casse au premier token différent du préfixe marqué. OpenAI auto-cache les préfixes ≥1024 tokens mais invalide à tout changement. Le context cache Gemini requiert ≥32K tokens. Une édition mal placée 10x silencieusement votre facture — et l'API ne prévient jamais. Collez ancien + nouveau prompt, voyez le taux de hit par fournisseur + delta de coût.",
+ "cache.desc": "<strong>Ne 10x pas votre facture sur une édition d'un caractère.</strong> Collez votre prompt précédent et actuel — le prédicteur trouve le plus long préfixe commun, estime les tokens, et montre le taux de hit par fournisseur + delta $ vs sans cache.",
+ "cache.old_label": "Ancien prompt :",
+ "cache.new_label": "Nouveau prompt :",
+ "cache.old.placeholder": "Vous êtes un assistant utile. …",
+ "cache.new.placeholder": "Vous êtes un assistant utile. …",
+ "cache.profile_label": "Profil de tokenizer :",
+ "cache.profile.english": "Anglais (chars/4)",
+ "cache.profile.code": "Code (chars/3.5)",
+ "cache.profile.mixed": "CJK / Cyrillique (chars/2)",
+ "cache.output_label": "Tokens de sortie estimés :",
+ "cache.diff_btn": "🔍 Prédire",
+ "cache.example_good_btn": "↳ Exemple : 99% hit",
+ "cache.example_broken_btn": "↳ Exemple : cache cassé",
+ "cache.example_belowmin_btn": "↳ Exemple : sous le minimum OpenAI",
+ "cache.status.done": "✅ {verdict} — {hit}% hit théorique",
+ "cache.verdict.identical": "✅ Identiques — hit complet",
+ "cache.verdict.divergent_can_cache":"⚠ Hit partiel — varie selon fournisseur",
+ "cache.verdict.divergent_below_min":"❌ En dessous des minimums — pas de cache possible",
+ "cache.verdict.fully_divergent": "❌ Totalement divergents — cache invalidé",
+ "cache.verdict.empty_input": "ℹ Entrée vide",
+ "cache.summary.tokens": "Préfixe commun {common} / {total} tokens (taux de hit théorique {pct}%).",
+ "cache.summary.diff_at": "Première différence à la ligne {line}.",
+ "cache.col.provider": "Fournisseur",
+ "cache.col.hit": "Hit",
+ "cache.col.cost": "Base → cached",
+ "cache.col.savings": "Économies",
+ "cache.note.requires_marker": "(nécessite le marqueur cache_control)",
+ "cache.note.below_min": "(préfixe < {min} tokens — min du fournisseur)",
+ "cache.write_surcharge": "+ {cost} surcharge cache-write la première fois (Anthropic)",
+ "cache.diff.title": "Où le cache casse",
+ "cache.diff.legend": "Vert = préfixe partagé (cacheable). Rouge = première édition (tout à partir d'ici est re-facturé).",
+ "cache.hint.empty": "Collez deux prompts, puis Prédire.",
+ "cache.attribution": "Réfs :",
+ "cache.attribution.snapshot": "Prix snapshot 2026-01 ; vérifiez avec la doc actuelle du fournisseur avant d'agir sur $.",
+ "inv.v084.cache": "<strong>🔁 Cache Diff</strong> — prédit si une édition du prompt a invalidé le cache prompt du fournisseur. Taux de hit par fournisseur + delta $.",
+ "help.v084.cache.title": "🔁 Prédicteur de Diff Prompt-Cache",
+ "help.v084.cache.body": "Les caches prompt de chaque fournisseur ont des règles différentes : le <code>cache_control</code> d'Anthropic casse au premier token différent du préfixe marqué ; OpenAI auto-cache les préfixes ≥1024 tokens ; les context caches Gemini requièrent ≥32K tokens. Une édition mal placée 10x silencieusement votre facture — l'API ne prévient pas, et le coût n'apparaît qu'à la facture suivante. Collez ancien + nouveau prompt, le prédicteur trouve le plus long préfixe commun, estime les tokens avec trois profils de tokenizer (anglais / code / CJK), et montre le taux de hit par fournisseur + delta $ vs sans cache pour Claude Opus/Sonnet/Haiku, GPT-5/mini, et Gemini 2.5 Pro. <em>Cas d'usage</em> : 'J'ai modifié le system prompt et la facture a sauté — qu'est-ce qui a cassé ?' → collez les deux prompts, voyez exactement quel fournisseur a arrêté de cacher.",
+
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
  "help.v081.hub.title": "🧭 Solutions Hub",
  "help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
 
  "help.v083.peft.title": "🔧 PEFT 反模式检查器",
  "help.v083.peft.body": "PEFT 的 <code>get_peft_model(base, config)</code> 创建一个新的 adapter——它不从路径加载已保存的权重。粘贴教程代码并尝试从 checkpoint 恢复的人会静默地丢掉训练。peft #2115 是规范的 bug 报告。这个 linter 扫描你的脚本查找该模式 + 3 个相关问题(QLoRA 顺序、target_modules/架构不匹配、lora_alpha 比率),并报告带行号和建议修复的发现。<em>用例</em>:在启动 10 小时的 LoRA fine-tune 之前,粘贴你的脚本——在 200ms 内捕获静默 bug。",

+ // v0.8.4 — anti-bullshit pack #10: Prompt-Cache Diff Predictor
+ "modes.cache": "🔁 缓存差异",
+ "mode_desc.cache": "预测 prompt 编辑是否保留了提供商的 prompt cache 还是使其失效。每个提供商的命中率 + 与无缓存的 $ 差额。",
+ "cache.title": "🔁 Prompt-Cache 差异预测器",
+ "cache.tip": "Anthropic 的 <code>cache_control</code> 缓存在标记前缀的第一个 token 差异处中断。OpenAI 自动缓存 ≥1024 token 的前缀,但任何更改都会使其失效。Gemini context cache 需要 ≥32K token。位置不当的编辑会悄悄使你的账单 10 倍——API 永远不会警告。粘贴新旧 prompt,查看每个提供商的命中率 + 成本差额。",
+ "cache.desc": "<strong>不要因一个字符的编辑使账单 10 倍。</strong> 粘贴你之前和当前的 prompt——预测器找到最长公共前缀,估算 token,并显示每个提供商的命中率 + 与无缓存的 $ 差额。",
+ "cache.old_label": "旧 prompt:",
+ "cache.new_label": "新 prompt:",
+ "cache.old.placeholder": "你是一个有帮助的助手。…",
+ "cache.new.placeholder": "你是一个有帮助的助手。…",
+ "cache.profile_label": "Tokenizer 配置:",
+ "cache.profile.english": "英语(chars/4)",
+ "cache.profile.code": "代码(chars/3.5)",
+ "cache.profile.mixed": "中日韩 / 西里尔(chars/2)",
+ "cache.output_label": "估计输出 token:",
+ "cache.diff_btn": "🔍 预测",
+ "cache.example_good_btn": "↳ 示例:99% 命中",
+ "cache.example_broken_btn": "↳ 示例:缓存失效",
+ "cache.example_belowmin_btn": "↳ 示例:低于 OpenAI 最小值",
+ "cache.status.done": "✅ {verdict} — {hit}% 理论命中",
+ "cache.verdict.identical": "✅ 完全相同——完整命中",
+ "cache.verdict.divergent_can_cache":"⚠ 部分命中——按提供商不同",
+ "cache.verdict.divergent_below_min":"❌ 低于所有提供商最小值——无法缓存",
+ "cache.verdict.fully_divergent": "❌ 完全不同——缓存失效",
+ "cache.verdict.empty_input": "ℹ 空输入",
+ "cache.summary.tokens": "公共前缀 {common} / {total} token({pct}% 理论命中率)。",
+ "cache.summary.diff_at": "第一个差异在第 {line} 行。",
+ "cache.col.provider": "提供商",
+ "cache.col.hit": "命中",
+ "cache.col.cost": "基础 → 缓存",
+ "cache.col.savings": "节省",
+ "cache.note.requires_marker": "(需要 cache_control 标记)",
+ "cache.note.below_min": "(前缀 < {min} token——提供商最小值)",
+ "cache.write_surcharge": "+ {cost} 首次缓存写入附加费(Anthropic)",
+ "cache.diff.title": "缓存在哪里中断",
+ "cache.diff.legend": "绿色 = 共享前缀(可缓存)。红色 = 首次编辑(从这里开始全部重新计费)。",
+ "cache.hint.empty": "粘贴两个 prompt,然后预测。",
+ "cache.attribution": "参考:",
+ "cache.attribution.snapshot": "价格快照 2026-01;在按 $ 行动前请用提供商当前文档验证。",
+ "inv.v084.cache": "<strong>🔁 缓存差异</strong> — 预测 prompt 编辑是否使提供商的 prompt cache 失效。每个提供商的命中率 + $ 差额。",
+ "help.v084.cache.title": "🔁 Prompt-Cache 差异预测器",
+ "help.v084.cache.body": "每个提供商的 prompt cache 有不同规则:Anthropic 的 <code>cache_control</code> 在标记前缀的第一个 token 差异处中断;OpenAI 自动缓存 ≥1024 token 的前缀;Gemini context cache 需要 ≥32K token。位置不当的编辑会悄悄使你的账单 10 倍——API 不会警告,成本只在下张账单上出现。粘贴新旧 prompt,预测器找到最长公共前缀,用三种 tokenizer 配置(英语/代码/CJK)估算 token,并显示每个提供商的命中率 + 与无缓存的 $ 差额,包括 Claude Opus/Sonnet/Haiku、GPT-5/mini 和 Gemini 2.5 Pro。<em>用例</em>:『我调整了 system prompt 后账单暴涨——什么坏了?』→ 粘贴两个 prompt,看到底哪个提供商停止缓存。",
+
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
  "help.v081.hub.title": "🧭 Solutions Hub",
  "help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
js/main.js CHANGED
@@ -29,6 +29,7 @@ import {
  } from "./solutions_hub.js";
  import { lintJsonCot, reorderJsonText, classifyFieldName } from "./json_cot_linter.js";
  import { lintPeftCode, ARCH_TARGET_MODULES } from "./peft_anti_pattern.js";
+ import { diffPromptCache, PROVIDERS as CACHE_PROVIDERS } from "./prompt_cache_diff.js";

  // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
  // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -220,6 +221,7 @@ document.addEventListener("click", (e) => {
    saturation: "saturation-section",
    cot: "cot-section",
    peft: "peft-section",
+   cache: "cache-section",
    hub: "hub-section",
  }[targetMode];
  if (sectionId) {
@@ -245,7 +247,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
   "diagnose-section", "phase-section", "unmask-section",
   "template-section", "arena-section", "contam-section",
   "quant-section", "drift-section", "niah-section",
-  "saturation-section", "cot-section", "peft-section", "hub-section"].forEach(id => {
+  "saturation-section", "cot-section", "peft-section", "cache-section", "hub-section"].forEach(id => {
    const el = $(id);
    if (el) el.style.display = "none";
  });
@@ -259,6 +261,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
    saturation: "saturation-section",
    cot: "cot-section",
    peft: "peft-section",
+   cache: "cache-section",
    hub: "hub-section",
  };
  const sectionId = sectionMap[mode];
@@ -268,6 +271,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
  if (mode === "saturation") initSaturation();
  if (mode === "cot") initCot();
  if (mode === "peft") initPeft();
+ if (mode === "cache") initCacheDiff();
  if (mode === "hub") initHub();
  });
});
@@ -3712,6 +3716,200 @@ $("peft-example-clean-btn")?.addEventListener("click", () => {
    runPeftLint();
  });

+ // ════════════════════════════════════════════════════════════════════
+ // 🔁 Prompt-Cache Diff Predictor (v0.8.4 anti-bullshit pack #10)
+ // ════════════════════════════════════════════════════════════════════
+ const CACHE_VERDICT_BG = {
+   identical: "#3fb950",
+   divergent_can_cache: "#d29922",
+   divergent_below_min: "#f0883e",
+   fully_divergent: "#f85149",
+   empty_input: "#8b949e",
+ };
+
+ let __cacheInited = false;
+
+ function initCacheDiff() {
+   if (__cacheInited) return;
+   __cacheInited = true;
+   // No-op (no async data); placeholder kept for symmetry.
+ }
+
+ function fmtUsd(n) {
+   if (n == null || isNaN(n)) return "—";
+   if (n === 0) return "$0";
+   if (n < 0.01) return `$${n.toFixed(6)}`;
+   if (n < 1) return `$${n.toFixed(4)}`;
+   return `$${n.toFixed(2)}`;
+ }
+
+ function fmtPct(n) {
+   if (n == null || isNaN(n)) return "—";
+   return `${Math.round(n * 100)}%`;
+ }
+
+ function renderCacheProvider(p) {
+   const bgRow = p.reason === "below_min" ? "#21262d" : "#161b22";
+   const noteHtml = [];
+   if (p.requires_explicit && p.reason !== "below_min") {
+     noteHtml.push(`<span class="subtle" style="font-size:0.8em;">${t("cache.note.requires_marker") || "(requires cache_control marker)"}</span>`);
+   }
+   if (p.reason === "below_min") {
+     noteHtml.push(`<span class="subtle" style="font-size:0.8em;color:#f0883e;">${tFmt("cache.note.below_min", { min: p.min_cache_tokens.toLocaleString() }) || `(prefix < ${p.min_cache_tokens.toLocaleString()} tokens — provider min)`}</span>`);
+   }
+   const noteCell = noteHtml.length ? `<br>${noteHtml.join(" ")}` : "";
+
+   const ttlMin = p.cache_ttl_seconds >= 3600
+     ? `${Math.round(p.cache_ttl_seconds / 3600)}h`
+     : `${Math.round(p.cache_ttl_seconds / 60)}min`;
+
+   const savingsColor = p.savings_usd > 0 ? "#3fb950" : (p.reason ? "#8b949e" : "#d29922");
+   const writeRow = p.cache_write_surcharge_usd && p.cache_write_surcharge_usd > 0
+     ? `<tr style="background:${bgRow};"><td colspan="4" class="subtle" style="font-size:0.8em;padding-left:1em;">${tFmt("cache.write_surcharge", { cost: fmtUsd(p.cache_write_surcharge_usd) }) || `+ ${fmtUsd(p.cache_write_surcharge_usd)} cache-write surcharge first time (Anthropic)`}</td></tr>`
+     : "";
+
+   return `
+   <tr style="background:${bgRow};">
+     <td><strong>${escapeHtml(p.provider_name)}</strong>${noteCell}<br><span class="subtle" style="font-size:0.78em;">TTL ${ttlMin}</span></td>
+     <td style="text-align:right;">${fmtPct(p.hit_ratio)}</td>
+     <td style="text-align:right;">${fmtUsd(p.base_cost_usd)} → ${fmtUsd(p.cached_cost_usd)}</td>
+     <td style="text-align:right;color:${savingsColor};"><strong>${fmtUsd(p.savings_usd)}</strong> (${fmtPct(p.savings_pct ?? 0)})</td>
+   </tr>
+   ${writeRow}
+   `;
+ }
+
+ function renderCacheDiffVisualization(oldText, newText, lcpChars) {
+   // Truncate context — show last 200 chars of common prefix, and the
+   // first 200 chars of each diverging suffix. Keeps UI tight.
+   const ctxBefore = 200;
+   const startCommon = Math.max(0, lcpChars - ctxBefore);
+   const commonTail = oldText.slice(startCommon, lcpChars);
+   const oldDiv = oldText.slice(lcpChars);
+   const newDiv = newText.slice(lcpChars);
+   const commonLeader = startCommon > 0 ? "…" : "";
+
+   return `
+   <details style="margin-top:1em;">
+     <summary style="cursor:pointer;"><strong>${t("cache.diff.title") || "Where the cache breaks"}</strong></summary>
+     <div style="background:#0d1117;padding:0.75em;border-radius:4px;font-family:monospace;font-size:0.85em;line-height:1.4;overflow-x:auto;white-space:pre-wrap;">
+       <span style="color:#3fb950;">${escapeHtml(commonLeader + commonTail)}</span><span style="color:#f85149;text-decoration:underline;">${escapeHtml(oldDiv.slice(0, 200))}</span><span class="subtle"> ← old</span>
+       <span style="color:#3fb950;">${escapeHtml(commonLeader + commonTail)}</span><span style="color:#3fb950;text-decoration:underline;">${escapeHtml(newDiv.slice(0, 200))}</span><span class="subtle"> ← new</span>
+     </div>
+     <p class="subtle" style="font-size:0.82em;">${t("cache.diff.legend") || "Green = shared prefix (cacheable). Red = first edit (everything from here is re-billed)."}</p>
+   </details>
+   `;
+ }
+
+ function renderCacheResult(result, oldText, newText) {
+   const verdict = t(`cache.verdict.${result.code}`) || result.code;
+   const verdictBg = CACHE_VERDICT_BG[result.code] || "#8b949e";
+   const verdictBadge = `<span class="badge" style="background:${verdictBg};">${verdict}</span>`;
+
+   if (result.code === "empty_input") {
+     return `<div class="arena-result">
+       <p style="font-size:1.1em;">${verdictBadge}</p>
+       <p class="recipe-desc">${t("cache.hint.empty") || "Paste two prompts, then Predict."}</p>
+     </div>`;
+   }
+
+   const p = result.params;
+   const summary = `
+   <p class="recipe-desc">
+     ${tFmt("cache.summary.tokens", { common: p.tokens_common.toLocaleString(), total: p.tokens_total.toLocaleString(), pct: Math.round(p.hit_ratio * 100) })
+       || `Common prefix ${p.tokens_common.toLocaleString()} / ${p.tokens_total.toLocaleString()} tokens (${Math.round(p.hit_ratio * 100)}% theoretical hit ratio).`}
+   </p>
+   <p class="recipe-desc subtle">
+     ${tFmt("cache.summary.diff_at", { line: p.diff_point.line }) || `First difference at line ${p.diff_point.line}.`}
+   </p>
+   `;
+
+   const rows = (result.providers || []).map(renderCacheProvider).join("");
+   const table = rows ? `
+   <table class="lean-table" style="margin-top:1em;width:100%;">
+     <thead><tr>
+       <th style="text-align:left;">${t("cache.col.provider") || "Provider"}</th>
+       <th style="text-align:right;">${t("cache.col.hit") || "Hit"}</th>
+       <th style="text-align:right;">${t("cache.col.cost") || "Base → cached"}</th>
+       <th style="text-align:right;">${t("cache.col.savings") || "Savings"}</th>
+     </tr></thead>
+     <tbody>${rows}</tbody>
+   </table>
+   ` : "";
+
+   const diffViz = result.code !== "identical"
+     ? renderCacheDiffVisualization(oldText, newText, p.lcp_chars)
+     : "";
+
+   const attribution = `
+   <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;">
+     ${t("cache.attribution") || "Refs:"}
+     <a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" target="_blank" rel="noopener noreferrer">Anthropic prompt caching</a> ·
+     <a href="https://platform.openai.com/docs/guides/prompt-caching" target="_blank" rel="noopener noreferrer">OpenAI prompt caching</a> ·
+     <a href="https://ai.google.dev/gemini-api/docs/caching" target="_blank" rel="noopener noreferrer">Gemini context caching</a>
+     <br><em>${t("cache.attribution.snapshot") || "Prices snapshot 2026-01; verify against current provider docs before acting on $."}</em>
+   </p>
+   `;
+
+   return `<div class="arena-result">
+     <p style="font-size:1.1em;">${verdictBadge}</p>
+     ${summary}
+     ${table}
+     ${diffViz}
+     ${attribution}
+   </div>`;
+ }
+
+ function runCacheDiff() {
+   const oldText = $("cache-old")?.value || "";
+   const newText = $("cache-new")?.value || "";
+   const profile = $("cache-profile")?.value || "english";
+   const outputTokens = parseInt($("cache-output-tokens")?.value || "500", 10);
+
+   const result = diffPromptCache(oldText, newText, {
+     profile,
+     outputTokensEstimate: outputTokens,
+   });
+   $("cache-output").innerHTML = renderCacheResult(result, oldText, newText);
+   $("cache-status").textContent = tFmt("cache.status.done", {
+     verdict: t(`cache.verdict.${result.code}`) || result.code,
+     hit: Math.round((result.params?.hit_ratio || 0) * 100),
+   });
+ }
+
+ const CACHE_LONG_SYS = "You are a helpful, harmless, and honest assistant. " +
+   "Always cite your sources. ".repeat(40) +
+   "Always show your reasoning step by step. ".repeat(40) +
+   "Be concise. Format code with backticks. ".repeat(40) +
+   "\n\nUser tools available:\n- search\n- calculator\n- code_runner\n";
+
+ const CACHE_EXAMPLE_GOOD_OLD = CACHE_LONG_SYS + "\nUser: What is 2 + 2?";
+ const CACHE_EXAMPLE_GOOD_NEW = CACHE_LONG_SYS + "\nUser: What is 2 + 3?";
+
+ const CACHE_EXAMPLE_BROKEN_OLD = CACHE_LONG_SYS.replace("helpful, harmless, and honest", "helpful AND honest")
+   + "\nUser: What is 2 + 2?";
+ const CACHE_EXAMPLE_BROKEN_NEW = CACHE_LONG_SYS + "\nUser: What is 2 + 2?";
+
+ const CACHE_EXAMPLE_BELOWMIN_OLD = "Q: name 3 colors";
+ const CACHE_EXAMPLE_BELOWMIN_NEW = "Q: name 4 colors";
+
+ $("cache-diff-btn")?.addEventListener("click", runCacheDiff);
+ $("cache-example-good-btn")?.addEventListener("click", () => {
+   $("cache-old").value = CACHE_EXAMPLE_GOOD_OLD;
+   $("cache-new").value = CACHE_EXAMPLE_GOOD_NEW;
+   runCacheDiff();
+ });
+ $("cache-example-broken-btn")?.addEventListener("click", () => {
+   $("cache-old").value = CACHE_EXAMPLE_BROKEN_OLD;
+   $("cache-new").value = CACHE_EXAMPLE_BROKEN_NEW;
+   runCacheDiff();
+ });
+ $("cache-example-belowmin-btn")?.addEventListener("click", () => {
+   $("cache-old").value = CACHE_EXAMPLE_BELOWMIN_OLD;
+   $("cache-new").value = CACHE_EXAMPLE_BELOWMIN_NEW;
+   runCacheDiff();
+ });
+
  // ════════════════════════════════════════════════════════════════════
  // Bootstrap
  // ════════════════════════════════════════════════════════════════════
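The `fmtUsd` helper in the hunk above picks decimal precision by magnitude so that sub-cent cache savings do not collapse to "$0.00" in the table. A standalone copy (same body as the hunk) with a few illustrative calls:

```javascript
// Tiered USD formatting: 6 decimals below a cent, 4 below a dollar,
// 2 above. Mirrors the fmtUsd helper registered in main.js.
function fmtUsd(n) {
  if (n == null || isNaN(n)) return "—";
  if (n === 0) return "$0";
  if (n < 0.01) return `$${n.toFixed(6)}`;
  if (n < 1) return `$${n.toFixed(4)}`;
  return `$${n.toFixed(2)}`;
}

console.log(fmtUsd(0.000042)); // "$0.000042"
console.log(fmtUsd(0.1234));   // "$0.1234"
console.log(fmtUsd(12.5));     // "$12.50"
```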
js/prompt_cache_diff.js ADDED
@@ -0,0 +1,308 @@
+ // Prompt-Cache Diff Predictor (v0.8.4 anti-bullshit pack #10)
+ //
+ // Pain: small prompt edits silently invalidate provider prompt caches,
+ // turning a 50% discount into a 0% discount and 10x'ing the bill.
+ // Users debug this blind because:
+ //   - Anthropic's `cache_control` cache breaks at the first token diff
+ //     in the marked prefix (TTL 5 min default, 1 hour beta).
+ //   - OpenAI auto-caches prefixes ≥1024 tokens but invalidates on any
+ //     prefix change; the 50% read discount only applies on hit.
+ //   - Gemini's context cache requires explicit creation, ≥32K tokens,
+ //     and any prefix edit forces a new cache.
+ //
+ // Tool: paste old + new prompt → compute longest common prefix in
+ // tokens → predict per-provider cache hit ratio + $ delta vs no-cache.
+ //
+ // Pure logic — no human strings; main.js does i18n. Returns
+ // {code, params, providers: [{provider_id, ...}]}.
+
+ // =============================================================================
+ // Token estimation — heuristic, browser-only
+ // =============================================================================
+ //
+ // Real tokenizers vary by ±15% between Llama / GPT / Claude / Qwen and
+ // running them in-browser would mean shipping a 5-10 MB WASM blob. For a
+ // cache-diff predictor the absolute count doesn't matter — what matters
+ // is the RATIO of common-prefix to divergent-suffix tokens, which is
+ // robust to estimator choice. The three profiles below cover 95% of
+ // real prompts; users with extreme cases can paste pre-tokenized counts.
+ const TOKEN_PROFILES = {
+   english: { chars_per_token: 4.0, label_key: "cache.profile.english" },
+   code:    { chars_per_token: 3.5, label_key: "cache.profile.code" },
+   mixed:   { chars_per_token: 2.0, label_key: "cache.profile.mixed" }, // CJK / Cyrillic
+ };
+
+ export function estimateTokens(text, profile = "english") {
+   if (typeof text !== "string" || !text) return 0;
+   const cpt = TOKEN_PROFILES[profile]?.chars_per_token ?? 4.0;
+   return Math.ceil(text.length / cpt);
+ }
+
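The robustness claim above (the hit ratio barely moves when the chars-per-token divisor changes) can be checked numerically. A minimal sketch, with the estimator re-stated inline and illustrative prompt strings:

```javascript
// Why a cheap chars/token heuristic is enough: the HIT RATIO is nearly
// invariant under the divisor, even though absolute counts are not.
// `est` mirrors estimateTokens; the strings are made up for the demo.
const est = (text, cpt) => Math.ceil(text.length / cpt);

const common = "You are a helpful assistant. ".repeat(100); // shared prefix
const diverge = "User: what is 2 + 3?";                     // edited tail

for (const cpt of [4.0, 3.5, 2.0]) {
  const hit = est(common, cpt) / (est(common, cpt) + est(diverge, cpt));
  console.log(`cpt=${cpt} hit=${hit.toFixed(3)}`); // stays above 0.99 for every profile
}
```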
+ // =============================================================================
+ // Provider rules — pricing + cache mechanics
+ // =============================================================================
+ //
+ // Prices are USD per million tokens, snapshot 2026-01 (knowledge cutoff).
+ // `cache_read_multiplier` is the fraction of input price billed on a
+ // cache hit (Anthropic 0.10 = 10%; OpenAI 0.50 = 50%; Gemini 0.25 = 25%).
+ // `cache_write_multiplier` accounts for Anthropic's 25% write surcharge
+ // the first time a prefix is seen.
+ //
+ // `min_cache_tokens` is the floor below which the provider cannot cache
+ // (OpenAI auto-cache requires ≥1024; Gemini context cache ≥32K).
+ // Anthropic has no min token floor but requires explicit cache_control
+ // marker — we treat that as min=0 with a `requires_explicit` flag for UI.
+ export const PROVIDERS = {
+   anthropic_opus: {
+     name: "Claude Opus 4.7",
+     min_cache_tokens: 0,
+     requires_explicit: true,
+     cache_ttl_seconds: 300, // 5 min default
+     input_per_mt: 15.00,
+     output_per_mt: 75.00,
+     cache_write_multiplier: 1.25,
+     cache_read_multiplier: 0.10, // 10% of input
+   },
+   anthropic_sonnet: {
+     name: "Claude Sonnet 4.6",
+     min_cache_tokens: 0,
+     requires_explicit: true,
+     cache_ttl_seconds: 300,
+     input_per_mt: 3.00,
+     output_per_mt: 15.00,
+     cache_write_multiplier: 1.25,
+     cache_read_multiplier: 0.10,
+   },
+   anthropic_haiku: {
+     name: "Claude Haiku 4.5",
+     min_cache_tokens: 0,
+     requires_explicit: true,
+     cache_ttl_seconds: 300,
+     input_per_mt: 1.00,
+     output_per_mt: 5.00,
+     cache_write_multiplier: 1.25,
+     cache_read_multiplier: 0.10,
+   },
+   openai_gpt5: {
+     name: "OpenAI GPT-5",
+     min_cache_tokens: 1024,
+     requires_explicit: false,
+     cache_ttl_seconds: 600, // ~5-10 min observed
+     input_per_mt: 5.00,
+     output_per_mt: 15.00,
+     cache_write_multiplier: 1.00,
+     cache_read_multiplier: 0.50, // 50% of input
+   },
+   openai_gpt5_mini: {
+     name: "OpenAI GPT-5 mini",
+     min_cache_tokens: 1024,
+     requires_explicit: false,
+     cache_ttl_seconds: 600,
+     input_per_mt: 0.30,
+     output_per_mt: 1.20,
+     cache_write_multiplier: 1.00,
+     cache_read_multiplier: 0.50,
+   },
+   gemini_25_pro: {
+     name: "Gemini 2.5 Pro",
+     min_cache_tokens: 32768,
+     requires_explicit: true,
+     cache_ttl_seconds: 3600, // 1 hour default for context cache
+     input_per_mt: 1.25,
+     output_per_mt: 10.00,
+     cache_write_multiplier: 1.00,
+     cache_read_multiplier: 0.25, // 25% of input
+   },
+ };
+
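A worked instance of the billing model this table encodes, using the "Claude Sonnet 4.6" snapshot numbers from above ($3/MTok in, $15/MTok out, 10% cache-read); the token counts are illustrative, and the same 2026-01 caveat applies: verify prices against current provider docs before trusting the dollars.

```javascript
// 10K-token cached prefix + 50 fresh tokens + 500 output tokens,
// priced with the Sonnet snapshot values from the PROVIDERS table.
const inPer  = 3.00 / 1e6;   // input $/token
const outPer = 15.00 / 1e6;  // output $/token
const common = 10_000;       // tokens served from cache on a hit
const fresh  = 50;           // tokens after the first diff
const output = 500;          // estimated completion tokens

const base   = (common + fresh) * inPer + output * outPer;
const cached = common * inPer * 0.10 + fresh * inPer + output * outPer;
console.log(base.toFixed(5), cached.toFixed(5)); // 0.03765 0.01065, roughly 72% off
```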
+ // =============================================================================
+ // Longest common prefix — character-level
+ // =============================================================================
+
+ export function longestCommonPrefix(a, b) {
+   if (typeof a !== "string" || typeof b !== "string") return 0;
+   const n = Math.min(a.length, b.length);
+   let i = 0;
+   while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) i++;
+   return i;
+ }
+
+ // First differing line — useful for the UI "your edit landed here" hint.
+ function firstDifferingLine(a, b, prefixLen) {
+   // Walk back to the start of the line containing the diff
+   let i = prefixLen;
+   while (i > 0 && a[i - 1] !== "\n" && b[i - 1] !== "\n") i--;
+   // Count line number (1-indexed)
+   let line = 1;
+   for (let j = 0; j < i; j++) {
+     if (a[j] === "\n") line++;
+   }
+   return { offset: i, line };
+ }
+
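The prefix scan is the whole reason edit placement matters: one edited character at the very front zeroes the cacheable prefix, while the same one-character edit at the tail keeps nearly all of it. A sketch with the scan re-stated inline and illustrative strings:

```javascript
// Same loop as longestCommonPrefix above, restated for a standalone demo.
function lcp(a, b) {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

const sys = "You are a helpful assistant.\n";
console.log(lcp(sys + "Q: 2+2?", sys + "Q: 2+3?")); // tail edit: nearly the whole prompt is shared
console.log(lcp("A" + sys, "B" + sys));             // front edit: shared prefix is 0
```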
+ // =============================================================================
+ // Per-provider cache analysis
+ // =============================================================================
+
+ function analyseProvider(
+   providerId,
+   totalTokensNew,
+   commonTokens,
+   divergeTokens,
+   outputTokens,
+ ) {
+   const p = PROVIDERS[providerId];
+   if (!p) return null;
+
+   const inputPrice = p.input_per_mt / 1_000_000;
+   const outputPrice = p.output_per_mt / 1_000_000;
+   const baseCost =
+     totalTokensNew * inputPrice + outputTokens * outputPrice;
+
+   // Can the provider cache anything? Two failure modes:
+   //   (a) common prefix below provider's minimum cacheable size
+   //   (b) provider requires an explicit marker AND the user almost
+   //       certainly didn't include one in the paste — we still report
+   //       the best-case savings but tag the result as `requires_marker`.
+   let canCache = true;
+   let reason = null;
+   if (commonTokens < p.min_cache_tokens) {
+     canCache = false;
+     reason = "below_min";
+   }
+
+   if (!canCache) {
+     return {
+       provider_id: providerId,
+       provider_name: p.name,
+       base_cost_usd: baseCost,
+       cached_cost_usd: baseCost,
+       savings_usd: 0,
+       hit_ratio: 0,
+       tokens_cached: 0,
+       tokens_billed_input: totalTokensNew,
+       reason,
+       min_cache_tokens: p.min_cache_tokens,
+       requires_explicit: p.requires_explicit,
+       cache_ttl_seconds: p.cache_ttl_seconds,
+     };
+   }
+
+   // Cost on cache HIT for the prefix:
+   //   cache-read: commonTokens × inputPrice × cache_read_multiplier
+   //   fresh:      divergeTokens × inputPrice
+   //   output:     outputTokens × outputPrice
+   const cachedInputCost =
+     commonTokens * inputPrice * p.cache_read_multiplier +
+     divergeTokens * inputPrice;
+   const cachedCost = cachedInputCost + outputTokens * outputPrice;
+
+   // Cache write surcharge (Anthropic). Surfaced as `cache_write_surcharge_usd`
+   // separately so users see the amortization picture.
+   const cacheWriteSurcharge =
+     commonTokens * inputPrice * (p.cache_write_multiplier - 1.0);
+
+   const savings = baseCost - cachedCost;
+   const hitRatio = totalTokensNew === 0 ? 0 : commonTokens / totalTokensNew;
+
+   return {
+     provider_id: providerId,
+     provider_name: p.name,
+     base_cost_usd: baseCost,
+     cached_cost_usd: cachedCost,
+     cache_write_surcharge_usd: cacheWriteSurcharge,
+     savings_usd: savings,
+     savings_pct: baseCost === 0 ? 0 : savings / baseCost,
+     hit_ratio: hitRatio,
+     tokens_cached: commonTokens,
+     tokens_billed_input: divergeTokens,
+     reason: null,
+     min_cache_tokens: p.min_cache_tokens,
+     requires_explicit: p.requires_explicit,
+     cache_ttl_seconds: p.cache_ttl_seconds,
+   };
+ }
+
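The surcharge term above is what makes the amortization question concrete: with a 25% write premium and a 10% read price, caching already wins on the second request. A sketch in units of "one un-cached prefix" (prices cancel out, so the multipliers are all that matter):

```javascript
// Break-even for Anthropic-style write surcharge, using the same
// multipliers as the PROVIDERS table (1.25x write, 0.10x read).
const writeCost = 1.25;            // first request writes the cache
const readCost  = 0.10;            // every later request hits it
const noCache   = (n) => n;        // n requests, prefix billed at 1.0x each
const withCache = (n) => writeCost + (n - 1) * readCost;

// Smallest request count where caching is strictly cheaper.
let n = 1;
while (withCache(n) >= noCache(n)) n++;
console.log(n); // 2: the surcharge pays for itself on the very first hit
```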
226
+ // =============================================================================
227
+ // Public entry point
228
+ // =============================================================================
229
+
230
+ export function diffPromptCache(
231
+ oldPrompt,
232
+ newPrompt,
233
+ {
234
+ profile = "english",
235
+ outputTokensEstimate = 500,
236
+ providers = null,
237
+ } = {},
238
+ ) {
239
+ if (typeof oldPrompt !== "string" || typeof newPrompt !== "string") {
240
+ return { code: "empty_input", params: {} };
241
+ }
242
+   // Keep the raw strings (no trim): provider caches match exact prefixes,
+   // so normalising whitespace here would misstate the hit ratio.
+   const oldRaw = oldPrompt;
+   const newRaw = newPrompt;
+   if (!oldRaw && !newRaw) {
+     return { code: "empty_input", params: {} };
+   }
+
+   const lcpChars = longestCommonPrefix(oldRaw, newRaw);
+   const isIdentical = oldRaw === newRaw;
+   const totalCharsNew = newRaw.length;
+   const divergeChars = totalCharsNew - lcpChars;
+
+   const tokensCommon = estimateTokens(oldRaw.slice(0, lcpChars), profile);
+   const tokensDiverge = estimateTokens(newRaw.slice(lcpChars), profile);
+   const tokensTotal = tokensCommon + tokensDiverge;
+
+   const providerIds = providers ?? Object.keys(PROVIDERS);
+   const providerResults = providerIds
+     .map(id => analyseProvider(id, tokensTotal, tokensCommon, tokensDiverge, outputTokensEstimate))
+     .filter(r => r !== null);
+
+   const diffPoint = isIdentical
+     ? { offset: oldRaw.length, line: oldRaw.split("\n").length }
+     : firstDifferingLine(oldRaw, newRaw, lcpChars);
+
+   let code;
+   if (isIdentical) {
+     code = "identical";
+   } else if (lcpChars === 0) {
+     code = "fully_divergent";
+   } else if (
+     // Guard against the empty-array case: [].every(...) is true, which
+     // would mislabel a run with no analysable providers as below_min.
+     providerResults.length > 0 &&
+     providerResults.every(r => r.reason === "below_min")
+   ) {
+     code = "divergent_below_min";
+   } else {
+     code = "divergent_can_cache";
+   }
+
+   return {
+     code,
+     params: {
+       profile,
+       lcp_chars: lcpChars,
+       diverge_chars: divergeChars,
+       tokens_common: tokensCommon,
+       tokens_diverge: tokensDiverge,
+       tokens_total: tokensTotal,
+       hit_ratio: tokensTotal === 0 ? 0 : tokensCommon / tokensTotal,
+       diff_point: diffPoint,
+       output_tokens: outputTokensEstimate,
+     },
+     providers: providerResults,
+   };
+ }
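The commit message argues the char-per-token heuristic is robust because cache savings are a ratio, not an absolute count. That claim is easy to demonstrate: the `CHARS_PER_TOKEN` divisors below are hypothetical stand-ins, not the values the real `estimateTokens` in this file uses, but the point holds for any divisor.

```javascript
// Hypothetical sketch of a chars-per-token estimator (divisors made up).
const CHARS_PER_TOKEN = { english: 4, code: 3, cjk: 1.5 };

function estimateTokensSketch(text, profile) {
  return Math.ceil(text.length / (CHARS_PER_TOKEN[profile] ?? 4));
}

// The hit ratio divides two estimates made with the SAME divisor, so an
// inaccurate divisor shifts numerator and denominator together and the
// ratio barely moves from one profile to the next.
function hitRatioSketch(commonChars, totalChars, profile) {
  const common = estimateTokensSketch("x".repeat(commonChars), profile);
  const total = estimateTokensSketch("x".repeat(totalChars), profile);
  return total === 0 ? 0 : common / total;
}

for (const profile of Object.keys(CHARS_PER_TOKEN)) {
  console.log(profile, hitRatioSketch(8000, 10_000, profile));
}
```

For an 8,000-of-10,000-character common prefix, all three profiles land within a fraction of a percent of an 0.80 hit ratio; only the `Math.ceil` rounding differs. The absolute dollar figures do move with the divisor, which is why the per-provider table is best read for its ratios and relative savings.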
+
+ // Helper used by the UI: a short per-provider summary object, suitable for
+ // rendering as a table row (i18n-substituted in main.js).
+ export function summariseProvider(result) {
+   if (!result) return null;
+   return {
+     name: result.provider_name,
+     hit_pct: Math.round(result.hit_ratio * 100),
+     base: result.base_cost_usd,
+     cached: result.cached_cost_usd,
+     savings: result.savings_usd,
+     savings_pct: result.savings_pct ?? 0,
+     requires_explicit: result.requires_explicit,
+     reason: result.reason,
+   };
+ }
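A quick shape check for the table-row mapping, with `summariseProvider` restated inline so the sketch runs standalone; the sample result object uses made-up numbers shaped like `analyseProvider`'s return value.

```javascript
// Inline copy of summariseProvider so this sketch is self-contained.
function summariseProvider(result) {
  if (!result) return null;
  return {
    name: result.provider_name,
    hit_pct: Math.round(result.hit_ratio * 100),
    base: result.base_cost_usd,
    cached: result.cached_cost_usd,
    savings: result.savings_usd,
    savings_pct: result.savings_pct ?? 0,
    requires_explicit: result.requires_explicit,
    reason: result.reason,
  };
}

// Made-up provider result; "Example Provider" is not one of the real rows.
const row = summariseProvider({
  provider_name: "Example Provider",
  hit_ratio: 0.874,
  base_cost_usd: 0.0381,
  cached_cost_usd: 0.0111,
  savings_usd: 0.027,
  savings_pct: 0.7087,
  requires_explicit: true,
  reason: null,
});

console.log(row);
```

Note the two deliberate conversions: `hit_ratio` 0.874 is rounded to an integer `hit_pct` of 87 for display, and a missing `savings_pct` falls back to 0 via `??` rather than leaking `undefined` into the i18n substitution.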