Spaces:

karlexmarin
/

taf-agent

Running

karlexmarin Claude Opus 4.7 (1M context) commited on 5 days ago

Commit

cd27f27

1 Parent(s): e9f9ac5

v0.8.7 Multilingual Tokenizer Tax Calculator — anti-bullshit pack #13

Pain: tokenizers tax non-English text asymmetrically. The same
paragraph might be 100 tokens in English but 250+ tokens in Chinese
on a Latin-trained tokenizer (Llama, Phi). Both per-request cost
AND effective context degrade silently. tiktokenizer.vercel.app
covers OpenAI's cl100k only; nothing public compares Llama vs Qwen
vs Phi vs Gemma vs GPT vs Claude in one interface.

🌍 Token Tax (20th mode):
- Lazy-imports HuggingFace transformers.js (~750 KB pinned to 3.0.2,
jsdelivr CDN). First-mode-open pays the cost; subsequent runs
instant after browser cache.
- Tokenizes user-pasted text against 6 preset open-weight tokenizers
(Qwen/Qwen2.5-7B-Instruct, microsoft/Phi-3.5-mini-instruct,
unsloth/Meta-Llama-3.1-8B-Instruct, unsloth/gemma-2-9b-it,
Xenova/gpt-4 cl100k port, Xenova/claude-tokenizer community port).
All open — no HF auth required. Llama/Gemma use the unsloth open
mirrors that ship the byte-identical tokenizer.json (quantization
touches weights, not tokens).
- Output: per-tokenizer token count, chars-per-token, ratio vs
baseline, color-coded (red ≥1.5×, amber ≥1.15×, green within 5%).
Worst-tax interpretation surfaces the loudest mismatch
automatically.
- Auto-detects Unicode script blocks (Latin / CJK / Korean / Arabic
/ Cyrillic / Devanagari / Thai / Greek / Hebrew) so users see
"92% CJK" alongside "Phi-3.5 = 2.27×" → instantly understand
the WHY.
- 5 sample buttons (English / 中文 / عربى / mixed / code) for
one-click demo coverage.

Pure logic in `js/tokenizer_tax.js` (lazy CDN import + tokenizer
cache + parallel tokenize + script detection). 36 i18n keys × 4
langs (EN/ES/FR/ZH) = 144 keys, parity clean. Help modal v0.8.7
entry + Inventory + "Set up an eval correctly" task tile.

Privacy-by-design: all tokenization is local — pasted text never
leaves the browser. Status note explains first-load latency
(~5-15s for 6 tokenizers in parallel, then cached).

Verified locally: ZH sample (92% CJK) yields:
Qwen2.5 baseline 44 tokens (1.43 chars/tok) 1.00×
Phi-3.5 100 tokens (0.63 chars/tok) 2.27× ⚠
Llama-3.1 60 tokens (1.05 chars/tok) 1.36×
Gemma-2 49 tokens (1.29 chars/tok) 1.11×
GPT-4 cl100k 81 tokens (0.78 chars/tok) 1.84×
Claude (approx) 70 tokens (0.90 chars/tok) 1.59×

Phi's BPE (32k vocab, no CJK pre-training) charges 2.27× over
Qwen for the SAME Chinese paragraph. That is the silent tax this
tool surfaces.

Refs:
- https://github.com/huggingface/transformers.js
- https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
- https://huggingface.co/Xenova/gpt-4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (4) hide show

index.html +35 -0
js/i18n.js +156 -0
js/main.js +163 -1
js/tokenizer_tax.js +221 -0

index.html CHANGED Viewed

@@ -228,6 +228,9 @@
       <p><strong data-i18n="help.v085.speculative.title">🔬 Speculative-Decode Compatibility</strong></p>
       <p data-i18n="help.v085.speculative.body">Speculative decoding only works if target and draft share the exact same vocabulary. Mismatched vocabs cause every draft token to be rejected — you pay BOTH compute costs and get worse throughput than baseline. Worse, the system still emits correct output (just slower), so the bug is invisible in unit tests. vLLM #4570 / #16757 / #20409 / #12488 all surface variants. This tool fetches `tokenizer.json` from HF Hub for both model ids, compares tokenizer type, vocab size, full token→id map, special tokens, and added tokens, then estimates a speedup band based on param ratio and typical α=0.5/0.7/0.85 acceptance rates. <em>Use case</em>: before you launch a vLLM cluster with spec-dec enabled, verify the pair is actually compatible.</p>
       <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
       <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
@@ -344,6 +347,7 @@
             <li data-i18n="inv.v083.peft"><strong>🔧 PEFT Lint</strong> — catches the silent <code>get_peft_model</code> base-load (peft #2115) + QLoRA order + target_modules / arch mismatch.</li>
             <li data-i18n="inv.v084.cache"><strong>🔁 Cache Diff</strong> — predicts whether a prompt edit invalidated the provider's prompt cache. Per-provider hit ratio + $ delta.</li>
             <li data-i18n="inv.v085.speculative"><strong>🔬 Spec-Decode</strong> — verifies tokenizer vocab compatibility between target + draft before you ship speculative decoding (the bug that gives WORSE throughput silently).</li>
             <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
           </ul>
         </details>
@@ -419,6 +423,7 @@
             <button data-mode-link="peft" data-i18n="modes.peft">🔧 PEFT Lint</button>
             <button data-mode-link="cache" data-i18n="modes.cache">🔁 Cache Diff</button>
             <button data-mode-link="speculative" data-i18n="modes.speculative">🔬 Spec-Decode</button>
           </div>
         </div>
         <div class="task-tile">
@@ -479,6 +484,7 @@
         <button class="mode-btn" data-mode="peft" role="tab" aria-selected="false" data-i18n="modes.peft">🔧 PEFT Lint</button>
         <button class="mode-btn" data-mode="cache" role="tab" aria-selected="false" data-i18n="modes.cache">🔁 Cache Diff</button>
         <button class="mode-btn" data-mode="speculative" role="tab" aria-selected="false" data-i18n="modes.speculative">🔬 Spec-Decode</button>
         <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
       </div>
       <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
@@ -1143,6 +1149,35 @@
       <div id="spec-output" style="margin-top: 1em;"></div>
     </section>
     <section id="hub-section" style="display:none;">
       <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
         <span class="info"><span class="tooltip" data-i18n="hub.tip">

       <p><strong data-i18n="help.v085.speculative.title">🔬 Speculative-Decode Compatibility</strong></p>
       <p data-i18n="help.v085.speculative.body">Speculative decoding only works if target and draft share the exact same vocabulary. Mismatched vocabs cause every draft token to be rejected — you pay BOTH compute costs and get worse throughput than baseline. Worse, the system still emits correct output (just slower), so the bug is invisible in unit tests. vLLM #4570 / #16757 / #20409 / #12488 all surface variants. This tool fetches `tokenizer.json` from HF Hub for both model ids, compares tokenizer type, vocab size, full token→id map, special tokens, and added tokens, then estimates a speedup band based on param ratio and typical α=0.5/0.7/0.85 acceptance rates. <em>Use case</em>: before you launch a vLLM cluster with spec-dec enabled, verify the pair is actually compatible.</p>
+      <p><strong data-i18n="help.v087.tax.title">🌍 Multilingual Tokenizer Tax</strong></p>
+      <p data-i18n="help.v087.tax.body">Tokenizers tax non-English text asymmetrically. The same paragraph might be 100 tokens in English but 250+ in Chinese on a Latin-trained tokenizer (Llama, Phi). Both cost-per-request AND effective context degrade silently. This tool loads HuggingFace transformers.js in your browser (~750 KB CDN) and tokenizes pasted text against 6 preset vendor tokenizers (Qwen2.5, Phi-3.5, Llama-3.1, Gemma-2, GPT-4 cl100k, Claude approx). <em>Use case</em>: 'My multilingual support added 30% to the bill — which language costs the most?' → paste real production text, see exact per-tokenizer breakdown.</p>
       <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
       <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
             <li data-i18n="inv.v083.peft"><strong>🔧 PEFT Lint</strong> — catches the silent <code>get_peft_model</code> base-load (peft #2115) + QLoRA order + target_modules / arch mismatch.</li>
             <li data-i18n="inv.v084.cache"><strong>🔁 Cache Diff</strong> — predicts whether a prompt edit invalidated the provider's prompt cache. Per-provider hit ratio + $ delta.</li>
             <li data-i18n="inv.v085.speculative"><strong>🔬 Spec-Decode</strong> — verifies tokenizer vocab compatibility between target + draft before you ship speculative decoding (the bug that gives WORSE throughput silently).</li>
+            <li data-i18n="inv.v087.tax"><strong>🌍 Token Tax</strong> — real BPE encoding across 6 vendor tokenizers. Surfaces the silent cost asymmetry across languages (CJK / Arabic / mixed).</li>
             <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
           </ul>
         </details>
             <button data-mode-link="peft" data-i18n="modes.peft">🔧 PEFT Lint</button>
             <button data-mode-link="cache" data-i18n="modes.cache">🔁 Cache Diff</button>
             <button data-mode-link="speculative" data-i18n="modes.speculative">🔬 Spec-Decode</button>
+            <button data-mode-link="tax" data-i18n="modes.tax">🌍 Token Tax</button>
           </div>
         </div>
         <div class="task-tile">
         <button class="mode-btn" data-mode="peft" role="tab" aria-selected="false" data-i18n="modes.peft">🔧 PEFT Lint</button>
         <button class="mode-btn" data-mode="cache" role="tab" aria-selected="false" data-i18n="modes.cache">🔁 Cache Diff</button>
         <button class="mode-btn" data-mode="speculative" role="tab" aria-selected="false" data-i18n="modes.speculative">🔬 Spec-Decode</button>
+        <button class="mode-btn" data-mode="tax" role="tab" aria-selected="false" data-i18n="modes.tax">🌍 Token Tax</button>
         <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
       </div>
       <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
       <div id="spec-output" style="margin-top: 1em;"></div>
     </section>
+    <!-- Multilingual Tokenizer Tax (mode=tax, v0.8.7 anti-bullshit pack #13) -->
+    <section id="tax-section" style="display:none;">
+      <h2><span data-i18n="tax.title">🌍 Multilingual Tokenizer Tax</span>
+        <span class="info"><span class="tooltip" data-i18n="tax.tip">
+          <strong>Why this matters</strong>: tokenizers tax non-English text asymmetrically. The same paragraph might be 100 tokens in English but 250+ tokens in Chinese on a Latin-trained tokenizer (Llama, Phi). Cost per request and effective context BOTH degrade silently. Paste your text, see actual token counts across vendor tokenizers — no estimation, real BPE encoding via transformers.js in your browser.
+        </span></span>
+      </h2>
+      <p class="recipe-desc" data-i18n="tax.desc">
+        <strong>Don't 3× your bill on Chinese support.</strong> Paste any text → real per-tokenizer BPE encoding across Qwen / Phi / Llama / Gemma / GPT-4 / Claude → see the cost asymmetry vs your baseline.
+      </p>
+      <div class="form-row">
+        <label for="tax-input" data-i18n="tax.input_label">Text to tokenize:</label>
+        <textarea id="tax-input" rows="8" style="width:100%;font-family:monospace;font-size:0.9em;" data-i18n-placeholder="tax.input.placeholder" placeholder="Paste any text — English, Chinese, Arabic, code, …"></textarea>
+      </div>
+      <div class="form-row">
+        <button type="button" id="tax-tokenize-btn" data-i18n="tax.tokenize_btn">🔬 Tokenize all</button>
+        <button type="button" id="tax-sample-en-btn" class="secondary" data-i18n="tax.sample_en_btn">↳ Sample: English</button>
+        <button type="button" id="tax-sample-zh-btn" class="secondary" data-i18n="tax.sample_zh_btn">↳ Sample: 中文</button>
+        <button type="button" id="tax-sample-ar-btn" class="secondary" data-i18n="tax.sample_ar_btn">↳ Sample: عربى</button>
+        <button type="button" id="tax-sample-mixed-btn" class="secondary" data-i18n="tax.sample_mixed_btn">↳ Sample: mixed</button>
+        <button type="button" id="tax-sample-code-btn" class="secondary" data-i18n="tax.sample_code_btn">↳ Sample: code</button>
+      </div>
+      <p id="tax-status" class="recipe-desc" style="font-size:0.92em;"></p>
+      <div id="tax-output" style="margin-top: 1em;"></div>
+      <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;" data-i18n="tax.firstload_note">
+        💡 <strong>First-time load:</strong> the tool fetches transformers.js (~750 KB) + each tokenizer's vocab on demand (~5-15 MB per tokenizer, cached after). Subsequent runs are instant. All processing is local — your text never leaves the browser.
+      </p>
+    </section>
     <section id="hub-section" style="display:none;">
       <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
         <span class="info"><span class="tooltip" data-i18n="hub.tip">

js/i18n.js CHANGED Viewed

@@ -714,6 +714,45 @@ export const TRANSLATIONS = {
     "help.v085.speculative.title": "🔬 Speculative-Decode Compatibility",
     "help.v085.speculative.body":  "Speculative decoding only works if target and draft share the exact same vocabulary. Mismatched vocabs cause every draft token to be rejected — you pay BOTH compute costs and get worse throughput than baseline. Worse, the system still emits correct output (just slower), so the bug is invisible in unit tests. vLLM #4570 / #16757 / #20409 / #12488 all surface variants. This tool fetches `tokenizer.json` from HF Hub for both model ids, compares tokenizer type, vocab size, full token→id map, special tokens, and added tokens, then estimates a speedup band based on param ratio and typical α=0.5/0.7/0.85 acceptance rates. <em>Use case</em>: before you launch a vLLM cluster with spec-dec enabled, verify the pair is actually compatible.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
@@ -1887,6 +1926,45 @@ export const TRANSLATIONS = {
     "help.v085.speculative.title": "🔬 Compatibilidad de Speculative-Decode",
     "help.v085.speculative.body":  "El speculative decoding solo funciona si target y draft comparten exactamente el mismo vocabulario. Vocabs mismatched hacen que cada token del draft sea rechazado — pagas AMBOS computes y obtienes peor throughput que baseline. Peor: el sistema sigue emitiendo output correcto (solo más lento), así que el bug es invisible en tests unitarios. vLLM #4570 / #16757 / #20409 / #12488 surfacen variantes. Esta tool hace fetch de `tokenizer.json` desde HF Hub para ambos ids, compara tipo de tokenizer, tamaño de vocab, mapa completo token→id, special tokens, y added tokens, luego estima una banda de speedup basada en ratio de params y tasas típicas α=0.5/0.7/0.85 de aceptación. <em>Caso de uso</em>: antes de lanzar un cluster vLLM con spec-dec habilitado, verifica que el par sea compatible.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
@@ -2924,6 +3002,45 @@ export const TRANSLATIONS = {
     "help.v085.speculative.title": "🔬 Compatibilité Speculative-Decode",
     "help.v085.speculative.body":  "Le speculative decoding ne marche que si target et draft partagent exactement le même vocabulaire. Des vocabs mismatched font que chaque token du draft est rejeté — vous payez LES DEUX coûts de compute et obtenez un pire débit que la baseline. Pire : le système émet toujours une sortie correcte (juste plus lente), donc le bug est invisible aux tests unitaires. vLLM #4570 / #16757 / #20409 / #12488 surfent les variantes. Cet outil récupère `tokenizer.json` depuis HF Hub pour les deux model ids, compare le type de tokenizer, la taille du vocab, la map complète token→id, les special tokens, et les added tokens, puis estime une bande de speedup basée sur le ratio de params et les taux α=0.5/0.7/0.85 d'acceptation typiques. <em>Cas d'usage</em> : avant de lancer un cluster vLLM avec spec-dec activé, vérifiez que la paire est compatible.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
@@ -3961,6 +4078,45 @@ export const TRANSLATIONS = {
     "help.v085.speculative.title": "🔬 Speculative-Decode 兼容性",
     "help.v085.speculative.body":  "Speculative decoding 仅当 target 和 draft 共享完全相同的词汇表时才能工作。Vocab 不匹配导致每个 draft token 被拒绝——你支付双倍计算成本且吞吐量比 baseline 更差。更糟：系统仍输出正确（只是更慢），所以 bug 在单元测试中不可见。vLLM #4570 / #16757 / #20409 / #12488 都显示了变种。这个工具从 HF Hub 获取两个 model id 的 `tokenizer.json`，比较 tokenizer 类型、vocab 大小、完整 token→id 映射、special token 和 added token，然后基于参数比和典型 α=0.5/0.7/0.85 接受率估算 speedup 范围。<em>用例</em>：在启动启用了 spec-dec 的 vLLM 集群之前，验证这对模型是否真的兼容。",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别（评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性），每个映射到（a）解决它的 tafagent 模式（若存在），以及（b）社区已信任的最佳外部工具（RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等）。搜索框匹配 pain、场景和工具名称。<em>用例</em>：'我有问题 X — tafagent 解决它吗，如果不，谁解决？'",

     "help.v085.speculative.title": "🔬 Speculative-Decode Compatibility",
     "help.v085.speculative.body":  "Speculative decoding only works if target and draft share the exact same vocabulary. Mismatched vocabs cause every draft token to be rejected — you pay BOTH compute costs and get worse throughput than baseline. Worse, the system still emits correct output (just slower), so the bug is invisible in unit tests. vLLM #4570 / #16757 / #20409 / #12488 all surface variants. This tool fetches `tokenizer.json` from HF Hub for both model ids, compares tokenizer type, vocab size, full token→id map, special tokens, and added tokens, then estimates a speedup band based on param ratio and typical α=0.5/0.7/0.85 acceptance rates. <em>Use case</em>: before you launch a vLLM cluster with spec-dec enabled, verify the pair is actually compatible.",
+    // v0.8.7 — anti-bullshit pack #13: Multilingual Tokenizer Tax
+    "modes.tax":                   "🌍 Token Tax",
+    "mode_desc.tax":               "Real BPE encoding (browser-side via transformers.js) of pasted text across 6 vendor tokenizers. Surfaces the silent cost asymmetry across languages.",
+    "tax.title":                   "🌍 Multilingual Tokenizer Tax",
+    "tax.tip":                     "Tokenizers tax non-English text asymmetrically. The same paragraph might be 100 tokens in English but 250+ tokens in Chinese on a Latin-trained tokenizer (Llama, Phi). Cost per request and effective context BOTH degrade silently. Paste your text, see actual token counts across vendor tokenizers — no estimation, real BPE encoding via transformers.js in your browser.",
+    "tax.desc":                    "<strong>Don't 3× your bill on Chinese support.</strong> Paste any text → real per-tokenizer BPE encoding across Qwen / Phi / Llama / Gemma / GPT-4 / Claude → see the cost asymmetry vs your baseline.",
+    "tax.input_label":             "Text to tokenize:",
+    "tax.input.placeholder":       "Paste any text — English, Chinese, Arabic, code, …",
+    "tax.tokenize_btn":            "🔬 Tokenize all",
+    "tax.sample_en_btn":           "↳ Sample: English",
+    "tax.sample_zh_btn":           "↳ Sample: 中文",
+    "tax.sample_ar_btn":           "↳ Sample: عربى",
+    "tax.sample_mixed_btn":        "↳ Sample: mixed",
+    "tax.sample_code_btn":         "↳ Sample: code",
+    "tax.status.loading":          "⏳ Loading transformers.js + tokenizers (first run can take 5-15s)…",
+    "tax.status.done":             "✅ {n}/{total} tokenizers ran in {ms}ms",
+    "tax.col.tokenizer":           "Tokenizer",
+    "tax.col.tokens":              "Tokens",
+    "tax.col.cpt":                 "Chars/tok",
+    "tax.col.ratio":                "Ratio",
+    "tax.summary.input":           "Input: {chars} chars, {bytes} bytes",
+    "tax.script_breakdown":        "scripts",
+    "tax.interp.worst":            "{label} costs {pct}% more tokens than baseline for this text.",
+    "tax.interp.uniform":          "✓ All tokenizers within ±5% — text is well-handled across vendors.",
+    "tax.hint.empty":              "Paste some text and click Tokenize.",
+    "tax.all_failed":              "All tokenizers failed to load.",
+    "tax.error.gated":             "model gated (HF auth required — try the open mirror)",
+    "tax.error.not_found":         "model id not found",
+    "tax.error.timeout":           "timeout (large tokenizer or slow connection)",
+    "tax.error.network":           "network error",
+    "tax.error.fetch_failed":      "fetch failed",
+    "tax.error.invalid_input":     "invalid input",
+    "tax.attribution":             "Tokenizers via",
+    "tax.attribution.privacy":     "Text is tokenized locally — never leaves the browser.",
+    "tax.firstload_note":          "💡 <strong>First-time load:</strong> the tool fetches transformers.js (~750 KB) + each tokenizer's vocab on demand (~5-15 MB per tokenizer, cached after). Subsequent runs are instant. All processing is local — your text never leaves the browser.",
+    "inv.v087.tax":                "<strong>🌍 Token Tax</strong> — real BPE encoding across 6 vendor tokenizers. Surfaces the silent cost asymmetry across languages (CJK / Arabic / mixed).",
+    "help.v087.tax.title":         "🌍 Multilingual Tokenizer Tax",
+    "help.v087.tax.body":          "Tokenizers tax non-English text asymmetrically. The same paragraph might be 100 tokens in English but 250+ in Chinese on a Latin-trained tokenizer (Llama, Phi). Both cost-per-request AND effective context degrade silently. This tool loads HuggingFace transformers.js in your browser (~750 KB CDN) and tokenizes pasted text against 6 preset vendor tokenizers (Qwen2.5, Phi-3.5, Llama-3.1, Gemma-2, GPT-4 cl100k, Claude approx). Output: per-tokenizer token count + chars-per-token + ratio vs baseline + cost-asymmetry interpretation. Auto-detects script blocks (Latin / CJK / Arabic / Cyrillic / Devanagari / Thai / Greek / Hebrew / Korean) so users see why one tokenizer is 3× another. <em>Use case</em>: 'My multilingual support added 30% to the bill — which language costs the most?' → paste real production text, see exact per-tokenizer breakdown.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
     "help.v085.speculative.title": "🔬 Compatibilidad de Speculative-Decode",
     "help.v085.speculative.body":  "El speculative decoding solo funciona si target y draft comparten exactamente el mismo vocabulario. Vocabs mismatched hacen que cada token del draft sea rechazado — pagas AMBOS computes y obtienes peor throughput que baseline. Peor: el sistema sigue emitiendo output correcto (solo más lento), así que el bug es invisible en tests unitarios. vLLM #4570 / #16757 / #20409 / #12488 surfacen variantes. Esta tool hace fetch de `tokenizer.json` desde HF Hub para ambos ids, compara tipo de tokenizer, tamaño de vocab, mapa completo token→id, special tokens, y added tokens, luego estima una banda de speedup basada en ratio de params y tasas típicas α=0.5/0.7/0.85 de aceptación. <em>Caso de uso</em>: antes de lanzar un cluster vLLM con spec-dec habilitado, verifica que el par sea compatible.",
+    // v0.8.7 — anti-bullshit pack #13: Multilingual Tokenizer Tax
+    "modes.tax":                   "🌍 Token Tax",
+    "mode_desc.tax":               "BPE real (transformers.js en browser) sobre texto pegado a través de 6 tokenizers de vendor. Surface la asimetría de coste silenciosa entre idiomas.",
+    "tax.title":                   "🌍 Impuesto de Tokenizer Multilingüe",
+    "tax.tip":                     "Los tokenizers gravan el texto no-inglés de forma asimétrica. El mismo párrafo puede ser 100 tokens en inglés pero 250+ en chino en un tokenizer entrenado en Latin (Llama, Phi). Coste por request Y contexto efectivo degradan silenciosamente. Pega tu texto, ve token counts reales a través de tokenizers de vendor — sin estimación, BPE real vía transformers.js en tu navegador.",
+    "tax.desc":                    "<strong>No 3× tu factura en soporte chino.</strong> Pega cualquier texto → BPE real por-tokenizer a través de Qwen / Phi / Llama / Gemma / GPT-4 / Claude → ve la asimetría de coste vs tu baseline.",
+    "tax.input_label":             "Texto a tokenizar:",
+    "tax.input.placeholder":       "Pega cualquier texto — inglés, chino, árabe, código, …",
+    "tax.tokenize_btn":            "🔬 Tokenizar todos",
+    "tax.sample_en_btn":           "↳ Ejemplo: English",
+    "tax.sample_zh_btn":           "↳ Ejemplo: 中文",
+    "tax.sample_ar_btn":           "↳ Ejemplo: عربى",
+    "tax.sample_mixed_btn":        "↳ Ejemplo: mixto",
+    "tax.sample_code_btn":         "↳ Ejemplo: código",
+    "tax.status.loading":          "⏳ Cargando transformers.js + tokenizers (primera ejecución puede tardar 5-15s)…",
+    "tax.status.done":             "✅ {n}/{total} tokenizers en {ms}ms",
+    "tax.col.tokenizer":           "Tokenizer",
+    "tax.col.tokens":              "Tokens",
+    "tax.col.cpt":                 "Chars/tok",
+    "tax.col.ratio":                "Ratio",
+    "tax.summary.input":           "Entrada: {chars} caracteres, {bytes} bytes",
+    "tax.script_breakdown":        "scripts",
+    "tax.interp.worst":            "{label} cuesta {pct}% más tokens que baseline para este texto.",
+    "tax.interp.uniform":          "✓ Todos los tokenizers dentro de ±5% — texto bien manejado entre vendors.",
+    "tax.hint.empty":              "Pega texto y haz click en Tokenizar.",
+    "tax.all_failed":              "Todos los tokenizers fallaron.",
+    "tax.error.gated":             "modelo gated (auth HF requerida — prueba mirror open)",
+    "tax.error.not_found":         "model id no encontrado",
+    "tax.error.timeout":           "timeout (tokenizer grande o conexión lenta)",
+    "tax.error.network":           "error de red",
+    "tax.error.fetch_failed":      "fetch falló",
+    "tax.error.invalid_input":     "entrada inválida",
+    "tax.attribution":             "Tokenizers vía",
+    "tax.attribution.privacy":     "El texto se tokeniza localmente — nunca sale del navegador.",
+    "tax.firstload_note":          "💡 <strong>Primera carga:</strong> la tool descarga transformers.js (~750 KB) + el vocab de cada tokenizer bajo demanda (~5-15 MB por tokenizer, cacheados después). Ejecuciones siguientes son instantáneas. Todo el procesamiento es local — tu texto nunca sale del navegador.",
+    "inv.v087.tax":                "<strong>🌍 Token Tax</strong> — BPE real sobre 6 tokenizers de vendor. Surface la asimetría de coste silenciosa entre idiomas (CJK / árabe / mixto).",
+    "help.v087.tax.title":         "🌍 Impuesto de Tokenizer Multilingüe",
+    "help.v087.tax.body":          "Los tokenizers gravan el texto no-inglés de forma asimétrica. El mismo párrafo puede ser 100 tokens en inglés pero 250+ en chino en un tokenizer entrenado en Latin (Llama, Phi). Tanto coste-por-request COMO contexto efectivo degradan silenciosamente. Esta tool carga HuggingFace transformers.js en tu navegador (~750 KB CDN) y tokeniza el texto pegado contra 6 tokenizers preset de vendor (Qwen2.5, Phi-3.5, Llama-3.1, Gemma-2, GPT-4 cl100k, Claude aprox). Output: token count por tokenizer + chars-per-token + ratio vs baseline + interpretación de asimetría. Auto-detecta bloques de script (Latin / CJK / árabe / cirílico / devanagari / tailandés / griego / hebreo / coreano) para que veas por qué un tokenizer es 3× otro. <em>Caso de uso</em>: 'Mi soporte multilingüe añadió 30% a la factura — ¿qué idioma cuesta más?' → pega texto real de producción, ve breakdown exacto por tokenizer.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
     "help.v085.speculative.title": "🔬 Compatibilité Speculative-Decode",
     "help.v085.speculative.body":  "Le speculative decoding ne marche que si target et draft partagent exactement le même vocabulaire. Des vocabs mismatched font que chaque token du draft est rejeté — vous payez LES DEUX coûts de compute et obtenez un pire débit que la baseline. Pire : le système émet toujours une sortie correcte (juste plus lente), donc le bug est invisible aux tests unitaires. vLLM #4570 / #16757 / #20409 / #12488 surfent les variantes. Cet outil récupère `tokenizer.json` depuis HF Hub pour les deux model ids, compare le type de tokenizer, la taille du vocab, la map complète token→id, les special tokens, et les added tokens, puis estime une bande de speedup basée sur le ratio de params et les taux α=0.5/0.7/0.85 d'acceptation typiques. <em>Cas d'usage</em> : avant de lancer un cluster vLLM avec spec-dec activé, vérifiez que la paire est compatible.",
+    // v0.8.7 — anti-bullshit pack #13: Multilingual Tokenizer Tax
+    "modes.tax":                   "🌍 Token Tax",
+    "mode_desc.tax":               "Encodage BPE réel (côté navigateur via transformers.js) du texte collé sur 6 tokenizers de fournisseurs. Révèle l'asymétrie de coût silencieuse entre langues.",
+    "tax.title":                   "🌍 Taxe Tokenizer Multilingue",
+    "tax.tip":                     "Les tokenizers taxent le texte non-anglais de façon asymétrique. Le même paragraphe peut faire 100 tokens en anglais mais 250+ en chinois sur un tokenizer entraîné en Latin (Llama, Phi). Coût par requête ET contexte effectif dégradent silencieusement. Collez votre texte, voyez les vrais token counts à travers les tokenizers fournisseurs — pas d'estimation, BPE réel via transformers.js dans votre navigateur.",
+    "tax.desc":                    "<strong>Ne 3× pas votre facture sur le support chinois.</strong> Collez n'importe quel texte → encodage BPE réel par tokenizer (Qwen / Phi / Llama / Gemma / GPT-4 / Claude) → voyez l'asymétrie de coût vs votre baseline.",
+    "tax.input_label":             "Texte à tokenizer :",
+    "tax.input.placeholder":       "Collez n'importe quel texte — anglais, chinois, arabe, code, …",
+    "tax.tokenize_btn":            "🔬 Tokenizer tous",
+    "tax.sample_en_btn":           "↳ Exemple : English",
+    "tax.sample_zh_btn":           "↳ Exemple : 中文",
+    "tax.sample_ar_btn":           "↳ Exemple : عربى",
+    "tax.sample_mixed_btn":        "↳ Exemple : mixte",
+    "tax.sample_code_btn":         "↳ Exemple : code",
+    "tax.status.loading":          "⏳ Chargement transformers.js + tokenizers (la première exécution peut prendre 5-15s)…",
+    "tax.status.done":             "✅ {n}/{total} tokenizers en {ms}ms",
+    "tax.col.tokenizer":           "Tokenizer",
+    "tax.col.tokens":              "Tokens",
+    "tax.col.cpt":                 "Chars/tok",
+    "tax.col.ratio":                "Ratio",
+    "tax.summary.input":           "Entrée : {chars} caractères, {bytes} octets",
+    "tax.script_breakdown":        "scripts",
+    "tax.interp.worst":            "{label} coûte {pct}% de tokens en plus que la baseline pour ce texte.",
+    "tax.interp.uniform":          "✓ Tous les tokenizers à ±5% — texte bien géré par les fournisseurs.",
+    "tax.hint.empty":              "Collez du texte puis Tokenizer.",
+    "tax.all_failed":              "Tous les tokenizers ont échoué.",
+    "tax.error.gated":             "modèle gated (auth HF requise — essayez le mirror open)",
+    "tax.error.not_found":         "model id introuvable",
+    "tax.error.timeout":           "timeout (gros tokenizer ou connexion lente)",
+    "tax.error.network":           "erreur réseau",
+    "tax.error.fetch_failed":      "fetch échoué",
+    "tax.error.invalid_input":     "entrée invalide",
+    "tax.attribution":             "Tokenizers via",
+    "tax.attribution.privacy":     "Le texte est tokenizé localement — ne quitte jamais le navigateur.",
+    "tax.firstload_note":          "💡 <strong>Premier chargement :</strong> l'outil récupère transformers.js (~750 KB) + le vocab de chaque tokenizer à la demande (~5-15 MB par tokenizer, mis en cache après). Les exécutions suivantes sont instantanées. Tout le traitement est local — votre texte ne quitte jamais le navigateur.",
+    "inv.v087.tax":                "<strong>🌍 Token Tax</strong> — encodage BPE réel sur 6 tokenizers fournisseurs. Révèle l'asymétrie de coût silencieuse entre langues (CJK / arabe / mixte).",
+    "help.v087.tax.title":         "🌍 Taxe Tokenizer Multilingue",
+    "help.v087.tax.body":          "Les tokenizers taxent le texte non-anglais de façon asymétrique. Le même paragraphe peut faire 100 tokens en anglais mais 250+ en chinois sur un tokenizer entraîné en Latin (Llama, Phi). Coût-par-requête ET contexte effectif dégradent silencieusement. Cet outil charge HuggingFace transformers.js dans votre navigateur (~750 KB CDN) et tokenize le texte collé contre 6 tokenizers preset de fournisseurs (Qwen2.5, Phi-3.5, Llama-3.1, Gemma-2, GPT-4 cl100k, Claude approx). Sortie : token count par tokenizer + chars-per-token + ratio vs baseline + interprétation d'asymétrie. Auto-détecte les blocs de script (Latin / CJK / arabe / cyrillique / devanagari / thaï / grec / hébreu / coréen) pour voir pourquoi un tokenizer est 3× un autre. <em>Cas d'usage</em> : 'Mon support multilingue a ajouté 30% à la facture — quelle langue coûte le plus ?' → collez du texte de production réel, voyez le breakdown exact par tokenizer.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
     "help.v085.speculative.title": "🔬 Speculative-Decode 兼容性",
     "help.v085.speculative.body":  "Speculative decoding 仅当 target 和 draft 共享完全相同的词汇表时才能工作。Vocab 不匹配导致每个 draft token 被拒绝——你支付双倍计算成本且吞吐量比 baseline 更差。更糟：系统仍输出正确（只是更慢），所以 bug 在单元测试中不可见。vLLM #4570 / #16757 / #20409 / #12488 都显示了变种。这个工具从 HF Hub 获取两个 model id 的 `tokenizer.json`，比较 tokenizer 类型、vocab 大小、完整 token→id 映射、special token 和 added token，然后基于参数比和典型 α=0.5/0.7/0.85 接受率估算 speedup 范围。<em>用例</em>：在启动启用了 spec-dec 的 vLLM 集群之前，验证这对模型是否真的兼容。",
+    // v0.8.7 — anti-bullshit pack #13: Multilingual Tokenizer Tax
+    "modes.tax":                   "🌍 Token Tax",
+    "mode_desc.tax":               "通过浏览器端 transformers.js 对粘贴文本进行 6 个供应商 tokenizer 的真实 BPE 编码。揭示语言间的静默成本不对称。",
+    "tax.title":                   "🌍 多语言 Tokenizer 税",
+    "tax.tip":                     "Tokenizer 对非英语文本的征税不对称。同一段落在英语中可能是 100 个 token，但在拉丁字母训练的 tokenizer（Llama、Phi）上的中文可能是 250+ 个 token。每次请求成本和有效上下文都会静默降级。粘贴你的文本，通过供应商 tokenizer 查看实际 token 数——没有估算，通过 transformers.js 在浏览器中真实 BPE 编码。",
+    "tax.desc":                    "<strong>不要因中文支持让账单 3 倍。</strong> 粘贴任意文本 → 通过 Qwen / Phi / Llama / Gemma / GPT-4 / Claude 的真实 BPE 编码 → 查看相对于 baseline 的成本不对称。",
+    "tax.input_label":             "要 tokenize 的文本：",
+    "tax.input.placeholder":       "粘贴任何文本——英语、中文、阿拉伯语、代码……",
+    "tax.tokenize_btn":            "🔬 Tokenize 全部",
+    "tax.sample_en_btn":           "↳ 示例：English",
+    "tax.sample_zh_btn":           "↳ 示例：中文",
+    "tax.sample_ar_btn":           "↳ 示例：عربى",
+    "tax.sample_mixed_btn":        "↳ 示例：混合",
+    "tax.sample_code_btn":         "↳ 示例：代码",
+    "tax.status.loading":          "⏳ 加载 transformers.js + tokenizer（首次运行可能需要 5-15 秒）…",
+    "tax.status.done":             "✅ {n}/{total} 个 tokenizer，用时 {ms}ms",
+    "tax.col.tokenizer":           "Tokenizer",
+    "tax.col.tokens":              "Token 数",
+    "tax.col.cpt":                 "字符/token",
+    "tax.col.ratio":                "比率",
+    "tax.summary.input":           "输入：{chars} 字符，{bytes} 字节",
+    "tax.script_breakdown":        "脚本",
+    "tax.interp.worst":            "{label} 对此文本的 token 数比 baseline 多 {pct}%。",
+    "tax.interp.uniform":          "✓ 所有 tokenizer 在 ±5% 范围内——文本在各供应商间处理良好。",
+    "tax.hint.empty":              "粘贴文本然后点击 Tokenize。",
+    "tax.all_failed":              "所有 tokenizer 都失败了。",
+    "tax.error.gated":             "模型受限（需要 HF auth——尝试 open mirror）",
+    "tax.error.not_found":         "找不到 model id",
+    "tax.error.timeout":           "超时（大 tokenizer 或慢速连接）",
+    "tax.error.network":           "网络错误",
+    "tax.error.fetch_failed":      "获取失败",
+    "tax.error.invalid_input":     "无效输入",
+    "tax.attribution":             "Tokenizer 通过",
+    "tax.attribution.privacy":     "文本在本地 tokenize——永远不会离开浏览器。",
+    "tax.firstload_note":          "💡 <strong>首次加载：</strong>工具按需获取 transformers.js（~750 KB）+ 每个 tokenizer 的词汇表（每个 ~5-15 MB，加载后缓存）。后续运行即时。所有处理都是本地的——你的文本永远不会离开浏览器。",
+    "inv.v087.tax":                "<strong>🌍 Token Tax</strong> — 6 个供应商 tokenizer 的真实 BPE 编码。揭示语言间（CJK / 阿拉伯语 / 混合）的静默成本不对称。",
+    "help.v087.tax.title":         "🌍 多语言 Tokenizer 税",
+    "help.v087.tax.body":          "Tokenizer 对非英语文本的征税不对称。同一段落在英语中可能是 100 个 token，但在拉丁字母训练的 tokenizer（Llama、Phi）上的中文可能是 250+ 个 token。每次请求成本和有效上下文都会静默降级。这个工具在你的浏览器中加载 HuggingFace transformers.js（~750 KB CDN），并对粘贴的文本运行 6 个预设供应商 tokenizer（Qwen2.5、Phi-3.5、Llama-3.1、Gemma-2、GPT-4 cl100k、Claude 近似）的 tokenize。输出：每个 tokenizer 的 token 数 + 字符/token + 相对于 baseline 的比率 + 成本不对称解读。自动检测脚本块（拉丁/CJK/阿拉伯/西里尔/天城/泰/希腊/希伯来/韩文）让你看到为什么一个 tokenizer 是另一个的 3 倍。<em>用例</em>：『我的多语言支持给账单加了 30%——哪种语言成本最高？』→ 粘贴真实生产文本，查看每个 tokenizer 的精确分解。",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别（评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性），每个映射到（a）解决它的 tafagent 模式（若存在），以及（b）社区已信任的最佳外部工具（RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等）。搜索框匹配 pain、场景和工具名称。<em>用例</em>：'我有问题 X — tafagent 解决它吗，如果不，谁解决？'",

js/main.js CHANGED Viewed

@@ -31,6 +31,10 @@ import { lintJsonCot, reorderJsonText, classifyFieldName } from "./json_cot_lint
 import { lintPeftCode, ARCH_TARGET_MODULES } from "./peft_anti_pattern.js";
 import { diffPromptCache, PROVIDERS as CACHE_PROVIDERS } from "./prompt_cache_diff.js";
 import { checkCompatibility as specCheckCompat, parseParamHint } from "./spec_decode_compat.js";
 // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
 // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -224,6 +228,7 @@ document.addEventListener("click", (e) => {
       peft: "peft-section",
       cache: "cache-section",
       speculative: "speculative-section",
       hub: "hub-section",
     }[targetMode];
     if (sectionId) {
@@ -249,7 +254,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
      "diagnose-section", "phase-section", "unmask-section",
      "template-section", "arena-section", "contam-section",
      "quant-section", "drift-section", "niah-section",
-     "saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "hub-section"].forEach(id => {
       const el = $(id);
       if (el) el.style.display = "none";
     });
@@ -265,6 +270,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
       peft: "peft-section",
       cache: "cache-section",
       speculative: "speculative-section",
       hub: "hub-section",
     };
     const sectionId = sectionMap[mode];
@@ -276,6 +282,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
     if (mode === "peft") initPeft();
     if (mode === "cache") initCacheDiff();
     if (mode === "speculative") initSpeculative();
     if (mode === "hub") initHub();
   });
 });
@@ -4248,6 +4255,161 @@ $("spec-example-bad-btn")?.addEventListener("click", () => {
 // (HF autocomplete on spec-target-id / spec-draft-id is registered via
 // the known-id list in hf_autocomplete.js; no extra wiring needed here.)
 // ════════════════��═══════════════════════════════════════════════════
 // Bootstrap
 // ════════════════════════════════════════════════════════════════════

 import { lintPeftCode, ARCH_TARGET_MODULES } from "./peft_anti_pattern.js";
 import { diffPromptCache, PROVIDERS as CACHE_PROVIDERS } from "./prompt_cache_diff.js";
 import { checkCompatibility as specCheckCompat, parseParamHint } from "./spec_decode_compat.js";
+import {
+  tokenizeAll, detectLanguageBlocks,
+  PRESET_TOKENIZERS as TAX_PRESETS, SAMPLE_TEXTS as TAX_SAMPLES,
+} from "./tokenizer_tax.js";
 // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
 // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
       peft: "peft-section",
       cache: "cache-section",
       speculative: "speculative-section",
+      tax: "tax-section",
       hub: "hub-section",
     }[targetMode];
     if (sectionId) {
      "diagnose-section", "phase-section", "unmask-section",
      "template-section", "arena-section", "contam-section",
      "quant-section", "drift-section", "niah-section",
+     "saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "hub-section"].forEach(id => {
       const el = $(id);
       if (el) el.style.display = "none";
     });
       peft: "peft-section",
       cache: "cache-section",
       speculative: "speculative-section",
+      tax: "tax-section",
       hub: "hub-section",
     };
     const sectionId = sectionMap[mode];
     if (mode === "peft") initPeft();
     if (mode === "cache") initCacheDiff();
     if (mode === "speculative") initSpeculative();
+    if (mode === "tax") initTax();
     if (mode === "hub") initHub();
   });
 });
 // (HF autocomplete on spec-target-id / spec-draft-id is registered via
 // the known-id list in hf_autocomplete.js; no extra wiring needed here.)
+// ════════════════════════════════════════════════════════════════════
+// 🌍 Multilingual Tokenizer Tax (v0.8.7 anti-bullshit pack #13)
+// ════════════════════════════════════════════════════════════════════
+let __taxInited = false;
+function initTax() {
+  if (__taxInited) return;
+  __taxInited = true;
+  // No async preload — transformers.js + tokenizer.json are lazy-loaded
+  // on the first Tokenize click so users don't pay download cost just
+  // for opening the tab. Status string explains the wait.
+}
+function fmtBlocks(blocks) {
+  // Build a compact "60% latin · 35% cjk · 5% other" string from the
+  // detector output. Drops zero-counts and orders by descending size.
+  if (!blocks || !blocks.blocks || !blocks.total_chars) return "";
+  const total = blocks.total_chars;
+  const entries = Object.entries(blocks.blocks)
+    .filter(([, n]) => n > 0)
+    .sort((a, b) => b[1] - a[1]);
+  if (entries.length === 0) return "";
+  const parts = entries.map(([name, n]) => {
+    const pct = Math.round((n / total) * 100);
+    return `${pct}% ${name}`;
+  });
+  return parts.join(" · ");
+}
+function renderTaxResult(res, presetMeta) {
+  if (res.code === "empty_input") {
+    return `<div class="arena-result"><p>${t("tax.hint.empty") || "Paste some text and click Tokenize."}</p></div>`;
+  }
+  if (res.code === "all_failed") {
+    const errLines = res.results.map(r => {
+      const meta = presetMeta.find(p => p.id === r.modelId);
+      return `<li><code>${escapeHtml(r.modelId)}</code> ${meta ? `<span class="subtle">(${escapeHtml(meta.label)})</span>` : ""}: ${t(`tax.error.${r.error}`) || r.error}</li>`;
+    }).join("");
+    return `<div class="arena-result"><p style="color:#f85149;"><strong>❌ ${t("tax.all_failed") || "All tokenizers failed to load."}</strong></p><ul>${errLines}</ul></div>`;
+  }
+  const baselineCount = res.baseline_count;
+  const blocks = detectLanguageBlocks($("tax-input").value);
+  const ratioColor = (r) => {
+    if (r == null) return "#8b949e";
+    if (r >= 1.5)  return "#f85149";          // big tax — red
+    if (r >= 1.15) return "#f0883e";          // moderate
+    if (r >= 0.85) return "#3fb950";          // about same
+    return "#58a6ff";                         // BETTER than baseline (rare)
+  };
+  const fmtRatio = (r) => r == null ? "—" : `${r.toFixed(2)}×`;
+  const rows = res.results.map(r => {
+    const meta = presetMeta.find(p => p.id === r.modelId) || { label: r.modelId, family: "" };
+    if (!r.ok) {
+      return `<tr style="opacity:0.5;">
+        <td><strong>${escapeHtml(meta.label)}</strong><br><span class="subtle" style="font-size:0.8em;">${escapeHtml(meta.family)}</span></td>
+        <td colspan="3" style="color:#f0883e;">${t(`tax.error.${r.error}`) || r.error}</td>
+      </tr>`;
+    }
+    const isBaseline = r.modelId === res.baseline_id;
+    const baselineMark = isBaseline ? `<span class="subtle" style="font-size:0.8em;"> (baseline)</span>` : "";
+    return `<tr ${isBaseline ? 'style="background:#1f2933;"' : ""}>
+      <td><strong>${escapeHtml(meta.label)}</strong>${baselineMark}<br><span class="subtle" style="font-size:0.8em;">${escapeHtml(meta.family)}</span></td>
+      <td style="text-align:right;font-family:monospace;"><strong>${r.token_count.toLocaleString()}</strong></td>
+      <td style="text-align:right;font-family:monospace;">${r.chars_per_token != null ? r.chars_per_token.toFixed(2) : "—"}</td>
+      <td style="text-align:right;font-family:monospace;color:${ratioColor(r.ratio_vs_baseline)};"><strong>${fmtRatio(r.ratio_vs_baseline)}</strong></td>
+    </tr>`;
+  }).join("");
+  // Worst-tax explanation — find the tokenizer that scored ≥1.5× baseline.
+  const worst = res.results
+    .filter(r => r.ok && r.ratio_vs_baseline != null)
+    .sort((a, b) => b.ratio_vs_baseline - a.ratio_vs_baseline)[0];
+  let interpretation = "";
+  if (worst && worst.ratio_vs_baseline >= 1.3) {
+    const meta = presetMeta.find(p => p.id === worst.modelId);
+    const pct = Math.round((worst.ratio_vs_baseline - 1) * 100);
+    interpretation = `<p style="color:#f0883e;margin-top:0.5em;">⚠ <strong>${tFmt("tax.interp.worst", {
+      label: meta?.label || worst.modelId,
+      pct,
+    }) || `${meta?.label || worst.modelId} costs ${pct}% more tokens than baseline for this text.`}</strong></p>`;
+  } else if (worst && worst.ratio_vs_baseline <= 1.05) {
+    interpretation = `<p style="color:#3fb950;margin-top:0.5em;">${t("tax.interp.uniform") || "✓ All tokenizers within ±5% — text is well-handled across vendors."}</p>`;
+  }
+  return `<div class="arena-result">
+    <p>
+      <strong>${tFmt("tax.summary.input", { chars: res.chars.toLocaleString(), bytes: res.bytes.toLocaleString() }) || `Input: ${res.chars.toLocaleString()} chars, ${res.bytes.toLocaleString()} bytes`}</strong>
+      ${blocks.dominant ? `<span class="subtle"> · ${t("tax.script_breakdown") || "scripts"}: ${fmtBlocks(blocks)}</span>` : ""}
+    </p>
+    ${interpretation}
+    <table class="lean-table" style="margin-top:0.5em;width:100%;">
+      <thead><tr>
+        <th style="text-align:left;">${t("tax.col.tokenizer") || "Tokenizer"}</th>
+        <th style="text-align:right;">${t("tax.col.tokens") || "Tokens"}</th>
+        <th style="text-align:right;">${t("tax.col.cpt") || "Chars/tok"}</th>
+        <th style="text-align:right;">${t("tax.col.ratio") || "Ratio"}</th>
+      </tr></thead>
+      <tbody>${rows}</tbody>
+    </table>
+    <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;">
+      ${t("tax.attribution") || "Tokenizers via"}
+      <a href="https://github.com/huggingface/transformers.js" target="_blank" rel="noopener noreferrer">@huggingface/transformers</a>
+      (browser BPE runtime).
+      ${t("tax.attribution.privacy") || "Text is tokenized locally — never leaves the browser."}
+    </p>
+  </div>`;
+}
+async function runTaxTokenize() {
+  const text = $("tax-input")?.value || "";
+  if (!text) {
+    $("tax-status").textContent = t("tax.hint.empty") || "⚠ Paste some text first.";
+    return;
+  }
+  $("tax-status").textContent = t("tax.status.loading") || "⏳ Loading transformers.js + tokenizers (first run can take 5-15s)…";
+  $("tax-output").innerHTML = "";
+  const ids = TAX_PRESETS.map(p => p.id);
+  try {
+    const t0 = Date.now();
+    const res = await tokenizeAll(ids, text);
+    const ms = Date.now() - t0;
+    $("tax-output").innerHTML = renderTaxResult(res, TAX_PRESETS);
+    const okN = res.results.filter(r => r.ok).length;
+    $("tax-status").textContent = tFmt("tax.status.done", {
+      n: okN, total: ids.length, ms,
+    }) || `✅ ${okN}/${ids.length} tokenizers ran in ${ms}ms`;
+  } catch (e) {
+    $("tax-status").textContent = `❌ ${e.message || e}`;
+  }
+}
+$("tax-tokenize-btn")?.addEventListener("click", runTaxTokenize);
+$("tax-sample-en-btn")?.addEventListener("click", () => {
+  $("tax-input").value = TAX_SAMPLES.english;
+  runTaxTokenize();
+});
+$("tax-sample-zh-btn")?.addEventListener("click", () => {
+  $("tax-input").value = TAX_SAMPLES.chinese;
+  runTaxTokenize();
+});
+$("tax-sample-ar-btn")?.addEventListener("click", () => {
+  $("tax-input").value = TAX_SAMPLES.arabic;
+  runTaxTokenize();
+});
+$("tax-sample-mixed-btn")?.addEventListener("click", () => {
+  $("tax-input").value = TAX_SAMPLES.mixed;
+  runTaxTokenize();
+});
+$("tax-sample-code-btn")?.addEventListener("click", () => {
+  $("tax-input").value = TAX_SAMPLES.code;
+  runTaxTokenize();
+});
 // ════════════════��═══════════════════════════════════════════════════
 // Bootstrap
 // ════════════════════════════════════════════════════════════════════

js/tokenizer_tax.js ADDED Viewed

	@@ -0,0 +1,221 @@

+// Multilingual Tokenizer Tax Calculator (v0.8.7 anti-bullshit pack #13)
+//
+// Pain: "I bought 1M tokens of API credit for our English chatbot. Then
+// we added Chinese support and the bill 3x'd overnight." The tokenizer
+// tax is real and silently asymmetric across languages. tiktokenizer.
+// vercel.app shows OpenAI's tokenizer; nothing public compares Llama vs
+// Qwen vs Phi vs Gemma vs GPT for the SAME text in the SAME interface.
+//
+// This module loads HuggingFace's transformers.js (browser-side BPE
+// runtime) lazily and tokenizes user-pasted text against a preset list
+// of open-weight tokenizers. The output is REAL per-tokenizer token
+// counts plus the cost asymmetry ratio (vs the user's chosen baseline).
+//
+// Pure logic + lazy CDN import. Codes/params only; main.js renders i18n.
+// =============================================================================
+// transformers.js lazy loader
+// =============================================================================
+//
+// Pinned 3.x major because the API surface (AutoTokenizer.from_pretrained,
+// .encode) is stable. Loaded from jsdelivr CDN — same pattern used
+// across HF Spaces. ~3 MB compressed bundle, cached aggressively after
+// first load.
+const TRANSFORMERS_CDN_URL = "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2/dist/transformers.min.js";
+let _autoTokenizer = null;
+let _loadPromise = null;
+async function loadTransformersJs() {
+  if (_autoTokenizer) return _autoTokenizer;
+  if (_loadPromise) return _loadPromise;
+  _loadPromise = (async () => {
+    const mod = await import(TRANSFORMERS_CDN_URL);
+    _autoTokenizer = mod.AutoTokenizer;
+    return _autoTokenizer;
+  })();
+  return _loadPromise;
+}
+// =============================================================================
+// Per-tokenizer cache (avoid re-downloading tokenizer.json on every encode)
+// =============================================================================
+const _tokenizerCache = new Map();
+async function loadTokenizer(modelId) {
+  if (_tokenizerCache.has(modelId)) return _tokenizerCache.get(modelId);
+  const AT = await loadTransformersJs();
+  const tok = await AT.from_pretrained(modelId);
+  _tokenizerCache.set(modelId, tok);
+  return tok;
+}
+// =============================================================================
+// Public: tokenize one model
+// =============================================================================
+export async function tokenizeWithModel(modelId, text) {
+  if (typeof text !== "string") {
+    return { ok: false, modelId, error: "invalid_input" };
+  }
+  try {
+    const tok = await loadTokenizer(modelId);
+    // transformers.js returns Int32Array | number[]. Use .length for count.
+    const ids = await tok.encode(text);
+    return { ok: true, modelId, token_count: ids.length };
+  } catch (e) {
+    return {
+      ok: false,
+      modelId,
+      error: classifyTokenizerError(e),
+      raw: String(e?.message || e).slice(0, 200),
+    };
+  }
+}
+function classifyTokenizerError(e) {
+  const msg = String(e?.message || e).toLowerCase();
+  if (msg.includes("401") || msg.includes("403") || msg.includes("gated")) return "gated";
+  if (msg.includes("404") || msg.includes("not found")) return "not_found";
+  if (msg.includes("timeout") || msg.includes("aborted")) return "timeout";
+  if (msg.includes("network") || msg.includes("failed to fetch")) return "network";
+  return "fetch_failed";
+}
+// =============================================================================
+// Public: tokenize many models in parallel + compute ratios
+// =============================================================================
+export async function tokenizeAll(modelIds, text, baseline_idx = 0) {
+  if (!Array.isArray(modelIds) || modelIds.length === 0 || typeof text !== "string") {
+    return { code: "empty_input", results: [], baseline: null };
+  }
+  const results = await Promise.all(
+    modelIds.map(id => tokenizeWithModel(id, text))
+  );
+  const okResults = results.filter(r => r.ok);
+  if (okResults.length === 0) {
+    return { code: "all_failed", results, baseline: null };
+  }
+  // Baseline: first OK tokenizer, or the user-specified index if it's OK.
+  let baseline = okResults[0];
+  if (baseline_idx >= 0 && baseline_idx < results.length && results[baseline_idx].ok) {
+    baseline = results[baseline_idx];
+  }
+  // Stamp ratio vs baseline + chars-per-token for each.
+  const charCount = text.length;
+  const byteCount = new TextEncoder().encode(text).length;
+  for (const r of results) {
+    if (!r.ok) continue;
+    r.chars_per_token = r.token_count > 0 ? charCount / r.token_count : null;
+    r.bytes_per_token = r.token_count > 0 ? byteCount / r.token_count : null;
+    r.ratio_vs_baseline = baseline.token_count > 0
+      ? r.token_count / baseline.token_count
+      : null;
+  }
+  return {
+    code: "ok",
+    results,
+    baseline_id: baseline.modelId,
+    baseline_count: baseline.token_count,
+    chars: charCount,
+    bytes: byteCount,
+  };
+}
+// =============================================================================
+// Language detection — Unicode block analysis (no external deps)
+// =============================================================================
+//
+// Surfaced as context next to the token counts so users see "this text
+// is 60% CJK, 40% Latin" — explains why one tokenizer is 3× another.
+const UNICODE_BLOCKS = [
+  // [name, regex_class]
+  ["latin",      /[A-z]/g],
+  ["cjk",        /[぀-ゟ゠-ヿ一-鿿ｦ-ﾝ]/g],
+  ["korean",     /[가-힯ᄀ-ᇿ]/g],
+  ["arabic",     /[؀-ۿݐ-ݿ]/g],
+  ["cyrillic",   /[Ѐ-ӿ]/g],
+  ["devanagari", /[ऀ-ॿ]/g],
+  ["thai",       /[฀-๿]/g],
+  ["greek",      /[Ͱ-Ͽ]/g],
+  ["hebrew",     /[֐-׿]/g],
+];
+export function detectLanguageBlocks(text) {
+  if (typeof text !== "string" || !text) {
+    return { total_chars: 0, blocks: {}, dominant: null };
+  }
+  const blocks = {};
+  for (const [name, re] of UNICODE_BLOCKS) {
+    re.lastIndex = 0;
+    const m = text.match(re);
+    blocks[name] = m ? m.length : 0;
+  }
+  const total = text.length;
+  const dominant = Object.entries(blocks)
+    .filter(([, n]) => n > 0)
+    .sort((a, b) => b[1] - a[1])[0]?.[0] || null;
+  return { total_chars: total, blocks, dominant };
+}
+// =============================================================================
+// Preset tokenizer list — all open-weight (no HF auth required)
+// =============================================================================
+//
+// Curated for breadth: one per major tokenizer family. For gated
+// originals (Llama, Mistral, Gemma) the unsloth open-mirror is used —
+// tokenizer.json is byte-identical to the original because quantization
+// touches weights, not tokens (see spec-decode docs for the same
+// argument).
+export const PRESET_TOKENIZERS = [
+  {
+    id: "Qwen/Qwen2.5-7B-Instruct",
+    label: "Qwen2.5",
+    family: "Qwen-BPE (152k vocab, CJK-aware)",
+  },
+  {
+    id: "microsoft/Phi-3.5-mini-instruct",
+    label: "Phi-3.5",
+    family: "tiktoken-style BPE (32k)",
+  },
+  {
+    id: "unsloth/Meta-Llama-3.1-8B-Instruct",
+    label: "Llama-3.1",
+    family: "Llama-3 BPE (128k)",
+  },
+  {
+    id: "unsloth/gemma-2-9b-it",
+    label: "Gemma-2",
+    family: "SentencePiece (256k)",
+  },
+  {
+    id: "Xenova/gpt-4",
+    label: "GPT-4 (cl100k)",
+    family: "OpenAI tiktoken cl100k_base",
+  },
+  {
+    id: "Xenova/claude-tokenizer",
+    label: "Claude (approx)",
+    family: "Anthropic open approx (community port)",
+  },
+];
+// Sample texts that demonstrate cost asymmetry — identical meaning
+// across languages so the user sees per-language tax directly.
+export const SAMPLE_TEXTS = {
+  english: "The quick brown fox jumps over the lazy dog. " +
+    "She sells seashells by the seashore. Pack my box with five dozen liquor jugs.",
+  chinese: "敏捷的棕色狐狸跳过了懒狗。她在海边卖海贝壳。请用五打酒壶装满我的箱子。" +
+    "中文用字符表示词义,所以一段文字所需的字符数远少于英文。",
+  arabic: "الثعلب البني السريع يقفز فوق الكلب الكسول. " +
+    "تبيع أصدافًا بحرية على شاطئ البحر. عبئ صندوقي بخمسين إبريقًا من الخمر.",
+  mixed: "Hello world! 你好世界 مرحبا بالعالم Привет мир नमस्ते दुनिया",
+  code: "def quick_brown_fox(jumps_over: int) -> str:\n" +
+    "    return f'The fox jumped {jumps_over} times'\n\n" +
+    "for i in range(10):\n    print(quick_brown_fox(i))",
+};