Spaces: mukunda1729/token-counter

mukunda1729 committed 25 days ago

Commit 66448b4 (verified) · 1 parent: 8c4373e

Initial: token counter across model families

Files changed (3)

README.md +27 -5
app.py +85 -0
requirements.txt +3 -0

README.md CHANGED

@@ -1,12 +1,34 @@
 ---
 title: Token Counter
-emoji: 🐠
-colorFrom: green
-colorTo: blue
+emoji: 🔢
+colorFrom: yellow
+colorTo: gray
 sdk: gradio
-sdk_version: 6.13.0
+sdk_version: "5.49.1"
+python_version: "3.12"
 app_file: app.py
 pinned: false
+license: mit
+short_description: "Count tokens across Claude, GPT, Llama tokenizers."
+tags:
+  - tokenization
+  - llm
+  - context-window
+  - agentfit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Token Counter
+Paste any text and see how it tokenizes across Claude, GPT, and other model families. Powered by [`agentfit`](https://pypi.org/project/agentfit-py/).
+## Why?
+- Different tokenizers split the same string very differently — Claude often uses ~half the tokens GPT does for the same Chinese / emoji input.
+- Useful when budgeting prompts and deciding which model to use for non-English content.
+- Sanity-check your own token counter against a reference.
+## Related
+- [`agentfit` on PyPI](https://pypi.org/project/agentfit-py/)
+- [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/)
+- Companion dataset: [`token-counting-edge-cases`](https://huggingface.co/datasets/mukunda1729/token-counting-edge-cases)
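
Review note: the README's "Why?" point about different counters splitting the same string differently can be illustrated without any real tokenizer. The following is a minimal sketch under stated assumptions: `estimate_by_chars` and `estimate_by_bytes` are hypothetical stand-ins invented for this note, and the 4-units-per-token figure is an arbitrary rule of thumb. Neither function reflects how Claude, GPT, or `agentfit` actually counts tokens.

```python
import math


def estimate_by_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Hypothetical estimate: one token per `chars_per_token` characters."""
    return math.ceil(len(text) / chars_per_token) if text else 0


def estimate_by_bytes(text: str, bytes_per_token: float = 4.0) -> int:
    """Hypothetical estimate: one token per `bytes_per_token` UTF-8 bytes."""
    return math.ceil(len(text.encode("utf-8")) / bytes_per_token) if text else 0


if __name__ == "__main__":
    for sample in ["Hello, world!", "你好世界"]:
        by_chars = estimate_by_chars(sample)
        by_bytes = estimate_by_bytes(sample)
        print(f"{sample!r}: chars-based={by_chars}, bytes-based={by_bytes}")
```

For ASCII text the two estimates agree, because each character is one byte. For the Chinese sample they diverge by a factor of three, since each character occupies three UTF-8 bytes. Real tokenizers differ for other reasons (vocabulary, merge rules), but the effect on a prompt budget is of the same kind.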

app.py ADDED

	@@ -0,0 +1,85 @@

+"""Token counter — count tokens for text across multiple model families.
+Uses agentfit's pluggable counter. Falls back to char-based estimates when
+exact tokenizers aren't available.
+"""
+import gradio as gr
+from agentfit import count
+MODELS = [
+    "claude-sonnet-4-6",
+    "claude-haiku-4-5",
+    "gpt-5",
+    "gpt-4.1",
+    "default",
+]
+def count_tokens(text: str, comparison: str):
+    """Count tokens for the given text across selected models."""
+    if not text.strip():
+        return "_Enter some text to count tokens._"
+    models = [m.strip() for m in comparison.split(",") if m.strip()] if comparison else MODELS
+    rows = ["| Model | Tokens | Chars/token |", "|---|---:|---:|"]
+    char_count = len(text)
+    for m in models:
+        try:
+            n = count([{"role": "user", "content": text}], model=m)
+            ratio = f"{char_count / n:.2f}" if n else "—"
+            rows.append(f"| `{m}` | {n} | {ratio} |")
+        except Exception as e:
+            rows.append(f"| `{m}` | — | error: {e} |")
+    return "\n".join(rows) + f"\n\n**Input:** {char_count} chars · {len(text.split())} words"
+with gr.Blocks(title="Token Counter — across model families", theme=gr.themes.Soft()) as demo:
+    gr.Markdown(
+        """
+        # Token Counter
+        Paste any text and see how it tokenizes across Claude, GPT, and other model families.
+        Powered by [`agentfit`](https://pypi.org/project/agentfit-py/) — pure Python, no API calls.
+        💡 Useful for: budgeting context windows, comparing tokenizer efficiency for non-English text, sanity-checking your own counter.
+        """
+    )
+    with gr.Row():
+        with gr.Column():
+            txt = gr.Textbox(
+                value="The quick brown fox jumps over the lazy dog.",
+                label="Text",
+                lines=10,
+                placeholder="Paste text here...",
+            )
+            models_in = gr.Textbox(
+                value=", ".join(MODELS),
+                label="Models (comma-separated)",
+            )
+            btn = gr.Button("Count", variant="primary")
+        out = gr.Markdown()
+    btn.click(count_tokens, inputs=[txt, models_in], outputs=out)
+    gr.Examples(
+        examples=[
+            ["Hello, world!", "claude-sonnet-4-6, gpt-5, default"],
+            ["你好世界 こんにちは 안녕하세요", "claude-sonnet-4-6, gpt-5, default"],
+            ["🚀🎉🤖✨", "claude-sonnet-4-6, gpt-5, default"],
+            ["function add(a, b) {\n  return a + b;\n}", "claude-sonnet-4-6, gpt-5"],
+        ],
+        inputs=[txt, models_in],
+    )
+    gr.Markdown(
+        """
+        ---
+        Part of [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/) · MIT licensed
+        """
+    )
+if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0", server_port=7860)
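
Review note: the table-building logic in `count_tokens` can be exercised without Gradio or `agentfit` if the counter is passed in as a parameter. The sketch below is one possible refactor, not the committed code: `parse_models`, `build_table`, and the `counter` callable are hypothetical names introduced here. It assumes the counter takes `(text, model)` and returns an integer, which is a simplification of the `count(messages, model=...)` call used in `app.py`.

```python
from typing import Callable, Iterable


def parse_models(raw: str, default: Iterable[str]) -> list[str]:
    """Split a comma-separated model list; use `default` if nothing usable remains."""
    names = [m.strip() for m in raw.split(",") if m.strip()] if raw else []
    return names or list(default)


def build_table(text: str, models: Iterable[str], counter: Callable[[str, str], int]) -> str:
    """Render one Markdown table row per model, with a chars-per-token ratio."""
    rows = ["| Model | Tokens | Chars/token |", "|---|---:|---:|"]
    char_count = len(text)
    for m in models:
        try:
            n = counter(text, m)
            ratio = f"{char_count / n:.2f}" if n else "n/a"
            rows.append(f"| `{m}` | {n} | {ratio} |")
        except Exception as exc:
            # Escape pipes and newlines so an error message cannot break the table.
            msg = str(exc).replace("|", "\\|").replace("\n", " ")
            rows.append(f"| `{m}` | n/a | error: {msg} |")
    return "\n".join(rows)


if __name__ == "__main__":
    def fake_counter(text: str, model: str) -> int:
        # Stand-in counter for demonstration only.
        return max(1, len(text) // 4)

    print(build_table("The quick brown fox", parse_models("a, b", ["default"]), fake_counter))
```

Two behaviours differ from the committed function and are deliberate choices in this sketch. An input such as `" , "` falls back to the default list instead of producing a table with no rows, and error text is escaped before it is placed in a table cell.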

requirements.txt ADDED

	@@ -0,0 +1,3 @@

+gradio==5.49.1
+huggingface_hub>=0.30,<1.0
+agentfit-py>=0.1.0